
# Covid-19 Data Analysis in INDIA

COVID-19, caused by a novel coronavirus, is currently a major worldwide threat. It has infected more than five million people globally, leading to thousands of deaths. In such grave circumstances, it is very important to predict future case numbers to support prevention of the disease and help healthcare services prepare. Following that notion, we have developed a model and then employed it for time-series analysis of COVID-19 cases in India. The study indicates an ascending trend for the cases in the coming days.

In this blog, we'll look at several types of analysis of coronavirus data for India.

Dependencies

• Pandas

• NumPy

• Matplotlib

• Seaborn

• Plotly

• Folium

1. Importing The Libraries

```python
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import folium
from folium import plugins
plt.rcParams['figure.figsize'] = 10, 12
import warnings
warnings.filterwarnings('ignore')
```

The code above imports all the libraries the analysis depends on.

The next step is to import the CSV files.

`df = pd.read_csv('covid_19_india.csv')`

The line above reads the CSV file into the data frame df.

`df.drop(['ConfirmedIndianNational','ConfirmedForeignNational'],axis=1,inplace=True)`

The line above drops the columns this analysis does not need.

## In this step, we will analyze some visualizations.

The visualization for this step shows two sets of horizontal bars, total and cured. The x-axis represents the number of cases and the y-axis represents the state. Total (red) is the number of cases reported in a state, and cured (green) is the number of people in that state who have recovered from coronavirus.

We can see that Maharashtra has both the highest number of confirmed cases and the highest number of cured people, whereas some union territories such as Dadra and Nagar Haveli have the fewest cases.
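As a rough sketch of how a chart like this could be drawn, the snippet below takes the highest Confirmed and Cured values per state and plots them as overlaid horizontal bars. The column names come from the code later in this post. The rows are made-up placeholders rather than real figures, and taking the maximum assumes the counts in the CSV are cumulative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows, NOT real figures; they only mimic the assumed layout
# of covid_19_india.csv (one row per state per day, cumulative counts)
toy = pd.DataFrame({
    "State/UnionTerritory": ["State A", "State A", "State B", "State B", "State C"],
    "Confirmed": [100, 250, 40, 90, 10],
    "Cured": [20, 120, 5, 60, 4],
})

# With cumulative counts, the per-state maximum is the latest total
totals = (
    toy.groupby("State/UnionTerritory")[["Confirmed", "Cured"]]
    .max()
    .sort_values("Confirmed", ascending=False)
    .reset_index()
)

# Cured bars are drawn over the total bars so both share one row per state
fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(totals["State/UnionTerritory"], totals["Confirmed"], color="red", label="Total")
ax.barh(totals["State/UnionTerritory"], totals["Cured"], color="green", label="Cured")
ax.invert_yaxis()  # state with the most cases at the top
ax.set_xlabel("Number of cases")
ax.legend()
```

In a notebook with `%matplotlib inline`, the figure renders at the end of the cell. With the real data, `toy` would be replaced by df.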

## Let's Check/Visualize wrt Age.

`age_details=pd.read_csv('AgeGroupDetails.csv')`

Here we have loaded the AgeGroupDetails dataset into the data frame age_details.

The figure for this step is a pie chart of how cases are distributed across age groups. The 20-29 age group reported the highest number of cases, so young adults are the most affected, and the 30-39 age group reported the second-highest number.
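A pie chart of this kind could be sketched as follows. The column names AgeGroup and TotalCases are assumptions about AgeGroupDetails.csv, and the counts are placeholders chosen only to illustrate the ranking, not values from the dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical column names and placeholder counts (not real data)
toy_age = pd.DataFrame({
    "AgeGroup": ["0-9", "10-19", "20-29", "30-39", "40-49"],
    "TotalCases": [3, 5, 20, 15, 12],
})

# Rank the age groups by case count to see which ones lead
ranked = toy_age.sort_values("TotalCases", ascending=False)
print(ranked.head(2))

# One wedge per age group, labelled with its share of all cases
fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(toy_age["TotalCases"], labels=toy_age["AgeGroup"], autopct="%1.1f%%")
ax.set_title("Cases by age group (placeholder data)")
```

With the real file, `toy_age` would be replaced by age_details, after checking its actual column names.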

## Gender-wise

`individual_details=pd.read_csv('IndividualDetails.csv')`

Here we can see the gender difference in the number of cases: the percentage of males is higher than that of females. The reason might be that the number of males is higher than the number of females, or that females are following the lockdown rules more closely than males.
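The percentages can be computed with value_counts. This is a minimal sketch that assumes a column named gender in IndividualDetails.csv and uses placeholder rows. If gender is missing for some rows, the shares are taken over the known rows only, which is worth keeping in mind when reading the split.

```python
import pandas as pd

# Hypothetical 'gender' column with placeholder values; None marks a missing entry
toy_individuals = pd.DataFrame({"gender": ["M", "F", "M", None, "M", "F", None]})

# value_counts skips missing values by default, so the shares are
# percentages of the rows where gender is known
shares = toy_individuals["gender"].value_counts(normalize=True) * 100

# Count the rows that are left out of the percentages
missing = toy_individuals["gender"].isna().sum()

print(shares.round(1))
print(f"rows without gender: {missing}")
```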

## ICMR Testing Details

`ICMR_labs=pd.read_csv('ICMRTestingLabs.csv')`

The line above loads the ICMRTestingLabs dataset into the data frame ICMR_labs.

```python
values = list(ICMR_labs['state'].value_counts())
names = list(ICMR_labs['state'].value_counts().index)

plt.figure(figsize=(15,10))
sns.set_color_codes("pastel")
plt.title('ICMR Testing Centers in each State', fontsize = 20)
sns.barplot(x= values, y= names,color = '#9370db');
```

In the figure above, we can see that Maharashtra has the most ICMR testing labs, which is unsurprising because that state also reported the highest number of cases.

## Let's check for all the states

```python
all_state = list(df['State/UnionTerritory'].unique())
# Drop the placeholder entry if it is present
if 'Unassigned' in all_state:
    all_state.remove('Unassigned')
#all_state.remove('Nagaland#')
#all_state.remove('Nagaland')
# NOTE: 'Date' is compared as a string here; parse it with pd.to_datetime
# first if a strictly chronological filter is needed
latest = df[df['Date'] > '10-08-20'].copy()
# Assuming cumulative counts, the maximum per state is its latest total
state_cases = latest.groupby('State/UnionTerritory')[['Confirmed','Deaths','Cured']].max().reset_index()
# Active cases = confirmed minus those who died or recovered
latest['Active'] = latest['Confirmed'] - (latest['Deaths'] + latest['Cured'])
state_cases = state_cases.sort_values('Confirmed', ascending= False).fillna(0)
# Keep the 15 states with the most confirmed cases
states = list(state_cases['State/UnionTerritory'][0:15])

states_confirmed = {}
states_deaths = {}
states_recovered = {}
states_active = {}
states_dates = {}

for state in states:
    # Separate name so the main data frame df is not overwritten
    state_df = latest[latest['State/UnionTerritory'] == state].reset_index()
    k = []
    l = []
    m = []
    n = []
    # Day-over-day change in each series
    for i in range(1,len(state_df)):
        k.append(state_df['Confirmed'][i]-state_df['Confirmed'][i-1])
        l.append(state_df['Deaths'][i]-state_df['Deaths'][i-1])
        m.append(state_df['Cured'][i]-state_df['Cured'][i-1])
        n.append(state_df['Active'][i]-state_df['Active'][i-1])
    states_confirmed[state] = k
    states_deaths[state] = l
    states_recovered[state] = m
    states_active[state] = n
    date = list(state_df['Date'])
    states_dates[state] = date[1:]
```
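The per-state loop above can also be expressed with pandas' grouped diff. The sketch below runs on placeholder rows and makes a few assumptions that are not confirmed by the dataset: that Date is stored day-first as dd/mm/yy, that the cutoff means 10 August 2020, and that active cases are confirmed minus deaths minus cured. Parsing the dates first makes the filter chronological rather than a string comparison.

```python
import pandas as pd

# Placeholder rows (not real figures); the day-first format is an assumption
toy = pd.DataFrame({
    "Date": ["09/08/20", "10/08/20", "11/08/20", "12/08/20", "11/08/20", "12/08/20"],
    "State/UnionTerritory": ["State A"] * 4 + ["State B"] * 2,
    "Confirmed": [10, 15, 25, 40, 5, 9],
    "Deaths": [0, 1, 1, 2, 0, 0],
    "Cured": [2, 4, 10, 20, 1, 3],
})

# Parse the strings so that comparisons are chronological
toy["Date"] = pd.to_datetime(toy["Date"], format="%d/%m/%y")
latest = toy[toy["Date"] > pd.Timestamp("2020-08-10")].copy()

# Active cases = confirmed minus deaths minus cured
latest["Active"] = latest["Confirmed"] - latest["Deaths"] - latest["Cured"]

# Sort so that diff() compares consecutive days within each state
latest = latest.sort_values(["State/UnionTerritory", "Date"])
cols = ["Confirmed", "Deaths", "Cured", "Active"]
changes = latest.groupby("State/UnionTerritory")[cols].diff()

# The first day of each state has no previous day, so it is dropped
daily = latest[["Date", "State/UnionTerritory"]].join(changes).dropna()
print(daily)
```

The result is one long table instead of five dictionaries, which is usually easier to pass to Plotly or Seaborn.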
```python
colors_list = ['cyan','teal']
# Skip rows where the state name is missing
states = individual_details['detected_state'].dropna().unique()
# Three panels per row, rounded up so that every state gets a slot
n_cols = 3
n_rows = (len(states) + n_cols - 1) // n_cols
plt.figure(figsize=(14,60))

for idx,state in enumerate(states):
    plt.subplot(n_rows,n_cols,idx+1)
    state_rows = individual_details[individual_details['detected_state']==state]
    # Districts ordered from most to fewest cases
    y_order = state_rows['detected_district'].value_counts().index
    try:
        # y= already gives horizontal bars, so the contradictory orient='v' is dropped
        g = sns.countplot(data=state_rows,y='detected_district',color=colors_list[idx%2],order=y_order)
        plt.xlabel('Number of Cases')
        plt.ylabel('')
        plt.title(state)
        # Show only the first 15 districts
        plt.ylim(14,-1)
    except Exception:
        # Skip states that cannot be plotted (for example, no district data)
        pass
plt.tight_layout()
plt.show()
```

The figure above shows, for each state, the number of cases in each of its districts, limited to the 15 districts with the most cases.
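The plotting code places the panels in a three-column grid through plt.subplot(n_rows, 3, idx+1), so the grid needs at least as many rows as the number of states divided by three, rounded up. With fewer rows, plt.subplot raises a ValueError for the panels that do not fit. The helper below, whose name grid_rows is only illustrative, shows the arithmetic.

```python
import math


def grid_rows(n_panels: int, n_cols: int = 3) -> int:
    """Smallest number of rows that fits n_panels into n_cols columns."""
    return math.ceil(n_panels / n_cols)


# A few panel counts and the rows each one needs in a three-column grid
for n_panels in (6, 7, 32):
    print(n_panels, "panels ->", grid_rows(n_panels), "rows")
```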

Analysis in a Nutshell
