COVID-19 Visualization: Part 1

Akshat Dubey
Published in Nerd For Tech
5 min read · Sep 26, 2020

It has been about eight months since most countries reported their first case of COVID-19. This is a difficult time for all of us, and we should support our friends, relatives, and family members through it. In this article, we will analyze COVID-19 data that is updated daily.

Prerequisites: Familiarity with Python at an intermediate level and a basic knowledge of pandas, NumPy, Plotly, seaborn, Matplotlib, and Jupyter Notebook.

Dataset: The dataset we will be using is provided by Johns Hopkins University, and I don’t claim any rights to it. It covers the majority of countries, including India, and is updated daily.

Let’s start our work:

1. Importing the required libraries:
import pandas as pd
import numpy as np
import plotly.express as px
import matplotlib.pyplot as plt

2. Loading the dataset:

dataset_url='https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv'
df=pd.read_csv(dataset_url)

3. Having a look at the data:

df.head()
This is how the first five rows of our dataset look.
df.tail()
This is how the last five rows of our dataset look.

Note: df.tail() shows the last rows of our dataset. These rows contain the data for the date on which the dataset was loaded (remember, the dataset is updated daily). For example, I loaded this dataset on 2020-09-23, so the last rows contain data for 2020-09-23.
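To check which date the loaded data runs up to without reading it off df.tail(), one option is to parse the Date column and take its maximum. The sketch below is only an illustration: it uses a small hand-made frame called sample (the numbers are invented) with the same column names as the dataset, so it runs without a network connection.

```python
import pandas as pd

# Hypothetical sample with the same columns as countries-aggregated.csv;
# the numbers are invented for illustration only.
sample = pd.DataFrame({
    "Date": ["2020-09-22", "2020-09-22", "2020-09-23", "2020-09-23"],
    "Country": ["Italy", "India", "Italy", "India"],
    "Confirmed": [100, 200, 110, 230],
    "Recovered": [50, 90, 55, 100],
    "Deaths": [5, 8, 6, 9],
})

# Parse the dates so that they are compared as dates rather than as text.
sample["Date"] = pd.to_datetime(sample["Date"])

# The most recent date in the data and the rows that belong to it.
latest_date = sample["Date"].max()
latest_rows = sample[sample["Date"] == latest_date]

print(latest_date.date())
print(latest_rows)
```

On the real data, the same two lines applied to df should give the date on which the dataset was last updated before loading.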

4. Performing some pre-processing:

We keep only the rows with at least one confirmed case, so the analysis for each country starts from the date on which it reported its first case.

df=df[df['Confirmed']>0]
df.head()
A quick view of the data on which we will be working
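To see what this filter does, here is a minimal sketch on invented toy data (the countries "A" and "B" and their counts are hypothetical). After filtering, the earliest remaining date for each country is the date of its first reported case.

```python
import pandas as pd

# Invented toy data: country "A" reports its first case on the second day,
# country "B" on the third day.
toy = pd.DataFrame({
    "Date": ["2020-01-22", "2020-01-23", "2020-01-24"] * 2,
    "Country": ["A"] * 3 + ["B"] * 3,
    "Confirmed": [0, 1, 3, 0, 0, 2],
})

# Same filter as in the article: drop rows with no confirmed cases yet.
filtered = toy[toy["Confirmed"] > 0]

# Earliest remaining date per country = date of its first reported case.
first_case = filtered.groupby("Country")["Date"].min()
print(first_case)
```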

5. Looking at the data from some countries:

df[df.Country=='Italy']
Data from Italy

Italy reported its first case on 2020-01-31. Since then, 302,537 people have tested positive for COVID-19, about 220,665 have recovered, and about 35,758 have died.
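The three totals above can be turned into shares with a little arithmetic. The sketch below only uses the figures quoted for Italy; the variable names are my own, and the "active" count assumes that every confirmed case is either recovered, dead, or still active.

```python
# Figures quoted above for Italy (dataset loaded on 2020-09-23).
confirmed = 302_537
recovered = 220_665
deaths = 35_758

# Share of confirmed cases that have ended in recovery or in death so far.
recovered_share = recovered / confirmed
death_share = deaths / confirmed

# Assumption: cases that are neither recovered nor dead are still active.
active = confirmed - recovered - deaths

print(f"Recovered: {recovered_share:.1%}")
print(f"Deaths:    {death_share:.1%}")
print(f"Active:    {active:,}")
```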

6. Plotting the global spread of COVID-19 on the world map:

fig=px.choropleth(df,locations='Country',locationmode='country names',color='Confirmed',animation_frame='Date')
fig.update_layout(title_text="Global spread of COVID-19")
fig.show()
An interactive map which shows the global spread of COVID-19

As we can see, about 80,000 cases had been reported in China by February 2020. We can also observe the reported cases in the U.S.A, Brazil, and India. From the map, we can conclude that India ranks second in the total number of cases, with the U.S.A in first position. To date, the U.S.A has reported more than 6 million cases, and India has reported around 5 million.
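A ranking like this can also be read directly from the data instead of from the map. The sketch below is one possible way to do it, shown on invented toy numbers (countries "A", "B", "C" are hypothetical): take the rows for the latest date and pick the largest cumulative totals.

```python
import pandas as pd

# Invented numbers, used only to show the mechanics.
toy = pd.DataFrame({
    "Date": ["2020-09-22"] * 3 + ["2020-09-23"] * 3,
    "Country": ["A", "B", "C"] * 2,
    "Confirmed": [90, 50, 70, 100, 60, 80],
})

# Confirmed is cumulative, so the latest date holds each country's total.
# ISO-formatted date strings sort in chronological order, so max() works here.
latest = toy[toy["Date"] == toy["Date"].max()]

# The two countries with the most confirmed cases on that date.
top = latest.nlargest(2, "Confirmed")[["Country", "Confirmed"]]
print(top)
```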

7. Plotting the number of deaths due to COVID-19 on the world map:

fig=px.choropleth(df,locations="Country",locationmode='country names',color='Deaths', animation_frame='Date')
fig.update_layout(title_text='Global Deaths because of COVID-19')
fig.show()
Global Deaths because of COVID-19

We can see that although India has the second-highest number of reported cases, its number of deaths is relatively low when we consider deaths per million, the population of India, and the number of reported cases.

8. Calculating maximum infection rate of all the countries:

countries=list(df["Country"].unique())
max_infection_rates=[]
# "Infection rate" here is the day-over-day increase in confirmed cases
for c in countries:
    max_infected=df[df.Country==c].Confirmed.diff().max()
    max_infection_rates.append(max_infected)
df_MIR=pd.DataFrame()
df_MIR["Country"]=countries
df_MIR['Max Infection Rate']=max_infection_rates
df_MIR.head()
Max infection rate for different countries
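The same quantity, the largest day-over-day increase in confirmed cases per country, can also be computed without an explicit loop. This is only an alternative sketch using pandas groupby, shown on invented toy data so the result is easy to check by hand.

```python
import pandas as pd

# Invented cumulative case counts for two hypothetical countries.
toy = pd.DataFrame({
    "Country": ["A"] * 4 + ["B"] * 4,
    "Confirmed": [1, 4, 9, 11, 2, 3, 10, 12],
})

# Day-over-day change within each country (first row per country is NaN).
daily_new = toy.groupby("Country")["Confirmed"].diff()

# Largest daily change per country; NaN values are ignored by max().
max_daily_new = daily_new.groupby(toy["Country"]).max()
print(max_daily_new)
```

Grouping before calling diff() keeps the difference from running across the boundary between two countries, which is the same thing the loop achieves by filtering one country at a time.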

9. Plotting the maximum infection rate of all the countries:

px.bar(df_MIR,x='Country',y='Max Infection Rate',color='Country',title='Global Max infection Rate',log_y=True)
A bar plot showing the maximum infection rate

10. Now, let us check the impact of lockdown in Italy

On 9 March 2020, the government of Italy under Prime Minister Giuseppe Conte imposed a national quarantine, restricting the movement of the population except for necessity, work, and health circumstances, in response to the growing pandemic of COVID-19 in the country. (source)

italy_lockdown_start_date = '2020-03-09'
italy_lockdown_a_month_later = '2020-04-09'

Getting data related to Italy:

df_italy=df[df.Country=='Italy'].copy()

Calculating the infection rate:

df_italy["Infection Rate"]=df_italy["Confirmed"].diff()

Now, we will visualize the graph:

fig=px.line(df_italy,x='Date',y='Infection Rate',title="Before and After Lockdown")
# Vertical red line marking the start of the lockdown
fig.add_shape(dict(type='line',x0=italy_lockdown_start_date,y0=0,
    x1=italy_lockdown_start_date,y1=df_italy["Infection Rate"].max(),
    line=dict(color='red',width=2)))
fig.add_annotation(dict(x=italy_lockdown_start_date,y=df_italy["Infection Rate"].max(),text='starting date of the lockdown'))
fig.show()
A line plot showing the infection rate with respect to the date.

As we can see, the infection rate gradually started decreasing some days after the lockdown was imposed in Italy. This suggests that the lockdown was successful in Italy.
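To put a number on "some days", one option is to smooth the daily counts and measure how long after the lockdown the smoothed curve peaks. The sketch below is an assumption on my part rather than part of the original analysis: it uses a 7-day rolling mean and a synthetic series (invented values that rise and then fall), not the real Italian data.

```python
import pandas as pd

lockdown_start = pd.Timestamp("2020-03-09")

# Synthetic daily new-case counts (invented): rising for 20 days, then falling.
dates = pd.date_range("2020-03-01", periods=40, freq="D")
daily_new = [i * 10 for i in range(20)] + [190 - i * 8 for i in range(20)]
toy = pd.DataFrame({"Date": dates, "Infection Rate": daily_new})

# A 7-day rolling mean smooths out day-to-day reporting noise.
toy["Smoothed"] = toy["Infection Rate"].rolling(window=7).mean()

# Date on which the smoothed curve peaks, and its distance from the lockdown.
peak_date = toy.loc[toy["Smoothed"].idxmax(), "Date"]
days_to_peak = (peak_date - lockdown_start).days
print(peak_date.date(), days_to_peak)
```

With df_italy, the Date column would first need to be converted with pd.to_datetime before subtracting the lockdown date.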

11. Now, we will observe the infection rate and death rate in Italy.

df_italy["Deaths Rate"]=df_italy["Deaths"].diff()
fig=px.line(df_italy,x='Date',y=["Infection Rate","Deaths Rate"])
fig.show()
Infection and death rate with respect to the date.

Also, we can clearly observe that after the lockdown in Italy the number of deaths started decreasing too.

This article was all about the basic visualization of COVID-19 data and how we can easily draw quick basic conclusions by just looking at the plot.

COVID-19 Analysis: Part 2 will be coming soon. In part 2 we will check and analyze the performance of India during the COVID-19 pandemic.

Stay Tuned and Keep Analyzing!

Akshat Dubey
Nerd For Tech

I love exploring and discovering. Also, I am a keen observer, who loves writing about Machine Learning, Deep Learning, Computer Vision, NLP. My focus is XAI.