Unveiling COVID-19’s Impact: From Pandemic Reports to 3D Visualizations

Abhay Pokhriyal

September 10, 2023

COVID-19, a pandemic that has taken the lives of millions, has profoundly shaped the global health industry and economy. The outbreak has disrupted healthcare organizations, caused an economic recession, and put many people out of work. However, it has also had some positive effects: it has made many people more hygiene-aware and has led some to quit or cut back on smoking.

Data scientists have used machine learning and data analysis to find trends and correlations in COVID-19 data and to evaluate them to help make public decisions. The data used in this post is publicly available from the COVID-19 Data Hub. In the remainder of this post, I will investigate relationships in the data by manipulating and graphing it. From here on out, I will use a Python notebook and popular data libraries: pandas for creating DataFrames, seaborn for generating insightful graphs, and eventually Plotly for graphing in 3 dimensions.

First, we download the COVID-19 dataset as a .csv file. Next, we import pandas and seaborn. After importing, we read the .csv file into a pandas DataFrame object:

import pandas
covid = pandas.read_csv(".CSV FILE PATH HERE")

To display the first ten rows of the DataFrame, we use print(covid.head(10)), or just covid.head(10) in the Python shell or a notebook. There are other ways to inspect the data, such as covid.describe(), which prints summary statistics. Looking at the average number of cases for each US state, California, Arizona, and the District of Columbia have the highest means, each with over 45 thousand positive cases on average, while most states average fewer than 10 thousand. A tip for graphing such a large dataset is to plot only a sample of it: 0.2% of the rows is enough for most plots, and for some graphs the first 1,000 rows are good enough and take little time to draw.
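The per-state averages and the sampling tip could be sketched roughly as follows. The state column name PROVINCE_STATE_NAME is an assumption about the CSV layout, and both helper functions are hypothetical names made up for illustration; only PEOPLE_POSITIVE_CASES_COUNT appears elsewhere in this post.

```python
import pandas as pd

# Assumed column names; adjust them to match the downloaded CSV.
STATE_COL = "PROVINCE_STATE_NAME"          # assumption: state/district name column
CASES_COL = "PEOPLE_POSITIVE_CASES_COUNT"  # column used later in this post


def mean_cases_by_state(df: pd.DataFrame) -> pd.Series:
    """Average positive-case count per state, highest first."""
    return df.groupby(STATE_COL)[CASES_COL].mean().sort_values(ascending=False)


def sample_for_plotting(df: pd.DataFrame, frac: float = 0.002, seed: int = 0) -> pd.DataFrame:
    """Random sample of the rows (0.2% by default) for faster plotting."""
    return df.sample(frac=frac, random_state=seed)
```

Calling mean_cases_by_state(covid).head(10) would list the ten highest averages, and covid.head(1000) remains the simpler alternative when the first 1,000 rows are enough.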

A KDE plot of the death counts in the sampled COVID-19 reports shows that only a few reports reach a casualty count of 30,000, while the mean casualty count, found by taking the mean of the death count column, is about 800.

The dataset has a column giving the report date. To turn that into a number that can be plotted on a graph, I added a new column giving the number of days since COVID-19 was declared a pandemic. Values range from approximately -50 to 800 days.
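One way to build such a column is sketched below. It assumes the report date is stored in a column named REPORT_DATE and that the reference date is 11 March 2020, the day the WHO characterized COVID-19 as a pandemic; neither detail is confirmed above, so treat both as assumptions.

```python
import pandas as pd

# WHO characterized COVID-19 as a pandemic on 11 March 2020 (assumed reference date).
PANDEMIC_DECLARED = pd.Timestamp("2020-03-11")


def add_days_since_pandemic_start(df: pd.DataFrame, date_col: str = "REPORT_DATE") -> pd.DataFrame:
    """Return a copy of df with an integer DAYS_SINCE_PANDEMIC_START column."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])  # parse strings into datetimes
    out["DAYS_SINCE_PANDEMIC_START"] = (dates - PANDEMIC_DECLARED).dt.days
    return out
```

Reports dated before the declaration get negative values, which is where a range starting below zero comes from.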

Using Plotly, we can create an interactive graph that displays a 3D scatterplot. The x-axis (shown right) represents the number of positive cases, the y-axis (shown left) represents the number of new positive cases, and the z-axis (the vertical axis) represents the casualty count.

We can get such a plot by using the following code:

import plotly.express as px

# Constant column so every marker is drawn at the same size
covid['markersize'] = 1

# Plot only the first 1,000 rows to keep the figure responsive
fig = px.scatter_3d(data_frame = covid.head(1_000),
                    x = 'PEOPLE_POSITIVE_CASES_COUNT',
                    y = 'PEOPLE_POSITIVE_NEW_CASES_COUNT',
                    z = 'PEOPLE_DEATH_COUNT',
                    size = 'markersize',
                    size_max = 15,
                    color = 'DAYS_SINCE_PANDEMIC_START',
                    opacity = 0.8,
                    title = 'Covid-19 Visualizations')

# Set 3D graph bounds
fig.update_layout(scene=dict(
  xaxis = dict(range=[0, 2_000]),
  yaxis = dict(range=[0, 60]),
  zaxis = dict(range=[0, 60]),
  aspectmode = 'cube'
))

fig.show()

We can modify the size of the markers (or dots) by editing the size_max argument in the px.scatter_3d function. The data frame column that you want to color-code can be changed by editing the color argument to a different column name. The bounds of the axes can be edited in the fig.update_layout function.

A trend visible in the 3D scatterplot is that, as the pandemic goes on, the numbers of casualties, positive cases, and new positive cases go up. The number of cases does not peak at the largest values of days since the start of the pandemic, but at about 500 to 600 days. Another insight is that the ratio of casualties to cases rose during the first few months of the pandemic, likely because knowledge of how to treat the coronavirus was still incomplete.
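The casualty-to-case ratio can also be checked numerically rather than read off the plot. The sketch below groups reports into 30-day windows and divides total deaths by total cases in each window; the function name and the window length are my own choices, and it assumes the DAYS_SINCE_PANDEMIC_START column described earlier already exists.

```python
import pandas as pd


def fatality_ratio_by_period(df: pd.DataFrame,
                             days_col: str = "DAYS_SINCE_PANDEMIC_START",
                             cases_col: str = "PEOPLE_POSITIVE_CASES_COUNT",
                             deaths_col: str = "PEOPLE_DEATH_COUNT",
                             period_days: int = 30) -> pd.Series:
    """Total deaths divided by total cases within each period_days-long window."""
    period = df[days_col] // period_days  # floor division also handles negative days
    totals = df.groupby(period)[[cases_col, deaths_col]].sum()
    totals = totals[totals[cases_col] > 0]  # drop windows with no cases to avoid dividing by zero
    return totals[deaths_col] / totals[cases_col]
```

The result is indexed by window number, so index 0 covers days 0 to 29, index 1 covers days 30 to 59, and so on.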

Data investigation is paramount for driving further research in the realm of statistics; Python offers many powerful libraries for analyzing trends within data. In the context of the pandemic, the ability to predict trends played a pivotal role in mitigating its impact. Data science is important in all research fields, including chemistry, medical sciences, architecture, and economics. Given its increasing relevance and demand today, data science stands as a compelling subject of study.

SMLC Blog.
