
Is Georgia Ready to Open Up? A Data Science Perspective

This walkthrough uses data science techniques and tools like Facebook Prophet and R to determine whether one US state should open up.


This analytical article uses data science and predictive models to study the spread of the virus in Georgia by reviewing various statistical measures and to predict its growth over the next 10 days. We also compare the values with those of other states and countries.

Please note that this article primarily illustrates the use of machine learning algorithms for prediction with the limited data that is publicly available. The opinions in this article should not be interpreted as professional advice.

Exponential Growth

The virus outbreak began in Wuhan, China, in late December 2019. Since then, it has sickened more than 3 million people worldwide. As of 4/27/20, the coronavirus outbreak in the United States has grown to at least 1 million cases. New York, New Jersey, and Massachusetts are the primary outbreak clusters at this point in time.



The primary goal of this article is to review various growth and statistical metrics from a data science perspective and to predict the number of cases for the next 10 days.

Key Stats

Let’s look at the current status as of 04/27/2020.

The questions I was looking to answer are:

  1. What are the total number of tests, positive cases, and deaths, and what is the hospitalization rate?

  2. What is the extent of the increase in those numbers by day?

  3. What are the predicted cases, deaths, and hospitalizations for the next 10 days?



Confirmed Cases Analysis


Out of 127,169 tests done, 24,226 have been confirmed positive for COVID-19. This means that 19% of the tests came back positive. That is not a good number.

In comparison, the USA is at around 20%. South Korea, Australia, and New Zealand are around 2%. Canada, Germany, and Denmark are around 6 to 8%. Italy is around 15%. In that context, the 19% positive rate for Georgia is quite high. The good news is that this number has been going down. Getting to 3 to 10% would be ideal (as low as possible).
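As a quick check of the arithmetic above, the positivity rate can be reproduced in a few lines. This is a minimal Python sketch (the article's own analysis was done in R, and its code is not shown). The function name `positivity_rate` and the constant `TARGET_RANGE` are illustrative names, not part of any library.

```python
def positivity_rate(positive: int, tests: int) -> float:
    """Return the share of tests that came back positive, as a percentage."""
    if tests <= 0:
        raise ValueError("tests must be a positive number")
    return positive / tests * 100


# Figures quoted in the article for Georgia as of 04/27/2020.
georgia_rate = positivity_rate(positive=24_226, tests=127_169)

# The 3 to 10% band the article describes as ideal (illustrative constant).
TARGET_RANGE = (3.0, 10.0)
within_target = TARGET_RANGE[0] <= georgia_rate <= TARGET_RANGE[1]

print(f"Georgia positivity rate: {georgia_rate:.1f}%")
print(f"Within the 3 to 10% target band: {within_target}")
```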


The percentage of deaths relative to confirmed cases is comparable to other states. This number is calculated as number of deaths / number of confirmed cases.

For example, New York, New Jersey, and Massachusetts have a rate of 5 to 6%. In other countries, it is 1 or 2% of the confirmed cases.


Percentage of Testing Done

For the month of April, Georgia has averaged 4,000 tests per day.

With a population of 10.62 million, that comes to about 38 tests per 100,000 people per day (= 4000 / (10.62 * 1,000,000) * 100,000).

The Harvard Global Health Institute recommends 152 tests per 100,000 people per day.

In comparison, Massachusetts and New York are doing over 100 tests per 100,000 people per day.

To get to 100 tests per 100,000 people, Georgia has to do 10,620 tests per day. To get to 152 tests per 100,000 people, Georgia has to do 16,143 tests per day.

Georgia has a long way to go to get to that number.

Confirmed Case Increase Stats in the Last 30 Days

Confirmed case increases are calculated by comparing the current value to the values from 1, 2, 3, 4, 5, 10, 15, 20, 25, and 30 days ago, as well as the value on April 1st.

The key numbers to look for are how many days it takes for the count to double, to increase four- or five-fold, and so on. Those numbers are highlighted in the caption of the chart.
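The comparison described above can be sketched as follows. This is an illustrative Python version, not the article's R code, and the series below is synthetic (it is not Georgia data). The helper names `fold_increase` and `days_to_double` are hypothetical. The same helpers apply equally to a cumulative hospitalization or death series.

```python
LAGS = (1, 2, 3, 4, 5, 10, 15, 20, 25, 30)  # lookback windows listed in the article


def fold_increase(cumulative, lag_days):
    """Ratio of the latest cumulative value to the value lag_days earlier."""
    if lag_days <= 0 or lag_days >= len(cumulative):
        raise ValueError("lag_days must be between 1 and len(cumulative) - 1")
    return cumulative[-1] / cumulative[-1 - lag_days]


def days_to_double(cumulative):
    """Smallest lookback at which the latest value is at least double, else None."""
    latest = cumulative[-1]
    for lag in range(1, len(cumulative)):
        earlier = cumulative[-1 - lag]
        if earlier > 0 and latest >= 2 * earlier:
            return lag
    return None


# Synthetic cumulative counts for 31 days (NOT real data): 100, 110, ..., 400.
series = [100 + 10 * day for day in range(31)]

for lag in LAGS:
    print(f"{lag:>2} days ago -> {fold_increase(series, lag):.2f}x")
print("Days to double:", days_to_double(series))
```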


Hospitalization Increase Stats in the Last 30 Days

Hospitalization increases are calculated by comparing the current value to the values from 1, 2, 3, 4, 5, 10, 15, 20, 25, and 30 days ago, as well as the value on April 1st.

The key numbers to look for are how many days it takes for the count to double, to increase four- or five-fold, and so on. Those numbers are highlighted in the caption of the chart.


Deaths Increase Stats in the Last 30 Days

Increases in deaths are calculated by comparing the current value to the values from 1, 2, 3, 4, 5, 10, 15, 20, 25, and 30 days ago, as well as the value on April 1st.

The key numbers to look for are how many days it takes for the count to double, to increase four- or five-fold, and so on. Those numbers are highlighted in the caption of the chart.


Predictions for the Next 10 Days

These predictions are made with the Facebook Prophet algorithm in R, applied to time series data. Time series forecasting is quite different from other supervised regression problems. Even though forecasting can be considered a subset of supervised regression, some specific tools are necessary due to the temporal nature of the observations.

There are many approaches, such as exponential smoothing, ARIMA, SARIMA, LSTM, etc. The approach I have taken here is to treat this as time series data and use the popular Facebook Prophet algorithm on the daily data. One can learn more about Facebook Prophet in its documentation. The Facebook Prophet algorithm offers a lot of control over how one handles trends and seasonality. It has good controls for modeling growth as linear or logistic and for handling points where there are abrupt changes in the trend.
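To give a feel for what such a forecast involves, here is a minimal sketch using Prophet's Python API rather than the R interface the article used. It assumes the `prophet` and `pandas` packages are installed. The helper names `to_prophet_rows` and `forecast_next_days` are hypothetical, and the model uses Prophet's defaults rather than whatever settings the author chose.

```python
from datetime import date, timedelta


def to_prophet_rows(start, counts):
    """Pair each daily count with its date, using Prophet's 'ds' and 'y' column names."""
    return [
        {"ds": (start + timedelta(days=i)).isoformat(), "y": count}
        for i, count in enumerate(counts)
    ]


def forecast_next_days(rows, periods=10):
    """Fit Prophet on daily rows and return the forecast for the next `periods` days."""
    # Imported lazily so the helper above works without these packages installed.
    import pandas as pd
    from prophet import Prophet

    history = pd.DataFrame(rows)
    model = Prophet()  # default linear growth; growth="logistic" also needs a "cap" column
    model.fit(history)
    future = model.make_future_dataframe(periods=periods)  # daily frequency by default
    forecast = model.predict(future)
    # yhat is the point forecast; yhat_lower and yhat_upper bound the uncertainty interval.
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(periods)
```

With a list of daily cumulative case counts in hand, `forecast_next_days(to_prophet_rows(date(2020, 3, 28), counts))` would return a 10-row table of predicted values with their uncertainty bounds. The start date in that call is only a placeholder.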


The trend has been going up, especially the 14-day trend. Cases and hospitalizations seem to double every 17 days. It is important to slow the rate of increase (flatten the curve) and possibly bring it down.

Predicting confirmed cases using the reproduction number is beyond the scope of this article. Please refer to this article for an example of that approach.
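To put the 17-day doubling time in perspective, it can be converted into an implied daily growth rate and a rough 10-day extrapolation. The sketch below assumes constant exponential growth, which is a simplification and not the Prophet forecast itself. The function names are illustrative.

```python
def daily_growth_rate(doubling_days: float) -> float:
    """Daily growth rate r such that (1 + r) ** doubling_days == 2."""
    return 2 ** (1 / doubling_days) - 1


def project(current: float, doubling_days: float, horizon_days: int) -> float:
    """Extrapolate a count forward, assuming the doubling time stays constant."""
    return current * 2 ** (horizon_days / doubling_days)


rate = daily_growth_rate(17)
# 24,226 is the confirmed case count quoted in the article as of 04/27/2020.
in_ten_days = project(24_226, doubling_days=17, horizon_days=10)

print(f"Implied daily growth: {rate:.2%}")
print(f"Naive 10-day extrapolation: {in_ten_days:,.0f} cases")
```

Flattening the curve corresponds to a longer doubling time, which lowers both numbers.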

Summary 


The current metrics suggest that Georgia has to take a lot of precautions (social distancing) to keep the spread low and has to significantly increase its testing coverage. Cases, hospitalizations, and the death rate have been going up, especially in the 14-day trend. Measures have to be taken to flatten the trend (curve).

Topics:
ai, analytics, coronavirus, data science, facebook prophet, machine learning

Opinions expressed by DZone contributors are their own.
