Predicting Confirmed Coronavirus Cases in the USA

Let's look at some machine learning with Facebook Prophet to predict how many coronavirus cases there will be in the United States in 5 days.

“To everyone who is affected: may God help and guide the victims and their families through this tough time. Please follow the guidelines on the WHO and CDC websites to stay safe.”

Exponential Growth

The virus outbreak began in Wuhan, China in late December 2019. Since then it has sickened more than 1.5 million people worldwide. The countries that have borne the brunt are China, the USA, Italy, Spain, France, Iran, and the UK. According to Johns Hopkins University (JHU), at least 328,661 people have recovered globally.

As of 4/8/20, the coronavirus outbreak in the United States has grown to at least 429,052 cases. New York, Washington state, and California are the primary outbreak clusters at this point in time.

The primary goal for this article is to predict the number of cases in the USA for the next 5 days. We will also apply a correction to take into account the impact of social distancing (external factors). 

To set expectations: I am not an epidemiologist, and the opinions in this article should not be interpreted as professional advice. This article is primarily an illustration of using machine learning algorithms to predict exponential growth in time series data.

Prediction Approaches to Take

The data we are talking about here is time series data. Time series forecasts are quite different from other supervised regression problems. Even though forecasting can be considered a subset of supervised regression problems, some specific tools are necessary due to the temporal nature of the observations. Time series forecasting has many possible applications, such as stock price forecasting, weather forecasting, business planning, resource allocation, and many others.

We can model this as an exponential growth model. To do that, one can take the log of the values and use a linear regression model to extrapolate future values. There are many other approaches as well, such as exponential smoothing, ARIMA, SARIMA, LSTM, etc.

The approach I have taken here is to treat this as time series data and use the popular Facebook Prophet algorithm on the daily data. One can learn more about it in the Facebook Prophet documentation. Prophet offers a lot of control over how one handles trends and seasonality. It has good controls for modeling growth as linear or logistic and for handling changepoints, where the trend shifts abruptly.
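As a rough illustration of the log-plus-linear-regression idea mentioned above, here is a minimal sketch in plain Python. It is not the method used for the predictions in this article. The function names `fit_log_linear` and `extrapolate` are made up for this example, and the input series is synthetic (it simply doubles every day), not real case data.

```python
import math

def fit_log_linear(counts):
    """Fit log(count) = intercept + slope * day by ordinary least squares."""
    days = list(range(len(counts)))
    logs = [math.log(c) for c in counts]
    n = len(counts)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def extrapolate(intercept, slope, start_day, horizon):
    """Extend the fitted line and convert back from log scale with exp."""
    return [math.exp(intercept + slope * d)
            for d in range(start_day, start_day + horizon)]

# Synthetic series that doubles daily, purely illustrative (not real data).
counts = [100, 200, 400, 800, 1600]
intercept, slope = fit_log_linear(counts)
forecast = extrapolate(intercept, slope, len(counts), 5)
```

On this synthetic input the slope comes out as log(2), which corresponds to one doubling per day. Real case counts are noisier, and their growth rate changes over time, which is one reason to reach for a more flexible tool.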

Data Trend Using Raw Data

At first glance at the data, it’s obvious the trend is exponential. I tried to use this data directly for the predictions, but the accuracy was not that great. So I decided to apply a log transformation.

Data Trend After Log Transformation

This is the chart after I applied the log transformation, which is easier to manage and use with the Prophet algorithm.

Prediction Fit (Log)

This log-transformed data was used as the input to the Facebook Prophet algorithm to predict the cases for the next 5 days. The fit is decent, as shown. Note that the graph is based on log data, so one has to convert it back into the number of cases by applying exp to the predicted values.
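The pipeline described above (log transform, fit, predict 5 days ahead, apply exp) might look roughly like the sketch below. It assumes the `prophet` and `pandas` packages are installed and uses Prophet with default settings, because the settings actually used for this article are not given. The helper names `to_prophet_rows` and `forecast_cases` are hypothetical.

```python
import math

def to_prophet_rows(dates, counts):
    """Build rows in the (ds, y) layout Prophet expects, with y = log(cases)."""
    return [{"ds": d, "y": math.log(c)} for d, c in zip(dates, counts)]

def forecast_cases(dates, counts, horizon_days=5):
    """Fit Prophet on log(cases) and return (date, predicted cases) pairs."""
    # Imported lazily so the helper above works without these packages.
    import pandas as pd
    from prophet import Prophet  # older releases: from fbprophet import Prophet

    df = pd.DataFrame(to_prophet_rows(dates, counts))
    df["ds"] = pd.to_datetime(df["ds"])

    model = Prophet()  # defaults only; an assumption, not the article's setup
    model.fit(df)

    # Extend the daily date range by the forecast horizon and predict.
    future = model.make_future_dataframe(periods=horizon_days)
    forecast = model.predict(future)

    # yhat is on the log scale, so exp converts it back to a case count.
    tail = forecast.tail(horizon_days)
    return [(ds.date().isoformat(), math.exp(yhat))
            for ds, yhat in zip(tail["ds"], tail["yhat"])]
```

Calling `forecast_cases(dates, counts)` with a list of ISO date strings and the matching cumulative case counts would return five (date, cases) pairs, one per forecast day.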

Prediction for the Next 5 Days

The prediction for the next 5 days is shown in yellow. This is a steep curve if the virus intensity continues to grow. However, there are a lot of external factors that impact the spread. These include aggressive social distancing, more countries asking their residents to stay at home, and educating people so that they take social distancing seriously. We are not doing contact tracing in the USA yet, which has proven to be an invaluable tool in other countries.

The factors discussed above help reduce the spread of the virus considerably. To account for these external factors, I applied a corrective factor to the prediction. I applied a 5% (red) and a 10% correction (grey) to reduce the intensity of the growth. I did that on the assumption that social distancing measures are helping to reduce the intensity of the spread, and in the hope that the intensity will go down in the coming days and weeks.
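The exact way the correction was applied is not spelled out here, so the following is only one plausible reading: scale each predicted value down by a fixed percentage. The function name `apply_correction` and the forecast values are placeholders for illustration, not the numbers behind the charts.

```python
def apply_correction(forecast, rate):
    """Reduce every forecast value by a fixed fraction (0.05 means 5%)."""
    if not 0 <= rate < 1:
        raise ValueError("rate must be in the range [0, 1)")
    return [value * (1 - rate) for value in forecast]

# Placeholder forecast values, purely illustrative (not the article's output).
raw_forecast = [500_000.0, 550_000.0, 600_000.0]
corrected_5 = apply_correction(raw_forecast, 0.05)
corrected_10 = apply_correction(raw_forecast, 0.10)
```

A different reading would apply the correction to the daily growth rate instead of to the totals, which would compound over the forecast window and give different numbers.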

Raw Numbers of the Prediction

The raw data of the prediction is shown below along with the corrective factors applied based on social distancing and other measures starting to kick in.

A Few Takeaways

A few takeaways from the analysis:

  1. If the current exponential trend continues, we will hit above 700k cases in 5 days.
  2. If the social measures are starting to kick in, then with the corrective measures applied, cases would hit around 660k with a 10% correction and around 580k with a 20% correction.
  3. If the daily increase (velocity) remains fairly constant, I would expect us to be at around 550k in 5 days.
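The constant-velocity scenario in the last takeaway is simple arithmetic: the current total plus the same daily increase for each of the next 5 days. A minimal sketch follows, where `project_linear` is a hypothetical helper and the daily increase of 24,000 is an assumed round figure chosen so the result lands near the 550k estimate, not a reported number.

```python
def project_linear(current_total, daily_increase, days=5):
    """Project cumulative cases assuming the same increase every day."""
    return [current_total + daily_increase * d for d in range(1, days + 1)]

# 429,052 is the US total quoted above for 4/8/20.
# 24,000 per day is an assumed figure for illustration only.
projection = project_linear(429_052, 24_000)
print(projection[-1])  # 549052
```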

I will try to publish an update in 5 days. 
