Predicting Future COVID-19 Cases in the US using Curve Fitting in SPSS: Part 1

Curve fitting fits a series of equations, or models, to a set of data; the fitted equations can then be extrapolated to predict values outside the range of the observed data. Most commonly, this is used to forecast future values from present and past data.

The US is currently (as of November 10th, 2020) experiencing its third wave of new COVID-19 cases, the pandemic having first reached American soil around March 1st of this year. The first two waves have already peaked and subsided, with peaks on April 9th (34,632 new cases) and July 17th (76,157 new cases). Daily new cases in the United States have now exceeded 130,000, which has never happened before. Here, we will examine how curve fitting can be used to predict future new cases of COVID-19 in the United States.

Dates, daily new cases, and moving-average data were downloaded from https://covidtracking.com/data/charts/us-daily-positive in CSV format and imported into SPSS. First, we can create a line graph to illustrate the trends in daily new cases:

Here, we see the first two waves, along with our current third wave. Both of the first two waves are characterized by a sharp increase in daily new cases, followed by a peak and a more gradual decline. The second peak (76,157 new cases) is 2.199 times the first (34,632 new cases); if this ratio holds, the third peak would reach about 76,157 * 2.199 ≈ 167,470 new cases (167,472 with the unrounded ratio). In addition, there were 99 days between the first and second peaks. Again, if this spacing holds, the third peak would fall on October 24th, 2020. Based on the current data, the third peak appears to be either happening now or imminent, suggesting that this predicted date is not too far off.
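The peak arithmetic above is easy to check in a few lines. This is a minimal Python sketch that uses only the peak dates and counts quoted in the text; like the reasoning above, it naively assumes that the third wave repeats the same ratio and spacing as the first two.

```python
from datetime import date, timedelta

# Peak dates and new-case counts quoted in the text.
first_peak_date, first_peak_cases = date(2020, 4, 9), 34_632
second_peak_date, second_peak_cases = date(2020, 7, 17), 76_157

# Ratio between successive peaks.
ratio = second_peak_cases / first_peak_cases

# Naive projection: the third peak repeats the same ratio and the same spacing.
projected_cases = second_peak_cases * ratio
gap_days = (second_peak_date - first_peak_date).days
projected_date = second_peak_date + timedelta(days=gap_days)

print(f"ratio = {ratio:.3f}")                            # ratio = 2.199
print(f"projected third peak = {projected_cases:,.0f}")  # projected third peak = 167,472
print(f"days between peaks = {gap_days}")                # days between peaks = 99
print(f"projected peak date = {projected_date}")         # projected peak date = 2020-10-24
```

Multiplying by the rounded ratio (2.199) gives about 167,469 instead; the small gap is purely a rounding effect.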

To predict future cases, we take the last 36 days of data, ending on November 10th, and fit curves to this window. The moving average is used instead of the raw daily counts to smooth out the erratic day-to-day fluctuations in new cases. Fitting every available model, we get the following:
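For readers working outside SPSS, the smoothing and windowing step might look roughly like the sketch below. It assumes a trailing 7-day window (our assumption; the text does not state the window used in the downloaded data), `trailing_moving_average` is a hypothetical helper, and the daily counts are made up for illustration rather than taken from the CSV.

```python
from datetime import date, timedelta

def trailing_moving_average(values, window=7):
    """Mean of each value and the window-1 values before it.

    Positions without a full window (the first window-1 entries) are None.
    The 7-day default is an assumption, not taken from the source data.
    """
    averages = []
    for i in range(len(values)):
        if i < window - 1:
            averages.append(None)
        else:
            averages.append(sum(values[i - window + 1 : i + 1]) / window)
    return averages

# The fitting window: the last 36 days of data, ending November 10th, 2020.
end = date(2020, 11, 10)
start = end - timedelta(days=35)  # 36 days, counting both endpoints
print(start)  # 2020-10-06

# Illustrative (made-up) daily counts with a one-day spike at the end.
daily = [100, 100, 100, 100, 100, 100, 100, 170]
print(trailing_moving_average(daily))
# [None, None, None, None, None, None, 100.0, 110.0]
```

The 70-case spike on the last day moves the average by only 10, which is the kind of smoothing the analysis relies on.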

As shown, some models obviously fit these data poorly: the linear, inverse, power, S, and logarithmic models. After removing these curves from the plot, we get the following (here, I have color-coded the lines to make them easier to read):
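Writing the models out as equations makes their shapes easier to compare: the linear model cannot bend at all, while the logarithmic, inverse, and S models flatten out as time increases. The dictionary below transcribes the eleven model forms from the SPSS Curve Estimation documentation as we understand them (treat the exact parameterization as an assumption and check it against your SPSS version). `MODELS` and the parameter values are illustrative; `t` is the time index and `u` is the logistic model's upper bound.

```python
import math

# Model forms as transcribed from the SPSS Curve Estimation documentation
# (assumption: verify against your SPSS version). t = time index, u = logistic upper bound.
MODELS = {
    "linear":      lambda t, b0, b1: b0 + b1 * t,
    "logarithmic": lambda t, b0, b1: b0 + b1 * math.log(t),
    "inverse":     lambda t, b0, b1: b0 + b1 / t,
    "quadratic":   lambda t, b0, b1, b2: b0 + b1 * t + b2 * t**2,
    "cubic":       lambda t, b0, b1, b2, b3: b0 + b1 * t + b2 * t**2 + b3 * t**3,
    "compound":    lambda t, b0, b1: b0 * b1**t,
    "power":       lambda t, b0, b1: b0 * t**b1,
    "s":           lambda t, b0, b1: math.exp(b0 + b1 / t),
    "growth":      lambda t, b0, b1: math.exp(b0 + b1 * t),
    "exponential": lambda t, b0, b1: b0 * math.exp(b1 * t),
    "logistic":    lambda t, b0, b1, u: 1 / (1 / u + b0 * b1**t),
}

# Compound, growth, and exponential describe the same curve with different
# parameters: with A = level at t = 0 and k = growth rate per day, compound
# uses (A, e^k), growth uses (ln A, k), and exponential uses (A, k).
A, k = 50_000.0, 0.03  # arbitrary illustrative values
for t in range(1, 37):
    e = MODELS["exponential"](t, A, k)
    assert math.isclose(e, MODELS["compound"](t, A, math.exp(k)))
    assert math.isclose(e, MODELS["growth"](t, math.log(A), k))
print("compound, growth, and exponential agree")
```

Because these three forms are the same curve, a fitting procedure that estimates them the same way will give them identical goodness-of-fit values.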

Just by looking at this figure, the cubic model appears to fit better than the others, but we will check numerically which of these six models best fits the data. For this, we use the adjusted R-squared: the proportion of the variation in the dependent variable (here, the moving average of total daily new cases) that is explained by the model, adjusted for the number of terms in the model. The values are .988 for quadratic, .996 for cubic, .983 for compound, growth, and exponential (three equivalent forms of the same exponential curve, which is why they tie), and .977 for logistic. The cubic model therefore fits best. Extrapolating this into the future:
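The adjusted R-squared comparison can also be reproduced outside SPSS. The sketch below is a stand-in under stated assumptions, not the author's procedure: it fits only the polynomial models with NumPy's `polyfit`, uses the standard formula adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) with p the number of non-intercept terms, and runs on a hypothetical synthetic 36-point series (a cubic trend plus noise), not on the real moving-average data.

```python
import numpy as np

def adjusted_r2(y, y_hat, n_params):
    """Adjusted R^2 for a model with n_params terms besides the intercept."""
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    n = len(y)
    return 1 - (1 - r2) * (n - 1) / (n - n_params - 1)

def compare_polynomials(t, y, degrees=(1, 2, 3)):
    """Fit each polynomial degree by least squares and score it by adjusted R^2."""
    scores = {}
    for d in degrees:
        coeffs = np.polyfit(t, y, d)
        scores[d] = adjusted_r2(y, np.polyval(coeffs, t), n_params=d)
    return scores

# Hypothetical synthetic series standing in for a 36-day window (NOT real case data).
t = np.arange(1, 37)
rng = np.random.default_rng(0)
y = 45_000 + 300 * t + 40 * t**2 + 1.5 * t**3 + rng.normal(0, 500, t.size)

scores = compare_polynomials(t, y)
best = max(scores, key=scores.get)
for d, s in scores.items():
    print(f"degree {d}: adjusted R^2 = {s:.4f}")
print("best degree:", best)
```

Since the synthetic series was built from a cubic, the cubic fit should score highest here; on real data, a high adjusted R² only says the curve matches the window, not that it will keep matching beyond it.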

Numerically, this equation predicts the following daily new cases (again, as a moving average) for November 11th through 15th, 2020: 121,715, 127,285, 133,187, 139,432, and 146,032. Will this actually happen? It is difficult to tell, because curve fitting simply fits an equation to the data and extrapolates it. The third wave may peak today or tomorrow, after which these numbers would plateau and decline instead of continuing to rise in this fashion. In any case, the cubic model clearly cannot describe these data indefinitely: we will reach a peak, plateau, and decline as we have twice before, and the numbers the cubic model predicts quickly grow so large as to be impossible. While curve fitting is an extremely useful and powerful tool, it is important to remember that it only extends an existing trend. A follow-up post will continue to explore how COVID-19 is impacting the United States.
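A quick sanity check on the five predictions: for a cubic evaluated at equally spaced times, the third differences are constant (equal to 6 * b3 for a time step of 1). The sketch below computes successive differences of the quoted predictions; `differences` is a hypothetical helper, and the assumption that the model's time index advances by one per day is ours.

```python
# Cubic-model predictions quoted above for November 11-15, 2020 (moving average).
predictions = [121_715, 127_285, 133_187, 139_432, 146_032]

def differences(values):
    """Differences between each pair of neighboring values."""
    return [b - a for a, b in zip(values, values[1:])]

first = differences(predictions)
second = differences(first)
third = differences(second)
print(first)   # [5570, 5902, 6245, 6600]
print(second)  # [332, 343, 355]
print(third)   # [11, 12]

# Third difference = 6 * b3 for a unit time step, so b3 is roughly 11.5 / 6.
print(f"implied cubic coefficient ~ {sum(third) / len(third) / 6:.1f}")
# implied cubic coefficient ~ 1.9
```

The third differences agree up to the rounding of the predictions, consistent with a positive leading coefficient. A positive cubic term grows without bound, which is why this extrapolation must eventually break down once the wave peaks.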