COVID19 WAVEWATCHER STUDY

May 09, 2020 — Wes Monceaux

Like most people, I have never experienced anything like the COVID19 pandemic. I wanted to draw my own conclusions about the effectiveness of the mitigations we (in the United States) are taking, so I decided to take the available data and run some basic linear regressions on it.


BACKGROUND

As states relax their stay-at-home orders in an effort to get businesses open again, I am wondering whether this will cause a second wave of infections. With no vaccine and no existing herd immunity, it would seem that cases would return to pre-stay-at-home levels if people went back to socializing as they did before. With many people still following some level of social distancing, I would expect the number of cases to increase, but not necessarily to the levels we saw earlier.

So why not try to answer this by monitoring the daily reported COVID19 case data around the country? We should be able to see whether cases are increasing, decreasing, or staying about the same.

Let me go ahead and set your expectations for how rigorous this analysis will be: "Statistics for Dummies" meets duct tape is about the right vibe.

LET'S GET SOME DATA

When data sets started becoming available, I ran across two that I have been playing with in spreadsheets: one from Johns Hopkins University (JHU) and one from The New York Times (NYTimes).

Both are on GitHub and are updated daily. I decided to use two data sets as a sanity check. This does raise the "a man with two clocks never knows what time it is" issue, but it should tell me if there was a problem with data collection. So far, the data sets more or less match up, which is encouraging.
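The two-clock sanity check can be automated. Below is a minimal Python sketch, assuming each data set has already been reduced to a {date: cumulative cases} mapping for one state. The function name `compare_sources` and the 5% tolerance are hypothetical choices for illustration, not part of the original analysis.

```python
from datetime import date


def compare_sources(a, b, tolerance=0.05):
    """Compare two {date: cumulative_cases} mappings for the same state.

    Returns (mismatched, only_in_one):
      mismatched: {date: (a_value, b_value)} for shared dates whose values
                  differ by more than `tolerance` (fraction of the larger value)
      only_in_one: sorted list of dates that appear in just one source
    """
    mismatched = {}
    for day in sorted(a.keys() & b.keys()):
        x, y = a[day], b[day]
        larger = max(x, y)
        if larger and abs(x - y) / larger > tolerance:
            mismatched[day] = (x, y)
    only_in_one = sorted(a.keys() ^ b.keys())
    return mismatched, only_in_one


# Hypothetical numbers, not real data: the second day disagrees by about 23%.
jhu = {date(2020, 5, 1): 100, date(2020, 5, 2): 200}
nyt = {date(2020, 5, 1): 102, date(2020, 5, 2): 260, date(2020, 5, 3): 5}
print(compare_sources(jhu, nyt))
```

A relative tolerance is used because the two sources can legitimately differ by a few cases on any given day, and what matters is a disagreement that is large compared to the totals.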

METHODOLOGY

The thinking goes something like this:

  1. Choose a baseline two-week period of cases, before stay-at-home orders were relaxed
  2. Perform linear regressions on the cases and deaths for each state
  3. Plot the line along with reported data points from that two-week baseline period forward
  4. Inspect each chart to see if any trends stand out visually
  5. Two to three weeks after the baseline period, evaluate each state and compare the results to its mitigation efforts
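Steps 1 and 2 boil down to an ordinary least-squares fit over the baseline window. Here is a minimal Python sketch. It assumes the x-axis is the number of days since 2020-01-22 (a hypothetical `EPOCH` choice that happens to reproduce the m and b values in the Louisiana report below), and the helper names are illustrative rather than taken from the original code.

```python
from datetime import date, timedelta
from math import sqrt

# Assumption: x = days since 2020-01-22. The slope and r do not depend on this
# choice; only the intercept b does.
EPOCH = date(2020, 1, 22)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b, plus Pearson's r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    r = sxy / sqrt(sxx * syy)
    return m, b, r


def fit_baseline(series, start, end):
    """Fit only the {date: cumulative_count} points inside [start, end]."""
    points = [((d - EPOCH).days, v) for d, v in sorted(series.items())
              if start <= d <= end]
    xs, ys = zip(*points)
    return fit_line(xs, ys)


# Louisiana reported cases for 2020-04-17 through 2020-04-30, from the report.
cases = [23118, 23580, 23928, 24523, 24854, 25258, 25739,
         26140, 26512, 26773, 27068, 27286, 27660, 28001]
series = {date(2020, 4, 17) + timedelta(days=i): c for i, c in enumerate(cases)}
m, b, r = fit_baseline(series, date(2020, 4, 17), date(2020, 4, 30))
print(f"m = {m}\nb = {b}\nr = {r}")
```

With those inputs the sketch yields values matching the cases regression in the Louisiana report (m ≈ 373.96, b ≈ -8846.03, r ≈ 0.9956). Fitting the deaths works the same way with the reported deaths column.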

Is this the best approach? It certainly is not rigorous, but it should be possible to draw some conclusions (incorrect or otherwise).

The baseline two-week period chosen is 2020-04-17 through 2020-04-30. I wanted a time period that showed "as good as it gets" when following stay-at-home guidance. Since the Federal stay-at-home recommendation was lifted at the end of April, this seemed like a good choice. Keep in mind, however, that the rates observed in that period actually reflect infections that happened in the two or three weeks before April 17th. But stay-at-home orders and other precautions were already in place earlier in April, so the period seems representative enough for my purposes.
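The date arithmetic behind that choice is easy to check. In this small Python sketch, the baseline is 14 days inclusive, and shifting it back by an assumed 14 or 21 day lag (just restating "two or three weeks", not an epidemiological estimate) shows roughly which infection dates the baseline reflects. The function names are hypothetical.

```python
from datetime import date, timedelta

BASELINE_START = date(2020, 4, 17)
BASELINE_END = date(2020, 4, 30)


def inclusive_days(start, end):
    """Number of calendar days in [start, end], counting both ends."""
    return (end - start).days + 1


def infection_window(start, end, lag_days):
    """Shift a reporting window back by an assumed infection-to-report lag."""
    lag = timedelta(days=lag_days)
    return start - lag, end - lag


print(inclusive_days(BASELINE_START, BASELINE_END))  # 14
for lag in (14, 21):
    first, last = infection_window(BASELINE_START, BASELINE_END, lag)
    print(f"{lag}-day lag: {first} through {last}")
```

With a 21-day lag, for example, the baseline corresponds to infections from roughly 2020-03-27 through 2020-04-09.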

The data used is the cumulative total of reported cases (and deaths) up to, and including, that day for each state. After being transformed, the input data for each state looks like this:

date,state,cases,deaths
2020-04-28,Louisiana,27286,1758
2020-04-29,Louisiana,27660,1802
2020-04-30,Louisiana,28001,1862
2020-05-01,Louisiana,28711,1927
2020-05-02,Louisiana,29140,1950
2020-05-03,Louisiana,29340,1969
2020-05-04,Louisiana,29673,1991
2020-05-05,Louisiana,29996,2042
2020-05-06,Louisiana,30399,2094
2020-05-07,Louisiana,30652,2135
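Because the values are cumulative, the day-over-day difference gives the number of newly reported cases (and deaths). Here is a minimal Python sketch that reads the transformed CSV format shown above and computes those differences. `load_state_series` and `daily_new` are hypothetical helper names, and the sample is the Louisiana data above. In practice the text would come from a file.

```python
import csv
import io
from datetime import date


def load_state_series(text, state):
    """Return sorted (date, cases, deaths) tuples for one state from the CSV text."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["state"] == state:
            rows.append((date.fromisoformat(row["date"]),
                         int(row["cases"]), int(row["deaths"])))
    return sorted(rows)


def daily_new(rows):
    """Convert cumulative totals into day-over-day increases.

    The first date has no previous day to subtract, so it is dropped.
    """
    return [(cur[0], cur[1] - prev[1], cur[2] - prev[2])
            for prev, cur in zip(rows, rows[1:])]


SAMPLE = """date,state,cases,deaths
2020-04-28,Louisiana,27286,1758
2020-04-29,Louisiana,27660,1802
2020-04-30,Louisiana,28001,1862
2020-05-01,Louisiana,28711,1927
2020-05-02,Louisiana,29140,1950
2020-05-03,Louisiana,29340,1969
2020-05-04,Louisiana,29673,1991
2020-05-05,Louisiana,29996,2042
2020-05-06,Louisiana,30399,2094
2020-05-07,Louisiana,30652,2135
"""

for day, new_cases, new_deaths in daily_new(load_state_series(SAMPLE, "Louisiana")):
    print(day, new_cases, new_deaths)
```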

The data for the desired dates is processed for each state, as well as for the United States as a whole, using each data set (JHU and NYTimes). The results are a PNG chart and a report per data set, so for Louisiana we get two PNGs and two reports. Here are examples of the charts for Louisiana:

[Charts: Louisiana (JHU) and Louisiana (NYTimes)]
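For the United States series, one option is to sum the states for each date. The sketch below assumes that approach (the original may instead use national figures published in each data set) and takes hypothetical (date, state, cases, deaths) tuples. Note that a state missing a date would silently lower that day's total.

```python
from collections import defaultdict
from datetime import date


def national_totals(rows):
    """Sum per-state cumulative counts into one (cases, deaths) pair per date.

    `rows` is an iterable of (date, state, cases, deaths) tuples.
    """
    totals = defaultdict(lambda: [0, 0])
    for day, _state, cases, deaths in rows:
        totals[day][0] += cases
        totals[day][1] += deaths
    return {day: (c, d) for day, (c, d) in sorted(totals.items())}


# Hypothetical values for two states, not real data.
rows = [
    (date(2020, 5, 1), "State A", 100, 5),
    (date(2020, 5, 1), "State B", 40, 2),
    (date(2020, 5, 2), "State A", 110, 6),
    (date(2020, 5, 2), "State B", 45, 2),
]
print(national_totals(rows))
```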

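The reports give some information about the linear regression: the slope m (the fitted increase per day), the intercept b, and the correlation coefficient r. They also provide some CSV data for future use, if needed. The dates extend two weeks into the future, with "predictions" based on the linear regression. Dates that have no reported data yet show -1 for the reported values, so their diff columns are not meaningful. An example of one of the reports looks like this: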

Louisiana
Regression Info:
  Cases
    m = 373.96483516483516
    b = -8846.032967032963
    r = 0.9956088290733547
  Deaths
    m = 55.30989010989012
    b = -3547.0219780219795
    r = 0.9882421218338948
--------------------------------
date,predicted_cases,predicted_deaths,reported_cases,reported_deaths,diff_cases,diff_deaths,used_in_regression
2020-04-17,23314,1209,23118,1213,-196,4,True
2020-04-18,23688,1264,23580,1267,-108,3,True
2020-04-19,24062,1320,23928,1296,-134,-24,True
2020-04-20,24436,1375,24523,1328,87,-47,True
2020-04-21,24810,1430,24854,1405,44,-25,True
2020-04-22,25184,1486,25258,1473,74,-13,True
2020-04-23,25558,1541,25739,1599,181,58,True
2020-04-24,25932,1596,26140,1660,208,64,True
2020-04-25,26306,1652,26512,1707,206,55,True
2020-04-26,26680,1707,26773,1729,93,22,True
2020-04-27,27054,1762,27068,1740,14,-22,True
2020-04-28,27428,1818,27286,1801,-142,-17,True
2020-04-29,27802,1873,27660,1845,-142,-28,True
2020-04-30,28176,1928,28001,1905,-175,-23,True
2020-05-01,28550,1983,28711,1970,161,-13,False
2020-05-02,28924,2039,29140,1993,216,-46,False
2020-05-03,29298,2094,29340,2012,42,-82,False
2020-05-04,29672,2149,29673,2064,1,-85,False
2020-05-05,30046,2205,29996,2115,-50,-90,False
2020-05-06,30420,2260,30399,2167,-21,-93,False
2020-05-07,30794,2315,30652,2208,-142,-107,False
2020-05-08,31168,2371,-1,-1,-31169,-2372,False
2020-05-09,31542,2426,-1,-1,-31543,-2427,False
2020-05-10,31916,2481,-1,-1,-31917,-2482,False
2020-05-11,32290,2537,-1,-1,-32291,-2538,False
2020-05-12,32664,2592,-1,-1,-32665,-2593,False
2020-05-13,33038,2647,-1,-1,-33039,-2648,False
2020-05-14,33411,2702,-1,-1,-33412,-2703,False
2020-05-15,33785,2758,-1,-1,-33786,-2759,False
2020-05-16,34159,2813,-1,-1,-34160,-2814,False
2020-05-17,34533,2868,-1,-1,-34534,-2869,False
2020-05-18,34907,2924,-1,-1,-34908,-2925,False
2020-05-19,35281,2979,-1,-1,-35282,-2980,False
2020-05-20,35655,3034,-1,-1,-35656,-3035,False
2020-05-21,36029,3090,-1,-1,-36030,-3091,False
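The prediction columns can be reproduced from m and b alone. In the sketch below, x is assumed to be days since 2020-01-22 and predictions are truncated with int(); both are inferences that reproduce the Louisiana numbers above, not confirmed details of the original code. One deliberate difference: missing reported values are None instead of -1, so no diff is computed against a placeholder. `report_rows` is a hypothetical name, and only the cases columns are shown.

```python
from datetime import date, timedelta

# Assumption: x = days since 2020-01-22 (reproduces the Louisiana report).
EPOCH = date(2020, 1, 22)


def report_rows(m, b, reported, start, baseline_end, horizon_end):
    """Build (date, predicted, reported, diff, used_in_regression) rows.

    `reported` maps date -> cumulative count; dates without data get
    None for both `reported` and `diff`.
    """
    rows = []
    day = start
    while day <= horizon_end:
        x = (day - EPOCH).days
        predicted = int(m * x + b)  # truncate toward zero, as the report appears to
        actual = reported.get(day)
        diff = actual - predicted if actual is not None else None
        rows.append((day, predicted, actual, diff, day <= baseline_end))
        day += timedelta(days=1)
    return rows


# Cases regression from the Louisiana report; one reported value for illustration.
rows = report_rows(373.96483516483516, -8846.032967032963,
                   {date(2020, 4, 17): 23118},
                   start=date(2020, 4, 17),
                   baseline_end=date(2020, 4, 30),
                   horizon_end=date(2020, 5, 21))
for row in rows[:2] + rows[-1:]:
    print(row)
```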

In the next post, I will look at the results from 2020-05-07.

Tags: covid19, covidwavewatcher
