Behind the data: Breaking down the statistical models of COVID-19

Atlanta Police Sgt. Dominique Simmons on Friday, April 24, 2020, in downtown Atlanta. Coronavirus cases in Georgia continued to mount as the state began easing restrictions on some businesses. JOHN SPINK/JSPINK@AJC.COM

The public’s thirst for information about the coronavirus has sharply elevated the profiles of academic and government research institutions that analyze data about the virus. The COVID-19 tracker developed by Johns Hopkins University became a near-constant image on cable news, showing new cases glowing red in hotspots around the world.

But other sources go beyond recording new cases and deaths and mapping them around the world. These sources take data about the virus and forecast the future. They each do it in different ways, and it's important to understand the differences.

The Centers for Disease Control and Prevention highlights three types of models, described below, to show the range in projections about the spread of the virus.

Infectious disease models

Infectious disease models simulate the spread of disease, and are the standard in mathematical epidemiology.

At their simplest, they assume that susceptible people occasionally contract a disease after interacting with an infected person. Infected people then either recover or die.

These models provide a theory for the spread of disease, letting epidemiologists predict what will happen in many months or years, or what might happen if stringent social distancing measures were suddenly removed. But the strong assumptions they often make put them at greater risk of being wrong in changing environments where those assumptions may no longer hold.
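The simplest version of this idea, in which susceptible people become infected and infected people then recover or die, can be sketched in a few lines of Python. This is a minimal illustration and not the model used by Columbia University or any other group named here. The population size, the contact rate `beta` and the removal rate `gamma` are made-up values chosen only to show how a lower contact rate, standing in for social distancing, changes the outcome.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Step a basic susceptible-infected-removed model forward one day at a time.

    beta  = assumed daily rate at which an infected person infects others
    gamma = assumed daily rate at which infected people recover or die
    Returns a list of (susceptible, infected, removed) tuples, one per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        # New infections depend on contact between susceptible and infected people.
        new_infections = beta * s * i / population
        # Infected people leave the infected group by recovering or dying.
        new_removals = gamma * i
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((s, i, r))
    return history


# Hypothetical scenarios: same disease, but distancing halves the contact rate.
baseline = simulate_sir(1_000_000, 10, beta=0.30, gamma=0.10, days=365)
distanced = simulate_sir(1_000_000, 10, beta=0.15, gamma=0.10, days=365)

peak_baseline = max(i for _, i, _ in baseline)
peak_distanced = max(i for _, i, _ in distanced)
print(f"Peak infected without distancing: {peak_baseline:,.0f}")
print(f"Peak infected with distancing:    {peak_distanced:,.0f}")
```

Every number the sketch produces follows from the assumed values of `beta` and `gamma`. If real-world behavior shifts and those values no longer hold, the forecast is wrong, which is the risk described above.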

Columbia University has one of the most robust infectious disease models. It forecasts the spread of the virus under different scenarios of social distancing, and projects how the rate of infection under those scenarios will affect elements of a state's health care system, such as the availability of intensive care beds. The model also makes assumptions about how well hospitals are able to absorb a surge in sick patients.

In Georgia, the university’s model shows that counties around Albany, already devastated by the virus, will remain the state’s most serious infection zone, even under the strongest assumptions about social distancing and the health care system’s capacity to treat those cases.

Statistical growth models

Statistical growth models make no assumptions about how the virus is spreading; they forecast the rate at which it's doing so. These kinds of models make fewer, less stringent assumptions than infectious disease models, so they're less liable to make mistakes because their assumptions are wrong. They are also more sensitive to changing conditions. But because they are more sensitive, they can pick up changes that may actually have nothing to do with the spread of the virus.
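As a rough illustration of the approach, the sketch below estimates a daily growth rate from recent counts and extends it forward. It is not the Los Alamos method, which is far more sophisticated. The case counts are synthetic, generated to grow exactly 10% a day, and the function names are hypothetical.

```python
import math


def estimate_daily_growth(counts):
    """Estimate the average daily growth rate from a series of positive counts.

    Fits a straight line to log(count) against the day number by least squares;
    the slope of that line is the log of the daily growth factor.
    """
    n = len(counts)
    days = range(n)
    logs = [math.log(c) for c in counts]
    day_mean = sum(days) / n
    log_mean = sum(logs) / n
    slope = (
        sum((d - day_mean) * (y - log_mean) for d, y in zip(days, logs))
        / sum((d - day_mean) ** 2 for d in days)
    )
    return math.exp(slope) - 1


def project(last_count, growth, days_ahead):
    """Extend the most recent count forward at a constant growth rate."""
    return last_count * (1 + growth) ** days_ahead


# Synthetic data for illustration only: 14 days of cases growing 10% a day.
cases = [100 * 1.10 ** day for day in range(14)]

rate = estimate_daily_growth(cases)
forecast = project(cases[-1], rate, days_ahead=7)
print(f"Estimated daily growth: {rate:.1%}")
print(f"Projected cases in 7 days: {forecast:,.0f}")
```

Nothing in the sketch describes how the virus moves from person to person. It also shows the weakness: a few days of delayed or bunched-up reporting inside the 14-day window would change the estimated rate even if the actual spread had not changed.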

The Los Alamos National Laboratory, a top-tier government research lab, has developed a statistical model that has, so far, proven accurate at forecasting cases and deaths over various time spans. In Georgia, the lab's model has been particularly accurate. It projects that Georgia's death toll, currently near 900, will reach about 2,300 on June 3. It finds a better than 50-50 chance that the state has already passed the peak surge in confirmed cases.

Curve-fitting models

Curve-fitting models make no assumptions about how the disease spreads, and just fit a curve to the number of deaths over time, accounting for social distancing. These models are very sensitive to changing conditions because they don't do anything but look at the data.

That can be good when conditions are changing rapidly within a particular phase of the pandemic. But it also means that temporary jumps, dips and plateaus can prematurely direct policy. Further, these models, unlike the models above, are unable to predict stages of the epidemic beyond what they’ve seen.
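A simplified sketch shows what fitting a curve to deaths over time can look like. It assumes daily deaths follow a bell-shaped curve, which is an assumption made for this example and not a description of any particular group's method, and it leaves out any adjustment for social distancing. The death counts are synthetic and the function names are hypothetical.

```python
import math


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2, solved by Gaussian elimination."""
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    moment_sums = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix of the normal equations.
    m = [
        [power_sums[0], power_sums[1], power_sums[2], moment_sums[0]],
        [power_sums[1], power_sums[2], power_sums[3], moment_sums[1]],
        [power_sums[2], power_sums[3], power_sums[4], moment_sums[2]],
    ]
    for col in range(3):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    coefs = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):
        known = sum(m[row][k] * coefs[k] for k in range(row + 1, 3))
        coefs[row] = (m[row][3] - known) / m[row][row]
    return coefs


# Synthetic data for illustration only: 20 days of rising daily deaths taken
# from a bell curve that peaks on day 25, so only the upswing is observed.
days = list(range(20))
daily_deaths = [40 * math.exp(-((d - 25) ** 2) / 200) for d in days]

# The log of a bell curve is a parabola, so fit a parabola to log(deaths).
a, b, c = fit_quadratic(days, [math.log(v) for v in daily_deaths])


def predicted_deaths(day):
    """Daily deaths implied by the fitted bell curve."""
    return math.exp(a + b * day + c * day ** 2)


peak_day = -b / (2 * c)
print(f"Fitted peak on day {peak_day:.1f}")
print(f"Predicted daily deaths on day 60: {predicted_deaths(60):.2f}")
```

The curve projects deaths falling toward zero only because a bell shape always comes back down. The decline reflects the shape chosen for the curve, not any knowledge of how the disease spreads.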

The Institute for Health Metrics and Evaluation at the University of Washington produces perhaps the most widely quoted model of this type. The IHME model is popular, but it has been relatively inaccurate, even as the institute has made changes to its methodology.

In Georgia, IHME forecasts that deaths from coronavirus will drop to zero in early June and that the state could safely ease social distancing measures June 22.