Epidemiological models for COVID-19: Why so many models and why all the fuss? - Dr LaRee Tracy, Head of Biometrics, San Diego

Years ago, I spent several months painfully developing a mathematical prediction model to evaluate decreases in cervical cancer prevalence following the introduction of papillomavirus vaccination in a developing country with different sexual mixing patterns [1]. The project required collaboration with virologists, social epidemiologists, and mathematicians, as is often necessary when constructing epidemiological models that rely on biological, social, and statistical factors, among others. Along the way, I gained a strong appreciation for the complexity of these models, and also for the fact that every model depends upon assumptions laden with variability and uncertainty.

Beginning as early as December 2019, reports surfaced about clinical cases in Wuhan, China of a new disease, later termed COVID-19, caused by infection with a novel coronavirus, SARS-CoV-2. On March 11, 2020, the World Health Organization declared COVID-19 a pandemic [2], at a time when there were 118,000 cases in 114 countries. As of May 28, 2020, there are over 5.8 million COVID-19 cases and more than 360,000 deaths in 213 countries and territories [3]. In a scramble to understand the pandemic's trends and trajectories, several epidemiological models have been developed to predict or forecast COVID-19 cases, burden on the healthcare system, and overall mortality, given assumptions and available information. One might ask: why are so many models producing different results? The aim of this paper is to summarize the key concepts and types of epidemiological models, why they are useful, and their limitations.

Epidemiological models serve multiple functions, including modeling the dynamics of an infectious disease and evaluating changes in disease patterns due to a targeted intervention such as vaccination, population- and person-level quarantine, or animal control. Generally, the models are constructed using individual-level data and are intended to predict population-level patterns and dynamics. Trade-offs exist between parsimony and complexity: simpler models often lack precision but require fewer assumptions and fewer variables. In contrast, more complex models often include multiple variables with complex interactions, which tends to lead to greater precision in outcomes. However, the contribution of each variable is less well understood in these models.

Epidemiological models are classified into two categories. Models are either mechanistic, built upon an understanding of the disease process and dynamics, or forecasting (computational) models, which are primarily statistical in nature. There are also hybrid models that include aspects of both mechanistic and forecasting approaches.

Mechanistic models include assumptions regarding disease dynamics and movement between mutually exclusive states. One of these is the SEIR model, a compartmental model comprising four states: S = susceptible, E = exposed (infected by the pathogen but not yet infectious), I = infectious, and R = recovered, where the total population, N, is the sum of all four states (N = S + E + I + R). These models are either deterministic, with outcomes fixed by the parameter values and initial conditions and obtained by solving a set of differential equations, or stochastic, allowing for randomness in the chance of infection. As the number of cases of a disease increases, stochastic fluctuations become smaller relative to the number of cases. Therefore, in a large population with high disease incidence, deterministic and stochastic models often yield similar estimates. Both deterministic and stochastic mechanistic models include assumptions about the transmission rate (β), the incubation period (time from exposure to symptom onset), the latency rate (σ, the reciprocal of the average time from exposure to infectiousness), and the recovery rate (λ). A simple illustration is provided in Figure 1. Another key quantity in this model is the basic reproduction number, R0 (often loosely called the reproduction rate), which represents the average number of new infections caused by one infectious person in a fully susceptible population. For a disease with an average infectious period of 1/λ and a transmission rate β, R0 is given by β/λ. For COVID-19, R0 estimates vary considerably, with some as high as 6 to 7.
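For readers who prefer equations, the deterministic SEIR model is commonly written in the textbook form below, using the symbols from this section (β, σ, λ). This is a sketch under the usual simplifying assumptions, which are not stated in the original text: a closed population, homogeneous mixing, no births or deaths, and no loss of immunity.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dE}{dt} &= \beta \frac{S I}{N} - \sigma E, \\
\frac{dI}{dt} &= \sigma E - \lambda I, \\
\frac{dR}{dt} &= \lambda I, \\
N &= S + E + I + R, \qquad R_0 = \frac{\beta}{\lambda}.
\end{aligned}
```

Each term that leaves one compartment enters the next, so the four derivatives sum to zero and N stays constant.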

Figure 1. Conceptual SEIR Mechanistic Model
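As a rough illustration of how a deterministic SEIR model behaves, the sketch below steps the four compartments forward in time with a simple forward Euler scheme. It is not the method of any model cited here. The function name `simulate_seir` and all parameter values (a transmission rate of 0.5 per day, 5-day latent and infectious periods, a population of one million) are assumptions chosen only so that R0 = β/λ comes out at 2.5.

```python
def simulate_seir(beta, sigma, lam, n, e0, i0, days, dt=0.1):
    """Integrate a closed-population SEIR model with forward Euler steps.

    beta: transmission rate, sigma: latency rate, lam: recovery rate
    (all per day). Returns one (day, S, E, I, R) tuple per whole day.
    """
    s, e, i, r = float(n - e0 - i0), float(e0), float(i0), 0.0
    history = [(0.0, s, e, i, r)]
    steps_per_day = round(1 / dt)
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_recovered = lam * i * dt          # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((float(day), s, e, i, r))
    return history


# Illustrative values only (assumptions, not estimates for COVID-19).
BETA = 0.5        # transmission rate per day
SIGMA = 1 / 5     # latency rate: 1 / (assumed 5-day latent period)
LAMBDA = 1 / 5    # recovery rate: 1 / (assumed 5-day infectious period)
N = 1_000_000     # total population

r0 = BETA / LAMBDA
history = simulate_seir(BETA, SIGMA, LAMBDA, N, e0=0, i0=10, days=300)
peak_day, peak_infectious = max(
    ((row[0], row[3]) for row in history), key=lambda pair: pair[1]
)
print(f"R0 = {r0:.2f}")
print(f"Peak of about {peak_infectious:,.0f} infectious on day {peak_day:.0f}")
print(f"Share of population ever infected: {history[-1][4] / N:.1%}")
```

Changing `BETA` is the simplest way to see why published projections differ: a modest change in the assumed transmission rate shifts both the timing and the height of the peak.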

Several of the COVID-19 models use the SEIR mechanistic framework, including the Imperial College London stochastic model, which was designed to predict the impact of non-pharmaceutical interventions on COVID-19 mortality and healthcare demand [4]. Other mechanistic models include the MIT model [5], the Columbia model [6], and the UCLA model [7].

The second model type is the forecasting or computational model, which is built around statistical principles and ignores disease-specific dynamics such as transmission and recovery rates. These models fit available data to a line or a curve using statistical modeling, and then use that fit to extrapolate events into the near future. For example, the LANL model developed by the Los Alamos National Laboratory uses historical data on daily cases to parameterize the distribution of future growth rates, and reports uncertainty bands (prediction intervals) around its forecasts (Figure 2). An advantage of forecasting models is that they do not require assumptions regarding disease dynamics (for example, they do not require an estimate of R0) and are therefore less complex to derive, relying instead on existing data and statistical fitting.
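The curve-fitting idea can be sketched in a few lines. The example below is not the LANL method. It fits a straight line to the logarithm of recent daily case counts by ordinary least squares and extrapolates a week ahead, with an approximate 95% prediction interval. The case counts are made up for illustration, the function names are hypothetical, and the interval uses a normal approximation (z = 1.96) rather than the exact t-quantile.

```python
import math


def fit_log_linear(counts):
    """Least-squares fit of log(count) against the day index 0, 1, 2, ..."""
    n = len(counts)
    xs = list(range(n))
    ys = [math.log(c) for c in counts]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard error with n - 2 degrees of freedom.
    sigma = math.sqrt(sum(res ** 2 for res in residuals) / (n - 2))
    return slope, intercept, sigma, x_mean, sxx


def forecast(counts, horizon, z=1.96):
    """Extrapolate the fit; returns (day, point, lower, upper) tuples."""
    slope, intercept, sigma, x_mean, sxx = fit_log_linear(counts)
    n = len(counts)
    results = []
    for x in range(n, n + horizon):
        log_point = intercept + slope * x
        # Standard error for a new observation; it grows as x moves
        # further from the observed days, so the band widens over time.
        se = sigma * math.sqrt(1 + 1 / n + (x - x_mean) ** 2 / sxx)
        results.append((
            x,
            math.exp(log_point),
            math.exp(log_point - z * se),
            math.exp(log_point + z * se),
        ))
    return results


# Hypothetical daily case counts, invented for this example.
daily_cases = [100, 112, 130, 141, 165, 180, 210, 228, 260, 295]
forecasts = forecast(daily_cases, horizon=7)
for day, point, lower, upper in forecasts:
    print(f"day {day}: {point:,.0f} cases ({lower:,.0f} to {upper:,.0f})")
```

Note that nothing in this fit knows about transmission or recovery. If behavior changes after the last observed day, the extrapolated line cannot reflect it, which is one reason such forecasts are usually limited to the short term.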

Figure 2. Short-Term Forecast of COVID-19 cases in the United States as of 2020-05-27. This figure was obtained from https://covid-19.bsvgateway.org/ on 2020-06-01. The figure was produced by the Los Alamos National Laboratory, Copyright Triad National Security, LLC. All Rights Reserved.

Some examples of current mechanistic and forecasting models are provided in Table 1.

Among infectious disease epidemiologists and other experts, it is generally understood that there is no perfect epidemiological model, particularly during the emergence of a new disease. All mechanistic models suffer from inherent variability in their underlying assumptions. This is particularly true for a novel disease such as COVID-19, for which disease dynamics such as mode and rate of transmission, incubation and latency periods, and recovery rates were initially unknown and remain elusive despite the large number of cases to date. The models can also miss or ignore local-level factors that affect the overall interpretation. Forecasting models do not capture changes in disease dynamics, depend on existing data that are often incomplete due to reporting lags or derived from unreliable tests with poor sensitivity and specificity, and are often restricted to short-term forecasts. Early in the pandemic, forecasting models generated highly variable estimates; more recently, these models appear to align more closely with reported figures, owing to the accumulation of global- and national-level data feeding into the statistical models. Reliance on the limited data available early in a pandemic can lead to inaccurate forecasts that may be misinterpreted by decision makers.

Despite their limitations, epidemiological models serve an important role during and following pandemics, including the current COVID-19 pandemic. These models provide best- and worst-case estimates of future cases and mortality, given assumptions. They also aid in understanding the effect on disease patterns of interventions, both non-pharmaceutical (such as stay-at-home orders) and pharmaceutical (such as vaccination programs). Models can also help identify areas where interventions are not working, which aids in understanding other forces at play. Importantly, models are not static and therefore should not be viewed as fixed. Models should be presented to decision makers clearly, with emphasis on the limitations and variability in the estimates. It is vital that models are viewed in the context of the supporting data and assumptions, with consideration for possible variability across populations. For instance, extrapolating model predictions from one country or state to another may be inappropriate given variation in access to testing, access to healthcare, demographics, and local response efforts. It is also important that decision makers are informed about what the model does and does not account for. In times of uncertainty, it is natural to seek information and confirmation. We must be careful not to rely too heavily on model predictions and estimates, and should instead view this information in the context of what is known and what is still unknown.

Table 1. COVID-19 Forecasting and Mechanistic Models

| Model | Model type | Outcome predicted |
|---|---|---|
|  | SuEIR (modified SEIR accounting for unreported cases) | US and CA state COVID-related deaths |
|  |  | US cases, deaths, and hospitalizations |
|  | Metapopulation SEIR | US cases/hospital burden as states re-open |
| U Chicago | Age-structured SEIR | Cases in Illinois only |
|  | Stochastic forecast statistical model | US cases/deaths by state |
|  | Combination of statistical and disease transmission | US and global cases/deaths |
| Imperial1, Imperial2 | Multiple mechanistic models, with different parameter assumptions | Country-specific deaths |
|  | Statistical models fit to recent data | Short-term mortality by country |
|  | Bayesian statistical forecasting model (aggregates national case/mortality data and anonymous location data) | Short-term mortality based on social-distancing behavior |


[1] Tracy L. et al. Estimating the Impact of Human Papillomavirus (HPV) Vaccination on HPV Prevalence and Cervical Cancer Incidence in Mali. Clin Infect Dis. 2011 Mar 1; 52(5): 641-645. doi: 10.1093/cid/ciq190

[2] https://www.who.int/dg/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19---11-march-2020

[3] https://www.worldometers.info/coronavirus/countries-where-coronavirus-has-spread/

[4] https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-NPI-modelling-16-03-2020.pdf

[5] https://www.covidanalytics.io/projections

[6] https://behcolumbia.files.wordpress.com/2020/05/yamana_etal_reopening_projections.pdf

[7] https://covid19.uclaml.org/model.html