About Our Data and Methodology

The COVID-19 indicators visualized on our map are derived from the data sources described below. These are all publicly available on the COVIDcast endpoint of our public Epidata API. The API documentation includes full technical detail on how these indicators are calculated.
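As a rough illustration of what a request to the COVIDcast endpoint might look like, the sketch below only builds a query URL and does not contact any server. The base URL, the parameter names, and the `fb-survey` / `smoothed_cli` identifiers are assumptions made for this example and should be checked against the API documentation before use.

```python
from urllib.parse import urlencode

# Assumed base URL for the COVIDcast endpoint; confirm against the API documentation.
BASE_URL = "https://api.delphi.cmu.edu/epidata/covidcast/"


def build_covidcast_url(data_source, signal, geo_type, geo_value, day):
    """Build a query URL for one signal, one location, and one day (YYYYMMDD)."""
    # Parameter names are assumptions for this sketch.
    params = {
        "data_source": data_source,
        "signal": signal,
        "time_type": "day",
        "geo_type": geo_type,
        "time_values": day,
        "geo_value": geo_value,
    }
    return BASE_URL + "?" + urlencode(params)


# Hypothetical request: a survey-based symptom signal for one state on April 24, 2020.
url = build_covidcast_url("fb-survey", "smoothed_cli", "state", "pa", "20200424")
```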

Active Indicators

Doctor Visits

This indicator estimates the percentage of outpatient doctor’s visits that are due to COVID-like symptoms, based on data provided to us by health system partners. Telemedicine visits are included in these estimates.

Hospital Admissions

This indicator estimates the percentage of daily hospital admissions that have diagnostic codes related to COVID-19, based on medical record summaries provided to us by health system partners.

Symptoms (Facebook)

This indicator estimates the percentage of people who have a COVID-like illness (fever, along with cough, or shortness of breath, or difficulty breathing), based on symptom surveys run by Carnegie Mellon. The surveys ask respondents how many people in their household are experiencing COVID-like symptoms, among other questions. Facebook directs a random sample of its users to these surveys, which are voluntary. Individual survey responses are held by CMU and are shareable with other health researchers under a data use agreement. No individual survey responses are shared back to Facebook. As of mid-June, about 70,000 such surveys were completed daily throughout the U.S.

Symptoms in Community (Facebook)

This indicator estimates the percentage of people who know someone in their community with a COVID-like illness (fever, along with cough, or shortness of breath, or difficulty breathing). The data is based on the same CMU-run survey, advertised by Facebook, that is used for the Symptoms indicator. Note that more people tend to report knowing someone in their community with a COVID-like illness than having someone in their own household with a COVID-like illness, so these estimates are higher than those of the Symptoms (Facebook) indicator.

Hours Away from Home (SafeGraph)

These indicators estimate the proportion of people who spend time outside their homes during daytime hours, using mobile device location data provided by SafeGraph. “Away from Home 6hr+” is the proportion spending more than 6 hours outside their home, while “Away from Home 3-6hr” is the proportion spending between 3 and 6 hours outside their home. These estimates may be related to the spread of COVID-19, since they are related to the number of people interacting with others outside their homes, and also reveal the impact of the pandemic and movement restrictions.

Search Trends (Google)

This indicator reflects the number of Google searches for COVID-related topics, relative to each area’s population, using search statistics provided to us by Google’s Health Trends group.  A larger number corresponds to more COVID-related searching.

Combined

The “Combined” map represents a combination of all the indicators currently featured on the public map.  As of this writing, this includes Doctor Visits, Symptoms (Facebook), Symptoms in Community (Facebook), and Search Trends.  It does not include official reports (cases and deaths), hospital admissions, or SafeGraph signals.  We use a rank-1 approximation, from a nonnegative matrix factorization approach, to identify an underlying signal that best reconstructs the indicators.  Higher values of the combined signal correspond to higher values of the other indicators, but the scale (units) of the combination is arbitrary.
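To make the rank-1 idea concrete, here is a minimal sketch, not the production method. It assumes the indicators are arranged as a nonnegative matrix with one row per location and one column per indicator, and fits X ≈ u vᵀ by alternating least squares. The function name, the iteration count, and the toy data are all hypothetical, and any rescaling of the indicators before combination is omitted.

```python
def rank1_nmf(X, iters=200):
    """Fit X ~= u v^T with u, v >= 0 by alternating least squares.

    X is a list of rows (locations) by columns (indicators). All entries
    must be nonnegative and at least one must be positive.
    """
    n_rows, n_cols = len(X), len(X[0])
    v = [1.0] * n_cols  # a positive start keeps every update nonnegative
    u = [0.0] * n_rows
    for _ in range(iters):
        vv = sum(b * b for b in v)
        # Best u for fixed v: project each row of X onto v.
        u = [sum(X[i][j] * v[j] for j in range(n_cols)) / vv for i in range(n_rows)]
        uu = sum(a * a for a in u)
        # Best v for fixed u: project each column of X onto u.
        v = [sum(X[i][j] * u[i] for i in range(n_rows)) / uu for j in range(n_cols)]
    return u, v


# Toy data: 3 locations, 2 indicators that move together exactly.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
u, v = rank1_nmf(X)
```

Here `u` plays the role of the combined signal, one value per location. Multiplying `u` by a constant and dividing `v` by the same constant gives an equally good fit, which is why the units of the combination are arbitrary.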

Official Reports

Cases

These indicators show the number of new confirmed COVID-19 cases per day. The maps reflect only cases confirmed by state and local health authorities. They are based on confirmed case counts compiled and made public by a team at Johns Hopkins University and by USAFacts. We use Johns Hopkins data for Puerto Rico and report USAFacts data in all other locations.

Deaths

These indicators show the number of COVID-19 related deaths per day. The maps reflect official figures by state and local health authorities, and may not include excess deaths not confirmed as due to COVID-19 by health authorities. They are based on confirmed death counts compiled and made public by a team at Johns Hopkins University and by USAFacts. We use Johns Hopkins data for Puerto Rico and report USAFacts data in all other locations.
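The source-selection rule for cases and deaths can be sketched as follows, assuming locations are identified by 5-digit county FIPS codes (Puerto Rico's codes begin with 72). The function names and the "jhu" / "usafacts" labels are hypothetical, chosen only for this illustration.

```python
def official_source(county_fips):
    """Pick the data provider for a 5-digit county FIPS code."""
    # FIPS state code 72 is Puerto Rico; everywhere else uses USAFacts.
    return "jhu" if county_fips.startswith("72") else "usafacts"


def per_100k(count, population):
    """Convert a raw count into a rate per 100,000 people."""
    return 100000 * count / population
```

For example, `official_source("72001")` selects the Johns Hopkins data, while `official_source("42003")` selects USAFacts.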

Paused and Retired Indicators

Paused and retired indicators are currently not shown on the public map. Retired indicators will likely not be shown again.  See the COVIDcast release log below for details on when and why each indicator was removed.

Surveys (Google)

(Paused) This indicator estimates the percentage of people who know someone in their community with a COVID-like illness (fever, along with cough, or shortness of breath, or difficulty breathing). The data is based on Google-run symptom surveys, through publisher websites, the Google Opinion Rewards app, and similar applications. These surveys are voluntary.  As of mid-April, about 600,000 people answered the survey daily throughout the U.S.  Note that these Google surveys are estimating a different quantity than the surveys given to Facebook users (percentage of people who know someone in their community who is sick, rather than percentage of people who are sick), so the estimates from the Google surveys tend to be larger.

Flu Tests (Quidel)

(Retired) This indicator is based on data about influenza lab tests provided to us by Quidel, Inc., a company that makes equipment and kits for medical tests.  When a patient (whether at a doctor’s office, clinic, or hospital) has COVID-like symptoms, standard practice currently is to perform a conventional influenza test to rule out seasonal influenza (flu), because these two diseases have similar symptoms. While the number of COVID tests performed depends on local capacity and testing policy, influenza testing is not influenced by these factors.  Because a different number of labs may report on different days, we track the average number of flu tests performed per flu testing device (in a given location and on a given day).

About Our Methodology

Full technical documentation on the sources of our data, and how our estimates are constructed, is available in the COVIDcast API data source documentation.

Live Estimates

The real-time COVID-19 indicators presented on the COVIDcast site represent our best estimates given all data that we have available up until now.  For example, the estimates on our site for April 24, 2020 represent our current best estimate of the indicator values for that day.  The first estimates for the indicator values for April 24 would typically be available on April 25 (one day later), but estimates for these April 24 values may be updated on later days as new data becomes available.  This phenomenon is particularly prominent with the Doctor Visits indicator, which is based on doctor’s visits that do or do not involve COVID-like illnesses: there is generally a lag in how some of the data is made available to us, and a large fraction of doctor’s visits on any day is only reported to us several days later.  For that reason, our Doctor Visits estimates that are just a few days old may be less reliable. When we deem them too unreliable, we do not post them, which is why this indicator is often available only up until a few days before the current day.
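One way to picture these revisions is as a list of dated issues for a single reference day, where the estimate shown is the most recent issue available. The sketch below assumes that representation. The function name and the numbers are made up for illustration.

```python
from datetime import date


def estimate_as_of(issues, as_of):
    """Return the latest value issued on or before `as_of`.

    `issues` is a list of (issue_date, value) pairs for one reference day.
    Returns None if nothing had been issued yet.
    """
    available = [(issued, value) for issued, value in issues if issued <= as_of]
    if not available:
        return None
    return max(available, key=lambda pair: pair[0])[1]


# Hypothetical revisions of the April 24 estimate.
issues = [
    (date(2020, 4, 25), 1.8),  # first estimate, one day later
    (date(2020, 4, 28), 2.3),  # revised after late-arriving data
    (date(2020, 5, 2), 2.4),
]
```

With this data, a reader on April 26 would see 1.8 for April 24, while a reader on April 30 would see 2.3.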

Smoothing

For each indicator, our estimates are formed using data smoothing techniques.  The specific technique differs by indicator, but in all cases, we perform some kind of data smoothing (akin to averaging, or weighted averaging) across a window of approximately one week.
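The simplest smoother of this kind is an unweighted trailing average. The sketch below is only an illustration of the general idea, assuming a plain 7-day trailing window. The actual technique varies by indicator, and the function name is hypothetical.

```python
def trailing_average(values, window=7):
    """Average each day's value with up to `window - 1` preceding days."""
    smoothed = []
    for i in range(len(values)):
        # Early days use a shorter window because less history exists.
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

A single-day spike of 7 after six days of 0 is spread out to a smoothed value of 1 on that day, which shows how smoothing damps day-to-day noise at the cost of responding more slowly to real changes.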

Missing Estimates

Generally, we do not report estimates at locations with insufficient data (or insufficiently recent data).  The Search Trends indicator is not available at the county level, as data is only available at a coarser geographic resolution in the first place.  For the Doctor Visits and Facebook Surveys indicators, we lump together all counties in a given state that do not have sufficient data for their own individual estimate, and create a “rest of state” estimate that includes all of them.
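The “rest of state” pooling can be sketched as follows, assuming each county contributes a numerator and a denominator (for example, COVID-like visits and total visits) and that sufficiency is judged by a minimum denominator. The threshold of 100 and the function name are hypothetical.

```python
def pool_rest_of_state(county_counts, min_sample=100):
    """Estimate percentages per county, pooling small counties together.

    `county_counts` maps county name -> (numerator, denominator) for one state.
    """
    estimates = {}
    rest_num = rest_den = 0
    for county, (num, den) in county_counts.items():
        if den >= min_sample:
            estimates[county] = 100.0 * num / den
        else:
            # Too little data for its own estimate: add to the pooled group.
            rest_num += num
            rest_den += den
    # The pooled group is itself reported only if it has enough data.
    if rest_den >= min_sample:
        estimates["rest of state"] = 100.0 * rest_num / rest_den
    return estimates


counts = {"A": (10, 200), "B": (1, 40), "C": (5, 60)}
result = pool_rest_of_state(counts)
```

In this toy example, county A gets its own estimate, while B and C are too small individually and are combined into one “rest of state” value.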

Intensity Heat Map

The “Intensity” view presents a heat map of these estimates.  For each indicator, we use a fixed range of values, from a “low” value to a “high” value, and assign a color to each value in between, as shown to the left of the map.  These “low” and “high” values are different for each indicator, but for a given indicator, they are constant across time and geographic hierarchy, meaning that the heat maps are comparable across days.  At the county level, the “rest of state” estimates are plotted in semi-transparent colors, to make the individual counties where estimates are made more easily visually distinguishable.
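A fixed-range color scale of this kind can be sketched as a clamped linear interpolation between two colors. The endpoint colors and the function name below are hypothetical. The key property is that `low` and `high` are constants for a given indicator, so the same value always gets the same color on every day.

```python
def intensity_color(value, low, high, low_rgb=(255, 255, 204), high_rgb=(189, 0, 38)):
    """Map a value to an RGB color on a fixed [low, high] scale."""
    t = (value - low) / (high - low)
    t = min(1.0, max(0.0, t))  # values outside the range take the end colors
    return tuple(round(a + t * (b - a)) for a, b in zip(low_rgb, high_rgb))
```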

7-day Trend Map

The “7-day Trend” view presents a color map of the trend underlying these estimates.  This is computed by calculating the line of best fit (as measured by squared error) over the last 7 days.  So for example, the trend on April 7 is based on the line of best fit through the estimates from April 1 through April 7.  We then perform a basic statistical test to determine whether this line is significantly rising or falling.  
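The trend calculation can be sketched as an ordinary least-squares fit followed by a test on the slope. The text above does not specify which test is used, so the t-test below is an assumption, as are the function name and the cutoff of 2.571 (the two-sided 5% critical value of a t distribution with 5 degrees of freedom, which applies only when exactly 7 points are supplied).

```python
import math


def trend_7day(values, t_crit=2.571):
    """Fit a least-squares line to daily values and classify its slope.

    Returns (slope, label) where label is "rising", "falling", or "steady".
    """
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual sum of squares around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in enumerate(values))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    if std_err == 0:
        # Perfect fit: any nonzero slope counts as significant.
        significant = slope != 0
    else:
        significant = abs(slope / std_err) > t_crit
    if not significant:
        return slope, "steady"
    return slope, "rising" if slope > 0 else "falling"
```

A noisy week such as `[3, 1, 4, 1, 5, 9, 2]` has a positive fitted slope, but the scatter around the line is large enough that it is labeled "steady" rather than "rising".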

Correlation Analyses

Empirically (in analyses conducted as of late April), we find that each of our COVID-19 indicators, averaged over a 1-week period, has a reasonably strong positive correlation with the number of COVID-19 cases confirmed during that same week, as made available through the JHU CSSE COVID-19 GitHub repository.  We measure this with Spearman correlation, which is computed on ranks and is therefore invariant to monotone transformations.  The incidence of confirmed COVID-19 cases is arguably “the standard” metric for current COVID-19 activity (albeit a flawed one, because it is confounded by issues like testing capacity and policy), so this is a reassuring finding.  An R notebook that explicitly computes these correlations (and is completely self-contained, so any user with access to R and RStudio can re-compile it) is available here.
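For readers unfamiliar with Spearman correlation, the sketch below computes it in plain Python as the Pearson correlation of the ranks, with tied values sharing their average rank. It is an independent illustration with hypothetical function names, not an excerpt from the R notebook.

```python
import math


def ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))
```

Because only the ordering matters, `spearman([1, 2, 3, 4], [1, 8, 27, 64])` is 1 even though the relationship is far from linear. This is the invariance to monotone transformations mentioned above.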

Release Log

  • The map layout has been redesigned to prevent the controls from obscuring the map.
  • Updated maps: The cases and deaths maps now use data from both Johns Hopkins University and USAFacts. Data from Johns Hopkins University is used for Puerto Rico, while USAFacts is used everywhere else.
  • Numerous minor bug fixes.
  • New map: The “Hospital Admissions” map indicates the proportion of daily hospital admissions with COVID-related diagnoses, based on data from health system partners.
  • Minor bug and layout fixes.
  • Map controls have been revised to more easily select indicators and geographic areas.
  • A new search bar makes it possible to quickly find any county or city of interest.
  • The color scale for increasing and decreasing 7-day trends has been updated to more clearly highlight trends.
  • New maps: The “Away from home” maps use mobile device location data from SafeGraph to estimate the proportion of people spending time outside their homes each day, for either 3-6 hours or more than 6 hours.
  • Map color scales on the cases and deaths maps are now logarithmic, making it easier to see differences among regions with both low rates and high rates.
  • Cases and deaths maps are now based on 7-day averages, rather than reporting figures for a single day.
  • The COVIDcast map now includes Puerto Rico. Not all data sources are available for Puerto Rico, but data will be displayed when available.
  • The “Combined” signal now includes standard error bands when viewing the time series plot for a specific geographical area, representing the estimated uncertainty in this signal. This uncertainty arises because the signal is a combination of other signals, which are themselves based on survey estimates or other estimates with margins of error. Details on how the standard error bands are calculated will be made available in the detailed methodology document linked above.
  • Tooltips for the official Cases and Deaths signals have been updated to contain the population, raw count, and count per 100,000 people, to help distinguish sparsely populated areas with one or two cases from dense urban areas with more total cases but an apparently lower rate per 100,000 people.
  • Fixed a bug that inflated the color value for per capita Cases and Deaths relative to the legend.
  • Other small bug fixes and improvements in the COVIDcast map.
  • The “Combined” map has been updated to include the “Symptoms in Community (Facebook)” indicator.
  • New map: The “Combined” signal represents a statistical combination of the other indicators, not including the official reports (cases and deaths). For more information on how this indicator is calculated, see the details above in the list of indicators.
  • New map: The “Symptoms in Community (Facebook)” map shows the estimated fraction of people who know someone with a COVID-like illness in their local community. 
  • The “Surveys (Facebook)” map has been renamed “Symptoms (Facebook)”, to reflect that it asks respondents whether people in their household have COVID-like symptoms.
  • The “Surveys (Google)” map has been removed. This data is still available in the public Epidata API, but new data will not be collected.
  • Time series plots now include shaded regions showing the standard error of the signal estimates, when available.
  • New map: The “Deaths (JHU)” map shows death ratios (deaths per 100,000 population) due to COVID-19 per day. This reflects official figures from state and local health authorities, as compiled by a team at Johns Hopkins University.
  • New map: The “Confirmed Cases (JHU)” map shows confirmed case ratios (cases per 100,000 population) of COVID-19 per day. This reflects official figures reporting cases confirmed by testing to be COVID-19, as compiled by a team at Johns Hopkins University.
  • The “Flu Testing (Quidel)” map is no longer shown. During flu season, rates of flu tests may have correlated with rates of COVID-like illnesses, as many doctors who suspected COVID-19 conducted flu tests to rule out influenza. However, the end of flu season means few flu tests are currently conducted.
  • Previously, the “Search Trends (Google)” signal reported search volume on each day on the map for the following day; for example, search volume on April 16 would be mapped as occurring on April 17. This is no longer done.
  • Initial public release of COVIDcast.
  • Includes maps of Doctor Visits, Facebook and Google surveys, Google search trends, and Quidel flu testing.