Information Overload – Sifting through the COVID-19 Data

If you’ve taken a look at the UK government’s COVID daily update dashboard recently, you may have noticed that the volume of information presented has increased significantly compared with the first wave. This can feel overwhelming, and you may be asking yourself how to make sense of it all and get a clearer picture of what is going on.

Over the last few weeks, case numbers have increased quite dramatically, even though on Boxing Day 43% of England’s population were living under the toughest restrictions of tier 4, a further 24.8 million people were in tier 3, and a full national lockdown was subsequently announced on 4th January. Why have case numbers continued to rise, and when might we see them come down?

According to the World Health Organisation, it takes on average 5-6 days from when someone is infected with coronavirus for symptoms to show, but it can take up to 14 days. There is also a time lag between a person presenting with symptoms and receiving a positive result from a PCR test. The latest NHS Test and Trace figures (published 7th January 2021) showed that for Pillar 2 testing (testing in the community), just 19% of test results were received within 24 hours of the test being taken, and 17% took longer than 72 hours to be returned. Home testing kits were the worst performing of the Pillar 2 testing strategies, and they add a further delay between a person requesting a kit and receiving it: just 3.5% of their results were received within 24 hours of the test being taken, and 33% took more than 72 hours. Given the time it takes for symptoms to show, and then for a person to receive a positive result and for that result to be logged in the system, it is unsurprising that the effects of lockdown take a good while to appear after restrictions are introduced.

Today is the third consecutive day on which COVID-19 cases have been in the 40,000s, compared with a peak of 68,053 on 8th January. This suggests that we may now be seeing the effects of the strict lockdown measures, and that the knock-on effects of families mixing on Christmas Day have passed. Hopefully, by the time you read this, case numbers will have continued to fall.

But we are not out of the woods just yet. There is a delay between a person developing COVID-19 symptoms and their condition deteriorating enough to require hospitalisation, and a further delay between hospitalisation and death. The peak in case numbers has yet to translate into peaks in hospitalisations and deaths, so unfortunately we are likely to see these numbers continue to increase for the next couple of weeks.

So how should we interpret the death figures presented to us daily? The headline daily death count is usually “Deaths within 28 days of a positive test by date reported”. There are two major issues with this data that I feel are important to highlight. The first is that it is subject to large fluctuations, with much lower death counts at weekends. If we compare the variability in death counts in the second wave with that in the first, we see that the number of deaths recorded at weekends is now lower than the number recorded at weekends in April, at the height of the first wave. Why might this matter? If fewer deaths are reported at weekends, the shortfall is likely to be caught up in the following weekdays, inflating the weekday counts. Headlines such as “Covid: UK reports record 1,564 daily deaths” can therefore be misleading, as they don’t account for this artificial inflation. When dealing with noisy time series data, it is important to look at moving averages, and the 7-day averages presented alongside the daily data give a much better representation of the true death count. A more reliable measure is “Deaths within 28 days of a positive test by date of death”, rather than by date reported, but this data is slower to collect, as figures for the 5 days before the latest update are typically incomplete. When it is important to know what is happening in the UK right now, relying on data with a lag of even just 5 days can be a major limitation.
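To see why a moving average helps, here is a minimal Python sketch of a trailing 7-day average, in which each day’s value is the mean of that day and the six days before it. This is one common definition of a 7-day average and is an assumption on my part, not necessarily the exact method the dashboard uses. The daily counts are made up purely to mimic the weekend dip and early-week catch-up described above; they are not real figures, and the function name trailing_average is hypothetical.

```python
from typing import List, Optional


def trailing_average(values: List[float], window: int = 7) -> List[Optional[float]]:
    """Return the mean of each value and the (window - 1) values before it.

    The first (window - 1) positions have no complete window, so they are None.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averages: List[Optional[float]] = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the value that has just fallen out of the window.
            running_total -= values[i - window]
        averages.append(running_total / window if i >= window - 1 else None)
    return averages


# Hypothetical reported deaths for two weeks, Monday to Sunday (illustrative only,
# not real data): low weekend counts followed by a catch-up early in the week.
reported = [900, 1200, 1100, 1000, 950, 500, 400,
            1300, 1250, 1150, 1050, 1000, 550, 450]

smoothed = trailing_average(reported)
for day, (raw, avg) in enumerate(zip(reported, smoothed), start=1):
    label = "n/a" if avg is None else f"{avg:.1f}"
    print(f"day {day:2d}: reported {raw:5d}, 7-day average {label}")
```

In this made-up series, the second Monday reports 1,300 deaths while the 7-day average for that day is about 921, which is the kind of gap that can turn a catch-up day into a “record” headline.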

The second problem with the reported death data is how deaths are defined: “Number of deaths of people who had had a positive test result for COVID-19 and died within 28 days of the first positive test”. This means that if a person tests positive and is then killed by a bus the next day, they would be counted as a COVID-19 death. Conversely, if a person tests positive for COVID-19 but dies 29 days after their first positive test, they would not be counted as a COVID-19 death. A more reliable way to identify COVID-19 deaths is to look at death certificates, which is what data from the Office for National Statistics (ONS) considers. This data presents the number of deaths of people whose death certificate mentioned COVID-19 as one of the causes, but it is published only weekly and lags by at least 11 days. The ONS also reports excess deaths, which measure how many more deaths are occurring than the 5-year average for the same time of year.
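The contrast between the two approaches can be made concrete with a short Python sketch. The function names are hypothetical; the 28-day check assumes that deaths on the day of the test and exactly 28 days after it both count (the boundary handling is my assumption); and the excess-deaths calculation is a simplified observed-minus-average comparison for a single week, not the ONS’s full methodology. All dates and counts are illustrative, not real data.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Sequence


def within_28_days(first_positive: date, death: date, window_days: int = 28) -> bool:
    """Sketch of the 'died within 28 days of the first positive test' rule.

    Only the gap between the two dates matters; the cause of death is never
    consulted. Counting day 0 and day 28 as inside the window is an assumption.
    """
    days_since_test = (death - first_positive).days
    return 0 <= days_since_test <= window_days


def excess_deaths(observed: int, same_week_previous_years: Sequence[int]) -> float:
    """Simplified excess deaths: observed deaths minus the average for the
    same week in previous years (not the full ONS method)."""
    return observed - float(mean(same_week_previous_years))


test_date = date(2021, 1, 1)  # hypothetical date of first positive test

# Killed in an accident the next day: still counted under the 28-day rule.
print(within_28_days(test_date, test_date + timedelta(days=1)))   # True
# Dies of COVID-19 on day 29: not counted under the 28-day rule.
print(within_28_days(test_date, test_date + timedelta(days=29)))  # False

# Hypothetical weekly counts: this week vs the same week in the previous five years.
print(excess_deaths(14_000, [11_000, 11_500, 10_800, 11_200, 11_500]))  # 2800.0
```

The first two calls show the two failure modes of the 28-day definition: it counts a death with no connection to COVID-19 and misses one that may well be. The excess-deaths figure sidesteps the question of cause entirely, which is why it is a useful cross-check, at the cost of the weekly publication and reporting lag.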

Trying to get an accurate picture of what case numbers mean for underlying prevalence is difficult and has already been discussed by PHASTAR. But death data seems to be no less confusing. There is a constant balancing act between needing the most up-to-date data and ensuring that the data is reliable enough to act upon. And, of course, the issues discussed here are not unique to the UK. Every country across the globe faces difficulties in collecting, assessing, and disseminating COVID-19 data. Deciding how to define a COVID-19 case and a COVID-19 death is not a straightforward task, and yet it is of paramount importance that the data collected accurately reflects the current situation. With every country choosing its own definitions, it is difficult to make meaningful country-to-country comparisons. Each country also faces its own challenges in how it communicates the information it collects. In the UK we are frequently presented with lots of data, which has led to a rise in “couch statisticians”, where anyone can carry out a “statistical analysis” of the information. But is this a good idea when most of the population are not trained in statistics? As statisticians, we understand the intricacies of data and the volatility of time series data, and we appreciate the role of uncertainty. How much data is made publicly available varies throughout the world, and simply providing the data isn’t enough; the commentary that goes alongside it is just as important.

As for where we stand right now: hopefully we will now see a continual drop in case numbers, and that 7-day average curve will only be going in one direction. But whilst we wait for the peak in cases to filter through to a peak in deaths, death numbers are likely to continue to increase, and we will probably see more deaths in this second wave than we did in the first. We’re not over the worst of it yet, but hopefully, with the introduction of a widespread vaccine programme, the light at the end of the tunnel may be getting closer.