Bringing an Analytics Mindset to the Pandemic
by Nico Neumann - April 23, 2020
Spend just 10 minutes on Twitter to catch up with Covid-19 news, and you’ll run into updated numbers and loud (sometimes angry) arguments about what all the data we’re collecting means. It’s proving difficult to pin down how infectious the virus is, what its mortality rate is, how effective different mitigation efforts are, and why different regions are seeing such different patterns of infection, mortality, and recurrence.
That lack of certainty is not at all surprising; after all, it’s a new disease that we’re learning about in real time, under horribly high-pressure conditions. Moreover, different regions have vastly different testing capacity and healthcare systems - those factors alone can explain much of the variability we’re witnessing.
That said, epidemiologists and other experts are running into many of the same issues that come up in any data-analytics problem. The truth is that collecting and analyzing data is rarely straightforward; at every stage, you need to make difficult judgment calls. The decisions you make about three factors - whom to include in your data set, how much relative weight to give different factors when you investigate causal chains, and how to report the results - will have a significant impact on your findings. Making the right calls will save lives in the current healthcare crisis, and improve performance in less drastic business settings.
Who should be tested?
In the case of an unknown disease, it is easiest to test only very sick people or even those who have already passed away. (In areas without enough testing kits, there may not be any choice in the matter.) Unfortunately, while this approach is easiest, it inflates the perceived mortality rate. Let's say 10 people are very sick and one of them dies of the disease. If those 10 are the only people tested, we record a 10% mortality rate. But if 100 people were actually infected, and 90 of them had mild symptoms (or no symptoms at all), then the actual mortality rate would be 1% - but you wouldn't know that unless you tested more widely. The lesson: looking only at the most obvious cases makes the virus appear deadlier than it is. Statisticians call this issue selection bias in sampling.
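To make the arithmetic concrete, here is a minimal Python sketch using the invented numbers from the example above:

```python
# Invented numbers from the example above: 100 true infections,
# 1 death, but only the 10 sickest people get tested.
infected = 100   # true number of infections (unknown to us)
tested = 10      # only the most severe cases are tested
deaths = 1       # deaths among confirmed cases

observed_rate = deaths / tested    # what the biased sample shows
true_rate = deaths / infected      # what full testing would show

print(f"Observed mortality rate: {observed_rate:.0%}")  # 10%
print(f"True mortality rate:     {true_rate:.0%}")      # 1%
```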
Companies can easily make the same mistake. For example, let’s say an organization wants to know what’s behind an uptick in sales. The marketing manager hypothesizes that it was due to a new ad campaign. It’s tempting in this case to focus on outcomes that are easy to measure, in the name of efficiency. Let’s say we look at all the new customers arriving at our store or website and find out that half of them saw our advertising before buying from us. We may now conclude that the conversion rate of our advertising is 50%.
However, what about all the people who saw the advertising and didn’t come to our store or website? If we included those, the customer conversion rate would be much lower. We didn’t select those people as test candidates for our analysis, because it was more expensive and more difficult to include them. The wrong conversion rate has big implications for budget allocations and ultimately for return on investment - just as understanding both the infection and mortality rates of Covid-19 has huge implications for public-health policy going forward.
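The same arithmetic applies here. The sketch below uses hypothetical figures - the ad reached 1,000 people, 200 new customers arrived, and half of them had seen the ad - to show how far the convenient measure and the representative measure can diverge:

```python
# Hypothetical figures: the ad reached 1,000 people; 200 new
# customers arrived, of whom 100 had seen the ad before buying.
ad_exposed = 1_000       # everyone the campaign actually reached
new_customers = 200      # all new customers we observed
buyers_who_saw_ad = 100  # new customers who had seen the ad

# Convenient (biased) measure: only the people who showed up.
biased_rate = buyers_who_saw_ad / new_customers   # 50%

# Representative measure: everyone exposed to the ad.
true_rate = buyers_who_saw_ad / ad_exposed        # 10%

print(f"Share of new customers who saw the ad:  {biased_rate:.0%}")
print(f"Conversion rate among everyone reached: {true_rate:.0%}")
```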
Solution: Don’t measure convenient samples; extend the study to include a more representative group. The degree to which this can happen depends on costs and available resources, of course.
How much weight should we give to different factors when we interpret the data?
The second challenge is to determine the relative impact of a factor on an outcome. Say that public-health officials are trying to understand which factors mattered most to individual patients' outcomes in the current pandemic. That is neither simple nor straightforward, because there are so many possible contributing factors: age, pre-existing conditions like heart disease or diabetes, health of the immune system, timing of intervention, and whether the healthcare providers were overtaxed, to name a few. These questions are very hard to answer because the influence of many critical factors - and their interactions - cannot be observed or measured directly.
Businesses face similar dilemmas all the time. Let’s return to our earlier example of a significant uptick in sales. The marketing manager might think it happened because of the new ad campaign she championed. But maybe it was because of recent tweaks to the website design, a pricing change, new talent on the salesforce, or because a key competitor made a bad move - or (most likely) some combination of factors. It’s impossible to know for sure, after the fact.
Solution: We need a scientific method that distinguishes between and isolates the contributions of individual factors, as randomized controlled trials (experiments) do. In business settings, it's usually possible to run experiments that test the effect of small, self-contained changes. In a pandemic, that's not going to be possible (though natural experiments are popping up as different countries take different approaches to managing the crisis).
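For readers who want to see what such an experiment looks like in practice, here is a minimal simulated A/B test in Python. The numbers - an 8% baseline conversion rate and a 4-point lift from the ad - are invented for illustration:

```python
import random

random.seed(42)

# Simulated visitor behavior; the true effect of the ad is a
# 4-point lift over an 8% baseline (both numbers invented).
def simulate_visitor(sees_ad: bool) -> bool:
    base_rate = 0.08
    ad_lift = 0.04 if sees_ad else 0.0
    return random.random() < base_rate + ad_lift

# Randomly assign each visitor to treatment (sees the ad) or
# control (doesn't), so other factors - pricing, site design,
# salesforce - average out across the two groups.
treatment, control = [], []
for _ in range(10_000):
    if random.random() < 0.5:
        treatment.append(simulate_visitor(sees_ad=True))
    else:
        control.append(simulate_visitor(sees_ad=False))

lift = sum(treatment) / len(treatment) - sum(control) / len(control)
print(f"Estimated lift from the ad: {lift:.1%}")  # near the true 4 points
```

Because assignment is random, the difference between the two groups can be attributed to the ad rather than to any of the confounding factors above.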
How should we report results?
Once all calculations and estimates are complete, analysts need to decide how to report their findings. The way results are framed can strongly shape perceptions of how good or bad a situation is.
In the case of the pandemic, various stakeholders have presented infection numbers in very different ways. Many media outlets have reported total cases and compared growth curves across countries to argue that certain methods work better or to criticize government policies. But is it fair to compare 100 infection cases in the U.S. with 100 cases in Singapore? The U.S. has over 320 million people, Singapore 5.6 million. Absolute numbers should always be seen in context. Once we adjust Covid-19 cases per capita, the numbers look very different. At the same time, showing only relative increases can be misleading too. A 50% increase in cases has very different implications for a country with 2 infections than it does for one with 10,000 known cases.
Business results, too, can look very different depending on how they are reported. Imagine you have the opportunity to invest in different companies. One reports 20% revenue growth; a second reports only 10%. As with the infection numbers, the growth rate can mislead if the total number of products sold isn't considered. Growing sales by 10% is much easier if you sell only 10 products per month rather than 10,000 (everything else being equal). Likewise, reporting total sales alone (without a reference point) may not provide a fair comparison either.
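A toy Python example, with invented numbers loosely based on the figures above, shows how pairing absolute and relative metrics changes the picture:

```python
# Same absolute case count, very different per-capita rates
# (populations and case counts are invented for illustration).
regions = {
    # name: (cases, population)
    "Country A": (100, 320_000_000),
    "Country B": (100, 5_600_000),
}
for name, (cases, pop) in regions.items():
    per_million = cases / pop * 1_000_000
    print(f"{name}: {cases} cases, {per_million:.1f} per million")

# Same logic for growth rates: a higher percentage can mean
# far fewer extra units if the base is small.
companies = {
    # name: (units sold per month, growth rate)
    "Firm X": (10, 0.20),
    "Firm Y": (10_000, 0.10),
}
for name, (units, growth) in companies.items():
    print(f"{name}: {growth:.0%} growth = {units * growth:.0f} extra units")
```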
Solution: Always provide - or request - multiple metrics, in particular absolute and relative numbers, to understand the full context of a situation. These could include "total sales increase" and "percentage increase," plus year-over-year or regional comparisons.
Nico Neumann is an assistant professor at the Melbourne Business School, where he teaches business analytics and marketing communications.