COVID-19 - Predicting the future

Written by Prof. Hans Bachor

One of the biggest news stories right now is the spread of the coronavirus disease, COVID-19, across the world. The World Health Organisation (WHO) has declared COVID-19 an epidemic. It is spreading, both in numbers and in locations. It could reach many more countries and become a pandemic. We don’t want to panic, but we need to know: Can it be slowed down? What does the data say about the rate of infection? What are our chances of recovering from the virus? What is the situation now, and what will it be next week? Next month?

We don’t have precise answers, but in the modern age of data science we fortunately have access to the latest official data, and we can draw our own conclusions using simple, reliable mathematical models.

The first thing we need is data. Data can be sourced either directly from the WHO or from a reliable secondary source such as Wikipedia. These sources are updated every day, so we can follow and analyse the spread of the virus and its reach day by day.

For data we can try the Wikipedia page for the coronavirus, which is updated daily. For example, it shows up-to-date data on the spread of the virus in China:

2020 coronavirus patients in China by Phoenix7777 (CC BY-SA 4.0)

We want to learn how the virus spreads in one country, in this case China, and how well the amazing controls, such as shutting down entire cities like Wuhan, actually work. Can we see that this drastic regulation, and the heroic effort of the population, actually work?

While the real world is complicated, there are some simple mathematical models that can tell us a great deal about how something grows and declines over time. The growth and decline of a quantity is very common in biology (the growth of a population), in finance (the growth of my savings account) and in the environment (the volume of water in my city’s storage dam).

The simplest case is linear growth or decline. This would apply to my savings account if I save a fixed sum every month and receive no interest, or to my water storage if I take out the same amount of water every day. This is easy to check: I plot the money in my account, the water storage volume, or the number of people infected, recovered or killed by the virus, as a function of time. The graph above is an example of such a plot.

Is the virus growing linearly? If so, the data would form a straight line on the graph. The graph shows three categories: blue represents recoveries, yellow represents people under treatment and red represents deaths. Their sum is the total number of reported cases. Linear growth would mean a fixed number of additional cases every day.
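A quick way to test for linear growth, assuming you have a list of cumulative daily totals, is to look at the daily increases: under linear growth they are all about the same. The short Python sketch below does this. The numbers are made up for illustration, and the function names and the 10% tolerance are arbitrary choices, not part of any official method.

```python
def daily_new_cases(totals):
    """New cases each day: the difference between consecutive cumulative totals."""
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


def looks_linear(totals, tolerance=0.1):
    """Treat growth as linear if every daily increase lies within `tolerance`
    (a fraction, here 10%) of the average daily increase."""
    increases = daily_new_cases(totals)
    average = sum(increases) / len(increases)
    return all(abs(x - average) <= tolerance * abs(average) for x in increases)


# Hypothetical cumulative totals (illustration only, not real COVID-19 data).
steady = [1000, 1050, 1100, 1150, 1200, 1250]   # +50 every day
doubling = [1000, 2000, 4000, 8000, 16000]      # doubles every day

print(daily_new_cases(steady))   # [50, 50, 50, 50, 50]
print(looks_linear(steady))      # True
print(looks_linear(doubling))    # False
```

The daily increases are exactly the "rate of new infections" discussed next: if they stay constant, the total grows linearly; if they keep rising, the total grows faster than linearly.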

The graph shows that the total number of cases initially grew faster than linearly. Then growth started to slow down around February 10th. Since mid-February it has resembled linear growth, at a slower rate than in January. Eventually we would like to see no growth at all.

The number of new confirmed infections per day is shown as white circles. This rate rose quickly in January, which means the total was growing faster than linearly. It peaked around February 6th and then declined. By March 1st the rate was roughly constant, which means the total number of cases is back to slow linear growth. That is actually good news. Eventually the rate will fall to zero, and the victory over the virus will be reached. China is well on the way.
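To turn a roughly constant daily rate into a forecast for next week, one simple option is to fit a straight line to the recent totals and extend it. The sketch below uses an ordinary least-squares fit written out by hand, so it needs no extra libraries. The totals are hypothetical, and the forecast only holds if the linear trend continues.

```python
def fit_line(days, totals):
    """Ordinary least-squares fit of totals = intercept + slope * day."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(totals) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, totals))
    slope = sxy / sxx                  # new cases per day
    intercept = mean_y - slope * mean_x
    return intercept, slope


def extrapolate(intercept, slope, day):
    """Value of the fitted straight line on a given (possibly future) day."""
    return intercept + slope * day


# Hypothetical cumulative totals for days 0 to 6 (illustration only).
days = list(range(7))
totals = [10000, 10400, 10800, 11200, 11600, 12000, 12400]
intercept, slope = fit_line(days, totals)
print(f"about {slope:.0f} new cases per day")
print(f"forecast for day 13: {extrapolate(intercept, slope, 13):.0f}")
```

With these made-up numbers the fit gives 400 new cases per day and a forecast of 15,200 cases on day 13, one week after the last data point.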

Within the total we see slow growth in deaths (red), which looks almost linear, and a fast-growing number of recoveries (blue), which continues to rise steadily. Recently the number of new recoveries per day appears almost constant. That is great. There is hope that at some point during March 2020 all cases will be resolved: most will have recovered, while unfortunately a small fraction will have died. So, what is the death rate?

We can judge this with a semilogarithmic plot. This is a plot of the same or similar data with a different vertical scale. The vertical scale in the graph below is logarithmic. That means the vertical distance between two points depends only on the ratio of their values: every factor of 10 covers the same distance on the axis.

Log-linear plot of coronavirus cases with linear regressions by Galerita (CC BY-SA 4.0)

In such a plot a straight line indicates exponential growth. For example, the number of COVID-19 cases in China (red triangles) grew exponentially in early January. By the end of February growth was much slower, as we discussed above: slow, linear growth. The vertical distance between two sets of data gives the ratio of their values.
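On a logarithmic scale an exponential curve N(t) = N0 * exp(k * t) becomes the straight line ln N = ln N0 + k * t. So fitting a straight line to the logarithms of the counts gives the growth rate k, and from it the doubling time ln 2 / k. The sketch below assumes all counts are positive and uses hypothetical data that double every two days; the function names are illustrative.

```python
import math


def fit_exponential(days, counts):
    """Fit counts ~ n0 * exp(k * day) by a least-squares straight line through
    (day, ln(count)). Returns (n0, k). All counts must be positive."""
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    k = sxy / sxx                        # slope of the line = growth rate per day
    n0 = math.exp(mean_y - k * mean_x)   # intercept, converted back from the log scale
    return n0, k


def doubling_time(k):
    """Days needed for the count to double at growth rate k (per day)."""
    return math.log(2) / k


# Hypothetical counts that double every two days (illustration only).
days = [0, 2, 4, 6, 8, 10]
counts = [100, 200, 400, 800, 1600, 3200]
n0, k = fit_exponential(days, counts)
print(f"daily growth factor {math.exp(k):.2f}, doubling time {doubling_time(k):.1f} days")
```

Here the daily growth factor comes out at about 1.41 (the square root of 2) and the doubling time at 2.0 days. A shorter doubling time means faster exponential growth.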

Look at the number of cases in China (red) and the number of deaths in China (pink). The distance between them narrows throughout February and March. That means the death rate is actually going up, from about 1 death per 100 reported cases (1%) to about 2 per 100 (2%). For the remaining patients the death rate is slightly increasing, which is not such a good prospect.
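The link between the gap on a log plot and the death rate can be checked with a few lines of Python. Here the death rate is taken, as in the discussion above, to be deaths divided by confirmed cases, and the two snapshots are hypothetical. The gap between the curves, measured in powers of ten, equals log10(cases / deaths), so a narrowing gap means a rising ratio.

```python
import math


def fatality_ratio(deaths, cases):
    """Fraction of confirmed cases that have died (deaths / confirmed cases)."""
    return deaths / cases


def log_gap(cases, deaths):
    """Vertical distance between the two curves on a base-10 log axis.
    It equals log10(cases / deaths): a smaller gap means a larger death ratio."""
    return math.log10(cases) - math.log10(deaths)


# Hypothetical snapshots (illustration only): (confirmed cases, deaths)
snapshots = {"early": (10000, 100), "late": (50000, 1000)}
for label, (cases, deaths) in snapshots.items():
    print(label, f"{fatality_ratio(deaths, cases):.1%}",
          f"gap {log_gap(cases, deaths):.2f} powers of ten")
# early 1.0% gap 2.00 powers of ten
# late 2.0% gap 1.70 powers of ten
```

This crude ratio counts patients still under treatment in the denominator, so it can shift as those cases are resolved.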

For the rest of the world the predictions are more complex. We can see the number of cases (dark blue) growing exponentially in early February, and an epidemic was declared. Unfortunately, this exponential growth rate increased in the last week of February. That means the virus is now spreading much faster, and we have to work hard to limit the growth: first to linear growth, as is happening in China, and from there to stop it completely. There is no evidence of this yet. We have to increase the controls.
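One way to see whether exponential growth is speeding up is to compare the average daily growth factor in two time windows, for example two consecutive weeks. The geometric mean of the day-to-day ratios simplifies to (last / first) raised to the power 1 / days. The sketch below uses hypothetical counts; a larger factor in the later window means faster exponential growth.

```python
def average_daily_growth_factor(counts):
    """Geometric mean of counts[i + 1] / counts[i] over a window of cumulative counts.
    The product of the daily ratios telescopes to counts[-1] / counts[0]."""
    days = len(counts) - 1
    return (counts[-1] / counts[0]) ** (1 / days)


# Hypothetical cumulative counts for two windows (illustration only).
earlier_window = [1000, 1100, 1210, 1331]   # grows 10% per day
later_window = [1331, 1597, 1917, 2300]     # grows about 20% per day

earlier = average_daily_growth_factor(earlier_window)
later = average_daily_growth_factor(later_window)
print(f"earlier: {earlier:.2f} per day, later: {later:.2f} per day")
if later > earlier:
    print("the exponential growth rate has increased")
```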

The death rate in the rest of the world is also a worry. The vertical distance between the confirmed cases (dark blue) and the deaths in the rest of the world (light blue) is narrowing. That means the death rate is growing, and it is now similar to the death rate in China. That is of concern, and another reason to strengthen the controls.

We can see that studying these graphs, with our knowledge of linear and exponential growth, is a great tool for understanding the current epidemic, and for forecasting and decision making. The WHO and our doctors are doing exactly this and advising us, so that we can better understand the effect of the virus and inform others.

Most importantly we can avoid an uninformed panic.

We can use the same tools to predict how our savings will grow, whether by setting money aside regularly (linear growth) or by keeping it in a savings account with a fixed interest rate, which earns more and more interest over time (exponential growth). Plotting your savings as a function of time lets you analyse how effective your savings program is. If you have a mortgage, it shows how long it will take you to pay it off.
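The savings and mortgage examples can be sketched the same way. The functions below are illustrative and rest on simplifying assumptions: deposits with no interest for the linear case, interest compounded once a year for the exponential case, and a mortgage with interest charged monthly at one twelfth of the annual rate and a fixed monthly repayment. Real accounts and loans may calculate interest differently.

```python
def savings_fixed_deposits(monthly_deposit, months):
    """Linear growth: the same deposit every month, no interest."""
    return monthly_deposit * months


def savings_with_interest(start, annual_rate, years):
    """Exponential growth: interest compounded once a year at a fixed rate."""
    return start * (1 + annual_rate) ** years


def months_to_repay(principal, annual_rate, monthly_payment):
    """Simulate a mortgage month by month, charging interest at annual_rate / 12.
    Returns the number of months until the balance is paid off, or None if the
    payment does not even cover the interest (the debt would never shrink)."""
    monthly_rate = annual_rate / 12
    balance = principal
    months = 0
    while balance > 0:
        interest = balance * monthly_rate
        if monthly_payment <= interest:
            return None
        balance += interest - monthly_payment
        months += 1
    return months


# Hypothetical examples (illustration only).
print(savings_fixed_deposits(200, 12 * 10))   # 24000: ten years of 200 a month
print(round(savings_with_interest(10000, 0.05, 10), 2))
print(months_to_repay(300000, 0.05, 2000))
```

Plotting these values year by year shows the same shapes as the virus data: a straight line for regular deposits, and a curve that bends ever more steeply upwards for compound interest.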

Similarly, we can plot the volume of water in a storage dam. A linear decline lets you predict when the dam will be empty. You might then have to switch to an exponential decline, using only a fixed percentage of the remaining water every day, and the water will last longer.
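The water storage comparison can also be checked numerically. In the sketch below, a hypothetical dam starts with 1000 units of water. The linear plan takes 10 units every day; the exponential plan takes 1% of whatever remains, which is also 10 units on the first day. Because an exponential decline never reaches exactly zero, the sketch counts the days until the level falls below an assumed minimum of 100 units.

```python
import math


def days_until_empty_linear(volume, daily_use):
    """Linear decline: the same volume is taken out every day."""
    return math.ceil(volume / daily_use)


def days_until_below_exponential(volume, fraction, minimum):
    """Exponential decline: a fixed fraction of the remaining water is used each day.
    The volume never reaches exactly zero, so count the days until it drops
    below `minimum`. Assumes 0 < fraction < 1 and minimum > 0."""
    days = 0
    while volume >= minimum:
        volume *= 1 - fraction
        days += 1
    return days


# Hypothetical dam holding 1000 units (illustration only).
print(days_until_empty_linear(1000, 10))   # 100
print(days_until_below_exponential(1000, 0.01, 100))
```

With these numbers the linear plan empties the dam after 100 days, while the exponential plan takes more than twice as long just to reach 10% of the starting volume, because the daily use shrinks as the level falls.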

Have a great time plotting data and making your own forecasts.


Thank you for your time, and let’s hope the virus is tamed very soon.


Hans Bachor