Understanding disease severity, and especially the case fatality risk (CFR), is key to outbreak response. During an outbreak there is often a delay between cases being reported and the outcomes (for CFR, deaths) of those cases being known. Simply dividing total deaths to date by total cases to date may underestimate the CFR in real time, because the outcomes of many cases are not yet known.

Knowing the distribution of these delays from previous outbreaks of the same (or similar) diseases, and accounting for them, can therefore help ensure less biased estimates of disease severity.
See the **Concept** section at the end of this vignette for more on how reporting delays bias CFR estimates.

The severity of a disease can be estimated while correcting for delays in reporting using the methods outlined in Nishiura et al. (2009), which are implemented in the *cfr* package.

A disease outbreak is underway. We want to know **how severe the disease is** in terms of the case fatality risk (CFR), but there is a delay between cases being reported and the outcomes of those cases (whether recovery or death) being known. This is the *reporting delay*, and it can be accounted for using the distribution of reporting delays from past outbreaks.

To estimate the CFR in this setting, we need:

- A time series of cases and deaths (cases may be substituted by another indicator of infections over time);
- Data on the distribution of delays, describing the probability that an individual will die \(t\) days after they were initially infected.

We assume:

- That data on reporting delays from past outbreaks is informative about reporting delays in the current outbreak.

First we load the *cfr* package.
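A minimal sketch of this step, assuming the package is already installed from CRAN (the commented line shows the usual one-time installation call):

```r
# One-time installation, assuming a CRAN release is available:
# install.packages("cfr")

# Attach the package so its functions and datasets are available
library(cfr)
```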

Data on cases and deaths may be obtained from a number of publicly accessible sources, such as the global Covid-19 dataset curated by Our World in Data, a similar dataset made available through the R package *covidregionaldata* (Palmer et al. 2021), or data on outbreaks of other infections made available in *outbreaks*.

In an outbreak response scenario, such data may also be compiled and shared locally.
See the vignette on working with data from *incidence2*, a common format for incidence data that can help interoperability with other formats.

The *cfr* package requires only a data frame with three columns, “date”, “cases”, and “deaths”, giving the daily number of reported cases and deaths.
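As an illustration of this layout, the sketch below builds a small data frame in base R. The counts and the name `outbreak_data` are hypothetical, invented only to show the expected columns; the checks are one possible way to validate data before analysis and are not part of *cfr*.

```r
# Hypothetical daily counts, invented purely to illustrate the layout
outbreak_data <- data.frame(
  date = seq(as.Date("2024-01-01"), by = "day", length.out = 7),
  cases = c(2, 5, 9, 14, 11, 8, 4),
  deaths = c(0, 0, 1, 2, 3, 4, 2)
)

# Basic sanity checks: expected column names, a Date column,
# non-negative counts, and one row per consecutive day
stopifnot(
  identical(names(outbreak_data), c("date", "cases", "deaths")),
  inherits(outbreak_data$date, "Date"),
  all(outbreak_data$cases >= 0),
  all(outbreak_data$deaths >= 0),
  all(diff(outbreak_data$date) == 1)
)

head(outbreak_data)
```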

Here, we use some data from the first Ebola outbreak, in the Democratic Republic of the Congo in 1976, that is included with this package (Camacho et al. 2014).
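The bundled data can be loaded and inspected as sketched below. This assumes the *cfr* package is installed and that the dataset is named `ebola1976`, as in the ascertainment example later in this vignette.

```r
# Load the 1976 Ebola outbreak data shipped with the cfr package
data("ebola1976", package = "cfr")

# Inspect the first few rows and the column types
head(ebola1976)
str(ebola1976)
```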

We obtain the disease’s onset-to-death distribution from a more recent Ebola outbreak, reported in Barry et al. (2018). The onset-to-death distribution is considered to be Gamma distributed, with shape \(k = 2.40\) and scale \(\theta = 3.33\).
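A sketch of this distribution in base R, using `dgamma()` with the shape and scale given above. The name `onset_to_death` is chosen here for illustration; the summary statistics follow from the standard Gamma formulas (mean \(k\theta\), standard deviation \(\sqrt{k}\,\theta\)).

```r
# Onset-to-death delay density: Gamma with shape and scale
# taken from Barry et al. (2018)
onset_to_death <- function(x) dgamma(x, shape = 2.40, scale = 3.33)

# Summary statistics implied by these parameters
mean_delay <- 2.40 * 3.33       # about 8.0 days
sd_delay <- sqrt(2.40) * 3.33   # about 5.2 days

# Check that the density integrates to (approximately) 1
integrate(onset_to_death, lower = 0, upper = Inf)$value
```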

**Note that** while we use a continuous distribution here, it is more appropriate to use a discrete distribution instead as we are working with daily data.
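One simple way to discretise the continuous distribution is to assign to each whole day \(x\) the probability mass falling in the interval \([x, x + 1)\). The sketch below does this in base R; it is one possible discretisation, not necessarily the one used by the package authors, and the name `onset_to_death_discrete` is hypothetical.

```r
# Probability that the delay falls in [x, x + 1) days, obtained as a
# difference of the Gamma cumulative distribution function
onset_to_death_discrete <- function(x) {
  pgamma(x + 1, shape = 2.40, scale = 3.33) -
    pgamma(x, shape = 2.40, scale = 3.33)
}

# Evaluate on whole days; 60 days covers nearly all the probability mass
days <- 0:60
pmf <- onset_to_death_discrete(days)

# Total mass should be very close to 1
sum(pmf)
```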

**Note also** that we use the central estimates for each distribution parameter, and by ignoring uncertainty in these parameters the uncertainty in the resulting CFR is likely to be underestimated.

The forthcoming *epiparameter* package aims to be a library of epidemiological delay distributions, which can be accessed easily from within workflows.
See the vignette on using delay distributions for more information on how to use this and other distribution objects supported by R to prepare delay density functions.

We use the function `cfr_static()` to calculate overall disease severity at the latest date of the outbreak.
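A sketch of this call is given below. It assumes that `cfr_static()` takes the same `data` and `delay_density` arguments as the `estimate_ascertainment()` example later in this vignette, and that omitting `delay_density` gives the uncorrected (naive) estimate; check the function's help page for the installed version.

```r
library(cfr)

# 1976 Ebola outbreak data bundled with the package
data("ebola1976", package = "cfr")

# Onset-to-death delay density, parameters from Barry et al. (2018)
onset_to_death <- function(x) dgamma(x, shape = 2.40, scale = 3.33)

# Naive estimate: no correction for delays (assumes the default
# for delay_density means "no correction")
cfr_naive <- cfr_static(data = ebola1976)

# Delay-corrected estimate
cfr_corrected <- cfr_static(
  data = ebola1976,
  delay_density = onset_to_death
)

cfr_naive
cfr_corrected
```

Comparing the two results shows how much the delay correction shifts the estimate for this outbreak.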

The `cfr_static()` function is well suited to small outbreaks where there are relatively few events and the time period under consideration is relatively brief, so the severity is unlikely to have changed over time.

To understand how severity has changed over time (e.g. following vaccination or pathogen evolution), use the function `cfr_time_varying()`.
This function is, however, not well suited to small outbreaks because it requires sufficiently many cases over time to estimate how the CFR changes.
More on this can be found in the vignette on estimating how disease severity varies over the course of an outbreak.

It is important to know what proportion of cases in an outbreak are being ascertained to muster the appropriate response, and to estimate the overall burden of the outbreak.

**Note that** the ascertainment ratio may be affected by a number of factors.
When the main factor in low ascertainment is the lack of (access to) testing capacity, we refer to this as reporting or under-reporting.

The `estimate_ascertainment()` function estimates the ascertainment ratio using daily case and death data, the known severity of the disease from previous outbreaks, and optionally a delay distribution of onset-to-death.

Here, we estimate reporting in the 1976 Ebola outbreak in the Congo, assuming that Ebola virus disease (at that time) had a baseline severity of about 0.7 (70% of cases result in deaths), based on CFR values estimated in later, larger datasets. We use the onset-to-death distribution from Barry et al. (2018).

```
# estimate reporting with a baseline severity of 70%
estimate_ascertainment(
  data = ebola1976,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33),
  severity_baseline = 0.7
)
#> Total cases = 245 and p = 0.959: using Normal approximation to binomial likelihood.
#>   ascertainment_estimate ascertainment_low ascertainment_high
#> 1              0.7185383         0.7087172          0.8377214
```

This analysis suggests that roughly 71% to 84% of cases were reported in this outbreak, with a central estimate of about 72%.

More details can be found in the vignette on estimating the proportion of cases that are reported during an outbreak.

Simply dividing the number of deaths by the number of cases yields a *naive estimator* of the true CFR.

Suppose 10 people start showing symptoms of a disease on a given day, and at the end of that day all remain alive. Suppose that for the next 5 days, the numbers of new cases continue to rise until they reach 100 new cases on day 5. However, suppose that by day 5, all infected individuals remain alive.

The naive estimate of the CFR calculated at the end of the first 5 days would be *zero*, because there would have been zero deaths in total — *at that point*.
That is to say, the *outcomes* of cases (deaths) would not be known.

Even after deaths begin to occur, this lag between the ascertainment of a case or hospitalisation and outcome leads to a consistently biased estimate. Hence, adjusting for such delays using an appropriate delay distribution is essential for accurate estimates of severity.
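The size of this bias can be illustrated with a small deterministic sketch in base R. Everything here is an assumption made for illustration: a hypothetical growing epidemic, an assumed true CFR of 0.5, and the Gamma onset-to-death parameters used earlier, discretised to whole days. The adjustment divides deaths by the expected number of cases with known outcomes, in the spirit of Nishiura et al. (2009); it is not the *cfr* package's implementation.

```r
true_cfr <- 0.5   # assumed true severity, for illustration only
n_days <- 30

# Hypothetical exponentially growing daily case counts
cases <- round(10 * exp(0.1 * (0:(n_days - 1))))

# Discretised delay: P(delay falls in [j, j + 1) days), j = 0, ..., n_days - 1
delay_pmf <- diff(pgamma(0:n_days, shape = 2.40, scale = 3.33))

# Expected number of cases whose outcome becomes known on each day:
# cases reported j days earlier, weighted by the probability of a delay of j
known_outcomes <- vapply(
  seq_len(n_days),
  function(t) sum(cases[t:1] * delay_pmf[1:t]),
  numeric(1)
)

# Expected deaths per day (no randomness, to isolate the delay effect)
deaths <- true_cfr * known_outcomes

# Naive estimator: all reported cases in the denominator
naive_cfr <- sum(deaths) / sum(cases)

# Delay-adjusted estimator: only cases with known outcomes in the denominator
adjusted_cfr <- sum(deaths) / sum(known_outcomes)

c(naive = naive_cfr, adjusted = adjusted_cfr, truth = true_cfr)
```

Because the deaths are generated from the same delay distribution used in the adjustment, the adjusted estimate recovers the assumed CFR by construction. The point of the sketch is the gap between the naive estimate and the truth while the epidemic is still growing.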

Barry, Ahmadou, Steve Ahuka-Mundeke, Yahaya Ali Ahmed, Yokouide Allarangar, Julienne Anoko, Brett Nicholas Archer, Aaron Aruna Abedi, et al. 2018. “Outbreak of Ebola virus disease in the Democratic Republic of the Congo, April–May, 2018: an epidemiological study.” *The Lancet* 392 (10143): 213–21. https://doi.org/10.1016/S0140-6736(18)31387-4.

Camacho, A., A. J. Kucharski, S. Funk, J. Breman, P. Piot, and W. J. Edmunds. 2014. “Potential for Large Outbreaks of Ebola Virus Disease.” *Epidemics* 9 (December): 70–78. https://doi.org/10.1016/j.epidem.2014.09.003.

Nishiura, Hiroshi, Don Klinkenberg, Mick Roberts, and Johan A. P. Heesterbeek. 2009. “Early Epidemiological Assessment of the Virulence of Emerging Infectious Diseases: A Case Study of an Influenza Pandemic.” *PLOS ONE* 4 (8): e6852. https://doi.org/10.1371/journal.pone.0006852.

Palmer, Joseph, Katharine Sherratt, Richard Martin-Nielsen, Jonnie Bevan, Hamish Gibbs, Cmmid Group, Sebastian Funk, and Sam Abbott. 2021. “Covidregionaldata: Subnational Data for COVID-19 Epidemiology.” *Journal of Open Source Software* 6 (63): 3290. https://doi.org/10.21105/joss.03290.