Relating infections to \(R(t)\)

Overview

There are two primary classes of methods of estimating \(R(t)\) from case count data that are used in most R software packages.

The first class of methods assumes there is a formulaic relationship between infections and reproduction number, a relationship known as the renewal equation (Fraser 2007). These infections are then assumed to result in (some fraction of) the observed cases.
A second class of methods involves empirically calculating a quantity that approximates the latent quantity represented by a reproduction number by fitting a curve to the case count time-series and finding the time-varying slope in log space (and then performing other transformations). Empirical calculations are discussed in detail below in our examination of ways in which \(R(t)\) is constrained over time.

Renewal equation estimates of R(t)

The renewal equation relates \(R(t)\) and infections on day \(t\), \(I(t)\), using a third parameter known as the generation interval.

The generation interval, \(ω\), is the time between infection in the infector and infection in the infectee.

In this document we use the generation interval \(ω\) described by a probability mass function with non-zero values from day 1 (assuming that disease incubation takes at least 1 day) to a maximum day \(s\), i.e., the longest interval between infections in infector and infectee.

Taking care to note that \(R(t)\) is undefined on day 0 since there has been no transmission yet (and assuming the initial infections are \(I(0)\)), we sum over days to acquire the renewal equation:

\[ I(t) = R(t) \sum_{i = \max(1, t - s + 1)}^{t} \omega(i) \, I(t - i). \]

For brevity, we write the inner sum as \(Λ(t)\):

\[ Λ(t) = \sum_{i = \max(1, t - s + 1)}^{t} \omega(i) \, I(t - i). \]

The assumptions of this formulation, as per Green et al. (2022), are that incident infections can be described deterministically within each window of \([t-s+1,t]\) and that the generation interval distribution does not change over the modeling time.

Serial interval

A similar parameter to the generation interval is the serial interval, which is the time between symptom onset in the infector and symptom onset in the infectee. The serial interval and generation interval are interchangeable if the incubation time is independent from the transmission time, and some formulations of the renewal equation use a serial interval.

Exponential growth rate

A common reframing of the renewal equation is to equate \(R(t)\) with an exponential growth rate, \(r\). Under specific conditions and within a small time window (\([t-s+1,t]\)), infections in the early stage of an outbreak can be assumed to grow exponentially at a constant rate \(r\) (Wallinga and Lipsitch (2006)). Using the time window \([t- s+1,t]\) and assuming some initial infections \(k\), \(R(t)\) for \([t- s+1,t]\) can be defined using \(r\) and \(ω\):

\[ I(t)=ke^{rt} \] \[ R(t)= [\sum_{i = \max(1, t - s + 1)}^{t} \omega(i) \, e^{-ri}]^{-1} \]

Again, we will omit the writing the bounds for time in remaining formulae. A single \(R(t)\) value, say \(R_0\), can be substituted into an expression for the infection attack rate, the proportion of an at-risk population that contracts the disease during a specified time interval (Musa et al. (2020)), or in the final size equation (Ma and Earn (2006)), to estimate the proportion of all individuals that were affected by a disease with this \(R_0\):

The major difference between calculating \(R(t)\) from a renewal equation or an exponential growth rate equation is whether \(I(t)\) is used. If for a given time window both r and ω can be estimated independently, then \(R(t)\) can be inferred without infection data. Otherwise, infection data are needed to estimate \(R(t)\).

Solving for \(R(t)\)

Using the renewal equation and given that \(I(t)\) and \(ω\) are known, \(R(t)\) can be solved for algebraically starting with \(R(t=1)\) and iterating forwards in time. However, this will produce highly volatile estimates of \(R(t)\) that recover the incidence curve directly.

Solving directly for \(R(t)\) at every timestep is undesirable for several reasons:

observed infections are the result of a noisy process with random day-to-day variations, which do not reflect an underlying change in infectivity;
real-world infection data are rarely complete, especially in an emerging epidemic, meaning that a certain amount of uncertainty must be incorporated into any estimation framework;
infection incidences, \(I(t)\), are the data of interest but are cannot be observed directly, so many calculations instead use the observed reported cases, \(C(t)\), which requires some additional processing to incorporate into calculations of \(R(t)\).

Therefore, a variety of constraints on \(R(t)\) are added in the inferential process: using distributions on key variables, placing restrictions on how \(R(t)\) varies through time, and with additional data sources and delay distributions. These choices dictate which estimation framework is used, which can add additional constraints.

Empirical estimates of \(R(t)\)

In contrast to models that assume that the renewal equation defines the relationship between infections and \(R(t)\), smoothing or regression models calculate time-varying \(R(t)\) directly from the slope of the log of the infections time-series. Using this method, the relationship between \(R(t)\) and infections is empirically defined, being only constrained by the smoothing parameters of curve fit to infections data.