Rates
An introduction to basic statistics and an introduction to confidence intervals are available elsewhere in this guidance.
Calculation of crude rates
The rate of events \(r\) is given by:
\[ r = \frac{O}{n} \]
where:
- \(O\) is the number of observed events
- \(n\) is the population-years at risk
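As a minimal sketch of the formula above, the crude rate can be computed directly. The function name `crude_rate`, the optional `multiplier` argument (for example 100,000 to express the rate per 100,000 population-years) and the example figures are illustrative assumptions, not part of this guidance.

```python
def crude_rate(observed: int, population_years: float, multiplier: float = 1.0) -> float:
    """Crude rate r = O / n, optionally scaled by a multiplier (hypothetical helper)."""
    if population_years <= 0:
        raise ValueError("population_years must be positive")
    return observed / population_years * multiplier


# Hypothetical figures: 50 events observed over 10,000 population-years.
rate = crude_rate(50, 10_000)                      # events per population-year
rate_per_100k = crude_rate(50, 10_000, 100_000)    # same rate per 100,000 population-years
```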
Calculation of confidence intervals for crude rates
The \(100(1 - \alpha)\)% confidence limits for the rate \(r\) are given by:
\[ r _{lower} = \frac{O _{lower}}{n} \]
\[ r _{upper} = \frac{O _{upper}}{n} \]
where:
- \(O _{lower}\) and \(O _{upper}\) are the lower and upper confidence limits for the observed number of events
\(O _{lower}\) and \(O _{upper}\) can be calculated using either Byar’s method or the \(\chi^2\) exact method, depending on the size of the observed number of events.
Counts of 10 or greater
Provided the rate \(r\) is low and the denominator at risk is large, the variability in the observed count \(O\) is described by the Poisson distribution. Confidence limits for \(O\) can be calculated using Byar’s approximation, which is computationally simple and gives very accurate approximations to the exact Poisson probabilities (Breslow and Day, 1987).
Using Byar’s method, the lower and upper confidence limits of the observed number of events are given by:
\[ O _{lower} = O \left(1-\frac{1}{9O}-\frac{z}{3\sqrt{O}}\right)^3 \]
\[ O _{upper} = (O+1) \left(1-\frac{1}{9(O+1)}+\frac{z}{3\sqrt{O+1}}\right)^3 \]
where:
- \(O\) is the total observed count of events in the local or subject population
- \(z\) is the \(100\left(1 - \frac{\alpha}{2}\right)\)th percentile value from the standard normal distribution
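The two Byar’s formulae above can be sketched in Python using only the standard library. The function name `byars_limits`, the `confidence` argument and the example count and denominator are illustrative assumptions. This is a sketch of the formulae as written here, not the implementation used by any of the tools listed on this page.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def byars_limits(observed: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Lower and upper limits for an observed count using Byar's approximation."""
    if observed < 1:
        raise ValueError("Byar's approximation needs a positive count")
    alpha = 1 - confidence
    # z is the 100(1 - alpha/2)th percentile of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    o = observed
    lower = o * (1 - 1 / (9 * o) - z / (3 * sqrt(o))) ** 3
    upper = (o + 1) * (1 - 1 / (9 * (o + 1)) + z / (3 * sqrt(o + 1))) ** 3
    return lower, upper


# Hypothetical example: 100 events over 20,000 population-years.
o_lower, o_upper = byars_limits(100)
n = 20_000
r_lower, r_upper = o_lower / n, o_upper / n   # confidence limits for the crude rate
```

Dividing both count limits by the same denominator \(n\) gives the limits for the rate, as in the equations for \(r_{lower}\) and \(r_{upper}\).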
Counts of less than 10
Where the observed number of events \(O\) is less than 10, the \(\chi^2\) exact method should be used to calculate the lower and upper confidence limits of the observed number of events. Using the link between the Poisson and \({\chi}^2\) distributions (Armitage and Berry, 2002), these are given by:
\[ O _{lower} = \frac{{\chi}^2_{lower}}{2} \]
\[ O _{upper} = \frac{{\chi}^2_{upper}}{2} \]
where:
- \(O\) is the total observed count of events in the local or subject population
- \({\chi}^2_{lower}\) is the \(100\left(\frac{\alpha}{2}\right)\)th percentile value from the \({\chi}^2\) distribution with \(2{O}\) degrees of freedom
- \({\chi}^2_{upper}\) is the \(100\left(1 - \frac{\alpha}{2}\right)\)th percentile value from the \({\chi}^2\) distribution with \(2{O}+2\) degrees of freedom
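The \(\chi^2\) exact limits can be sketched without a statistics library, taking the lower limit from the lower tail (\(\frac{\alpha}{2}\)) and the upper limit from the upper tail (\(1 - \frac{\alpha}{2}\)). For even degrees of freedom \(2k\), the \(\chi^2\) distribution function has the closed form \(P(\chi^2_{2k} \le x) = P(\text{Poisson}(x/2) \ge k)\), so the percentile can be found by bisection. All function names below are hypothetical, and the bisection is an assumption made to keep the sketch self-contained. In practice a library quantile function for the \(\chi^2\) distribution would normally be used instead.

```python
from math import exp
from typing import Tuple


def chi2_cdf_even(x: float, half_df: int) -> float:
    """CDF of a chi-square variable with 2 * half_df degrees of freedom (half_df >= 1).

    Uses the identity P(chi2_{2k} <= x) = 1 - P(Poisson(x / 2) <= k - 1).
    """
    mean = x / 2
    term = exp(-mean)          # Poisson probability of 0 events
    cumulative = term
    for i in range(1, half_df):
        term *= mean / i       # Poisson probability of i events
        cumulative += term
    return 1 - cumulative


def chi2_ppf_even(p: float, half_df: int) -> float:
    """Percentile of chi-square with 2 * half_df degrees of freedom, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, half_df) < p:   # widen the bracket until it contains p
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, half_df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_limits(observed: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Exact Poisson limits for a count via the chi-square link."""
    alpha = 1 - confidence
    # With zero events there are 0 degrees of freedom, so the lower limit is 0.
    lower = 0.0 if observed == 0 else chi2_ppf_even(alpha / 2, observed) / 2
    upper = chi2_ppf_even(1 - alpha / 2, observed + 1) / 2
    return lower, upper


# Hypothetical example: 5 observed events.
o_lower, o_upper = exact_limits(5)
```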
Calculation of confidence intervals when observed events are not independent
The methods described above assume that events are unique and not interrelated. However, in some circumstances this is not the case. For example, with sickness absence days, someone is more likely to be absent on a given day if they were off sick the previous day than if they were not. Absence days may therefore cluster together in a non-random way and cannot be considered independent.
In these cases, an alternative approach to calculating confidence intervals is required. We can assume that individual absences, of however many days, are independent of each other. This may not always be true, but it is a closer approximation than treating every day as independent. The confidence interval calculations can then be based on the number of distinct absences, while the rate is still presented in terms of absence days. The overall rate is calculated in the same way as before, using the count of interest (for example, absence days) as the numerator.
However, the confidence intervals for the rate need to be calculated by separating out absences into those of 1, 2, 3, up to the maximum absence length in days, and applying the following adjusted form of Byar’s method:
\[ O _{lower} = O + \sqrt{ \frac {\sum_j \left( {O_j j^2} \right)} {O}} \left( O \left(1-\frac{1}{9O}-\frac{z}{3\sqrt{O}}\right)^3 - O \right) \]
\[ O _{upper} = O + \sqrt{ \frac {\sum_j \left( {O_j j^2} \right)} {O}} \left( (O+1) \left(1-\frac{1}{9(O+1)}+\frac{z}{3\sqrt{O+1}}\right)^3 - O \right)\]
where:
- \(O\) is the total observed count of events in the local or subject population, for example absence days
- \(O_{j}\) is the number of absences of length \({j}\) days
- \(j\) is the length of an absence in days, that is, the number of events it contributes (1 for single-day absences, 2 for 2-day absences, 3 for 3-day absences and so on)
- \(z\) is the \(100\left(1 - \frac{\alpha}{2}\right)\)th percentile value from the standard normal distribution
These adjusted values of \(O_{lower}\) and \(O_{upper}\) can then be used in the standard equations at the top of this page to give lower and upper confidence limits for the crude rate.
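The adjusted limits for non-independent events can be sketched as follows. The sketch assumes that the total count \(O\) is the total number of absence days, that is \(\sum_j j \, O_j\), which is one reading of the definitions above. The function name `clustered_limits`, the dictionary input format and the example data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Dict, Tuple


def clustered_limits(
    absences_by_length: Dict[int, int], confidence: float = 0.95
) -> Tuple[float, float, float]:
    """Return (O, O_lower, O_upper) for clustered events.

    absences_by_length maps absence length j (in days) to the number of
    absences O_j of that length.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Assumption: O is the total number of absence days, sum of j * O_j.
    total = sum(j * o_j for j, o_j in absences_by_length.items())
    if total < 1:
        raise ValueError("at least one event is required")
    sum_sq = sum(o_j * j ** 2 for j, o_j in absences_by_length.items())
    scale = sqrt(sum_sq / total)   # widens the interval when absences are long

    # Standard Byar's limits for the total count.
    byar_lower = total * (1 - 1 / (9 * total) - z / (3 * sqrt(total))) ** 3
    byar_upper = (total + 1) * (
        1 - 1 / (9 * (total + 1)) + z / (3 * sqrt(total + 1))
    ) ** 3

    lower = total + scale * (byar_lower - total)
    upper = total + scale * (byar_upper - total)
    return total, lower, upper


# Hypothetical data: 40 one-day, 20 two-day and 4 five-day absences (100 days in total).
total_days, o_lower, o_upper = clustered_limits({1: 40, 2: 20, 5: 4})
```

When every absence lasts one day the scaling factor is 1 and the result reduces to the standard Byar’s limits. Longer absences increase the factor and widen the interval.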
Tools
The following tools are available to calculate crude rates:
- The PHEindicatormethods R package includes the function phe_rate()
- The PHStatsMethods Python package includes the function ph_rate()
- The Excel tool for common public health statistics and their confidence intervals (xlsx)
Page last updated: August 2024