Excess mortality within England: post-pandemic method. Methodology
Official Statistics in Development
Updated: 21 November 2024
1 Introduction
This methodology document describes the methods used by the Office for Health Improvement and Disparities (OHID) for the Excess mortality within England: post-pandemic method monthly reports. It details how monthly excess deaths have been estimated. The method is similar to that used by the Office for National Statistics for their weekly reporting of excess deaths, which is described in the ONS Methodology Document.
Excess deaths are estimated by comparing the number of (observed) registered deaths each month with the number of deaths that would have been expected, based on recent trends. The numbers of expected deaths are estimated using a statistical model, based on trends in mortality rates over the previous five years, with adjustments to take account of the extremely high death rates in peak months of the COVID-19 pandemic.
The report also includes monthly directly standardised mortality rates to complement the excess deaths data: where past rates show a clear rising or falling trend, that trend is useful context for interpreting the excess deaths.
All the data are presented by calendar month of registration at regional and upper tier local authority level, for subgroups of the population (age groups, sex, deprivation groups) and by cause of death.
During the COVID-19 pandemic, Public Health England, and then OHID, published weekly data which compared the numbers of registered deaths with the number expected, had there been no pandemic. These reports aimed to show the overall impact of the pandemic on mortality. The current monthly reports are not related to any particular cause or event: they are intended to provide ongoing monitoring, to highlight variations in monthly mortality that may arise for any reason.
2 Methods
2.1 Overview
Three separate sets of analyses are carried out:
excess deaths from all causes, broken down by age group, sex, deprivation quintile, region (former government office region) and upper tier local authority (UTLA)
excess deaths from specific causes, identified by underlying cause of death
directly age-standardised mortality rates (DSRs), broken down by age group, sex, deprivation quintile, region, upper tier local authority and underlying cause of death
All of the analyses in the first set involve a single model, which means all the results are entirely consistent with each other.
For each cause of death, a separate model is run, because groups of causes are not necessarily mutually exclusive, and running the model at a more granular level, so that causes can be grouped in different ways, is too computationally intensive. Deaths are selected using International Classification of Diseases, 10th revision (ICD-10) [1] codes assigned by the Office for National Statistics (ONS). The causes of death included in the report are those which account for large numbers of deaths or are of specific policy interest. The cause of death analyses are based on underlying cause of death coded in the death record. Because cause of death coding can be missing from the first registration record received, and added or amended later, the cause of death analysis should be considered provisional for the most recent two months released.
The DSRs are calculated independently (they do not come from the excess deaths models), but are based on the same data.
2.2 Expected deaths – generation of the modelled estimates
2.2.1 Data sources
Models to develop baseline estimates of the expected number of death registrations in a given month of the year are constructed using a combination of deaths and population denominator data from the previous five years, with the baseline period moving on one month each month so that every month’s baseline is calculated on the same basis as any other month. For example, the baseline (comparator) for deaths registered in January 2024 is calculated using data from February 2018 to January 2023 inclusive, and that for December 2023 is calculated using data from January 2018 to December 2022 inclusive, etc. However, data for March to May 2020 and November 2020 to February 2021 are excluded from the baseline calculations as these months were dominated by exceptional numbers of deaths from COVID-19. Specifically, months excluded from the baseline had more than 18% of deaths with COVID-19 as the underlying cause. Those not excluded from the baseline had less than 10% of deaths with COVID-19 as the underlying cause. No months fell between those two cut-offs. March 2020 was also excluded because it is recognised that at the start of the pandemic there was no code for COVID-19 so many deaths went unrecorded.
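The rolling baseline window described above can be sketched as follows. This is an illustrative sketch, not OHID's code: the function name and parameters are hypothetical, and the 12-month gap between the end of the baseline and the target month is inferred from the two worked examples in the text rather than stated as a rule.

```python
# Months excluded from every baseline, as listed in the text above.
EXCLUDED_MONTHS = {
    (2020, 3), (2020, 4), (2020, 5),
    (2020, 11), (2020, 12), (2021, 1), (2021, 2),
}


def baseline_months(year, month, n_months=60, gap_months=12):
    """Return the (year, month) pairs forming the baseline for a target month.

    The window covers `n_months` consecutive months ending `gap_months`
    before the target month, with the excluded pandemic months dropped.
    Both defaults are assumptions inferred from the examples in the text.
    """
    target = year * 12 + (month - 1)   # months counted from year 0
    end = target - gap_months          # last month in the window
    start = end - n_months + 1         # first month in the window
    window = [(i // 12, i % 12 + 1) for i in range(start, end + 1)]
    return [ym for ym in window if ym not in EXCLUDED_MONTHS]


jan_2024 = baseline_months(2024, 1)
print(jan_2024[0], jan_2024[-1], len(jan_2024))
```

For January 2024 the window runs from February 2018 to January 2023, and 53 of its 60 months remain once the seven excluded months are dropped.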
Mortality data
Deaths for the baseline period are drawn from fully coded and cleaned annual extracts supplied to us by ONS, supplemented by daily deaths data supplied by ONS (for recent periods, until the annual extract replaces them). Deaths are aggregated by month of registration, age group, sex, deprivation quintile, underlying cause of death and UTLA.
Denominator data
Population data for the baseline period are derived from the 2021 Census rebased mid-year estimates, published by ONS. Population data match the breakdowns set out for mortality data above. Mid-month populations are estimated by interpolating between the published mid-year estimates.
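As an illustration of the interpolation step, the sketch below assumes linear interpolation between consecutive mid-year estimates (each referring to 30 June) and treats all months as equal in length. The text does not specify the interpolation method, so both choices, and the function name, are assumptions.

```python
def mid_month_population(mye_before, mye_after, month):
    """Estimate the population at the middle of a calendar month.

    mye_before: mid-year estimate for the 30 June preceding the month's midpoint
    mye_after:  mid-year estimate for the following 30 June
    month:      calendar month number (1 to 12)

    Assumes linear change between the two estimates (an assumption, not
    a method confirmed by the source).
    """
    # Months elapsed from 30 June to the middle of the month:
    # July -> 0.5, August -> 1.5, ..., January -> 6.5, ..., June -> 11.5
    offset = (month - 7) % 12 + 0.5
    return mye_before + (mye_after - mye_before) * offset / 12


# Hypothetical subgroup growing from 1,000 to 1,120 people over the year.
print(mid_month_population(1000, 1120, 7))   # mid-July
print(mid_month_population(1000, 1120, 1))   # mid-January
```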
Where mid-year estimates are not available (2024 onwards at the time of publication) ONS 2022-based population projections are used. The projections are only available at England level, so have been proportioned out to subgroups using proportions from the 2023 mid-year estimates.
At the time of publication, rebased mid-year population estimates are not available below local authority level. In order to estimate deprivation quintile populations, the rebased local authority estimates are proportioned out using proportions from the 2021 mid-year estimates.
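The proportioning used in both cases above (projections to subgroups, and local authority estimates to deprivation quintiles) amounts to sharing a total according to proportions taken from a reference year. A minimal sketch, with a hypothetical function name and made-up figures; in practice the apportionment would presumably be applied within each age and sex group, which this sketch does not attempt.

```python
def apportion(total, reference_counts):
    """Split `total` across subgroups in proportion to `reference_counts`.

    reference_counts maps each subgroup label to its population in the
    reference year (for example a mid-year estimate).
    """
    reference_total = sum(reference_counts.values())
    return {
        group: total * count / reference_total
        for group, count in reference_counts.items()
    }


# Hypothetical figures: a total of 1,100 shared across three subgroups.
shares = apportion(1100, {"A": 300, "B": 200, "C": 500})
print(shares)
```

The apportioned values always sum to the original total, so subgroup denominators stay consistent with the higher-level estimate.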
Cause of death analyses use whole population denominators.
2.2.2 Baseline model
Model outcome
The primary model provides estimates of expected deaths by month of registration at national and subnational level, and for subgroups of the population (age group, sex, deprivation group, region and UTLA). Similar models are subsequently run to provide estimates by cause of death.
Data structure and covariates
In line with the ‘rising activity, multi-level mixed effects, indicator emphasis’ (RAMMIE) model [2], independent variables include month of year, allowing for seasonal effects. Covariates were included, allowing for the effect of age, sex, deprivation, and geographical area. Age is grouped in the model into broad age bands for younger age groups (0 to 24, 25 to 49 and 50 to 64) and 5-year age bands for older age groups (65 to 69 through to 85 to 89 and 90 and over). Younger age groups cannot reasonably be modelled in 5-year age bands as the numbers of deaths within the socio-demographic subgroups are small, so the models take an unreasonable amount of processing time to converge and give unreliable estimates. The model standardises (indirectly) for age using the age groups specified.
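The age grouping used in the model can be expressed as a simple mapping from single year of age to band. The function name and label format below are illustrative only.

```python
def model_age_band(age):
    """Map age in completed years to the age band used in the model."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 24:
        return "0 to 24"
    if age <= 49:
        return "25 to 49"
    if age <= 64:
        return "50 to 64"
    if age >= 90:
        return "90 and over"
    lower = age - age % 5          # 65, 70, 75, 80 or 85
    return f"{lower} to {lower + 4}"


print(model_age_band(67), "|", model_age_band(89), "|", model_age_band(93))
```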
A linear trend was also included in the model to take into account systematic changes in the rate of death that are not reflected in the changing age structure of the population. The trend was constructed by assuming a constant daily rate of change throughout the baseline period.
Data are presented by:
age group (0 to 24, 25 to 49, 50 to 64, 65 to 74, 75 to 84 and 85 and over) derived from age at the time of death
sex (male or female) based on sex reported in the death record
region and UTLA based on April 2023 UTLA boundaries [3]
deprivation quintile based on lower layer super-output area (LSOA) of residence: LSOAs are grouped into national deprivation quintiles using the 2019 Index of Multiple Deprivation (IMD) [4]
Models are also run for separate underlying causes of death (listed in the Appendix). The models are identical to the main model described above, with deaths filtered by underlying cause. The data are then presented by region.
The structure of the models used is hierarchical with denominators and counts of death each being fully disaggregated by age group, sex, geographic area, and deprivation.
Statistical modelling
Quasi-Poisson regression models are fitted on the logarithmic scale [5]. Quasi-Poisson models are used because when counts of deaths are independent of one another they theoretically follow a Poisson distribution. This has the characteristic property that as its mean (the expected number of deaths) increases, the variability of the observed count of deaths (its variance) rises in parallel such that the variance always equals the mean.
However, deaths are not completely independent. For example, an epidemic such as a severe flu season results in unusually high rates of death for a period, which, if not accounted for, would carry an inappropriate amount of weight in the baseline. In these circumstances, the variance increases faster than the mean. This is referred to as overdispersion. Because Quasi-Poisson models allow the linear relationship between variance and mean to have a slope other than 1, they are more suitable for analysis of death rates when overdispersion exists.
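The slope of the variance-to-mean relationship is usually called the dispersion parameter. The text does not say how it is estimated here; a common estimator, sketched below as an assumption, is the Pearson chi-squared statistic divided by the residual degrees of freedom. Values well above 1 indicate overdispersion. The function name and data are hypothetical.

```python
def pearson_dispersion(observed, fitted, n_parameters):
    """Estimate the dispersion parameter from observed and fitted counts.

    observed:     observed death counts
    fitted:       model-fitted (expected) counts for the same cells
    n_parameters: number of parameters estimated by the model
    """
    residual_df = len(observed) - n_parameters
    chi_squared = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_squared / residual_df


# Hypothetical data: four counts fitted by a one-parameter (mean-only) model.
phi = pearson_dispersion([80, 120, 110, 90], [100, 100, 100, 100], n_parameters=1)
print(round(phi, 2))
```

In this made-up example the estimate is about 3.3, meaning the counts vary roughly three times as much as a Poisson distribution with the same mean would predict.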
The models contain the set of covariates outlined in the ‘Data structure and covariates’ section above. To allow for effects to vary between groups, interaction terms are included for trend and age, trend and deprivation, age and sex, age and deprivation and age and month of year. When modelling deaths it is fundamental to include a denominator representing the scale of exposure: the exposure includes the size of the population and the number of working days in the month (to take account of the fact that registrations only occur on working days). The exposure is specified in the model as an offset: person-working-days.
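In symbols, a model of this kind can be written as below. The notation is introduced here for illustration and is not taken from the source: D is the number of deaths in subgroup g in month t, P is the corresponding population, W is the number of working days in the month, x is the vector of covariates and interaction terms listed above, and φ is the dispersion parameter (φ = 1 would give an ordinary Poisson model).

```latex
\log \mathbb{E}\left[D_{g,t}\right]
  = \log\left(P_{g,t}\, W_{t}\right) + \alpha + \mathbf{x}_{g,t}^{\top}\boldsymbol{\beta},
\qquad
\operatorname{Var}\left(D_{g,t}\right) = \phi\, \mathbb{E}\left[D_{g,t}\right]
```

Because the offset term has a fixed coefficient of 1, the remaining terms describe the log of the death rate per person-working-day rather than the log of the count.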
The model generates expected death rates for each population subgroup for the month specified, which are then applied to relevant population estimates to estimate the expected numbers of deaths for each month in each subgroup. The presence of trend in the model (and the interactions of trend with age and deprivation) means that the expected deaths assume that the trend through the baseline period would continue through to the current month.
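Converting a modelled rate into an expected count can be sketched as follows, assuming the model's linear predictor is on the log scale and the rate is expressed per person-working-day, consistent with the offset described above. The function name and figures are hypothetical.

```python
import math


def expected_deaths(linear_predictor, population, working_days):
    """Expected deaths for one subgroup in one month.

    exp(linear_predictor) is the modelled death rate per person-working-day;
    multiplying by the exposure (population x working days) gives a count.
    """
    rate = math.exp(linear_predictor)
    return rate * population * working_days


# Hypothetical subgroup: rate of 2 per 100,000 person-working-days,
# population 50,000, and 21 working days in the month.
print(expected_deaths(math.log(2e-5), 50_000, 21))
```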
The models are run using the generalised linear modelling function in the statistical package R [6].
2.3 Observed deaths
Results of the analyses are presented on a monthly basis. ONS provide a daily feed of registered deaths data, which are provisional and subject to change. For each monthly publication, the previous two months will also be updated using the latest version of the data, resulting in small changes reflecting improvements in cause of death coding or the addition of registrations not previously received. For example, in February, the models and outputs for November and December will be re-run, in addition to the first publication of data for January.
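The revision cycle can be illustrated with a small helper that lists the data months covered by a given release. It assumes, as in the example above, that each release is published in the month after the latest data month; the function name is hypothetical.

```python
def months_in_release(pub_year, pub_month, n_revised=2):
    """Return the (year, month) data months covered by a release, oldest first.

    The last entry is published for the first time; the preceding
    `n_revised` months are re-run with the latest data.
    """
    latest = pub_year * 12 + (pub_month - 1) - 1   # month before publication
    return [(i // 12, i % 12 + 1) for i in range(latest - n_revised, latest + 1)]


# A release published in February 2025 (year chosen for illustration).
print(months_in_release(2025, 2))
```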
2.4 Calculation of excess deaths
Monthly excess mortality is calculated by taking the observed number of deaths registered in a month and subtracting the expected registered deaths for that month.
Cumulative excess mortality is estimated by summing the monthly excess deaths over the period selected. Excess death ratios, cumulative and monthly, are calculated by dividing observed deaths by expected deaths. This is the same calculation as for a standardised mortality ratio, with the reference rates generated from the model.
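These calculations can be sketched as follows. The function name and figures are hypothetical, and the cumulative ratio is taken here to be total observed deaths divided by total expected deaths over the period, which is one reading of the description above.

```python
def excess_summary(observed, expected):
    """Monthly and cumulative excess deaths and ratios.

    observed, expected: sequences of monthly deaths covering the same months.
    """
    monthly_excess = [o - e for o, e in zip(observed, expected)]
    monthly_ratio = [o / e for o, e in zip(observed, expected)]
    return {
        "monthly_excess": monthly_excess,
        "monthly_ratio": monthly_ratio,
        "cumulative_excess": sum(monthly_excess),
        "cumulative_ratio": sum(observed) / sum(expected),
    }


# Hypothetical three-month period.
summary = excess_summary([1050, 980, 1100], [1000.0, 1000.0, 1000.0])
print(summary["monthly_excess"], summary["cumulative_excess"])
```

A negative monthly excess, as in the second month of this example, simply means fewer deaths were registered than the model expected.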
2.5 Calculation of directly age-standardised rates
The DSRs, with 95% confidence intervals, are calculated using the standard OHID methodology. They are presented as rates per 100,000 person-years, so that the rates for all months are directly comparable with one another.
Rates are presented by age group (the same age groups as the excess deaths data), sex, deprivation quintile, region, upper tier local authority and underlying cause of death.
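For readers unfamiliar with direct standardisation, the point estimate is a weighted average of age-specific rates, with weights taken from a standard population. The sketch below uses made-up weights and a hypothetical function name; it does not reproduce the OHID standard population or the confidence interval calculation.

```python
def directly_standardised_rate(deaths, person_years, standard_population, per=100_000):
    """Directly age-standardised rate per `per` person-years.

    deaths, person_years, standard_population: sequences with one entry
    per age group, in the same order.
    """
    total_weight = sum(standard_population)
    weighted_rates = sum(
        weight * d / n
        for d, n, weight in zip(deaths, person_years, standard_population)
    )
    return per * weighted_rates / total_weight


# Hypothetical two-age-group example with illustrative weights 3 and 1.
dsr = directly_standardised_rate([2, 10], [1000, 500], [3, 1])
print(round(dsr, 1))
```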
Numerator data
The deaths data are as described in section 2.2.1 above, aggregated within each of the breakdowns listed above, by calendar month (based on the date of registration), additionally broken down by 5-year age group to facilitate the standardisation calculation.
Denominator data
Monthly population estimates are derived from ONS mid-year estimates of population, exactly as described in section 2.2.1 above.
Deaths are almost exclusively registered on working days, and the number of working days in each calendar month varies depending on how weekends and bank holidays fall. Hence the monthly denominators are adjusted to ensure monthly rates are comparable: the population estimate for each month is divided by the total number of working days in the calendar year, and multiplied by the number of working days in that month. Essentially the denominator is the number of person-working-days, with the results scaled for presentation as per 100,000 person-years.
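The denominator adjustment can be sketched as below. Working days are taken to be Monday to Friday excluding any bank holidays supplied by the caller; no bank holiday calendar is built in, and the function names are hypothetical.

```python
import calendar
from datetime import date


def working_days(year, month, bank_holidays=frozenset()):
    """Count weekdays (Monday to Friday) in a month, excluding bank holidays."""
    days_in_month = calendar.monthrange(year, month)[1]
    count = 0
    for day in range(1, days_in_month + 1):
        current = date(year, month, day)
        if current.weekday() < 5 and current not in bank_holidays:
            count += 1
    return count


def person_years_denominator(population, year, month, bank_holidays=frozenset()):
    """Scale a monthly population by that month's share of the year's working days."""
    in_month = working_days(year, month, bank_holidays)
    in_year = sum(working_days(year, m, bank_holidays) for m in range(1, 13))
    return population * in_month / in_year


# February 2023 with no bank holidays supplied, for a hypothetical population.
print(working_days(2023, 2), person_years_denominator(260_000, 2023, 2))
```

With a constant population, the twelve monthly denominators sum to that population, so each monthly rate is expressed on a per person-year basis.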
3 Limitations
The reports published on excess mortality related to the pandemic included ethnicity in the model and reports. Permission to link deaths data to Hospital Episode Statistics (HES) data and other sources was only granted for work relating to the management of the COVID-19 pandemic. Hence, we are unable to include ethnicity in the model or outputs for ongoing monitoring.
Deaths that tend to involve a coroner’s inquest, including a large proportion of those in people under 65, will be subject to delays in reporting, as it may take months for an inquest to take place and for the death then to be registered. The times between death and registration have gradually increased over the last 20 years, and additional disruption to the coroner service during the pandemic means that it is particularly important to interpret excess deaths in younger people with caution. Furthermore, increasing numbers of deaths are being registered with delayed coding of cause of death: for this reason, the cause of death analyses will be provisional for the first two months of publication.
Deprivation is attributed ecologically based on the LSOA of residence at time of death. Any individual living in an area may not be representative of the area as a whole. In particular, for care home residents, the deprivation level of the location of the care home may not reflect the level of deprivation they experienced prior to entering the home.
The baseline is modelled using five years of historical data. These data include years of relatively high mortality and relatively low mortality. Although trends are detected over this period, and are used to inform the expected deaths, they are not necessarily stable trends; the prediction intervals reflect the uncertainty around prediction of any one year (where available). Because the trends from the baseline period are extrapolated to the current reporting month, in groups where the death rate has been increasing through the baseline period, the expected deaths will continue this trend. If death rates have risen since the baseline period, but less than expected, this will result in negative excess deaths. Negative excess deaths do not necessarily indicate a downward trend in rates.
Excess deaths ratios, like standardised mortality ratios (SMRs), enable comparison between groups, but because they are indirectly standardised, the comparison is not precise. It is however very unlikely to be misleading.
Directly standardised rates for individual months are inevitably based on quite small numbers of deaths for some breakdowns. For the 0 to 24 age group within regions the number of deaths is frequently fewer than 10, and DSRs cannot reliably be calculated, so they are omitted. For some other breakdowns the DSRs are valid, but confidence intervals are wide and the figures should be interpreted with caution.
4 Comparison with other measures
Excess deaths measures are published by other public bodies, for varying purposes. For an overview of the different measures available see Measuring excess mortality: a guide to the main reports.
Appendix: Cause of death ICD-10 reference codes
Cause description | ICD-10 definition |
---|---|
Cancer | C00 to C97 |
Dementia and Alzheimer’s | F01, F03 and G30 |
All circulatory diseases | I00 to I99 |
Ischaemic heart diseases | I20 to I25 |
Cerebrovascular diseases | I60 to I69 |
Influenza and pneumonia | J09 to J18 |
Chronic lower respiratory diseases | J40 to J47 |
Cirrhosis and other liver diseases | K70 to K76 |
References
[1] World Health Organization. ICD-10: international statistical classification of diseases and related health problems. Tenth revision, 2nd ed. World Health Organization. 2004. [Cited: 13 February 2024].
[2] Morbey RA, Elliot AJ, Charlett A, Verlander NQ, Andrews N, Smith GE. The application of a novel ‘rising activity, multi-level mixed effects, indicator emphasis’ (RAMMIE) method for syndromic surveillance in England. Bioinformatics. Volume 31, Issue 22, 15 November 2015, pages 3660 to 3665.
[3] Office for National Statistics. Local Authority District to Region Lookup (May 2023) in England. ONS geography open data. [Cited: 13 February 2024].
[4] Ministry of Housing, Communities & Local Government. English indices of deprivation 2019 (IoD2019). Statistical Release. 26 September 2019. [Cited: 13 February 2024].
[5] Gardner W, Mulvey EP, Shaw EC. Regression analyses of counts and rates: Poisson, overdispersed Poisson, and negative binomial models. Psychological Bulletin. 1995, 118(3), pages 392 to 404.
[6] R Core Team. The R Project for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. 2024. [Cited: 13 February 2024].
© Crown copyright 2024
Office for Health Improvement and Disparities
This publication is licensed under the terms of the Open Government Licence v3.0 except where otherwise stated. To view this licence, visit nationalarchives.gov.uk/doc/open-government-licence/version/3.
Where we have identified any third-party copyright information you will need to obtain permission from the copyright holders concerned.