Proportions
An introduction to basic statistics and an introduction to confidence intervals are available elsewhere in this guidance.
Calculation of proportions
The proportion \(p\) is given by:
\[ p = \frac{O}{n} \]
where:
\(O\) is the observed number of individuals in the sample or population who have the specified characteristic
\(n\) is the total number of individuals in the sample or population
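As a minimal sketch of the calculation above, the following Python function divides the observed count by the total. The function name `proportion` and the example counts (45 out of 300) are hypothetical and chosen only for illustration; they are not taken from any OHID tool.

```python
def proportion(observed: int, total: int) -> float:
    """Return the proportion p = O / n.

    observed: number of individuals with the specified characteristic (O)
    total: number of individuals in the sample or population (n)
    """
    if total <= 0:
        raise ValueError("total must be a positive count")
    if not 0 <= observed <= total:
        raise ValueError("observed must lie between 0 and total")
    return observed / total


# Illustrative (made-up) counts: 45 of 300 individuals have the characteristic.
p = proportion(45, 300)
print(f"p = {p:.3f}")  # p = 0.150
```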
Calculation of confidence intervals for proportions
Confidence intervals for proportions are determined using the binomial distribution. A normal approximation method is often presented in statistical textbooks but does not perform well when the numerator or denominator is small. The preferred OHID method is the Wilson Score method (Wilson, 1927), which has been evaluated and recommended by Newcombe and Altman (Newcombe, 1998; Newcombe and Altman, 2000). It can be used with any data values and, unlike some methods, it still gives an interval when the numerator count, and therefore the proportion, is zero (Agresti and Coull, 1998).
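To illustrate why the normal approximation can be unsuitable for small counts, the sketch below implements the usual textbook form of that approximation, \(p \pm z\sqrt{p(1 - p)/n}\). This formula is not given in this guidance and is included here only for comparison; the function name `wald_interval` and the example counts are hypothetical.

```python
import math


def wald_interval(observed: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Textbook normal approximation interval for a proportion (for comparison only)."""
    p = observed / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p - half_width, p + half_width


# With a zero numerator the interval collapses to a single point at zero.
print(wald_interval(0, 50))

# With a very small numerator the lower limit falls below zero,
# which is not a valid value for a proportion.
print(wald_interval(1, 50))
```

In both cases the result is of little practical use, which is the behaviour the Wilson Score method avoids.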
Using the Wilson Score method, the \(100(1 - \alpha)\)% confidence limits for the proportion \(p\) are given by:
\[ p _{lower} = \frac{2O + z^2 - z\sqrt{z^2 + 4Oq}}{2(n + z^2)} \]
\[ p _{upper} = \frac{2O + z^2 + z\sqrt{z^2 + 4Oq}}{2(n + z^2)} \]
where:
\(O\) is the observed number of individuals in the sample or population who have the specified characteristic
\(n\) is the total number of individuals in the sample or population
\(q\) is the proportion without the specified characteristic (\(1 - p\))
\(z\) is the \(100\left(1 - \frac{\alpha}{2}\right)\)th percentile value from the standard normal distribution
For example, for a 95% confidence interval, \(\alpha = 0.05\) and \(z \approx 1.96\) (the 97.5th percentile value from the standard normal distribution).
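The Wilson Score formulae above can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the code used in the OHID tools: the function name `wilson_interval`, its parameters and the example counts are hypothetical, and the final clamping of the limits to the range 0 to 1 is an addition to guard against floating-point rounding rather than part of the published formulae.

```python
import math
from statistics import NormalDist


def wilson_interval(observed: int, total: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson Score confidence limits for a proportion.

    observed: number with the specified characteristic (O)
    total: number in the sample or population (n)
    confidence: confidence level as a fraction, for example 0.95 for 95%
    """
    if total <= 0 or not 0 <= observed <= total:
        raise ValueError("require total > 0 and 0 <= observed <= total")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")

    alpha = 1 - confidence
    # z is the 100(1 - alpha/2)th percentile of the standard normal distribution
    z = NormalDist().inv_cdf(1 - alpha / 2)

    p = observed / total
    q = 1 - p  # proportion without the specified characteristic

    root = z * math.sqrt(z**2 + 4 * observed * q)
    denominator = 2 * (total + z**2)

    lower = (2 * observed + z**2 - root) / denominator
    upper = (2 * observed + z**2 + root) / denominator

    # Assumption: clamp to [0, 1] so rounding error cannot push a limit outside the valid range.
    return max(0.0, lower), min(1.0, upper)


# Illustrative (made-up) counts: 45 of 300, 95% confidence.
lower, upper = wilson_interval(45, 300)
print(f"{lower:.3f} to {upper:.3f}")  # roughly 0.114 to 0.195

# A zero numerator still yields a usable interval with a lower limit of zero.
print(wilson_interval(0, 50))
```

Note that the resulting interval is not symmetric about \(p\), which is expected for this method.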
Tools
The following tools are available to calculate proportions:
The PHEindicatormethods R package includes the function phe_proportion()
The PHStatsMethods Python package includes the function ph_proportion()
The Excel tool for common public health statistics and their confidence intervals (xlsx)
Page last updated: August 2024