Help: Sample Size for Estimating a Single Proportion

Overview

Proportion estimation is one of the most common sample size problems. It arises whenever you need to estimate the percentage of a population with a certain characteristic, for instance, the prevalence of a disease, the proportion of customers satisfied with a service, or the percentage of voters supporting a candidate.

This calculator determines the minimum number of subjects needed to estimate a proportion with a specified level of precision (margin of error) at a given confidence level. The required sample size depends on the expected proportion, the desired precision, and the confidence level.

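For a fixed precision and confidence level, the required sample size is largest when the expected proportion is 0.5, because the variance term p(1 − p) peaks there (maximum uncertainty). As the proportion moves towards 0 or 1, the required sample size decreases.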
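The effect of the expected proportion can be seen with a minimal sketch of the base formula. The function name `sample_size` and the fixed values z = 1.96 and d = 0.05 are illustrative assumptions, not part of the calculator itself.

```python
import math

def sample_size(p, d, z=1.96):
    """Base formula n = z^2 * p * (1 - p) / d^2, rounded up to a whole subject."""
    return math.ceil(z**2 * p * (1 - p) / d**2)

# Sweep the expected proportion at a fixed precision of +/-0.05.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}: n = {sample_size(p, 0.05)}")
```

The sweep is symmetric around p = 0.5, where it reaches its largest value.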

Worked Example

Scenario: Vaccination Coverage Survey

A public health department wants to estimate the proportion of children aged 12–23 months who have received all recommended vaccinations in a district. Based on previous surveys, the expected coverage is approximately p = 0.70 (70%). They want the estimate to be within ±5 percentage points (absolute precision = 0.05) with 95% confidence.

Using Statulator step-by-step:

1. Open the Sample Size Calculator for Estimating a Single Proportion.

2. Set Confidence Level to 95%.

3. Enter the Expected Proportion as 0.70.

4. Enter the Precision as 0.05.

5. The calculator shows the required sample size: n = 323.

Hand calculation verification:

\[ n = \frac{z_{\alpha/2}^{2} \cdot p(1 - p)}{d^{2}} = \frac{(1.96)^{2} \times 0.70 \times 0.30}{(0.05)^{2}} = \frac{3.8416 \times 0.21}{0.0025} = \frac{0.8067}{0.0025} = 322.7 \]

Rounding up: n = 323.
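The same check can be scripted. The sketch below derives the critical value from the confidence level using Python's standard-library `statistics.NormalDist` instead of the rounded 1.96; the function name `sample_size_proportion` is an assumption made for illustration.

```python
import math
from statistics import NormalDist

def sample_size_proportion(p, d, confidence=0.95):
    """Sample size for estimating a proportion p within +/-d (absolute)."""
    alpha = 1 - confidence
    # Two-sided critical value z_{alpha/2} of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n_raw = z**2 * p * (1 - p) / d**2
    return math.ceil(n_raw)  # round up to a whole subject

print(sample_size_proportion(p=0.70, d=0.05, confidence=0.95))  # 323
```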

Adjustments

Finite population correction: If the district has only N = 2,000 children in the target age group:

\[ n_{\text{adj}} = \frac{n}{1 + \frac{n - 1}{N}} = \frac{323}{1 + \frac{322}{2000}} = \frac{323}{1.161} \approx 278 \]

Response rate: If an 80% response rate is anticipated, the sample size should be inflated: 323 / 0.80 = 404 (before FPC), or 278 / 0.80 = 348 (after FPC).
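Both adjustments can be reproduced with a short sketch. The helper names `fpc` and `inflate_for_response` are hypothetical. One choice to note: the code carries the unrounded value (about 322.69) through each step and rounds up only at the end, which here gives the same figures as the text.

```python
import math

def fpc(n, N):
    """Finite population correction: n / (1 + (n - 1) / N)."""
    return n / (1 + (n - 1) / N)

def inflate_for_response(n, R):
    """Inflate the sample size for an anticipated response rate R (0 < R <= 1)."""
    return n / R

n_raw = 1.96**2 * 0.70 * 0.30 / 0.05**2   # about 322.69, before rounding
n_fpc = fpc(n_raw, N=2000)                # about 277.98

print(math.ceil(n_raw), math.ceil(n_fpc))                  # 323 278
print(math.ceil(inflate_for_response(n_raw, 0.80)),        # 404
      math.ceil(inflate_for_response(n_fpc, 0.80)))        # 348
```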

Conservative approach

If the expected proportion is completely unknown, use p = 0.50 for the most conservative (largest) sample size:

\[ n = \frac{(1.96)^{2} \times 0.50 \times 0.50}{(0.05)^{2}} = \frac{0.9604}{0.0025} = 384.16 \]

Rounding up: n = 385.

Interpretation Guide

Output	Interpretation
Required Sample Size (n)	The number of subjects to sample. If the study has a binary outcome (e.g., vaccinated or not), each subject contributes one observation.
Live Interpretation	A sentence summarising the result, for example: “A sample of 323 subjects is required to estimate the proportion within ±5% with 95% confidence, assuming an expected proportion of 70%.”
Visualisation	Plots the required sample size across a range of expected proportions for several precision values. This is especially useful when the expected proportion is uncertain.
Tabulate	A table showing sample sizes for combinations of expected proportion and precision values, useful for study protocols.

Practical tip: The sample size is most sensitive to changes in precision, because n is proportional to 1/d². Doubling the margin of error (e.g., from ±5 to ±10 percentage points) reduces the required sample size by a factor of 4; halving it quadruples the sample size. Consider whether a slightly wider margin of error would be acceptable to reduce costs.
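The inverse-square relationship between sample size and margin of error can be checked numerically. This is a minimal sketch assuming z = 1.96 and p = 0.70 (the worked-example values); `raw_sample_size` is an illustrative name.

```python
import math

def raw_sample_size(p, d, z=1.96):
    """Unrounded n = z^2 * p * (1 - p) / d^2."""
    return z**2 * p * (1 - p) / d**2

p = 0.70
for d in (0.03, 0.05, 0.10):
    print(f"d = {d:.2f}: n = {math.ceil(raw_sample_size(p, d))}")

# Widening the margin from 0.05 to 0.10 divides the unrounded n by 4.
ratio = raw_sample_size(p, 0.05) / raw_sample_size(p, 0.10)
print(f"ratio = {ratio:.2f}")
```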

Formula

Base Formula (Absolute Precision)

\[ n = \frac{z_{\alpha/2}^{2} \cdot p(1 - p)}{d^{2}} \]

where:

\( n \) = required sample size
\( z_{\alpha/2} \) = critical value of the standard normal distribution for confidence level \( (1 - \alpha) \)
\( p \) = expected (anticipated) proportion
\( d \) = desired absolute precision (margin of error, on the proportion scale)

Relative Precision

When precision is expressed as a fraction of the expected proportion (e.g., “within 10% of the true proportion”), the formula becomes:

\[ n = \frac{z_{\alpha/2}^{2} \cdot (1 - p)}{p \cdot \epsilon^{2}} \]

where \( \epsilon \) is the relative precision (e.g., 0.10 for 10% relative margin).
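The relative-precision formula follows from substituting d = ε · p into the base formula. The sketch below checks that equivalence numerically; the function names and the example values (p = 0.70, ε = 0.10, z = 1.96) are assumptions chosen for illustration.

```python
import math

def n_relative(p, eps, z=1.96):
    """Unrounded sample size for relative precision eps."""
    return z**2 * (1 - p) / (p * eps**2)

def n_absolute(p, d, z=1.96):
    """Unrounded sample size for absolute precision d."""
    return z**2 * p * (1 - p) / d**2

p, eps = 0.70, 0.10
# A relative margin of 10% at p = 0.70 is an absolute margin of 0.07.
assert math.isclose(n_relative(p, eps), n_absolute(p, eps * p))
print(math.ceil(n_relative(p, eps)))  # 165
```

Because p appears in the denominator, the required sample size grows quickly for rare characteristics when precision is specified in relative terms.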

Finite Population Correction

\[ n_{\text{adj}} = \frac{n}{1 + \dfrac{n - 1}{N}} \]

Cluster Sampling Adjustment

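\[ n_{\text{cluster}} = n \times [1 + (m - 1) \cdot \rho] \]

where \( m \) is the average number of subjects sampled per cluster and \( \rho \) is the intracluster correlation coefficient. The bracketed term is the design effect.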
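As an illustration of the cluster adjustment, the sketch below reads m as the average number of subjects sampled per cluster and ρ as the intracluster correlation coefficient (the usual design-effect interpretation). The values m = 11 and ρ = 0.05 are hypothetical and chosen only so that the design effect comes out at 1.5.

```python
import math

def design_effect(m, rho):
    """Design effect 1 + (m - 1) * rho for average cluster size m."""
    return 1 + (m - 1) * rho

def n_cluster(n, m, rho):
    """Inflate a simple-random-sampling size n for cluster sampling, rounded up."""
    return math.ceil(n * design_effect(m, rho))

n_srs = 323                                # worked-example sample size
print(design_effect(11, 0.05))             # 1.5
print(n_cluster(n_srs, 11, 0.05))          # 485
# Number of clusters needed at 11 subjects per cluster.
print(math.ceil(n_cluster(n_srs, 11, 0.05) / 11))  # 45
```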

Response Rate Adjustment

\[ n_{\text{final}} = \frac{n_{\text{adj}}}{R} \]

where \( R \) is the anticipated response rate (e.g., 0.80 for 80%).

Assumptions & Requirements

Simple random sampling: The formula assumes each individual in the population has an equal chance of being selected. Cluster or stratified designs require adjustments.
Binary outcome: Each observation falls into one of two categories (e.g., yes/no, present/absent).
Normal approximation: The formula uses the normal approximation to the binomial distribution. This is generally adequate when \( n \cdot p \geq 5 \) and \( n \cdot (1 - p) \geq 5 \).
Known expected proportion: An estimate of the population proportion must be available. If completely unknown, use p = 0.50 for the most conservative estimate.
Independent observations: Each observation must be independent of others. If observations are clustered (e.g., within households), apply the cluster correction.
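The normal-approximation rule of thumb above is easy to check once a sample size has been computed. A minimal sketch, with `normal_approx_ok` as an illustrative helper name:

```python
def normal_approx_ok(n, p, threshold=5):
    """Check the rule of thumb n*p >= 5 and n*(1 - p) >= 5."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(normal_approx_ok(323, 0.70))  # True: about 226 and 97 expected counts
print(normal_approx_ok(100, 0.03))  # False: only 3 expected "yes" outcomes
```

When the check fails, the base formula may be unreliable and a method that does not depend on the normal approximation is worth considering.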

Textbook Examples

Medicine

A hospital wants to estimate the prevalence of hypertension among adults in a rural district.

Inputs: Expected prevalence p = 0.30, confidence level = 95%, margin of error = 5%, population = 12,000.
Result: n = 315 (base n = 323; 323 / (1 + 322/12,000) ≈ 314.6 after finite-population correction, rounded up).
Interpretation: Surveying 315 adults will estimate the true hypertension prevalence within ±5 percentage points with 95% confidence.

Education

A university surveys students to estimate the proportion who use the online tutoring platform.

Inputs: Expected proportion p = 0.50 (unknown, conservative), confidence level = 95%, margin of error = 4%.
Result: n = 601.
Interpretation: With no prior estimate, the maximum-variance assumption (p = 0.50) yields the largest necessary sample. Surveying 601 students ensures the estimate is within ±4 pp.

Engineering

A quality control team needs to estimate the defect percentage on a production line producing 5,000 units per month.

Inputs: Expected defect percentage p = 0.03, confidence level = 99%, margin of error = 2%, population = 5,000.
Result: n = 441 (base n = 483 with z = 2.576; 483 / (1 + 482/5,000) ≈ 440.5 after finite-population correction, rounded up).
Interpretation: Inspecting 441 units will estimate the true defect percentage within ±2 pp with 99% confidence.

Social Science

A polling firm estimates the proportion of registered voters supporting a policy initiative.

Inputs: Expected support p = 0.45, confidence level = 95%, margin of error = 3%.
Result: n = 1,057 (1.96² × 0.45 × 0.55 / 0.03² ≈ 1,056.4, rounded up).
Interpretation: A random sample of 1,057 voters will produce an estimate within ±3 pp with 95% confidence. This is a typical sample size for national opinion polls.

Agriculture

An extension agency estimates the proportion of farms using organic methods in a district with 800 farms.

Inputs: Expected proportion p = 0.15, confidence level = 95%, margin of error = 5%, population = 800, design effect = 1.5 (cluster sampling).
Result: n = 237 (base n = 196; 196 / (1 + 195/800) ≈ 157.6 after FPC; × 1.5 design effect ≈ 236.4, rounded up).
Interpretation: Visiting 237 farms accounts for the clustering within villages and estimates the organic-farming percentage within ±5 pp with 95% confidence.

References

Cochran, W. G. (1977). Sampling Techniques (3rd ed.). John Wiley & Sons. (Chapter 3: estimation of proportions and percentages.)
Lwanga, S. K., & Lemeshow, S. (1991). Sample Size Determination in Health Studies: A Practical Manual. World Health Organization.
Naing, L., Winn, T., & Rusli, B. N. (2006). Practical issues in calculating the sample size for prevalence studies. Archives of Orofacial Sciences, 1, 9–14.
Daniel, W. W., & Cross, C. L. (2013). Biostatistics: A Foundation for Analysis in the Health Sciences (10th ed.). John Wiley & Sons.
Kish, L. (1965). Survey Sampling. John Wiley & Sons. (Design effect and cluster sampling correction.)