Sample Size for Estimating a Single Mean

A guide to calculating the minimum sample size needed to estimate a population mean with a desired level of precision.

Overview

When planning a study that aims to estimate the average value of a quantity in a population (e.g., mean blood pressure, average income, mean reaction time), you need to determine how many observations are required so that your estimate is close enough to the true population mean.

This calculator computes the minimum sample size needed to construct a confidence interval for a single mean with a specified level of precision (margin of error). The calculation depends on three key inputs: the desired confidence level, the expected variability in the population (standard deviation), and the acceptable precision.

A larger sample size is required when you want higher confidence, when the population is more variable, or when you need a narrower margin of error.

Worked Example

Scenario: Estimating Average Systolic Blood Pressure

A researcher wants to estimate the mean systolic blood pressure (SBP) of adults aged 40–60 in a community. Previous studies suggest the standard deviation of SBP in this age group is approximately σ = 18 mmHg. The researcher wants the estimate to be within ±3 mmHg of the true mean with 95% confidence.

Using Statulator step-by-step:

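1. Open the Sample Size Calculator for Estimating a Single Mean.

2. Set Confidence Level to 95% (the default).

3. Enter the Standard Deviation (σ) as 18.

4. Enter the Precision (margin of error) as 3.

5. The calculator instantly shows the required sample size. With Adjust for t-distribution deselected, the result is n = 139, which matches the hand calculation below; with the default t-distribution adjustment, the reported value is slightly larger.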

Hand calculation verification:

Using the formula with \( z_{0.025} = 1.96 \), \( \sigma = 18 \), and \( d = 3 \):

\[ n = \frac{z_{\alpha/2}^{2} \cdot \sigma^{2}}{d^{2}} = \frac{(1.96)^{2} \times (18)^{2}}{(3)^{2}} = \frac{3.8416 \times 324}{9} = \frac{1244.68}{9} \approx 138.3 \]

Rounding up: n = 139
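
The hand calculation can also be checked in a few lines of code. The following is a minimal Python sketch of the normal-based formula only; the function name `sample_size_mean_z` is an illustrative choice, not part of Statulator, and the critical value comes from the standard library rather than from the rounded 1.96.

```python
import math
from statistics import NormalDist


def sample_size_mean_z(sigma: float, d: float, confidence: float = 0.95) -> int:
    """Normal-based sample size for estimating a mean to within +/- d."""
    if sigma <= 0 or d <= 0 or not 0 < confidence < 1:
        raise ValueError("sigma and d must be positive; confidence must be in (0, 1)")
    alpha = 1 - confidence
    # Two-sided critical value z_{alpha/2}; about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Always round up to the next whole observation.
    return math.ceil((z * sigma / d) ** 2)


n = sample_size_mean_z(sigma=18, d=3, confidence=0.95)
print(n)  # 139
```

Using the unrounded critical value (about 1.95996) instead of 1.96 changes the intermediate value only in the second decimal place, so the rounded-up result is the same.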

Why does Statulator report a slightly larger sample size? By default, Statulator uses the t-distribution rather than the z-distribution because the population standard deviation is rarely known exactly. The t-distribution is more accurate for this situation but yields a slightly larger sample size than the manual z-based calculation shown above. To reproduce the manual z-based number, open the Adjust panel and deselect Adjust for t-distribution. Keeping the t-distribution adjustment on (the default) is recommended.

Optional Adjustments

Finite population correction: If the target population is small (e.g., N = 500), click Adjustments and enter the population size. The corrected sample size will be smaller. For our example with N = 500:

\[ n_{\text{adj}} = \frac{n}{1 + \frac{n - 1}{N}} = \frac{139}{1 + \frac{138}{500}} = \frac{139}{1.276} \approx 109 \]

Clustering: If the study uses cluster sampling (e.g., households), you can adjust using an intra-cluster correlation coefficient (ICC) or a design effect (DEFF).

Response rate: If you expect not all sampled subjects to respond (e.g., 60% response rate in a survey), open Adjust and tick Adjust for Response Rate. Statulator will inflate the required sample size to compensate. For our example with an anticipated response rate of 0.60:

\[ n_{\text{adj}} = \frac{n}{r} = \frac{139}{0.60} \approx 232 \]

Note: with the default t-distribution adjustment also enabled, Statulator starts from the slightly larger t-based sample size (≈ 142) rather than the z-based 139 used in the formula above, so the calculator will report 236. Deselect Adjust for t-distribution in the Adjust panel to reproduce the 232 in the formula exactly.

Interpretation Guide

The calculator produces several outputs that help you understand your sample size requirement:

Output | Interpretation
Required Sample Size (n) | The minimum number of observations you need to collect. This is always rounded up to the next whole number because you cannot collect a fraction of an observation.
Live Interpretation | A plain-language sentence that summarises the result. For example: “A sample of 139 subjects is needed to estimate the mean within ±3 units with 95% confidence, assuming a standard deviation of 18.”
Visualisation tab | Shows how sample size changes as the standard deviation varies, for three different precision levels. Use this to understand sensitivity: if your σ estimate is uncertain, you can see the range of sample sizes you might need.
Tabulate tab | Generates a table of sample sizes across a grid of standard deviation and precision values, useful for protocol development and grant applications.

Practical tip: The standard deviation (σ) is often the most uncertain input. If prior data are limited, consider using the upper end of the plausible range for σ so that the study is large enough to achieve the target precision even if the variability turns out to be higher than expected. You can also use the Visualisation tab to assess how sensitive the sample size is to changes in σ.
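
A similar sensitivity check can be done offline. The sketch below builds a small grid of normal-based sample sizes over a few σ and precision values, in the spirit of the Tabulate tab. It is only an illustration: the function name `sample_size_grid` and the grid values are assumptions chosen around the worked example, and the t-distribution adjustment is not applied.

```python
import math
from statistics import NormalDist


def sample_size_grid(sigmas, precisions, confidence=0.95):
    """Return {(sigma, d): n} using the normal-based formula, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return {(s, d): math.ceil((z * s / d) ** 2) for s in sigmas for d in precisions}


# Illustrative values bracketing the worked example (sigma = 18, d = 3).
sigmas = [15, 18, 21]
precisions = [2, 3, 4]
grid = sample_size_grid(sigmas, precisions)

# Print one row per sigma and one column per precision.
print("sigma" + "".join(f"{'d=' + str(d):>8}" for d in precisions))
for s in sigmas:
    print(f"{s:>5}" + "".join(f"{grid[(s, d)]:>8}" for d in precisions))
```

Reading across a row shows how quickly the requirement falls as the margin of error widens; reading down a column shows the cost of underestimating σ.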

Formula

Base Formula
\[ n = \frac{z_{\alpha/2}^{2} \cdot \sigma^{2}}{d^{2}} \]

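where:

\( n \) is the required sample size
\( z_{\alpha/2} \) is the critical value of the standard normal distribution for the chosen confidence level (1.96 for 95% confidence)
\( \sigma \) is the expected population standard deviation
\( d \) is the desired precision (margin of error), in the same units as \( \sigma \)
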
The sample size is always rounded up to the nearest integer (\( \lceil n \rceil \)).

t-Distribution Adjustment

The base formula uses the normal distribution. When sample sizes are small, the t-distribution provides more accurate critical values but depends on degrees of freedom \( (\nu = n - 1) \), which in turn depends on n. Statulator uses an iterative approach:

\[ n_{t} = \frac{t_{\alpha/2,\, n-1}^{2} \cdot \sigma^{2}}{d^{2}} \]

Starting from the normal-based \( n \), the calculator substitutes \( t_{\alpha/2,\, n-1} \) and recomputes until the sample size converges.
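
The iteration can be sketched in Python as follows. This is an illustration under stated assumptions, not Statulator's actual code: the helper names are hypothetical, the t quantile is obtained by numerically integrating the t density with Simpson's rule and bisecting (in practice a statistics library such as SciPy's `scipy.stats.t.ppf` would normally be used), and the stopping rule is simply "stop when n no longer changes". The result may therefore differ slightly from the value the calculator reports.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """CDF for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p: float, df: float) -> float:
    """Quantile for 0.5 < p < 1, found by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sample_size_mean_t(sigma: float, d: float, confidence: float = 0.95,
                       max_iter: int = 50) -> int:
    """Iterate n = ceil((t_{alpha/2, n-1} * sigma / d)^2), starting from the z-based n."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n = max(2, math.ceil((z * sigma / d) ** 2))  # normal-based starting value
    for _ in range(max_iter):
        t = t_quantile(1 - alpha / 2, n - 1)
        n_new = max(2, math.ceil((t * sigma / d) ** 2))
        if n_new == n:
            return n
        # If the iteration flips between two neighbours, keep the larger one.
        n = max(n_new, n) if abs(n_new - n) == 1 else n_new
    return n


n_t = sample_size_mean_t(sigma=18, d=3, confidence=0.95)
print(n_t)
```

Because the t critical value is always larger than the corresponding z value, the iterated result is never smaller than the normal-based starting value.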

Finite Population Correction (FPC)

When the population size \( N \) is known and finite, the required sample size is reduced:

\[ n_{\text{adj}} = \frac{n}{1 + \dfrac{n - 1}{N}} \]

where \( N \) is the total population size. This correction is meaningful when \( n \) is more than about 5% of \( N \).
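
As a quick check of the correction, here is a minimal Python sketch; the function name `apply_fpc` is illustrative and not taken from Statulator.

```python
import math


def apply_fpc(n: int, population: int) -> int:
    """Finite population correction: n / (1 + (n - 1) / N), rounded up."""
    if n < 1 or population < 1:
        raise ValueError("n and population must be positive")
    return math.ceil(n / (1 + (n - 1) / population))


print(apply_fpc(139, 500))  # 109
```

For a very large population the denominator approaches 1 and the corrected value equals the uncorrected one, which is why the correction only matters when n is a sizeable fraction of N.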

Cluster Sampling Adjustment

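When observations are sampled in clusters (e.g., schools, clinics), observations within the same cluster tend to be correlated, so the effective sample size is smaller than the number of observations collected and the required sample size must be inflated. The adjustment uses either a design effect (DEFF) or the intra-cluster correlation coefficient (ICC):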

Using DEFF:

\[ n_{\text{cluster}} = n \times \text{DEFF} \]

Using ICC (\( \rho \)) and average cluster size (\( m \)):

\[ \text{DEFF} = 1 + (m - 1) \cdot \rho \]

\[ n_{\text{cluster}} = n \times [1 + (m - 1) \cdot \rho] \]

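
Both routes to the cluster adjustment can be sketched in Python. The function names are illustrative, and the cluster size (m = 20) and ICC (0.05) in the usage lines are hypothetical values chosen only to show the arithmetic.

```python
import math


def design_effect(m: float, icc: float) -> float:
    """DEFF = 1 + (m - 1) * rho for average cluster size m and ICC rho."""
    if m < 1 or not 0 <= icc <= 1:
        raise ValueError("m must be at least 1 and icc must be in [0, 1]")
    return 1 + (m - 1) * icc


def apply_clustering(n: int, deff: float) -> int:
    """Inflate n by the design effect and round up."""
    # round() guards against floating-point noise such as 110.00000000000001.
    return math.ceil(round(n * deff, 9))


deff = design_effect(m=20, icc=0.05)  # hypothetical inputs, DEFF is about 1.95
print(apply_clustering(139, deff))    # 272
```

Even a small ICC produces a substantial inflation when clusters are large, because the design effect grows linearly with the cluster size.
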
Response Rate Adjustment

When non-response is expected, the required sample size is inflated by dividing by the anticipated response rate \( r \) (a value between 0 and 1):

\[ n_{\text{adj}} = \frac{n}{r} \]

For example, if 139 respondents are needed and the anticipated response rate is 0.60, the study should approach \( \lceil 139 / 0.60 \rceil = 232 \) subjects.
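
The same inflation can be written as a short Python sketch; the function name `apply_response_rate` is illustrative only.

```python
import math


def apply_response_rate(n: int, r: float) -> int:
    """Number of subjects to approach so that about n respond, given response rate r."""
    if not 0 < r <= 1:
        raise ValueError("response rate must be in (0, 1]")
    # round() guards against floating-point noise before rounding up.
    return math.ceil(round(n / r, 9))


print(apply_response_rate(139, 0.60))  # 232
```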

Assumptions & Requirements

Textbook Examples

Medicine

A nutritionist estimates the mean daily sodium intake (mg) of adults in a coastal city.

Inputs: Expected SD = 480 mg, confidence level = 95%, desired precision (margin of error) = 50 mg.
Result: n = 355 (the base formula gives 354.04, which is rounded up).
Interpretation: Measuring sodium intake in 355 adults will estimate the population mean within ±50 mg with 95% confidence.

Engineering

A civil engineer estimates the mean compressive strength (MPa) of concrete cylinders from a batch.

Inputs: Expected SD = 3.5 MPa, confidence level = 99%, desired precision = 1 MPa, population = 500.
Result: n = 71 (after finite-population correction: the base formula gives 82, and 82 / (1 + 81/500) ≈ 70.6, which is rounded up).
Interpretation: Testing 71 cylinders from the batch of 500 will estimate mean strength within ±1 MPa with 99% confidence.

Education

A school board estimates the mean reading score of Year 4 students across the district.

Inputs: Expected SD = 12 points, confidence level = 95%, desired precision = 2 points, design effect = 1.8 (schools as clusters).
Result: n = 251 (after cluster correction: the base formula gives 139, and 139 × 1.8 = 250.2, which is rounded up).
Interpretation: Sampling 251 students (accounting for clustering within schools) will estimate the mean within ±2 points with 95% confidence.

Agriculture

An agronomist estimates the mean grain yield (kg/ha) across a region's wheat farms.

Inputs: Expected SD = 350 kg/ha, confidence level = 95%, desired precision = 40 kg/ha.
Result: n = 295.
Interpretation: Measuring yield on 295 farms will estimate the regional mean within ±40 kg/ha with 95% confidence.

Social Science

A labour economist estimates the mean hourly wage of gig-economy workers in a metropolitan area.

Inputs: Expected SD = $8.00, confidence level = 95%, desired precision = $1.00.
Result: n = 246.
Interpretation: Surveying 246 workers will estimate the mean hourly wage within ±$1.00 with 95% confidence.

References

  1. Cochran, W. G. (1977). Sampling Techniques (3rd ed.). John Wiley & Sons. Chapter 4: estimation of sample size for means.
  2. Daniel, W. W., & Cross, C. L. (2013). Biostatistics: A Foundation for Analysis in the Health Sciences (10th ed.). John Wiley & Sons. Section 7.4: determining sample size.
  3. Lwanga, S. K., & Lemeshow, S. (1991). Sample Size Determination in Health Studies: A Practical Manual. World Health Organization.
  4. Naing, L., Winn, T., & Rusli, B. N. (2006). Practical issues in calculating the sample size for prevalence studies. Archives of Orofacial Sciences, 1, 9–14.
  5. Kish, L. (1965). Survey Sampling. John Wiley & Sons. Design effect and cluster sampling correction.