A guide to determining the sample size required to detect a meaningful difference between two group means.
This calculator determines the number of subjects needed in each group to detect a specified difference between two independent population means with a given level of statistical power. It is used when planning a two-arm study such as a randomised controlled trial, a cohort study, or any comparison of two independent groups.
The calculation balances four factors: the significance level (α), the desired power (1 − β), the expected variability (σ), and the minimum clinically important difference (δ). Lowering α, raising the power, assuming a larger σ, or targeting a smaller δ each increases the required sample size.
The calculator supports three hypothesis frameworks: equality (standard two-sided test), non-inferiority/superiority (one-sided with a margin), and equivalence (two one-sided tests, TOST).
An education researcher plans to compare mean exam scores between students taught with a new interactive method and students taught with the traditional lecture method. Based on pilot data, the common standard deviation is estimated at σ = 12 points. The researcher considers a difference of 5 points educationally meaningful and wants 80% power at a 5% significance level (two-sided) with equal allocation (1:1 ratio).
1. Open the Sample Size Calculator for Comparing Two Independent Means.
2. Under Hypothesis, select Equality (the default).
3. Set Significance Level (α) to 0.05 and Power (1 − β) to 0.80.
4. Enter the Standard Deviation (σ) as 12.
5. Enter the Mean Difference as 5.
6. Keep the Allocation Ratio (r) at 1 (equal groups).
7. The calculator shows the required sample size per group. The result should be approximately n = 91 per group (182 total).
By hand: \( n = 2 \times (1.9600 + 0.8416)^{2} \times 12^{2} / 5^{2} \approx 90.4 \), so rounding up gives n = 91 per group, 182 total.
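The hand calculation above can be reproduced in a few lines. This is a minimal sketch of the normal-approximation formula for the equality hypothesis, assuming a two-sided test, a common standard deviation in both groups, and Group 2 holding r times as many subjects as Group 1. The function name `n_per_group_equality` is our own illustration, not part of the calculator.

```python
import math
from statistics import NormalDist


def n_per_group_equality(delta, sigma, alpha=0.05, power=0.80, r=1.0):
    """Group 1 size for a two-sided test of equal means (normal approximation).

    Group 2 is assumed to have r times as many subjects as Group 1.
    """
    z = NormalDist().inv_cdf        # standard normal quantile function
    z_alpha = z(1 - alpha / 2)      # two-sided critical value, e.g. 1.960
    z_beta = z(power)               # quantile for the desired power, e.g. 0.842
    raw = (r + 1) / r * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(raw)           # always round up to a whole subject


n1 = n_per_group_equality(delta=5, sigma=12)
print(n1, "per group,", n1 * 2, "total")  # 91 per group, 182 total
```

Rounding up with `math.ceil` rather than rounding to the nearest integer keeps the achieved power at or above the target.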
If the researcher instead wanted to show that the new method is not worse than the traditional method by more than 3 points (non-inferiority margin \( \delta_m = -3 \)), using a one-sided test at α = 0.05 with 80% power, σ = 12, equal allocation, and an assumed true mean difference \( \delta_0 = 0 \):
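\( n = 2 \times (1.6449 + 0.8416)^{2} \times 12^{2} / (0 - (-3))^{2} \approx 197.8 \), so rounding up gives n = 198 per group (396 total).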
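A sketch of the one-sided calculation, assuming the formula \( n = \frac{(r+1)}{r} \cdot \frac{(z_{\alpha} + z_{\beta})^{2} \sigma^{2}}{(\delta_0 - \delta_m)^{2}} \) with the whole of α in one tail (which is why 1.645 replaces 1.960). The function and parameter names are illustrative, not the calculator's own.

```python
import math
from statistics import NormalDist


def n_per_group_noninferiority(margin, true_diff, sigma, alpha=0.05, power=0.80, r=1.0):
    """Group 1 size for a one-sided non-inferiority or superiority test.

    margin is delta_m (negative for non-inferiority when higher is better);
    true_diff is delta_0, the difference assumed to hold in reality.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha)  # one-sided: all of alpha sits in a single tail
    z_beta = z(power)
    raw = (r + 1) / r * (z_alpha + z_beta) ** 2 * sigma ** 2 / (true_diff - margin) ** 2
    return math.ceil(raw)


n1 = n_per_group_noninferiority(margin=-3, true_diff=0, sigma=12)
print(n1)  # 198
```

The denominator is the distance between the assumed true difference and the margin, so a true difference that favours the new method (δ₀ > 0) would shrink the required n.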
| Output | Interpretation |
|---|---|
| Sample Size per Group | The minimum number of subjects needed in each group. With unequal allocation (r ≠ 1), the two groups will have different sizes: Group 1 = n, Group 2 = n × r. |
| Total Sample Size | The combined number across both groups: n × (1 + r). |
| Live Interpretation | A plain-language summary of what the sample size achieves, including the detectable difference, power, and significance level. |
| Visualisation | Shows how sample size changes across a range of effect sizes or standard deviations, helping you assess sensitivity to uncertain inputs. |
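The Visualisation output is a sensitivity analysis. A rough text equivalent, sketched under the same normal-approximation assumptions as the worked example (σ = 12, α = 0.05, 80% power, 1:1 allocation), recomputes n over a grid of candidate differences. The grid values are illustrative choices, not calculator defaults.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80, r=1.0):
    """Normal-approximation Group 1 size for a two-sided equality test."""
    z = NormalDist().inv_cdf
    k = (z(1 - alpha / 2) + z(power)) ** 2
    return math.ceil((r + 1) / r * k * sigma ** 2 / delta ** 2)


# How the requirement grows as the detectable difference shrinks,
# holding sigma, alpha and power fixed.
sensitivity = {delta: n_per_group(delta, sigma=12) for delta in (3, 4, 5, 6, 7)}
for delta, n in sensitivity.items():
    print(f"delta = {delta}: n = {n} per group")
```

Because n scales with 1/δ², halving the difference roughly quadruples the sample size (δ = 6 needs 63 per group, δ = 3 needs 252).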
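Key considerations: For a fixed total, equal allocation gives the most power, so any ratio other than 1:1 needs a larger total sample size to reach the same power. For example, with the worked example's inputs (σ = 12, δ = 5, α = 0.05, 80% power), a 2:1 ratio needs 68 + 136 = 204 subjects, compared with 182 for 1:1. Unequal allocation is sometimes needed for ethical or practical reasons (e.g., giving more participants the active treatment).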
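To see the cost of unequal allocation, the sketch below computes Group 1, Group 2 and total sizes for r = 1 and r = 2 with the worked example's inputs. It assumes the normal-approximation equality formula and, as in the output table, Group 2 = n × r; the helper name `group_sizes` is hypothetical.

```python
import math
from statistics import NormalDist


def group_sizes(delta, sigma, r, alpha=0.05, power=0.80):
    """Return (group 1, group 2, total) for allocation ratio r (normal approximation)."""
    z = NormalDist().inv_cdf
    k = (z(1 - alpha / 2) + z(power)) ** 2
    n1 = math.ceil((r + 1) / r * k * sigma ** 2 / delta ** 2)
    n2 = math.ceil(n1 * r)  # Group 2 = n x r, rounded up if r is fractional
    return n1, n2, n1 + n2


for r in (1, 2):
    n1, n2, total = group_sizes(delta=5, sigma=12, r=r)
    print(f"r = {r}: group 1 = {n1}, group 2 = {n2}, total = {total}")
```

The smaller group shrinks only modestly as r grows, while the larger group grows in proportion to r, which is why the total rises.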
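For the equality hypothesis (two-sided test):
\[ n = \frac{(r + 1)}{r} \cdot \frac{(z_{\alpha/2} + z_{\beta})^{2} \cdot \sigma^{2}}{\delta^{2}} \]
where \( n \) is the size of Group 1 (Group 2 has \( n \times r \)), \( r \) is the allocation ratio, \( \sigma \) is the common standard deviation, \( \delta \) is the mean difference to detect, and \( z_{q} \) is the upper-\( q \) quantile of the standard normal distribution (for example, \( z_{0.025} = 1.960 \) and \( z_{0.20} = 0.842 \)).
For non-inferiority or superiority (one-sided test):
\[ n = \frac{(r + 1)}{r} \cdot \frac{(z_{\alpha} + z_{\beta})^{2} \cdot \sigma^{2}}{(\delta_0 - \delta_m)^{2}} \]
where \( \delta_m \) is the non-inferiority or superiority margin, and \( \delta_0 \) is the assumed true difference.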
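For equivalence testing (two one-sided tests, TOST), the type II error is split between the two one-sided tests, so the power term uses \( z_{\beta/2} \) rather than \( z_{\beta} \). Assuming the true difference is zero:
\[ n = \frac{(r + 1)}{r} \cdot \frac{(z_{\alpha} + z_{\beta/2})^{2} \cdot \sigma^{2}}{\delta_m^{2}} \]
where \( \delta_m \) is the equivalence margin: the means are declared equivalent if their difference lies within \( \pm\delta_m \).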
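A minimal sketch of the equivalence formula, assuming a true difference of zero (the case in which splitting β between the two tests is standard) and a symmetric margin. The ±5-point margin and σ = 12 are illustrative inputs chosen for this sketch, not one of the guide's examples.

```python
import math
from statistics import NormalDist


def n_per_group_equivalence(margin, sigma, alpha=0.05, power=0.80, r=1.0):
    """Group 1 size for an equivalence (TOST) design, assuming the true difference is 0.

    Each one-sided test runs at level alpha; beta is split across the two tests,
    so the power quantile is z_{beta/2} rather than z_beta.
    """
    z = NormalDist().inv_cdf
    beta = 1 - power
    z_alpha = z(1 - alpha)          # one-sided level for each of the two tests
    z_half_beta = z(1 - beta / 2)   # e.g. 1.282 when power = 0.80
    raw = (r + 1) / r * (z_alpha + z_half_beta) ** 2 * sigma ** 2 / margin ** 2
    return math.ceil(raw)


# Illustrative inputs only: equivalence margin of +/-5 points, sigma = 12.
print(n_per_group_equivalence(margin=5, sigma=12))  # 99
```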
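Because the final analysis is usually a t-test with σ estimated from the data, the normal approximation slightly understates n, most noticeably for small samples. To correct for this, replace the \( z \)-values with \( t \)-values and iterate until \( n_{t} \) stops changing: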
\[ n_{t} = \frac{(r + 1)}{r} \cdot \frac{(t_{\alpha/2,\,\nu} + t_{\beta,\,\nu})^{2} \cdot \sigma^{2}}{\delta^{2}} \quad \text{where} \quad \nu = (r + 1) \cdot n_{t} - 2 \]
A clinical trial tests whether a new drug reduces systolic blood pressure more than a placebo. The clinically meaningful difference is 8 mmHg.
Inputs: Difference = 8 mmHg, SD = 15 mmHg (both groups), α = 0.05 (two-sided), power = 80%, allocation 1:1.
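Result: n = 57 per group (114 total). This value matches the t-based refinement; the normal-approximation formula alone gives 55.2, which rounds up to 56 per group.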
Interpretation: Enrolling 57 patients per arm provides 80% power to detect an 8 mmHg difference between treatments.
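The blood-pressure figure can be reproduced with the t-based iteration. The sketch below is a reconstruction under stated assumptions, not the calculator's source: because Python's standard library has no t distribution, it approximates t quantiles with the Cornish-Fisher expansion (Abramowitz & Stegun 26.7.5), and it starts iterating from the normal-approximation answer. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate Student-t quantile via the Cornish-Fisher expansion
    (Abramowitz & Stegun 26.7.5), truncated after the 1/df**3 term.
    Very accurate for the large df that arise here; not meant for tiny df."""
    x = NormalDist().inv_cdf(p)  # start from the matching normal quantile
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    return x + g1 / df + g2 / df**2 + g3 / df**3


def _n_from_quantiles(q_alpha, q_beta, delta, sigma, r):
    """Shared formula (r+1)/r * (q_alpha + q_beta)^2 * sigma^2 / delta^2, rounded up."""
    return math.ceil((r + 1) / r * (q_alpha + q_beta) ** 2 * sigma**2 / delta**2)


def n_per_group_z(delta, sigma, alpha=0.05, power=0.80, r=1.0):
    """Normal-approximation Group 1 size for a two-sided equality test."""
    z = NormalDist().inv_cdf
    return _n_from_quantiles(z(1 - alpha / 2), z(power), delta, sigma, r)


def n_per_group_t(delta, sigma, alpha=0.05, power=0.80, r=1.0, max_iter=50):
    """Recompute n with t quantiles on nu = (r+1)n - 2 degrees of freedom
    until the rounded-up value stops changing."""
    n = n_per_group_z(delta, sigma, alpha, power, r)  # z-based starting value
    for _ in range(max_iter):
        nu = (r + 1) * n - 2  # df of the pooled two-sample t-test
        new_n = _n_from_quantiles(
            t_quantile(1 - alpha / 2, nu), t_quantile(power, nu), delta, sigma, r
        )
        if new_n == n:
            break
        n = new_n
    return n


# Blood-pressure example: difference 8 mmHg, SD 15 mmHg, alpha 0.05 two-sided, 80% power.
print(n_per_group_z(8, 15), n_per_group_t(8, 15))  # 56 57
```

If SciPy is available, `scipy.stats.t.ppf(p, df)` gives exact t quantiles and could replace `t_quantile`.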
Researchers compare mean exam scores between students using adaptive-learning software and traditional instruction.
Inputs: Difference = 5 points, SD = 12 points, α = 0.05 (two-sided), power = 90%.
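Result: n = 122 per group (244 total). This value comes from the normal-approximation formula; the t-based refinement gives 123 per group.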
Interpretation: Each group needs 122 students to detect a 5-point improvement with 90% power.
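This example uses the same difference and SD as the worked example but asks for 90% power. The sketch below, assuming the normal-approximation equality formula, shows the cost of the extra power directly.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80, r=1.0):
    """Normal-approximation Group 1 size for a two-sided equality test."""
    z = NormalDist().inv_cdf
    k = (z(1 - alpha / 2) + z(power)) ** 2
    return math.ceil((r + 1) / r * k * sigma ** 2 / delta ** 2)


for power in (0.80, 0.90):
    print(f"power = {power:.0%}: n = {n_per_group(5, 12, power=power)} per group")
```

Moving from 80% to 90% power raises the requirement from 91 to 122 per group, about a third more, because \( z_{\beta} \) grows from 0.842 to 1.282.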
An automotive lab compares fuel efficiency (km/L) between two engine designs.
Inputs: Difference = 1.5 km/L, SD = 2.8 km/L, α = 0.05 (two-sided), power = 80%.
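Result: n = 56 per group (112 total). This value matches the t-based refinement; the normal-approximation formula alone gives 54.7, which rounds up to 55 per group.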
Interpretation: Testing 56 vehicles with each engine design gives 80% power to detect a 1.5 km/L difference.
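Because n grows with σ², an optimistic SD estimate is a common way to underpower a study. The sketch below keeps the 1.5 km/L difference and varies the SD around 2.8 km/L; the alternative SD values are illustrative, and the normal approximation is assumed, so at σ = 2.8 it reports 55 rather than the t-based 56.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80, r=1.0):
    """Normal-approximation Group 1 size for a two-sided equality test."""
    z = NormalDist().inv_cdf
    k = (z(1 - alpha / 2) + z(power)) ** 2
    return math.ceil((r + 1) / r * k * sigma ** 2 / delta ** 2)


# Difference of 1.5 km/L with the SD guess varied around 2.8 km/L.
for sigma in (2.4, 2.8, 3.2):
    print(f"sigma = {sigma} km/L -> n = {n_per_group(1.5, sigma)} per group")
```

An SD about 14% larger than assumed (3.2 instead of 2.8 km/L) raises the requirement by roughly 30%.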
A field trial compares mean grain yield (kg/ha) between a new fertilizer and the current standard.
Inputs: Difference = 200 kg/ha, SD = 400 kg/ha, α = 0.05 (two-sided), power = 80%.
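Result: n = 64 per group (128 total). This value matches the t-based refinement; the normal-approximation formula alone gives 62.8, which rounds up to 63 per group.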
Interpretation: Allocating 64 plots to each treatment provides 80% power to detect a 200 kg/ha yield improvement.
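A Monte Carlo check is a useful sanity test of any sample-size answer. This sketch is an addition, not a calculator feature: it simulates the fertilizer design (64 plots per arm, true difference 200 kg/ha, SD 400 kg/ha, yields assumed normally distributed) and counts how often a pooled two-sample t-test at α = 0.05 rejects. The rejection rate should come out near 0.80. The t critical value again uses an approximate Cornish-Fisher quantile, and the seed and number of simulations are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean


def t_crit(p, df):
    """Approximate Student-t quantile (Cornish-Fisher expansion, A&S 26.7.5)."""
    x = NormalDist().inv_cdf(p)
    return (x + (x**3 + x) / (4 * df)
            + (5 * x**5 + 16 * x**3 + 3 * x) / (96 * df**2)
            + (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / (384 * df**3))


def simulated_power(n, delta, sigma, alpha=0.05, n_sims=2000, seed=1):
    """Share of simulated two-arm trials in which a pooled two-sample t-test
    (two-sided, level alpha) rejects equal means. Outcomes are assumed normal."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    df = 2 * n - 2
    crit = t_crit(1 - alpha / 2, df)
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sigma) for _ in range(n)]
        treated = [rng.gauss(delta, sigma) for _ in range(n)]
        m_c, m_t = fmean(control), fmean(treated)
        ss = sum((x - m_c) ** 2 for x in control) + sum((x - m_t) ** 2 for x in treated)
        pooled_var = ss / df  # pooled variance estimate
        t_stat = (m_t - m_c) / math.sqrt(pooled_var * 2 / n)
        if abs(t_stat) > crit:
            rejections += 1
    return rejections / n_sims


# Fertilizer example: 64 plots per arm, true difference 200 kg/ha, SD 400 kg/ha.
power_estimate = simulated_power(n=64, delta=200, sigma=400)
print(f"Estimated power: {power_estimate:.3f}")
```

With 2,000 simulated trials the estimate carries a Monte Carlo standard error of roughly 0.01, so small departures from 0.80 are expected.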