A guide to determining the sample size for before-after studies, crossover trials, and other paired designs.
In paired designs, each subject serves as their own control: measurements are taken twice on the same individual (e.g., before and after a treatment) or on matched pairs. Because the within-subject variability is typically smaller than between-subject variability, paired designs are often more efficient than independent-group designs.
This calculator determines the number of pairs (or subjects) needed to detect a specified mean difference in paired observations with a given level of power. The key input is the standard deviation of the differences, not the standard deviation of the raw measurements.
The calculation uses the standardised effect size (Cohen’s d), defined as the expected mean difference divided by the standard deviation of differences.
A nutritionist wants to evaluate whether a 12-week dietary programme leads to weight loss. Based on a similar previous study, the expected mean weight change is −2.5 kg and the standard deviation of individual weight changes is σd = 4.0 kg. The study requires 90% power at a 5% significance level (two-sided).
1. Open the Sample Size Calculator for Comparing Paired Differences.
2. Set Significance Level (α) to 0.05 and Power (1 − β) to 0.90.
3. Enter the Mean of Paired Differences as 2.5 (the absolute value of the expected change).
4. Enter the Standard Deviation of Differences as 4.0.
5. The calculator shows the standardised effect size (Cohen’s d = 2.5 / 4.0 = 0.625) and the required number of pairs. The result should be approximately n = 29 pairs.
The normal approximation gives n = (1.960 + 1.282)² / 0.625² ≈ 26.9, which rounds up to 27 pairs; the t-distribution adjustment increases this to approximately 29.
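The arithmetic of this worked example can be sketched in a few lines of Python. This is a minimal sketch, not the calculator's own code: the function names are hypothetical, and the t-distribution adjustment is approximated by adding z²/2 to the normal-based value, a textbook shortcut assumed here in place of the calculator's iteration, so its result may differ from the calculator's by a pair.

```python
import math
from statistics import NormalDist


def pairs_normal(mean_diff, sd_diff, alpha=0.05, power=0.90):
    """Unrounded number of pairs from the normal approximation (two-sided test)."""
    d = abs(mean_diff) / sd_diff              # standardised effect size
    z = NormalDist()                          # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)        # about 1.960 for alpha = 0.05
    z_beta = z.inv_cdf(power)                 # about 1.282 for 90% power
    return ((z_alpha + z_beta) / d) ** 2


def pairs_t_corrected(mean_diff, sd_diff, alpha=0.05, power=0.90):
    """Normal-based value plus z_alpha^2 / 2, an assumed small-sample shortcut."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return pairs_normal(mean_diff, sd_diff, alpha, power) + z_alpha ** 2 / 2


# Dietary programme example: mean change -2.5 kg, SD of changes 4.0 kg.
n_normal = pairs_normal(-2.5, 4.0)
n_adjusted = pairs_t_corrected(-2.5, 4.0)
print(math.ceil(n_normal), math.ceil(n_adjusted))
```

Taking the absolute value of the mean difference mirrors step 3: the sign of the expected change does not affect a two-sided calculation.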
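If you only know the standard deviation of the raw measurements (σ) and the correlation between pre- and post-measurements (ρ), you can derive the standard deviation of differences, assuming the same σ at both time points: \( \sigma_d = \sigma\sqrt{2(1-\rho)} \)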
For example, if σ = 5 kg and ρ = 0.7, then σd = 5 × √(0.6) ≈ 3.87 kg.
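The same conversion can be written as a small helper. This is a sketch under the assumption that the raw measurements have the same standard deviation at both time points; the function name `sd_of_differences` is hypothetical.

```python
import math


def sd_of_differences(sd_raw, rho):
    """SD of paired differences from the raw SD and the pre/post correlation.

    Assumes equal SDs at both measurements: sd_d = sd_raw * sqrt(2 * (1 - rho)).
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie between -1 and 1")
    return sd_raw * math.sqrt(2.0 * (1.0 - rho))


sd_d = sd_of_differences(5.0, 0.7)
print(round(sd_d, 2))  # 3.87
```

The result can be entered directly as the Standard Deviation of Differences.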
| Output | Interpretation |
|---|---|
| Required Number of Pairs | The minimum number of subjects (each measured twice) or matched pairs needed. Each pair provides one difference score. |
| Standardised Effect Size (d) | Cohen’s d for paired designs: the ratio of the expected mean difference to the standard deviation of differences. By convention, d = 0.2 is small, 0.5 is medium, and 0.8 is large. |
| Visualisation | Plots sample size against a range of effect sizes, letting you see how the sample size changes if the true effect is slightly larger or smaller than expected. |
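Practical tip: Paired designs reduce the required sample size compared to independent-group designs when within-subject correlation is high. Under the normal approximation with equal variances at both measurements, the number of pairs is roughly (1 − ρ) times the per-group size of an unpaired design, so with ρ > 0.5 a paired design needs fewer than half as many pairs as an unpaired design needs subjects per group.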
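To see how the correlation drives this saving, the sketch below compares the two designs under the normal approximation, assuming equal variances at both measurements. The function name and the input values (a 2.5 kg difference, raw SD of 5 kg, 80% power) are illustrative assumptions, not calculator outputs.

```python
import math
from statistics import NormalDist


def design_sizes(delta, sd_raw, rho, alpha=0.05, power=0.80):
    """Return (pairs for a paired design, subjects per group for an unpaired one)."""
    z = NormalDist()
    k = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    # Unpaired design: the variance of a difference in group means is 2 * sd^2 / n.
    n_per_group = 2 * k * sd_raw ** 2 / delta ** 2
    # Paired design: sd_d^2 = 2 * sd^2 * (1 - rho), so the size shrinks by (1 - rho).
    n_pairs = (1 - rho) * n_per_group
    return math.ceil(n_pairs), math.ceil(n_per_group)


for rho in (0.0, 0.3, 0.5, 0.7, 0.9):
    pairs, per_group = design_sizes(delta=2.5, sd_raw=5.0, rho=rho)
    print(f"rho={rho:.1f}: {pairs} pairs vs {per_group} per group")
```

Note that the unpaired design enrols two groups, so its total number of subjects is twice the per-group figure.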
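Normal approximation:
\[ n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{d^2} \]
where the standardised effect size is \( d = \dfrac{\mu_d}{\sigma_d} \)
where: \( z_{1-\alpha/2} \) and \( z_{1-\beta} \) are the standard normal quantiles for the two-sided significance level \( \alpha \) and the power \( 1 - \beta \); \( \mu_d \) is the expected mean of the paired differences; and \( \sigma_d \) is the standard deviation of the differences.
t-distribution adjustment:
\[ n = \frac{(t_{1-\alpha/2,\,\nu} + t_{1-\beta,\,\nu})^2}{d^2} \]
Iterated from the normal-based starting value until convergence, with \( \nu = n - 1 \) degrees of freedom.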
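For clustered data, the required sample size is multiplied by the design effect \( \text{DEFF} = 1 + (m - 1)\rho \), where \( m \) is the average cluster size and \( \rho \) is the intra-cluster correlation coefficient.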
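A minimal sketch of a clustering adjustment, assuming the definitions of m and ρ above refer to the standard design effect 1 + (m − 1)ρ applied as a multiplier to the number of pairs. The function name and the example values (26 pairs, clusters of 5, intra-cluster correlation 0.05) are hypothetical.

```python
import math


def inflate_for_clustering(n_pairs, avg_cluster_size, icc):
    """Multiply an unclustered sample size by the design effect and round up."""
    deff = 1 + (avg_cluster_size - 1) * icc
    # Round before ceil so floating-point noise cannot add a spurious extra pair.
    return math.ceil(round(n_pairs * deff, 9))


print(inflate_for_clustering(26, avg_cluster_size=5, icc=0.05))  # 26 * 1.2 = 31.2 -> 32
```

With an intra-cluster correlation of zero, or clusters of size one, the design effect is 1 and the sample size is unchanged.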
A crossover trial measures whether a bronchodilator improves FEV1 (L) compared to placebo in the same asthma patients.
Inputs: Expected mean difference = 0.20 L, SD of differences = 0.35 L, α = 0.05 (two-sided), power = 80%.
Result: n = 26 pairs.
Interpretation: Enrolling 26 patients (each receiving both treatments in sequence) gives 80% power to detect a 0.20 L improvement in FEV1.
A school evaluates whether a summer reading programme improves comprehension scores (pre vs. post).
Inputs: Expected mean gain = 4 points, SD of gains = 8 points, α = 0.05 (two-sided), power = 90%.
Result: n = 44 students.
Interpretation: Testing 44 students before and after the programme provides 90% power to detect a 4-point mean improvement.
A manufacturer measures battery discharge time (hours) before and after a firmware update on the same devices.
Inputs: Expected mean improvement = 1.2 hours, SD of differences = 2.0 hours, α = 0.05 (two-sided), power = 80%.
Result: n = 24 devices.
Interpretation: Testing 24 devices before and after the update is sufficient to detect a 1.2-hour improvement with 80% power.
A psychologist measures reaction time (ms) before and after a mindfulness intervention.
Inputs: Expected mean reduction = 15 ms, SD of differences = 30 ms, α = 0.05 (two-sided), power = 80%.
Result: n = 34 participants.
Interpretation: Recording reaction times from 34 participants at both sessions provides 80% power for the paired comparison.
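As a rough cross-check of the four scenarios above, the sketch below estimates the power achieved by each stated sample size using a normal approximation. The function name is hypothetical, and the approximation ignores the t-distribution, so it slightly overstates the power an exact calculation would report.

```python
import math
from statistics import NormalDist


def approx_power(mean_diff, sd_diff, n_pairs, alpha=0.05):
    """Approximate two-sided power for a paired comparison (normal approximation)."""
    d = abs(mean_diff) / sd_diff
    z = NormalDist()
    # Probability that the test statistic exceeds the upper critical value;
    # the negligible contribution of the opposite tail is ignored.
    return z.cdf(d * math.sqrt(n_pairs) - z.inv_cdf(1 - alpha / 2))


# (label, mean difference, SD of differences, stated n, target power)
scenarios = [
    ("bronchodilator crossover", 0.20, 0.35, 26, 0.80),
    ("summer reading programme", 4.0, 8.0, 44, 0.90),
    ("firmware update", 1.2, 2.0, 24, 0.80),
    ("mindfulness intervention", 15.0, 30.0, 34, 0.80),
]

for label, diff, sd, n, target in scenarios:
    power = approx_power(diff, sd, n)
    print(f"{label}: n={n}, approx. power {power:.3f} (target {target:.2f})")
```

Each stated sample size clears its target under this approximation, with a small margin that the t-distribution adjustment is expected to use up.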