Sample Size for Comparing Paired Differences

A guide to determining the sample size for before-after studies, crossover trials, and other paired designs.

Overview

In paired designs, each subject serves as their own control: measurements are taken twice on the same individual (e.g., before and after a treatment) or on matched pairs. Because the within-subject variability is typically smaller than between-subject variability, paired designs are often more efficient than independent-group designs.

This calculator determines the number of pairs (or subjects) needed to detect a specified mean difference in paired observations with a given level of power. The key input is the standard deviation of the differences, not the standard deviation of the raw measurements.

The calculation uses the standardised effect size (Cohen’s d), defined as the expected mean difference divided by the standard deviation of differences.

Worked Example

Scenario: Weight Loss Intervention (Before-After Study)

A nutritionist wants to evaluate whether a 12-week dietary programme leads to weight loss. Based on a similar previous study, the expected mean weight change is −2.5 kg and the standard deviation of individual weight changes is σd = 4.0 kg. The study requires 90% power at a 5% significance level (two-sided).

Using Statulator step-by-step:

1. Open the Sample Size Calculator for Comparing Paired Differences.

2. Set Significance Level (α) to 0.05 and Power (1 − β) to 0.90.

3. Enter the Mean of Paired Differences as 2.5 (the absolute value of the expected change).

4. Enter the Standard Deviation of Differences as 4.0.

5. The calculator shows the standardised effect size (Cohen’s d = 2.5 / 4.0 = 0.625) and the required number of pairs. The result should be approximately n = 28 pairs.

Hand calculation verification:
\[ d = \frac{\mu_d}{\sigma_d} = \frac{2.5}{4.0} = 0.625 \]
\[ n = \frac{(z_{\alpha/2} + z_{\beta})^{2}}{d^{2}} = \frac{(1.96 + 1.282)^{2}}{(0.625)^{2}} = \frac{(3.242)^{2}}{0.3906} = \frac{10.51}{0.3906} \approx 26.9 \]

Rounding up: n = 27 pairs (with t-distribution adjustment, this increases to approximately 28).
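
The hand calculation above can be reproduced with a few lines of Python. This is a minimal sketch of the normal-approximation formula only, not the calculator's own implementation; the function name `paired_sample_size` and its parameter names are illustrative assumptions, and the t-distribution adjustment is not applied here.

```python
from math import ceil
from statistics import NormalDist


def paired_sample_size(mean_diff, sd_diff, alpha=0.05, power=0.90):
    """Normal-approximation number of pairs for a two-sided paired comparison.

    Hypothetical helper for illustration; no t-distribution adjustment.
    """
    d = abs(mean_diff) / sd_diff                    # standardised effect size
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 1.282 for 90% power
    n_exact = (z_alpha + z_beta) ** 2 / d ** 2      # unrounded number of pairs
    return d, n_exact, ceil(n_exact)


d, n_exact, n_pairs = paired_sample_size(mean_diff=-2.5, sd_diff=4.0)
print(f"d = {d:.3f}, unrounded n = {n_exact:.1f}, pairs = {n_pairs}")
# d = 0.625, unrounded n = 26.9, pairs = 27
```

The sign of the mean difference is dropped because only its magnitude matters for a two-sided test.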

Tip: Estimating σd

If you only know the standard deviation of raw measurements (σ) and the correlation between pre- and post-measurements (ρ), you can derive:

\[ \sigma_d = \sigma \sqrt{2(1 - \rho)} \]

For example, if σ = 5 kg and ρ = 0.7, then σd = 5 × √(0.6) ≈ 3.87 kg.
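
As a quick check of this conversion, the sketch below evaluates the formula for the example values. It assumes, as the formula itself does, that the raw measurements have the same standard deviation at both time points; the function name `sd_of_differences` is a hypothetical helper.

```python
from math import sqrt


def sd_of_differences(sd_raw, rho):
    """SD of paired differences from the raw SD and the pre/post correlation.

    Assumes the same raw SD at both measurement occasions.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie between -1 and 1")
    return sd_raw * sqrt(2 * (1 - rho))


print(round(sd_of_differences(5.0, 0.7), 2))  # 3.87
```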

Interpretation Guide

Output | Interpretation
Required Number of Pairs | The minimum number of subjects (each measured twice) or matched pairs needed. Each pair provides one difference score.
Standardised Effect Size (d) | Cohen’s d for paired designs: the ratio of the expected mean difference to the standard deviation of differences. By convention, d = 0.2 is small, 0.5 is medium, and 0.8 is large.
Visualisation | Plots sample size against a range of effect sizes, letting you see how the sample size changes if the true effect is slightly larger or smaller than expected.
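
The sensitivity that the visualisation conveys can also be tabulated directly. The sketch below uses the normal-approximation formula (no t-distribution adjustment) at α = 0.05 and 90% power; the function name `pairs_by_effect_size` and the chosen grid of effect sizes are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def pairs_by_effect_size(effect_sizes, alpha=0.05, power=0.90):
    """Map each standardised effect size to its normal-approximation pair count."""
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return {d: ceil(z_sum ** 2 / d ** 2) for d in effect_sizes}


table = pairs_by_effect_size([0.4, 0.5, 0.625, 0.8])
for d, n in table.items():
    print(f"d = {d:<5} pairs = {n}")
```

Because n scales with 1/d², a modest overestimate of the effect size leads to a noticeably underpowered study.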

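Practical tip: Paired designs reduce the required sample size compared to independent-group designs whenever the within-subject correlation ρ is positive, and the saving grows with ρ. With equal variances on both occasions, the number of pairs is roughly (1 − ρ) times the per-group size of an unpaired design, so a paired design with ρ > 0.5 needs fewer than half the total observations of an unpaired design.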
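
The size of this saving can be made explicit under the normal approximation. The short derivation below assumes a common raw standard deviation \( \sigma \) on both occasions and uses the standard two-sample formula for the unpaired design, which is not given elsewhere in this guide:

\[ n_{\text{paired}} = \frac{(z_{\alpha/2} + z_{\beta})^{2}\, \sigma_d^{2}}{\mu_d^{2}} = \frac{2\sigma^{2}(1 - \rho)\,(z_{\alpha/2} + z_{\beta})^{2}}{\mu_d^{2}}, \qquad n_{\text{per group}} = \frac{2\sigma^{2}\,(z_{\alpha/2} + z_{\beta})^{2}}{\mu_d^{2}} \]

\[ \frac{n_{\text{paired}}}{n_{\text{per group}}} = 1 - \rho \]

For instance, with ρ = 0.7 a paired design needs about 30% as many pairs as an unpaired design needs subjects per group.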

Formula

Base Formula
\[ n = \frac{(z_{\alpha/2} + z_{\beta})^{2}}{d^{2}} \]

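where:

\( d = \dfrac{\mu_d}{\sigma_d} \) is the standardised effect size
\( \mu_d \) is the expected mean of the paired differences
\( \sigma_d \) is the standard deviation of the paired differences
\( z_{\alpha/2} \) is the standard normal critical value for a two-sided test at significance level α (1.96 for α = 0.05)
\( z_{\beta} \) is the standard normal quantile corresponding to the desired power 1 − β (1.282 for 90% power)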

t-Distribution Adjustment
\[ n_{t} = \frac{(t_{\alpha/2,\, n-1} + t_{\beta,\, n-1})^{2}}{d^{2}} \]

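Because the degrees of freedom \( \nu = n - 1 \) depend on n, the formula is solved iteratively: start from the normal-based value, recompute \( n_t \) with the corresponding t quantiles, and repeat until the value converges.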

Cluster Sampling Adjustment
\[ n_{\text{cluster}} = n \times [1 + (m - 1) \cdot \rho] \]

where \( m \) is the average cluster size and \( \rho \) is the intra-cluster correlation coefficient.
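
A minimal sketch of this inflation step follows. The inputs (27 pairs, an average cluster size of 10, and an intra-cluster correlation of 0.05) are hypothetical values chosen for illustration, and the function name `cluster_adjusted_pairs` is an assumption.

```python
from math import ceil


def cluster_adjusted_pairs(n_pairs, cluster_size, icc):
    """Inflate an unclustered pair count by the design effect 1 + (m - 1) * rho."""
    design_effect = 1 + (cluster_size - 1) * icc
    return ceil(n_pairs * design_effect)


# Hypothetical inputs: 27 pairs, average cluster size 10, ICC 0.05
print(cluster_adjusted_pairs(27, cluster_size=10, icc=0.05))  # 40
```

Even a small intra-cluster correlation inflates the requirement substantially when clusters are large, since the design effect grows linearly with m.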

Assumptions & Requirements

Textbook Examples

Medicine

A crossover trial measures whether a bronchodilator improves FEV1 (L) compared to placebo in the same asthma patients.

Inputs: Expected mean difference = 0.20 L, SD of differences = 0.35 L, α = 0.05 (two-sided), power = 80%.
Result: n = 26 pairs.
Interpretation: Enrolling 26 patients (each receiving both treatments in sequence) gives 80% power to detect a 0.20 L improvement in FEV1.

Education

A school evaluates whether a summer reading programme improves comprehension scores (pre vs. post).

Inputs: Expected mean gain = 4 points, SD of gains = 8 points, α = 0.05 (two-sided), power = 90%.
Result: n = 44 students.
Interpretation: Testing 44 students before and after the programme provides 90% power to confirm a 4-point mean improvement.

Engineering

A manufacturer measures battery discharge time (hours) before and after a firmware update on the same devices.

Inputs: Expected mean improvement = 1.2 hours, SD of differences = 2.0 hours, α = 0.05 (two-sided), power = 80%.
Result: n = 24 devices.
Interpretation: Testing 24 devices before and after the update is sufficient to detect a 1.2-hour improvement with 80% power.

Social Science

A psychologist measures reaction time (ms) before and after a mindfulness intervention.

Inputs: Expected mean reduction = 15 ms, SD of differences = 30 ms, α = 0.05 (two-sided), power = 80%.
Result: n = 34 participants.
Interpretation: Recording reaction times from 34 participants at both sessions provides 80% power for the paired comparison.
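
As a rough cross-check, the sketch below computes the normal-based starting value for each of the four examples above. These values are expected to fall slightly below the reported results, which is consistent with the reported figures including the t-distribution adjustment. The dictionary layout and the function name `normal_start` are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

# (mean difference, SD of differences, power, reported result) from the examples
examples = {
    "Medicine": (0.20, 0.35, 0.80, 26),
    "Education": (4.0, 8.0, 0.90, 44),
    "Engineering": (1.2, 2.0, 0.80, 24),
    "Social Science": (15.0, 30.0, 0.80, 34),
}


def normal_start(mean_diff, sd_diff, power, alpha=0.05):
    """Normal-approximation pair count, before any t-distribution adjustment."""
    d = mean_diff / sd_diff
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(z_sum ** 2 / d ** 2)


starts = {name: normal_start(m, s, p) for name, (m, s, p, _) in examples.items()}
for name, n0 in starts.items():
    print(f"{name}: normal-based n = {n0}, reported n = {examples[name][3]}")
```

In each case the normal-based value is one or two pairs below the reported figure.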

References

  1. Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates. Chapter 2: the t-test for means.
  2. Chow, S.-C., Shao, J., Wang, H., & Lokhnygina, Y. (2018). Sample Size Calculations in Clinical Research (3rd ed.). Chapman & Hall/CRC. Chapter 4: paired designs.
  3. Machin, D., Campbell, M. J., Tan, S. B., & Tan, S. H. (2009). Sample Size Tables for Clinical Studies (3rd ed.). Wiley-Blackwell.
  4. Rosner, B. (2016). Fundamentals of Biostatistics (8th ed.). Cengage Learning. Section 8.5: sample size for paired t-test.