Sample Size for Comparing Paired Proportions

A guide to calculating the sample size for McNemar’s test, comparing two correlated binary outcomes on the same subjects.

Overview

Paired proportions arise when the same subjects are classified on a binary outcome under two conditions. Common scenarios include comparing two diagnostic tests on the same patients, assessing a binary outcome before and after an intervention, and comparing how often two raters classify the same subjects as positive.

The analysis focuses on discordant pairs: subjects whose category changes between the two conditions. In a 2×2 table of paired outcomes, the discordant proportions are denoted b (positive→negative) and c (negative→positive). McNemar’s test compares these two proportions; the concordant pairs do not enter the test statistic.
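For orientation, the usual (uncorrected) form of McNemar’s statistic can be written in terms of the observed discordant counts. The symbols \( n_b \) and \( n_c \) are notation introduced here for illustration (the number of positive→negative and negative→positive pairs); they do not appear elsewhere in this guide.

\[ \chi^2 = \frac{(n_b - n_c)^2}{n_b + n_c} \]

Under the null hypothesis of equal marginal proportions, this statistic is approximately chi-square distributed with 1 degree of freedom.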

Statulator offers two input methods: the marginal approach (specify the two marginal proportions and the correlation between them) or the discordant approach (specify the discordant proportions directly).

The 2×2 Paired Table
              Time 2: +    Time 2: −    Total
Time 1: +     a            b            p0
Time 1: −     c            d            1 − p0
Total         p1           1 − p1       1

b and c are the discordant proportions; a and d are concordant. Only the discordant pairs carry information about the difference between conditions.

Worked Example

Scenario: Comparing Two Diagnostic Tests

A radiologist wants to compare the sensitivity of two imaging techniques (CT vs. MRI) applied to the same patients for detecting liver lesions. Based on published data, CT detects lesions in 75% of cases (p0 = 0.75) and MRI in 85% (p1 = 0.85). The correlation between the paired results is estimated at ρ = 0.60. The study requires 80% power at α = 0.05 (two-sided).

Using Statulator step-by-step (Marginal Method):

  1. Open the Sample Size Calculator for Comparing Paired Proportions.

  2. Select the Marginal input method.

  3. Set α to 0.05 and Power to 0.80.

  4. Enter Proportion at Time 1 (p0) as 0.75 and Proportion at Time 2 (p1) as 0.85.

  5. Enter the Correlation (ρ) as 0.60.

  6. The calculator computes the discordant proportions and displays the required number of pairs.

Deriving discordant proportions:

From the marginal inputs:

\[ q_0 = 1 - p_0 = 0.25, \quad q_1 = 1 - p_1 = 0.15 \]
\[ b = p_0 q_1 - \rho \sqrt{p_0 q_0 p_1 q_1} = 0.75 \times 0.15 - 0.60 \times \sqrt{0.75 \times 0.25 \times 0.85 \times 0.15} \]
\[ = 0.1125 - 0.60 \times \sqrt{0.0239} = 0.1125 - 0.60 \times 0.1546 = 0.1125 - 0.0928 = 0.0197 \]
\[ c = b + (p_1 - p_0) = 0.0197 + 0.10 = 0.1197 \]

Sample size calculation:

\[ p_{\text{sum}} = b + c = 0.0197 + 0.1197 = 0.1394 \]
\[ p_{\text{diff}} = c - b = 0.1197 - 0.0197 = 0.10 \]
\[ n = \left(\frac{z_{\alpha/2}\sqrt{p_{\text{sum}}} + z_{\beta}\sqrt{p_{\text{sum}} - p_{\text{diff}}^2}}{p_{\text{diff}}}\right)^{2} \]
\[ = \left(\frac{1.96 \times \sqrt{0.1394} + 0.842 \times \sqrt{0.1394 - 0.01}}{0.10}\right)^{2} = \left(\frac{1.96 \times 0.3734 + 0.842 \times 0.3598}{0.10}\right)^{2} \]
\[ = \left(\frac{0.7319 + 0.3029}{0.10}\right)^{2} = (10.348)^{2} \approx 107.1 \]

Rounding up: n = 108 pairs.
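The hand calculation above can be cross-checked with a short script. This is a minimal sketch, not Statulator’s own code: the function names `discordant_from_marginals` and `pairs_required` are hypothetical, and the normal quantiles come from Python’s standard-library `statistics.NormalDist` rather than the rounded values 1.96 and 0.842.

```python
from math import ceil, sqrt
from statistics import NormalDist


def discordant_from_marginals(p0, p1, rho):
    """Return (b, c) from the marginal proportions and the paired correlation."""
    b = p0 * (1 - p1) - rho * sqrt(p0 * (1 - p0) * p1 * (1 - p1))
    c = b + (p1 - p0)
    return b, c


def pairs_required(b, c, alpha=0.05, power=0.80):
    """Uncorrected number of pairs (before rounding up) for a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.842 for 80% power
    p_sum = b + c
    p_diff = abs(c - b)
    numerator = z_alpha * sqrt(p_sum) + z_beta * sqrt(p_sum - p_diff**2)
    return (numerator / p_diff) ** 2


# Worked example: CT (p0 = 0.75) vs. MRI (p1 = 0.85), rho = 0.60
b, c = discordant_from_marginals(0.75, 0.85, 0.60)
n = pairs_required(b, c)
print(f"b = {b:.4f}, c = {c:.4f}, pairs = {ceil(n)}")
```

With the inputs of the worked example, the rounded-up result should agree with the 108 pairs obtained by hand.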

Discordant Method

If you already know the discordant proportions (e.g., from a pilot study where b = 0.05 and c = 0.15), select the Discordant input method and enter these values directly. This bypasses the need for marginal proportions and correlation.
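As an illustration of the discordant route, the sketch below applies the sample size formula directly to the pilot values b = 0.05 and c = 0.15, with and without the continuity correction. The function name `pairs_from_discordant` and its `continuity` flag are assumptions made for this example, not part of the calculator.

```python
from math import ceil, sqrt
from statistics import NormalDist


def pairs_from_discordant(b, c, alpha=0.05, power=0.80, continuity=True):
    """Number of pairs, rounded up, from discordant proportions b and c."""
    p_sum = b + c
    p_diff = abs(c - b)
    if p_diff == 0:
        raise ValueError("b and c must differ; no difference to detect")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = ((z_alpha * sqrt(p_sum) + z_beta * sqrt(p_sum - p_diff**2)) / p_diff) ** 2
    if continuity:
        n += 1 / p_diff  # continuity correction: add 1/|c - b|
    return ceil(n)


uncorrected = pairs_from_discordant(0.05, 0.15, continuity=False)
corrected = pairs_from_discordant(0.05, 0.15)
print(uncorrected, corrected)
```

Only the magnitude of c − b enters the formula, so swapping b and c gives the same number of pairs.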

Interpretation Guide

Output                           Interpretation
Number of Pairs                  The minimum number of subjects, each assessed under both conditions. Every subject provides one paired observation.
Discordant Proportions (b, c)    b = proportion who are positive at Time 1 but negative at Time 2; c = proportion who are negative at Time 1 but positive at Time 2. The difference p1 − p0 = c − b.
Continuity Correction            Adds 1/|c − b| to the uncorrected sample size. Recommended when discordant proportions are small.

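Practical tip: The correlation (ρ) between paired measurements has a large impact on the discordant proportions and hence on sample size. With the marginal proportions held fixed, a higher correlation means fewer discordant pairs and a smaller p_sum, which lowers the required n: in the worked example, ρ = 0.60 gives 108 pairs, whereas ρ = 0 would give b = 0.1125, c = 0.2125 and about 253 pairs. Overestimating ρ therefore produces an underpowered study, so verify your correlation estimate carefully from prior data.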
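To see how strongly ρ drives the result, the sketch below tabulates the uncorrected number of pairs for several correlations, using the marginal proportions from the worked example (p0 = 0.75, p1 = 0.85). The function name `pairs_for_correlation` and the grid of ρ values are illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist


def pairs_for_correlation(p0, p1, rho, alpha=0.05, power=0.80):
    """Uncorrected number of pairs (rounded up) for given marginals and rho."""
    b = p0 * (1 - p1) - rho * sqrt(p0 * (1 - p0) * p1 * (1 - p1))
    c = b + (p1 - p0)
    if min(b, c) < 0:
        raise ValueError("rho is too large for these marginal proportions")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_sum = b + c
    p_diff = abs(c - b)
    n = ((z_alpha * sqrt(p_sum) + z_beta * sqrt(p_sum - p_diff**2)) / p_diff) ** 2
    return ceil(n)


# Hypothetical grid of correlations; 0.60 is the value from the worked example
results = {rho: pairs_for_correlation(0.75, 0.85, rho) for rho in (0.0, 0.2, 0.4, 0.6)}
for rho, n_pairs in results.items():
    print(f"rho = {rho:.1f} -> {n_pairs} pairs")
```

Running the same loop over a plausible range of ρ for your own study shows how much the required number of pairs depends on that single input.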

Formula

McNemar’s Test Sample Size (Connor, 1987)
\[ n = \left(\frac{z_{\alpha/2}\sqrt{p_{\text{sum}}} + z_{\beta}\sqrt{p_{\text{sum}} - p_{\text{diff}}^{2}}}{p_{\text{diff}}}\right)^{2} \]

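where \( p_{\text{sum}} = b + c \) is the total proportion of discordant pairs, \( p_{\text{diff}} = c - b \) is the difference between the discordant proportions (equal to \( p_1 - p_0 \)), \( z_{\alpha/2} \) is the standard normal quantile for a two-sided significance level \( \alpha \) (1.96 for \( \alpha = 0.05 \)), and \( z_{\beta} \) is the standard normal quantile for the desired power \( 1 - \beta \) (0.842 for 80% power).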

Deriving b and c from Marginal Proportions
\[ b = p_0(1 - p_1) - \rho\sqrt{p_0(1-p_0) \cdot p_1(1-p_1)} \] \[ c = b + (p_1 - p_0) \]

where \( p_0 \) and \( p_1 \) are the marginal proportions at Time 1 and Time 2, and \( \rho \) is the correlation between paired binary outcomes.
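Not every combination of marginal proportions and correlation corresponds to a valid 2×2 table: if ρ is too large, the derived b (or another cell) becomes negative. The sketch below fills in all four cells and rejects infeasible inputs. The function name `paired_table` is hypothetical, and the expressions a = p0 − b and d = (1 − p0) − c are taken from the row totals of the paired table.

```python
from math import sqrt


def paired_table(p0, p1, rho):
    """Return the cell proportions a, b, c, d implied by p0, p1 and rho."""
    b = p0 * (1 - p1) - rho * sqrt(p0 * (1 - p0) * p1 * (1 - p1))
    c = b + (p1 - p0)
    a = p0 - b          # first row sums to p0
    d = (1 - p0) - c    # second row sums to 1 - p0
    cells = {"a": a, "b": b, "c": c, "d": d}
    negative = [name for name, value in cells.items() if value < 0]
    if negative:
        raise ValueError(f"infeasible inputs: negative cell(s) {negative}")
    return cells


table = paired_table(0.75, 0.85, 0.60)
print({name: round(value, 4) for name, value in table.items()})
```

For p0 = 0.75 and p1 = 0.85, a correlation of 0.90 would make b negative, so the function raises an error rather than returning an impossible table.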

Continuity Correction
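\[ n_c = n + \frac{1}{|c - b|} \]

Cluster Sampling Adjustment
\[ n_{\text{cluster}} = n \times [1 + (m - 1) \cdot \rho_{\text{ICC}}] \]

where \( m \) is the number of subjects (pairs) per cluster and \( \rho_{\text{ICC}} \) is the intraclass correlation coefficient.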
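The cluster adjustment multiplies the number of pairs by the design effect 1 + (m − 1)·ρ_ICC, where m is the number of pairs per cluster and ρ_ICC is the intraclass correlation. A minimal sketch follows; the function name `cluster_adjusted_pairs` and the example values m = 5 and ρ_ICC = 0.05 are hypothetical and not taken from this guide.

```python
from math import ceil


def cluster_adjusted_pairs(n, m, icc):
    """Inflate n pairs by the design effect 1 + (m - 1) * icc and round up."""
    if m < 1:
        raise ValueError("cluster size m must be at least 1")
    if not 0 <= icc <= 1:
        # Assumption for this sketch: only non-negative ICC values are accepted
        raise ValueError("icc must lie between 0 and 1")
    design_effect = 1 + (m - 1) * icc
    return ceil(n * design_effect)


# 108 pairs from the worked example, hypothetical clusters of 5 with ICC = 0.05
adjusted = cluster_adjusted_pairs(108, m=5, icc=0.05)
print(adjusted)
```

With m = 1 or ρ_ICC = 0 the design effect is 1 and the number of pairs is unchanged.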

Assumptions & Requirements

Textbook Examples

Medicine

A crossover study compares two diagnostic tests for detecting a cardiac biomarker on the same patients.

Inputs: Discordant proportions b = 0.15, c = 0.05, α = 0.05 (two-sided), power = 80%, continuity correction applied (default).
Result: n = 165 pairs.
Interpretation: Testing 165 patients with both diagnostic methods provides 80% power to detect the difference in sensitivity.

Education

A before-after study tests whether a workshop changes teachers' attitudes toward inclusive education.

Inputs: Discordant proportions b = 0.20, c = 0.08, α = 0.05 (two-sided), power = 90%, continuity correction applied (default).
Result: n = 209 pairs.
Interpretation: Surveying 209 teachers before and after the workshop gives 90% power to detect the shift in attitudes.

Social Science

A panel survey measures whether a media campaign changes public opinion on climate policy (same respondents, two time points).

Inputs: Discordant proportions b = 0.12, c = 0.06, α = 0.05 (two-sided), power = 80%, continuity correction applied (default).
Result: n = 407 pairs.
Interpretation: Re-interviewing 407 respondents provides 80% power to detect the 6-percentage-point net shift in opinion.

Medicine

A dermatology trial applies two topical creams to matched lesion sites on the same patients to compare healing rates.

Inputs: Discordant proportions b = 0.25, c = 0.10, α = 0.05 (two-sided), power = 80%, continuity correction applied (default).
Result: n = 127 pairs.
Interpretation: Enrolling 127 patients (each serving as their own control) provides 80% power for the paired comparison.
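The four results above can be checked with a short script that applies the sample size formula plus the continuity correction to each set of inputs. This is a sketch under the assumption that the correction is added before rounding up; the function name `corrected_pairs` and the dictionary labels are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def corrected_pairs(b, c, alpha, power):
    """Continuity-corrected number of pairs, rounded up, for a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_sum = b + c
    p_diff = abs(c - b)
    n = ((z_alpha * sqrt(p_sum) + z_beta * sqrt(p_sum - p_diff**2)) / p_diff) ** 2
    return ceil(n + 1 / p_diff)


# (b, c, alpha, power) as listed in the examples above
examples = {
    "Medicine (cardiac biomarker)": (0.15, 0.05, 0.05, 0.80),
    "Education (workshop)": (0.20, 0.08, 0.05, 0.90),
    "Social Science (media campaign)": (0.12, 0.06, 0.05, 0.80),
    "Medicine (topical creams)": (0.25, 0.10, 0.05, 0.80),
}
results = {name: corrected_pairs(*inputs) for name, inputs in examples.items()}
for name, n_pairs in results.items():
    print(f"{name}: {n_pairs} pairs")
```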

References

  1. Connor, R. J. (1987). Sample size for testing differences in proportions for the paired-sample design. Biometrics, 43(1), 207–211.
  2. Fleiss, J. L., Levin, B., & Paik, M. C. (2003). Statistical Methods for Rates and Proportions (3rd ed.). John Wiley & Sons. Chapter 9: McNemar’s test.
  3. Lachin, J. M. (1992). Power and sample size evaluation for the McNemar test with application to matched case-control studies. Statistics in Medicine, 11(9), 1239–1251.
  4. Machin, D., Campbell, M. J., Tan, S. B., & Tan, S. H. (2009). Sample Size Tables for Clinical Studies (3rd ed.). Wiley-Blackwell.
  5. Chow, S.-C., Shao, J., Wang, H., & Lokhnygina, Y. (2018). Sample Size Calculations in Clinical Research (3rd ed.). Chapman & Hall/CRC.