Sample Size for Comparing Two Independent Proportions

A guide to determining the sample size for studies comparing success rates, response rates, or event proportions between two groups.

Overview

This calculator determines the number of subjects needed per group to detect a specified difference between two population proportions. It is applicable to randomised controlled trials, cohort studies, and any study comparing binary outcomes (e.g., response/no response, event/no event) between two independent groups.

The calculator supports three hypothesis frameworks: equality (is there a difference?), non-inferiority/superiority (is the new treatment not worse/better by a specified margin?), and equivalence (are the two treatments sufficiently similar?).

An optional continuity correction can be applied to account for the discrete nature of proportions when using the normal approximation. Unequal allocation ratios are also supported.

Worked Example

Scenario: Comparing Treatment Response Rates in a Clinical Trial

A clinical researcher is planning a randomised trial comparing a new drug to the standard treatment for a skin condition. The standard treatment has a known response rate of p0 = 0.60 (60%). The new drug is expected to achieve p1 = 0.75 (75%). The study requires 80% power at a 5% two-sided significance level with equal allocation.

Using Statulator step-by-step:

1. Open the Sample Size Calculator for Comparing Two Independent Proportions.

2. Select Hypothesis: Equality.

3. Set Significance Level (α) to 0.05 and Power to 0.80.

4. Enter Proportion in Group 1 (p0) as 0.60.

5. Enter Proportion in Group 2 (p1) as 0.75.

6. Keep Allocation Ratio at 1.

7. The result is n = 150 per group (300 total) without continuity correction, or n = 163 per group (326 total) with continuity correction (the default).

Hand calculation verification (without continuity correction):
\[ n = \frac{(z_{\alpha/2} + z_{\beta})^{2} \left[\frac{p_0(1-p_0)}{r} + p_1(1-p_1)\right]}{(p_1 - p_0)^{2}} \] \[ = \frac{(1.96 + 0.8416)^{2} \times [0.60 \times 0.40 + 0.75 \times 0.25]}{(0.15)^{2}} = \frac{7.849 \times [0.24 + 0.1875]}{0.0225} = \frac{7.849 \times 0.4275}{0.0225} \approx 149.1 \]

Rounding up: n = 150 per group, matching Statulator’s output when the continuity correction is turned off.
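The hand calculation above can be reproduced with a short script. This is a minimal sketch using only Python's standard library; `n_two_proportions` is an illustrative helper name, not part of Statulator:

```python
import math
from statistics import NormalDist

def n_two_proportions(p0, p1, alpha=0.05, power=0.80, r=1.0):
    """Per-group n for a two-sided equality test of two proportions
    (normal approximation, no continuity correction)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # 1.9600 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)           # 0.8416 for 80% power
    variance = p0 * (1 - p0) / r + p1 * (1 - p1)
    n = (z_a + z_b) ** 2 * variance / (p1 - p0) ** 2
    return math.ceil(n)                         # always round up

print(n_two_proportions(0.60, 0.75))  # 150
```

The exact critical values (1.9600 and 0.8416) give 149.13, which rounds up to 150 as in the hand calculation.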

Continuity Correction

The continuity correction compensates for using a continuous (normal) distribution to approximate a discrete (binomial) distribution. It slightly increases the sample size and is recommended for small to moderate sample sizes. The corrected sample size is:

\[ n_c = \frac{n}{4} \left(1 + \sqrt{1 + \frac{2(r+1)}{n \cdot r \cdot |p_1 - p_0|}}\right)^{2} \]
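A sketch of the corrected calculation in Python (illustrative helper name, assuming the document's inputs). Note that the correction is applied to the unrounded n; this is what reproduces the 163 from the worked example, whereas correcting the already-rounded 150 would give 164:

```python
import math
from statistics import NormalDist

def n_with_continuity_correction(p0, p1, alpha=0.05, power=0.80, r=1.0):
    """Fleiss-style continuity-corrected per-group n. The correction is
    applied to the unrounded normal-approximation n before final rounding."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    n = (z_a + z_b) ** 2 * (p0 * (1 - p0) / r + p1 * (1 - p1)) / (p1 - p0) ** 2
    n_c = n / 4 * (1 + math.sqrt(1 + 2 * (r + 1) / (n * r * abs(p1 - p0)))) ** 2
    return math.ceil(n_c)

print(n_with_continuity_correction(0.60, 0.75))  # 163
```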

Interpretation Guide

Sample Size per Group: The number of subjects needed in each arm. With unequal allocation (r ≠ 1), Group 1 = n and Group 2 = n × r.

With/Without Continuity Correction: The corrected value is always equal to or larger than the uncorrected value. The correction is most important when the expected difference is small.

Non-Inferiority Margin: In non-inferiority designs, this is the largest acceptable amount by which the new treatment can be worse than the standard. A negative margin (e.g., −0.10) means the new treatment can be up to 10 percentage points worse and still be considered non-inferior.

Equivalence Margin: In equivalence designs, both treatments must be within ±δ of each other. The margin must be larger than the assumed true difference.

Formula

Equality (Two-Sided Test)
\[ n = \frac{(z_{\alpha/2} + z_{\beta})^{2} \left[\dfrac{p_0(1-p_0)}{r} + p_1(1-p_1)\right]}{(p_1 - p_0)^{2}} \]
Non-Inferiority / Superiority (One-Sided)
\[ n = \frac{(z_{\alpha} + z_{\beta})^{2} \left[\dfrac{p_0(1-p_0)}{r} + p_1(1-p_1)\right]}{(p_1 - p_0 - \delta_m)^{2}} \]
Equivalence (TOST)
\[ n = \frac{(z_{\alpha} + z_{\beta/2})^{2} \left[\dfrac{p_0(1-p_0)}{r} + p_1(1-p_1)\right]}{(\delta_m - |p_1 - p_0|)^{2}} \]

For equivalence testing (two one-sided tests, TOST), the type II error is split between the two one-sided tests, so the critical value is \( z_{\beta/2} \) rather than \( z_{\beta} \).
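The three formulas differ only in the critical values and the denominator, so they collapse naturally into one function. A sketch (the `design` keyword and its values are illustrative names, not Statulator's interface):

```python
import math
from statistics import NormalDist

def n_per_group(p0, p1, alpha=0.05, power=0.80, r=1.0,
                design="equality", margin=0.0):
    """Per-group n (normal approximation) for the three frameworks.
    margin is delta_m: the non-inferiority/superiority or equivalence margin."""
    z = NormalDist().inv_cdf
    beta = 1 - power
    variance = p0 * (1 - p0) / r + p1 * (1 - p1)
    if design == "equality":                        # two-sided test
        crit, denom = z(1 - alpha / 2) + z(1 - beta), (p1 - p0) ** 2
    elif design == "noninferiority":                # one-sided test
        crit, denom = z(1 - alpha) + z(1 - beta), (p1 - p0 - margin) ** 2
    elif design == "equivalence":                   # TOST: beta split in two
        crit, denom = z(1 - alpha) + z(1 - beta / 2), (margin - abs(p1 - p0)) ** 2
    else:
        raise ValueError(f"unknown design: {design}")
    return math.ceil(crit ** 2 * variance / denom)

print(n_per_group(0.60, 0.75))                                         # 150
print(n_per_group(0.60, 0.60, design="noninferiority", margin=-0.10))  # 297
```

The second call illustrates a non-inferiority design with equal true rates (0.60 vs 0.60) and a margin of −0.10.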

where:

- z_{α/2} and z_{α} are the upper critical values of the standard normal distribution for two-sided and one-sided tests at level α
- z_{β} (or z_{β/2} for equivalence) is the critical value corresponding to power 1 − β
- p0 and p1 are the anticipated proportions in Groups 1 and 2
- r is the allocation ratio (r = 1 for equal group sizes)
- δ_m is the non-inferiority, superiority, or equivalence margin

Continuity Correction (Fleiss et al.)
\[ n_c = \frac{n}{4} \left(1 + \sqrt{1 + \frac{2(r + 1)}{n \cdot r \cdot |p_1 - p_0|}}\right)^{2} \]
Cluster Sampling Adjustment
\[ n_{\text{cluster}} = n \times [1 + (m - 1) \cdot \rho] \]
where m is the average cluster size and ρ is the intraclass correlation coefficient (ICC). The multiplier 1 + (m − 1)ρ is the design effect of Kish (1965).
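When subjects are sampled in clusters (classrooms, clinics, production batches), the per-group n from any of the formulas above is inflated by the design effect. A minimal sketch with an illustrative helper name:

```python
import math

def cluster_adjusted_n(n, m, rho):
    """Inflate a per-group sample size by the design effect 1 + (m - 1) * rho,
    where m is the mean cluster size and rho the intraclass correlation (ICC)."""
    return math.ceil(n * (1 + (m - 1) * rho))

# 150 per group, clusters of 20, ICC 0.05 -> design effect 1.95
print(cluster_adjusted_n(150, 20, 0.05))  # 293
```

Even a modest ICC of 0.05 nearly doubles the required sample here, which is why ignoring clustering is a common source of underpowered studies.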

Assumptions & Requirements

- The two groups are independent, and each subject contributes a single binary outcome.
- The expected proportions p0 and p1 are specified in advance, typically from pilot data or the literature.
- The normal approximation to the binomial is adequate; as a rule of thumb, n·p and n·(1 − p) should each be at least 5 in both groups. The continuity correction helps when samples are small.
- Computed sample sizes are rounded up to the next whole number.

Textbook Examples

Medicine

An RCT compares a new antibiotic (expected cure rate 85%) to the standard treatment (cure rate 70%).

Inputs: p0 = 0.70 (standard), p1 = 0.85 (new), α = 0.05 (two-sided), power = 80%, allocation ratio 1:1, continuity correction applied (default).
Result: n = 131 per group (262 total).
Interpretation: Enrolling 131 patients per arm gives 80% power to detect the 15 pp difference at the 5% significance level.

Education

Researchers test whether a new teaching method increases the pass rate from 60% to 75%.

Inputs: p0 = 0.60, p1 = 0.75, α = 0.05 (two-sided), power = 90%, continuity correction applied (default).
Result: n = 213 per group (426 total).
Interpretation: Each classroom group needs 213 students for 90% power to detect the 15 pp improvement.

Social Science

A government agency compares voter turnout between two outreach strategies: SMS reminders vs. no contact.

Inputs: p0 = 0.48 (no contact), p1 = 0.55 (SMS reminders), α = 0.05 (two-sided), power = 80%, continuity correction applied (default).
Result: n = 825 per group (1,650 total).
Interpretation: The small expected difference (7 pp) requires a large sample. Each group needs 825 participants for adequate power.

Engineering

A factory tests whether a new assembly process reduces the defect rate from 8% to 4%.

Inputs: p0 = 0.08 (current process), p1 = 0.04 (new process), α = 0.05 (two-sided), power = 80%, continuity correction applied (default).
Result: n = 599 per group (1,198 total).
Interpretation: Each production batch needs 599 inspected units to detect the halving of the defect percentage with 80% power.
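All four textbook results above can be checked in one pass. This illustrative script re-implements the continuity-corrected equality formula from the Formula section (helper name `n_corrected` is ours, not Statulator's):

```python
import math
from statistics import NormalDist

def n_corrected(p0, p1, alpha=0.05, power=0.80, r=1.0):
    """Continuity-corrected per-group n for a two-sided equality test."""
    z = NormalDist().inv_cdf
    n = ((z(1 - alpha / 2) + z(power)) ** 2
         * (p0 * (1 - p0) / r + p1 * (1 - p1)) / (p1 - p0) ** 2)
    n_c = n / 4 * (1 + math.sqrt(1 + 2 * (r + 1) / (n * r * abs(p1 - p0)))) ** 2
    return math.ceil(n_c)

cases = [("Medicine",       0.70, 0.85, 0.80, 131),
         ("Education",      0.60, 0.75, 0.90, 213),
         ("Social Science", 0.48, 0.55, 0.80, 825),
         ("Engineering",    0.08, 0.04, 0.80, 599)]
for field, p0, p1, power, expected in cases:
    n = n_corrected(p0, p1, power=power)
    print(f"{field}: n = {n} per group")
    assert n == expected
```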

References

  1. Fleiss, J. L., Levin, B., & Paik, M. C. (2003). Statistical Methods for Rates and Proportions (3rd ed.). John Wiley & Sons. Chapter 8: sample size for comparing two proportions.
  2. Chow, S.-C., Shao, J., Wang, H., & Lokhnygina, Y. (2018). Sample Size Calculations in Clinical Research (3rd ed.). Chapman & Hall/CRC. Chapters 3–4.
  3. Julious, S. A. (2010). Sample Sizes for Clinical Trials. Chapman & Hall/CRC. Non-inferiority and equivalence designs for proportions.
  4. Casagrande, J. T., Pike, M. C., & Smith, P. G. (1978). An improved approximate formula for calculating sample sizes for comparing two binomial distributions. Biometrics, 34(3), 483–486.
  5. Kish, L. (1965). Survey Sampling. John Wiley & Sons.