A guide to determining the sample size for studies comparing success rates, response rates, or event proportions between two groups.
This calculator determines the number of subjects needed per group to detect a specified difference between two population proportions. It is applicable to randomised controlled trials, cohort studies, and any study comparing binary outcomes (e.g., response/no response, event/no event) between two independent groups.
The calculator supports three hypothesis frameworks: equality (is there a difference?), non-inferiority/superiority (is the new treatment no worse than, or better than, the standard by a specified margin?), and equivalence (are the two treatments sufficiently similar?).
An optional continuity correction can be applied to account for the discrete nature of proportions when using the normal approximation. Unequal allocation ratios are also supported.
A clinical researcher is planning a randomised trial comparing a new drug to the standard treatment for a skin condition. The standard treatment has a known response rate of p0 = 0.60 (60%). The new drug is expected to achieve p1 = 0.75 (75%). The study requires 80% power at a 5% two-sided significance level with equal allocation.
1. Open the Sample Size Calculator for Comparing Two Independent Proportions.
2. Select Hypothesis: Equality.
3. Set Significance Level (α) to 0.05 and Power to 0.80.
4. Enter Proportion in Group 1 (p0) as 0.60.
5. Enter Proportion in Group 2 (p1) as 0.75.
6. Keep Allocation Ratio at 1.
7. The result should be n = 150 per group (300 total) without continuity correction, or n = 163 per group (326 total) with continuity correction (the default).
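With \( z_{\alpha/2} = 1.96 \) and \( z_{\beta} = 0.8416 \), the uncorrected normal-approximation formula with unpooled variances gives \( n = (z_{\alpha/2} + z_{\beta})^2 \, [p_0(1 - p_0) + p_1(1 - p_1)] / (p_1 - p_0)^2 = (1.96 + 0.8416)^2 \times (0.24 + 0.1875) / 0.15^2 \approx 149.1 \). Rounding up: n = 150 per group, matching Statulator’s output when the continuity correction is turned off.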
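
The uncorrected figure can be reproduced with a short Python sketch. It assumes equal allocation and the unpooled-variance normal approximation, an assumption that gives n = 150 for this example; the function name is illustrative and not part of Statulator.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_equality(p0, p1, alpha=0.05, power=0.80):
    """Unrounded, uncorrected sample size per group (two-sided equality test).

    Assumes equal allocation and unpooled variances (an assumption, see text).
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    z_beta = z.inv_cdf(power)             # quantile matching the target power
    variance = p0 * (1 - p0) + p1 * (1 - p1)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p0) ** 2


n_raw = n_per_group_equality(0.60, 0.75)
print(round(n_raw, 2), ceil(n_raw))  # 149.13 150
```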
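The continuity correction compensates for using a continuous (normal) distribution to approximate a discrete (binomial) distribution. It slightly increases the sample size and is recommended for small to moderate sample sizes. For equal allocation, the corrected sample size in the standard (Fleiss) form is \( n_{cc} = \frac{n}{4}\left(1 + \sqrt{1 + \frac{4}{n\,\lvert p_1 - p_0 \rvert}}\right)^2 \), where \( n \) is the unrounded uncorrected sample size per group; for the example above this gives about 162.2, rounded up to 163.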
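
A minimal Python sketch of a continuity-corrected calculation follows. It assumes the Fleiss form of the correction for equal allocation, applied to the unrounded uncorrected n. That choice is an assumption rather than documented Statulator behaviour, although it reproduces the 150 and 163 quoted in the worked example. Function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_uncorrected(p0, p1, alpha=0.05, power=0.80):
    """Unrounded sample size per group, unpooled variance, equal allocation."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return z_sum ** 2 * (p0 * (1 - p0) + p1 * (1 - p1)) / (p1 - p0) ** 2


def n_corrected(p0, p1, alpha=0.05, power=0.80):
    """Fleiss-style continuity correction (assumed form, equal allocation)."""
    n = n_uncorrected(p0, p1, alpha, power)
    return n / 4 * (1 + sqrt(1 + 4 / (n * abs(p1 - p0)))) ** 2


print(ceil(n_uncorrected(0.60, 0.75)), ceil(n_corrected(0.60, 0.75)))  # 150 163
```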
| Output | Interpretation |
|---|---|
| Sample Size per Group | The number of subjects needed in each arm. With unequal allocation (r ≠ 1): Group 1 = n, Group 2 = n × r. |
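| With/Without Continuity Correction | The corrected value is always equal to or larger than the uncorrected value. With equal allocation the correction adds roughly 2/d subjects per group (d being the absolute difference between the two proportions), so the absolute increase is largest when the expected difference is small, while the relative increase is largest when the uncorrected sample size is small. |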
| Non-Inferiority Margin | In non-inferiority designs, this is the largest acceptable amount by which the new treatment can be worse than the standard. A negative margin (e.g., −0.10) means the new treatment can be up to 10 percentage points worse and still be considered non-inferior. |
| Equivalence Margin | In equivalence designs, both treatments must be within ±δ of each other. The margin δ must be larger than the absolute value of the assumed true difference. |
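
The allocation-ratio and non-inferiority rows can be illustrated with a hedged Python sketch. It assumes a one-sided α, unpooled variances, no continuity correction, a margin entered as a negative number, and r defined as Group 2 size divided by Group 1 size. None of these choices is confirmed as Statulator's implementation, and the function name and example inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_noninferiority(p0, p1, margin, alpha=0.025, power=0.80, r=1.0):
    """Per-group sizes (n1, n2) for a one-sided non-inferiority test.

    Assumptions: one-sided alpha, unpooled variance, no continuity
    correction, negative margin, and r = n2 / n1.
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha) + z.inv_cdf(power)
    # Group 2 has r times as many subjects, so its variance term shrinks by r.
    variance = p0 * (1 - p0) + p1 * (1 - p1) / r
    n1 = ceil(z_sum ** 2 * variance / ((p1 - p0) - margin) ** 2)
    return n1, ceil(n1 * r)


# Hypothetical inputs: equal true rates of 60%, margin of -10 percentage points.
print(n_noninferiority(0.60, 0.60, margin=-0.10))         # (377, 377)
print(n_noninferiority(0.60, 0.60, margin=-0.10, r=2.0))  # (283, 566)
```

Under these assumptions, doubling Group 2 lowers the Group 1 requirement but raises the total from 754 to 849, which is why 1:1 allocation is usually the most efficient choice.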
For equivalence testing (two one-sided tests, TOST), the type II error is split between the two one-sided tests, so the critical value is \( z_{\beta/2} \) rather than \( z_{\beta} \).
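
A hedged Python sketch of an equivalence (TOST) calculation using \( z_{\beta/2} \) follows. It assumes a one-sided α per test, unpooled variances, equal allocation and no continuity correction; the function name and the example inputs are hypothetical, not taken from the calculator.

```python
from math import ceil
from statistics import NormalDist


def n_equivalence(p0, p1, delta, alpha=0.05, power=0.80):
    """Sample size per group for a TOST equivalence design (assumed form)."""
    beta = 1 - power
    diff = abs(p1 - p0)
    if delta <= diff:
        # The margin must exceed the assumed true difference.
        raise ValueError("equivalence margin must exceed the assumed true difference")
    z = NormalDist()
    # z_alpha for each one-sided test, z_{beta/2} because beta is split in two.
    z_sum = z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta / 2)
    variance = p0 * (1 - p0) + p1 * (1 - p1)
    return ceil(z_sum ** 2 * variance / (delta - diff) ** 2)


# Hypothetical inputs: both rates 60%, equivalence margin of 10 percentage points.
print(n_equivalence(0.60, 0.60, delta=0.10))  # 412
```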
An RCT compares a new antibiotic (expected cure rate 85%) to the standard treatment (cure rate 70%).
Inputs: p1 = 0.85, p2 = 0.70, α = 0.05 (two-sided), power = 80%, allocation ratio 1:1, continuity correction applied (default).
Result: n = 131 per group (262 total).
Interpretation: Enrolling 131 patients per arm gives 80% power to detect the 15 pp difference at the 5% significance level.
Researchers test whether a new teaching method increases the pass rate from 60% to 75%.
Inputs: p1 = 0.75, p2 = 0.60, α = 0.05 (two-sided), power = 90%, continuity correction applied (default).
Result: n = 213 per group (426 total).
Interpretation: Each group (new method and current method) needs 213 students for 90% power to detect the 15 pp improvement.
A government agency compares voter turnout between two outreach strategies: SMS reminders vs. no contact.
Inputs: p1 = 0.55, p2 = 0.48, α = 0.05 (two-sided), power = 80%, continuity correction applied (default).
Result: n = 825 per group (1,650 total).
Interpretation: The small expected difference (7 pp) requires a large sample. Each group needs 825 participants for adequate power.
A factory tests whether a new assembly process reduces the defect rate from 8% to 4%.
Inputs: p1 = 0.04, p2 = 0.08, α = 0.05 (two-sided), power = 80%, continuity correction applied (default).
Result: n = 599 per group (1,198 total).
Interpretation: Inspecting 599 units from each assembly process gives 80% power to detect a halving of the defect rate (8% to 4%).
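
The four results above can be cross-checked with a short Python sketch. It assumes the unpooled-variance normal approximation with a Fleiss-style continuity correction and equal allocation. This is an assumption about the method, made because it reproduces the quoted figures; the function name and labels are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_with_correction(p1, p2, alpha=0.05, power=0.80):
    """Continuity-corrected sample size per group (assumed method, see text)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    d = abs(p1 - p2)
    n = z_sum ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / d ** 2  # uncorrected
    return ceil(n / 4 * (1 + sqrt(1 + 4 / (n * d))) ** 2)      # corrected


# (label, p1, p2, power) taken from the four examples in the text.
examples = [
    ("antibiotic cure rate", 0.85, 0.70, 0.80),
    ("teaching method pass rate", 0.75, 0.60, 0.90),
    ("voter turnout", 0.55, 0.48, 0.80),
    ("defect rate", 0.04, 0.08, 0.80),
]

for label, p1, p2, power in examples:
    n = n_with_correction(p1, p2, power=power)
    print(f"{label}: {n} per group, {2 * n} total")
# Prints 131, 213, 825 and 599 per group respectively.
```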