Compare the means of two independent groups using the pooled (equal variance) or Welch (unequal variance) t-test.
The independent two-sample t-test determines whether the means of two unrelated groups differ significantly. Statulator provides both the pooled t-test (assumes equal variances) and Welch’s t-test (does not assume equal variances). Welch’s test is the recommended default because it performs well regardless of whether variances are equal.
The test supports two-sided, left-sided, and right-sided alternatives, and reports the 95% confidence interval for the mean difference.
A psychologist compares reaction times (ms) between a caffeine group and a placebo group:
Caffeine: n1 = 30, x̄1 = 245, s1 = 32
Placebo: n2 = 28, x̄2 = 268, s2 = 38
1 Open the Two-Sample t-Test calculator.
2 Enter the summary statistics for both groups.
3 Select Two-sided test. Choose Welch or pooled.
4 Result (Welch): t = −2.54, df = 52.3, p = 0.014, 95% CI for difference = (−41.2, −4.8).
The caffeine group has significantly faster reaction times (p = 0.014). The 95% CI for the difference (−41.2 to −4.8 ms) does not cross zero, confirming the significance.
Pooled vs. Welch: If both sample sizes are similar and SDs are close, pooled and Welch give nearly identical results. When in doubt, use Welch. It is never worse than pooled and is more robust when variances differ.
Two-sided: \( p = 2 \cdot P(T \leq -|t|) \)
95% CI: \( (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,\nu} \cdot \text{SE} \)
A trial compares mean recovery time (days) between patients receiving a new analgesic (n = 35, mean = 4.2, SD = 1.8) and a standard analgesic (n = 35, mean = 5.1, SD = 2.0).
Result (Welch): t = −1.98, df = 67, p = 0.052; mean difference = −0.9 days, 95% CI: (−1.81, 0.01).
Interpretation: Patients on the new analgesic recovered about 0.9 days faster on average, but the difference is borderline and not significant at the 5% level — the 95% CI just crosses zero.
Researchers compare mean reading fluency (words per minute) between two instruction methods: Method A (n = 28, mean = 112, SD = 15) and Method B (n = 32, mean = 104, SD = 18).
Result (Pooled): t = 1.87, df = 58, p = 0.067; mean difference = 8 wpm, 95% CI: (−0.6, 16.6).
Interpretation: The 8 wpm advantage for Method A is not statistically significant at 5% (p = 0.067). The CI includes zero, so the true difference could be negligible.
An engineer compares the tensile strength (MPa) of welds made by two techniques: TIG (n = 20, mean = 310, SD = 22) and MIG (n = 20, mean = 295, SD = 28).
Result (Welch): t = 1.88, df = 36, p = 0.068; mean difference = 15 MPa, 95% CI: (−1.2, 31.2).
Interpretation: The 15 MPa difference is not significant at the 5% level. Given the unequal SDs, Welch's variant is preferred over the pooled test.
A psychologist compares anxiety scores between an intervention group (n = 40, mean = 32, SD = 8) and a control group (n = 40, mean = 38, SD = 9).
Result (Pooled): t = −3.15, df = 78, p = 0.002; mean difference = −6, 95% CI: (−9.8, −2.2).
Interpretation: The intervention group reported significantly lower anxiety (p = 0.002), and the 95% CI for the mean difference excludes zero.