Help: Chi-square Goodness-of-fit Test

Test whether observed frequencies across categories match a set of expected (theoretical) frequencies.

Overview

The goodness-of-fit test compares observed counts in k categories to the counts expected under a specified hypothesis. Common applications include testing whether a die is fair, whether disease cases are equally distributed across seasons, or whether a sample follows a theoretical distribution.

Statulator also provides post-hoc z-tests for each category, with optional Bonferroni correction, to identify which specific categories deviate from expectation.

Worked Example

Scenario: Seasonal Distribution of Injuries

An emergency department recorded injuries across four seasons: Spring = 82, Summer = 112, Autumn = 78, Winter = 128. If injuries were equally distributed, we would expect 100 per season (400/4). Test at α = 0.05.

Using Statulator:

1 Open the Chi-square Goodness-of-fit Test.

2 Enter the observed and expected counts for each category.

3 Read the result: χ² = 17.36, df = 3, p ≈ 0.0006, which is significant at α = 0.05. With Bonferroni correction, the post-hoc tests identify Autumn and Winter as the categories that differ from expectation. The statistic is computed as follows:

\[ \chi^2 = \frac{(82-100)^2}{100} + \frac{(112-100)^2}{100} + \frac{(78-100)^2}{100} + \frac{(128-100)^2}{100} = 3.24 + 1.44 + 4.84 + 7.84 = 17.36 \]
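The arithmetic above can be reproduced with a short Python sketch. This is an illustration only, not Statulator's own code: the function names `chi_square_stat` and `chi_square_sf` are hypothetical, and the p-value uses the closed-form upper-tail series for a chi-square variable with integer degrees of freedom, so only the standard library is needed.

```python
import math


def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        series = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * series
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(df-1)/2} (x/2)^(j-1/2) / Gamma(j+1/2)
    series = sum(
        half ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


# Seasonal injury data from the worked example.
observed = [82, 112, 78, 128]        # Spring, Summer, Autumn, Winter
expected = [100, 100, 100, 100]      # 400 / 4 under equal distribution

stat = chi_square_stat(observed, expected)
df = len(observed) - 1
p = chi_square_sf(stat, df)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.4f}")
# prints: chi2 = 17.36, df = 3, p = 0.0006
```

The closed-form series avoids a dependency on a statistics library; with one available, its chi-square survival function should give the same tail probability.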

Interpretation Guide

A significant overall χ² tells you the observed distribution deviates from the expected distribution but does not tell you which categories differ. The post-hoc z-tests (with Bonferroni correction) identify the specific categories that contribute most to the deviation.

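Yates correction: Optional continuity correction that subtracts 0.5 from each absolute deviation before squaring, which makes the test more conservative. It was designed for the one-degree-of-freedom case (two categories) and matters most when expected counts are small.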

Formula

Chi-square Statistic

\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \quad;\quad \text{df} = k - 1 \]

With Yates Correction

\[ \chi^2_{\text{Yates}} = \sum_{i=1}^{k} \frac{(|O_i - E_i| - 0.5)^2}{E_i} \]
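As a rough illustration of how much the correction changes the statistic, the sketch below applies the formula exactly as written to the seasonal injury counts from the worked example. The function name `chi_square_yates` is hypothetical, and the code follows the formula literally: it does not clamp a term to zero when |O − E| is below 0.5, which some implementations may do.

```python
def chi_square_yates(observed, expected):
    """Continuity-corrected statistic: sum of (|O - E| - 0.5)^2 / E."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


observed = [82, 112, 78, 128]        # Spring, Summer, Autumn, Winter
expected = [100, 100, 100, 100]

# Uncorrected Pearson statistic for comparison.
uncorrected = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
corrected = chi_square_yates(observed, expected)
print(f"uncorrected = {uncorrected:.2f}, Yates = {corrected:.2f}")
# prints: uncorrected = 17.36, Yates = 16.57
```

With expected counts of 100 per category the correction is small, which matches the guidance that it matters mainly when expected counts are low.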

Post-hoc z-test for Each Category

\[ z_i = \frac{\hat{p}_i - p_{0i}}{\text{SE}_i} \quad;\quad \text{SE}_i = \sqrt{\frac{\hat{p}_i(1-\hat{p}_i)}{N}} \]

where \( \hat{p}_i = O_i/N \) and \( p_{0i} = E_i/N \). With Bonferroni correction, compare p-values to \( \alpha/k \).
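The post-hoc step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the function name `posthoc_z_tests` is hypothetical, the p-values are assumed to be two-sided, and the standard error uses the observed proportion as in the formula above.

```python
import math


def posthoc_z_tests(observed, expected, alpha=0.05, bonferroni=True):
    """Return (z, p_value, significant) for each category.

    Assumes two-sided tests. Raises ZeroDivisionError if a category's
    observed proportion is 0 or 1, because the standard error is then 0.
    """
    n = sum(observed)
    k = len(observed)
    threshold = alpha / k if bonferroni else alpha
    results = []
    for o, e in zip(observed, expected):
        p_hat = o / n                     # observed proportion
        p0 = e / n                        # expected proportion
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        z = (p_hat - p0) / se
        # Two-sided p-value from the standard normal distribution.
        p_value = math.erfc(abs(z) / math.sqrt(2))
        results.append((z, p_value, p_value < threshold))
    return results


seasons = ["Spring", "Summer", "Autumn", "Winter"]
observed = [82, 112, 78, 128]
expected = [100, 100, 100, 100]

results = posthoc_z_tests(observed, expected)
for name, (z, p_value, significant) in zip(seasons, results):
    print(f"{name}: z = {z:.2f}, p = {p_value:.4f}, significant = {significant}")
```

Under these assumptions, and with the Bonferroni threshold of 0.05/4 = 0.0125, the sketch flags Autumn and Winter for the seasonal injury data.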

Assumptions & Requirements

Independent observations: Observations are independent of one another, and each falls into exactly one category.
Adequate expected counts: All expected frequencies should be ≥ 5. If some are < 5, consider combining categories.
Fixed total: The total sample size N is fixed in advance.
Pre-specified expected proportions: Expected values must be determined before looking at the data.
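The expected-count requirement can be checked before running the test. The sketch below is a hypothetical helper, not part of Statulator: it converts pre-specified proportions into expected counts and lists the categories that fall below the threshold of 5. The proportions are the blood-type distribution used in the Medicine example below; the sample size of 100 is an invented value, included only to show a failing case.

```python
import math


def expected_counts(proportions, n):
    """Expected count per category, given proportions that sum to 1."""
    if not math.isclose(sum(proportions), 1.0, abs_tol=1e-9):
        raise ValueError("expected proportions must sum to 1")
    return [p * n for p in proportions]


def small_expected(labels, expected, minimum=5):
    """Labels of categories whose expected count is below the minimum."""
    return [label for label, e in zip(labels, expected) if e < minimum]


labels = ["O", "A", "B", "AB"]
proportions = [0.44, 0.42, 0.10, 0.04]

for n in (500, 100):                 # n = 100 is a hypothetical smaller sample
    counts = expected_counts(proportions, n)
    print(n, small_expected(labels, counts))
# prints:
# 500 []
# 100 ['AB']
```

When a category is flagged, combining it with a neighbouring category (and summing both the observed and expected counts) is the remedy suggested above.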

Textbook Examples

Social Science

A sociologist tests whether births are equally distributed across the four quarters of the year (n = 400 births).

Data: Q1 = 110, Q2 = 95, Q3 = 88, Q4 = 107. Expected: 100 each.
Result: χ² = 3.18, df = 3, p = 0.36.
Interpretation: No significant departure from a uniform distribution; births appear evenly spread across quarters.

Medicine

A genetics lab checks whether observed blood-type frequencies in 500 donors match the expected population distribution (O: 44%, A: 42%, B: 10%, AB: 4%).

Data: O = 235, A = 195, B = 50, AB = 20. Expected: 220, 210, 50, 20.
Result: χ² = 2.09, df = 3, p = 0.55.
Interpretation: The observed frequencies are consistent with the expected blood-type distribution (p = 0.55).

Engineering

A quality engineer tests whether defects are equally likely across five production shifts (n = 250 defects).

Data: Shift A = 62, B = 43, C = 55, D = 48, E = 42. Expected: 50 each.
Result: χ² = 5.72, df = 4, p = 0.22.
Interpretation: There is no statistically significant difference in defect counts across shifts.

Education

A registrar tests whether student enrolments follow the university's target distribution across four faculties: Arts 30%, Science 25%, Engineering 25%, Business 20%.

Data (n = 600): Arts = 200, Science = 140, Engineering = 135, Business = 125. Expected: 180, 150, 150, 120.
Result: χ² = 4.60, df = 3, p = 0.20.
Interpretation: Enrolments do not significantly deviate from the target distribution (p = 0.20).

References

Agresti, A. (2013). Categorical Data Analysis (3rd ed.). Wiley. Chapter 1.
Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(302), 157–175.
Sharpe, D. (2015). Chi-square test is statistically significant: Now what? Practical Assessment, Research & Evaluation, 20(8), 1–10.
