Detailed analysis modal for the association between two categorical variables, including the chi-square test of independence, effect sizes, and optional exact tests for small samples.
When you click a Cat × Cat cell in the association matrix, Statulator opens a modal with four tabs:
You can download the complete analysis as a PDF report or export the graph as a PNG image using the buttons in the modal footer.
A clinical trial has 400 patients assigned to Drug A, Drug B, or Placebo. The outcome is recorded as Improved or Not Improved. The researcher wants to know whether improvement rates differ across treatment groups.
1. Load the CSV on the Dataset Analysis page.
2. Click Select Variables, confirm both DrugGroup and Outcome are detected as Categorical, and save.
3. Click Stat Analysis to generate the matrix.
4. Click the cell at the intersection of DrugGroup and Outcome.
The Observed Frequencies table shows a 3 × 2 contingency table with row totals (per drug group) and column totals (Improved / Not Improved). The Expected Frequencies table shows what counts you would expect if treatment and outcome were independent.
The chi-square statistic is χ² = 18.5, df = 2, p < 0.001, indicating a highly significant association. Cramér’s V = 0.22 indicates a small-to-moderate effect size. The assumption check confirms all expected cell counts are above 5.
Since this is a 3 × 2 table (not 2 × 2), Fisher’s exact test and the odds ratio are not shown; these appear only for 2 × 2 tables.
The observed frequency table shows the actual count of observations in each combination of categories. The expected frequency table shows the counts you would see if the two variables were completely independent. Large discrepancies between observed and expected counts drive a large chi-square statistic.
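The expected counts described above follow from the row and column totals: each expected count is the row total times the column total, divided by the grand total. The sketch below illustrates this with a made-up 3 × 2 table; the counts and the function name `expected_frequencies` are hypothetical and are not taken from Statulator.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 3 x 2 table (rows: three groups, columns: two outcomes).
observed = [
    [70, 60],
    [65, 70],
    [45, 90],
]
expected = expected_frequencies(observed)

for obs_row, exp_row in zip(observed, expected):
    print(obs_row, [round(e, 2) for e in exp_row])
```

The expected table keeps the same row and column totals as the observed table, which is a quick sanity check on any implementation.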
The chi-square test of independence tests the null hypothesis that the two variables are independent. A small p-value (typically < 0.05) leads to rejecting independence and concluding that the variables are associated. The chi-square statistic itself is not directly interpretable as an effect size; use Cramér’s V for that.
Cramér’s V is a standardised effect size ranging from 0 (no association) to 1 (perfect association). Common benchmarks: V < 0.1 = negligible, 0.1–0.3 = small, 0.3–0.5 = medium, > 0.5 = large. These benchmarks depend on the table dimensions; for large tables, even moderate V values may be substantively meaningful.
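As a rough illustration of how the statistic and p-value can be computed, the sketch below sums (observed − expected)² / expected over all cells and evaluates the upper tail of the chi-square distribution using only the standard library. It assumes the plain Pearson statistic with no continuity correction; whether Statulator applies a correction is not stated here. The function names and the 2 × 2 counts are hypothetical.

```python
import math


def chi_square_statistic(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2             # closed form for df = 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1  # closed form for df = 1
    while k < df:
        # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q


# Hypothetical 2 x 2 table, for illustration only.
stat, df = chi_square_statistic([[10, 20], [20, 10]])
p_value = chi2_sf(stat, df)
print(round(stat, 3), df, round(p_value, 4))
```

Calling `chi2_sf(18.5, 2)` with the statistic from the clinical-trial walkthrough gives a value well below 0.001, in line with the reported result.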
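For reference, a minimal sketch of the usual definition, V = √(χ² / (n · min(r − 1, c − 1))), where n is the sample size and r and c are the numbers of rows and columns. This assumes the standard uncorrected formula; whether Statulator applies a bias correction is not stated here. The function name `cramers_v` is hypothetical.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V from a chi-square statistic, sample size, and table dimensions."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


# Statistic, sample size, and dimensions from the 3 x 2 clinical-trial walkthrough.
v = cramers_v(18.5, 400, 3, 2)
print(round(v, 2))
```

Because V divides by the sample size, the same χ² gives a smaller V in a larger study, which is why a highly significant p-value can coexist with a small effect.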
When the contingency table is exactly 2 × 2, Statulator also provides Fisher’s exact test. This is preferred over chi-square when expected cell counts are small (below 5). The exact p-value does not rely on the chi-square approximation.
For 2 × 2 tables, the odds ratio (OR) compares the odds of the outcome between the two groups, while the relative risk (RR) compares probabilities. Both are shown with 95 % confidence intervals. An OR or RR of 1.0 indicates no association; values above 1 indicate higher odds/risk in the first group.
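A minimal sketch of both measures for a 2 × 2 table with cells a, b (first group: outcome, no outcome) and c, d (second group). The odds ratio interval uses the log-transform with the 1.96 multiplier; the relative risk interval uses the common log-based (Katz) standard error, which is an assumption, since the method Statulator uses for RR is not stated here. Function names and counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a log-transform (Wald) confidence interval.

    Requires all four cells to be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


def relative_risk_ci(a, b, c, d, z=1.96):
    """Relative risk with a log-based (Katz) confidence interval (assumed method)."""
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    log_rr = math.log(rr)
    return rr, math.exp(log_rr - z * se), math.exp(log_rr + z * se)


# Hypothetical counts: group 1 has 20 with the outcome and 10 without; group 2 has 10 and 20.
or_est, or_lo, or_hi = odds_ratio_ci(20, 10, 10, 20)
rr_est, rr_lo, rr_hi = relative_risk_ci(20, 10, 10, 20)
print(round(or_est, 2), round(or_lo, 2), round(or_hi, 2))
print(round(rr_est, 2), round(rr_lo, 2), round(rr_hi, 2))
```

An interval that excludes 1.0 corresponds to a significant association at the 5 % level. Note that the OR is further from 1 than the RR whenever the outcome is common, as in this illustration.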
The chi-square approximation is reliable when all expected cell counts are at least 5. If any expected count falls below 5, it is highlighted in red in the expected frequency table, and the assumption check will show a warning. In this case, Fisher’s exact test (for 2 × 2 tables) is recommended, or consider collapsing sparse categories.
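The check described above can be sketched as a scan of the expected table for cells below the threshold. The function name `low_expected_cells`, the default threshold argument, and the counts are hypothetical; this only illustrates the rule and is not Statulator's implementation.

```python
def low_expected_cells(observed, threshold=5.0):
    """Return (row, column, expected) for every cell whose expected count is below threshold."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n
            if e < threshold:
                flagged.append((i, j, e))
    return flagged


# Hypothetical sparse 2 x 2 table: the first row has only 15 observations.
flagged = low_expected_cells([[3, 12], [7, 78]])
if flagged:
    print("Warning: expected count below 5 in cells", flagged)
else:
    print("All expected counts are at least 5")
```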
95 % CI for OR via log-transform:
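\[ \ln(\text{OR}) \pm 1.96\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}} \]
Here a, b, c, and d are the four cell counts of the 2 × 2 table; exponentiate both limits to obtain the interval on the OR scale.
Fisher’s exact test: the two-sided p-value is the sum of all table probabilities ≤ the observed probability.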
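The two-sided Fisher rule can be sketched with the hypergeometric distribution: with the margins fixed, enumerate every possible value of the top-left cell, compute each table's probability, and sum those no larger than the observed table's probability. The function name and the small relative tolerance for floating-point ties are assumptions of this sketch, not details taken from Statulator.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo = max(0, col1 - row2)  # smallest feasible top-left count
    hi = min(row1, col1)      # largest feasible top-left count
    p_obs = prob(a)
    # Tolerance so tables tied with the observed one are not lost to rounding.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical small table where chi-square would be unreliable.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

Because every feasible table is enumerated, this approach is exact for small samples but becomes slow for very large counts, where the chi-square approximation is adequate anyway.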
Association between Smoking Status (Never, Former, Current) and Lung Cancer (Yes, No) in 1,200 hospital patients.
Results: χ² = 42.3, df = 2, p < 0.001, Cramér's V = 0.19. Expected counts all above 5.
Interpretation: A significant association exists between smoking status and lung cancer, though V = 0.19 indicates a small effect. Current smokers had the highest observed-to-expected ratio.
Relationship between Teaching Method (Lecture, Active Learning) and Pass/Fail outcome in 300 students.
Results: χ² = 6.4, df = 1, p = 0.011, OR = 1.85 [1.15, 2.98], Fisher's exact p = 0.013.
Interpretation: Active learning students had 85 % higher odds of passing. Both chi-square and Fisher’s exact test agree. The 2 × 2 table allows odds ratio and relative risk reporting.
Association between Soil Type (Clay, Loam, Sand) and Disease Presence (Diseased, Healthy) across 500 plants.
Results: χ² = 28.1, df = 2, p < 0.001, V = 0.24.
Interpretation: Disease incidence differs significantly across soil types. Sand soil had a higher-than-expected disease rate, suggesting soil drainage may play a role.
Gender (Male, Female) vs preferred news source (TV, Online, Print, Radio) in 2,000 survey respondents.
Results: χ² = 15.7, df = 3, p = 0.001, V = 0.09. One cell (Female × Radio) had expected count 4.2.
Interpretation: A statistically significant but very small association. The low expected count in one cell is flagged; for this 2 × 4 table, the chi-square approximation is still reasonably valid because only one of the eight cells falls below 5, and only marginally (4.2).