Num × Cat — Group Comparisons

Detailed analysis modal for comparing a numeric variable across the levels of a categorical variable, using parametric and non-parametric tests with effect sizes and assumption checks.

Overview

When you click a Num × Cat cell in the association matrix, Statulator opens a modal with four tabs.

You can download the complete analysis as a PDF report or export the graph as a PNG image using the buttons in the modal footer.

Worked Example

Scenario: Exam Scores by Teaching Method

An educator has data from 300 students across three teaching methods: Lecture, Flipped Classroom, and Problem-Based Learning. The numeric outcome is the final exam score (0–100). The question: does exam performance differ by teaching method?

Steps in Statulator:

1. Load the CSV on the Dataset Analysis page.

2. Click Select Variables, confirm ExamScore is Numeric and Method is Categorical, and save.

3. Click Stat Analysis to generate the matrix.

4. Click the cell at the intersection of ExamScore and Method.

What you will see (3 groups → ANOVA):

The Group Descriptives table shows 100 students per group. Lecture: mean 68.2 (SD 12.1); Flipped: mean 74.5 (SD 11.3); PBL: mean 72.1 (SD 13.0).

One-way ANOVA: F(2, 297) = 7.62, p < 0.001, η² = 0.049 (small effect). Kruskal-Wallis H = 14.1, df = 2, p < 0.001 (confirms the parametric result).

Levene’s test: F(2, 297) = 1.04, p = 0.35 — equal variances assumption is met. Shapiro-Francia per group: all pass (p > 0.05), so the ANOVA assumptions are satisfied.
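As a quick sanity check, the reported degrees of freedom and the effect-size label follow directly from the group count, the sample size, and the η² benchmarks used throughout this guide. A minimal Python sketch (the `classify_eta2` helper is our own name, not part of Statulator):

```python
# ANOVA degrees of freedom for k groups and N total observations
k, N = 3, 300                          # three teaching methods, 300 students
df_between, df_within = k - 1, N - k   # -> F(2, 297) as reported above

def classify_eta2(eta2: float) -> str:
    """Label an eta-squared value using the benchmarks from this guide."""
    if eta2 < 0.01:
        return "negligible"
    if eta2 < 0.06:
        return "small"
    if eta2 < 0.14:
        return "medium"
    return "large"

label = classify_eta2(0.049)  # the worked example's eta-squared
```

With η² = 0.049 falling in the 0.01–0.06 band, the helper returns "small", matching the modal's label.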

What changes with 2 groups:

If Method had only two levels (e.g., Lecture vs Flipped), you would instead see Student’s t-test, Welch’s t-test, Mann-Whitney U, Cohen’s d, and the mean difference with a 95% confidence interval.

Interpretation Guide

Group Descriptives

The descriptive table gives a quick overview of each group’s central tendency and spread. Compare means and medians: if they diverge substantially within a group, that group’s distribution may be skewed.
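The same per-group summaries can be reproduced with the Python standard library alone. The scores below are hypothetical, chosen only to illustrate the mean-versus-median skew check; this is a sketch, not Statulator's internal code:

```python
from statistics import mean, median, stdev

# Hypothetical exam scores per group (illustrative only)
groups = {
    "Lecture": [62, 70, 68, 75, 65],
    "Flipped": [71, 80, 74, 77, 83],
}

summary = {}
for name, scores in groups.items():
    summary[name] = {
        "mean": mean(scores),
        "median": median(scores),
        "sd": stdev(scores),  # sample SD (n - 1 denominator)
        # A large gap between mean and median suggests a skewed group
        "skew_hint": mean(scores) - median(scores),
    }
```

For the symmetric Lecture sample the mean and median coincide, so the skew hint is zero.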

Parametric Tests (2 groups)

Student’s t-test (pooled): Assumes equal variances in both groups. Use this when Levene’s test is non-significant (p > 0.05).

Welch’s t-test: Does not assume equal variances. Preferred when Levene’s test is significant or group sizes are unequal. In practice, Welch’s test is a safe default.

Parametric Test (3+ groups)

One-way ANOVA: Tests whether at least one group mean differs from the others. A significant result tells you there is a difference somewhere but not which specific groups differ. η² (eta-squared) quantifies the proportion of total variance explained by group membership: < 0.01 negligible, 0.01–0.06 small, 0.06–0.14 medium, > 0.14 large.

Non-Parametric Tests

Mann-Whitney U (2 groups): Compares the rank distributions of two groups. Use when normality is violated or with ordinal data.

Kruskal-Wallis H (3+ groups): Extends the Mann-Whitney concept to three or more groups. Use when ANOVA assumptions are not met.

If the parametric and non-parametric tests agree, the conclusion is robust to the distributional assumptions. If they disagree, prefer the non-parametric result when assumptions are violated.

Effect Sizes

Cohen’s d (2 groups): The standardised mean difference. Benchmarks: 0.2 = small, 0.5 = medium, 0.8 = large (Cohen, 1988).

η² (3+ groups): The proportion of total variance attributed to group membership. Benchmarks: 0.01 = small, 0.06 = medium, 0.14 = large.

Mean Difference & 95% CI (2 groups)

The confidence interval for the mean difference gives the range of plausible values for the true difference between population means. If the CI excludes zero, the difference is statistically significant at the 5% level.
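The interval is built from the pooled standard error of the difference. A stdlib-only sketch with two hypothetical samples; the critical value t₀.₉₇₅,₁₈ ≈ 2.101 is hardcoded from a t table rather than computed:

```python
import math
from statistics import mean, variance

# Hypothetical samples (n1 = n2 = 10), for illustration only
g1 = [82, 76, 90, 68, 74, 85, 79, 88, 72, 81]
g2 = [70, 65, 78, 60, 72, 66, 75, 69, 63, 71]

n1, n2 = len(g1), len(g2)
diff = mean(g1) - mean(g2)
# Pooled variance, then the standard error of the mean difference
sp2 = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
t_crit = 2.101  # two-sided critical value for t with 18 df (from a t table)
ci = (diff - t_crit * se, diff + t_crit * se)
significant = not (ci[0] <= 0 <= ci[1])  # CI excludes zero <-> p < 0.05
```

Here the interval lies entirely above zero, so the pooled t-test on the same data would be significant at the 5% level.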

Assumption Checks

Levene’s test: Tests whether group variances are equal. A Pass (p > 0.05) supports the equal-variance assumption. A Fail (p ≤ 0.05) means variances differ; use Welch’s t-test (2 groups) or note the ANOVA assumption violation (3+ groups).

Shapiro-Francia per group: Tests normality within each group. Failures suggest the non-parametric alternative may be more appropriate, especially with small samples.

Formulas

Two-Sample t-Test

Pooled (equal variances):

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}, \quad s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} \]
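The pooled statistic translates line-for-line into Python. The sketch below uses only the standard library with made-up sample data; it illustrates the formula, not necessarily Statulator's implementation:

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """Student's two-sample t statistic with pooled SD (equal variances)."""
    n1, n2 = len(x), len(y)
    sp = math.sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y))
                   / (n1 + n2 - 2))
    t = (mean(x) - mean(y)) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, sp, n1 + n2 - 2  # statistic, pooled SD, degrees of freedom

# Hypothetical samples, for illustration only
g1 = [82, 76, 90, 68, 74, 85, 79, 88, 72, 81]
g2 = [70, 65, 78, 60, 72, 66, 75, 69, 63, 71]
t_stat, s_p, df = pooled_t(g1, g2)
```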

Welch (unequal variances):

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \quad \text{df} = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}} \]
Cohen’s d

\[ d = \frac{\bar{x}_1 - \bar{x}_2}{s_p} \]

Small ≈ 0.2, Medium ≈ 0.5, Large ≈ 0.8 (Cohen, 1988).
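A small sketch computing d with the pooled SD and mapping it onto the Cohen (1988) benchmarks (the `label_d` helper and the sample data are our own illustration):

```python
import math
from statistics import mean, variance

def cohens_d(x, y):
    """Standardised mean difference using the pooled SD."""
    n1, n2 = len(x), len(y)
    sp = math.sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y))
                   / (n1 + n2 - 2))
    return (mean(x) - mean(y)) / sp

def label_d(d):
    """Cohen (1988) benchmarks: 0.2 small, 0.5 medium, 0.8 large."""
    a = abs(d)
    if a < 0.2:
        return "negligible"
    if a < 0.5:
        return "small"
    if a < 0.8:
        return "medium"
    return "large"

# Hypothetical samples, for illustration only
g1 = [82, 76, 90, 68, 74, 85, 79, 88, 72, 81]
g2 = [70, 65, 78, 60, 72, 66, 75, 69, 63, 71]
d = cohens_d(g1, g2)
```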

One-Way ANOVA

\[ F = \frac{\text{MSB}}{\text{MSW}} = \frac{\text{SS}_B/(k-1)}{\text{SS}_W/(N-k)} \]
  • \(k\) = number of groups;   \(N\) = total observations.
  • \(\eta^2 = \text{SS}_B / \text{SS}_T\) is used as the effect size.
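The F statistic and η² follow directly from the between- and within-group sums of squares. A stdlib-only sketch on three small hypothetical groups with clearly separated means:

```python
from statistics import mean

def one_way_anova(groups):
    """F statistic and eta-squared from between/within sums of squares."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k, n_total = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    eta2 = ss_between / (ss_between + ss_within)  # SS_B / SS_T
    return f, eta2, (k - 1, n_total - k)

# Hypothetical data: group means 5, 8, 11 -> most variance is between groups
f_stat, eta2, dfs = one_way_anova([[4, 5, 6], [7, 8, 9], [10, 11, 12]])
```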
Eta (η) Correlation Ratio

\[ \eta = \sqrt{\frac{\text{SS}_{\text{between}}}{\text{SS}_{\text{total}}}}, \quad \eta^2 = \frac{\sum_{j} n_j(\bar{x}_j - \bar{x})^2}{\sum_{i}(x_i - \bar{x})^2} \]
  • \(\bar{x}_j\) = mean of group \(j\);   \(n_j\) = group size;   \(\bar{x}\) = grand mean.
Mann-Whitney U Test

\[ U = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1 \]
  • \(R_1\) = sum of ranks in group 1.
  • Z-approximation: \(Z = \frac{U - n_1 n_2/2}{\sqrt{n_1 n_2(n_1+n_2+1)/12}}\).
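The rank-sum mechanics can be implemented directly. The sketch below ranks the pooled data (ties share the average rank), then applies the U formula and its normal approximation; the tie correction to the Z variance is omitted for brevity, and the toy data are illustrative:

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j + 2) / 2  # average of ranks i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = shared
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """U for group 1 and its Z approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    r = average_ranks(list(x) + list(y))
    r1 = sum(r[:n1])  # rank sum of group 1
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, z

# Toy data: the two groups interleave, so U sits near its mean of n1*n2/2
u_stat, z_stat = mann_whitney_u([1, 3, 5, 7], [2, 4, 6, 8])
```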
Kruskal-Wallis H Test

\[ H = \frac{12}{N(N+1)}\sum_{j=1}^{k}\frac{R_j^2}{n_j} - 3(N+1) \]
  • \(R_j\) = sum of ranks in group \(j\);   compared to \(\chi^2\) distribution with \(k-1\) df.
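The H statistic is a short computation once the pooled data are ranked. This sketch uses a simple ranking that assumes no tied values (a full implementation would use average ranks and a tie correction), on toy data where the three groups occupy disjoint rank ranges:

```python
def kruskal_wallis_h(groups):
    """H statistic; simple ranking that assumes no tied values."""
    pooled = sorted(x for g in groups for x in g)
    rank_of = {v: i + 1 for i, v in enumerate(pooled)}  # 1-based ranks
    n_total = len(pooled)
    h = (12 / (n_total * (n_total + 1))) * sum(
        sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n_total + 1)
    return h

# Toy data: fully separated groups give a large H
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# h_stat would then be compared to chi-squared with k - 1 = 2 df
```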
Levene’s Test (Median-Based)

\[ W = \frac{(N-k)\sum_{j} n_j(\bar{z}_{j\cdot} - \bar{z}_{\cdot\cdot})^2}{(k-1)\sum_{j}\sum_{i}(z_{ij} - \bar{z}_{j\cdot})^2} \]
  • \(z_{ij} = |x_{ij} - \widetilde{x}_j|\) where \(\widetilde{x}_j\) is the group median. Compared to \(F_{k-1,\,N-k}\).
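The median-based variant (also known as the Brown-Forsythe test) can be sketched as follows; the two toy groups deliberately have very different spreads so that W is large:

```python
from statistics import median

def levene_w(groups):
    """Median-based Levene (Brown-Forsythe) W statistic."""
    k = len(groups)
    # z_ij = |x_ij - group median|
    z = [[abs(x - median(g)) for x in g] for g in groups]
    n_total = sum(len(g) for g in groups)
    z_group_means = [sum(zj) / len(zj) for zj in z]
    z_grand = sum(x for zj in z for x in zj) / n_total
    num = (n_total - k) * sum(
        len(zj) * (zm - z_grand) ** 2
        for zj, zm in zip(z, z_group_means)
    )
    den = (k - 1) * sum(
        (x - zm) ** 2
        for zj, zm in zip(z, z_group_means) for x in zj
    )
    return num / den  # compared against F with (k-1, N-k) df

# Toy data: the second group's spread is ten times the first's
w_stat = levene_w([[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]])
```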

Assumptions & Requirements

Textbook Examples

Medicine

Blood glucose levels (mg/dL) compared between Diabetic and Non-Diabetic patients (n = 600).

Results: Welch’s t = 12.3, p < 0.001, Cohen’s d = 1.42 (large). Mean difference: 45.2 mg/dL (95% CI: 38.0, 52.4). Levene’s p = 0.003 (unequal variances → Welch preferred). Mann-Whitney U confirms significance.
Interpretation: Diabetic patients have substantially higher blood glucose levels with a large effect size. Unequal variances support using the Welch variant.

Education

Test scores across 4 school types: Public, Private, Charter, Magnet (n = 800).

Results: ANOVA F(3, 796) = 11.2, p < 0.001, η² = 0.040. Kruskal-Wallis H = 32.1, p < 0.001. Levene’s p = 0.42 (equal variances). Shapiro-Francia: all groups pass.
Interpretation: Significant differences exist across school types, but the effect size is small (η² = 0.04). Both parametric and non-parametric tests agree. All assumptions are met.

Agriculture

Crop yield (tonnes/ha) by Fertilizer Type (Organic vs Chemical) across 150 plots.

Results: Student’s t = 2.95, p = 0.004, d = 0.52 (medium). Mean difference: 1.8 tonnes/ha (95% CI: 0.6, 3.0). Levene’s p = 0.18, Shapiro-Francia: both pass.
Interpretation: Chemical fertilizer produces significantly higher yields with a medium effect size. All assumptions are satisfied, so the pooled t-test is appropriate.

Social Science

Monthly income ($) across 3 employment sectors: Government, Private, Self-Employed (n = 1,500).

Results: ANOVA F(2, 1497) = 28.4, p < 0.001, η² = 0.037. Kruskal-Wallis H = 55.8, p < 0.001. Shapiro-Francia: Self-Employed group fails (right-skewed income).
Interpretation: Income differs significantly across sectors. The Shapiro-Francia failure in the Self-Employed group (income is typically right-skewed) makes the Kruskal-Wallis result more reliable here. The effect size is small.

References

  1. Student (1908). The probable error of a mean. Biometrika, 6(1), 1–25.
  2. Welch, B. L. (1947). The generalization of Student’s problem when several different population variances are involved. Biometrika, 34(1/2), 28–35.
  3. Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum.
  4. Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18(1), 50–60.
  5. Kruskal, W. H., & Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260), 583–621.
  6. Levene, H. (1960). Robust tests for equality of variances. In I. Olkin (Ed.), Contributions to Probability and Statistics (pp. 278–292). Stanford University Press.
  7. Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver & Boyd.