Detailed analysis modal for comparing a numeric variable across the levels of a categorical variable, using parametric and non-parametric tests with effect sizes and assumption checks.
When you click a Num × Cat cell in the association matrix, Statulator opens a modal with four tabs covering group descriptives, statistical tests, effect sizes, and assumption checks.
You can download the complete analysis as a PDF report or export the graph as a PNG image using the buttons in the modal footer.
An educator has data from 300 students across three teaching methods: Lecture, Flipped Classroom, and Problem-Based Learning. The numeric outcome is the final exam score (0–100). The question: does exam performance differ by teaching method?
1. Load the CSV on the Dataset Analysis page.
2. Click Select Variables, confirm ExamScore is Numeric and Method is Categorical, and save.
3. Click Stat Analysis to generate the matrix.
4. Click the cell at the intersection of ExamScore and Method.
The Group Descriptives table shows 100 students per group. Lecture: mean 68.2 (SD 12.1); Flipped: mean 74.5 (SD 11.3); PBL: mean 72.1 (SD 13.0).
One-way ANOVA: F(2, 297) = 7.62, p < 0.001, η² = 0.049 (small effect). Kruskal-Wallis H = 14.1, df = 2, p < 0.001 (confirms the parametric result).
Levene’s test: F(2, 297) = 1.04, p = 0.35 — equal variances assumption is met. Shapiro-Francia per group: all pass (p > 0.05), so the ANOVA assumptions are satisfied.
If Method had only two levels (e.g., Lecture vs Flipped), you would instead see Student’s t-test, Welch’s t-test, Mann-Whitney U, Cohen’s d, and the mean difference with a 95% confidence interval.
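To make the two-group statistics concrete, here is a minimal sketch of Welch’s t and Cohen’s d using only Python’s standard library. The data are hypothetical exam scores; this illustrates the textbook formulas, not Statulator’s internal implementation.

```python
import math
from statistics import mean, stdev

def welch_t(x, y):
    """Welch's t statistic: does not assume equal variances."""
    vx, vy = stdev(x) ** 2, stdev(y) ** 2
    return (mean(x) - mean(y)) / math.sqrt(vx / len(x) + vy / len(y))

def cohens_d(x, y):
    """Cohen's d: mean difference scaled by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    vx, vy = stdev(x) ** 2, stdev(y) ** 2
    sp = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mean(x) - mean(y)) / sp

# Hypothetical scores for two teaching methods
lecture = [62, 70, 68, 75, 66, 71, 64, 69]
flipped = [74, 78, 72, 80, 76, 73, 77, 75]
```

With these sample values, both statistics come out negative because the Lecture group scores lower on average than the Flipped group.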
The descriptive table gives a quick overview of each group’s central tendency and spread. Compare means and medians: if they diverge substantially within a group, that group’s distribution may be skewed.
Student’s t-test (pooled): Assumes equal variances in both groups. Use this when Levene’s test is non-significant (p > 0.05).
Welch’s t-test: Does not assume equal variances. Preferred when Levene’s test is significant or group sizes are unequal. In practice, Welch’s test is a safe default.
One-way ANOVA: Tests whether at least one group mean differs from the others. A significant result tells you there is a difference somewhere but not which specific groups differ. η² (eta-squared) quantifies the proportion of total variance explained by group membership: < 0.01 negligible, 0.01–0.06 small, 0.06–0.14 medium, > 0.14 large.
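The F statistic and η² follow directly from the between-group and within-group sums of squares. A small sketch (hypothetical data; not Statulator’s code):

```python
from statistics import mean

def one_way_anova(groups):
    """One-way ANOVA: returns (F, eta-squared) from raw group data."""
    all_vals = [v for g in groups for v in g]
    grand = mean(all_vals)
    # Between-group SS: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group SS: spread of values around their own group mean
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)  # variance explained
    return f, eta_sq
```

For example, `one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])` yields F = 3.0 and η² = 0.5.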
Mann-Whitney U (2 groups): Compares the rank distributions of two groups. Use when normality is violated or with ordinal data.
Kruskal-Wallis H (3+ groups): Extends the Mann-Whitney concept to three or more groups. Use when ANOVA assumptions are not met.
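The Kruskal-Wallis H statistic is computed from rank sums over the pooled data. A sketch of the basic formula (hypothetical data; this version assigns average ranks to ties but omits the tie-correction factor that full implementations apply):

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H from rank sums (no tie-correction factor)."""
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        # Find the run of tied values and give each the average rank
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        i = j
    h = 12 / (n * (n + 1)) * sum(
        rs ** 2 / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return h
```

For three clearly separated groups, e.g. `[[1, 2], [3, 4], [5, 6]]`, H ≈ 4.57.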
If the parametric and non-parametric tests agree, you can be confident in the result. If they disagree, prefer the non-parametric result when assumptions are violated.
Cohen’s d (2 groups): The standardised mean difference. Benchmarks: 0.2 = small, 0.5 = medium, 0.8 = large (Cohen, 1988).
η² (3+ groups): The proportion of total variance attributed to group membership. Benchmarks: 0.01 = small, 0.06 = medium, 0.14 = large.
The confidence interval for the mean difference gives the range of plausible values for the true difference between population means. If the CI excludes zero, the difference is statistically significant at the 5% level.
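A sketch of this interval using the Welch standard error. Note the simplification: it uses the normal quantile (≈1.96 at 95%) instead of the exact t quantile, which is a reasonable approximation for large groups but not what an exact implementation would do.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_diff_ci(x, y, level=0.95):
    """Approximate CI for the difference in means.

    Uses the Welch (unpooled) standard error and a normal quantile
    in place of the exact t quantile -- fine for large samples.
    """
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # ~1.96 for 95%
    diff = mean(x) - mean(y)
    return diff - z * se, diff + z * se
```

With hypothetical data `mean_diff_ci([9, 10, 11], [4, 5, 6])`, the interval is centred on the observed difference of 5 and excludes zero.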
Levene’s test: Tests whether group variances are equal. A Pass (p > 0.05) supports the equal-variance assumption. A Fail (p ≤ 0.05) means variances differ; use Welch’s t-test (2 groups) or note the ANOVA assumption violation (3+ groups).
Shapiro-Francia per group: Tests normality within each group. Failures suggest the non-parametric alternative may be more appropriate, especially with small samples.
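The idea behind Levene’s test can be sketched in a few lines: run a one-way ANOVA on the absolute deviations of each value from its group mean (this is the classic mean-centred variant; the Brown-Forsythe variant centres on medians instead). Hypothetical data; not Statulator’s implementation:

```python
from statistics import mean

def levene_w(groups):
    """Levene's W: ANOVA F on absolute deviations from group means."""
    devs = [[abs(v - mean(g)) for v in g] for g in groups]
    all_d = [d for g in devs for d in g]
    grand = mean(all_d)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in devs)
    ss_within = sum(sum((d - mean(g)) ** 2 for d in g) for g in devs)
    k, n = len(groups), len(all_d)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large W (small p) signals that the groups’ spreads differ; for `[[0, 1, 2], [0, 2, 4]]` the statistic is 0.8.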
Pooled (equal variances):
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}, \quad s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} \]
Welch (unequal variances):
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \quad \text{df} = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}} \]
Cohen’s d benchmarks: small ≈ 0.2, medium ≈ 0.5, large ≈ 0.8 (Cohen, 1988).
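The Welch-Satterthwaite degrees of freedom translate directly into code. A short sketch (hypothetical data); note that when variances and group sizes are equal, the formula recovers the pooled df of \(n_1 + n_2 - 2\):

```python
from statistics import stdev

def welch_df(x, y):
    """Welch-Satterthwaite degrees of freedom for the two-sample t-test."""
    a = stdev(x) ** 2 / len(x)  # s1^2 / n1
    b = stdev(y) ** 2 / len(y)  # s2^2 / n2
    return (a + b) ** 2 / (a ** 2 / (len(x) - 1) + b ** 2 / (len(y) - 1))
```

For two groups with identical variance and size, e.g. `welch_df([1, 2, 3, 4], [5, 6, 7, 8])`, the result is exactly 6 = 4 + 4 − 2; unequal variances pull the df below that pooled value.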
Blood glucose levels (mg/dL) compared between Diabetic and Non-Diabetic patients (n = 600).
Results: Welch’s t = 12.3, p < 0.001, Cohen’s d = 1.42 (large). Mean difference: 45.2 mg/dL (95% CI: 38.0, 52.4). Levene’s p = 0.003 (unequal variances → Welch preferred). Mann-Whitney U confirms significance.
Interpretation: Diabetic patients have substantially higher blood glucose levels with a large effect size. Unequal variances support using the Welch variant.
Test scores across 4 school types: Public, Private, Charter, Magnet (n = 800).
Results: ANOVA F(3, 796) = 11.2, p < 0.001, η² = 0.040. Kruskal-Wallis H = 32.1, p < 0.001. Levene’s p = 0.42 (equal variances). Shapiro-Francia: all groups pass.
Interpretation: Significant differences exist across school types, but the effect size is small (η² = 0.04). Both parametric and non-parametric tests agree. All assumptions are met.
Crop yield (tonnes/ha) by Fertilizer Type (Organic vs Chemical) across 150 plots.
Results: Student’s t = 2.95, p = 0.004, d = 0.52 (medium). Mean difference: 1.8 tonnes/ha (95% CI: 0.6, 3.0). Levene’s p = 0.18, Shapiro-Francia: both pass.
Interpretation: Chemical fertilizer produces significantly higher yields with a medium effect size. All assumptions are satisfied, so the pooled t-test is appropriate.
Monthly income ($) across 3 employment sectors: Government, Private, Self-Employed (n = 1,500).
Results: ANOVA F(2, 1497) = 28.4, p < 0.001, η² = 0.037. Kruskal-Wallis H = 55.8, p < 0.001. Shapiro-Francia: Self-Employed group fails (right-skewed income).
Interpretation: Income differs significantly across sectors. The Shapiro-Francia failure in the Self-Employed group (income is typically right-skewed) makes the Kruskal-Wallis result more reliable here. The effect size is small.