An interactive pairwise matrix that maps associations across all selected variables and provides detailed statistical tests on demand.
The Stat Analysis feature computes a pairwise association measure for every combination of your selected variables and displays the results as a colour-coded matrix. Each cell shows a summary statistic whose type depends on the variable pair: a correlation coefficient for numeric × numeric pairs, Cramér's V for categorical × categorical pairs, and an effect-size measure for numeric × categorical pairs.
Clicking any cell opens a detailed modal with the full battery of relevant statistical tests, effect sizes, confidence intervals, and an automatically generated plain-language interpretation. The matrix uses sticky headers so that row and column labels remain visible when scrolling through large datasets, and a progress bar is displayed during computation for datasets with many variables.
Each pair type opens a different analysis modal. See the dedicated help pages for a full walkthrough of every tab, statistic, and diagnostic plot:
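The dispatch described above, choosing a statistic per variable-type pair, could be assembled with pandas and SciPy roughly as follows. This is a minimal sketch, not Statulator's actual implementation; the helper names `cramers_v`, `eta`, and `association_matrix` are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

def cramers_v(x, y):
    """Cramér's V for two categorical series (0 = none, 1 = perfect)."""
    table = pd.crosstab(x, y)
    chi2 = stats.chi2_contingency(table, correction=False)[0]
    n = table.to_numpy().sum()
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k))

def eta(values, groups):
    """Correlation ratio (eta) for a numeric variable across groups."""
    grand_mean = values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2
                     for _, g in values.groupby(groups))
    ss_total = ((values - grand_mean) ** 2).sum()
    return np.sqrt(ss_between / ss_total)

def association(df, a, b):
    """Pick the statistic for one cell based on the variable types."""
    num_a = pd.api.types.is_numeric_dtype(df[a])
    num_b = pd.api.types.is_numeric_dtype(df[b])
    if num_a and num_b:                     # numeric x numeric
        return df[a].corr(df[b])            # Pearson r
    if not num_a and not num_b:             # categorical x categorical
        return cramers_v(df[a], df[b])
    num, cat = (a, b) if num_a else (b, a)  # mixed pair
    return eta(df[num], df[cat])

def association_matrix(df):
    cols = df.columns
    return pd.DataFrame([[association(df, a, b) for b in cols] for a in cols],
                        index=cols, columns=cols)
```

The diagonal of the resulting frame is 1.0 by construction, matching the matrix described above.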
An agronomist has a CSV with 150 plot records containing Yield (kg/ha, numeric), Rainfall (mm, numeric), SoilpH (numeric), FertilizerType (Organic, Chemical, None), and Region (North, South, East, West). They want a quick overview of which variables are associated with each other.
1 Open Dataset Analysis and load the CSV.
2 Click Select Variables, verify types, select all five variables, and click Save Changes.
3 Click the green Stat Analysis button.
4 A progress bar appears while the matrix is computed. Once complete, the 5 × 5 association matrix is displayed with colour-coded cells.
5 Hover over any cell to see a tooltip with variable names, types, and available analyses. Click the cell (e.g., Yield × FertilizerType) to open the detailed analysis modal.
The modal displays an independent two-sample t-test (if 2 groups) or one-way ANOVA (if 3+ groups), plus non-parametric alternatives (Mann-Whitney U or Kruskal-Wallis), Levene’s test for equality of variances, effect sizes (Cohen’s d or η²), and an auto-generated interpretation paragraph.
Darker, more saturated cell colours indicate stronger associations. The diagonal is always 1.0 (a variable is perfectly associated with itself). Look for off-diagonal cells with high absolute values to identify the strongest relationships in your data.
The modal presents both Pearson r (linear association) and Spearman ρ (monotonic association). If they diverge substantially, the relationship may be non-linear. The simple linear regression section provides the fitted equation, R², and a 95 % prediction band. The Shapiro-Wilk test checks normality of the residuals, a key assumption of the parametric tests. Common strength labels: |r| < 0.3 weak, 0.3–0.7 moderate, > 0.7 strong.
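The headline numbers in this modal can be reproduced with SciPy. The sketch below uses synthetic rainfall/yield data loosely modelled on the agronomy example above; it is illustrative, not Statulator's code.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
rainfall = rng.uniform(300, 900, 150)                 # synthetic predictor
yield_kg = 4.0 * rainfall + rng.normal(0, 300, 150)   # synthetic response

r, p_r = stats.pearsonr(rainfall, yield_kg)           # linear association
rho, p_rho = stats.spearmanr(rainfall, yield_kg)      # monotonic association

fit = stats.linregress(rainfall, yield_kg)            # slope, intercept, rvalue, ...
residuals = yield_kg - (fit.intercept + fit.slope * rainfall)
w, p_norm = stats.shapiro(residuals)                  # normality of residuals

print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
print(f"R^2 = {fit.rvalue ** 2:.2f}, Shapiro-Wilk p = {p_norm:.3f}")
```

A large gap between `r` and `rho` on real data would be the cue, noted above, to suspect a non-linear relationship.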
The chi-square test of independence tells you whether the two categorical variables are associated. Cramér’s V quantifies the strength (0 = no association, 1 = perfect). For 2 × 2 tables, the odds ratio (OR) and relative risk (RR) with 95 % CIs are also provided, alongside Fisher’s exact test for small expected cell counts. Check the note about cells with expected count < 5; if too many exist, Fisher’s exact test is more reliable.
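For a concrete sense of these quantities, here is a sketch with SciPy on a made-up 2 × 2 table (the counts are invented for illustration):

```python
import numpy as np
from scipy import stats

# rows: Smoker yes/no; columns: Diabetes yes/no (hypothetical counts)
table = np.array([[30,  70],
                  [15, 135]])

chi2, p, dof, expected = stats.chi2_contingency(table)
n = table.sum()
v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))    # Cramér's V

odds_ratio, p_fisher = stats.fisher_exact(table)    # exact test for small counts

a, b, c, d = table.ravel()
log_or = np.log((a * d) / (b * c))
se = np.sqrt(1/a + 1/b + 1/c + 1/d)                 # SE of ln(OR)
ci = np.exp([log_or - 1.96 * se, log_or + 1.96 * se])  # 95% CI via log-transform
```

The `expected` array returned by `chi2_contingency` is what the "expected count < 5" note refers to.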
When comparing a numeric variable across categorical groups: for 2 groups, the two-sample t-test (pooled and Welch) and Mann-Whitney U test are shown with Cohen’s d and the mean difference with 95 % CI. For 3+ groups, one-way ANOVA and Kruskal-Wallis replace them, with η² as the effect size. Levene’s test checks whether group variances are equal; if it is significant, prefer the Welch t-test (2 groups) or note the ANOVA assumption violation (3+ groups).
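The battery of tests just described maps onto standard SciPy calls. A sketch with synthetic yield data for three hypothetical fertilizer groups (not Statulator's own code):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
organic = rng.normal(5200, 400, 50)    # synthetic yields per group
chemical = rng.normal(5500, 450, 50)
none = rng.normal(4800, 420, 50)

# Two groups: pooled and Welch t-tests, Mann-Whitney U, Levene
t_pooled, p_pooled = stats.ttest_ind(organic, chemical)                 # equal variances
t_welch, p_welch = stats.ttest_ind(organic, chemical, equal_var=False)  # Welch
u, p_u = stats.mannwhitneyu(organic, chemical)
lev, p_lev = stats.levene(organic, chemical)

# Three or more groups: one-way ANOVA and Kruskal-Wallis
f, p_anova = stats.f_oneway(organic, chemical, none)
h, p_kw = stats.kruskal(organic, chemical, none)
```

A significant `p_lev` would be the signal, per the guidance above, to prefer `p_welch` over `p_pooled`.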
At the bottom of each modal, Statulator generates a plain-language summary. It reports the direction and strength of the relationship, statistical significance, key effect sizes, assumption checks, and recommendations for non-parametric alternatives when assumptions are violated. This is a starting point; always review the raw numbers and consider the context of your study.
95 % CI via Fisher z-transform:
\[ z = \tfrac{1}{2}\ln\!\left(\frac{1+r}{1-r}\right),\quad \text{SE}_z = \frac{1}{\sqrt{n-3}} \]
\[ z_{\text{lower,upper}} = z \pm 1.96\,\text{SE}_z, \quad r_{\text{lower,upper}} = \frac{e^{2z_*}-1}{e^{2z_*}+1} \]
Odds ratio: 95 % CIs computed via log-transform: \(\ln(\text{OR}) \pm 1.96\sqrt{1/a+1/b+1/c+1/d}\).
Fisher's exact test: the two-sided p-value is the sum of all table probabilities ≤ the observed table's probability.
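The Fisher z-transform interval translates directly into NumPy. A worked sketch (the helper name `pearson_ci` is hypothetical):

```python
import numpy as np

def pearson_ci(r, n, z_crit=1.96):
    """95% CI for Pearson r via the Fisher z-transform."""
    z = 0.5 * np.log((1 + r) / (1 - r))   # equivalently atanh(r)
    se = 1.0 / np.sqrt(n - 3)
    lo_z, hi_z = z - z_crit * se, z + z_crit * se
    # back-transform: (e^{2z} - 1) / (e^{2z} + 1), i.e. tanh(z)
    return np.tanh(lo_z), np.tanh(hi_z)

lo, hi = pearson_ci(0.55, 100)   # e.g. r = 0.55 from n = 100 records
```

Note the interval is asymmetric around r on the original scale, which is exactly why the transform is used.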
Pooled:
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}, \quad s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} \]
Welch:
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \quad \text{df} = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}} \]
Cohen's d thresholds: Small ≈ 0.2, Medium ≈ 0.5, Large ≈ 0.8 (Cohen, 1988).
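These formulas can be computed by hand and cross-checked against SciPy. An illustrative sketch with two small made-up samples:

```python
import numpy as np
from scipy import stats

x1 = np.array([5.1, 4.8, 5.6, 5.0, 5.3, 4.9])   # hypothetical group 1
x2 = np.array([4.2, 4.5, 4.0, 4.6, 4.1, 4.4])   # hypothetical group 2

n1, n2 = len(x1), len(x2)
s1, s2 = x1.var(ddof=1), x2.var(ddof=1)          # sample variances

# Pooled t and Cohen's d (d divides the mean difference by the pooled SD)
sp = np.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
t_pooled = (x1.mean() - x2.mean()) / (sp * np.sqrt(1/n1 + 1/n2))
d = (x1.mean() - x2.mean()) / sp

# Welch t and the Welch-Satterthwaite degrees of freedom
t_welch = (x1.mean() - x2.mean()) / np.sqrt(s1/n1 + s2/n2)
df = (s1/n1 + s2/n2)**2 / ((s1/n1)**2/(n1-1) + (s2/n2)**2/(n2-1))

# Cross-check the hand computation against SciPy's implementations
assert np.isclose(t_pooled, stats.ttest_ind(x1, x2).statistic)
assert np.isclose(t_welch, stats.ttest_ind(x1, x2, equal_var=False).statistic)
```

The Welch df is never larger than the pooled df of \(n_1 + n_2 - 2\); the two t statistics coincide when the sample variances and sizes are equal.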
Association matrix for a clinical dataset with Age (numeric), BMI (numeric), Smoker (categorical), and Diabetes (categorical).
Key findings: Age × BMI: Pearson r = 0.32 (weak positive). Smoker × Diabetes: χ² = 9.8, p = 0.002, Cramér's V = 0.18. BMI × Diabetes: t-test p < 0.001, Cohen's d = 0.65.
Interpretation: The matrix quickly identifies BMI as the strongest numeric predictor of diabetes status, while smoking is also significantly associated.
Association matrix for student data: GPA (numeric), Study Hours (numeric), Major (4 categories), Scholarship (yes/no).
Key findings: GPA × Study Hours: r = 0.55. GPA × Scholarship: t-test p < 0.001, d = 0.91. Major × Scholarship: χ² = 12.4, p = 0.006.
Interpretation: Scholarship holders have substantially higher GPAs (large effect). Study hours are moderately correlated with GPA. Scholarship rates differ across majors.
Association matrix for a field trial: Yield (numeric), Rainfall (numeric), Soil Type (3 categories), Fertilizer (2 categories).
Key findings: Yield × Rainfall: r = 0.68. Yield × Fertilizer: t-test p = 0.003, d = 0.52. Yield × Soil Type: ANOVA p < 0.001, η² = 0.22.
Interpretation: Soil type explains the most variance in yield (η² = 0.22), followed by rainfall. The new fertilizer also contributes a medium-sized effect.
Association matrix for a labour survey: Income (numeric), Years of Experience (numeric), Gender (2 categories), Industry (5 categories).
Key findings: Income × Experience: r = 0.61. Income × Gender: Welch t-test p < 0.001, d = 0.45. Income × Industry: Kruskal-Wallis p < 0.001.
Interpretation: Experience is the strongest linear predictor of income. The gender pay gap has a medium effect size. Income distributions vary significantly across industries (non-parametric test used due to heavy skew).