Detailed analysis modal for the relationship between two numeric (continuous) variables, including Pearson and Spearman correlations, simple linear regression, and regression diagnostics.
When you click a Num × Num cell in the association matrix, Statulator opens a modal with four tabs:
You can download the complete analysis (all four tabs) as a PDF report or export the scatter plot as a PNG image using the buttons in the modal footer.
A researcher has a dataset of 500 adults with Age (years) and SystolicBP (mmHg). They want to know whether older age is associated with higher blood pressure.
1. Load the CSV on the Dataset Analysis page.
2. Click Select Variables, confirm both variables are detected as Numeric, and save.
3. Click Stat Analysis to generate the 2 × 2 association matrix.
4. Click the cell at the intersection of Age and SystolicBP.
The Statistics tab shows Pearson r = 0.54 (95 % CI: 0.47, 0.60; p < 0.001) and Spearman ρ = 0.51, indicating a moderate positive correlation. The regression table shows SystolicBP = 95.2 + 0.63 × Age, with R² = 0.29, meaning age explains about 29 % of the variance in systolic blood pressure and each additional year of age is associated with a 0.63 mmHg higher predicted systolic pressure. The Shapiro-Francia test on the residuals may return Fail. This is common in large samples, where even small departures from normality are detected, so treat it as a prompt to inspect the diagnostic plots before relying on parametric inference.
The Graph tab displays the scatter plot with the regression line and 95 % confidence band, showing the upward trend clearly. The Diagnostics tab reveals whether residuals are well-behaved (no obvious patterns in the Residuals vs Fitted plot, points close to the diagonal in the Q-Q plot).
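The confidence interval reported for Pearson r in this example can be checked by hand with the Fisher z-transform. The sketch below is a minimal illustration, not Statulator's own code; the function name `pearson_ci` and its parameters are hypothetical.

```python
import math


def pearson_ci(r, n, z_crit=1.96):
    """Approximate CI for Pearson r via the Fisher z-transform.

    z_crit = 1.96 gives a 95 % interval. The name and signature are
    illustrative assumptions, not part of Statulator.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                  # equals 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)        # standard error on the z scale
    lower = math.tanh(z - z_crit * se)  # tanh maps z back to the r scale
    upper = math.tanh(z + z_crit * se)
    return lower, upper


lower, upper = pearson_ci(0.54, 500)
print(f"{lower:.2f}, {upper:.2f}")  # prints "0.47, 0.60"
```

The interval is not symmetric around r on the correlation scale, because the back-transformation compresses values as they approach ±1.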
Pearson r measures linear association between two variables, ranging from −1 (perfect negative) to +1 (perfect positive). Spearman ρ measures monotonic association based on ranks, so it is more robust to outliers and still captures relationships that are monotonic but not linear. If the two values diverge substantially, the relationship may be non-linear or a few outlying points may be influencing Pearson r.
Common strength labels: |r| < 0.3 = weak, 0.3–0.7 = moderate, > 0.7 = strong.
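To see how the two coefficients can differ, the following sketch computes both from scratch using only the Python standard library. The data are hypothetical (y is the cube of x, so the relationship is perfectly monotonic but not linear), and the helper names are illustrative rather than taken from Statulator.

```python
import math


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def strength_label(r):
    """Apply the common labels: < 0.3 weak, 0.3-0.7 moderate, > 0.7 strong."""
    size = abs(r)
    if size < 0.3:
        return "weak"
    if size <= 0.7:
        return "moderate"
    return "strong"


# Hypothetical data: monotonic but curved
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print("Pearson r:", round(pearson_r(x, y), 3))
print("Spearman rho:", round(spearman_rho(x, y), 3))
```

Here Spearman ρ is exactly 1 because the ranks agree perfectly, while Pearson r falls short of 1 because the points do not lie on a straight line.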
The intercept (β0) represents the predicted value of the response variable when the predictor is zero. The slope (β1) represents the expected change in the response for each one-unit increase in the predictor. Both estimates come with standard errors, t-statistics, and 95 % confidence intervals. A significant p-value for the slope indicates a statistically significant linear relationship.
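R² (coefficient of determination) indicates the proportion of variance in the response explained by the predictor. Adjusted R² penalises for the number of predictors; in simple regression, with a single predictor, it is slightly lower than R² and the difference shrinks as the sample size grows. The F-statistic tests whether the model fits significantly better than a model with no predictors; in simple regression it equals the square of the slope's t-statistic.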
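The quantities in the regression table follow from a few sums of squares. The sketch below uses the standard least-squares formulas for one predictor with a small hypothetical dataset. It omits p-values and confidence intervals, which require the t distribution, and the function name is an assumption made for illustration.

```python
import math


def simple_linear_regression(x, y):
    """Ordinary least squares with one predictor (illustrative sketch)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)      # total sum of squares

    slope = sxy / sxx                        # beta1
    intercept = my - slope * mx              # beta0
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    mse = sse / (n - 2)                      # residual variance estimate

    se_slope = math.sqrt(mse / sxx)
    se_intercept = math.sqrt(mse * (1 / n + mx ** 2 / sxx))
    r2 = 1 - sse / syy
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    f_stat = (syy - sse) / mse               # 1 and n - 2 degrees of freedom
    return {
        "intercept": intercept, "slope": slope,
        "se_intercept": se_intercept, "se_slope": se_slope,
        "t_slope": slope / se_slope,
        "r2": r2, "adj_r2": adj_r2, "f_stat": f_stat,
    }


# Hypothetical data
fit = simple_linear_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```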
The Shapiro-Francia test checks whether the regression residuals are normally distributed. A Pass (p > 0.05) means normality is not rejected. A Fail (p ≤ 0.05) suggests the residuals deviate from normality; consider the Spearman correlation as a non-parametric alternative or inspect the diagnostic plots for the nature of the departure.
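For readers who want to see what the test computes, here is a minimal sketch of one common formulation: W′ is the squared correlation between the ordered residuals and approximate expected normal order statistics (Blom scores), and the p-value uses Royston's (1993) normalising approximation. Statulator's own implementation is not shown on this page, so the constants and the sample-size limits below are assumptions to verify against a reference implementation.

```python
import math
import random
from statistics import NormalDist


def shapiro_francia(values):
    """Return (W', p) for a Shapiro-Francia normality test (sketch)."""
    x = sorted(values)
    n = len(x)
    if not 5 <= n <= 5000:
        raise ValueError("this approximation assumes 5 <= n <= 5000")
    std_normal = NormalDist()
    # Blom's approximation to the expected normal order statistics
    m = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mean_x, mean_m = sum(x) / n, sum(m) / n
    sxm = sum((a - mean_x) * (b - mean_m) for a, b in zip(x, m))
    sxx = sum((a - mean_x) ** 2 for a in x)
    smm = sum((b - mean_m) ** 2 for b in m)
    w = sxm ** 2 / (sxx * smm)          # squared correlation, at most 1
    if w >= 1.0:
        return 1.0, 1.0                 # perfectly normal-looking sample
    # Royston's approximation: ln(1 - W') is roughly normal (assumed constants)
    u = math.log(n)
    v = math.log(u)
    mu = -1.2725 + 1.0521 * (v - u)
    sigma = 1.0308 - 0.26758 * (v + 2 / u)
    z = (math.log(1 - w) - mu) / sigma
    p = 1 - std_normal.cdf(z)           # small p means evidence against normality
    return w, p


# Hypothetical residuals drawn from a skewed distribution
rng = random.Random(1)
residuals = [rng.expovariate(1.0) - 1.0 for _ in range(200)]
w, p = shapiro_francia(residuals)
print("W' =", round(w, 3), "->", "Pass" if p > 0.05 else "Fail")
```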
Residuals vs Fitted: Look for a random scatter around zero. A curved pattern suggests non-linearity; a funnel shape suggests heteroscedasticity.
Normal Q-Q: Points should lie close to the diagonal. Systematic departures in the tails indicate non-normal residuals (heavy tails, skewness).
Scale-Location: The square root of the absolute standardised residuals, plotted against the fitted values, should show no trend. An upward slope suggests increasing variance (heteroscedasticity).
Residuals vs Leverage: Identifies influential observations. Points with high leverage and large residuals (near Cook’s distance contours) may disproportionately affect the regression line.
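The values behind these four plots can be computed directly for a simple regression. The sketch below uses the standard textbook formulas for leverage, internally standardised residuals, and Cook's distance with p = 2 estimated coefficients. The data and function name are hypothetical, and Statulator may compute or scale these quantities differently.

```python
import math


def regression_diagnostics(x, y):
    """Per-observation diagnostics for a one-predictor regression (sketch)."""
    n, p = len(x), 2                      # p counts intercept and slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    resid = [b - f for b, f in zip(y, fitted)]
    mse = sum(e * e for e in resid) / (n - p)

    rows = []
    for xi, fi, ei in zip(x, fitted, resid):
        leverage = 1 / n + (xi - mx) ** 2 / sxx
        std_res = ei / math.sqrt(mse * (1 - leverage))
        cooks_d = std_res ** 2 * leverage / (p * (1 - leverage))
        rows.append({
            "fitted": fi,                              # x-axis of plots 1 and 3
            "residual": ei,                            # Residuals vs Fitted
            "std_residual": std_res,                   # Q-Q and leverage plots
            "sqrt_abs_std_residual": math.sqrt(abs(std_res)),  # Scale-Location
            "leverage": leverage,
            "cooks_d": cooks_d,
        })
    return rows


# Hypothetical data: the last point lies far from the others in x
x = [1, 2, 3, 4, 5, 20]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 25.0]
for row in regression_diagnostics(x, y):
    print({k: round(v, 3) for k, v in row.items()})
```

In this example the last observation has by far the highest leverage because its x value is far from the mean, which is the kind of point the Residuals vs Leverage plot is designed to flag.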
95 % CI via Fisher z-transform:
\[ z = \tfrac{1}{2}\ln\!\left(\frac{1+r}{1-r}\right),\quad \text{SE}_z = \frac{1}{\sqrt{n-3}} \]
\[ z_{\text{lower},\,\text{upper}} = z \mp 1.96\,\text{SE}_z, \quad r_{\text{lower},\,\text{upper}} = \frac{e^{2z_{\text{lower},\,\text{upper}}}-1}{e^{2z_{\text{lower},\,\text{upper}}}+1} \]
Relationship between patient age and serum cholesterol in 800 adults.
Results: Pearson r = 0.38 (moderate positive), regression: Cholesterol = 152 + 0.72 × Age, R² = 0.14. Shapiro-Francia: Pass (W = 0.998, p = 0.22).
Interpretation: Age explains about 14 % of cholesterol variation. The relationship is statistically significant but modest in predictive power.
Study hours per week vs final exam score for 200 university students.
Results: Pearson r = 0.63, Spearman ρ = 0.59, R² = 0.40. Residual diagnostics show mild heteroscedasticity in the Scale-Location plot.
Interpretation: Study hours have a moderate positive association with exam scores (r = 0.63, towards the upper end of the moderate range), explaining 40 % of the variance. The slight heteroscedasticity warrants caution but does not invalidate the main conclusion.
Rainfall (mm) vs crop yield (tonnes/ha) across 120 farm plots.
Results: Pearson r = 0.71, R² = 0.50. Residuals vs Fitted plot shows slight curvature, suggesting a possible non-linear component. Spearman ρ = 0.74.
Interpretation: A strong association between rainfall and yield. The higher Spearman ρ and curved residual pattern suggest a non-linear model might fit better.
Years of education vs annual income for 1,500 survey respondents.
Results: Pearson r = 0.52, Spearman ρ = 0.56, R² = 0.27. Shapiro-Francia: Fail (large sample), Q-Q plot shows right skew in residuals.
Interpretation: Education is moderately associated with income. The Shapiro-Francia failure in large samples is common and does not necessarily invalidate the regression; the Q-Q plot suggests income might benefit from a log transformation.