StatPlot — Plot Reference Guide

Quick reference for all 28 graph types available in StatPlot. Each entry covers what the plot shows, how to read it, and when to use it.

How to Use StatPlot

1. Upload a CSV on the Dataset Analysis page, then click the StatPlot button.

2. Click one or two variable chips to select them. The available graph types update automatically based on whether variables are numeric or categorical.

3. Choose a graph type card. The chart renders instantly in the preview area.

4. Open the Customise panel to change titles, colours, themes, axis ranges, and dimensions. Export as PNG or SVG when ready.

One Numeric Variable (7 plots)

Histogram

Definition

A histogram divides the range of a numeric variable into equal-width bins and draws a bar for each bin whose height represents the count (or frequency) of observations falling within that interval. The number of bins is determined automatically using Sturges' rule.
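As a rough illustration of the binning described above, the sketch below applies Sturges' rule in its common form, k = ceil(log2(n)) + 1, and counts observations per equal-width bin. The function names and the sample weights are hypothetical, and StatPlot's exact rounding and edge handling may differ.

```python
import math

def sturges_bin_count(n):
    """Sturges' rule (common form): k = ceil(log2(n)) + 1 bins for n observations."""
    return math.ceil(math.log2(n)) + 1

def histogram_counts(values):
    """Return (bin_edges, counts) for equal-width bins sized by Sturges' rule."""
    k = sturges_bin_count(len(values))
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * k
    for v in values:
        # Assumption: the maximum value is placed in the last bin.
        idx = min(int((v - lo) / width), k - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(k + 1)]
    return edges, counts

# Hypothetical product weights in grams.
weights = [48, 49, 50, 50, 51, 51, 51, 52, 53, 55]
edges, counts = histogram_counts(weights)
print(sturges_bin_count(len(weights)), counts)
```

With ten observations the rule gives five bins, and the tallest bar sits over the interval containing the target weight.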

Interpretation

Look at the overall shape: is the distribution symmetric, left-skewed, or right-skewed? A single peak suggests one dominant group; multiple peaks (bimodal or multimodal) may indicate subpopulations. Gaps between bars can signal unusual data structure or measurement artefacts.

Application

Checking normality before running a t-test. Exploring income data to see if it is right-skewed. Quality control to check whether product weights cluster around the target value.

Box Plot

Definition

A box plot (box-and-whisker plot) summarises a numeric variable using its quartiles: the first quartile (Q1), the median, and the third quartile (Q3). The box spans Q1 to Q3 (the interquartile range, IQR) with a line at the median. Each whisker extends to the most extreme data point within 1.5 × IQR of its box edge, and any values beyond the whiskers are plotted individually as outliers. When there are no outliers, the whisker ends are the minimum and maximum, so the plot shows the full five-number summary.
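The quantities behind a box plot can be sketched as follows. The quartile method (Python's "inclusive" interpolation) is an assumption, since several quartile conventions exist and this guide does not say which one StatPlot uses. The function name and sample scores are hypothetical.

```python
import statistics

def box_plot_stats(values):
    """Quartiles, whisker ends, and outliers using the 1.5 x IQR rule."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; other conventions give slightly different cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # most extreme point within the lower fence
        "whisker_high": inside[-1],  # most extreme point within the upper fence
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

# Hypothetical test scores with one extreme value.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
stats = box_plot_stats(scores)
print(stats)
```

Here the value 30 lies beyond the upper fence, so the upper whisker stops at 9 and 30 is drawn as an individual point.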

Interpretation

A median line centred in the box, with whiskers of similar length, suggests a symmetric distribution. If the median line sits closer to Q1, the data are right-skewed; if it sits closer to Q3, they are left-skewed. Individual points beyond the whiskers are potential outliers. The length of the box indicates the spread of the middle 50 % of observations.

Application

Identifying outliers in laboratory test results. Comparing pre- and post-treatment scores at a glance. Summarising exam grades when the sample is too small for a reliable histogram.

Density Plot

Definition

A density plot is a smoothed, continuous estimate of the probability density function of a numeric variable. It uses kernel density estimation (Gaussian kernel with Silverman's bandwidth) to produce a curve where the area under any interval equals the estimated probability of a value falling there.
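A minimal sketch of Gaussian kernel density estimation is given below. It assumes the widely used form of Silverman's rule of thumb, h = 0.9 × min(sd, IQR / 1.34) × n^(-1/5). Whether StatPlot uses exactly this variant is not stated here, and the function names and sample are hypothetical.

```python
import math
import statistics

def silverman_bandwidth(values):
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(values)
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd  # ignore a zero IQR
    return 0.9 * spread * n ** (-0.2)

def gaussian_kde(values, grid, bandwidth=None):
    """Evaluate the kernel density estimate at each point of grid."""
    h = bandwidth or silverman_bandwidth(values)
    norm = 1.0 / (len(values) * h * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
        for x in grid
    ]

sample = [2.1, 2.4, 2.5, 3.0, 3.2, 3.3, 4.1, 4.8]
step = 0.01
grid = [-5 + step * i for i in range(1701)]  # covers -5.0 to 12.0
density = gaussian_kde(sample, grid)
area = sum(density) * step  # Riemann sum of the curve; should be close to 1
print(round(area, 3))
```

The area check reflects the property in the definition: the total area under the curve is 1, so the area over any interval is an estimated probability.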

Interpretation

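The peak of the curve indicates the most likely value range. Multiple peaks suggest the presence of subgroups. Unlike a histogram, the density plot does not depend on bin width or bin placement, making it easier to compare distributional shapes across different samples. Its smoothness does depend on the bandwidth: a larger bandwidth can hide real peaks, and a smaller one can create spurious ones.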

Application

Overlaying distributions of two treatment groups for visual comparison. Assessing symmetry of residuals in regression diagnostics. Estimating modal values in continuous survey responses.

Violin Plot

Definition

A violin plot combines a mirrored density curve with a miniature box plot. The outer shape shows the kernel density estimate on both sides of a central axis, while the inner box plot marks the median and interquartile range. It reveals the full distributional shape that a box plot alone conceals.

Interpretation

Wide sections indicate where data points are concentrated; narrow "waists" indicate sparse regions. A bimodal violin (two bulges) signals two distinct subpopulations. Compare the width and symmetry of the violin to assess skewness, kurtosis, and multimodality at a glance.

Application

Visualising gene expression levels across tissue types in genomics. Showing response time distributions in psychology experiments. Presenting patient age distributions in clinical trial reports.

QQ Plot

Definition

A quantile-quantile (QQ) plot compares the observed quantiles of a sample against the theoretical quantiles of a normal distribution. Each point represents one observation: its x-coordinate is the expected value if the data were perfectly normal, and its y-coordinate is the actual observed value. A dashed reference line shows the expected pattern under normality.
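The coordinates of a normal QQ plot can be sketched as below. Two choices here are assumptions rather than documented StatPlot behaviour: the plotting positions (i - 0.5) / n, and the use of a normal distribution fitted with the sample mean and standard deviation so that x-values are on the data scale. The function name and sample are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (theoretical, observed) pairs for a normal QQ plot."""
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    # Assumption: plotting positions (i - 0.5) / n; other conventions exist.
    return [
        (fitted.inv_cdf((i - 0.5) / n), observed)
        for i, observed in enumerate(data, start=1)
    ]

sample = [4.1, 4.8, 5.0, 5.3, 6.2]
points = qq_points(sample)
for theoretical, observed in points:
    print(round(theoretical, 2), observed)
```

With these choices the reference line is y = x, and the middle point of an odd-sized sample has an x-coordinate equal to the sample mean.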

Interpretation

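If points fall along the reference line, the data are approximately normally distributed. Systematic curvature indicates skewness: points that rise above the line at both ends (a curve that bends upward) indicate right-skew; points that fall below the line at both ends (a curve that bends downward) indicate left-skew. An S-shaped departure, with the lower tail below the line and the upper tail above it, suggests heavy tails (leptokurtosis); the reverse pattern suggests light tails.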

Application

Verifying normality assumptions before parametric tests (t-test, ANOVA). Checking regression residuals for model adequacy. Assessing whether log-transformation successfully normalised a skewed variable.

Stem-and-Leaf

Definition

A stem-and-leaf display is a text-based plot that preserves the original data values while showing their distribution. Each number is split into a "stem" (leading digits) and a "leaf" (trailing digit). Stems are listed vertically and leaves are written horizontally beside their stem, sorted in ascending order.
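The splitting rule can be illustrated with a short sketch. It assumes non-negative integers with the tens as the stem and the units as the leaf. The function name and scores are hypothetical, and StatPlot's handling of decimals or larger numbers is not covered.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build display lines: stem = value // 10, leaf = value % 10."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

scores = [12, 15, 21, 21, 24, 40, 47]
lines = stem_and_leaf(scores)
print("\n".join(lines))
```

The empty row for stem 3 shows a gap in the thirties, and the repeated leaf 1 on stem 2 shows that the value 21 occurs twice.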

Interpretation

Read it like a sideways histogram where you can still see individual values. Rows with many leaves indicate concentration; sparse rows indicate gaps. Because exact values are visible, it is easy to spot clusters, outliers, and repeated values directly.

Application

Exploratory data analysis in small classroom datasets. Quick hand-drawn summaries during field data collection. Textbook exercises teaching distributional concepts without software.

Cleveland Dot Plot

Definition

In StatPlot, the Cleveland dot plot arranges individual data points along a numeric axis and stacks dots vertically where values overlap (a layout also known as a Wilkinson dot plot). Unlike a histogram, each dot represents one observation (or a small cluster), giving a precise view of the distribution without the binning decisions that histograms require.

Interpretation

Columns of stacked dots indicate common values. The overall envelope of dots shows the distribution shape. Isolated dots at the extremes are potential outliers. It is most useful for small to moderate sample sizes where individual points are distinguishable.

Application

Displaying survey ratings (e.g. 1–10 scale) where exact counts matter. Small clinical samples (n < 50) where histograms would be too coarse. Comparing before/after measurements in quality improvement projects.

One Categorical Variable (5 plots)

Bar Chart

Definition

A bar chart displays the frequency (count) of each category as a vertical bar. Categories are placed along the x-axis and bar heights represent how many observations belong to each category. Each bar is coloured distinctly from the active palette.

Interpretation

Compare bar heights to identify the most and least common categories. Large differences between bars indicate an imbalanced distribution. The count labels above each bar give the exact frequency.

Application

Summarising survey responses (e.g. "Excellent / Good / Fair / Poor"). Displaying disease counts by diagnostic category. Reporting defect types in manufacturing quality audits.

Pie Chart

Definition

A pie chart represents the proportion of each category as a slice of a circle. The angle of each slice is proportional to the category's share of the total. Labels show both the category name and percentage.

Interpretation

Useful when there are few categories (2–5) and the goal is to show parts of a whole. Avoid when categories have similar proportions, as small angular differences are hard to perceive. The percentage labels compensate for this perceptual limitation.

Application

Market share breakdown among a small number of competitors. Budget allocation across departments. Demographic composition (e.g. ethnicity distribution in a study sample).

Donut Chart

Definition

A donut chart is a variant of the pie chart with a hollow centre. It shows proportions identically to a pie chart but the empty centre can be used for annotations, totals, or simply to improve visual clarity by reducing the emphasis on angle comparisons.

Interpretation

Read identically to a pie chart — compare arc lengths and read percentage labels. The centre space can draw attention to a summary statistic (e.g. total N) placed inside. Preferred in modern dashboards for its cleaner appearance.

Application

Dashboard KPIs with a total figure in the centre. Infographic proportions in reports. Patient outcome categories in clinical summary posters.

Pareto Chart

Definition

A Pareto chart combines a bar chart sorted in descending frequency with a cumulative percentage line. Bars are ordered from most to least common, and the overlaid line shows the running total as a percentage on a secondary y-axis (0–100 %).
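The two series in a Pareto chart, sorted counts and cumulative percentages, can be computed as in this sketch. The function name and defect data are hypothetical.

```python
from collections import Counter

def pareto_table(categories):
    """Return (category, count, cumulative_percent) rows, most common first."""
    counts = Counter(categories).most_common()  # sorted by descending count
    total = sum(count for _, count in counts)
    running = 0
    table = []
    for name, count in counts:
        running += count
        table.append((name, count, round(100 * running / total, 1)))
    return table

# Hypothetical defect log.
defects = ["scratch"] * 50 + ["dent"] * 30 + ["stain"] * 15 + ["crack"] * 5
table = pareto_table(defects)
for row in table:
    print(row)
```

In this example the cumulative line reaches 80 % after the second bar, so scratches and dents together account for four fifths of all defects.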

Interpretation

The "Pareto principle" (80/20 rule) suggests that a small number of categories often account for the majority of occurrences. If the cumulative line reaches 80 % within the first two or three bars, those categories dominate and should be prioritised for action.

Application

Root cause analysis in Six Sigma to identify the top defect types. Customer complaint categorisation to prioritise service improvements. Hospital incident reporting to focus on the most frequent adverse event categories.

Waffle Chart

Definition

A waffle chart is a 10 × 10 grid of 100 squares where each square represents 1 % of the data. Squares are colour-coded by category, making it straightforward to see proportions as filled area. A legend maps colours to categories with exact percentages.
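Because percentages rarely round to exactly 100 squares, some allocation rule is needed. The sketch below uses the largest-remainder method, which is an assumption and not necessarily what StatPlot does. The function name and survey data are hypothetical.

```python
from collections import Counter

def waffle_squares(categories, total_squares=100):
    """Allocate grid squares to categories using the largest-remainder method."""
    counts = Counter(categories)
    n = sum(counts.values())
    exact = {k: total_squares * c / n for k, c in counts.items()}
    squares = {k: int(v) for k, v in exact.items()}  # round each share down
    leftover = total_squares - sum(squares.values())
    # Hand the remaining squares to the categories with the largest fractions.
    by_remainder = sorted(exact, key=lambda k: exact[k] - squares[k], reverse=True)
    for k in by_remainder[:leftover]:
        squares[k] += 1
    return squares

responses = ["yes", "yes", "no"]
squares = waffle_squares(responses)
print(squares)
```

A two-thirds share is 66.67 %, so "yes" receives the one leftover square and the grid still totals 100.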

Interpretation

Count coloured squares to estimate a category's share. The grid format makes it easier to compare proportions than a pie chart because area perception is more accurate than angle perception. It works best with 2–6 categories.

Application

Public health infographics (e.g. "23 out of 100 adults are obese"). Election result visualisations showing seat distribution. Fundraising progress displays where each square represents a milestone.

Two Numeric Variables (5 plots)

Scatter Plot

Definition

A scatter plot places one numeric variable on the x-axis and another on the y-axis, plotting each observation as a point. Point opacity scales automatically with sample size to handle overplotting in large datasets.

Interpretation

An upward trend suggests a positive association; a downward trend suggests a negative association. A cloud with no pattern indicates little or no linear relationship. Clusters of points may reveal subgroups, and isolated points are potential outliers or influential observations.

Application

Exploring the relationship between height and weight. Visualising dose-response data in pharmacology. Checking for linearity before fitting a regression model.

Regression Plot

Definition

A regression plot extends the scatter plot by overlaying the ordinary least squares (OLS) regression line. The line represents the best linear fit: ŷ = b₀ + b₁x, minimising the sum of squared residuals.
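The closed-form least squares estimates for ŷ = b₀ + b₁x, together with R², can be sketched as follows. The function name and data are hypothetical, and this is an illustration of the standard formulas, not StatPlot's source code.

```python
def ols_fit(xs, ys):
    """Return (b0, b1, r_squared) for the least squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx                # slope
    b0 = mean_y - b1 * mean_x     # intercept
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return b0, b1, 1 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1, r_squared = ols_fit(xs, ys)
print(round(b0, 2), round(b1, 2), round(r_squared, 2))
```

For these points the fitted line is ŷ = 2.2 + 0.6x with R² = 0.6, meaning the line explains 60 % of the variation in y.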

Interpretation

The sign of the slope shows the direction of the linear relationship, and its size shows the expected change in y for a one-unit increase in x. The strength of the relationship is shown by how tightly the points cluster around the line: tight scatter indicates a strong fit (high R²); wide scatter indicates a weak fit. Check whether points curve away from the line, which would suggest a non-linear relationship.

Application

Predicting blood pressure from age in epidemiology. Calibration curves in laboratory science. Forecasting sales from advertising spend.

Hexbin Plot

Definition

A hexbin plot bins data points into hexagonal cells and colours each hexagon by the number of points it contains. It is an alternative to scatter plots when the dataset is large (thousands of points) and individual points overlap heavily.

Interpretation

Darker (or warmer) hexagons indicate regions of high point density. The overall pattern of colour reveals the shape of the bivariate distribution — including linear trends, clusters, and outlier regions — without the visual clutter of overplotted points.

Application

Genomic data with millions of gene expression pairs. GPS coordinate density in urban mobility studies. Large survey datasets where scatter plots become unreadable.

Residual Plot

Definition

A residual plot shows the residuals (observed minus predicted values) from a linear regression on the y-axis against the fitted values on the x-axis. A dashed horizontal line at zero marks where residuals would be if the model were perfect.
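The plotted coordinates can be derived as in this sketch, which fits a simple least squares line and returns one (fitted, residual) pair per observation. The function name and data are hypothetical.

```python
def residuals_vs_fitted(xs, ys):
    """Fit y = b0 + b1 * x by least squares; return (fitted, residual) pairs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    points = []
    for x, y in zip(xs, ys):
        fitted = b0 + b1 * x
        points.append((fitted, y - fitted))  # residual = observed - predicted
    return points

points = residuals_vs_fitted([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for fitted, residual in points:
    print(round(fitted, 2), round(residual, 2))
```

For a least squares line with an intercept, the residuals always sum to zero, which is why the zero line is the natural reference.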

Interpretation

Residuals should scatter randomly around zero with constant spread. A funnel shape (wider on one side) indicates heteroscedasticity. A curved pattern suggests that a non-linear model would fit better. Clusters of large residuals may point to outliers or omitted variables.

Application

Regression diagnostics to verify constant variance. Detecting non-linearity that a scatter plot might obscure. Identifying influential outliers in predictive models.

Bubble Plot

Definition

A bubble plot is a scatter plot where overlapping points are grouped into circular "bubbles" whose size is proportional to the number of data points in that region. It replaces overplotted points with a single bubble, using area to encode local density.

Interpretation

Large bubbles mark regions where many observations cluster; small bubbles indicate sparse areas. The plot preserves the spatial relationship of a scatter plot while making density immediately visible. Compare bubble sizes to identify the most populated regions of the data space.

Application

Moderate-sized datasets (500–5,000 points) where scatter overplots but hexbin is too aggressive. Summarising repeated measurements at similar x-values. Presenting population-level trends in ecological studies.

Numeric × Categorical (7 plots)

Grouped Box Plot

Definition

A grouped box plot places one box-and-whisker plot per category side by side along the x-axis. Each box shows the median, IQR, whiskers, and outliers for the numeric variable within that group. Colours distinguish the groups.

Interpretation

Compare median positions across groups to see which group tends higher or lower. Boxes that overlap heavily suggest similar distributions, and clearly separated boxes suggest a group difference, but box overlap is not a formal test of significance. Differences in box length (IQR) indicate unequal variability across groups.

Application

Comparing blood glucose across diabetic, pre-diabetic, and control groups. Salary distributions by department. Crop yield across fertiliser treatments.

Strip / Jitter Plot

Definition

A strip plot (jitter plot) shows every individual data point, positioned along the y-axis for its numeric value and jittered horizontally within each category to reduce overlap. It is the most transparent plot for group comparisons because no data is hidden behind summary statistics.
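Horizontal jitter can be sketched as a small random offset around each category's position. The uniform offset, the default width of 0.2, and the fixed seed are all assumptions chosen for illustration, and the function name is hypothetical.

```python
import random

def jitter_positions(groups, width=0.2, seed=0):
    """Return (group, x, y) points with x jittered around the group's index."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    points = []
    for index, (name, values) in enumerate(groups.items()):
        for value in values:
            x = index + rng.uniform(-width, width)  # horizontal offset only
            points.append((name, x, value))
    return points

groups = {"placebo": [5.1, 5.1, 6.0], "drug": [7.2, 7.2, 7.9]}
points = jitter_positions(groups)
for name, x, y in points:
    print(name, round(x, 3), y)
```

Only the x-coordinate is perturbed, so the numeric values on the y-axis are plotted exactly and tied values become visible side by side.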

Interpretation

Dense clusters indicate the bulk of the distribution. Gaps or isolated points reveal outliers and multimodality. Because every point is visible, you can assess sample size per group at a glance and spot patterns (e.g. ceiling effects) that summaries would conceal.

Application

Small clinical trials (n < 30 per group) where individual patient responses matter. Pilot study data where summary statistics may be misleading. Behavioural experiment scores showing individual participant performance.

Mean ± SE Bar

Definition

A mean ± SE bar chart displays group means as bar heights with error bars representing ± 1 standard error of the mean. The standard error quantifies the precision of the sample mean estimate.
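The bar height and error bar length for each group follow from the usual formula SE = s / √n, where s is the sample standard deviation. The sketch below uses hypothetical group names and values.

```python
import math
import statistics

def mean_and_se(values):
    """Return (mean, standard error of the mean) with SE = s / sqrt(n)."""
    n = len(values)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)

groups = {"control": [4, 6, 8, 10], "treated": [9, 11, 12, 16]}
summary = {name: mean_and_se(values) for name, values in groups.items()}
for name, (m, se) in summary.items():
    print(name, m, round(se, 3))
```

Each error bar would then run from mean − SE to mean + SE. Because n appears under a square root, quadrupling the sample size halves the standard error.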

Interpretation

Error bars of ± 1 SE that overlap indicate that the difference between two groups is unlikely to be statistically significant; bars that do not overlap suggest (but do not prove) a difference, and a formal test is needed to confirm it. The height of the bar shows the central tendency; the error bar length reflects how much uncertainty surrounds that estimate. Smaller error bars indicate more precise estimates (often due to larger sample sizes).

Application

Presenting treatment vs control results in biomedical papers. Comparing mean scores across experimental conditions. Summary figures in ANOVA-based study designs.

Grouped Violin

Definition

A grouped violin plot creates one violin (mirrored density + mini box plot) per category. Each violin shows the full distributional shape of the numeric variable within that group, allowing simultaneous comparison of location, spread, and modality.

Interpretation

Compare violin widths at the same y-value to see which group has more data in that range. A violin with two bulges indicates a bimodal group. The inner box plot quickly shows the median and IQR. Wider violins overall indicate greater variability in that group.

Application

Comparing gene expression levels across cell types. Reaction time distributions in cognitive psychology experiments. Patient outcomes across hospital sites in multi-centre trials.

Raincloud Plot

Definition

A raincloud plot combines three elements per group: a half-violin (density curve on one side), a jittered strip of individual data points, and a miniature box plot. This "cloud + rain + umbrella" combination shows distribution shape, raw data, and summary statistics simultaneously, which makes it one of the most informative group-comparison plots in StatPlot.

Interpretation

The half-violin reveals the density shape; the jittered points show every observation; the box plot anchors the median and IQR. This combination lets you detect features (e.g. bimodality, gaps, outliers) that any single plot alone might miss. It is increasingly requested by journal reviewers in psychology and neuroscience.

Application

Psychology journal figures where raw data are expected alongside summary statistics. Neuroscience data showing trial-level responses per condition. Any group comparison where reviewers request "show the data."

Mean ± 95 % CI

Definition

A mean ± 95 % CI plot shows group means as dots with whiskers extending to the 95 % confidence interval of the mean. Unlike the mean ± SE plot, the confidence interval has a direct inferential interpretation: if two intervals do not overlap, the difference is likely statistically significant.
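The interval endpoints can be sketched as mean ± critical value × SE. The example below uses the normal critical value (about 1.96) as a large-sample approximation because Python's standard library has no t distribution. For small samples a t critical value with n − 1 degrees of freedom gives a wider interval, and this guide does not state which one StatPlot uses. The function name and data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_ci(values, level=0.95):
    """Return (lower, mean, upper) using a normal-approximation interval."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / math.sqrt(n)
    # Assumption: normal critical value; a t value is more accurate for small n.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return m - z * se, m, m + z * se

lower, centre, upper = mean_ci([4, 6, 8, 10])
print(round(lower, 2), centre, round(upper, 2))
```

The whiskers are symmetric about the dot, and their half-width is roughly twice the standard error shown in a mean ± SE chart.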

Interpretation

Non-overlapping 95 % confidence intervals provide strong visual evidence of a significant difference; overlapping intervals do not rule one out, because two intervals can overlap slightly while the difference is still significant. The dot position shows the point estimate; the whisker span shows the range of plausible population means. Wider intervals mean more uncertainty, often due to smaller sample sizes or higher variability.

Application

Forest-plot style summaries of subgroup analyses. Clinical trial primary endpoints comparing arms. Meta-analysis presentations where confidence intervals are the standard display.

Paired Line Plot

Definition

A paired line plot connects matched observations across exactly two groups with straight lines. Each line represents one subject (or matched pair), making it possible to see individual changes rather than just group-level summaries. Lines are coloured to indicate the direction of change (increase vs decrease).
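The direction-of-change colouring can be sketched by comparing each subject's two values. The label names, including a separate "no change" label for ties, are assumptions, as are the function name and data.

```python
def classify_changes(before, after):
    """Label each matched pair by the direction of change."""
    labels = []
    for b, a in zip(before, after):
        if a > b:
            labels.append("increase")
        elif a < b:
            labels.append("decrease")
        else:
            labels.append("no change")  # assumption: ties get their own label
    return labels

# Hypothetical pain scores for four patients, before and after treatment.
before = [82, 75, 90, 68]
after = [78, 75, 85, 70]
labels = classify_changes(before, after)
print(labels)
```

Each label would map to one line colour, so a plot dominated by a single colour indicates a consistent direction of change.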

Interpretation

If most lines slope in the same direction, there is a consistent effect. Lines sloping in opposite directions indicate subjects who changed in opposite directions, and crossing lines indicate subjects whose rank order changed between the two groups. The steepness of lines shows the magnitude of individual changes. This plot requires exactly two groups; it will display a warning otherwise.

Application

Before/after treatment studies (e.g. weight loss, pain score). Matched case-control designs comparing paired observations. Pre-post educational interventions showing individual student improvement.

Two Categorical Variables (4 plots)

Stacked Bar

Definition

A stacked bar chart divides each bar into coloured segments representing the categories of the second variable. The total bar height shows the combined count for each category of the first variable, and the segment heights show how the second variable is distributed within each group.

Interpretation

Compare total bar heights to see which groups are largest. Compare segment proportions within bars to see whether the composition of the second variable changes across groups. Consistent segment ratios suggest no association; changing ratios suggest an association between the two variables.

Application

Disease severity (mild/moderate/severe) across age groups. Product defect types across production lines. Survey response patterns across demographic groups.

Grouped Bar

Definition

A grouped (clustered) bar chart places bars for each category of the second variable side by side within each group of the first variable. Unlike stacked bars, grouped bars share a common baseline, making it easier to compare individual category counts directly.

Interpretation

Compare bar heights within each cluster to see which sub-category dominates. Compare the same colour across clusters to see how a particular sub-category varies across groups. It is more precise than stacked bars for individual comparisons but can become cluttered with many categories.

Application

Treatment outcome (success/failure) across multiple clinics. Brand preference across age groups in market research. Exam pass/fail rates across different courses or semesters.

Mosaic Plot

Definition

A mosaic plot divides a rectangle into tiles whose areas are proportional to the joint frequencies of two categorical variables. Column widths represent the marginal proportions of the first variable, and within each column, tile heights represent the conditional proportions of the second variable.
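The tile geometry can be sketched from a contingency table: column width is the marginal proportion, tile height is the conditional proportion, and their product is the joint proportion. The function name, table layout, and counts are hypothetical.

```python
def mosaic_tiles(table):
    """table maps first-variable category -> {second-variable category: count}.

    Returns {(column, category): (width, height)} as proportions in [0, 1].
    """
    grand_total = sum(sum(row.values()) for row in table.values())
    tiles = {}
    for column, row in table.items():
        column_total = sum(row.values())
        width = column_total / grand_total       # marginal proportion
        for category, count in row.items():
            height = count / column_total        # conditional proportion
            tiles[(column, category)] = (width, height)
    return tiles

# Hypothetical exposure x outcome counts.
table = {
    "exposed": {"ill": 30, "well": 10},
    "unexposed": {"ill": 15, "well": 45},
}
tiles = mosaic_tiles(table)
for key, (width, height) in tiles.items():
    print(key, round(width, 2), round(height, 2), round(width * height, 2))
```

The "ill" tile is 0.75 tall in the exposed column but only 0.25 tall in the unexposed column, which is the kind of unequal split that signals an association.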

Interpretation

If the two variables were independent, all columns would have the same colour distribution (the same tile heights in every column). Deviations from this pattern indicate an association. Wide columns indicate common categories; tall tiles within a column indicate dominant sub-categories for that group.

Application

Visualising chi-square results — a mosaic plot is the natural graphical companion to a contingency table. Epidemiological cross-tabulations (exposure × outcome). Log-linear model exploration in multiway tables.

Heatmap

Definition

A heatmap displays a cross-tabulation as a grid of coloured cells where the colour intensity of each cell encodes the count (frequency) at that row-column intersection. Rows represent categories of the first variable, columns represent categories of the second, and darker cells indicate higher counts.
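The grid of counts behind the heatmap can be sketched as below. Scaling each count by the largest cell to get a colour intensity between 0 and 1 is an assumption about how colours might be assigned, and the function name and records are hypothetical.

```python
from collections import Counter

def cross_tabulate(row_values, col_values):
    """Return (row_labels, col_labels, grid of counts) for two categorical lists."""
    counts = Counter(zip(row_values, col_values))
    row_labels = sorted(set(row_values))
    col_labels = sorted(set(col_values))
    # Counter returns 0 for combinations that never occur.
    grid = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, grid

diagnosis = ["flu", "flu", "cold", "cold", "flu"]
treatment = ["rest", "drug", "rest", "rest", "drug"]
row_labels, col_labels, grid = cross_tabulate(diagnosis, treatment)

# Assumption: colour intensity is the count divided by the largest count.
largest = max(max(row) for row in grid)
intensity = [[count / largest for count in row] for row in grid]
print(row_labels, col_labels, grid, intensity)
```

Cells with intensity 1.0 would be drawn darkest, and the empty cold × drug combination would be drawn lightest.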

Interpretation

Scan for the darkest cells to find the most common category combinations. A uniform colour across all cells suggests no association; clusters of dark and light cells suggest a pattern. The cell count labels (when enabled) give exact frequencies for precision.

Application

Confusion matrices in classification model evaluation. Cross-tabulating diagnosis × treatment in hospital records. Joint frequency tables of two binned numeric variables for initial variable screening.