Scatter Plots
What a scatter plot shows, how to read correlation, and why a pattern is never proof of cause.
A scatter plot answers one question better than any other chart: are these two numeric things related? Instead of summarising the data into bars or slices, it plots every observation as a single point, positioned by its value on the horizontal (x) axis and its value on the vertical (y) axis. The result is a cloud of dots whose shape tells the story — whether the two variables rise together, pull against each other, or drift independently.
How a scatter plot works
Both axes carry a continuous numeric scale — there are no categories. To place a point you take one observation that has been measured on two variables, read its first value along the x axis and its second value up the y axis, and drop a dot where they meet. Repeat for every observation and you have built the plot. Because position is the most precisely read visual cue we have, the eye can detect trends, gaps, and stragglers in the cloud almost instantly, even across hundreds of points.
The key insight is that you are not reading any single dot — you are reading the collective shape. One point on its own says little; a few hundred points arranged into a clear diagonal band says a great deal.
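To make the placement rule concrete, here is a minimal Python sketch that does by hand what plotting software does: it scales each observation's x and y values onto a small character grid and drops a mark where they meet. The function name `ascii_scatter`, the grid size, and the sample values are illustrative assumptions (the data is made up), not part of any library.

```python
def ascii_scatter(points, width=20, height=10, mark="*"):
    """Place each (x, y) observation on a character grid, one mark per point."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against a zero range when every value on an axis is identical.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1

    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each value linearly: x picks the column, y picks the row.
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # Text prints the top row first, so flip y to put large values at the top.
        grid[height - 1 - row][col] = mark
    return "\n".join("".join(line) for line in grid)


# Made-up paired measurements: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 60, 68, 71, 75, 80]
print(ascii_scatter(list(zip(hours, score))))
```

Even at this crude resolution the marks climb from lower left to upper right, which is exactly the collective shape the eye picks up.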
Reading correlation: the three basic shapes
Correlation describes the direction and tightness of that cloud: which way the points lean, and how consistently they follow that lean. Three patterns cover most of what you will see:
- Positive correlation. The cloud climbs from the lower-left to the upper-right: as the x value gets larger, the y value tends to get larger too. The points lean uphill.
- Negative correlation. The cloud falls from the upper-left to the lower-right: as x increases, y tends to decrease. The points lean downhill.
- No correlation. The points form a shapeless blob with no consistent slope. Knowing one variable tells you nothing useful about the other.
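Statistics software usually condenses these shapes into a single number, Pearson's correlation coefficient r, which runs from -1 (a perfect downhill line) through 0 (no straight-line trend) to +1 (a perfect uphill line). The sketch below is a minimal Python version of the standard formula; the helper name `pearson_r` and the values for each shape are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5, 6]
uphill = [2, 3, 5, 6, 8, 9]    # positive: y tends to rise with x
downhill = [9, 8, 6, 5, 3, 2]  # negative: y tends to fall as x rises
blob = [4, 7, 2, 6, 3, 5]      # no consistent slope

for name, ys in [("positive", uphill), ("negative", downhill), ("none", blob)]:
    print(name, round(pearson_r(x, ys), 2))
```

Python 3.10 and later also ship `statistics.correlation`, which computes the same coefficient without a hand-written helper.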
Two further qualities matter. Strength is how tightly the points hug an imaginary trend line: a narrow band is a strong relationship, a wide spread is a weak one. Form is whether the trend is a straight line or a curve — some variables rise quickly then level off, which a straight-line summary would miss.
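Both qualities show up in r, and so does its blind spot. In this sketch (a hand-written `pearson_r` helper; all values made up), a tight band and a loose band follow the same underlying uphill slope but get different r values, while a perfectly determined U-shaped curve scores r = 0 because the coefficient only measures straight-line association.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: measures straight-line association only."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


x = [-3, -2, -1, 0, 1, 2, 3]
# Same slope (2) plus made-up scatter: small offsets give a narrow band, large ones a wide spread.
tight = [2 * v + e for v, e in zip(x, [0.1, -0.2, 0.1, 0.0, -0.1, 0.2, -0.1])]
loose = [2 * v + e for v, e in zip(x, [3, -4, 2, -3, 4, -2, 1])]
# A curved form: y is fully determined by x, yet there is no single straight-line slope.
curve = [v ** 2 for v in x]

for name, ys in [("tight", tight), ("loose", loose), ("curve", curve)]:
    print(name, round(pearson_r(x, ys), 2))
```

The curve is the reason to look at the plot before trusting the number: a scatter plot shows the U shape immediately, while r alone reports "no relationship".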
Outliers and clusters
Beyond the overall trend, two features are worth hunting for. Outliers are points that sit far from the rest of the cloud — unusually high, unusually low, or simply away from the pattern. They can flag a data-entry error, a genuinely exceptional case, or the most interesting observation in the whole set, so they deserve a second look rather than automatic deletion.
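One simple way to shortlist points for that second look is to fit a least-squares line and flag observations whose vertical distance from it is much larger than typical. In this sketch the function names are hypothetical and the cut-off (`k = 3` times the median absolute residual) is an arbitrary illustrative choice; the median is used because a single extreme point inflates an average-based yardstick. The function only flags indices; it deletes nothing.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_outliers(xs, ys, k=3.0):
    """Return indices whose residual exceeds k times the median absolute residual."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    typical = statistics.median([abs(r) for r in residuals])
    return [i for i, r in enumerate(residuals) if abs(r) > k * typical]


xs = list(range(1, 11))
# Made-up values following roughly y = 2x, except the point at index 4.
ys = [2.3, 3.8, 6.1, 7.6, 25.0, 11.9, 14.3, 15.7, 18.2, 19.9]
print(flag_outliers(xs, ys))
```

Note that the suspicious point also pulls the fitted line towards itself, which is one more reason to inspect flagged points rather than trust any fit computed with them included.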
Clusters are clumps of points that group together, hinting that the data contains distinct subgroups. A cloud that splits into two separate clumps may mean you are unknowingly mixing two different populations, and analysing them together can hide or invent a relationship. Colouring points by a third attribute often makes such groups visible.
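The effect of mixing subgroups can be shown with a few made-up numbers. In the sketch below (hypothetical `pearson_r` helper), each group trends downhill on its own, yet pooling them produces a clear uphill correlation. This reversal under aggregation is an instance of what statisticians call Simpson's paradox.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up data: two subgroups, each falling as x rises.
group_a_x, group_a_y = [1, 2, 3, 4], [5, 4, 3, 2]
group_b_x, group_b_y = [6, 7, 8, 9], [12, 11, 10, 9]

# Analysing everything together hides the groups.
pooled_x = group_a_x + group_b_x
pooled_y = group_a_y + group_b_y

print("group A:", round(pearson_r(group_a_x, group_a_y), 2))
print("group B:", round(pearson_r(group_b_x, group_b_y), 2))
print("pooled: ", round(pearson_r(pooled_x, pooled_y), 2))
```

On a plot, colouring the two groups differently would reveal two downhill clumps sitting side by side, which is exactly the third-attribute trick described above.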
When to use a scatter plot
- Exploring a relationship. When you have paired measurements and want to know whether they move together before running any deeper analysis.
- Spotting outliers. The plot makes unusual observations leap out from the crowd.
- Showing many data points. Where a bar chart would need hundreds of bars, a scatter plot shows hundreds of points comfortably.
If your question is "does this go up when that goes up?" — and both "this" and "that" are numbers you measured on the same items — a scatter plot is the right tool. If one of your axes is really a set of named categories, you want a bar chart instead.
When not to use a scatter plot
Scatter plots are powerful but specialised, and the wrong job makes them confusing:
- One variable is categorical. If your x axis holds names rather than numbers, the horizontal positions are arbitrary and the cloud means nothing. Use a bar chart.
- You are tracking change over time. A series measured across ordered time periods is better connected with a line chart, which makes the trend and its pace explicit.
- You have only a handful of points. With five or six observations there is no cloud to read, and a small table or bar chart communicates more honestly.
- The data is heavily overlapping. When thousands of points pile onto the same spot, the cloud turns into a solid blob that hides density. Transparency or a binned heat-style view solves this better than raw dots.
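A binned view replaces individual dots with a count per grid cell, so density becomes visible again. The sketch below shows only the counting step, with an assumed square cell size and a hypothetical function name; plotting libraries such as matplotlib draw the shaded version with `hexbin` or `hist2d`, and their `scatter` function accepts an `alpha` parameter for the transparency approach.

```python
import math
from collections import Counter


def bin_counts(points, cell_size=1.0):
    """Count points per square grid cell; each key is the cell's (column, row) index."""
    counts = Counter()
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts


# Made-up data: 800 observations stacked on two nearly identical positions,
# plus 3 stragglers. Drawn as raw dots, this shows up as only three dots.
points = [(1.0, 1.0)] * 500 + [(1.2, 1.1)] * 300 + [(5.5, 4.2)] * 3
for cell, n in sorted(bin_counts(points).items()):
    print(cell, n)
```

The counts make the imbalance obvious (hundreds of observations in one cell, three in another) where the raw scatter would give both areas roughly equal visual weight.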
The mistake that matters most: correlation is not causation
A scatter plot can show two variables moving together, but it can never prove that one causes the other. The cause might run the opposite way, both might be driven by an unseen third factor, or a tidy diagonal might be pure coincidence in a small sample. Treat a strong pattern as a question worth investigating — not as an answer. Watch too for a single extreme outlier dragging an apparent trend out of thin air, and for hidden clusters faking a relationship that vanishes once the groups are separated.
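The outlier trap is easy to reproduce. In this sketch (hypothetical `pearson_r` helper; values made up), six points with no consistent slope have r close to zero, and adding a single extreme point far from the rest pushes r above 0.9.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5, 6]
y = [4, 7, 2, 6, 3, 5]  # made-up: a shapeless blob

print("without outlier:", round(pearson_r(x, y), 2))
# One extreme observation dominates both means and spreads.
print("with one outlier:", round(pearson_r(x + [30], y + [30]), 2))
```

A strong coefficient here says nothing about the six ordinary points, and nothing at all about cause; the scatter plot makes that plain, because the "trend" is one dot far from the rest.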
Make a scatter plot
Ready to plot your own data? The free scatter plot maker lets you enter X and Y values (or paste them), label the axes, and export a PNG or SVG, with no signup. Or see how scatter plots compare with other charts in the complete guide to chart types.