Correlation vs. Causation
Two variables moving together is not proof that one causes the other, and a chart, on its own, almost never settles the question.
Correlation means two variables tend to move together; causation means one of them actually produces the change in the other. A correlation does not, by itself, prove causation — because the same pattern can be produced by coincidence, by causation running the opposite way, or by a hidden third variable. A scatter plot can show you that two things are related and how strongly, but it cannot tell you why. Confusing the two is one of the most common ways a perfectly honest chart leads to a wrong conclusion.
The two ideas
Correlation is a measurable property of data: as one variable increases, does the other tend to increase (positive correlation), tend to decrease (negative correlation), or show no consistent pattern (no correlation)? It is the kind of thing a chart is built to reveal. You can see it in the shape of a scatter plot and quantify it with a single number.
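The "single number" is usually the Pearson correlation coefficient, r, which runs from -1 (perfect negative) through 0 (no linear pattern) to +1 (perfect positive). The sketch below computes it by hand so the arithmetic is visible; the function name `pearson_r` and the sample values are made up for illustration, and a statistics library would normally do this for you.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each point from its variable's mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    co_movement = sum(a * b for a, b in zip(dx, dy))
    spread_x = sum(a * a for a in dx)
    spread_y = sum(b * b for b in dy)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_movement / math.sqrt(spread_x * spread_y)

# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

print(pearson_r(x, rising))   # 1.0  (perfect positive correlation)
print(pearson_r(x, falling))  # -1.0 (perfect negative correlation)
```

Note that the formula treats the two variables symmetrically: swapping the arguments gives the same r. The number has no notion of which variable comes first, which is one reason it cannot say anything about cause.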
Causation is a claim about mechanism: changing one variable would change the other. That is a much stronger statement, and it is not something the correlation alone can deliver. The gap between "these move together" and "this one drives that one" is where most overclaiming happens.
Why causation doesn't follow
The reason is simple: a single visible pattern is consistent with several different stories, and the data on the chart usually cannot distinguish between them. When you see a strong upward trend between two variables, your mind reaches for the most intuitive explanation — but intuition is not evidence. The chart shows the what; it is silent on the why. To go from one to the other you need something the scatter plot does not contain: a controlled comparison, a known mechanism, or a study designed to rule out the alternatives.
Seeing a tidy trend line and concluding "so X causes Y" feels natural, but the line measures association, not mechanism. The same line would appear whether X causes Y, Y causes X, or something else causes both.
Three things that produce a correlation
Whenever two variables are correlated, at least four explanations are on the table, and only one of them is "X causes Y." The other three are the alternatives below, each with a deliberately generic, made-up illustration:
- Coincidence. With enough variables measured over enough time, some pairs will line up by pure chance, with no real connection at all. Two unrelated quantities can drift in the same direction for a while and then stop.
- Reverse causation. The arrow may point the other way. Suppose busier stores and more staff on the floor are correlated; it is tempting to say more staff drives more visitors, but it may be that managers simply schedule more staff because they expect a busy day.
- A lurking (confounding) third variable. Something you did not chart drives both. Imagine ice-cream sales and the number of people swimming both rising together. Neither causes the other — warm weather, the hidden third variable, lifts both at once.
The third case, a confounder, is the one that fools people most often, because the two charted variables really do move together for a real reason — just not the reason you would guess from the chart.
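A small simulation makes the confounder case concrete. In this sketch, every number is invented: temperature drives both ice-cream sales and the number of swimmers, and neither of those two depends on the other. The helper names (`pearson_r`, `residuals`) and the coefficients are assumptions for illustration only. The code measures the raw correlation, then measures it again after removing the part of each variable that a straight-line fit on temperature explains.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def residuals(xs, ys):
    """What is left of ys after subtracting a least-squares line fitted on xs."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: temperature is the hidden third variable.
temperature = [rng.uniform(10, 35) for _ in range(500)]
# Each quantity depends on temperature plus its own independent noise.
ice_cream = [5 * t + rng.gauss(0, 15) for t in temperature]
swimmers = [3 * t + rng.gauss(0, 10) for t in temperature]

raw_r = pearson_r(ice_cream, swimmers)
adjusted_r = pearson_r(residuals(temperature, ice_cream),
                       residuals(temperature, swimmers))

print(f"raw correlation:                  {raw_r:.2f}")
print(f"after accounting for temperature: {adjusted_r:.2f}")
```

The raw correlation comes out strongly positive, while the adjusted one falls close to zero, because temperature was the only link between the two series. The adjustment is only possible here because the simulation recorded temperature. In real data the confounder is often exactly the variable nobody measured, so a chart of the two visible series gives no warning.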
Reading scatter plots without overclaiming
A scatter plot is the natural home of this question, so it is worth knowing exactly what it does and does not tell you. The plot below shows a positive correlation — points drifting up to the right — with a regression line summarising the trend. The line describes the relationship; it does not explain it.
Read it for what it offers: direction (positive or negative), strength (tightly clustered or loosely scattered), and any outliers. Resist reading in a cause. If what you actually care about is a trend over time rather than a relationship between two variables, a different chart type fits the question better; see scatter plot vs. line chart for which to reach for.
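Those three readings (direction, strength, outliers) can each be pulled out of the data with a least-squares line. The sketch below is one simple way to do it, not a standard recipe: the function name `describe_scatter`, the sample points, and the rule of flagging points more than two residual standard deviations from the line are all assumptions chosen for illustration.

```python
import math

def describe_scatter(xs, ys, outlier_threshold=2.0):
    """Summarise a scatter plot: trend line, correlation, and outlying points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx                      # direction: sign of the slope
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)         # strength: closer to +/-1 is tighter

    # Vertical distance of each point from the fitted line.
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    resid_sd = math.sqrt(sum(e * e for e in resid) / n)
    # Illustrative rule of thumb: flag points far from the line.
    outliers = [i for i, e in enumerate(resid)
                if abs(e) > outlier_threshold * resid_sd]

    return {"slope": slope, "intercept": intercept, "r": r, "outliers": outliers}

# Made-up points: a steady upward trend with one value that sits well above it.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 25.0, 18.1, 19.9]

summary = describe_scatter(x, y)
direction = "positive" if summary["slope"] > 0 else "negative"
print(f"direction: {direction}, r = {summary['r']:.2f}")
print("outlier indices:", summary["outliers"])  # [7], the point at x = 8
```

Everything the function returns is a description of the points. Nothing in it depends on which variable, if either, influences the other.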
Honest habits when presenting relationships
- Describe, don't conclude. Say "X and Y are positively correlated," not "X drives Y," unless you have evidence beyond the chart.
- Ask for the third variable. Before accepting a relationship, ask what unmeasured factor could be moving both.
- Check the direction of the arrow. Could the effect plausibly run the other way?
- Beware tiny samples and short windows. A handful of points can look correlated by chance.
- Use causal language only with causal evidence — typically a controlled experiment or a study designed to rule out alternatives.
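The warning about tiny samples is easy to check with a simulation. This sketch draws pairs of completely unrelated random series and counts how often they look strongly correlated anyway. The function name `share_of_strong_correlations`, the cutoff of 0.7 for "strong", and the sample sizes are assumptions picked for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def share_of_strong_correlations(sample_size, trials=2000, cutoff=0.7, seed=1):
    """Fraction of unrelated random pairs whose |r| exceeds the cutoff."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strong = 0
    for _ in range(trials):
        # Two independent series: any correlation between them is pure chance.
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]
        if abs(pearson_r(xs, ys)) > cutoff:
            strong += 1
    return strong / trials

small = share_of_strong_correlations(sample_size=5)
large = share_of_strong_correlations(sample_size=100)

print(f"5 points per series:   {small:.1%} of unrelated pairs look strongly correlated")
print(f"100 points per series: {large:.1%} of unrelated pairs look strongly correlated")
```

With only five points, a noticeable share of unrelated pairs clear the cutoff by luck; with a hundred points, almost none do. A strong-looking trend on a handful of points is therefore weak evidence even of correlation, let alone of cause.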
None of this means scatter plots are untrustworthy. They are excellent at the job they are built for: showing whether and how two variables relate. The discipline is simply to stop where the data stops. To explore your own pairs of variables, build one with the free, in-browser chart makers, or read the full scatter plot guide first.