Correlation vs. Causation

Two variables moving together is not proof that one causes the other — and a chart, on its own, almost never settles the question.

Correlation means two variables tend to move together; causation means one of them actually produces the change in the other. A correlation does not, by itself, prove causation — because the same pattern can be produced by coincidence, by causation running the opposite way, or by a hidden third variable. A scatter plot can show you that two things are related and how strongly, but it cannot tell you why. Confusing the two is one of the most common ways a perfectly honest chart leads to a wrong conclusion.

The two ideas

Correlation is a measurable property of data: as one variable increases, does the other tend to increase (positive correlation), tend to decrease (negative correlation), or show no consistent pattern (no correlation)? It is the kind of thing a chart is built to reveal. You can see it in the shape of a scatter plot and quantify it with a single number.
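That single number is most often Pearson's correlation coefficient, r, which runs from -1 (a perfect negative linear relationship) through 0 (no linear relationship) to +1 (a perfect positive one). Below is a minimal sketch of the textbook formula in plain Python. The helper name `pearson_r` and the toy data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences.

    Hypothetical helper for illustration. The result lies in [-1, 1]:
    its sign gives the direction of the relationship and its magnitude
    how tightly the points follow a straight line.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations: positive when X and Y tend to sit on the
    # same side of their means, negative when on opposite sides.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Divide by each variable's spread so r has no units.
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / (sx * sy)


# Made-up toy data with a moderate positive correlation.
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # 0.775
```

The formula is symmetric. Swapping the two arguments gives exactly the same r, so the number carries no information about which variable, if either, is the cause.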

Causation is a claim about mechanism: changing one variable would change the other. That is a much stronger statement, and it is not something the correlation alone can deliver. The gap between "these move together" and "this one drives that one" is where most overclaiming happens.

Why causation doesn't follow

The reason is simple: a single visible pattern is consistent with several different stories, and the data on the chart usually cannot distinguish between them. When you see a strong upward trend between two variables, your mind reaches for the most intuitive explanation — but intuition is not evidence. The chart shows the what; it is silent on the why. To go from one to the other you need something the scatter plot does not contain: a controlled comparison, a known mechanism, or a study designed to rule out the alternatives.
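A controlled comparison works because it cuts the link between X and any hidden cause. The simulation below is a minimal sketch built on a made-up model. A hidden variable Z raises both X and Y, and X may or may not have a real effect on Y, set by the hypothetical `x_effect` parameter. In the observational version, X partly follows Z. In the randomized version, X is assigned independently of Z, as it would be in an experiment.

```python
import math
import random


def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n, randomized, x_effect, rng):
    """Correlation of X and Y in one simulated study (hypothetical model)."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hidden confounder, never shown on the chart
        if randomized:
            x = rng.gauss(0, 1)  # set by the experimenter, independent of Z
        else:
            x = z + rng.gauss(0, 1)  # observed in the wild, partly driven by Z
        y = x_effect * x + z + rng.gauss(0, 1)  # Z always affects Y
        xs.append(x)
        ys.append(y)
    return corr(xs, ys)


rng = random.Random(0)
for randomized in (False, True):
    for x_effect in (0.0, 1.0):
        r = simulate(10_000, randomized, x_effect, rng)
        label = "randomized" if randomized else "observational"
        print(f"{label:13}  true effect of X = {x_effect}:  r = {r:.2f}")
```

When X has no real effect, the observational study still shows r near 0.5, produced entirely by the confounder, while the randomized study shows r near 0. When X does have an effect, the randomized study still finds a clear correlation (about 0.58 in this model). Only the comparison in which X was assigned independently of Z separates the two stories.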

The classic trap

Seeing a tidy trend line and concluding "so X causes Y" feels natural, but the line measures association, not mechanism. The same line would appear whether X causes Y, Y causes X, or something else causes both.
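To make the trap concrete, here is a minimal sketch with three made-up data-generating rules: X causes Y, Y causes X, and a hidden Z causes both. The noise levels are an assumption chosen purely for illustration, so that all three rules have a population correlation of exactly 0.5. The sample values come out nearly identical, and nothing in the computed r reveals which rule produced the data.

```python
import math
import random


def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def x_causes_y(rng):
    x = rng.gauss(0, 1)
    y = x + rng.gauss(0, math.sqrt(3))  # Y is built from X plus noise
    return x, y


def y_causes_x(rng):
    y = rng.gauss(0, 1)
    x = y + rng.gauss(0, math.sqrt(3))  # X is built from Y plus noise
    return x, y


def z_causes_both(rng):
    z = rng.gauss(0, 1)  # hidden third variable
    # X and Y each depend on Z; neither one affects the other.
    return z + rng.gauss(0, 1), z + rng.gauss(0, 1)


rng = random.Random(42)
results = {}
for story in (x_causes_y, y_causes_x, z_causes_both):
    pairs = [story(rng) for _ in range(20_000)]
    xs, ys = zip(*pairs)
    results[story.__name__] = corr(xs, ys)
    print(f"{story.__name__:14} r = {results[story.__name__]:.2f}")
```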

Three things that produce a correlation

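Whenever two variables are correlated, at least four explanations are on the table, and only one of them is "X causes Y." The other three can produce exactly the same pattern:

Coincidence: the two variables line up by chance. This is most likely with small samples, or when many pairs of variables are compared and only the striking ones are shown.

Reverse causation: the arrow runs the other way, and Y is what drives X.

A confounder: a hidden third variable drives both X and Y, so they move together even though neither affects the other.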
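Coincidence is easy to underestimate when samples are small. The sketch below is a made-up simulation with hypothetical helper names. It draws 1,000 pairs of variables that are independent by construction and counts how often the sample correlation still looks strong (|r| > 0.8). It does this first with 5 points per pair and then with 50.

```python
import math
import random


def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def count_strong_by_chance(n_points, n_pairs, threshold, rng):
    """How many independent X/Y pairs show |r| above threshold anyway."""
    hits = 0
    for _ in range(n_pairs):
        xs = [rng.gauss(0, 1) for _ in range(n_points)]
        ys = [rng.gauss(0, 1) for _ in range(n_points)]  # unrelated to xs
        if abs(corr(xs, ys)) > threshold:
            hits += 1
    return hits


rng = random.Random(7)
small = count_strong_by_chance(5, 1_000, 0.8, rng)
large = count_strong_by_chance(50, 1_000, 0.8, rng)
print(f"5 points per pair:  {small} of 1000 pairs have |r| > 0.8")
print(f"50 points per pair: {large} of 1000 pairs have |r| > 0.8")
```

With five points, roughly one pair in ten crosses the threshold even though no relationship exists. With fifty points, essentially none do. The same effect appears when many different pairs of variables are scanned and only the striking ones are charted.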

The third case, a confounder, is the one that fools people most often, because the two charted variables really do move together for a real reason — just not the reason you would guess from the chart.
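When the confounder is known and measured, one standard check is to remove its influence from both variables and see what correlation remains. This is known as a partial correlation. Below is a minimal sketch under a made-up linear model in which Z drives both X and Y and X has no effect on Y. The helper names are hypothetical.

```python
import math
import random


def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def residuals(vs, zs):
    """What is left of vs after a least-squares straight-line fit on zs."""
    n = len(vs)
    mv, mz = sum(vs) / n, sum(zs) / n
    slope = sum((z - mz) * (v - mv) for z, v in zip(zs, vs)) / sum(
        (z - mz) ** 2 for z in zs
    )
    intercept = mv - slope * mz
    return [v - (intercept + slope * z) for v, z in zip(vs, zs)]


rng = random.Random(3)
zs = [rng.gauss(0, 1) for _ in range(10_000)]  # the confounder
xs = [z + rng.gauss(0, 1) for z in zs]  # X depends only on Z
ys = [z + rng.gauss(0, 1) for z in zs]  # Y depends only on Z

raw = corr(xs, ys)
adjusted = corr(residuals(xs, zs), residuals(ys, zs))
print(f"raw r = {raw:.2f}, r after removing Z = {adjusted:.2f}")
```

The raw correlation comes out near 0.5 and the adjusted one near 0, because Z accounts for the whole relationship. The catch lies in the assumptions. This only works for a confounder you have measured, and a straight-line adjustment only removes a straight-line influence. A confounder nobody recorded stays invisible.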

Reading scatter plots without overclaiming

A scatter plot is the natural home of this question, so it is worth knowing exactly what it does and does not tell you. The plot below shows a positive correlation — points drifting up to the right — with a regression line summarising the trend. The line describes the relationship; it does not explain it.

Figure: a positive correlation with a trend line (axes: variable X, variable Y). The chart shows that X and Y rise together, not whether X causes Y, Y causes X, or a third variable drives both.
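The trend line on a chart like this is usually an ordinary least-squares fit. The minimal sketch below uses a hypothetical helper, `fit_line`, and made-up toy data to show why the line is a summary rather than an explanation. Fitting Y on X and fitting X on Y give two different lines, yet the data and the correlation are the same either way.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope_yx, intercept_yx = fit_line(xs, ys)  # predict Y from X
slope_xy, _ = fit_line(ys, xs)  # predict X from Y
print(f"Y on X: y = {slope_yx:.2f}x + {intercept_yx:.2f}")  # y = 0.60x + 2.20
print(f"X on Y: slope = {slope_xy:.2f}")  # 1.00
```

The product of the two slopes equals r² (here 0.6, matching r ≈ 0.775). The two lines therefore coincide only when every point falls exactly on a straight line. Choosing which variable goes on which axis is a presentation decision, not a finding about cause.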

Read it for what it offers: direction (positive or negative), strength (tightly clustered or loosely scattered), and any outliers. Resist reading in a cause. If what you actually care about is a trend over time rather than a relationship between two variables, the chart type changes the question too; see scatter plot vs. line chart for which one to reach for.
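Outliers deserve particular care when judging strength, because a single extreme point can manufacture a strong correlation out of none. The minimal sketch below uses made-up numbers. It starts with five points whose correlation is exactly zero, then adds one far-off point.

```python
import math


def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


xs = [1, 2, 3, 4, 5]
ys = [1, 3, 2, 3, 1]  # no linear trend: r is exactly 0
print(f"without outlier: r = {corr(xs, ys):.2f}")  # 0.00

xs_out = xs + [20]
ys_out = ys + [20]  # one extreme point far up and to the right
print(f"with one outlier: r = {corr(xs_out, ys_out):.2f}")  # 0.97
```

This is why it pays to look at the scatter plot itself rather than only the coefficient, and to check whether the relationship survives when the extreme point is set aside.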

Honest habits when presenting relationships

None of this means scatter plots are untrustworthy. They are excellent at the job they are built for: showing whether and how two variables relate. The discipline is simply to stop where the data stops. To explore your own pairs of variables, build one with the free, in-browser chart makers, or read the full scatter plot guide first.