Regression Line

TERM

The best-fit straight line summarising a scatter of points.

A regression line is the single straight line that best fits a scatter of points, summarising the overall relationship between two variables in one trend.

When points are spread across a plot, no single straight line passes through them all. The regression line is the one that comes closest to all of them at once: it minimises the sum of the squared vertical gaps between each point and the line, a rule known as least squares. It distils a messy cloud into a clear direction and slope.
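
The fitting rule described above can be sketched in a few lines of Python. This is a minimal illustration using the standard closed-form least-squares formulas, not a full statistics routine; the function names `fit_line` and `sum_squared_gaps` and the sample points are made up for the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_gaps(xs, ys, slope, intercept):
    """Total of the squared vertical gaps between the points and a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample points, chosen only to exercise the functions.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, sum_squared_gaps(xs, ys, slope, intercept))
```

Nudging the slope or intercept away from the fitted values and calling `sum_squared_gaps` again gives a larger total, which is what "best fit" means here.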

How a regression line works in a chart

Drawn over a scatter plot, the line shows the trend the points imply. Its slope tells you the typical change in the vertical variable for a one-unit change in the horizontal one: an upward slope means they rise together, a downward slope means one falls as the other rises. The tighter the points cluster around the line, the more reliably it describes them; a wide scatter means the line is only a rough guide.
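
One rough way to put a number on how tightly points cluster around a line is the root-mean-square vertical gap. The sketch below assumes a line that is already known (here an arbitrary y = 1 + 2x) and two invented sets of points; the name `rms_gap` is hypothetical, and other measures of spread would serve equally well.

```python
import math


def rms_gap(xs, ys, slope, intercept):
    """Root-mean-square vertical gap between the points and the given line."""
    gaps = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))


# Illustrative line and data, not taken from any real dataset.
SLOPE, INTERCEPT = 2.0, 1.0
xs = [1, 2, 3, 4]
tight = [3.5, 4.5, 7.5, 8.5]   # each point sits 0.5 above or below the line
loose = [6.0, 2.0, 10.0, 6.0]  # each point sits 3 above or below the line

print(rms_gap(xs, tight, SLOPE, INTERCEPT))  # 0.5
print(rms_gap(xs, loose, SLOPE, INTERCEPT))  # 3.0
```

The smaller value for the tight set reflects points that hug the line, so the line is a better guide to where an individual point is likely to fall.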

A concrete example

Imagine plotting hours studied against test score, with points drifting upward from lower-left to upper-right. A regression line through them might have a slope of 5, meaning each extra hour of study is associated with about 5 more points on average. If the line passes through 50 at zero hours, it predicts roughly 70 points for 4 hours (50 + 5 × 4 = 70). That is a useful summary — but the slope describes an association, not a guarantee, and it is not proof that studying caused the gain. Predicting far outside the data's range is also unreliable.
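
The arithmetic in this example can be written as a tiny prediction function. The slope and intercept are the illustrative values from the text, and `predict_score` is a name invented for this sketch; it only reads values off the line and says nothing about how far any individual student will sit from it.

```python
SLOPE = 5.0       # points gained per extra hour, from the example above
INTERCEPT = 50.0  # predicted score at zero hours, from the example above


def predict_score(hours):
    """Predicted test score for a given number of study hours."""
    return INTERCEPT + SLOPE * hours


print(predict_score(4))  # 70.0
```

The function will return a number for any input, including 40 hours, so it is up to the reader to check that the input lies within the range of the data the line was fitted to.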

Related terms

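A regression line describes the same relationship that correlation summarises as a single number: correlation measures how strong the linear association is and in which direction it runs, while the line shows its slope and position, making the trend explicit. It is almost always drawn on top of a scatter plot; see the scatter plot guide.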