Outlier

TERM

A data point that sits far from the rest.

An outlier is a data point that lies far from the rest of the data, standing well above or below the range where most values cluster.

Most datasets have a bulk of values that sit close together, with the occasional point a long way off. That distant point is an outlier. Outliers matter because they can reveal something genuinely interesting, such as a rare event or a special case, or signal a problem such as a measurement error or a typo. Either way, they deserve a second look rather than being ignored.

How an outlier is identified

"Far from the rest" can be made precise. A common rule uses the quartiles. First compute the interquartile range (IQR), which is the third quartile minus the first. A value is then flagged as an outlier if it lies more than 1.5 times the IQR below the first quartile, or more than 1.5 times the IQR above the third. Visually, though, outliers are often obvious: they are simply the points sitting apart from the crowd, and the eye spots them before any formula is applied.
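The 1.5 times IQR rule above can be sketched in a few lines of Python. This is a minimal illustration, not a standard implementation. The function name `iqr_outliers` and the parameter `k` are hypothetical. The quartiles come from the standard library's `statistics.quantiles` with its default method, which is only one of several quartile conventions, so other tools may place the cut-offs slightly differently on small samples.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Hypothetical helper: return the values lying more than k * IQR
    below the first quartile or above the third quartile.

    Quartiles use statistics.quantiles with its default ('exclusive')
    method; other quartile conventions can shift the cut-offs a little.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - k * iqr  # values below this are flagged
    upper = q3 + k * iqr  # values above this are flagged
    return [v for v in values if v < lower or v > upper]

print(iqr_outliers([2, 3, 3, 4, 4, 5, 5, 6, 45]))  # prints [45]
```

With these made-up values, the quartiles are 3 and 5.5, so the IQR is 2.5 and anything outside the range -0.75 to 9.25 is flagged. Only 45 falls outside it.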

A concrete example

Suppose response times for a task are mostly between 2 and 6 seconds, but one reading comes in at 45 seconds. That 45 is an outlier: it sits far above the cluster and well beyond the upper cut-off that the quartile rule would set. It might mean something real (one user was interrupted) or an error (the timer kept running), but either way it clearly does not belong with the others, and leaving it in would drag the average well above a typical response time.
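To see how much one reading can distort the average, here is a small sketch using invented response times (the numbers are hypothetical, chosen only to match the description of values between 2 and 6 seconds plus one 45-second reading). It compares the mean and the median with and without the outlier.

```python
import statistics

# Hypothetical response times in seconds: mostly 2 to 6 s, plus one 45 s reading.
times = [2, 3, 3, 4, 4, 5, 5, 6, 45]
without_outlier = [t for t in times if t != 45]

mean_all = statistics.mean(times)              # pulled upward by the 45
mean_clean = statistics.mean(without_outlier)
median_all = statistics.median(times)          # middle value, barely affected
median_clean = statistics.median(without_outlier)

print(f"mean with outlier:      {mean_all:.2f} s")      # 8.56 s
print(f"mean without outlier:   {mean_clean:.2f} s")    # 4.00 s
print(f"median with outlier:    {median_all:.2f} s")    # 4.00 s
print(f"median without outlier: {median_clean:.2f} s")  # 4.00 s
```

The single 45 more than doubles the mean, to a value larger than every typical reading, while the median stays at 4 seconds.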

How outliers appear in a chart

On a scatter plot, an outlier is a lone point set apart from the main cloud. On a box plot, outliers are drawn as individual dots beyond the whiskers. Because outliers can stretch an axis or pull a mean off-centre, charts sometimes mark or set them aside so the rest of the data stays readable.
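One common box plot convention works as follows. Whiskers reach out to the most extreme data points that still lie within 1.5 times the IQR of the box, and anything further out is drawn as a separate dot. The sketch below computes those parts without drawing anything. It is an illustration under stated assumptions: the function `box_plot_parts` is hypothetical, quartiles come from `statistics.quantiles(method="inclusive")`, and a given plotting library may use a different quartile rule or whisker setting.

```python
import statistics

def box_plot_parts(values, k=1.5):
    """Hypothetical helper: compute the box, whisker ends, and the points
    a box plot would draw as individual dots, using the 1.5 * IQR convention.

    Assumes quartiles from statistics.quantiles(method="inclusive");
    plotting libraries may compute quartiles slightly differently.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr

    # Whiskers stop at the most extreme data points inside the fences,
    # not at the fences themselves.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    whiskers = (min(inside), max(inside))

    # Everything outside the fences is drawn as a separate dot.
    dots = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {"box": (q1, median, q3), "whiskers": whiskers, "dots": dots}

print(box_plot_parts([2, 3, 3, 4, 4, 5, 5, 6, 45]))
```

For these hypothetical response times, the box runs from 3 to 5 and the whiskers end at 2 and 6. The 45 is left as a lone dot far above the upper whisker.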

Related terms

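Outliers are commonly detected using quartiles, and their presence is a main reason the median is often preferred to the mean: a single extreme value can shift the mean a long way but barely moves the median. An outlier is a single unusual data point, and outliers are easiest to see on a scatter plot (see the scatter plot guide).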