Rank Correlation

A rank correlation is a statistic used to measure the relationship between rankings of ordinal variables or different rankings of the same variable.

Learning Objective

Define rank correlation and illustrate how it differs from linear correlation.

Key Points

A rank correlation coefficient measures the degree of similarity between two rankings and can be used to assess the significance of the relation between them.
If one variable decreases as the other increases, the rank correlation coefficient will be negative.
An increasing rank correlation coefficient implies increasing agreement between rankings.

Terms

Spearman's rank correlation coefficient
A nonparametric measure of statistical dependence between two variables that assesses how well the relationship between them can be described using a monotonic function.
rank correlation coefficient
A measure of the degree of similarity between two rankings that can be used to assess the significance of the relation between them.
Kendall's rank correlation coefficient
A statistic used to measure the association between two measured quantities; specifically, it measures the similarity of the orderings of the data when ranked by each of the quantities.

Full Text

A rank correlation is any of several statistics that measure the relationship between rankings of different ordinal variables or different rankings of the same variable. In this context, a "ranking" is the assignment of the labels "first", "second", "third", et cetera, to different observations of a particular variable. A rank correlation coefficient measures the degree of similarity between two rankings and can be used to assess the significance of the relation between them.
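As a minimal sketch (Python, not part of the original text), the function below turns raw observations into ranks. It assumes that "first" goes to the smallest value and that tied values share the average of the positions they occupy, a common convention known as fractional ranking; the name `rank` is illustrative.

```python
def rank(values):
    """Return 1-based ranks; the smallest value is ranked first.

    Tied values receive the average of the positions they span
    (fractional ranking), e.g. two values tied for 1st and 2nd both get 1.5.
    """
    # indices of the observations, sorted by their values
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # advance j to the last position of the run of values tied with order[i]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_position = (i + j) / 2 + 1  # convert 0-based positions to a 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_position
        i = j + 1
    return ranks


print(rank([3.2, 1.5, 4.8, 1.5]))  # [3.0, 1.5, 4.0, 1.5]
```

To rank so that the largest value comes first, as in a poll, the values can be negated before ranking, e.g. `rank([-v for v in values])`.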

If, for example, one variable is the identity of a college basketball program and another variable is the identity of a college football program, one could test for a relationship between the poll rankings of the two types of program: do colleges with a higher-ranked basketball program tend to have a higher-ranked football program? A rank correlation coefficient can measure that relationship, and a significance test on the coefficient can show whether the measured relationship is small enough that it could plausibly be a coincidence.
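One way to judge whether an observed rank correlation could be a coincidence is an exact permutation test, sketched below in Python under stated assumptions: both rankings are complete and free of ties, Spearman's coefficient (computed with the tie-free formula $1 - \frac{6\sum d^2}{n(n^2-1)}$, where $d$ is the difference between paired ranks) serves as the test statistic, and the two-sided p-value is the share of all $n!$ reorderings of one ranking that give a coefficient at least as extreme as the observed one. The college rankings in the usage lines are hypothetical, and the function names are illustrative.

```python
from itertools import permutations


def spearman_rho(rx, ry):
    """Spearman's rho for two tie-free rankings: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(rx)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def permutation_p_value(rx, ry):
    """Exact two-sided p-value: fraction of all orderings of ry whose |rho| >= observed |rho|."""
    observed = abs(spearman_rho(rx, ry))
    extreme = total = 0
    for perm in permutations(ry):
        total += 1
        # small tolerance so values equal to the observed |rho| are not lost to round-off
        if abs(spearman_rho(rx, perm)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical poll rankings of the same five colleges
basketball = [1, 2, 3, 4, 5]
football = [2, 1, 3, 4, 5]
print(spearman_rho(basketball, football))         # 0.9
print(permutation_p_value(basketball, football))  # 10 of 120 orderings, about 0.083
```

Full enumeration is practical only for small $n$, and small samples carry limited evidence: with five items, even perfect agreement yields $p = 2/120 \approx 0.017$, because only the identical and the fully reversed orderings reach $|\rho| = 1$.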

If there is only one variable—for example, the identity of a college football program—but it is subject to two different poll rankings (say, one by coaches and one by sportswriters), then the similarity of the two different polls' rankings can be measured with a rank correlation coefficient.

Rank Correlation Coefficients

Rank correlation coefficients, such as Spearman's rank correlation coefficient and Kendall's rank correlation coefficient, measure the extent to which one variable tends to increase as the other increases, without requiring that increase to be represented by a linear relationship.
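A minimal Python sketch of both coefficients follows. It assumes Spearman's coefficient is computed as the Pearson correlation of the (average) ranks, which also handles ties, and it implements the simplest form of Kendall's coefficient (often called tau-a), which divides the number of concordant minus discordant pairs by the total number of pairs and assumes no ties. The function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs; assumes no ties."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[j] - x[i]) * (y[j] - y[i])
            score += (product > 0) - (product < 0)  # +1 concordant, -1 discordant
    return score / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # always increasing, but not linear in x
print(spearman(x, y), kendall_tau_a(x, y))  # 1.0 1.0
print(pearson(x, y))                        # less than 1
```

Both rank coefficients depend only on the order of the values, so any strictly increasing transformation of `y` leaves them unchanged, whereas the Pearson coefficient changes.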

Figure: Spearman's Rank Correlation

This graph shows a Spearman rank correlation of 1 and a Pearson correlation coefficient of 0.88. A Spearman correlation of 1 results when the two variables being compared are monotonically related, even if their relationship is not linear. By contrast, such a relationship gives a perfect Pearson correlation only if it is also linear.

If one variable decreases as the other increases, the rank correlation coefficients will be negative. It is common to regard these rank correlation coefficients as alternatives to Pearson's coefficient, used either to reduce the amount of calculation or to make the coefficient less sensitive to non-normality in distributions. However, this view has little mathematical basis, because rank correlation coefficients measure a different type of relationship than the Pearson product-moment correlation coefficient does. They are best seen as measures of a different type of association rather than as alternative measures of the population correlation coefficient.

An increasing rank correlation coefficient implies increasing agreement between rankings. The coefficient lies in the interval $[-1, 1]$ and takes the value:

$-1$ if the disagreement between the two rankings is perfect: one ranking is the reverse of the other;
0 if the rankings are completely independent; or
1 if the agreement between the two rankings is perfect: the two rankings are the same.
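The extreme values can be checked with a short Python sketch that applies the tie-free formulas directly to rankings (lists of rank positions, assumed tie-free): $\rho = 1 - \frac{6\sum d^2}{n(n^2-1)}$ for Spearman and $\tau = \frac{\text{concordant} - \text{discordant}}{n(n-1)/2}$ for Kendall. The third ranking is a made-up case with no net agreement in order; the function names are illustrative.

```python
def spearman_from_ranks(rx, ry):
    """Spearman's rho for tie-free rankings: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(rx)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def kendall_from_ranks(rx, ry):
    """Kendall's tau for tie-free rankings: (concordant - discordant) / number of pairs."""
    n = len(rx)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (rx[j] - rx[i]) * (ry[j] - ry[i])
            score += (product > 0) - (product < 0)  # +1 concordant, -1 discordant
    return score / (n * (n - 1) / 2)


first = [1, 2, 3, 4]
same = [1, 2, 3, 4]     # identical ranking: perfect agreement
reverse = [4, 3, 2, 1]  # reversed ranking: perfect disagreement
mixed = [2, 4, 1, 3]    # hypothetical ranking with no net agreement

for other in (same, reverse, mixed):
    print(spearman_from_ranks(first, other), kendall_from_ranks(first, other))
# 1.0 1.0
# -1.0 -1.0
# 0.0 0.0
```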

Nature of Rank Correlation

To illustrate the nature of rank correlation, and its difference from linear correlation, consider the following four pairs of numbers $(x, y)$:

$(0, 1)$, $(10, 100)$, $(101, 500)$, $(102, 2000)$

As we go from each pair to the next pair, $x$ increases, and so does $y$. This relationship is perfect, in the sense that an increase in $x$ is always accompanied by an increase in $y$. This means that we have a perfect rank correlation and both Spearman's correlation coefficient and Kendall's correlation coefficient are 1. In this example, the Pearson product-moment correlation coefficient is 0.7544, indicating that the points are far from lying on a straight line.
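The three values quoted above can be reproduced with a short, self-contained Python sketch. It assumes no ties (true for these four pairs), so ranks are simple sort positions; Spearman's coefficient uses the $1 - \frac{6\sum d^2}{n(n^2-1)}$ formula, and Kendall's coefficient counts concordant minus discordant pairs.

```python
import math

pairs = [(0, 1), (10, 100), (101, 500), (102, 2000)]
x = [p[0] for p in pairs]
y = [p[1] for p in pairs]
n = len(pairs)


def ranks(values):
    """1-based rank of each value; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


# Pearson: covariance divided by the product of the standard deviations
mx, my = sum(x) / n, sum(y) / n
pearson = sum((a - mx) * (b - my) for a, b in zip(x, y)) / math.sqrt(
    sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
)

# Spearman: 1 - 6 * (sum of squared rank differences) / (n(n^2 - 1))
d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
spearman = 1 - 6 * d_squared / (n * (n * n - 1))

# Kendall: (concordant - discordant pairs) / number of pairs
score = 0
for i in range(n):
    for j in range(i + 1, n):
        product = (x[j] - x[i]) * (y[j] - y[i])
        score += (product > 0) - (product < 0)
kendall = score / (n * (n - 1) / 2)

print(round(pearson, 4), spearman, kendall)  # 0.7544 1.0 1.0
```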

In the same way, if $y$ always decreases when $x$ increases, the rank correlation coefficients will be $-1$, while the Pearson product-moment correlation coefficient may or may not be close to $-1$, depending on how close the points are to a straight line. Although Spearman's and Kendall's coefficients are equal in the extreme cases of perfect rank correlation (both $+1$ or both $-1$), this is not true in general, so the values of the two coefficients cannot meaningfully be compared. For example, for the three pairs $(1, 1)$, $(2, 3)$, $(3, 2)$, Spearman's coefficient is $\frac{1}{2}$, while Kendall's coefficient is $\frac{1}{3}$.
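The two values for the three pairs $(1, 1)$, $(2, 3)$, $(3, 2)$ can be checked exactly with a small Python sketch, assuming the tie-free formulas used elsewhere on this page. The rank differences are $d = 0, -1, 1$, so $\sum d^2 = 2$ and $\rho = 1 - \frac{6 \cdot 2}{3(3^2 - 1)} = \frac{1}{2}$; of the three pairs of points, two are concordant and one is discordant, so $\tau = \frac{2 - 1}{3} = \frac{1}{3}$.

```python
from fractions import Fraction

# The three pairs from the text; both coordinates are already tie-free ranks 1..3
x = [1, 2, 3]
y = [1, 3, 2]
n = len(x)

# Spearman: 1 - 6 * sum(d^2) / (n(n^2 - 1)), kept exact with Fraction
d_squared = sum((a - b) ** 2 for a, b in zip(x, y))
spearman = 1 - Fraction(6 * d_squared, n * (n * n - 1))

# Kendall: count concordant (+1) and discordant (-1) pairs of points
score = 0
for i in range(n):
    for j in range(i + 1, n):
        product = (x[j] - x[i]) * (y[j] - y[i])
        score += (product > 0) - (product < 0)
kendall = Fraction(score, n * (n - 1) // 2)

print(spearman, kendall)  # 1/2 1/3
```

Roughly speaking, Spearman's coefficient penalizes a disagreement by the squared distance between ranks, while Kendall's counts out-of-order pairs, which is why the two scales differ away from $\pm 1$.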
