Rank correlation

In statistics, a rank correlation is any of several statistics that measure an ordinal association—the relationship between rankings of different ordinal variables or different rankings of the same variable, where a "ranking" is the assignment of the ordering labels "first", "second", "third", etc. to different observations of a particular variable. A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them. For example, two common nonparametric methods of significance that use rank correlation are the Mann–Whitney U test and the Wilcoxon signed-rank test.

Context

If, for example, one variable is the identity of a college basketball program and another variable is the identity of a college football program, one could test for a relationship between the poll rankings of the two types of program: do colleges with a higher-ranked basketball program tend to have a higher-ranked football program? A rank correlation coefficient can measure that relationship, and the measure of significance of the rank correlation coefficient can show whether the measured relationship is small enough to likely be a coincidence.
If there is only one variable, the identity of a college football program, but it is subject to two different poll rankings, then the similarity of the two different polls' rankings can be measured with a rank correlation coefficient.
As another example, in a contingency table with low income, medium income, and high income in the row variable and educational level—no high school, high school, university—in the column variable), a rank correlation measures the relationship between income and educational level.

Correlation coefficients

Some of the more popular rank correlation statistics include

An increasing rank correlation coefficient implies increasing agreement between rankings. The coefficient is inside the interval and assumes the value:

1 if the agreement between the two rankings is perfect; the two rankings are the same.
0 if the rankings are completely independent.
−1 if the disagreement between the two rankings is perfect; one ranking is the reverse of the other.

Following, a ranking can be seen as a permutation of a set of objects. Thus we can look at observed rankings as data obtained when the sample space is a symmetric group. We can then introduce a metric, making the symmetric group into a metric space. Different metrics will correspond to different rank correlations.

General correlation coefficient

showed that his and Spearman's are particular cases of a general correlation coefficient.
Suppose we have a set of objects, which are being considered in relation to two properties, represented by and, forming the sets of values and. To any pair of individuals, say the -th and the -th we assign a -score, denoted by, and a -score, denoted by. The only requirement for these functions is that they be anti-symmetric, so and. Then the generalized correlation coefficient is defined as
Equivalently, if all coefficients are collected into matrices and, with and, then
where is the Frobenius inner product and the Frobenius norm. In particular, the general correlation coefficient is the cosine of the angle between the matrices and.

Kendall's $\tau$ as a particular case

If, are the ranks of the -member according to the -quality and -quality respectively, then we can define
The sum is the number of concordant pairs minus the number of discordant pairs. The sum is just, the number of terms, as is. Thus in this case,

Spearman's $\rho$ as a particular case

If, are the ranks of the -member according to the and the -quality respectively, we can simply define
The sums and are equal, since both and range from to. Then we have:
now
We also have
and hence
being the sum of squares of the first naturals equals. Thus, the last equation reduces to
Further
and thus, substituting into the original formula these results we get
where is the difference between ranks.
which is exactly Spearman's rank correlation coefficient.

Rank-biserial correlation

Gene Glass noted that the rank-biserial can be derived from Spearman's. "One can derive a coefficient defined on X, the dichotomous variable, and Y, the ranking variable, which estimates Spearman's rho between X and Y in the same way that biserial r estimates Pearson's r between two normal variables”. The rank-biserial correlation had been introduced nine years before by Edward Cureton as a measure of rank correlation when the ranks are in two groups.

Kerby simple difference formula

Dave Kerby recommended the rank-biserial as the measure to introduce students to rank correlation, because the general logic can be explained at an introductory level. The rank-biserial is the correlation used with the Mann–Whitney U test, a method commonly covered in introductory college courses on statistics. The data for this test consists of two groups; and for each member of the groups, the outcome is ranked for the study as a whole.
Kerby showed that this rank correlation can be expressed in terms of two concepts: the percent of data that support a stated hypothesis, and the percent of data that do not support it. The Kerby simple difference formula states that the rank correlation can be expressed as the difference between the proportion of favorable evidence minus the proportion of unfavorable evidence.

Example and interpretation

To illustrate the computation, suppose a coach trains long-distance runners for one month using two methods. Group A has 5 runners, and Group B has 4 runners. The stated hypothesis is that method A produces faster runners. The race to assess the results finds that the runners from Group A do indeed run faster, with the following ranks: 1, 2, 3, 4, and 6. The slower runners from Group B thus have ranks of 5, 7, 8, and 9.
The analysis is conducted on pairs, defined as a member of one group compared to a member of the other group. For example, the fastest runner in the study is a member of four pairs:,,, and. All four of these pairs support the hypothesis, because in each pair the runner from Group A is faster than the runner from Group B. There are a total of 20 pairs, and 19 pairs support the hypothesis. The only pair that does not support the hypothesis are the two runners with ranks 5 and 6, because in this pair, the runner from Group B had the faster time. By the Kerby simple difference formula, 95% of the data support the hypothesis, and 5% do not support, so the rank correlation is r =.95 -.05 =.90.
The maximum value for the correlation is r = 1, which means that 100% of the pairs favor the hypothesis. A correlation of r = 0 indicates that half the pairs favor the hypothesis and half do not; in other words, the sample groups do not differ in ranks, so there is no evidence that they come from two different populations. An effect size of r = 0 can be said to describe no relationship between group membership and the members' ranks.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...