Critical values of the studentized range distribution are used in Tukey's range test. The studentized range is used to calculate significance levels for results obtained by data mining, where one selectively seeks extreme differences in sample data, rather than only sampling randomly.
Derivation of the studentized range distribution function
The studentized range distribution function arises from re-scaling the sample range R by the sample standard deviation s, since the studentized range is customarily tabulated in units of standard deviations, with the variable q = R/s. The derivation begins with a perfectly general form of the distribution function of the sample range, which applies to any sample data distribution. To obtain the distribution in terms of the "studentized" range q, we change variables from R to s and q. Assuming the sample data are normally distributed, the standard deviation s is χ (chi) distributed. Integrating over s then removes s as a parameter and yields the re-scaled distribution in terms of q alone.
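As a rough illustration of the re-scaling q = R/s, the following Monte Carlo sketch draws k standard normal values, takes their range R, and divides by a simulated standard deviation s. It rests on assumptions not spelled out in the text: s is drawn independently of the k values, with ν·s² following a chi-square distribution with ν degrees of freedom, and the function name `simulate_q` and its parameters are hypothetical.

```python
import math
import random


def simulate_q(k, nu, n_sims=20000, seed=0):
    """Simulate studentized ranges q = R / s for k standard normal draws.

    Assumption: s is independent of the draws and nu * s**2 is chi-square
    distributed with nu degrees of freedom.
    """
    rng = random.Random(seed)
    qs = []
    for _ in range(n_sims):
        draws = [rng.gauss(0.0, 1.0) for _ in range(k)]
        sample_range = max(draws) - min(draws)  # R
        # A chi-square(nu) variate is a gamma variate with shape nu/2, scale 2.
        chi2 = rng.gammavariate(nu / 2.0, 2.0)
        s = math.sqrt(chi2 / nu)
        qs.append(sample_range / s)
    return qs


if __name__ == "__main__":
    qs = sorted(simulate_q(k=3, nu=10))
    # Empirical 95th percentile, which could be compared with a published table.
    print(qs[int(0.95 * len(qs))])
```

For large ν the simulated s stays close to 1, so q behaves almost like the unscaled range R; for small ν the extra variability in s spreads the distribution of q.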
General form
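For any probability density function $f_X$, the range probability density $f_R$ is:

$$f_R(r;k) = k(k-1)\int_{-\infty}^{\infty} f_X\!\left(t+\tfrac{1}{2}r\right) f_X\!\left(t-\tfrac{1}{2}r\right) \left[\int_{t-\frac{1}{2}r}^{t+\frac{1}{2}r} f_X(x)\,dx\right]^{k-2} dt$$

What this means is that we are adding up the probabilities that, given k draws from a distribution, two of them differ by r, and the remaining k − 2 draws all fall between the two extreme values. If we change variables to u, where u = t − r/2 is the low end of the range, and define $F_X$ as the cumulative distribution function of $f_X$, then the equation can be simplified:

$$f_R(r;k) = k(k-1)\int_{-\infty}^{\infty} f_X(u+r)\, f_X(u) \left[F_X(u+r) - F_X(u)\right]^{k-2} du$$

We introduce a similar integral, and notice that differentiating under the integral sign gives

$$\frac{\partial}{\partial r}\left[k\int_{-\infty}^{\infty} f_X(u) \left[F_X(u+r) - F_X(u)\right]^{k-1} du\right] = k(k-1)\int_{-\infty}^{\infty} f_X(u+r)\, f_X(u) \left[F_X(u+r) - F_X(u)\right]^{k-2} du$$

which recovers the integral above. Since the bracketed expression also vanishes at r = 0 (for k ≥ 2), that last relation confirms

$$F_R(r;k) = k\int_{-\infty}^{\infty} f_X(u) \left[F_X(u+r) - F_X(u)\right]^{k-1} du = k\int_{-\infty}^{\infty} \left[F_X(u+r) - F_X(u)\right]^{k-1} dF_X(u)$$

because for any continuous cdf

$$\frac{\partial \left[F_X(u+r) - F_X(u)\right]}{\partial r} = \frac{\partial F_X(u+r)}{\partial (u+r)} = f_X(u+r).$$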
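The general form can be checked numerically. The sketch below evaluates the range density k(k−1)∫ f(u+r) f(u) [F(u+r) − F(u)]^(k−2) du and the range distribution function k∫ f(u) [F(u+r) − F(u)]^(k−1) du for a user-supplied density and cdf, using a simple composite Simpson rule. The helper names (`simpson`, `range_pdf`, `range_cdf`), the integration limits of ±10, and the choice of the standard normal as the test case are all assumptions made for illustration.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(g, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0


def range_pdf(r, k, pdf=phi, cdf=Phi, lo=-10.0, hi=10.0):
    """Density of the range of k draws: k(k-1) * integral over u."""
    def integrand(u):
        return pdf(u + r) * pdf(u) * (cdf(u + r) - cdf(u)) ** (k - 2)
    return k * (k - 1) * simpson(integrand, lo, hi)


def range_cdf(r, k, pdf=phi, cdf=Phi, lo=-10.0, hi=10.0):
    """Distribution function of the range of k draws: k * integral over u."""
    def integrand(u):
        return pdf(u) * (cdf(u + r) - cdf(u)) ** (k - 1)
    return k * simpson(integrand, lo, hi)


if __name__ == "__main__":
    # For k = 2 normal draws the range is |X1 - X2|, whose cdf is erf(r / 2).
    print(range_cdf(1.0, 2), math.erf(0.5))
```

For k = 2 and standard normal data the range is |X₁ − X₂| with X₁ − X₂ distributed as N(0, 2), which gives the closed forms erf(r/2) for the distribution function and e^(−r²/4)/√π for the density; these make a convenient sanity check. The finite limits only suit densities whose mass lies well inside them.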
Special form for normal data
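The range distribution is most often used for confidence intervals around sample averages, which are asymptotically normally distributed by the central limit theorem.

In order to create the studentized range distribution for normal data, we first switch from the generic $f_X$ and $F_X$ to the distribution functions φ and Φ for the standard normal distribution, and change the variable r to s·q, where q is a fixed factor that re-scales r by the scaling factor s:

$$f_R(q;k) = s\,k(k-1)\int_{-\infty}^{\infty} \varphi(u+sq)\, \varphi(u) \left[\Phi(u+sq) - \Phi(u)\right]^{k-2} du$$

Choose the scaling factor s to be the sample standard deviation, so that q becomes the width of the range measured in standard deviations. For normal data s is chi distributed, and the distribution function $f_S$ of the chi distribution is given by:

$$f_S(s;\nu)\,ds = \begin{cases} \dfrac{\nu^{\nu/2}\, s^{\nu-1}\, e^{-\nu s^2/2}}{2^{\nu/2-1}\,\Gamma(\nu/2)}\,ds & \text{for } 0 < s < \infty, \\ 0 & \text{otherwise.} \end{cases}$$

Multiplying the distributions $f_R$ and $f_S$ and integrating to remove the dependence on the standard deviation s gives the studentized range distribution function for normal data:

$$f_R(q;k,\nu) = \frac{\nu^{\nu/2}\, k(k-1)}{2^{\nu/2-1}\,\Gamma(\nu/2)} \int_{0}^{\infty} s^{\nu}\, e^{-\nu s^2/2} \int_{-\infty}^{\infty} \varphi(u+sq)\, \varphi(u) \left[\Phi(u+sq) - \Phi(u)\right]^{k-2} du\, ds$$

where q is the width of the data range measured in standard deviations, ν is the number of degrees of freedom used to determine the sample standard deviation, and k is the number of separate averages that form the points within the range.

The equation for the pdf shown in the sections above comes from using

$$e^{-\nu s^2/2} = \sqrt{2\pi}\,\varphi\!\left(\sqrt{\nu}\,s\right)$$

to replace the exponential expression in the outer integral.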
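The double integral for normal data can be evaluated directly. The sketch below integrates s^ν e^(−νs²/2) times the inner integral of φ(u+sq) φ(u) [Φ(u+sq) − Φ(u)]^(k−2) with nested Simpson rules and multiplies by the constant ν^(ν/2) k(k−1) / (2^(ν/2−1) Γ(ν/2)). The function names, the truncation of the outer integral at `s_max = 8`, the inner limits of ±10, and the grid size are assumptions chosen for moderate k, ν, and q; they are not tuned for accuracy or speed.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(g, a, b, n=400):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0


def studentized_range_pdf(q, k, nu, s_max=8.0, u_lim=10.0, n=400):
    """Studentized range density for normal data via nested quadrature.

    Assumption: truncating s at s_max and u at +/- u_lim loses negligible mass.
    """
    const = nu ** (nu / 2.0) * k * (k - 1) / (2.0 ** (nu / 2.0 - 1.0) * math.gamma(nu / 2.0))

    def inner(s):
        r = s * q  # the range expressed on the original (unscaled) axis
        return simpson(
            lambda u: phi(u + r) * phi(u) * (Phi(u + r) - Phi(u)) ** (k - 2),
            -u_lim, u_lim, n,
        )

    outer = simpson(lambda s: s ** nu * math.exp(-nu * s * s / 2.0) * inner(s), 0.0, s_max, n)
    return const * outer


def t_pdf(t, nu):
    """Student t density, used only as a cross-check for k = 2."""
    c = math.gamma((nu + 1) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))
    return c * (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)


if __name__ == "__main__":
    q, nu = 2.0, 10
    # For k = 2, q / sqrt(2) is the absolute value of a t variate with nu d.f.
    print(studentized_range_pdf(q, 2, nu), math.sqrt(2.0) * t_pdf(q / math.sqrt(2.0), nu))
```

The cross-check uses the fact that for k = 2 the studentized range equals √2·|T|, where T follows Student's t distribution with ν degrees of freedom, so its density is √2 times the t density evaluated at q/√2. For large ν the factor ν^(ν/2) can overflow in floating point, so a production implementation would work with logarithms.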