Calibration (statistics)


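There are two main uses of the term calibration in statistics that denote special types of statistical inference problems. "Calibration" can mean a reverse process to regression, in which a known observation of the dependent variable is used to estimate the corresponding independent variable, or a procedure in statistical classification that determines class membership probabilities.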
In addition, "calibration" is used in statistics with the usual general meaning of calibration. For example, model calibration can also refer to Bayesian inference about the value of a model's parameters, given some data set, or more generally to any type of fitting of a statistical model.
As Philip Dawid puts it, "a forecaster is well calibrated if, for example, of those events to which he assigns a probability 30 percent, the long-run proportion that actually occurs turns out to be 30 percent".
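Dawid's criterion can be checked directly by grouping events by the probability assigned to them and comparing each probability with the fraction of those events that occurred. The following is a minimal sketch; the function name `observed_frequencies` and the data are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def observed_frequencies(forecasts, outcomes):
    """Map each distinct forecast probability to the fraction of its events that occurred."""
    counts = defaultdict(lambda: [0, 0])  # probability -> [occurrences, total]
    for p, happened in zip(forecasts, outcomes):
        counts[p][0] += int(happened)
        counts[p][1] += 1
    return {p: hits / total for p, (hits, total) in counts.items()}


# Hypothetical data: ten events forecast at 30 percent, of which three occurred.
forecasts = [0.3] * 10
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

freq = observed_frequencies(forecasts, outcomes)
print(freq)  # {0.3: 0.3}
```

In this toy sample the forecaster is well calibrated at the 30 percent level, because the observed frequency matches the stated probability. In practice the comparison is only meaningful over a long run of forecasts.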

In regression

The calibration problem in regression is the use of known data on the observed relationship between a dependent variable and an independent variable to estimate other values of the independent variable from new observations of the dependent variable. This is sometimes known as "inverse regression"; see also sliced inverse regression.
One example is that of dating objects, using observable evidence such as tree rings for dendrochronology or carbon-14 for radiometric dating. The observation is caused by the age of the object being dated, rather than the reverse, and the aim is to use the method for estimating dates based on new observations. The problem is whether the model used for relating known ages with observations should aim to minimise the error in the observation, or minimise the error in the date. The two approaches will produce different results, and the difference will increase if the model is then used for extrapolation at some distance from the known results.
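The two choices can be compared on a small example. The sketch below fits a straight line by ordinary least squares in both directions: the first fit minimises the error in the observation and is then inverted to estimate the age, while the second fit minimises the error in the age directly. The data and the names `fit_line`, `estimate_by_inverting` and `estimate_directly` are hypothetical, and a straight-line model is assumed purely for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical calibration data: known ages and the measurement each one produced.
ages = [10.0, 20.0, 30.0, 40.0, 50.0]
observations = [12.1, 19.0, 32.5, 38.7, 51.9]

# Approach 1: model observation as a function of age (minimise error in the
# observation), then invert the fitted line to estimate an age.
a, b = fit_line(ages, observations)


def estimate_by_inverting(y):
    return (y - a) / b


# Approach 2: model age as a function of observation (minimise error in the age).
c, d = fit_line(observations, ages)


def estimate_directly(y):
    return c + d * y


# Compare the two estimates near the centre of the data and far outside it.
for y_new in (30.0, 100.0):
    gap = abs(estimate_by_inverting(y_new) - estimate_directly(y_new))
    print(f"observation={y_new}: gap between estimates = {gap:.4f}")
```

Both fitted lines pass through the point of means, so the two estimates agree there and drift apart linearly as the new observation moves away from it. This is why the gap is much larger for the extrapolated observation than for the one near the centre of the data.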

In classification

Calibration in classification means transforming classifier scores into class membership probabilities. An overview of calibration methods for two-class and multi-class classification tasks is given by Gebel.
Univariate calibration methods for transforming classifier scores into class membership probabilities in the two-class case include Platt scaling and isotonic regression.
Multivariate calibration methods for transforming classifier scores into class membership probabilities when there are more than two classes include reduction to binary tasks followed by pairwise coupling, and Dirichlet calibration.
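To illustrate the two-class case, the sketch below applies a simplified form of Platt scaling, a well-known univariate method that fits a logistic curve mapping raw scores to probabilities. It is an assumption-laden illustration rather than a reference implementation: the scores and labels are hypothetical, the parameters are fitted by plain gradient descent on the log loss, and the label smoothing used in Platt's original formulation is omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Hypothetical held-out classifier scores (sorted) and true class labels.
scores = [-2.0, -1.5, -0.5, -0.2, 0.1, 0.4, 1.0, 1.8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_platt(scores, labels)
probs = [sigmoid(a * s + b) for s in scores]
for s, p in zip(scores, probs):
    print(f"score={s:+.1f} -> calibrated probability {p:.3f}")
```

The calibration map should be fitted on data that was not used to train the classifier, since scores on the training data tend to be overconfident.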
In prediction and forecasting, a Brier score is sometimes used to assess the accuracy of a set of predictions, specifically whether the magnitudes of the assigned probabilities track the relative frequencies of the observed outcomes. Philip E. Tetlock employs the term "calibration" in this sense in his 2015 book Superforecasting.
This differs from accuracy and precision. For example, as expressed by Daniel Kahneman, "if you give all events that happen a probability of .6 and all the events that don't happen a probability of .4, your discrimination is perfect but your calibration is miserable".
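Kahneman's example can be worked through numerically. The sketch below uses six hypothetical events and computes three quantities: the Brier score (the mean squared difference between forecast probability and outcome), a pairwise measure of discrimination, and the observed frequency of events at each forecast level. The function names are illustrative assumptions, not a standard API.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def discrimination(forecasts, outcomes):
    """Fraction of (occurred, did not occur) pairs in which the event that
    occurred received the higher probability; ties count as half."""
    pos = [p for p, o in zip(forecasts, outcomes) if o == 1]
    neg = [p for p, o in zip(forecasts, outcomes) if o == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def observed_frequency(forecasts, outcomes, level):
    """Fraction of events that occurred among those forecast at exactly `level`."""
    matched = [o for p, o in zip(forecasts, outcomes) if p == level]
    return sum(matched) / len(matched)


# Hypothetical outcomes; forecasts follow Kahneman's description.
outcomes = [1, 1, 1, 0, 0, 0]
forecasts = [0.6 if o == 1 else 0.4 for o in outcomes]

print(discrimination(forecasts, outcomes))           # 1.0
print(observed_frequency(forecasts, outcomes, 0.6))  # 1.0
print(observed_frequency(forecasts, outcomes, 0.4))  # 0.0
print(round(brier_score(forecasts, outcomes), 2))    # 0.16
```

Every event that occurred is ranked above every event that did not, so discrimination is perfect. Calibration is poor, however: events given a 60 percent probability occurred 100 percent of the time, and events given 40 percent never occurred.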
Aggregative Contingent Estimation was a program of the Office of Incisive Analysis at the Intelligence Advanced Research Projects Activity that sponsored research and forecasting tournaments in partnership with The Good Judgment Project, co-created by Philip E. Tetlock, Barbara Mellers, and Don Moore.
In meteorology, particularly in weather forecasting, a related mode of assessment is known as forecast skill.