Discrete choice

In economics, discrete choice models, or qualitative choice models, describe, explain, and predict choices between two or more discrete alternatives, such as entering or not entering the labor market, or choosing between modes of transport. Such choices contrast with standard consumption models in which the quantity of each good consumed is assumed to be a continuous variable. In the continuous case, calculus methods can be used to determine the optimum amount chosen, and demand can be modeled empirically using regression analysis. On the other hand, discrete choice analysis examines situations in which the potential outcomes are discrete, such that the optimum is not characterized by standard first-order conditions. Thus, instead of examining “how much” as in problems with continuous choice variables, discrete choice analysis examines “which one.” However, discrete choice analysis can also be used to examine the chosen quantity when only a few distinct quantities must be chosen from, such as the number of vehicles a household chooses to own and the number of minutes of telecommunications service a customer decides to purchase. Techniques such as logistic regression and probit regression can be used for empirical analysis of discrete choice.
Discrete choice models theoretically or empirically model choices made by people among a finite set of alternatives. The models have been used to examine, e.g., the choice of which car to buy, where to go to college, and which mode of transport to take to work, among numerous other applications. Discrete choice models are also used to examine choices by organizations, such as firms or government agencies. In the discussion below, the decision-making unit is assumed to be a person, though the concepts are applicable more generally. Daniel McFadden won the Nobel Prize in 2000 for his pioneering work in developing the theoretical basis for discrete choice.
Discrete choice models statistically relate the choice made by each person to the attributes of the person and the attributes of the alternatives available to the person. For example, the choice of which car a person buys is statistically related to the person's income and age as well as to price, fuel efficiency, size, and other attributes of each available car. The models estimate the probability that a person chooses a particular alternative. The models are often used to forecast how people's choices will change under changes in demographics and/or attributes of the alternatives.
Discrete choice models specify the probability that an individual chooses an option among a set of alternatives. The probabilistic description of discrete choice behavior is not meant to reflect individual behavior that is viewed as intrinsically probabilistic. Rather, it is the lack of information that leads us to describe choice in a probabilistic fashion. In practice, we cannot know all factors affecting individual choice decisions, as their determinants are partially observed or imperfectly measured. Therefore, discrete choice models rely on stochastic assumptions and specifications to account for unobserved factors related to a) choice alternatives, b) taste variation over people and over time, and c) heterogeneous choice sets. The different formulations have been summarized and classified into groups of models.

Applications

Marketing researchers use discrete choice models to study consumer demand and to predict competitive business responses, enabling choice modelers to solve a range of business problems, such as pricing, product development, and demand estimation problems. In market research, this is commonly called conjoint analysis.
Transportation planners use discrete choice models to predict demand for planned transportation systems, such as which route a driver will take and whether someone will take rapid transit systems. The first applications of discrete choice models were in transportation planning, and much of the most advanced research in discrete choice models is conducted by transportation researchers.
Energy forecasters and policymakers use discrete choice models for households’ and firms’ choice of heating system, appliance efficiency levels, and fuel efficiency level of vehicles.
Environmental studies use discrete choice models to examine recreationists’ choice of, e.g., a fishing or skiing site, to infer the value of amenities such as campgrounds, fish stock, and warming huts, and to estimate the value of water quality improvements.
Labor economists use discrete choice models to examine participation in the work force, occupation choice, and choice of college and training programs.

Common features of discrete choice models

Discrete choice models take many forms, including: Binary Logit, Binary Probit, Multinomial Logit, Conditional Logit, Multinomial Probit, Nested Logit, Generalized Extreme Value Models, Mixed Logit, and Exploded Logit. All of these models have the features described below in common.

Choice set

The choice set is the set of alternatives that are available to the person. For a discrete choice model, the choice set must meet three requirements:

The set of alternatives must be collectively exhaustive, meaning that the set includes all possible alternatives. This requirement implies that the person necessarily does choose an alternative from the set.
The alternatives must be mutually exclusive, meaning that choosing one alternative means not choosing any other alternatives. This requirement implies that the person chooses only one alternative from the set.
The set must contain a finite number of alternatives. This third requirement distinguishes discrete choice analysis from forms of regression analysis in which the dependent variable can take an infinite number of values.

As an example, the choice set for a person deciding which mode of transport to take to work includes driving alone, carpooling, taking the bus, etc. The choice set is complicated by the fact that a person can use multiple modes for a given trip, such as driving a car to a train station and then taking the train to work. In this case, the choice set can include each possible combination of modes. Alternatively, the choice can be defined as the choice of “primary” mode, with the set consisting of car, bus, rail, and other. Note that the alternative “other” is included in order to make the choice set exhaustive.
Different people may have different choice sets, depending on their circumstances. For instance, the Scion automobile was not sold in Canada as of 2009, so new car buyers in Canada faced different choice sets from those of American consumers. Such considerations are taken into account in the formulation of discrete choice models.

Defining choice probabilities

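A discrete choice model specifies the probability that a person chooses a particular alternative, with the probability expressed as a function of observed variables that relate to the alternatives and the person. In its general form, the probability that person n chooses alternative i is expressed as:
$$P_{ni} \equiv \Pr(\text{person } n \text{ chooses alternative } i) = G(x_{ni}, \; x_{nj} \ \forall j \neq i, \; s_n, \; \beta),$$
where

x_ni is a vector of attributes of alternative i faced by person n,
x_nj for all j ≠ i are the vectors of attributes of the other alternatives faced by person n,
s_n is a vector of characteristics of person n, and
β is a set of parameters giving the effects of the variables on the probabilities, which are estimated statistically.
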
In the mode of transport example above, the attributes of modes, such as travel time and cost, and the characteristics of consumer, such as annual income, age, and gender, can be used to calculate choice probabilities. The attributes of the alternatives can differ over people; e.g., cost and time for travel to work by car, bus, and rail are different for each person depending on the location of home and work of that person.
Properties:

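P_ni is between 0 and 1.
For every person n, the probabilities sum to one over the alternatives: $\sum_{j=1}^{J} P_{nj} = 1$, where J is the total number of alternatives.
The expected fraction of people choosing alternative i is $\frac{1}{N} \sum_{n=1}^{N} P_{ni}$, where N is the number of people making the choice.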

Different models have different properties. Prominent models are introduced below.

Consumer utility

Discrete choice models can be derived from utility theory. This derivation is useful for three reasons:

It gives a precise meaning to the probabilities P_ni
It motivates and distinguishes alternative model specifications, e.g., the choice of a functional form for G.
It provides the theoretical basis for calculation of changes in consumer surplus from changes in the attributes of the alternatives.

U_ni is the utility that person n obtains from choosing alternative i. The behavior of the person is utility-maximizing: person n chooses the alternative that provides the highest utility. The choice of the person is designated by dummy variables, y_ni, for each alternative:
$$y_{ni} = \begin{cases} 1 & \text{if } U_{ni} > U_{nj} \ \forall j \neq i, \\ 0 & \text{otherwise.} \end{cases}$$
Consider now the researcher who is examining the choice. The person's choice depends on many factors, some of which the researcher observes and some of which the researcher does not. The utility that the person obtains from choosing an alternative is decomposed into a part that depends on variables that the researcher observes and a part that depends on variables that the researcher does not observe. In a linear form, this decomposition is expressed as
$$U_{ni} = \beta z_{ni} + \varepsilon_{ni},$$
where

z_ni is a vector of observed variables relating to alternative i for person n that depends on attributes of the alternative, x_ni, interacted perhaps with attributes of the person, s_n, such that it can be expressed as z_ni = z(x_ni, s_n) for some numerical function z,
β is a corresponding vector of coefficients of the observed variables, and
ε_ni captures the impact of all unobserved factors that affect the person's choice.

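The choice probability is then
$$P_{ni} = \Pr(y_{ni} = 1) = \Pr(U_{ni} > U_{nj} \ \forall j \neq i) = \Pr(\beta z_{ni} + \varepsilon_{ni} > \beta z_{nj} + \varepsilon_{nj} \ \forall j \neq i) = \Pr(\varepsilon_{nj} - \varepsilon_{ni} < \beta z_{ni} - \beta z_{nj} \ \forall j \neq i).$$
Given β, the choice probability is the probability that the random terms, ε_nj - ε_ni (which are random from the researcher's perspective, since the researcher does not observe them), are below the respective quantities β z_ni - β z_nj for all j ≠ i. Different choice models arise from different distributions of ε_ni for all i and different treatments of β.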

Properties of discrete choice models implied by utility theory

Only differences matter

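The probability that a person chooses a particular alternative is determined by comparing the utility of choosing that alternative to the utility of choosing other alternatives:
$$P_{ni} = \Pr(y_{ni} = 1) = \Pr(U_{ni} > U_{nj} \ \forall j \neq i) = \Pr(U_{ni} - U_{nj} > 0 \ \forall j \neq i).$$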
As the last term indicates, the choice probability depends only on the difference in utilities between alternatives, not on the absolute level of utilities. Equivalently, adding a constant to the utilities of all the alternatives does not change the choice probabilities.
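A minimal numerical sketch of this property, assuming the logit formula of model F below as the functional form. The helper name `logit_probabilities` and the utility values are illustrative and do not come from the source. Adding the same constant to every utility leaves the probabilities unchanged.

```python
import math

def logit_probabilities(utilities):
    """Logit choice probabilities from the observed parts of utility."""
    # Subtracting the maximum is itself an application of the property:
    # it shifts every utility by the same constant.
    top = max(utilities)
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

base = [1.0, 0.5, -0.2]               # illustrative utilities
shifted = [u + 10.0 for u in base]    # same constant added to all of them

p_base = logit_probabilities(base)
p_shifted = logit_probabilities(shifted)
print(p_base)
print(p_shifted)
```

The two printed vectors agree up to floating-point rounding, because only the utility differences enter the formula.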

Scale must be normalized

Since utility has no units, it is necessary to normalize the scale of utilities. The scale of utility is often defined by the variance of the error term in discrete choice models. This variance may differ depending on the characteristics of the dataset, such as when or where the data are collected. Normalization of the variance therefore affects the interpretation of parameters estimated across diverse datasets.
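The role of the scale can be sketched under the assumption of logit errors with a scale parameter, here the hypothetical argument `scale`, so that probabilities depend on utilities divided by the scale. Multiplying the utilities and the scale by the same factor changes nothing, which is why one of them has to be fixed. The numbers are illustrative only.

```python
import math

def scaled_logit_probabilities(utilities, scale=1.0):
    """Logit probabilities when the error term has the given scale."""
    ratios = [u / scale for u in utilities]
    top = max(ratios)
    weights = [math.exp(r - top) for r in ratios]
    total = sum(weights)
    return [w / total for w in weights]

v = [0.8, 0.1, -0.5]  # illustrative observed utilities

p_unit = scaled_logit_probabilities(v, scale=1.0)
# Doubling both the utilities and the error scale: same probabilities.
p_rescaled = scaled_logit_probabilities([2.0 * u for u in v], scale=2.0)
# Doubling only the error scale: more noise, flatter probabilities.
p_noisier = scaled_logit_probabilities(v, scale=2.0)
```

In this sketch only the ratio of coefficients to error scale is identified, so estimates from datasets with different error variances are not directly comparable.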

Prominent types of discrete choice models

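Discrete choice models can first be classified according to the number of available alternatives:

Binary choice models: two available alternatives
Multinomial choice models: three or more available alternatives

Multinomial choice models can further be classified according to the model specification:

Models, such as standard logit, that assume no correlation in unobserved factors over alternatives
Models that allow correlation in unobserved factors among alternatives

In addition, specific forms of the models are available for examining rankings of alternatives and for ratings data.
Details for each model are provided in the following sections.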

Binary choice

A. Logit with attributes of the person but no attributes of the alternatives

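U_n is the utility that person n obtains from taking an action. The utility the person obtains from taking the action depends on the characteristics of the person, some of which are observed by the researcher and some of which are not. The person takes the action, y_n = 1, if U_n > 0. The unobserved term, ε_n, is assumed to have a logistic distribution. The specification is written succinctly as:
$$U_n = \beta s_n + \varepsilon_n, \qquad y_n = \begin{cases} 1 & \text{if } U_n > 0, \\ 0 & \text{otherwise,} \end{cases} \qquad \varepsilon_n \sim \text{Logistic},$$
$$P_{n1} = \frac{1}{1 + \exp(-\beta s_n)}.$$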

B. Probit with attributes of the person but no attributes of the alternatives

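The description of the model is the same as model A, except the unobserved terms are distributed standard normal instead of logistic.
$$U_n = \beta s_n + \varepsilon_n, \qquad \varepsilon_n \sim \text{standard normal}, \qquad P_{n1} = \Phi(\beta s_n),$$
where Φ is the cumulative distribution function of the standard normal.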
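The two binary models with person attributes only (A and B) can be sketched side by side. The function names, coefficients, and person characteristics below are hypothetical. The logit uses the logistic function of the index β s_n, and the probit uses the standard normal CDF written with `math.erf`.

```python
import math

def binary_logit_probability(beta, s):
    """Model A: probability of taking the action under a logistic error."""
    index = sum(b * x for b, x in zip(beta, s))
    return 1.0 / (1.0 + math.exp(-index))

def binary_probit_probability(beta, s):
    """Model B: same index, standard normal error."""
    index = sum(b * x for b, x in zip(beta, s))
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))

beta = [0.5, -0.03]     # hypothetical coefficients (constant, age)
person = [1.0, 40.0]    # hypothetical person: constant term and age 40

p_logit = binary_logit_probability(beta, person)
p_probit = binary_probit_probability(beta, person)
print(p_logit, p_probit)
```

The same coefficients give different probabilities in the two models because the logistic and standard normal distributions have different variances, which ties back to the scale normalization discussed above.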

C. Logit with variables that vary over alternatives

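U_ni is the utility person n obtains from choosing alternative i. The utility of each alternative depends on the attributes of the alternatives interacted perhaps with the attributes of the person. The unobserved terms are assumed to have an extreme value distribution.
$$U_{n1} = \beta z_{n1} + \varepsilon_{n1}, \qquad U_{n2} = \beta z_{n2} + \varepsilon_{n2}, \qquad \varepsilon_{n1}, \varepsilon_{n2} \sim \text{iid extreme value},$$
$$P_{n1} = \frac{\exp(\beta z_{n1})}{\exp(\beta z_{n1}) + \exp(\beta z_{n2})}.$$
We can relate this specification to model A above, which is also binary logit. In particular, P_n1 can also be expressed as
$$P_{n1} = \frac{1}{1 + \exp(-\beta (z_{n1} - z_{n2}))}.$$
Note that if two error terms are iid extreme value, their difference is distributed logistic, which is the basis for the equivalence of the two specifications.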
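The equivalence can be checked numerically. The sketch below assumes illustrative observed utilities `v1` and `v2`, evaluates both closed forms, and compares them with the share of simulated draws in which alternative 1 has the higher utility when each error is a standard extreme value (Gumbel) draw. The helper names are hypothetical.

```python
import math
import random

def gumbel_draw(rng):
    """One standard extreme value draw via the inverse CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def simulated_share(v1, v2, draws=20000, seed=1):
    """Fraction of draws in which alternative 1 has the higher utility."""
    rng = random.Random(seed)
    wins = sum(
        1 for _ in range(draws)
        if v1 + gumbel_draw(rng) > v2 + gumbel_draw(rng)
    )
    return wins / draws

v1, v2 = 0.4, -0.3  # illustrative values of beta*z_n1 and beta*z_n2

two_alternative_form = math.exp(v1) / (math.exp(v1) + math.exp(v2))
difference_form = 1.0 / (1.0 + math.exp(-(v1 - v2)))
share = simulated_share(v1, v2)
print(two_alternative_form, difference_form, share)
```

The two closed forms agree exactly, and the simulated share approaches them as the number of draws grows.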

D. Probit with variables that vary over alternatives

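The description of the model is the same as model C, except the difference of the two unobserved terms is distributed standard normal instead of logistic.
$$U_{n1} = \beta z_{n1} + \varepsilon_{n1}, \qquad U_{n2} = \beta z_{n2} + \varepsilon_{n2}, \qquad \varepsilon_{n2} - \varepsilon_{n1} \sim \text{standard normal}.$$
Then the probability of taking the action (choosing alternative 1) is
$$P_{n1} = \Phi(\beta (z_{n1} - z_{n2})),$$
where Φ is the cumulative distribution function of the standard normal.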

Multinomial choice without correlation among alternatives

E. Logit with attributes of the person but no attributes of the alternatives

The utility for all alternatives depends on the same variables, s_n, but the coefficients are different for different alternatives:
$$U_{ni} = \beta_i s_n + \varepsilon_{ni}.$$
Since only differences in utility matter, it is necessary to normalize β_i = 0 for one alternative. Assuming β_1 = 0 and that the ε_ni are iid extreme value, the choice probability takes the form
$$P_{ni} = \frac{\exp(\beta_i s_n)}{\sum_{j=1}^{J} \exp(\beta_j s_n)},$$
where J is the total number of alternatives.

F. Logit with variables that vary over alternatives (also called conditional logit)

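The utility for each alternative depends on attributes of that alternative, interacted perhaps with attributes of the person:
$$U_{ni} = \beta z_{ni} + \varepsilon_{ni}, \qquad \varepsilon_{ni} \sim \text{iid extreme value},$$
$$P_{ni} = \frac{\exp(\beta z_{ni})}{\sum_{j=1}^{J} \exp(\beta z_{nj})},$$
where J is the total number of alternatives.
Note that model E can be expressed in the same form as model F by appropriate respecification of variables. Define w_nj^k = s_n δ_jk, where δ_jk is the Kronecker delta and s_n are from model E. Then, model F is obtained by using
$$z_{nj} = \{w_{nj}^{1}, \ldots, w_{nj}^{J}\} \quad \text{and} \quad \beta = \{\beta_1, \ldots, \beta_J\},$$
where J is the total number of alternatives.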
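The respecification can be sketched as follows. The function names and all numbers are hypothetical. `respecify_as_model_f` stacks the alternative-specific coefficients of model E into one vector and builds, for each alternative j, a variable vector that contains s_n in block j and zeros elsewhere, which is the Kronecker delta construction described above.

```python
import math

def softmax(values):
    top = max(values)
    weights = [math.exp(v - top) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def person_only_logit(betas, s):
    """Model E: betas[j] is the coefficient vector of alternative j."""
    return softmax([sum(b * x for b, x in zip(beta_j, s)) for beta_j in betas])

def conditional_logit(beta, z):
    """Model F: z[j] holds the variables of alternative j."""
    return softmax([sum(b * x for b, x in zip(beta, z_j)) for z_j in z])

def respecify_as_model_f(betas, s):
    """Build (beta, z) for model F from model E inputs."""
    n_alt = len(betas)
    stacked_beta = [b for beta_j in betas for b in beta_j]
    # Row j contains s in block k == j and zeros in every other block.
    z = [
        [x if j == k else 0.0 for k in range(n_alt) for x in s]
        for j in range(n_alt)
    ]
    return stacked_beta, z

betas = [[0.0, 0.0], [0.4, -0.02], [-0.1, 0.01]]  # first alternative normalized to 0
s = [1.0, 35.0]                                   # hypothetical: constant and age

p_e = person_only_logit(betas, s)
p_f = conditional_logit(*respecify_as_model_f(betas, s))
print(p_e)
print(p_f)
```

Both calls return the same probabilities, since the stacked inner product reduces to β_j s_n for alternative j.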

Multinomial choice with correlation among alternatives

A standard logit model is not always suitable, since it assumes that there is no correlation in unobserved factors over alternatives. This lack of correlation translates into a particular pattern of substitution among alternatives that might not always be realistic in a given situation. This pattern of substitution is often called the Independence of Irrelevant Alternatives property of standard logit models. See the Red Bus/Blue Bus example in which this pattern does not hold, or the path choice example. A number of models have been proposed to allow correlation over alternatives and more general substitution patterns:

Nested Logit Model - Captures correlations between alternatives by partitioning the choice set into 'nests'
* Cross-nested Logit model - Alternatives may belong to more than one nest
* C-logit Model - Captures correlations between alternatives using 'commonality factor'
* Paired Combinatorial Logit Model - Suitable for route choice problems.
Generalized Extreme Value Model - General class of model, derived from the random utility model to which multinomial logit and nested logit belong
Conditional probit - Allows full covariance among alternatives using a joint normal distribution.
Mixed logit - Allows any form of correlation and substitution patterns. When a mixed logit has jointly normal random terms, the model is sometimes called a "multinomial probit model with logit kernel". It can be applied to route choice.
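The substitution pattern that motivates these models can be illustrated with the red bus/blue bus case under a standard logit. The sketch assumes, purely for illustration, that a car and a blue bus have equal observed utility and that a red bus identical to the blue bus is then added.

```python
import math

def logit_probabilities(utilities):
    top = max(utilities)
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative equal utilities: car, blue bus
two_modes = logit_probabilities([0.0, 0.0])
# Add a red bus that is identical to the blue bus apart from its color
three_modes = logit_probabilities([0.0, 0.0, 0.0])

print(two_modes)    # car and blue bus split the market evenly
print(three_modes)  # standard logit assigns one third to each alternative
```

Standard logit keeps the ratio of car to blue bus probabilities fixed, so the car share falls from one half to one third. Intuitively the two buses should mainly split the bus share between themselves, which requires correlation in unobserved factors between them.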

The following sections describe Nested Logit, GEV, Probit, and Mixed Logit models in detail.

G. Nested Logit and Generalized Extreme Value (GEV) models

The model is the same as model F except that the unobserved component of utility is correlated over alternatives rather than being independent over alternatives.
$$U_{ni} = \beta z_{ni} + \varepsilon_{ni}, \qquad (\varepsilon_{n1}, \ldots, \varepsilon_{nJ}) \sim \text{generalized extreme value}.$$
The marginal distribution of each ε_ni is extreme value, but their joint distribution allows correlation among them.
The probability takes many forms depending on the pattern of correlation that is specified. See Generalized Extreme Value.

H. Multinomial probit

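The model is the same as model G except that the unobserved terms are distributed jointly normal, which allows any pattern of correlation and heteroscedasticity:
$$U_{ni} = \beta z_{ni} + \varepsilon_{ni}, \qquad \varepsilon_n \equiv (\varepsilon_{n1}, \ldots, \varepsilon_{nJ}) \sim N(0, \Omega),$$
$$P_{ni} = \Pr(\beta z_{ni} + \varepsilon_{ni} > \beta z_{nj} + \varepsilon_{nj} \ \forall j \neq i) = \int I(\beta z_{ni} + \varepsilon_{ni} > \beta z_{nj} + \varepsilon_{nj} \ \forall j \neq i) \, \phi(\varepsilon_n \mid \Omega) \, d\varepsilon_n,$$
where I(·) is the indicator function and φ(ε_n | Ω) is the joint normal density with mean zero and covariance Ω.
The integral for this choice probability does not have a closed form, and so the probability is approximated by quadrature or simulation.
When Ω is the identity matrix, the model is called independent probit.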
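A minimal sketch of approximating probit probabilities by simulation. It uses a crude frequency simulator, which is one simple option and is not claimed to be the method used in practice. The function names, utilities, and covariance matrices are hypothetical. Correlated normal errors are generated from independent draws through a Cholesky factor of the covariance matrix.

```python
import math
import random

def cholesky(cov):
    """Lower-triangular factor of a positive definite covariance matrix."""
    n = len(cov)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(cov[i][i] - partial)
            else:
                lower[i][j] = (cov[i][j] - partial) / lower[j][j]
    return lower

def simulate_probit_probabilities(v, cov, draws=20000, seed=0):
    """Share of draws in which each alternative has the highest utility."""
    rng = random.Random(seed)
    lower = cholesky(cov)
    n = len(v)
    counts = [0] * n
    for _ in range(draws):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        eps = [sum(lower[i][k] * e[k] for k in range(i + 1)) for i in range(n)]
        u = [v[i] + eps[i] for i in range(n)]
        counts[u.index(max(u))] += 1
    return [c / draws for c in counts]

v = [0.0, 0.0, 0.0]  # illustrative equal observed utilities
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
correlated = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.9], [0.0, 0.9, 1.0]]

p_independent = simulate_probit_probabilities(v, identity)
p_correlated = simulate_probit_probabilities(v, correlated)
print(p_independent)
print(p_correlated)
```

With the identity covariance the three shares are close to one third each. When the errors of the last two alternatives are strongly correlated, those two compete mostly with each other and the first alternative gains share.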

I. Mixed logit

Mixed Logit models have become increasingly popular in recent years for several reasons. First, the model allows β to be random in addition to ε. The randomness in β accommodates random taste variation over people and correlation across alternatives that generates flexible substitution patterns. Second, advances in simulation have made approximation of the model fairly easy. In addition, McFadden and Train have shown that any true choice model can be approximated, to any degree of accuracy, by a mixed logit with appropriate specification of explanatory variables and distribution of coefficients.

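$$U_{ni} = \beta z_{ni} + \varepsilon_{ni},$$
β ~ f(β | θ) for any distribution f, where θ is the set of distribution parameters to be estimated,
ε_ni ~ iid extreme value.

The choice probability is
$$P_{ni} = \int L_{ni}(\beta) \, f(\beta \mid \theta) \, d\beta,$$
where
$$L_{ni}(\beta) = \frac{\exp(\beta z_{ni})}{\sum_{j=1}^{J} \exp(\beta z_{nj})}$$
is the logit probability evaluated at β, with J the total number of alternatives.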
The integral for this choice probability does not have a closed form, so the probability is approximated by simulation.
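A minimal sketch of such a simulation, assuming one particular mixing distribution: independent normal coefficients with the given means and standard deviations. The function names and all numbers are hypothetical. The simulated probability is the average of the logit formula over random draws of the coefficients.

```python
import math
import random

def logit_probabilities(beta, z):
    """Logit probabilities at a fixed coefficient vector beta."""
    v = [sum(b * x for b, x in zip(beta, z_j)) for z_j in z]
    top = max(v)
    weights = [math.exp(v_j - top) for v_j in v]
    total = sum(weights)
    return [w / total for w in weights]

def simulated_mixed_logit(mean, sd, z, draws=2000, seed=0):
    """Average the logit formula over draws of beta ~ independent normals."""
    rng = random.Random(seed)
    totals = [0.0] * len(z)
    for _ in range(draws):
        beta = [rng.gauss(m, s) for m, s in zip(mean, sd)]
        for j, p in enumerate(logit_probabilities(beta, z)):
            totals[j] += p
    return [t / draws for t in totals]

# Hypothetical attributes (e.g. cost and time) of three alternatives
z = [[1.0, 0.5], [0.0, 1.5], [0.5, 1.0]]
mean = [-1.0, -0.5]
sd = [0.5, 0.2]

p_mixed = simulated_mixed_logit(mean, sd, z)
p_fixed = simulated_mixed_logit(mean, [0.0, 0.0], z)  # no taste variation
print(p_mixed)
print(p_fixed)
```

With zero standard deviations the simulator collapses to the standard logit at the mean coefficients, which is a convenient sanity check.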

Estimation from choices

Discrete choice models are often estimated using maximum likelihood estimation. Logit models can be estimated by logistic regression, and probit models can be estimated by probit regression. More generally, estimation is usually done via parametric, semi-parametric, and non-parametric maximum likelihood methods. Nonparametric methods, such as the maximum score estimator, have also been proposed.
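A minimal sketch of maximum likelihood estimation for the binary logit of model A with a single person characteristic and no constant term. Everything in it is an illustrative assumption: the data are simulated with a known coefficient, the function names are hypothetical, and Newton's method is just one of several ways to maximize the log-likelihood.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_data(beta_true, n, seed=0):
    """Simulate (s_n, y_n) pairs from a binary logit with coefficient beta_true."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = rng.uniform(-2.0, 2.0)
        y = 1 if rng.random() < logistic(beta_true * s) else 0
        data.append((s, y))
    return data

def log_likelihood(beta, data):
    return sum(
        math.log(logistic(beta * s)) if y == 1
        else math.log(1.0 - logistic(beta * s))
        for s, y in data
    )

def fit_binary_logit(data, iterations=25):
    """Newton's method on the (concave) logit log-likelihood."""
    beta = 0.0
    for _ in range(iterations):
        gradient = sum((y - logistic(beta * s)) * s for s, y in data)
        hessian = -sum(
            logistic(beta * s) * (1.0 - logistic(beta * s)) * s * s
            for s, _ in data
        )
        beta -= gradient / hessian
    return beta

data = simulate_data(beta_true=1.0, n=5000)
beta_hat = fit_binary_logit(data)
print(beta_hat, log_likelihood(beta_hat, data))
```

The estimate should land near the coefficient used to simulate the data, with the gap shrinking as the sample grows.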

Estimation from rankings

In many situations, a person's ranking of alternatives is observed, rather than just their chosen alternative. For example, a person who has bought a new car might be asked what they would have bought if that car had not been offered, which provides information on the person's second choice in addition to their first choice. Or, in a survey, a respondent might be asked to rank several alternatives from most preferred to least preferred.
The models described above can be adapted to account for rankings beyond the first choice. The most prominent model for rankings data is the exploded logit and its mixed version.

J. Exploded logit

Under the same assumptions as for a standard logit, the probability for a ranking of the alternatives is a product of standard logits. The model is called "exploded logit" because the choice situation that is usually represented as one logit formula for the chosen alternative is expanded to have a separate logit formula for each ranked alternative. The exploded logit model is the product of standard logit models with the choice set decreasing as each alternative is ranked and leaves the set of available choices in the subsequent choice.
Without loss of generality, the alternatives can be relabeled to represent the person's ranking, such that alternative 1 is the first choice, 2 the second choice, etc. The choice probability of ranking J alternatives as 1, 2,..., J is then
$$\Pr(\text{ranking } 1, 2, \ldots, J) = \frac{\exp(\beta z_{n1})}{\sum_{j=1}^{J} \exp(\beta z_{nj})} \cdot \frac{\exp(\beta z_{n2})}{\sum_{j=2}^{J} \exp(\beta z_{nj})} \cdots \frac{\exp(\beta z_{n(J-1)})}{\sum_{j=J-1}^{J} \exp(\beta z_{nj})}.$$
As with standard logit, the exploded logit model assumes no correlation in unobserved factors over alternatives. The exploded logit can be generalized, in the same way as the standard logit is generalized, to accommodate correlations among alternatives and random taste variation. The "mixed exploded logit" model is obtained by substituting the probability of the ranking, given above, for L_ni in the mixed logit model.
This model is also known in econometrics as the rank ordered logit model, and it was introduced in that field by Beggs, Cardell and Hausman in 1981. One application is the Combes et al. paper explaining the ranking of candidates to become professor. It is also known as the Plackett–Luce model in the biomedical literature.
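The product-of-logits structure can be sketched as follows. The function name and utility values are hypothetical. Each factor is a standard logit over the alternatives not yet ranked, and the last remaining alternative contributes a factor of one.

```python
import math
from itertools import permutations

def ranking_probability(v, ranking):
    """Exploded logit probability of a full ranking.

    v[j] is the observed utility of alternative j; ranking lists the
    alternative indices from most to least preferred.
    """
    remaining = list(ranking)
    prob = 1.0
    while len(remaining) > 1:
        chosen = remaining[0]
        denom = sum(math.exp(v[j]) for j in remaining)
        prob *= math.exp(v[chosen]) / denom
        remaining.pop(0)  # the ranked alternative leaves the choice set
    return prob

v = [0.9, 0.2, -0.4]  # illustrative observed utilities

p_best_order = ranking_probability(v, [0, 1, 2])
total = sum(ranking_probability(v, list(r)) for r in permutations(range(3)))
print(p_best_order, total)
```

Summing over all possible rankings gives one, and with only two alternatives the formula reduces to the binary logit.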

Ordered models

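In surveys, respondents are often asked to give ratings, for example on a scale from 1 to 5, of how well something is going or of how strongly they agree with a statement, such as a statement about helping people facing foreclosure.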
A multinomial discrete-choice model can examine the responses to these questions. However, these models are derived under the concept that the respondent obtains some utility for each possible answer and gives the answer that provides the greatest utility. It might be more natural to think that the respondent has some latent measure or index associated with the question and answers in response to how high this measure is. Ordered logit and ordered probit models are derived under this concept.

K. Ordered logit

Let U_n represent the strength of survey respondent n’s feelings or opinion on the survey subject. Assume that there are cutoffs of the level of the opinion that determine the choice of a particular response. For instance, in the example of helping people facing foreclosure, the person chooses

1, if U_n < a
2, if a < U_n < b
3, if b < U_n < c
4, if c < U_n < d
5, if U_n > d,

for some real numbers a, b, c, d.
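Defining U_n = β z_n + ε_n with ε_n ~ Logistic, the probability of each possible response is:
$$\Pr(\text{choosing } 1) = \Pr(U_n < a) = \Pr(\varepsilon_n < a - \beta z_n) = \frac{1}{1 + \exp(-(a - \beta z_n))},$$
$$\Pr(\text{choosing } 2) = \Pr(a < U_n < b) = \frac{1}{1 + \exp(-(b - \beta z_n))} - \frac{1}{1 + \exp(-(a - \beta z_n))},$$
and similarly for responses 3 and 4, up to
$$\Pr(\text{choosing } 5) = \Pr(U_n > d) = 1 - \frac{1}{1 + \exp(-(d - \beta z_n))}.$$
The parameters of the model are the coefficients β and the cut-off points, one of which must be normalized for identification. When there are only two possible responses, the ordered logit is the same as a binary logit, with one cut-off point normalized to zero.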
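The cutoff structure can be sketched in a few lines. The index value and cutoffs below are hypothetical. The probability of each response is the difference between the error CDF evaluated at consecutive cutoffs minus the index. Passing the standard normal CDF instead of the logistic CDF gives the ordered probit of model L.

```python
import math

def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ordered_response_probabilities(index, cutoffs, cdf=logistic_cdf):
    """Probabilities of responses 1..len(cutoffs)+1.

    index is beta*z_n; cutoffs must be in increasing order (a < b < c < d).
    """
    cumulative = [cdf(c - index) for c in cutoffs] + [1.0]
    probs = []
    previous = 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs

cutoffs = [-1.5, -0.5, 0.5, 1.5]  # hypothetical a, b, c, d
index = 0.3                       # hypothetical beta*z_n

p_ordered_logit = ordered_response_probabilities(index, cutoffs)
p_ordered_probit = ordered_response_probabilities(index, cutoffs, cdf=normal_cdf)
print(p_ordered_logit)
print(p_ordered_probit)
```

With a single cutoff at zero the function returns the two binary logit probabilities, matching the remark above about the two-response case.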

L. Ordered probit

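The description of the model is the same as model K, except the unobserved terms have a normal distribution instead of logistic.
With U_n = β z_n + ε_n and ε_n distributed standard normal, the choice probabilities are:
$$\Pr(\text{choosing } 1) = \Phi(a - \beta z_n),$$
$$\Pr(\text{choosing } 2) = \Phi(b - \beta z_n) - \Phi(a - \beta z_n),$$
and similarly for responses 3 and 4, up to
$$\Pr(\text{choosing } 5) = 1 - \Phi(d - \beta z_n),$$
where Φ is the cumulative distribution function of the standard normal.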
