Thompson's construction

In computer science, Thompson's construction algorithm, also called the McNaughton-Yamada-Thompson algorithm, is a method of transforming a regular expression into an equivalent nondeterministic finite automaton. This NFA can be used to match strings against the regular expression. This algorithm is credited to Ken Thompson.
Regular expressions and nondeterministic finite automata are two representations of formal languages. For instance, text processing utilities use regular expressions to describe advanced search patterns, but NFAs are better suited for execution on a computer. Hence, this algorithm is of practical interest, since it can compile regular expressions into NFAs. From a theoretical point of view, this algorithm is part of the proof that regular expressions and NFAs describe exactly the same class of languages, namely the regular languages.
An NFA can be made deterministic by the powerset construction and then be minimized to get an optimal automaton corresponding to the given regular expression. However, an NFA may also be interpreted directly.
To decide whether two given regular expressions describe the same language, each can be converted into an equivalent minimal deterministic finite automaton via Thompson's construction, powerset construction, and DFA minimization. If, and only if, the resulting automata agree up to renaming of states, the regular expressions' languages agree.
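
The powerset construction mentioned above can be sketched as follows. This is a minimal illustration, not a reference implementation: the transition-table encoding, the `EPS` marker and all function names are assumptions made for this example, and the minimization step is not shown. The sample NFA is a hand-written Thompson automaton for a*b.

```python
from collections import deque

EPS = None  # assumed marker for ε-transitions in this sketch

def eps_closure(states, trans):
    """All NFA states reachable from `states` using only ε-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for label, target in trans.get(q, []):
            if label is EPS and target not in seen:
                seen.add(target)
                stack.append(target)
    return frozenset(seen)

def powerset_construction(start, final, trans, alphabet):
    """Return (dfa_start, dfa_accepting, dfa_delta) for the given NFA."""
    dfa_start = eps_closure({start}, trans)
    delta = {}
    queue = deque([dfa_start])
    while queue:
        subset = queue.popleft()
        if subset in delta:
            continue  # this set of NFA states was already expanded
        delta[subset] = {}
        for a in alphabet:
            moved = {t for q in subset for label, t in trans.get(q, []) if label == a}
            successor = eps_closure(moved, trans)
            delta[subset][a] = successor
            queue.append(successor)
    accepting = {subset for subset in delta if final in subset}
    return dfa_start, accepting, delta

def dfa_accepts(dfa_start, accepting, delta, text):
    """Run the DFA; every character of `text` must be in the alphabet."""
    state = dfa_start
    for ch in text:
        state = delta[state][ch]
    return state in accepting

# Hand-written Thompson NFA for a*b (state numbers are arbitrary).
NFA_TRANS = {
    0: [(EPS, 1), (EPS, 3)],
    1: [("a", 2)],
    2: [(EPS, 1), (EPS, 3)],
    3: [("b", 4)],
    4: [],
}
DFA = powerset_construction(0, 4, NFA_TRANS, "ab")
```

Each DFA state is a set of NFA states, so the result can in general be exponentially larger than the NFA; only subsets reachable from the start set are generated here.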

The algorithm

The algorithm works recursively by splitting an expression into its constituent subexpressions, from which the NFA is constructed using a set of rules. More precisely, from a regular expression E, the obtained automaton A with transition function Δ has the following properties:

A has exactly one initial state q0, which is not accessible from any other state. That is, for any state q and any letter a, Δ(q, a) does not contain q0.
A has exactly one final state qf, which is not co-accessible from any other state. That is, for any letter a, Δ(qf, a) = ∅.
Let c be the number of concatenations in the regular expression E and let s be the number of symbols apart from parentheses, that is, occurrences of |, *, alphabet letters a, and ε. Then the number of states of A is 2s − c, which is linear in the size of E.
The number of transitions leaving any state is at most two.
Since an NFA with m states and at most e transitions from each state can match a string of length n in time O(emn), a Thompson NFA can do pattern matching in linear time, assuming a fixed-size alphabet.
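
The matching bound can be illustrated by simulating the NFA on sets of states. The following is a sketch under stated assumptions: the table encoding, the `EPS` marker and the function names are hypothetical, and the sample automaton is a hand-written Thompson NFA for a*b. For each input character, every current state and each of its outgoing transitions is visited at most once, which gives the per-character cost proportional to the number of states times the maximum out-degree.

```python
EPS = None  # assumed marker for ε-transitions in this sketch

def eps_closure(states, trans):
    """All states reachable from `states` using only ε-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for label, target in trans.get(q, []):
            if label is EPS and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

def matches(start, final, trans, text):
    """Decide whether the NFA accepts `text` by tracking the set of active states."""
    current = eps_closure({start}, trans)
    for ch in text:
        # Follow every transition labelled with the current character ...
        moved = {t for q in current for label, t in trans.get(q, []) if label == ch}
        # ... then extend the result along ε-transitions.
        current = eps_closure(moved, trans)
    return final in current

# Hand-written Thompson NFA for a*b: 3 symbols (a, *, b), 1 concatenation,
# hence 2*3 - 1 = 5 states.
A_STAR_B = {
    0: [(EPS, 1), (EPS, 3)],
    1: [("a", 2)],
    2: [(EPS, 1), (EPS, 3)],
    3: [("b", 4)],
    4: [],
}

print(matches(0, 4, A_STAR_B, "aab"))  # True
print(matches(0, 4, A_STAR_B, "a"))    # False
```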

Rules

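The following rules are given according to Aho et al., p. 122. In what follows, N(s) and N(t) are the NFAs of the subexpressions s and t, respectively.
The empty-expression ε is converted to an NFA consisting of an initial state and a final state joined by a single ε-transition.
A symbol a of the input alphabet is converted to an NFA consisting of an initial state and a final state joined by a single transition labelled a.
The union expression s|t is converted to an NFA with a new initial state q and a new final state, with N(s) and N(t) placed in parallel between them.
State q goes via ε either to the initial state of N(s) or to that of N(t). Their final states become intermediate states of the whole NFA and merge via two ε-transitions into the final state of the NFA.
The concatenation expression st is converted to an NFA that places N(s) and N(t) in sequence.
The initial state of N(s) is the initial state of the whole NFA. The final state of N(s) becomes the initial state of N(t). The final state of N(t) is the final state of the whole NFA.
The Kleene star expression s* is converted to an NFA with a new initial state and a new final state around N(s).
An ε-transition connects the new initial state directly to the new final state, bypassing the sub-NFA N(s) in between; two further ε-transitions enter N(s) from the new initial state and leave it for the new final state. Another ε-transition from the inner final to the inner initial state of N(s) allows for repetition of expression s according to the star operator.

The parenthesized expression (s) is converted to N(s) itself.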

With these rules, using the empty expression and symbol rules as base cases, it is possible to prove with mathematical induction that any regular expression may be converted into an equivalent NFA.
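
The rules above can be sketched in code as a recursive function over a syntax tree. This is an illustrative sketch, not a canonical implementation: the tuple encoding of expressions, the `EPS` marker and the name `thompson` are assumptions made for this example, and parsing a regular expression string into such a tree is left out.

```python
from itertools import count

EPS = None  # assumed marker for ε-transitions in this sketch

def thompson(expr):
    """Build an NFA from a syntax tree.

    `expr` is one of ("eps",), ("sym", a), ("union", s, t),
    ("concat", s, t) or ("star", s).
    Returns (start, final, trans), where trans maps each state to a
    list of (label, target) pairs.
    """
    fresh = count()
    trans = {}

    def new_state():
        q = next(fresh)
        trans[q] = []
        return q

    def build(e):
        kind = e[0]
        if kind == "eps":
            q, f = new_state(), new_state()
            trans[q].append((EPS, f))
            return q, f
        if kind == "sym":
            q, f = new_state(), new_state()
            trans[q].append((e[1], f))
            return q, f
        if kind == "union":
            q = new_state()
            s_q, s_f = build(e[1])
            t_q, t_f = build(e[2])
            f = new_state()
            trans[q] += [(EPS, s_q), (EPS, t_q)]
            trans[s_f].append((EPS, f))
            trans[t_f].append((EPS, f))
            return q, f
        if kind == "concat":
            s_q, s_f = build(e[1])
            t_q, t_f = build(e[2])
            # Merge the final state of N(s) with the initial state of N(t):
            # s_f has no outgoing transitions and nothing points to t_q,
            # so t_q's transitions can simply be moved over to s_f.
            trans[s_f] = trans.pop(t_q)
            return s_q, t_f
        if kind == "star":
            q = new_state()
            s_q, s_f = build(e[1])
            f = new_state()
            trans[q] += [(EPS, s_q), (EPS, f)]    # enter N(s) or skip it
            trans[s_f] += [(EPS, s_q), (EPS, f)]  # repeat N(s) or leave
            return q, f
        raise ValueError(f"unknown expression kind: {kind!r}")

    start, final = build(expr)
    return start, final, trans

# ε | a*b has 5 symbols (ε, |, a, *, b) and 1 concatenation: 2*5 - 1 = 9 states.
EXPR = ("union", ("eps",), ("concat", ("star", ("sym", "a")), ("sym", "b")))
START, FINAL, TRANS = thompson(EXPR)
print(len(TRANS))  # 9
```

Parenthesized expressions need no case of their own here, because grouping is already expressed by the nesting of the tree.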

Example

Two examples are given: a small informal one showing only the result, and a larger one with a step-by-step application of the algorithm.

Small example

The picture below shows the result of Thompson's construction on (ε|a*b). The pink oval corresponds to a, the teal oval corresponds to a*, the green oval corresponds to b, the orange oval corresponds to a*b, and the blue oval corresponds to ε.

Application of the algorithm

As an example, the picture shows the result of Thompson's construction algorithm on the regular expression (0|(1(01*(00)*0)*1)*)* that denotes the set of binary numbers that are multiples of 3:
The upper right part shows the logical structure of the expression, with "." denoting concatenation; subexpressions are labelled for reference purposes.
The left part shows the nondeterministic finite automaton resulting from Thompson's algorithm, with the entry and exit state of each subexpression marked in two different colors.
An ε as transition label is omitted for clarity; unlabelled transitions are in fact ε-transitions.
The entry and exit states corresponding to the root expression are the start and accept states of the automaton, respectively.
The algorithm's steps are as follows:
An equivalent minimal deterministic automaton is shown below.
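
As a rough illustration of such a deterministic automaton, the sketch below encodes a three-state DFA for the binary multiples of 3, assuming (as in the example above) that the empty string is accepted. The state names 0, 1 and 2 and the function name `accepts` are choices made for this example: each state stands for the remainder modulo 3 of the bits read so far, and reading a bit b in state r leads to state (2r + b) mod 3.

```python
# Transition table: (remainder so far, next bit) -> new remainder.
DELTA = {
    (0, "0"): 0, (0, "1"): 1,
    (1, "0"): 2, (1, "1"): 0,
    (2, "0"): 1, (2, "1"): 2,
}

def accepts(bits):
    """Return True if the bit string is empty or denotes a multiple of 3."""
    state = 0  # start state, which is also the only accepting state
    for b in bits:
        state = DELTA[(state, b)]
    return state == 0

print(accepts("110"))   # True  (6 is a multiple of 3)
print(accepts("1001"))  # True  (9 is a multiple of 3)
print(accepts("101"))   # False (5 is not a multiple of 3)
```

The three states are pairwise distinguishable, since they accept different continuations, so no smaller deterministic automaton recognizes this language.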

Relation to other algorithms

Thompson's construction is one of several algorithms for constructing NFAs from regular expressions; an earlier algorithm was given by McNaughton and Yamada. In the converse direction, Kleene's algorithm transforms a finite automaton into a regular expression.
Glushkov's construction algorithm is similar to Thompson's construction, once the ε-transitions are removed.
