The algorithm works recursively by splitting an expression into its constituent subexpressions, from which the NFA will be constructed using a set of rules. More precisely, from a regular expression, the obtained automaton with the transition function respects the following properties:
has exactly one initial state, which is not accessible from any other state. That is, for any state and any letter, does not contain.
has exactly one final state, which is not co-accessible from any other state. That is, for any letter,.
Let be the number of concatenation of the regular expression and let be the number of symbols apart from parentheses — that is,,, and. Then, the number of states of is .
The number of transitions leaving any state is at most two.
Since an NFA of states and at most transitions from each state can match a string of length in time, a Thompson NFA can do pattern matching in linear time, assuming a fixed-size alphabet.
Rules
The following rules are depicted according to Aho et al., p. 122. In what follows, N and N are the NFA of the subexpressions and, respectively. The empty-expression ε is converted to A symbola of the input alphabet is converted to The union expression | is converted to State q goes via ε either to the initial state of N or N. Their final states become intermediate states of the whole NFA and merge via two ε-transitions into the final state of the NFA. The concatenation expressionst is converted to The initial state of N is the initial state of the whole NFA. The final state of N becomes the initial state of N. The final state of N is the final state of the whole NFA. The Kleene star expression* is converted to An ε-transition connects initial and final state of the NFA with the sub-NFA N in between. Another ε-transition from the inner final to the inner initial state of N allows for repetition of expression according to the star operator.
The parenthesized expression is converted to N itself.
With these rules, using the empty expression and symbol rules as base cases, it is possible to prove with mathematical induction that any regular expression may be converted into an equivalent NFA.
Example
Two examples are now given, a small informal one with the result, and a bigger with a step by step application of the algorithm.
Small Example
The picture below shows the result of Thompson's construction on . The pink oval corresponds to a, the teal oval corresponds to a*, the green oval corresponds to b, the orange oval corresponds to a*b, and the blue oval corresponds to ε.
Application of the algorithm
As an example, the picture shows the result of Thompson's construction algorithm on the regular expression * that denotes the set of binary numbers that are multiples of 3: The upper right part shows the logical structure of the expression, with "." denoting concatenation ; subexpressions are named - for reference purposes. The left part shows the nondeterministic finite automaton resulting from Thompson's algorithm, with the and state of each subexpression colored in and, respectively. An ε as transition label is omitted for clarity — unlabelled transitions are in fact ε transitions. The entry and exit state corresponding to the root expression is the start and accept state of the automaton, respectively. The algorithm's steps are as follows: An equivalent minimal deterministic automaton is shown below.
Relation to other algorithms
Thompson's is one of several algorithms for constructing NFAs from regular expressions; an earlier algorithm was given by McNaughton and Yamada. Converse to Thompson's construction, Kleene's algorithm transforms a finite automaton into a regular expression. Glushkov's construction algorithm is similar to Thompson's construction, once the ε-transitions are removed.