A wildcard matcher tests a wildcard pattern p against an input string s. It performs an anchored match, returning true only when p matches the entirety of s. The pattern can be based on any common syntax, but on Windows programmers tend to discuss only a simplified syntax supported by the native C runtime:
Wildcards: ? matches exactly one occurrence of any character; * matches arbitrarily many occurrences (including none) of any character.
This article mainly discusses the Windows formulation of the problem, unless otherwise stated.
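These semantics can be tried out with the POSIX fnmatch function, which performs the same kind of anchored match. Note that this is only an approximation of the simplified Windows syntax described above: fnmatch additionally recognizes bracket expressions and backslash escaping.

    #include <assert.h>
    #include <fnmatch.h>   /* POSIX anchored glob matching */

    int main(void)
    {
        /* '*' spans arbitrarily many characters, including none. */
        assert(fnmatch("*.txt", "notes.txt", 0) == 0);
        /* '?' must consume exactly one character. */
        assert(fnmatch("no?es.txt", "notes.txt", 0) == 0);
        /* The match is anchored: unmatched trailing text is a failure. */
        assert(fnmatch("*.txt", "notes.txt.bak", 0) == FNM_NOMATCH);
        return 0;
    }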
Definition
Stated in zero-based indices, the wildcard-matching problem can be defined recursively as:

$$m_{0,0} = \text{true}$$
$$m_{i,0} = m_{i-1,0} \land (p_{i-1} = \texttt{*}) \qquad \text{for } 1 \le i \le |p|$$
$$m_{0,j} = \text{false} \qquad \text{for } 1 \le j \le |t|$$
$$m_{i,j} = \begin{cases} m_{i-1,j-1} & \text{if } p_{i-1} = \texttt{?} \text{ or } p_{i-1} = t_{j-1} \\ m_{i-1,j} \lor m_{i,j-1} & \text{if } p_{i-1} = \texttt{*} \\ \text{false} & \text{otherwise} \end{cases} \qquad \text{for } 1 \le i \le |p|,\ 1 \le j \le |t|$$

where $m_{i,j}$ is the result of matching the pattern p against the text t truncated at i and j characters respectively (p and t themselves being indexed from zero); the whole pattern matches the whole text when $m_{|p|,|t|}$ is true. This is the formulation used by Richter's algorithm and the Snippets algorithm found in Cantatore's collection. The description is similar to that of the Levenshtein distance.
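Translated directly into code, the recurrence gives a table-filling matcher such as the following sketch (the function name match_dp and its details are illustrative, not taken from the cited implementations):

    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    /* m[i][j] is true iff the first i pattern characters match the
     * first j text characters, per the recurrence above. */
    static bool match_dp(const char *p, const char *t)
    {
        size_t np = strlen(p), nt = strlen(t), w = nt + 1;
        bool *m = calloc((np + 1) * w, sizeof *m);   /* all cells start as false */
        if (m == NULL)
            return false;                            /* allocation failure treated as no match */

        m[0] = true;                                 /* m[0][0]: empty matches empty */
        for (size_t i = 1; i <= np; i++)             /* only leading '*'s match the empty text */
            m[i * w] = m[(i - 1) * w] && p[i - 1] == '*';
        /* m[0][j] stays false for j >= 1: an empty pattern matches no text */

        for (size_t i = 1; i <= np; i++)
            for (size_t j = 1; j <= nt; j++) {
                if (p[i - 1] == '*')
                    m[i * w + j] = m[(i - 1) * w + j] || m[i * w + j - 1];
                else if (p[i - 1] == '?' || p[i - 1] == t[j - 1])
                    m[i * w + j] = m[(i - 1) * w + (j - 1)];
                /* otherwise the cell stays false */
            }

        bool result = m[np * w + nt];                /* whole pattern vs whole text */
        free(m);
        return result;
    }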
Related problems
Directly related problems in computer science include:
Pattern matching with wildcards, an unanchored string search with the equivalent of both wildcards defined. It has an exponential runtime unless a length bound is given in the pattern matching with flexible wildcards variant.
History
Early algorithms for matching wildcards often relied on recursion, but the technique was criticized on grounds of performance and reliability. Non-recursive algorithms for matching wildcards have gained favor in light of these considerations. Among both recursive and non-recursive algorithms, strategies for performing the pattern matching operation vary widely, as evidenced by the variety of example algorithms referenced below. Test case development and performance optimization techniques have been demonstrably brought to bear on certain algorithms, particularly those developed by critics of the recursive algorithms.
Recursive algorithms
The recursion generally happens on matching * when there is more suffix to match against. This is a form of backtracking, also done by some regular expression matchers.
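A minimal recursive matcher of this shape is sketched below; the function name rmatch and its exact structure are chosen here for illustration and are not taken from any of the cited implementations.

    #include <stdbool.h>

    /* Naive recursive backtracking matcher (illustrative sketch). */
    static bool rmatch(const char *p, const char *t)
    {
        for (; *p != '\0'; p++, t++) {
            if (*p == '*') {
                while (*p == '*')
                    p++;                      /* collapse runs of '*' */
                if (*p == '\0')
                    return true;              /* trailing '*' matches any suffix */
                for (; *t != '\0'; t++)       /* try each remaining suffix of the text */
                    if (rmatch(p, t))
                        return true;
                return false;
            }
            if (*t == '\0')
                return false;                 /* text exhausted before the pattern */
            if (*p != '?' && *p != *t)
                return false;                 /* literal mismatch */
        }
        return *t == '\0';                    /* anchored: both must be fully consumed */
    }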
The general form of these algorithms is the same. On recursion the algorithm slices the input into substrings, and considers a match to have happened when one of the substrings returns a positive match. For example, to match "*X" against "abcX", it would greedily try to match the rest of the pattern, "X", against the suffixes "abcX", "bcX", "cX", and "X" in turn. The algorithms usually differ in less important respects such as support for features, and in more important ones such as minor but highly effective optimizations. Some of these include:
The ABORT signal against over-recursion. While it is correct to naively recurse on the entire rest of the strings at each * and check whether one of the suffixes returns a positive match, the running time becomes exponential when rejecting a match against a pattern with many *. Lars Mathiesen changes the return value to three classes: match, no-match, and ABORT. The ABORT value is returned when the text is consumed too early or when another asterisk match has failed, guaranteeing linear performance with respect to the number of asterisks. In Git/Rsync's wildmatch, ABORT also covers invalid inputs. The new INN uwildmat does the same.
Asterisk advancement in recursion. This wildmatch tweak is comparatively minor. It applies when the recursion wants to match "*X" against "abcX": when an asterisk is followed by a literal tail like "X", only the final comparison, in which the remaining lengths are equal, has a chance of producing a match. The idea is seen earlier in uwildmat in 2000 and more implicitly in van Rossum's fnmatch. Both the ABORT signal and this shortcut are combined in the sketch after this list.
Martin Richter's algorithm is an exception to this pattern, although the overall operation is equivalent. On * it recurses by incrementing either of the two indices, following the dynamic programming formulation of the problem. The "ABORT" technique is applicable to it as well. On typical patterns it is slower than the greedy-call implementations. The recursive algorithms are in general easier to reason about, and with the ABORT modification they perform acceptably in terms of worst-case complexity. On strings without * they take time linear in the size of the strings to match, since there is a fixed one-to-one correspondence.
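Both refinements can be grafted onto the naive sketch above. The following is again only an illustration in the spirit of wildmatch and uwildmat, not their source code: the return value is widened to three classes, ABORT is propagated through the enclosing calls instead of letting them retry, and a * followed only by literal characters is resolved with a single equal-length comparison. A caller treats any result other than WM_MATCH as a failed match.

    #include <string.h>

    enum { WM_MATCH, WM_NOMATCH, WM_ABORT };        /* three-valued result */

    /* Recursive matcher with the ABORT signal and asterisk advancement
     * (illustrative sketch). */
    static int rmatch3(const char *p, const char *t)
    {
        for (; *p != '\0'; p++, t++) {
            if (*p == '*') {
                while (*p == '*')
                    p++;                            /* collapse runs of '*' */
                if (*p == '\0')
                    return WM_MATCH;                /* trailing '*' matches any suffix */
                if (strpbrk(p, "*?") == NULL) {     /* rest of the pattern is literal */
                    size_t np = strlen(p), nt = strlen(t);
                    if (nt < np)
                        return WM_ABORT;            /* text too short; retries cannot help */
                    /* only the equal-length alignment can succeed */
                    return strcmp(p, t + (nt - np)) == 0 ? WM_MATCH : WM_NOMATCH;
                }
                for (; *t != '\0'; t++) {           /* try each remaining suffix */
                    int r = rmatch3(p, t);
                    if (r != WM_NOMATCH)
                        return r;                   /* MATCH, or ABORT from a deeper '*' */
                }
                return WM_ABORT;                    /* text exhausted under this '*' */
            }
            if (*t == '\0')
                return WM_ABORT;                    /* text consumed too early */
            if (*p != '?' && *p != *t)
                return WM_NOMATCH;
        }
        return *t == '\0' ? WM_MATCH : WM_NOMATCH;
    }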
Non-recursive algorithms
The following are developed by critics of the recursive algorithms:
Kirk J. Krauss's wildcard-matching algorithm, used by IBM
Alessandro Cantatore's collection of wildcard matching algorithms
Dogan Kurt's iterative matcher and slower NFA matcher.
Siler's algorithm
The following is not:
Jack Handy's algorithm
The iterative functions above implement backtracking by saving an old set of pattern/text pointers and reverting to it should a match fail. According to Kurt, since only one successful match is required, only one such set needs to be saved. In addition, the problem of wildcard matching can be converted into regular expression matching using a naive text-replacement approach. Although non-recursive regular expression matchers such as Thompson's construction are less used in practice due to their lack of backreference support, wildcard matching in general does not come with a similarly rich set of features. The Russ Cox implementation of the Thompson NFA can be trivially modified to handle it. Gonzalo Navarro's BNDM-based nrgrep algorithm provides a more streamlined implementation with emphasis on efficient suffixes.
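The single saved pointer set can be illustrated with the following sketch, which is based on the description above rather than on Krauss's or Kurt's published code: on a mismatch the matcher falls back to the position recorded at the most recent *, letting that asterisk absorb one more character of the text, and only that most recent position needs to be kept.

    #include <stdbool.h>

    /* Iterative wildcard matcher keeping one saved pattern/text position
     * pair to back up to (illustrative sketch). */
    static bool imatch(const char *p, const char *t)
    {
        const char *star_p = NULL;    /* pattern position just after the last '*' */
        const char *star_t = NULL;    /* end of the text span that '*' currently absorbs */

        while (*t != '\0') {
            if (*p == '*') {
                star_p = ++p;         /* remember the backtracking point ... */
                star_t = t;           /* ... and let '*' match nothing for now */
            } else if (*p == '?' || *p == *t) {
                p++;                  /* single-character match: advance both */
                t++;
            } else if (star_p != NULL) {
                p = star_p;           /* mismatch: extend the last '*' by one */
                t = ++star_t;         /* more character of text and retry */
            } else {
                return false;         /* mismatch with no '*' to fall back on */
            }
        }
        while (*p == '*')
            p++;                      /* trailing asterisks may match the empty suffix */
        return *p == '\0';            /* anchored: the pattern must also be consumed */
    }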