Stop words


In computing, stop words are words which are filtered out before or after processing of natural language data. Though "stop words" usually refers to the most common words in a language, there is no single universal list of stop words used by all natural language processing tools, and indeed not all tools even use such a list. Some tools specifically avoid removing these stop words to support phrase search.
Any group of words can be chosen as the stop words for a given purpose. For some search engines, these are some of the most common, short function words, such as "the", "is", "at", "which", and "on". In this case, removing stop words can cause problems when searching for phrases that include them, particularly in names such as "The Who", "The The", or "Take That". Other search engines remove some of the most common words (including lexical words, such as "want") from a query in order to improve performance.
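The filtering described above, and the phrase-search problem it creates, can be illustrated with a minimal Python sketch. The stop list below is only the five example words from the text, and the function name remove_stop_words and the whitespace tokenization are assumptions made for illustration; real tools each use their own lists and tokenizers.

```python
# Hypothetical stop list: just the five example words mentioned in the text.
STOP_WORDS = {"the", "is", "at", "which", "on"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the lowercased, whitespace-separated tokens not in stop_words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in stop_words]


# An ordinary query keeps its content words.
print(remove_stop_words("The cat is on the mat"))  # ['cat', 'mat']

# Band names made up largely of stop words lose most or all of their tokens.
print(remove_stop_words("The Who"))  # ['who']
print(remove_stop_words("The The"))  # []
```

With this list, a search for "The The" is reduced to an empty query, which is why some tools keep stop words in order to support phrase search.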
Hans Peter Luhn, one of the pioneers in information retrieval, is credited with coining the phrase and using the concept. The phrase "stop word" does not appear in Luhn's 1959 presentation, but it and the associated terms "stop list" and "stoplist" appear in the literature shortly afterward.
A predecessor concept was used in creating some concordances. For example, the first Hebrew concordance, Me’ir nativ, contained a one-page list of unindexed words, consisting of nonsubstantive prepositions and conjunctions similar to modern stop words.
In SEO terminology, stop words are the most common words that most search engines ignore during crawling or indexing. Skipping them saves processing time on large volumes of data and saves space in the search engines' databases.
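The space saving can be sketched with a toy inverted index in Python. This is an illustration under stated assumptions, not a description of how any particular search engine works: the stop list, the three sample documents, and the helper names build_index and posting_count are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stop list and sample documents, chosen only for illustration.
STOP_WORDS = {"the", "is", "at", "which", "on"}
DOCS = [
    "the cat is on the mat",
    "the dog is at the door",
    "which dog is on the mat",
]


def build_index(docs, stop_words=frozenset()):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            if token not in stop_words:
                index[token].add(doc_id)
    return index


def posting_count(index):
    """Total number of (token, document) entries stored in the index."""
    return sum(len(doc_ids) for doc_ids in index.values())


full = build_index(DOCS)
pruned = build_index(DOCS, STOP_WORDS)

print(len(full), posting_count(full))      # 9 16
print(len(pruned), posting_count(pruned))  # 4 6
```

In this small sample, dropping five stop words removes more than half of the stored entries, because those words occur in nearly every document.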