Deep linguistic processing is a natural language processing framework that draws on theoretical and descriptive linguistics. It models language predominantly by way of syntactic and semantic theory. Deep linguistic processing approaches differ from "shallower" methods in that they yield more expressive, structured representations that directly capture long-distance dependencies and underlying predicate-argument structures.
The knowledge-intensive approach of deep linguistic processing requires considerable computational power and has in the past sometimes been judged intractable. However, research in the early 2000s made considerable advances in the efficiency of deep processing. Today, efficiency is no longer a major problem for applications that use deep linguistic processing.
Contrast with "shallow linguistic processing"
Traditionally, deep linguistic processing has been concerned with computational grammar development. These grammars were manually developed and maintained, and they were computationally expensive to run. In recent years, machine learning approaches have fundamentally altered the field of natural language processing. Robust, wide-coverage machine learning NLP tools can be created rapidly and require substantially less manual labor, so deep linguistic processing methods have received less attention. However, some computational linguists believe that detailed syntactic and semantic representations are necessary for computers to understand natural language or to perform inference. Moreover, while humans can easily understand a sentence and its meaning, shallow linguistic processing may lack this kind of language 'understanding'. For example:
In one sentence, a shallow information extraction system might wrongly infer that Microsoft's headquarters was located in Georgia, whereas a human reader understands from the sentence that Microsoft's office was never in Georgia.
In another sentence, a shallow system could wrongly infer that Israel was established in May 1971, whereas a human reader knows that it is the National Institute for Psychobiology that was established in 1971.
In summary, deep linguistic processing provides a knowledge-rich analysis of language through manually developed grammars and language resources, whereas shallow linguistic processing provides a knowledge-lean analysis of language through statistical or machine learning manipulation of texts and/or annotated linguistic resources.
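The contrast can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sentence, the company name "Acme", the toy location list, and the hand-written predicate-argument structures are invented here and are not the example sentences discussed above, nor the output of any real grammar or parser. The shallow function pairs an organization with the nearest following location word; the deep function reads the answer off an explicit predicate-argument structure that records negation.

```python
import re

# Hypothetical sentence, invented for this sketch.
SENTENCE = "Acme, which never opened an office in Georgia, is headquartered in Ohio."

# Toy gazetteer of location names (assumption).
LOCATIONS = {"Georgia", "Ohio"}


def shallow_headquarters(sentence, org):
    """Knowledge-lean heuristic: return the first location word after the organization."""
    tokens = re.findall(r"\w+", sentence)
    if org not in tokens:
        return None
    start = tokens.index(org)
    for token in tokens[start + 1:]:
        if token in LOCATIONS:
            return token
    return None


# Hand-written predicate-argument structures standing in for what a deep
# grammar might produce for SENTENCE (assumption, not real parser output).
DEEP_ANALYSIS = [
    {"pred": "open", "agent": "Acme", "theme": "office",
     "location": "Georgia", "negated": True},
    {"pred": "headquarter", "theme": "Acme",
     "location": "Ohio", "negated": False},
]


def deep_headquarters(analysis, org):
    """Return the location of a non-negated 'headquarter' predicate about org."""
    for predication in analysis:
        if (predication["pred"] == "headquarter"
                and predication["theme"] == org
                and not predication["negated"]):
            return predication["location"]
    return None


if __name__ == "__main__":
    print("shallow:", shallow_headquarters(SENTENCE, "Acme"))
    print("deep:   ", deep_headquarters(DEEP_ANALYSIS, "Acme"))
```

The proximity heuristic picks up "Georgia" because it ignores the negated relative clause, while the structured analysis keeps the negated predicate separate from the headquarters predicate. A real deep processing system would derive such structures from a grammar rather than from a hand-written list.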
Sub-communities
"Deep" computational linguists are divided into sub-communities according to the grammatical formalism they have adopted for deep linguistic processing. The major sub-communities include:
The DEep Linguistic Processing with HPSG - INitiative (DELPH-IN) collaboration, which works with the HPSG formalism. A dedicated conference serves as the central venue for sharing knowledge and advances in HPSG-based deep processing.
An international collaboration on LFG-based grammar and semantics development. A dedicated conference serves as the central venue for sharing knowledge and advances in LFG-based deep processing.
The XTAG Research Group, which works with the TAG formalism. A dedicated conference serves as the central venue for sharing knowledge and advances in TAG-based deep processing.
The list above is not exhaustive; other communities also work on deep linguistic processing.