Language resource


In linguistics and language technology, a language resource is a `[...] of linguistic material used in the construction, improvement and/or evaluation of language processing applications, in language and language-mediated research studies and applications'.
According to Bird & Simons, this includes
  1. data, i.e., `any information that documents or describes a language, such as a published monograph, a computer data file, or even a shoebox full of handwritten index cards. The information could range in content from unanalyzed sound recordings to fully transcribed and annotated texts to a complete descriptive grammar',
  2. tools, i.e., `computational resources that facilitate creating, viewing, querying, or otherwise using language data', and
  3. advice, i.e., `any information about what data sources are reliable, what tools are appropriate in a given situation, what practices to follow when creating new data'. The latter aspect is usually referred to as `best practices' or `standards'.
In a narrower sense, the term language resource is applied specifically to resources that are available in digital form, `encompassing data sets in machine readable form, and tools/technologies/services used for their processing and management.'

Typology

As of May 2020, no widely used standard typology of language resources has been established. Important classes of language resources include
  1. data
     1. lexical resources, e.g., machine-readable dictionaries,
     2. linguistic corpora, i.e., digital collections of natural language data,
     3. linguistic databases such as the Cross-Linguistic Linked Data collection,
  2. tools
     1. linguistic annotations and tools for creating such annotations in a manual or semiautomated fashion,
     2. applications for search and retrieval over such data and for automated annotation,
  3. metadata and vocabularies
     1. vocabularies, repositories of linguistic terminology and language metadata, e.g., MetaShare, the ISO 12620 data category registry, or the Glottolog database.
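
The classes listed above can be modelled as a small data structure, for instance when cataloguing resources. The following Python sketch is illustrative only: the names `ResourceClass`, `LanguageResource`, `CATALOG` and `by_class` are hypothetical and not part of any established standard, and the three catalogue entries are simply the examples named in the list.

```python
from dataclasses import dataclass
from enum import Enum


class ResourceClass(Enum):
    """Top-level classes from the typology above (hypothetical encoding)."""
    DATA = "data"
    TOOL = "tools"
    METADATA = "metadata and vocabularies"


@dataclass(frozen=True)
class LanguageResource:
    """One catalogue entry: a resource name, its class and a free-text subtype."""
    name: str
    resource_class: ResourceClass
    subtype: str


# Example entries taken from the list above; the subtype labels are assumptions.
CATALOG = [
    LanguageResource("Cross-Linguistic Linked Data", ResourceClass.DATA, "linguistic database"),
    LanguageResource("Glottolog", ResourceClass.METADATA, "vocabulary"),
    LanguageResource("ISO 12620 data category registry", ResourceClass.METADATA, "vocabulary"),
]


def by_class(resources, resource_class):
    """Return the names of all resources that belong to the given class."""
    return [r.name for r in resources if r.resource_class is resource_class]


print(by_class(CATALOG, ResourceClass.METADATA))
```

Because no standard typology exists, a flat class plus a free-text subtype is used here rather than a fixed hierarchy; a real catalogue would more likely rely on an established metadata vocabulary.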

Language resource publication, dissemination and creation

A major concern of the language resource community has been to develop infrastructures and platforms to present, discuss and disseminate language resources.
The development of standards and best practices for language resources is the subject of several community groups and standardization efforts.