Text Retrieval Conference

The Text REtrieval Conference (TREC) is an ongoing series of workshops focusing on a range of information retrieval (IR) research areas, or tracks. It is co-sponsored by the National Institute of Standards and Technology (NIST) and the Intelligence Advanced Research Projects Activity, and began in 1992 as part of the TIPSTER Text program. Its purpose is to support and encourage research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies, and to speed the transfer of technology from the lab into products.
Each track poses a challenge in which NIST provides participating groups with data sets and test problems. Depending on the track, test problems might be questions, topics, or target extractable features. Scoring is uniform so that the systems can be evaluated fairly. After the results are evaluated, a workshop gives participants a place to gather their thoughts and ideas and to present current and future research work. When the conference started in 1992, it was funded by DARPA and run by NIST.

Goals

Encourage retrieval research based on large text collections
Increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas
Speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems
Increase the availability of appropriate evaluation techniques for use by industry and academia, including the development of new evaluation techniques more applicable to current systems

TREC is overseen by a program committee consisting of representatives from government, industry, and academia. For each TREC, NIST provides a set of documents and questions. Participants run their own retrieval systems on the data and return to NIST a list of the top-ranked documents retrieved. NIST pools the individual results, judges the retrieved documents for correctness, and evaluates the results. The TREC cycle ends with a workshop that is a forum for participants to share their experiences.

Relevance judgments in TREC

TREC uses a binary relevance criterion: a document is either relevant or not relevant. Because the TREC collections are large, it is impossible to calculate absolute recall for each query. To assess the relevance of documents to a query, TREC instead uses a method called pooling and calculates relative recall. All the relevant documents that occur in the top 100 documents for each system and each query are combined to produce a pool of relevant documents. Recall is then the proportion of this pool that a single system retrieved for a query topic.
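
The pooling procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NIST's actual tooling: the function names `build_pool` and `relative_recall`, the system and document identifiers, and the toy judgments are all hypothetical, and the pool depth is reduced from 100 to 3 so the example stays small.

```python
from collections import defaultdict

def build_pool(runs, relevant, depth=100):
    """Collect, per topic, the relevant documents found in the top
    `depth` results of any system (the pool of relevant documents)."""
    pool = defaultdict(set)
    for rankings_by_topic in runs.values():
        for topic, ranking in rankings_by_topic.items():
            for doc in ranking[:depth]:
                # Binary relevance: a (topic, doc) pair is relevant or not.
                if (topic, doc) in relevant:
                    pool[topic].add(doc)
    return pool

def relative_recall(ranking, pooled_relevant):
    """Proportion of the pooled relevant documents that one system retrieved."""
    if not pooled_relevant:
        return 0.0
    return len(set(ranking) & pooled_relevant) / len(pooled_relevant)

# Hypothetical toy data: two systems, one topic.
runs = {
    "sysA": {"t1": ["d1", "d2", "d3"]},
    "sysB": {"t1": ["d3", "d4", "d5"]},
}
# Hypothetical assessor judgments: the relevant (topic, document) pairs.
relevant = {("t1", "d1"), ("t1", "d3"), ("t1", "d4"), ("t1", "d5")}

pool = build_pool(runs, relevant, depth=3)
for name, rankings_by_topic in runs.items():
    print(name, relative_recall(rankings_by_topic["t1"], pool["t1"]))
```

Because recall is measured against the pool and not against the whole collection, a relevant document that no participating system ranked in its top results never enters the denominator. This is why the resulting figure is a relative recall, not an absolute one.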

Various TRECs

In 1992, TREC-1 was held at NIST. The first conference attracted 28 groups of researchers from academia and industry. It demonstrated a wide range of approaches to the retrieval of text from large document collections. TREC-1 also showed that automatic construction of queries from natural language query statements seems to work, and that techniques based on natural language processing were no better and no worse than those based on vector or probabilistic approaches.
TREC-2 took place in August 1993, with 31 groups of researchers participating. Two types of retrieval were examined: retrieval using an 'ad hoc' query and retrieval using a 'routing' query.
In TREC-3, a small group of experiments worked with a Spanish-language collection, and others dealt with interactive query formulation in multiple databases.
In TREC-4, the topics were made even shorter to investigate the problems with very short user statements.
TREC-5 included both short and long versions of the topics, with the goal of investigating more deeply which types of techniques work well on topics of various lengths.
In TREC-6, three new tracks were introduced: speech, cross-language, and high-precision information retrieval. The goal of cross-language information retrieval is to facilitate research on systems that are able to retrieve relevant documents regardless of the language of the source document.
TREC-7 contained seven tracks, two of which were new: the query track and the very large corpus track. The goal of the query track was to create a large query collection.
TREC-8 contained seven tracks, two of which were new: the question answering (QA) track and the web track. The objective of the QA track is to explore the possibilities of providing answers to specific natural language queries.
TREC-9 included seven tracks.
In TREC-10, a video track was introduced, designed to promote research in content-based retrieval from digital video.
In TREC-11, a novelty track was introduced. The goal of the novelty track is to investigate systems' abilities to locate relevant and new information within the ranked set of documents returned by a traditional document retrieval system.
TREC-12, held in 2003, added three new tracks: the genomics track, the robust retrieval track, and HARD (High Accuracy Retrieval from Documents).

Tracks

Current tracks

New tracks are added as new research needs are identified; this list is current for TREC 2018.

- Goal: run in parallel CLEF 2018, NTCIR-14, TREC 2018 to develop and tune an IR reproducibility evaluation protocol.
- Goal: an ad hoc search task over news documents.
- Goal: to develop systems capable of answering complex information needs by collating information from an entire corpus.
- Goal: to research technologies to automatically process social media streams during emergency situations.
- Goal: partnership with The Washington Post to develop test collections in news environment.
- Goal: a specialization of the Clinical Decision Support track to focus on linking oncology patient data to clinical trials.
- Goal: to explore techniques for real-time update summaries from social media streams.

Past tracks

Chemical Track - Goal: to develop and evaluate technology for large scale search in chemistry-related documents, including academic papers and patents, to better meet the needs of professional searchers, and specifically patent searchers and chemists.
- Goal: to investigate techniques for linking medical cases to information relevant for patient care
- Goal: to investigate search techniques for complex information needs that are highly dependent on context and user interests.
Crowdsourcing Track - Goal: to provide a collaborative venue for exploring crowdsourcing methods both for evaluating search and for performing search tasks.
Genomics Track - Goal: to study the retrieval of genomic data, not just gene sequences but also supporting documentation such as research papers, lab reports, etc. Last ran in TREC 2007.
- Goal: to investigate domain-specific search algorithms that adapt to the dynamic information needs of professional users as they explore in complex domains.
Enterprise Track - Goal: to study search over the data of an organization to complete some task. Last ran in TREC 2008.
Entity Track - Goal: to perform entity-related search on Web data. These search tasks address common information needs that are not that well modeled as ad hoc document search.
Cross-Language Track - Goal: to investigate the ability of retrieval systems to find topically relevant documents regardless of source language. After 1999, this track spun off into CLEF.
FedWeb Track - Goal: to select the best resources to forward a query to, and to merge the results so that the most relevant are at the top.
Federated Web Search Track - Goal: to investigate techniques for the selection and combination of search results from a large number of real on-line web search services.
Filtering Track - Goal: to make a binary decision on whether to retrieve each new incoming document, given a stable information need.
HARD Track - Goal: to achieve High Accuracy Retrieval from Documents by leveraging additional information about the searcher and/or the search context.
Interactive Track - Goal: to study user interaction with text retrieval systems.
Knowledge Base Acceleration Track - Goal: to develop techniques to dramatically improve the efficiency of knowledge base curators by having the system suggest modifications/extensions to the KB based on its monitoring of the data streams.
Legal Track - Goal: to develop search technology that meets the needs of lawyers to engage in effective discovery in digital document collections.
- Goal: to generate answers to real questions originating from real users via a live question stream, in real time.
Medical Records Track - Goal: to explore methods for searching unstructured information found in patient medical records.
Microblog Track - Goal: to examine the nature of real-time information needs and their satisfaction in the context of microblogging environments such as Twitter.
Natural Language Processing Track - Goal: to examine how specific tools developed by computational linguists might improve retrieval.
Novelty Track - Goal: to investigate systems' abilities to locate new information.
- Goal: to explore an evaluation paradigm for IR that involves real users of operational search engines. For this first year of the track the task will be ad hoc Academic Search.
Question Answering Track - Goal: to achieve more information retrieval than just document retrieval by answering factoid, list and definition-style questions.
Real-Time Summarization Track - Goal: to explore techniques for constructing real-time update summaries from social media streams in response to users' information needs.
Robust Retrieval Track - Goal: to focus on individual topic effectiveness.
Relevance Feedback Track - Goal: to further deep evaluation of relevance feedback processes.
Session Track - Goal: to develop methods for measuring multiple-query sessions where information needs drift or get more or less specific over the session.
Spam Track - Goal: to provide a standard evaluation of current and proposed spam filtering approaches.
- Goal: to test whether systems can induce the possible tasks users might be trying to accomplish given a query.
Temporal Summarization Track - Goal: to develop systems that allow users to efficiently monitor the information associated with an event over time.
Terabyte Track - Goal: to investigate whether/how the IR community can scale traditional IR test-collection-based evaluation to significantly large collections.
- Goal: to evaluate methods to achieve very high recall, including methods that include a human assessor in the loop.
Video Track - Goal: to research automatic segmentation, indexing, and content-based retrieval of digital video.
Web Track - Goal: to explore information seeking behaviors common in general web search.

Related events

In 1997, a Japanese counterpart of TREC, called NTCIR, was launched, and in 2000 CLEF, a European counterpart oriented specifically towards the study of cross-language information retrieval, followed. The Forum for Information Retrieval Evaluation started in 2008 with the aim of building a South Asian counterpart to TREC, CLEF, and NTCIR.

Conference contributions to search effectiveness

NIST claims that within the first six years of the workshops, the effectiveness of retrieval systems approximately doubled. The conference was also the first to hold large-scale evaluations of non-English documents, speech, video and retrieval across languages. Additionally, the challenges have inspired a large body of . Technology first developed in TREC is now included in many of the world's commercial search engines. An independent report by RTII found that "about one-third of the improvement in web search engines from 1999 to 2009 is attributable to TREC. Those enhancements likely saved up to 3 billion hours of time using web search engines.... Additionally, the report showed that for every $1 that NIST and its partners invested in TREC, at least $3.35 to $5.07 in benefits were accrued to U.S. information retrieval researchers in both the private sector and academia."
While one study suggests that the state of the art for ad hoc search has not advanced substantially in the past decade, it refers only to search for topically relevant documents in small news and web collections of a few gigabytes. There have been advances in other types of ad hoc search in the past decade. For example, test collections were created for known-item web search, which found improvements from the use of anchor text, title weighting, and URL length, none of which were useful techniques on the older ad hoc test collections. In 2009, a new billion-page web collection was introduced, and spam filtering was found to be a useful technique for ad hoc web search, unlike in past test collections.
The test collections developed at TREC are useful not just for helping researchers advance the state of the art, but also for allowing developers of new retrieval products to evaluate their effectiveness on standard tests. In the past decade, TREC has created new tests for enterprise e-mail search, genomics search, spam filtering, e-Discovery, and several other retrieval domains.
TREC systems often provide a baseline for further research. Examples include:

Hal Varian, Chief Economist at Google, says "Better data makes for better science. The history of information retrieval illustrates this principle well," and describes TREC's contribution.
TREC's Legal track has influenced the e-Discovery community both in research and in evaluation of commercial vendors.
The IBM research team building IBM Watson, which beat the world's best Jeopardy! players, used data and systems from TREC's QA Track as baseline performance measurements.

Participation

The conference is made up of a varied, international group of researchers and developers. In 2003, 93 groups from academia and industry in 22 countries participated.
