Corpus-assisted discourse studies

Corpus-assisted discourse studies, or CADS, is related historically and methodologically to the discipline of corpus linguistics. The principal endeavour of corpus-assisted discourse studies is the investigation and comparison of features of particular discourse types, integrating into the analysis the techniques and tools developed within corpus linguistics. These include the compilation of specialised corpora and analyses of word and word-cluster frequency lists, comparative keyword lists and, above all, concordances.
A broader conceptualisation of corpus-assisted discourse studies would include any study that aims to bring together corpus linguistics and discourse analysis. Such research is often labelled as corpus-based or corpus-assisted discourse analysis, with the term CADS coined by a research group in Italy for a specific type of corpus-based discourse analysis.
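As a rough illustration of what word and word-cluster frequency lists involve, the following minimal Python sketch counts single words and three-word clusters in a short text. The sample sentence, the crude tokeniser and the function names are invented for illustration; real studies normally rely on dedicated concordancing software and much larger corpora.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokeniser)."""
    return re.findall(r"[a-z']+", text.lower())

def clusters(tokens, n):
    """Count every run of n consecutive tokens (a word cluster, or n-gram)."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Invented sample text, standing in for a specialised corpus.
sample = "The war on terror and the war on drugs"
tokens = tokenize(sample)

word_list = Counter(tokens)          # word frequency list
cluster_list = clusters(tokens, 3)   # three-word cluster frequency list

print(word_list.most_common(3))
print(cluster_list.most_common(3))
```

Ranking clusters as well as single words can draw attention to recurrent phrasing that a single-word list would hide.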

Aims

Corpus-assisted discourse studies aim to uncover non-obvious meaning, that is, meaning which might not be readily available to naked-eye perusal. Much of what carries meaning in texts is not open to direct observation: “you cannot understand the world just by looking at it”. We use language “semi-automatically”, in the sense that speakers and writers make semi-conscious choices within the various complex overlapping systems of which language is composed, including those of transitivity, modality, lexical sets, modification, and so on. Authors themselves are, famously, generally unaware of all the meanings their texts convey. CADS combines two approaches. The first is quantitative: statistical analysis of large amounts of the discourse in question or, more precisely, of large numbers of tokens of the discourse type under study contained in a corpus. The second is the more qualitative approach typical of discourse analysis: the close, detailed examination of particular stretches of discourse. By combining the two, it may be possible to better understand the processes at play in the discourse type and to gain access to non-obvious meanings.
Aims can differ in other types of corpus-based or corpus-assisted discourse analysis; but in general such studies combine quantitative and qualitative research and aim to shed light on discourses, registers, discourse patterns, etc., with the help of a corpus linguistic approach. Specific aims and techniques depend on the relevant project.

In different countries

In German-speaking countries: Pioneering work in corpus-based discourse analysis was conducted in Europe, in particular by Hardt-Mautner/Mautner and Stubbs. CADS and other types of corpus-based discourse analysis are inspired by this important early work.
In Italy: A considerable body of research has been conducted in Italy either by individual researchers or under the aegis of combined inter-university projects such as Newspool and CorDis. It has concentrated on political and media language, mainly because a nucleus of linguists in Italian universities work in Political Science faculties and are increasingly interested in the use of corpus techniques to conduct a particular type of sociopolitical discourse analysis, including the unearthing of noteworthy ideological metaphors and motifs in the language of political figures and institutions. Italian researchers also developed Modern diachronic corpus-assisted discourse studies. This approach contrasts the language contained in comparable corpora from different but recent points in time in order to track changes in modern language usage, and also the social, cultural and political changes over modern times that are reflected in language and shared among people through it. It is this Italian body of research that makes most use of the label CADS.
In the UK: Linguists in the UK tend to undertake corpus-based critical discourse analysis (CDA). CDA generally adopts a leftist political stance, focusing on the ways that social and political domination is reproduced by text and talk. This type of corpus-based research was originally associated with Lancaster University, but has since spread more widely. Such work typically studies the discourses around particular groups of people or around particular concepts or events.
In Australia: Corpus-based discourse analysis is undertaken by a growing number of Australian researchers, most often on media texts. Some of this work aims to elucidate specific features of discourse types, while other work is rooted in the tradition of corpus-based critical discourse analysis.
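The diachronic comparison described above for the Italian research, in which comparable corpora from different points in time are contrasted, can be sketched in a few lines of Python. This is only an assumed, simplified procedure: it normalises a word's frequency to occurrences per million words in each corpus and reports the percentage change. The token lists and function names are invented, and a real study would also test whether the change is statistically significant.

```python
from collections import Counter

def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size

def frequency_change(word, earlier_tokens, later_tokens):
    """Return (earlier rate, later rate, percentage change) for one word.

    The percentage change is None when the word is absent from the
    earlier corpus, since a change from zero cannot be expressed as a ratio.
    """
    early = per_million(Counter(earlier_tokens)[word], len(earlier_tokens))
    late = per_million(Counter(later_tokens)[word], len(later_tokens))
    change = None if early == 0 else (late - early) / early * 100
    return early, late, change

# Invented toy corpora of different sizes, standing in for two time slices.
earlier = ["crisis"] * 3 + ["other"] * 7   # 10 tokens
later = ["crisis"] * 3 + ["other"] * 2     # 5 tokens

early_rate, late_rate, change = frequency_change("crisis", earlier, later)
print(early_rate, late_rate, change)
```

Normalising matters because the raw count is identical in both toy corpora (three occurrences), yet the relative frequency doubles once corpus size is taken into account.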

Comparison with traditional corpus linguistics

Traditional corpus linguistics has, quite naturally, tended to privilege the quantitative approach. In the drive to produce more authentic dictionaries and grammars of a language, it has been characterised by the compilation of some very large corpora of heterogeneric discourse types in the desire to obtain an overview of the greatest quantity and variety of discourse types possible, in other words, of the chimerical but useful fiction called the “general language”. This has led to the construction of immensely valuable research tools such as the Bank of English and the British National Corpus. Some branches of corpus linguistics have also promoted a “corpus-driven” approach, in which the researcher needs, grammatically speaking, a mental tabula rasa in order to escape the baleful prejudice exerted by traditional models and to allow the data to speak entirely for itself.
The aim of corpus-assisted discourse studies and related approaches is radically different. Here the aim of the exercise is to acquaint oneself as much as possible with the discourse type in hand. Researchers typically engage with their corpus in a variety of ways. As well as via wordlists and concordancing, intuitions for further research can also arise from reading or watching or listening to parts of the data-set, a process which can help provide a feel for how things are done linguistically in the discourse-type being studied.
Corpus-assisted discourse analysis is also typically characterised by the compilation of ad hoc specialised corpora, since very frequently there exists no previously available collection of the discourse type in question. Often, other corpora are utilised in the course of a study for purposes of comparison. These may include pre-existing corpora or may themselves need to be compiled by the researcher. In some sense, all work with corpora, just as all work with discourse, is properly comparative. Even when a single corpus is employed, it is used to test the data it contains against another body of data. This may consist of the researcher's intuitions, or the data found in reference works such as dictionaries and grammars, or it may be statements made by previous authors in the field.

CADS as a specific type of corpus-based discourse analysis

Researchers in Italy have developed CADS as a specific type of corpus-based discourse analysis, creating a standard set of methods:
'A basic, standard methodology in CADS may resemble the following:'

Step 1: Decide upon the research question;
Step 2: Choose, compile or edit an appropriate corpus;
Step 3: Choose, compile or edit an appropriate reference corpus / corpora;
Step 4: Make frequency lists and run a keywords comparison of the corpora;
Step 5: Determine the existence of sets of key items;
Step 6: Concordance interesting key items;
Step 7: Refine the research question and return to Step 2.

This basic procedure can of course vary according to individual research circumstances and requirements.
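Steps 4 to 6 can be illustrated with a minimal Python sketch. It assumes one common keyness measure, the two-term log-likelihood statistic, which the methodology itself does not prescribe. The two toy texts, the crude tokeniser and all function names are invented for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokeniser)."""
    return re.findall(r"[a-z']+", text.lower())

def log_likelihood(a, b, c, d):
    """Keyness of a word seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)   # expected count in the target corpus
    e2 = d * (a + b) / (c + d)   # expected count in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target_tokens, reference_tokens, top=10):
    """Step 4: rank words that are relatively more frequent in the target corpus."""
    tf, rf = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in tf.items():
        b = rf.get(word, 0)
        if a / c > b / d:   # keep only positively key items
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]

def concordance(tokens, node, width=4):
    """Step 6: list each occurrence of node with width tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Invented toy corpora; a real study would use far larger collections.
target = tokenize("The minister said the reform will protect jobs. "
                  "The reform is fair, the minister said.")
reference = tokenize("The weather will be fair. "
                     "The match was fair and the crowd said so.")

key_items = keywords(target, reference, top=5)
for word, score in key_items:
    print(f"{word}\t{score:.2f}")

for line in concordance(target, "reform", width=2):
    print(line)
```

Step 5, deciding whether the key items form meaningful sets, remains a matter of the analyst's judgement. The concordance lines are then read closely, which is where the qualitative part of the analysis begins.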
A particular way of conceptualising research questions has also been proposed in such CADS projects:

Given that P is a discourse participant and G is a goal, often a political goal:

How does P achieve G with language?
What does this tell us about P?
Comparative studies: how do P1 and P2 differ in their use of language? Does this tell us anything about their different principles and objectives?

A second general type of CADS research question, which might be asked of interactive discourse data, has been conceptualised as follows:

Given that P is a particular participant or set of participants, DT is the discourse type, and R is an observed relationship between or among participants:
How does P achieve / maintain R in DT?

Another common type of research question has been conceptualised thus:

Given that A is an author, Ph is a phenomenon or practice or behaviour, and DT is a particular discourse type:
A has said Ph is the case in DT
Is Ph the case in DT?

This is a classic “hypothesis-testing” research question: we test the hypothesis that whatever practice has been observed by a previous author in some discourse type will be observable in another. It is a process we might call para-replication, that is, the replication of an experiment with either a fresh set of texts of the same discourse type or of a related discourse type, “in order to see whether [the findings] were an artefact of one single data set”.
A final example of conceptualising a CADS research question is the following:

Given that P is a participant or category thereof, and LF is a particular language feature:
Do P1 and P2 use LF in the same way?

Such research aims to ascertain whether different participants use a particular linguistic feature in the same or different ways. The research may proceed to attempt to explain why this is the case.
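One possible way of operationalising this kind of question is to compare the words that occur near the language feature in each participant's data. The Python sketch below is an assumed, simplified procedure rather than a method prescribed by CADS researchers: it collects collocates within a fixed window and reports which are shared and which are particular to one participant. The two token lists and the function names are invented.

```python
from collections import Counter

def collocates(tokens, node, window=3):
    """Count the words found within window tokens on either side of each occurrence of node."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - window):i])      # left context
            counts.update(tokens[i + 1:i + 1 + window])      # right context
    return counts

def compare_collocates(tokens_p1, tokens_p2, node, window=3):
    """Split the collocates of node into shared and participant-specific sets."""
    c1 = set(collocates(tokens_p1, node, window))
    c2 = set(collocates(tokens_p2, node, window))
    return {
        "shared": sorted(c1 & c2),
        "only_p1": sorted(c1 - c2),
        "only_p2": sorted(c2 - c1),
    }

# Invented utterances standing in for the sub-corpora of two participants.
p1 = "we must cut tax now".split()
p2 = "they will raise tax now".split()

profile = compare_collocates(p1, p2, "tax", window=1)
print(profile)
```

Differences in the collocate sets only indicate where to look; concordance lines for the participant-specific items would still need to be read in context before any explanation is offered.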

Some research to date

Studies that bring together corpus linguistics and discourse analysis include the following:

How ideas about groups of people and race are constructed and disseminated through repeated language use.
A study of German loan words in English and their connection to cultural stereotyping.
Analyses of the language of the Euro-sceptic debate in the UK.
The typical language strategies, metaphors and motifs used by journalists and spokespersons in US press conferences, and how these reflect their respective world-views.
How prediction is effected in economic texts, that is, how economic forecasts are presented and hedged.
How government witnesses in the Hutton Inquiry constructed their professional identity.
The typical language features of US television series and how they resemble or differ from unscripted conversation.
How speakers use appraisal on Twitter to create shared communities.
How Islam and Muslims are represented in the British press.
The CorDis project: how the conflict in Iraq was discussed and reported in the Senate and Parliament, in US press briefings and the Hutton Inquiry, and in US/UK newspapers and TV news.
The SiBol project: an analysis of the differences between two corpora of UK quality newspaper texts, the first dating from 1993, the second from 2005.
The Intune project (2004-2009): an EU-funded venture to investigate how the press in France, Italy, Poland and the UK represents issues relating to European citizenship and identity.
