Document structuring

Document structuring is a subtask of natural language generation (NLG) that involves deciding the order and grouping of sentences in a generated text. It is closely related to the content determination task in NLG.

Example

Assume we have four sentences which we want to include in a generated text:

It will rain on Saturday
It will be sunny on Sunday
Max temperature will be 10 °C on Saturday
Max temperature will be 15 °C on Sunday

There are 24 (4!) orderings of these messages, including:

It will rain on Saturday. It will be sunny on Sunday. Max temperature will be 10 °C on Saturday. Max temperature will be 15 °C on Sunday.
It will be sunny on Sunday. Max temperature will be 10 °C on Saturday. Max temperature will be 15 °C on Sunday. It will rain on Saturday.
Max temperature will be 15 °C on Sunday. Max temperature will be 10 °C on Saturday. It will be sunny on Sunday. It will rain on Saturday.

Some of these orderings are better than others. For example, of the texts shown above, human readers prefer the first over the second and third.
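For any ordering, there are also many ways in which sentences can be grouped into paragraphs and higher-level structures such as sections. For example, there are 8 (2^3) ways in which the four sentences of a single ordering can be grouped into paragraphs, because each of the three boundaries between adjacent sentences either is or is not a paragraph break.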

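As with ordering, human readers prefer some groupings over others; for example, for the first ordering above, two paragraphs of two sentences each (weather conditions, then temperatures) are preferred over a grouping that leaves the first and last sentences on their own and pairs the middle two.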
The document structuring task is to choose an ordering and grouping of sentences which results in a coherent and well-organised text from the reader's perspective.
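The size of this search space can be checked with a short enumeration. The Python sketch below is illustrative only: the function names `orderings` and `groupings` are assumptions introduced here, and a grouping is modelled simply as a split of an ordered list into consecutive paragraphs.

```python
from itertools import permutations, product

sentences = [
    "It will rain on Saturday.",
    "It will be sunny on Sunday.",
    "Max temperature will be 10 °C on Saturday.",
    "Max temperature will be 15 °C on Sunday.",
]


def orderings(items):
    """Return every possible ordering of the items."""
    return list(permutations(items))


def groupings(ordered):
    """Return every split of an ordered sequence into consecutive paragraphs."""
    result = []
    # One True/False choice per boundary between adjacent sentences.
    for breaks in product([False, True], repeat=len(ordered) - 1):
        paragraphs = [[ordered[0]]]
        for sentence, is_break in zip(ordered[1:], breaks):
            if is_break:
                paragraphs.append([sentence])   # start a new paragraph
            else:
                paragraphs[-1].append(sentence)  # extend the current one
        result.append(paragraphs)
    return result


all_orderings = orderings(sentences)
all_groupings = groupings(sentences)
print(len(all_orderings))  # 24
print(len(all_groupings))  # 8
```

Combining the two choices gives 24 × 8 = 192 candidate structures for just four sentences, which is why exhaustive search does not scale and the approaches described below are needed.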

Algorithms and models

There are three basic approaches to document structuring: schemas, corpus-based, and heuristic.
Schemas are templates which explicitly specify sentence ordering and grouping for a document. Typically they are constructed by manually analysing a corpus of human-written texts in the target genre, and extracting a document template from these texts. Schemas work well in practice for texts which are short and/or have a standardised structure, but have problems in generating texts which are longer and do not have a fixed structure.
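A schema for the weather example could be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular system: the message representation (day, kind, text), the name `WEEKEND_SCHEMA`, and the function `apply_schema` are all hypothetical.

```python
# Hypothetical message records: (day, kind, text), in arbitrary input order.
messages = [
    ("Sunday", "temperature", "Max temperature will be 15 °C on Sunday."),
    ("Saturday", "sky", "It will rain on Saturday."),
    ("Sunday", "sky", "It will be sunny on Sunday."),
    ("Saturday", "temperature", "Max temperature will be 10 °C on Saturday."),
]

# Hypothetical schema: each inner list is one paragraph, and each slot names
# the message that fills that position. Order and grouping are both fixed.
WEEKEND_SCHEMA = [
    [("Saturday", "sky"), ("Sunday", "sky")],
    [("Saturday", "temperature"), ("Sunday", "temperature")],
]


def apply_schema(schema, messages):
    """Fill the schema's slots with matching messages, skipping empty slots."""
    lookup = {(day, kind): text for day, kind, text in messages}
    paragraphs = []
    for slots in schema:
        filled = [lookup[slot] for slot in slots if slot in lookup]
        if filled:
            paragraphs.append(" ".join(filled))
    return paragraphs


for paragraph in apply_schema(WEEKEND_SCHEMA, messages):
    print(paragraph)
```

The sketch also shows the limitation noted above: the template works only because the set of possible messages is small and known in advance.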
Corpus-based structuring techniques use statistical corpus analysis to automatically build ordering and/or grouping models. Such techniques are common in automatic summarisation, where a computer program automatically generates a summary of a textual document. In principle they could be applied to text generated from non-linguistic data, but this work is in its infancy; part of the challenge is that texts generated by natural language generation systems are generally expected to be of fairly high quality, which is not always the case for texts generated by automatic summarisation systems.
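One very simple way to build an ordering model from a corpus is to count how often one kind of message precedes another, and then pick the ordering that agrees best with those counts. The sketch below assumes this pairwise-precedence technique purely for illustration; the toy corpus, the message labels, and the function names are hypothetical, and real systems use richer models and far more data.

```python
from collections import Counter
from itertools import combinations, permutations

# Hypothetical toy corpus: each human-written text reduced to the sequence of
# message labels it contains, in the order the author wrote them.
corpus = [
    ["sky_sat", "sky_sun", "temp_sat", "temp_sun"],
    ["sky_sat", "temp_sat", "sky_sun", "temp_sun"],
    ["sky_sat", "sky_sun", "temp_sat", "temp_sun"],
]


def learn_precedence(corpus):
    """Count, for each pair of labels, how often the first precedes the second."""
    counts = Counter()
    for text in corpus:
        # combinations() preserves position order, so each pair is (earlier, later).
        for earlier, later in combinations(text, 2):
            counts[(earlier, later)] += 1
    return counts


def best_ordering(labels, counts):
    """Return the ordering whose pairwise precedences are best attested."""
    def score(order):
        return sum(counts[pair] for pair in combinations(order, 2))
    # Exhaustive search; only feasible for a handful of messages.
    return max(permutations(labels), key=score)


counts = learn_precedence(corpus)
print(best_ordering(["temp_sun", "sky_sun", "temp_sat", "sky_sat"], counts))
```

Because the model only imitates what authors in the corpus did, its output is only as good as the corpus it was trained on.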
The final approach is heuristic-based structuring. Such algorithms perform the structuring task based on heuristic rules, which can come from theories of rhetoric, psycholinguistic models, and/or a combination of intuition and feedback from pilot experiments with potential users. Heuristic-based structuring is appealing intellectually, but it can be difficult to get it to work well in practice, in part because heuristics often depend on semantic information which is not always available. On the other hand, heuristic rules can focus on what is best for text readers, whereas the other approaches focus on imitating authors.
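As a rough illustration of heuristic-based structuring, the sketch below applies two made-up rules to the weather example: keep messages about the same kind of information together, and order them chronologically within each group. The rules, the dictionaries `KIND_ORDER` and `DAY_ORDER`, and the function `structure` are assumptions introduced here and are not drawn from any specific theory of rhetoric.

```python
from itertools import groupby

# Hypothetical semantic annotations that the heuristics rely on.
KIND_ORDER = {"sky": 0, "temperature": 1}
DAY_ORDER = {"Saturday": 0, "Sunday": 1}

messages = [
    {"day": "Sunday", "kind": "temperature",
     "text": "Max temperature will be 15 °C on Sunday."},
    {"day": "Saturday", "kind": "sky", "text": "It will rain on Saturday."},
    {"day": "Sunday", "kind": "sky", "text": "It will be sunny on Sunday."},
    {"day": "Saturday", "kind": "temperature",
     "text": "Max temperature will be 10 °C on Saturday."},
]


def structure(messages):
    """Order by kind, then by day; start a new paragraph when the kind changes."""
    ordered = sorted(
        messages,
        key=lambda m: (KIND_ORDER[m["kind"]], DAY_ORDER[m["day"]]),
    )
    return [
        [m["text"] for m in group]
        for _, group in groupby(ordered, key=lambda m: m["kind"])
    ]


for paragraph in structure(messages):
    print(" ".join(paragraph))
```

The rules need each message to carry its `kind` and `day`; when such semantic information is missing, the heuristics cannot be applied, which is the practical difficulty described above.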

Narrative

Perhaps the ultimate document structuring challenge is to generate a good narrative—in other words, a text which starts by setting the scene and giving an introduction/overview; then describes a set of events in a clear fashion so readers can easily see how the individual events are related and link together; and concludes with a summary/ending. Note that narrative in this sense applies to factual texts as well as stories. Current NLG systems do not do a good job of generating narratives, and this is a major source of user criticism.
Generating good narratives is a challenge for all aspects of NLG, but the most fundamental challenge is probably in document structuring.
