Dialog manager
A dialog manager is a component of a dialog system, responsible for the state and flow of the conversation. Usually:
- The input to the DM is the human utterance, usually converted to some system-specific semantic representation by the Natural language understanding component. For example, in a flight-planning dialog system, the input may look like "ORDER".
- The DM usually maintains some state variables, such as the dialog history, the latest unanswered question, etc., depending on the system.
- The output of the DM is a list of instructions to other parts of the dialog system, usually in a semantic representation, for example "TELL". This semantic representation is usually converted to human language by the Natural language generation component.
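The stateful input/output flow above can be sketched in code. This is a minimal illustration, not any real system's API; the frame format and slot names ("ORDER", "ASK", "TELL", "destination") are assumptions for the example.

```python
# Minimal sketch of a stateful dialog manager (DM). The frame and
# slot names here are illustrative, not from any real system.

class DialogManager:
    def __init__(self):
        self.history = []             # dialog history
        self.pending_question = None  # latest unanswered question

    def step(self, nlu_frame):
        """Take a semantic frame from the NLU, update the state,
        and return an instruction frame for the NLG."""
        self.history.append(nlu_frame)
        if nlu_frame.get("act") == "ORDER" and "destination" not in nlu_frame:
            self.pending_question = "destination"
            return {"act": "ASK", "slot": "destination"}
        self.pending_question = None
        return {"act": "TELL", "content": "booking confirmed"}

dm = DialogManager()
out = dm.step({"act": "ORDER"})  # no destination given yet -> ask for it
assert out == {"act": "ASK", "slot": "destination"}
assert len(dm.history) == 1      # the DM, unlike NLU/NLG, keeps state
```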
The only thing common to all DMs is that they are stateful, in contrast to other parts of the DS, which are just stateless functions. The DM roles can roughly be divided into these groups:
- Input-control DMs, which enable context-dependent processing of the human utterances.
- Output-control DMs, which enable state-dependent generation of text.
- Strategic flow-control DMs, which control the overall course of the dialog.
- Tactical flow-control DMs, which make local decisions that affect the quality of the conversation.
Input-control DM
In the following dialog, the interpretation of each short human answer depends on the question the computer has just asked:
- Computer: Where do you want to depart from?
- * Human: Tel Aviv.
- Computer: Where do you want to arrive at?
- * Human: Gaza.
This function is on the border between NLU and DM: in some systems it is included in the NLU, as context-dependent interpretation rules, while in other systems it is included in the DM, as a noun-phrase resolution module.
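The context-dependent processing above can be sketched as follows: the same bare answer fills a different slot depending on which question the system asked last. The slot and question names are illustrative assumptions.

```python
# Sketch of context-dependent interpretation: a bare place name is
# mapped onto the slot the system just asked about. Slot and question
# names are illustrative, not from any real system.

def interpret(utterance, last_question):
    """Resolve a short answer against the dialog context."""
    if last_question == "ask_origin":
        return {"origin": utterance}
    if last_question == "ask_destination":
        return {"destination": utterance}
    return {"unresolved": utterance}

# The same kind of utterance gets a different meaning in each context:
assert interpret("Tel Aviv", "ask_origin") == {"origin": "Tel Aviv"}
assert interpret("Gaza", "ask_destination") == {"destination": "Gaza"}
```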
Another function on the border between NLU and DM is determining whether several consecutive inputs are parts of a single utterance. Here is an example from a job negotiation dialog:
- I offer a salary of 20,000 NIS
- and a car
- The pension conditions will be decided later
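A simple way to realize this segmentation is a continuation heuristic: an input that starts with a conjunction is merged into the previous one. The heuristic below is an illustrative assumption, not a description of any real system.

```python
# Sketch of merging consecutive inputs into single utterances:
# an input starting with a conjunction is treated as a continuation
# of the previous input. The heuristic is illustrative only.

def merge_utterances(inputs):
    merged = []
    for text in inputs:
        if merged and text.lower().startswith(("and ", "plus ")):
            merged[-1] = merged[-1] + " " + text
        else:
            merged.append(text)
    return merged

parts = ["I offer a salary of 20,000 NIS",
         "and a car",
         "The pension conditions will be decided later"]
assert merge_utterances(parts) == [
    "I offer a salary of 20,000 NIS and a car",
    "The pension conditions will be decided later",
]
```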
Output-control DM
The computer output may be made more natural by remembering the dialog history. For example, some frameworks allow the author to define question-answer pairs, such that for each question there are several possible answers. The DM selects the best answer for the question, unless it was already used, in which case it selects the second-best answer, and so on. A similar feature exists in other systems: each time the DS uses a certain rule, the DM marks the rule as "used", so that it won't be used again.
A recent DS for technical assistance uses advanced machine-learned rules to select the best terms for describing items. For example, if the DM notices that it is speaking with an adult, it will use terms such as "the left hand"; if it notices that it is speaking with a child, it will use less technical terms such as "the hand where you wear your watch".
This function is on the border between DM and NLG.
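The "best unused answer" mechanism described above can be sketched as follows; the question-answer data is illustrative.

```python
# Sketch of output control: select the best-ranked answer to a
# question that has not yet been used, falling back down the ranked
# list. The data is illustrative, not from any real framework.

class AnswerSelector:
    def __init__(self, qa_pairs):
        self.qa_pairs = qa_pairs   # question -> answers, best first
        self.used = set()

    def answer(self, question):
        for candidate in self.qa_pairs[question]:
            if candidate not in self.used:
                self.used.add(candidate)
                return candidate
        return self.qa_pairs[question][0]  # all used: repeat the best

sel = AnswerSelector({"greeting": ["Hello!", "Hi again!"]})
assert sel.answer("greeting") == "Hello!"
assert sel.answer("greeting") == "Hi again!"  # best answer already used
assert sel.answer("greeting") == "Hello!"     # all used: wrap around
```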
Strategic flow-control DM
The main role of a DM is to decide what action the dialog agent should take at each point of the dialog. A simple way to do this is to let the author completely specify the dialog structure. For example, a specification of a tutorial dialog structure may look like:
- Computer: "What forces act on the electron?"
- * Human: "Electric force".
- ** Computer: "Correct"
- **
- Computer: "What forces act on the mass?"
- * Human: "Electric force".
- ** Computer: "Incorrect, the mass has no charge".
- **
There are many languages and frameworks that allow authors to specify dialog structures.
Additionally, the dialog structure can be described as a state chart, using a standard language such as SCXML; some DSs take this approach.
It is quite tedious for authors to write a full dialog structure. There are many improvements that allow authors to describe a dialog at a higher level of abstraction, while putting more of the burden on the DM.
Hierarchical structure
Some frameworks allow the author to describe the dialog as an advanced, multi-level structure, such as:
- Room reservation task:
- * Login
- ** Ask user name
- ** Ask user password
- * Room selection
- ** Building selection
- ** Room number selection
- * Time selection
- * Finish
This structure encourages code reuse, for example, the login module can be used in other dialogs.
They also allow dynamic dialog-task construction, where the structure is not fixed in advance but constructed on the fly, based on information selected from a backend. For instance, in a system that helps aircraft maintenance personnel throughout the execution of maintenance tasks, the structure of the dialog depends on the structure of the maintenance task and is constructed dynamically.
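A hierarchical task like the room-reservation example above can be represented as a tree that the DM traverses; the encoding below is an illustrative sketch, not a real framework's format.

```python
# Sketch of a hierarchical dialog task: each node is a (name,
# sub-tasks) pair; a node with no sub-tasks is a concrete dialog
# step. Task names follow the room-reservation example above.

task = ("Room reservation", [
    ("Login", [("Ask user name", []), ("Ask user password", [])]),
    ("Room selection", [("Building selection", []),
                        ("Room number selection", [])]),
    ("Time selection", []),
    ("Finish", []),
])

def leaf_steps(node):
    """Flatten the task tree into its sequence of concrete steps."""
    name, children = node
    if not children:
        return [name]
    steps = []
    for child in children:
        steps += leaf_steps(child)
    return steps

# The reusable Login module contributes its steps first:
assert leaf_steps(task)[:2] == ["Ask user name", "Ask user password"]
assert leaf_steps(task)[-1] == "Finish"
```

Because each sub-tree is self-contained, a module such as Login can be grafted into other dialogs, and a backend can construct the tree on the fly.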
Topic tracking
Frameworks for chatter-bots, such as ChatScript, allow the author to control the conversation structure with topics. The author can create rules grouped under a topic, for example:
- topic: CHILDHOOD
- t: I had a happy childhood.
- t: But it ended too early.
- ...
This, too, allows authors to reuse topics, and combine several independent topics to create a smarter chatter-bot.
Form filling
A common use of dialog systems is as a replacement for forms. For example, a flight-reservation agent should ask the human about his origin time and place and destination time and place - just as if the human were filling a form with these 4 slots. A simple solution is system-initiative, where the dialog system asks the user about each piece of information in turn, and the user must fill them in that exact order, as in this dialog:
- Welcome to the flight confirmation system. What is your flight number?
- * United 123 on August 8 from Los Angeles
- What is your departure city?
- * I told you, Los Angeles, on August 8
- I'm sorry, I didn't understand. What is your departure city?
- * Los Angeles leaving August 8th.
- What is the day of departure?
- * You don't listen! August 8!
- Please say the day of departure?
- * August 8
- Flight United 123 confirmed to depart Los Angeles for London at 2pm on August 8.
A common compromise between system-initiative and user-initiative (where the user supplies the information in whatever order he likes) is mixed-initiative: the system starts by asking questions, but users can barge in and change the dialog direction. The system understands the user even when he speaks about details he was not asked about yet.
However, describing such a system manually, as a state chart, is very tedious, since the human may first say the origin and then the destination, or vice versa; in each case, the human may first say the time and then the place, or vice versa.
So, there are DMs that allow the dialog author to just say what information is required, without specifying the exact order. For example, the author may write:
- TRAVEL =
Such DSs were developed at MIT, for example, Wheels, Jupiter, and more.
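The idea of declaring only which information is required, without an order, can be sketched as follows; the slot names are illustrative assumptions.

```python
# Sketch of mixed-initiative form filling: the author only declares
# the required slots; the DM asks about the first empty one, but
# accepts any slots the user happens to supply. Slot names are
# illustrative, not from any MIT system.

REQUIRED = ["origin", "destination", "departure_date"]

def next_action(filled):
    """Return the next system act given the slots filled so far."""
    for slot in REQUIRED:
        if slot not in filled:
            return {"act": "ASK", "slot": slot}
    return {"act": "CONFIRM", "slots": dict(filled)}

filled = {}
assert next_action(filled) == {"act": "ASK", "slot": "origin"}
# The user answers with more than was asked for - the DM accepts it:
filled.update({"origin": "Los Angeles", "departure_date": "August 8"})
assert next_action(filled) == {"act": "ASK", "slot": "destination"}
filled["destination"] = "London"
assert next_action(filled)["act"] == "CONFIRM"
```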
Simple DMs handle slot filling in a binary fashion: a slot is either "filled" or "empty". More advanced DMs also keep track of the degree of grounding - how sure we are that we really understood what the user said: whether a piece of information was "just recently introduced", "introduced again", "acknowledged", "repeated", etc. We can also allow the author to specify, for each piece of information, the degree to which we need it to be understood; sensitive information requires a higher degree. The DM uses this information to control the course of the dialog: for example, if the human said something about a sensitive subject and we are not sure we understood, the DM will issue a confirmation question.
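The grounding decision described above can be sketched with per-slot confidence requirements; the thresholds and slot names are illustrative assumptions.

```python
# Sketch of degree-of-grounding control: each slot carries a
# confidence score, and the author declares how well each slot must
# be understood. Thresholds and slot names are illustrative.

REQUIRED_CONFIDENCE = {"credit_card": 0.9,   # sensitive: needs more
                       "destination": 0.6}   # ordinary slot

def grounding_action(slot, value, confidence):
    if confidence >= REQUIRED_CONFIDENCE[slot]:
        return {"act": "ACCEPT", "slot": slot, "value": value}
    return {"act": "CONFIRM", "slot": slot, "value": value}

# The same confidence is good enough for one slot but not the other:
assert grounding_action("destination", "Gaza", 0.7)["act"] == "ACCEPT"
assert grounding_action("credit_card", "1234", 0.7)["act"] == "CONFIRM"
```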
Information state
The TrindiKit DS, developed during the TRINDI project, allows authors to define a complex information state and write general rules that process this state. Here is a sample rule:
integrateAnswer:
- preconditions:
- * in
- * fst
- * relevant_answer
- effects:
- * pop
- * reduce
- * add
This may help authors reuse general dialog-management rules based on dialog theories. DSs developed with TrindiKit include: GoDiS, MIDAS, EDIS and SRI Autorate.
The information state approach was developed further in later projects and toolkits.
Another example of an information-state-based dialog manager uses a propositional information state to encode the current state, and selects the next action using a Markov decision process.
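The precondition/effect style of the integrateAnswer rule above can be sketched as follows. The field names and rule body are illustrative assumptions, not TrindiKit syntax.

```python
# Sketch of an information-state update rule: the state is a record
# of fields, and a rule fires only when its preconditions hold,
# loosely mirroring the integrateAnswer rule above. Field names are
# illustrative, not TrindiKit syntax.

state = {"qud": ["destination?"],   # questions under discussion
         "latest_answer": "Paris",
         "commitments": []}

def integrate_answer(state):
    # preconditions: a question is under discussion and an answer exists
    if not (state["qud"] and state["latest_answer"]):
        return False
    question = state["qud"].pop(0)          # effect: pop the question
    state["commitments"].append((question, state["latest_answer"]))
    state["latest_answer"] = None           # effect: answer is consumed
    return True

assert integrate_answer(state)
assert state["commitments"] == [("destination?", "Paris")]
assert not integrate_answer(state)   # preconditions no longer hold
```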
General planning
A generalization of this approach is to let the author define the goals of the agent, and let the DM construct a plan to achieve that goal. The plan is made of operations; each speech act is an operation. Each operation has preconditions and postconditions, for example:
Inform:
- Precondition: Knows AND Wants
- Effect: Knows
- Body: Believes
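An operator like Inform above can be sketched in a STRIPS-like style; the predicate encoding and arguments ("gate") are illustrative assumptions, since the original operator's arguments are not given.

```python
# Sketch of plan-based dialog: a speech act as an operator with
# preconditions and effects, in the spirit of the Inform operator
# above. The predicate tuples are an illustrative encoding.

INFORM = {
    "pre": [("knows", "system", "gate"), ("wants", "user", "gate")],
    "effect": [("knows", "user", "gate")],
}

def applicable(op, facts):
    """An operator may fire only when all its preconditions hold."""
    return all(p in facts for p in op["pre"])

def apply_op(op, facts):
    """Applying an operator adds its effects to the world state."""
    return facts | set(op["effect"])

facts = {("knows", "system", "gate"), ("wants", "user", "gate")}
assert applicable(INFORM, facts)
facts = apply_op(INFORM, facts)
assert ("knows", "user", "gate") in facts
```

A planner then chains such operators backwards from the agent's goal to decide which speech acts to perform.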
A similar approach is taken in SOAR-based systems. Using SOAR allows the incorporation of complex emotional and social models; for example, the agent can decide, based on the human's actions, whether it wants to cooperate with him, avoid him, or even attack him.
Another similar approach splits the dialog management into several modules:
- Reference manager - given a word, decides what object in the world it refers to.
- Task manager - identifies the problem-solving acts that the user tries to achieve.
- Interpretation manager - in addition to calling the first two, also identifies discourse obligations, for example: "respond to the latest question".
- Behavioral agent - decides how to accomplish the goal that the user wants. The agent employs several task-specific agents that do the actual planning.
Other examples of planning-based dialog systems:
- Grammatical Framework.
- IPSIM, in the Circuit Fixit system; see Smith, Hipp & Biermann.
Tactical flow-control DM
In addition to following the general structure and goals of the dialog, some DMs also make tactical conversational decisions - local decisions that affect the quality of the conversation.
Error handling
The ASR and NLU modules are usually not 100% sure they understood the user; they usually return a confidence score reflecting the quality of understanding. In such cases, the DM should decide whether to:
- Just assume that the most probable interpretation is correct, and continue the conversation;
- Continue the conversation, but add some words that show understanding, such as "OK, you want to go to a restaurant. Where exactly?".
- Ask the user what exactly he intended to say: "Do you mean X?", "Did you say X or Y?", etc.
- Tell the user "I didn't understand, please say this again".
Error handling has been researched extensively; some frameworks allow the author to manually control the error-handling strategy in each part of the dialog.
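The four options listed above can be sketched as a confidence-threshold policy; the specific threshold values are illustrative assumptions.

```python
# Sketch of confidence-based error handling: map the ASR/NLU
# confidence score onto one of the four strategies listed above.
# The threshold values are illustrative assumptions.

def error_strategy(confidence):
    if confidence > 0.9:
        return "continue"               # assume the top hypothesis is right
    if confidence > 0.7:
        return "implicit_confirmation"  # "OK, a restaurant. Where exactly?"
    if confidence > 0.4:
        return "explicit_confirmation"  # "Do you mean X?"
    return "reject"                     # "I didn't understand, please repeat"

assert error_strategy(0.95) == "continue"
assert error_strategy(0.5) == "explicit_confirmation"
assert error_strategy(0.2) == "reject"
```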
Initiative control
Some DSs have several modes of operation: the default mode is user-initiative, where the system just asks "what can I do for you?" and lets the user navigate the conversation. This is good for experienced users. However, if there are many misunderstandings between the user and the system, the DM may decide to switch to mixed-initiative or system-initiative - ask the user explicit questions, and accept one answer at a time.
Pedagogical decisions
Tactical decisions of a different type are made by tutorial DSs. At many points during the lesson, the DM should decide:
- Whether to Tell the pupil some fact, or try to Elicit this fact from him by asking guiding questions.
- Whether to ask the pupil to Justify his answer, or just Skip the justification and continue.
Learned tactics
Instead of letting a human expert write a complex set of decision rules, it is more common to use reinforcement learning (RL). The dialog is represented as a Markov decision process (MDP) - a process where, in each state, the DM has to select an action, based on the state and the possible rewards from each action. In this setting, the dialog author should only define the reward function: in tutorial dialogs, for example, the reward is the increase in the student's grade; in information-seeking dialogs, the reward is positive if the human receives the information, but there is also a negative reward for each dialog step. RL techniques are then used to learn a policy - for example, what kind of confirmation to use in each state. This policy is later used by the DM in real dialogs.
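A toy version of this setting can be sketched with tabular Q-learning. The two-state problem below (confirm vs. continue, with a small per-step penalty) is an illustrative assumption, and is simplified to one-step decisions rather than a full multi-step MDP.

```python
# Toy sketch of learning a dialog policy with tabular Q-learning.
# The states, actions and rewards are illustrative assumptions.
import random

random.seed(0)
STATES = ["low_confidence", "high_confidence"]
ACTIONS = ["confirm", "continue"]

def reward(state, action):
    step_cost = -1                   # each dialog step costs a little
    if state == "low_confidence" and action == "continue":
        return step_cost - 5         # likely misunderstanding
    if state == "high_confidence" and action == "confirm":
        return step_cost - 2         # needless, annoying question
    return step_cost

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
for _ in range(2000):                # explore randomly, update estimates
    s = random.choice(STATES)
    a = random.choice(ACTIONS)
    Q[(s, a)] += 0.1 * (reward(s, a) - Q[(s, a)])

# The learned policy: confirm only when confidence is low.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
assert policy == {"low_confidence": "confirm",
                  "high_confidence": "continue"}
```

The author only wrote the reward function; the confirmation policy was learned, which is exactly the division of labor the text describes.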
Tutorials on this subject are available.
A different way to learn dialog policies is to try to imitate humans, using Wizard of Oz experiments, in which a human sits in a hidden room and tells the computer what to say.