Java Speech API


The Java Speech API (JSAPI) is an application programming interface for cross-platform support of command-and-control recognizers, dictation systems, and speech synthesizers. Although JSAPI defines only an interface, several implementations have been created by third parties, for example FreeTTS.

Core technologies

Two core speech technologies are supported through the Java Speech API: speech synthesis and speech recognition.

Speech synthesis

Speech synthesis is the process of producing synthetic speech from text generated by an application, an applet, or a user; it is the reverse of speech recognition. It is often referred to as text-to-speech (TTS) technology.
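The major steps in producing speech from text are as follows:
Structure analysis: processes the input text to determine where paragraphs, sentences, and other structures start and end. For most languages, punctuation and formatting data are used in this stage.
Text pre-processing: analyzes the input text for special constructs of the language. In English, special treatment is required for abbreviations, acronyms, dates, times, numbers, currency amounts, email addresses, and many other forms.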
The result of these first two steps is a spoken form of the written text. Here are examples of the differences between written and spoken text:
St. Matthew's hospital is on Main St.
-> “Saint Matthew's hospital is on Main Street”
Add $20 to account 55374.
-> “Add twenty dollars to account five five, three seven four.”
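The two examples above can be reproduced by a toy text pre-processor. The following is a minimal Java sketch, not part of JSAPI or of any real synthesizer: the class name `TextNormalizer` and its rules are hypothetical. It only handles the "St." abbreviation, dollar amounts under $100, and digit strings, and it uses a single heuristic (a following capitalized word means "Saint") where a real engine would consult far richer context. It also reads account numbers digit by digit, without the grouping pause shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the text pre-processing step; not part of JSAPI.
class TextNormalizer {
    private static final String[] ONES = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"};
    private static final String[] TEENS = {
        "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
    private static final String[] TENS = {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};

    // Spells out a whole number from 0 to 99.
    static String numberToWords(int n) {
        if (n < 10) return ONES[n];
        if (n < 20) return TEENS[n - 10];
        String tens = TENS[n / 10];
        return n % 10 == 0 ? tens : tens + "-" + ONES[n % 10];
    }

    // Reads a digit string one digit at a time, as is usual for account numbers.
    static String digitsToWords(String digits) {
        List<String> words = new ArrayList<>();
        for (char c : digits.toCharArray()) words.add(ONES[c - '0']);
        return String.join(" ", words);
    }

    // Converts one written sentence into its spoken form.
    static String normalize(String sentence) {
        String[] tokens = sentence.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            String tok = tokens[i];
            boolean last = i == tokens.length - 1;
            if (tok.equals("St.")) {
                // Heuristic: "St." before a capitalized word is "Saint", otherwise "Street".
                boolean nextCapitalized = !last && Character.isUpperCase(tokens[i + 1].charAt(0));
                out.add((nextCapitalized ? "Saint" : "Street") + (last ? "." : ""));
                continue;
            }
            // Detach trailing punctuation so the core token can be classified.
            String punct = "";
            if (tok.length() > 1 && (tok.endsWith(".") || tok.endsWith(","))) {
                punct = tok.substring(tok.length() - 1);
                tok = tok.substring(0, tok.length() - 1);
            }
            if (tok.matches("\\$\\d{1,2}")) {
                int amount = Integer.parseInt(tok.substring(1));
                tok = numberToWords(amount) + (amount == 1 ? " dollar" : " dollars");
            } else if (tok.matches("\\d+")) {
                tok = digitsToWords(tok);
            }
            out.add(tok + punct);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(normalize("St. Matthew's hospital is on Main St."));
        System.out.println(normalize("Add $20 to account 55374."));
    }
}
```

Running `main` prints `Saint Matthew's hospital is on Main Street.` followed by `Add twenty dollars to account five five three seven four.`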
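The remaining steps convert the spoken text to speech:
Text-to-phoneme conversion: converts each word to phonemes. A phoneme is a basic unit of sound in a language.
Prosody analysis: processes the sentence structure, words, and phonemes to determine the appropriate prosody (pitch, timing, and emphasis) for the sentence.
Waveform production: uses the phonemes and prosody information to produce the audio waveform for each sentence.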
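Converting each word of the spoken text into phonemes, the basic units of sound in a language, is often done first by dictionary lookup. The sketch below assumes a tiny hypothetical lexicon written with ARPAbet-style symbols (stress marks omitted). Real synthesizers combine large pronunciation dictionaries with letter-to-sound rules for unknown words; this sketch only marks such words with a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of dictionary-based text-to-phoneme conversion.
class PhonemeConverter {
    // Tiny illustrative lexicon using ARPAbet-style symbols (stress marks omitted).
    private static final Map<String, String> LEXICON = Map.of(
            "add", "AE D",
            "twenty", "T W EH N T IY",
            "dollars", "D AA L ER Z",
            "to", "T UW",
            "five", "F AY V");

    // Returns one phoneme string per word of the spoken text.
    static List<String> toPhonemes(String spokenText) {
        List<String> result = new ArrayList<>();
        for (String word : spokenText.toLowerCase(Locale.ROOT).split("\\s+")) {
            String cleaned = word.replaceAll("[^a-z']", "");
            if (cleaned.isEmpty()) continue;
            String phonemes = LEXICON.get(cleaned);
            // A real engine would apply letter-to-sound rules here; the sketch flags the gap.
            result.add(phonemes != null ? phonemes : "<" + cleaned + "?>");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toPhonemes("Add twenty dollars."));
    }
}
```

Running `main` prints `[AE D, T W EH N T IY, D AA L ER Z]`.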
Speech synthesizers can make errors in any of the processing steps described above. Human ears are well tuned to detecting these errors, but careful work by developers can minimize them and improve the quality of the speech output. The Java Speech API 1 relied on the Java Speech API Markup Language (JSML) for this purpose, while the newer release uses the Speech Synthesis Markup Language (SSML), which provides many ways to improve the output quality of a speech synthesizer.
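As an example of the kind of markup involved, the following sketch assembles an SSML 1.0 document in which `<sub>` elements tell the synthesizer how to say each "St." and a `<break>` element adds a pause. The helper class and its method names are hypothetical; only the SSML elements and attributes (`speak`, `sub alias`, `break time`) come from the W3C specification. How the resulting string is handed to a particular JSAPI 2 synthesizer is not shown.

```java
// Hypothetical helper that builds SSML 1.0 markup as plain strings.
class SsmlBuilder {
    // Escapes characters that are special in XML text and double-quoted attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // <sub> tells the synthesizer to speak the alias instead of the written text.
    static String sub(String written, String alias) {
        return "<sub alias=\"" + escape(alias) + "\">" + escape(written) + "</sub>";
    }

    // <break> inserts a pause of the given length in milliseconds.
    static String pause(int millis) {
        return "<break time=\"" + millis + "ms\"/>";
    }

    // Wraps the body in the SSML root element with the required version and namespace.
    static String speak(String body, String lang) {
        return "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\""
                + escape(lang) + "\">" + body + "</speak>";
    }

    public static void main(String[] args) {
        String body = sub("St.", "Saint") + " " + escape("Matthew's hospital is on Main") + " "
                + sub("St.", "Street") + "." + pause(300);
        System.out.println(speak(body, "en-US"));
    }
}
```

Explicit markup like this removes the guesswork that a heuristic text pre-processor would otherwise need for ambiguous abbreviations.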

Speech recognition

Speech recognition provides computers with the ability to listen to spoken language and determine what has been said. In other words, it processes audio input containing speech by converting it to text.
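The major steps of a typical speech recognizer are as follows:
Grammar design: defines the words that may be spoken by a user and the patterns in which they may be spoken.
Signal processing: analyzes the spectrum (frequency) characteristics of the incoming audio.
Phoneme recognition: compares the spectrum patterns to the patterns of the phonemes of the language being recognized.
Word recognition: compares the sequence of likely phonemes against the words and patterns of words specified by the active grammars.
Result generation: provides the application with information about the words the recognizer has detected in the incoming audio.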
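Signal processing, the analysis of the frequency content of incoming audio, can be illustrated with a discrete Fourier transform (DFT) of a single frame of samples. This is a minimal sketch under simplifying assumptions: it runs a naive O(n²) DFT on a synthetic tone rather than a windowed FFT on microphone audio, and it stops at the magnitude spectrum instead of computing the feature vectors (for example MFCCs) that practical recognizers derive from it. None of the names below are part of JSAPI.

```java
// Illustrative sketch of the signal-processing step: the magnitude spectrum of one frame.
class SpectrumAnalyzer {
    // Naive O(n^2) discrete Fourier transform; returns |X[k]| for k = 0..n/2.
    static double[] magnitudeSpectrum(double[] frame) {
        int n = frame.length;
        double[] mags = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += frame[t] * Math.cos(angle);
                im -= frame[t] * Math.sin(angle);
            }
            mags[k] = Math.hypot(re, im);
        }
        return mags;
    }

    // Index of the strongest frequency bin.
    static int peakBin(double[] mags) {
        int best = 0;
        for (int k = 1; k < mags.length; k++) {
            if (mags[k] > mags[best]) best = k;
        }
        return best;
    }

    public static void main(String[] args) {
        int n = 256;
        double sampleRate = 8000.0;
        double[] frame = new double[n];
        // Synthetic 500 Hz tone: 500 * 256 / 8000 = 16 whole cycles per frame.
        for (int t = 0; t < n; t++) frame[t] = Math.sin(2 * Math.PI * 500.0 * t / sampleRate);
        int bin = peakBin(magnitudeSpectrum(frame));
        System.out.println("peak bin " + bin + " = " + (bin * sampleRate / n) + " Hz");
    }
}
```

With a 256-sample frame at 8000 Hz each bin spans 31.25 Hz, so the 500 Hz tone lands exactly on bin 16 and `main` prints `peak bin 16 = 500.0 Hz`.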
A grammar is an object in the Java Speech API that indicates what words a user is expected to say and in what patterns those words may occur. Grammars are important to speech recognizers because they constrain the recognition process. These constraints make recognition faster and more accurate because the recognizer does not have to consider implausible word sequences.
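To see how a grammar constrains recognition, consider a small command grammar that in JSGF notation could be written as `public <command> = (open | close) [the] (file | window);`. The Java sketch below hand-codes that one rule and uses it to discard candidate transcriptions the grammar does not allow, keeping the best-scoring survivor. The class, the `Hypothesis` type, and the scores are hypothetical; grammar-based recognizers typically apply such constraints during the search itself rather than as a filter afterwards.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical hand-coded version of the rule
//   public <command> = (open | close) [the] (file | window);
class ToyRuleGrammar {
    private static final Set<String> VERBS = Set.of("open", "close");
    private static final Set<String> OBJECTS = Set.of("file", "window");

    // Returns true if the word sequence is a sentence of the rule grammar.
    static boolean accepts(String sentence) {
        String[] w = sentence.trim().toLowerCase(Locale.ROOT).split("\\s+");
        if (w.length == 2) {
            return VERBS.contains(w[0]) && OBJECTS.contains(w[1]);
        }
        if (w.length == 3) {
            return VERBS.contains(w[0]) && w[1].equals("the") && OBJECTS.contains(w[2]);
        }
        return false;
    }

    // A candidate transcription with an acoustic score (hypothetical; higher is better).
    static final class Hypothesis {
        final String text;
        final double score;

        Hypothesis(String text, double score) {
            this.text = text;
            this.score = score;
        }
    }

    // Keeps only grammatical candidates and returns the best-scoring one, if any.
    static Optional<Hypothesis> best(List<Hypothesis> candidates) {
        return candidates.stream()
                .filter(h -> accepts(h.text))
                .max(Comparator.comparingDouble((Hypothesis h) -> h.score));
    }

    public static void main(String[] args) {
        List<Hypothesis> candidates = List.of(
                new Hypothesis("open the vile", 0.62),
                new Hypothesis("open the file", 0.58),
                new Hypothesis("oven the file", 0.40));
        System.out.println(best(candidates).map(h -> h.text).orElse("(no match)"));
    }
}
```

Here "open the vile" scores highest but is not a sentence of the grammar, so `main` prints `open the file`.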
The Java Speech API 1 supports two basic grammar types: rule grammars and dictation grammars. These types differ in various ways, including how applications set up the grammars, the types of sentences they allow, how results are provided, the amount of computational resources required, and how they are used in application design. In JSAPI 1, rule grammars are defined in JSGF, the Java Speech Grammar Format. The newer JSAPI 2 supports the more recent Speech Recognition Grammar Specification (SRGS) format. JSAPI 2 does not offer support for dictation.

The Java Speech API’s classes and interfaces

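The different classes and interfaces that form the Java Speech API are grouped into the following three packages:
javax.speech: contains classes and interfaces for a generic speech engine.
javax.speech.synthesis: contains classes and interfaces for speech synthesis.
javax.speech.recognition: contains classes and interfaces for speech recognition.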
The EngineManager class acts as a factory class used by all Java Speech API applications. Its static methods provide access to speech synthesis and speech recognition engines. The Engine interface encapsulates the generic operations that a Java Speech API-compliant speech engine should provide for speech applications.
Speech applications primarily use these operations to retrieve the properties and state of the speech engine and to allocate and deallocate the engine's resources. In addition, the Engine interface exposes mechanisms to pause and resume the audio stream generated or processed by the speech engine; audio streams can be managed through the AudioManager. The Engine interface is extended by the Synthesizer and Recognizer interfaces, which define additional speech synthesis and speech recognition functionality. The Synthesizer interface, for example, encapsulates the operations that a Java Speech API-compliant speech synthesis engine should provide for speech applications.
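The relationships described above (a factory with static creation methods, a common engine interface with allocation and pause/resume operations, and specialized synthesizer and recognizer interfaces) can be modeled in a few lines. This is a hypothetical sketch that mirrors the structure only: `ToyEngineManager`, `ToyEngine`, and the other names are invented for illustration and do not reproduce the real `javax.speech` signatures, states, or exceptions. The engines merely echo strings instead of producing or processing audio.

```java
import java.util.Locale;

// Hypothetical model of the engine structure; not the real javax.speech API.
class EngineLifecycleDemo {
    // Generic operations shared by all engines (cf. the Engine interface).
    interface ToyEngine {
        void allocate();
        void deallocate();
        void pause();
        void resume();
        boolean isAllocated();
        boolean isPaused();
    }

    // Extra synthesis functionality (cf. Synthesizer extending Engine).
    interface ToySynthesizer extends ToyEngine { String speak(String text); }

    // Extra recognition functionality (cf. Recognizer extending Engine).
    interface ToyRecognizer extends ToyEngine { String recognize(String audioStandIn); }

    // Shared state handling; engines start paused until resume() (an assumption of this sketch).
    abstract static class AbstractToyEngine implements ToyEngine {
        private boolean allocated = false;
        private boolean paused = true;

        public void allocate() { allocated = true; }
        public void deallocate() { allocated = false; paused = true; }
        public void pause() { requireAllocated(); paused = true; }
        public void resume() { requireAllocated(); paused = false; }
        public boolean isAllocated() { return allocated; }
        public boolean isPaused() { return paused; }

        protected void requireAllocated() {
            if (!allocated) throw new IllegalStateException("engine not allocated");
        }
    }

    static final class EchoSynthesizer extends AbstractToyEngine implements ToySynthesizer {
        public String speak(String text) {
            requireAllocated();
            // While paused, output is held back; the sketch just labels it as queued.
            return isPaused() ? "(queued) " + text : "[spoken] " + text;
        }
    }

    static final class EchoRecognizer extends AbstractToyEngine implements ToyRecognizer {
        public String recognize(String audioStandIn) {
            requireAllocated();
            // A paused recognizer ignores its input in this sketch.
            return isPaused() ? "" : audioStandIn.toLowerCase(Locale.ROOT);
        }
    }

    // Factory with static creation methods (cf. EngineManager).
    static final class ToyEngineManager {
        private ToyEngineManager() {}
        static ToySynthesizer createSynthesizer() { return new EchoSynthesizer(); }
        static ToyRecognizer createRecognizer() { return new EchoRecognizer(); }
    }

    public static void main(String[] args) {
        ToySynthesizer synth = ToyEngineManager.createSynthesizer();
        synth.allocate();
        synth.resume();
        System.out.println(synth.speak("Hello"));
        synth.deallocate();
    }
}
```

Running `main` prints `[spoken] Hello`. Calling `pause()` or `resume()` before `allocate()` throws, which reflects the idea that resources must be allocated before an engine can process audio.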
The Java Speech API is event-driven: events generated by the speech engine can be identified and handled as required. Speech events are handled through the EngineListener interface and, more specifically, through its specializations RecognizerListener and SynthesizerListener.
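The listener pattern behind this event handling can be sketched as follows. All names are hypothetical stand-ins for the JSAPI types: a generic `ToyEngineListener` receives engine events, and a `ToySynthesizerListener` that extends it also receives synthesis events. The sketch dispatches events synchronously on the caller's thread, which real engines need not do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of listener-based speech events; names are illustrative, not javax.speech.
class SpeechEventsDemo {
    enum EventType { ENGINE_ALLOCATED, SPEAKING_STARTED, SPEAKING_ENDED }

    static final class SpeechEvent {
        final EventType type;
        final String detail;

        SpeechEvent(EventType type, String detail) {
            this.type = type;
            this.detail = detail;
        }
    }

    // Generic listener for engine events (cf. EngineListener).
    interface ToyEngineListener { void engineEvent(SpeechEvent e); }

    // Specialized listener with a synthesis-specific callback (cf. SynthesizerListener).
    interface ToySynthesizerListener extends ToyEngineListener { void synthesisEvent(SpeechEvent e); }

    static final class ToySynthesizer {
        private final List<ToyEngineListener> listeners = new CopyOnWriteArrayList<>();

        void addListener(ToyEngineListener l) { listeners.add(l); }
        void removeListener(ToyEngineListener l) { listeners.remove(l); }

        void allocate() { fireEngine(new SpeechEvent(EventType.ENGINE_ALLOCATED, "")); }

        void speak(String text) {
            fireSynthesis(new SpeechEvent(EventType.SPEAKING_STARTED, text));
            // Audio would be produced here in a real synthesizer.
            fireSynthesis(new SpeechEvent(EventType.SPEAKING_ENDED, text));
        }

        // Every registered listener receives generic engine events.
        private void fireEngine(SpeechEvent e) {
            for (ToyEngineListener l : listeners) l.engineEvent(e);
        }

        // Only listeners implementing the synthesis-specific interface receive these events.
        private void fireSynthesis(SpeechEvent e) {
            for (ToyEngineListener l : listeners) {
                if (l instanceof ToySynthesizerListener) ((ToySynthesizerListener) l).synthesisEvent(e);
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        ToySynthesizer synth = new ToySynthesizer();
        synth.addListener(new ToySynthesizerListener() {
            public void engineEvent(SpeechEvent e) { log.add("engine:" + e.type); }
            public void synthesisEvent(SpeechEvent e) { log.add("speech:" + e.type + ":" + e.detail); }
        });
        synth.addListener(e -> log.add("generic:" + e.type));
        synth.allocate();
        synth.speak("Hello");
        log.forEach(System.out::println);
    }
}
```

Running `main` prints `engine:ENGINE_ALLOCATED`, `generic:ENGINE_ALLOCATED`, `speech:SPEAKING_STARTED:Hello`, and `speech:SPEAKING_ENDED:Hello`, one per line: the plain lambda listener never sees the synthesis events.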

Related Specifications

The Java Speech API was written before the Java Community Process (JCP) existed and targeted the Java Platform, Standard Edition (Java SE). The Java Speech API 2 was subsequently developed under the JCP. This API targets the Java Platform, Micro Edition (Java ME), but also complies with Java SE.