Google Chrome Extensions

chrome.ttsEngine

Use the chrome.ttsEngine module to implement a text-to-speech (TTS) engine using an extension. If your extension registers using this API, it will receive events containing an utterance to be spoken and other parameters when any extension or packaged app uses the tts module to generate speech. Your extension can then use any available web technology to synthesize and output the speech, and send events back to the calling function to report the status.

Overview

An extension can register itself as a speech engine. By doing so, it can intercept some or all calls to functions such as speak() and stop() and provide an alternate implementation. Extensions are free to use any available web technology to provide speech, including streaming audio from a server, HTML5 audio, Native Client, or Flash. An extension could even do something different with the utterances, like display closed captions in a pop-up window or send them as log messages to a remote server.

Manifest

To implement a TTS engine, an extension must declare the "ttsEngine" permission and then declare all voices it provides in the extension manifest, like this:

{
  "name": "My TTS Engine",
  "version": "1.0",
  "permissions": ["ttsEngine"],
  "tts_engine": {
    "voices": [
      {
        "voice_name": "Alice",
        "lang": "en-US",
        "gender": "female",
        "event_types": ["start", "marker", "end"]
      },
      {
        "voice_name": "Pat",
        "lang": "en-US",
        "event_types": ["end"]
      }
    ]
  },
  "background": {
    "page": "background.html",
    "persistent": false
  }
}

An extension can specify any number of voices.

The voice_name parameter is required. The name should be descriptive enough that it identifies the name of the voice and the engine used. In the unlikely event that two extensions register voices with the same name, a client can specify the ID of the extension that should do the synthesis.

The gender parameter is optional. If your voice corresponds to a male or female voice, you can use this parameter to help clients choose the most appropriate voice for their application.

The lang parameter is optional, but highly recommended. Almost always, a voice can synthesize speech in just a single language. When an engine supports more than one language, it can easily register a separate voice for each language. Under rare circumstances where a single voice can handle more than one language, it's easiest to just list two separate voices and handle them using the same logic internally. However, if you want to create a voice that will handle utterances in any language, leave out the lang parameter from your extension's manifest.
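As a rough sketch of handling several single-language voices with the same internal logic, an engine could keep a small lookup table keyed by voice name. The voice names, the VOICES table, and the languageFor helper below are hypothetical and only illustrate the idea; each voice name would also have to be declared in the manifest.

```javascript
// Hypothetical table: two manifest voices backed by the same synthesizer.
// The names and languages are illustrative assumptions.
const VOICES = {
  'Alice (English)': {lang: 'en-US'},
  'Alice (Spanish)': {lang: 'es-ES'},
};

// Decide which language to synthesize for a given onSpeak options object.
// Falls back to the requested lang, then to an assumed default of 'en-US'.
function languageFor(options) {
  const voice = VOICES[options.voiceName];
  if (voice) {
    return voice.lang;
  }
  return options.lang || 'en-US';
}
```

Both voices then share one code path, and only the language passed to the synthesizer differs.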

Finally, the event_types parameter is required if the engine can send events to update the client on the progress of speech synthesis. At a minimum, supporting the 'end' event type to indicate when speech is finished is highly recommended; otherwise, Chrome cannot schedule queued utterances.

Note: If your TTS engine does not support the 'end' event type, Chrome cannot queue utterances because it has no way of knowing when your utterance has finished. To help mitigate this, Chrome passes an additional boolean enqueue option to your engine's onSpeak handler, giving you the option of implementing your own queueing. This is discouraged because then clients are unable to queue utterances that should get spoken by different speech engines.
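If you do decide to implement your own queueing, the following is one possible sketch of how the enqueue option could be honored. The engine object with start(utterance, done) and cancel() methods is a hypothetical interface to your synthesizer, not part of the Chrome API.

```javascript
// Minimal internal queue, assuming a hypothetical engine object with:
//   engine.start(utterance, done)  begins speaking, calls done() when finished
//   engine.cancel()                stops the utterance being spoken
function createSpeechQueue(engine) {
  const pending = [];
  let current = 0;  // id of the utterance being spoken, 0 when idle
  let nextId = 1;

  function startNext() {
    if (current !== 0 || pending.length === 0) {
      return;
    }
    const id = current = nextId++;
    engine.start(pending.shift(), () => {
      if (current !== id) {
        return;  // a cancelled utterance reported completion late
      }
      current = 0;
      startNext();
    });
  }

  return {
    // Call this from your onSpeak handler.
    speak(utterance, options) {
      if (!options.enqueue) {
        // Not enqueued: drop waiting utterances and interrupt current speech.
        pending.length = 0;
        if (current !== 0) {
          engine.cancel();
          current = 0;
        }
      }
      pending.push(utterance);
      startNext();
    },
  };
}
```

Keep in mind that a queue like this only orders utterances within your own engine; it cannot coordinate with utterances sent to other speech engines.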

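The possible event types that you can send correspond to the event types that the speak() method receives:

  • 'start': The engine has started speaking the utterance.
  • 'word': A word boundary was reached. Use event.charIndex to determine the current speech position.
  • 'sentence': A sentence boundary was reached. Use event.charIndex to determine the current speech position.
  • 'marker': An SSML marker was reached. Use event.charIndex to determine the current speech position.
  • 'end': The engine has finished speaking the utterance.
  • 'error': An engine-specific error occurred and this utterance cannot be spoken. Pass more information in event.errorMessage.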

The 'interrupted' and 'cancelled' events are not sent by the speech engine; they are generated automatically by Chrome.
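To illustrate how progress events carry a character position, here is a small sketch that reports 'word' events for a voice that lists "word" in its event_types. The wordBoundaries and sendWordEvents helpers are hypothetical, and treating every run of non-whitespace characters as a word is a simplifying assumption. A real engine would send each event when the audio actually reaches that point, rather than all at once.

```javascript
// Return the character index at which each word of the utterance starts.
// Assumption: a "word" is any run of non-whitespace characters.
function wordBoundaries(utterance) {
  const indexes = [];
  const wordPattern = /\S+/g;
  let match;
  while ((match = wordPattern.exec(utterance)) !== null) {
    indexes.push(match.index);
  }
  return indexes;
}

// Report start, one 'word' event per word, then end.
// sendTtsEvent is the callback passed to the onSpeak listener.
function sendWordEvents(utterance, sendTtsEvent) {
  sendTtsEvent({type: 'start', charIndex: 0});
  for (const charIndex of wordBoundaries(utterance)) {
    sendTtsEvent({type: 'word', charIndex: charIndex});
  }
  sendTtsEvent({type: 'end', charIndex: utterance.length});
}
```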

Text-to-speech clients can get the voice information from your extension's manifest by calling getVoices(), assuming you've registered speech event listeners as described below.
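From the client's side, a sketch of picking one engine's voices out of the getVoices() result might look like the following. The voicesFromEngine and speakOptionsFor helpers and the ENGINE_ID value are hypothetical; the guarded part at the end only runs inside an extension that can use the tts module.

```javascript
// Keep only the voices provided by one engine extension.
function voicesFromEngine(voices, extensionId) {
  return voices.filter((voice) => voice.extensionId === extensionId);
}

// Build tts.speak() options that pin both the voice name and the engine,
// which disambiguates two extensions that register the same voice name.
function speakOptionsFor(voice) {
  return {voiceName: voice.voiceName, extensionId: voice.extensionId};
}

// Guarded so that the helpers above can also be exercised outside Chrome.
if (typeof chrome !== 'undefined' && chrome.tts) {
  const ENGINE_ID = 'your-engine-extension-id';  // hypothetical placeholder
  chrome.tts.getVoices((voices) => {
    const [voice] = voicesFromEngine(voices, ENGINE_ID);
    if (voice) {
      chrome.tts.speak('Hello, world.', speakOptionsFor(voice));
    }
  });
}
```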

Handling speech events

To generate speech at the request of clients, your extension must register listeners for both onSpeak and onStop, like this:

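var speakListener = function(utterance, options, sendTtsEvent) {
  // TtsEvent objects identify the event with the 'type' field.
  sendTtsEvent({'type': 'start', 'charIndex': 0});

  // (start speaking)

  // Send 'end' once the utterance has actually finished playing.
  sendTtsEvent({'type': 'end', 'charIndex': utterance.length});
};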

var stopListener = function() {
  // (stop all speech)
};

chrome.ttsEngine.onSpeak.addListener(speakListener);
chrome.ttsEngine.onStop.addListener(stopListener);

Important: If your extension does not register listeners for both onSpeak and onStop, it will not intercept any speech calls, regardless of what is in the manifest.

The decision of whether or not to send a given speech request to an extension is based solely on whether the extension supports the given voice parameters in its manifest and has registered listeners for onSpeak and onStop. In other words, there's no way for an extension to receive a speech request and dynamically decide whether to handle it.
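In practice, synthesis is usually asynchronous, so the 'end' event is sent from a completion callback rather than immediately. The sketch below assumes a hypothetical synthesize(utterance, options, onDone, onError) function supplied by your engine; logSynthesizer is a stand-in that only logs the text. Sending 'error' assumes the voice lists "error" in its event_types.

```javascript
// Wrap a synthesizer in an onSpeak listener that reports status events.
// Assumption: synthesize(utterance, options, onDone, onError) is your own
// function; it is not part of the Chrome API.
function makeSpeakListener(synthesize) {
  return function speakListener(utterance, options, sendTtsEvent) {
    sendTtsEvent({type: 'start', charIndex: 0});
    synthesize(
        utterance, options,
        () => sendTtsEvent({type: 'end', charIndex: utterance.length}),
        (message) => sendTtsEvent({type: 'error', errorMessage: message}));
  };
}

// Placeholder synthesizer that "speaks" by logging the text.
function logSynthesizer(utterance, options, onDone, onError) {
  console.log('Speaking:', utterance);
  onDone();
}

// Register both listeners; guarded so the code also loads outside Chrome.
if (typeof chrome !== 'undefined' && chrome.ttsEngine) {
  chrome.ttsEngine.onSpeak.addListener(makeSpeakListener(logSynthesizer));
  chrome.ttsEngine.onStop.addListener(() => {
    // (stop all speech)
  });
}
```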

API Reference: chrome.ttsEngine

Events

onSpeak

chrome.ttsEngine.onSpeak.addListener(function(string utterance, object options, function sendTtsEvent) {...});

Called when the user makes a call to tts.speak() and one of the voices from this extension's manifest is the first to match the options object.

Listener Parameters

utterance ( string )
The text to speak, specified as either plain text or an SSML document. If your engine does not support SSML, you should strip out all XML markup and synthesize only the underlying text content. The value of this parameter is guaranteed to be no more than 32,768 characters. If this engine does not support speaking that many characters at a time, the utterance should be split into smaller chunks and queued internally without returning an error.
options ( object )
Options specified to the tts.speak() method.
voiceName ( optional string )
The name of the voice to use for synthesis.
lang ( optional string )
The language to be used for synthesis, in the form language-region. Examples: 'en', 'en-US', 'en-GB', 'zh-CN'.
gender ( optional enumerated string ["male", "female"] )
Gender of voice for synthesized speech.
rate ( optional double )
Speaking rate relative to the default rate for this voice. 1.0 is the default rate, normally around 180 to 220 words per minute. 2.0 is twice as fast, and 0.5 is half as fast. This value is guaranteed to be between 0.1 and 10.0, inclusive. When a voice does not support this full range of rates, don't return an error. Instead, clip the rate to the range the voice supports.
pitch ( optional double )
Speaking pitch between 0 and 2 inclusive, with 0 being lowest and 2 being highest. 1.0 corresponds to this voice's default pitch.
volume ( optional double )
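Speaking volume between 0 and 1 inclusive, with 0 being lowest and 1 being highest. The default is 1.0.
sendTtsEvent ( function )
Call this function with events that occur in the process of speaking the utterance.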
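The guidance above on SSML, long utterances, and out-of-range rates can be sketched with a few helpers. All three functions are hypothetical. The regular-expression approach to removing markup is a deliberate simplification; an engine running in an extension page could use a real XML parser instead. The chunk size and rate limits are whatever your voice supports.

```javascript
// Remove XML markup when the utterance looks like an SSML document.
// Naive sketch: plain text is returned unchanged, tags become spaces.
function stripMarkup(utterance) {
  const looksLikeSsml = /^\s*(<\?xml|<speak[\s>])/.test(utterance);
  if (!looksLikeSsml) {
    return utterance;
  }
  return utterance.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
}

// Split text into chunks of at most maxLength characters, preferring to
// break at a space so that words stay intact. maxLength must be >= 1.
function splitIntoChunks(text, maxLength) {
  const chunks = [];
  let rest = text;
  while (rest.length > maxLength) {
    let cut = rest.lastIndexOf(' ', maxLength);
    if (cut <= 0) {
      cut = maxLength;  // no usable space: fall back to a hard split
    }
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut).trimStart();
  }
  if (rest.length > 0) {
    chunks.push(rest);
  }
  return chunks;
}

// Clip the requested rate to the range the voice supports instead of
// returning an error. A missing rate is treated as the default, 1.0.
function clampRate(rate, minRate, maxRate) {
  const requested = rate === undefined ? 1.0 : rate;
  return Math.min(maxRate, Math.max(minRate, requested));
}
```

An onSpeak handler could then call stripMarkup on the utterance, pass the result through splitIntoChunks, and queue the chunks internally.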

Callback function

The sendTtsEvent parameter should specify a function that looks like this:

function(tts.TtsEvent event) {...};
event ( tts.TtsEvent )
The event from the text-to-speech engine indicating the status of this utterance.

onStop

chrome.ttsEngine.onStop.addListener(function() {...});

Fired when a call is made to tts.stop() and this extension may be in the middle of speaking. If an extension receives a call to onStop and speech is already stopped, it should do nothing (not raise an error).
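One way to make the stop handler safe to call at any time is to track whether the engine is currently speaking. This is only a sketch: createPlaybackState and its cancelAudio callback are hypothetical names for whatever your engine uses to halt audio output.

```javascript
// Track playback so that stop() is a no-op when nothing is being spoken.
// Assumption: cancelAudio() is your own function that halts audio output.
function createPlaybackState(cancelAudio) {
  let speaking = false;
  return {
    begin() { speaking = true; },    // call when an utterance starts
    finish() { speaking = false; },  // call when an utterance ends
    stop() {
      if (!speaking) {
        return false;  // already stopped: do nothing, raise no error
      }
      speaking = false;
      cancelAudio();
      return true;
    },
  };
}
```

The stop method can be passed to chrome.ttsEngine.onStop.addListener, with begin and finish called from the onSpeak handler.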

Sample Extensions that use chrome.ttsEngine

  • Console TTS Engine – A "silent" TTS engine that prints text to a small window rather than synthesizing speech.