Home

watson-speech 0.19.3

IBM Watson Speech Services for Web Browsers

Build Status npm-version

Allows you to easily add voice recognition and synthesis to any web app with minimal code.

Warning This library is still has a few rough edges and may yet see breaking changes.

For Web Browsers Only

This library is primarily intended for use in browsers. Check out watson-developer-cloud to use Watson services (speech and others) from Node.js.

However, a server-side component is required to generate auth tokens. The examples/ folder includes example Node.js and Python servers, and SDKs are available for Node.js, Java, Python, and there is also a REST API.

Installation - standalone

Pre-compiled bundles are available from on GitHub Releases - just download the file and drop it into your website: https://github.com/watson-developer-cloud/speech-javascript-sdk/releases

Installation - npm with browserify

This library is built with browserify and easy to use in browserify-based projects :

npm install --save watson-speech

API & Examples

The basic API is outlined below, see complete API docs at http://watson-developer-cloud.github.io/speech-javascript-sdk/v0.19.3/

See several examples at https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/v0.19.3/examples/static/

All API methods require an auth token that must be generated server-side. (See https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/v0.19.3/examples/ for a couple of basic examples in Node.js and Python.)

WatsonSpeech.TextToSpeech

.synthesize({text, token}) -> <audio>

Speaks the supplied text through an automatically-created <audio> element. Currently limited to text that can fit within a GET URL (this is particularly an issue on Internet Explorer before Windows 10 where the max length is around 1000 characters after the token is accounted for.)

Options:

  • text - the text to transcribe // todo: list supported languages
  • voice - the desired playback voice's name - see .getVoices(). Note that the voices are language-specific.
  • autoPlay - set to false to prevent the audio from automatically playing

WatsonSpeech.SpeechToText

.recognizeMicrophone({token}) -> Stream

Options:

  • keepMic: if true, preserves the MicrophoneStream for subsequent calls, preventing additional permissions requests in Firefox
  • Other options passed to RecognizeStream
  • Other options passed to WritableElementStream if options.outputElement is set

Requires the getUserMedia API, so limited browser compatibility (see http://caniuse.com/#search=getusermedia) Also note that Chrome requires https (with a few exceptions for localhost and such) - see https://www.chromium.org/Home/chromium-security/prefer-secure-origins-for-powerful-new-features

Pipes results through a FormatStream by default, set options.format=false to disable.

Known issue: Firefox continues to display a microphone icon in the address bar after recording has ceased. This is a browser bug.

.recognizeFile({data, token}) -> Stream

Can recognize and optionally attempt to play a File or Blob (such as from an <input type="file"/> or from an ajax request.)

Options:

  • data: a Blob or File instance.
  • play: (optional, default=false) Attempt to also play the file locally while uploading it for transcription
  • Other options passed to RecognizeStream
  • Other options passed to WritableElementStream if options.outputElement is set

playrequires that the browser support the format; most browsers support wav and ogg/opus, but not flac.) Will emit a playback-error on the RecognizeStream if playback fails. Playback will automatically stop when .stop() is called on the RecognizeStream.

Pipes results through a TimingStream by if options.play=true, set options.realtime=false to disable.

Pipes results through a FormatStream by default, set options.format=false to disable.

Changes

There have been a few breaking changes in recent releases:

  • Removed SpeechToText.recognizeElement() due to quality issues
  • renamed recognizeBlob to recognizeFile to make the primary usage more apparent
  • Changed playFile option of recognizeBlob() to just play, corrected default

See CHANGELOG.md for a complete list of changes.

todo

  • Further solidify API
  • break components into standalone npm modules where it makes sense
  • run integration tests on travis (fall back to offline server for pull requests)
  • add even more tests
  • better cross-browser testing (IE, Safari, mobile browsers - maybe saucelabs?)
  • update node-sdk to use current version of this lib's RecognizeStream (and also provide the FormatStream + anything else that might be handy)
  • move result and results events to node wrapper (along with the deprecation notice)
  • improve docs
  • consider a wrapper to match https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
  • support a "hard" stop that prevents any further data events, even for already uploaded audio, ensure timing stream also implements this.
  • look for bug where single-word final results may omit word confidence (possibly due to FormatStream?)
  • fix bug where TimingStream shows words slightly before they're spoken