Classes

The following classes are available globally.

  • The IBM Watson™ Speech to Text service provides APIs that use IBM’s speech-recognition capabilities to produce transcripts of spoken audio. The service can transcribe speech from various languages and audio formats. In addition to basic transcription, the service can produce detailed information about many different aspects of the audio. It returns all JSON response content in the UTF-8 character set.

    The service supports two types of models: previous-generation models that include the terms Broadband and Narrowband in their names, and next-generation models that include the terms Multimedia and Telephony in their names. Broadband and multimedia models have minimum sampling rates of 16 kHz. Narrowband and telephony models have minimum sampling rates of 8 kHz. The next-generation models offer high throughput and greater transcription accuracy.

    For speech recognition, the service supports synchronous and asynchronous HTTP Representational State Transfer (REST) interfaces. It also supports a WebSocket interface that provides a full-duplex, low-latency communication channel: Clients send requests and audio to the service and receive results over a single connection asynchronously.

    The service also offers two customization interfaces. Use language model customization to expand the vocabulary of a base model with domain-specific terminology. Use acoustic model customization to adapt a base model for the acoustic characteristics of your audio. For language model customization, the service also supports grammars. A grammar is a formal language specification that lets you restrict the phrases that the service can recognize.

    Language model customization is available for most previous- and next-generation models. Acoustic model customization is available for all previous-generation models. Grammars are beta functionality that is available for all previous-generation models that support language model customization.

    Declaration

    Swift

    public class SpeechToText
  • The IBM Watson Speech to Text service enables you to add speech transcription capabilities to your application. It uses machine intelligence to combine information about grammar and language structure to generate an accurate transcription. Transcriptions are supported for various audio formats and languages.

    This class enables fine-tuned control of a WebSocket session with the Speech to Text service. Its interface is more complex than that of the SpeechToText class, but it gives you more control over the session and more ways to customize it.

    Declaration

    Swift

    public class SpeechToTextSession
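
The SpeechToText description above ties each model family to a minimum sampling rate: 16 kHz for Broadband and Multimedia models, 8 kHz for Narrowband and Telephony models. The sketch below shows one way to check audio against that rule before sending it. It is illustrative only: `ModelFamily` and `meetsMinimumSamplingRate` are hypothetical helpers and are not part of the SDK, and the model names in the usage note are assumed examples of the naming pattern.

```swift
import Foundation

/// Hypothetical helper (not part of the SDK): classifies a model by the
/// family term that appears in its name.
enum ModelFamily: String, CaseIterable {
    case broadband = "Broadband"
    case narrowband = "Narrowband"
    case multimedia = "Multimedia"
    case telephony = "Telephony"

    /// Minimum sampling rate for the family, in hertz.
    var minimumSamplingRate: Int {
        switch self {
        case .broadband, .multimedia: return 16_000
        case .narrowband, .telephony: return 8_000
        }
    }

    /// True for next-generation families (Multimedia and Telephony).
    var isNextGeneration: Bool {
        switch self {
        case .multimedia, .telephony: return true
        case .broadband, .narrowband: return false
        }
    }

    /// Finds the family whose term occurs in the model name, if any.
    init?(modelName: String) {
        guard let match = ModelFamily.allCases.first(where: { modelName.contains($0.rawValue) }) else {
            return nil
        }
        self = match
    }
}

/// Returns nil when the model name contains no known family term,
/// otherwise whether the audio meets the family's minimum sampling rate.
func meetsMinimumSamplingRate(modelName: String, audioSamplingRate: Int) -> Bool? {
    guard let family = ModelFamily(modelName: modelName) else { return nil }
    return audioSamplingRate >= family.minimumSamplingRate
}
```

For example, `meetsMinimumSamplingRate(modelName: "en-US_Multimedia", audioSamplingRate: 8_000)` evaluates to `false` under these assumptions, because a Multimedia model needs audio sampled at 16 kHz or higher. Returning an optional keeps an unrecognized model name distinct from audio that fails the check.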