SpeechToTextSession

public class SpeechToTextSession

The IBM Watson Speech to Text service enables you to add speech transcription capabilities to your application. It uses machine intelligence to combine information about grammar and language structure to generate an accurate transcription. Transcriptions are supported for various audio formats and languages.

This class enables fine-tuned control of a WebSockets session with the Speech to Text service. Although it is a more complex interface than the SpeechToText class, it provides more control and customizability of the session.
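
A minimal usage sketch follows. The module name, the `WatsonIAMAuthenticator` type, and the `bestTranscript` property are assumptions based on the IBM Watson Swift SDK; substitute the authenticator appropriate for your credentials.

```swift
import SpeechToTextV1

// Create a session. The authenticator type is an assumption; use
// whichever authenticator matches your service credentials.
let authenticator = WatsonIAMAuthenticator(apiKey: "{apikey}")
let session = SpeechToTextSession(authenticator: authenticator)

// Register callbacks before connecting.
session.onConnect = { print("connected") }
session.onDisconnect = { print("disconnected") }
session.onError = { error in print(error) }
session.onResults = { results in
    print(results.bestTranscript)
}

// Connect, start a recognition request, and stream microphone audio.
session.connect()
var settings = RecognitionSettings(contentType: "audio/ogg;codecs=opus")
settings.interimResults = true
session.startRequest(settings: settings)
session.startMicrophone()

// ... later, stop streaming and disconnect.
session.stopMicrophone()
session.stopRequest()
session.disconnect()
```

Because the session operates over a WebSockets connection, results arrive asynchronously through the callbacks rather than as return values.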

  • The URL used to stream audio to the service for transcription.

    Declaration

    Swift

    public var websocketsURL: String { get set }
  • The default HTTP headers for all requests to the service.

    Declaration

    Swift

    public var defaultHeaders: [String : String]
  • The results of the most recent recognition request.

    Declaration

    Swift

    public var results: SpeechRecognitionResults { get }
  • Invoked when the session connects to the Speech to Text service.

    Declaration

    Swift

    public var onConnect: (() -> Void)? { get set }
  • Invoked with microphone audio when a recording audio queue buffer has been filled. If microphone audio is being compressed, then the audio data is in Opus format. If uncompressed, then the audio data is in 16-bit mono PCM format at 16 kHz.

    Declaration

    Swift

    public var onMicrophoneData: ((Data) -> Void)?
  • Invoked every 0.025 seconds while recording, with the average dB power of the microphone.

    Declaration

    Swift

    public var onPowerData: ((Float32) -> Void)? { get set }
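
    A sketch of driving a level meter from this callback, assuming a `session` already exists. The callback reports average microphone power in dB; the dB range and the `levelMeter` UI element below are illustrative assumptions.

    ```swift
    session.onPowerData = { decibels in
        // Map an assumed -60...0 dB range onto 0...1 for display.
        let level = max(0, min(1, (decibels + 60) / 60))
        DispatchQueue.main.async {
            // levelMeter is a hypothetical UI element.
            levelMeter.progress = level
        }
    }
    ```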
  • Invoked when transcription results are received for a recognition request.

    Declaration

    Swift

    public var onResults: ((SpeechRecognitionResults) -> Void)? { get set }
  • Invoked when an error or warning occurs.

    Declaration

    Swift

    public var onError: ((WatsonError) -> Void)? { get set }
  • Invoked when the session disconnects from the Speech to Text service.

    Declaration

    Swift

    public var onDisconnect: (() -> Void)?
  • Create a SpeechToTextSession with the given authenticator and optional recognition parameters.

    Declaration

    Swift

    public init(
        authenticator: Authenticator,
        model: String? = nil,
        baseModelVersion: String? = nil,
        languageCustomizationID: String? = nil,
        acousticCustomizationID: String? = nil,
        learningOptOut: Bool? = nil,
        endOfPhraseSilenceTime: Double? = nil,
        splitTranscriptAtPhraseEnd: Bool? = nil,
        customerID: String? = nil)
  • Connect to the Speech to Text service.

    If set, the onConnect() callback will be invoked after the session connects to the service.

    Declaration

    Swift

    public func connect()
  • Start a recognition request.

    Declaration

    Swift

    public func startRequest(settings: RecognitionSettings)

    Parameters

    settings

    The configuration to use for this recognition request.
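
    A sketch of configuring and starting a request, assuming a connected `session`. The `interimResults` and `inactivityTimeout` properties are taken from the SDK's `RecognitionSettings` model; verify them against your SDK version.

    ```swift
    var settings = RecognitionSettings(contentType: "audio/ogg;codecs=opus")
    settings.interimResults = true     // receive partial transcripts as they arrive
    settings.inactivityTimeout = 30    // end the request after 30 seconds of silence
    session.startRequest(settings: settings)
    ```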

  • Send an audio file to transcribe.

    Declaration

    Swift

    public func recognize(audio: URL)

    Parameters

    audio

    The audio file to transcribe.
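
    A sketch of transcribing a bundled audio file, assuming an existing `session`. The file name is illustrative; the `contentType` must match the file's actual format.

    ```swift
    session.connect()
    let settings = RecognitionSettings(contentType: "audio/wav")
    session.startRequest(settings: settings)
    if let url = Bundle.main.url(forResource: "recording", withExtension: "wav") {
        session.recognize(audio: url)
    }
    session.disconnect()
    ```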

  • Send audio data to transcribe.

    Declaration

    Swift

    public func recognize(audio: Data)

    Parameters

    audio

    The audio data to transcribe.

  • Start streaming microphone audio data to transcribe.

    By default, microphone audio data is compressed to Opus format to reduce latency and bandwidth. To disable Opus compression and send linear PCM data instead, set compress to false.

    If compression is enabled, the recognition request’s contentType setting should be set to “audio/ogg;codecs=opus”. If compression is disabled, the contentType setting should be set to “audio/l16;rate=16000;channels=1”.

    This function may cause the system to prompt the user for permission to access the microphone. Use AVAudioSession.requestRecordPermission(_:) if you prefer to ask for the user’s permission in advance.

    Declaration

    Swift

    public func startMicrophone(compress: Bool = true)

    Parameters

    compress

    Should microphone audio be compressed to Opus format? (Opus compression reduces latency and bandwidth.)
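
    A sketch of uncompressed microphone streaming, assuming an existing `session`. With compress set to false, the contentType must match the 16-bit, 16 kHz mono PCM audio that the session emits.

    ```swift
    var settings = RecognitionSettings(contentType: "audio/l16;rate=16000;channels=1")
    settings.interimResults = true
    session.startRequest(settings: settings)
    session.startMicrophone(compress: false)
    ```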

  • Stop streaming microphone audio data to transcribe.

    Declaration

    Swift

    public func stopMicrophone()
  • Stop the recognition request.

    Declaration

    Swift

    public func stopRequest()
  • Wait for any queued recognition requests to complete then disconnect from the service.

    Declaration

    Swift

    public func disconnect()