SpeechToText

public class SpeechToText

The IBM Watson™ Speech to Text service provides APIs that use IBM’s speech-recognition capabilities to produce transcripts of spoken audio. The service can transcribe speech from various languages and audio formats. In addition to basic transcription, the service can produce detailed information about many different aspects of the audio. It returns all JSON response content in the UTF-8 character set.

The service supports two types of models: previous-generation models that include the terms Broadband and Narrowband in their names, and next-generation models that include the terms Multimedia and Telephony in their names. Broadband and multimedia models have minimum sampling rates of 16 kHz. Narrowband and telephony models have minimum sampling rates of 8 kHz. The next-generation models offer high throughput and greater transcription accuracy.

For speech recognition, the service supports synchronous and asynchronous HTTP Representational State Transfer (REST) interfaces. It also supports a WebSocket interface that provides a full-duplex, low-latency communication channel: clients send requests and audio to the service and receive results over a single connection asynchronously.

The service also offers two customization interfaces. Use language model customization to expand the vocabulary of a base model with domain-specific terminology. Use acoustic model customization to adapt a base model for the acoustic characteristics of your audio. For language model customization, the service also supports grammars. A grammar is a formal language specification that lets you restrict the phrases that the service can recognize.

Language model customization is available for most previous- and next-generation models. Acoustic model customization is available for all previous-generation models. Grammars are beta functionality that is available for all previous-generation models that support language model customization.
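As a rough illustration of the model families described above, the following sketch infers a model's family and minimum sampling rate from the terms in its name. The `ModelFamily` type and the `modelFamily(of:)` helper are hypothetical and not part of the SDK; the name patterns are taken only from the overview, and the values that the service reports for a model remain authoritative.

```swift
import Foundation

/// Hypothetical helper type for this sketch; not part of the Watson SDK.
enum ModelFamily: Equatable {
    case broadband, narrowband   // previous generation
    case multimedia, telephony   // next generation

    /// Minimum sampling rate in Hertz, as stated in the overview.
    var minimumSamplingRate: Int {
        switch self {
        case .broadband, .multimedia: return 16_000
        case .narrowband, .telephony: return 8_000
        }
    }

    /// True for the Multimedia and Telephony families.
    var isNextGeneration: Bool {
        return self == .multimedia || self == .telephony
    }
}

/// Infers the family from the terms that appear in a model name.
/// Returns nil when the name contains none of the four terms.
func modelFamily(of modelName: String) -> ModelFamily? {
    if modelName.contains("Broadband") { return .broadband }
    if modelName.contains("Narrowband") { return .narrowband }
    if modelName.contains("Multimedia") { return .multimedia }
    if modelName.contains("Telephony") { return .telephony }
    return nil
}

let family = modelFamily(of: "ar-MS_BroadbandModel")
print(family?.minimumSamplingRate ?? 0)  // prints 16000
```

A check like this can be useful for rejecting audio that is sampled below the minimum rate before it is uploaded.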

  • The base URL to use when contacting the service.

    Declaration

    Swift

    public var serviceURL: String?
  • Service identifiers

    Declaration

    Swift

    public static let defaultServiceName: String
  • The default HTTP headers for all requests to the service.

    Declaration

    Swift

    public var defaultHeaders: [String : String]
  • The authenticator used to authenticate requests to the service.

    Declaration

    Swift

    public let authenticator: Authenticator
  • Create a SpeechToText object.

    If an authenticator is not supplied, the initializer will retrieve credentials from the environment or a local credentials file and construct an appropriate authenticator using these credentials. The credentials file can be downloaded from your service instance on IBM Cloud as ibm-credentials.env. Make sure to add the credentials file to your project so that it can be loaded at runtime.

    If an authenticator is not supplied and credentials are not available in the environment or a local credentials file, initialization fails by throwing an error. In that case, use an initializer that passes in the credentials directly.

    • serviceName: String = defaultServiceName
  • Create a SpeechToText object.

    Declaration

    Swift

    public init(authenticator: Authenticator)

    Parameters

    authenticator

    The Authenticator object used to authenticate requests to the service.

  • Allow network requests to a server without verification of the server certificate. IMPORTANT: This should ONLY be used if truly intended, as it is unsafe otherwise.

    Declaration

    Swift

    public func disableSSLVerification()
  • List models.

    Lists all language models that are available for use with the service. The information includes the name of the model and its minimum sampling rate in Hertz, among other things. The ordering of the list of models can change from call to call; do not rely on an alphabetized or static list of models. See also: Listing models.

    Declaration

    Swift

    public func listModels(
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<SpeechModels>?, WatsonError?) -> Void)

    Parameters

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error.
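Because the order of the returned list can change from call to call, one option is to look models up by name and to sort locally when a stable order is needed. The sketch below uses a hypothetical `ModelSummary` type as a stand-in for the entries of the returned list, since the SDK's own model type is not described in this section; the sample data is illustrative only.

```swift
/// Hypothetical stand-in for one entry of the returned model list.
struct ModelSummary {
    let name: String
    let rate: Int  // minimum sampling rate in Hertz
}

/// Finds a model by name instead of by its position in the list.
func findModel(named name: String, in models: [ModelSummary]) -> ModelSummary? {
    return models.first(where: { $0.name == name })
}

// Illustrative data only; a real list comes from the service.
let models = [
    ModelSummary(name: "example-NarrowbandModel", rate: 8_000),
    ModelSummary(name: "ar-MS_BroadbandModel", rate: 16_000),
]

// Sort locally if a stable order is needed, for example for display.
let sortedNames = models.map { $0.name }.sorted()
let match = findModel(named: "ar-MS_BroadbandModel", in: models)
print(sortedNames, match?.rate ?? 0)
```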

  • Get a model.

    Gets information for a single specified language model that is available for use with the service. The information includes the name of the model and its minimum sampling rate in Hertz, among other things. See also: Listing models.

    Declaration

    Swift

    public func getModel(
        modelID: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<SpeechModel>?, WatsonError?) -> Void)

    Parameters

    modelID

    The identifier of the model in the form of its name from the output of the List models method. (Note: The model ar-AR_BroadbandModel is deprecated; use ar-MS_BroadbandModel instead.)

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error.

  • Recognize audio.

    Sends audio and returns transcription results for a recognition request. You can pass a maximum of 100 MB and a minimum of 100 bytes of audio with a request. The service automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding. The method returns only final results; to enable interim results, use the WebSocket API. (With the curl command, use the --data-binary option to upload the file for the request.) See also: Making a basic HTTP request.

    Streaming mode

    For requests to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the service closes the connection (status code 408) if it does not receive at least 15 seconds of audio (including silence) in any 30-second period. The service also closes the connection (status code 400) if it detects no speech for inactivity_timeout seconds of streaming audio; use the inactivity_timeout parameter to change the default of 30 seconds. See also:

    • Audio transmission
    • Timeouts

    Audio formats (content types)

    The service accepts audio in the following formats (MIME types).

    • For formats that are labeled Required, you must use the Content-Type header with the request to specify the format of the audio.
    • For all other formats, you can omit the Content-Type header or specify application/octet-stream with the header to have the service automatically detect the format of the audio. (With the curl command, you can specify either "Content-Type:" or "Content-Type: application/octet-stream".)

    Where indicated, the format that you specify must include the sampling rate and can optionally include the number of channels and the endianness of the audio.

    • audio/alaw (Required. Specify the sampling rate (rate) of the audio.)
    • audio/basic (Required. Use only with narrowband models.)
    • audio/flac
    • audio/g729 (Use only with narrowband models.)
    • audio/l16 (Required. Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
    • audio/mp3
    • audio/mpeg
    • audio/mulaw (Required. Specify the sampling rate (rate) of the audio.)
    • audio/ogg (The service automatically detects the codec of the input audio.)
    • audio/ogg;codecs=opus
    • audio/ogg;codecs=vorbis
    • audio/wav (Provide audio with a maximum of nine channels.)
    • audio/webm (The service automatically detects the codec of the input audio.)
    • audio/webm;codecs=opus
    • audio/webm;codecs=vorbis

    The sampling rate of the audio must match the sampling rate of the model for the recognition request: for broadband models, at least 16 kHz; for narrowband models, at least 8 kHz. If the sampling rate of the audio is higher than the minimum required rate, the service down-samples the audio to the appropriate rate. If the sampling rate of the audio is lower than the minimum required rate, the request fails. See also: Supported audio formats.
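The parameterized formats in the list above can be assembled into a value for the contentType parameter along these lines. This is a minimal sketch: `audioContentType` is a hypothetical helper, not an SDK function, and it assumes that the rate, channels, and endianness parameters are appended in the same `;key=value` style that the list uses for `codecs`. The accepted values for endianness are not given in this section, so the helper passes through whatever string it receives.

```swift
/// Hypothetical helper; not part of the Watson SDK.
/// Builds a content type such as "audio/l16;rate=16000;channels=1".
/// Returns nil when a format that needs a sampling rate is given none.
func audioContentType(
    format: String,
    rate: Int? = nil,
    channels: Int? = nil,
    endianness: String? = nil
) -> String? {
    // Formats that the list above says must specify a sampling rate.
    let rateRequired: Set<String> = ["audio/alaw", "audio/l16", "audio/mulaw"]

    var parts = [format]
    if rateRequired.contains(format) {
        guard let rate = rate else { return nil }
        parts.append("rate=\(rate)")
    }
    // Per the list above, only audio/l16 takes channels and endianness.
    if format == "audio/l16" {
        if let channels = channels { parts.append("channels=\(channels)") }
        if let endianness = endianness { parts.append("endianness=\(endianness)") }
    }
    // Assumption: parameters are joined with ";" as in "audio/ogg;codecs=opus".
    return parts.joined(separator: ";")
}

let contentType = audioContentType(format: "audio/l16", rate: 16_000, channels: 1)
print(contentType ?? "missing sampling rate")  // prints audio/l16;rate=16000;channels=1
```

Returning nil for a missing rate surfaces the problem on the client before the request is sent.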

    Next-generation models

    The service supports next-generation Multimedia (16 kHz) and Telephony (8 kHz) models for many languages. Next-generation models have higher throughput than the service’s previous generation of Broadband and Narrowband models. When you use next-generation models, the service can return transcriptions more quickly and also provide noticeably better transcription accuracy. You specify a next-generation model by using the model query parameter, as you do a previous-generation model. Many next-generation models also support the low_latency parameter, which is not available with previous-generation models. But next-generation models do not support all of the parameters that are available for use with previous-generation models. For more information about all parameters that are supported for use with next-generation models, see Supported features for next-generation models. See also: Next-generation languages and models.

    Multipart speech recognition

    Note: The Watson SDKs do not support multipart speech recognition.

    The HTTP POST method of the service also supports multipart speech recognition. With multipart requests, you pass all audio data as multipart form data. You specify some parameters as request headers and query parameters, but you pass JSON metadata as form data to control most aspects of the transcription. You can use multipart recognition to pass multiple audio files with a single request. Use the multipart approach with browsers for which JavaScript is disabled or when the parameters used with the request are greater than the 8 KB limit imposed by most HTTP servers and proxies. You can encounter this limit, for example, if you want to spot a very large number of keywords. See also: Making a multipart HTTP request.

    Declaration

    Swift

    public func recognize(
        audio: Data,
        contentType: String? = nil,
        model: String? = nil,
        languageCustomizationID: String? = nil,
        acousticCustomizationID: String? = nil,
        baseModelVersion: String? = nil,
        customizationWeight: Double? = nil,
        inactivityTimeout: Int? = nil,
        keywords: [String]? = nil,
        keywordsThreshold: Double? = nil,
        maxAlternatives: Int? = nil,
        wordAlternativesThreshold: Double? = nil,
        wordConfidence: Bool? = nil,
        timestamps: Bool? = nil,
        profanityFilter: Bool? = nil,
        smartFormatting: Bool? = nil,
        speakerLabels: Bool? = nil,
        customizationID: String? = nil,
        grammarName: String? = nil,
        redaction: Bool? = nil,
        audioMetrics: Bool? = nil,
        endOfPhraseSilenceTime: Double? = nil,
        splitTranscriptAtPhraseEnd: Bool? = nil,
        speechDetectorSensitivity: Double? = nil,
        backgroundAudioSuppression: Double? = nil,
        lowLatency: Bool? = nil,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<SpeechRecognitionResults>?, WatsonError?) -> Void)

    Parameters

    audio

    The audio to transcribe.

    contentType

    The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.

    model

    The identifier of the model that is to be used for the recognition request. (Note: The model ar-AR_BroadbandModel is deprecated; use ar-MS_BroadbandModel instead.) See Previous-generation languages and models and Next-generation languages and models.

    languageCustomizationID

    The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom language model is used. See Using a custom language model for speech recognition. Note: Use this parameter instead of the deprecated customization_id parameter.

    acousticCustomizationID

    The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom acoustic model is used. See Using a custom acoustic model for speech recognition.

    baseModelVersion

    The version of the specified base model that is to be used with the recognition request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Making speech recognition requests with upgraded custom models.

    customizationWeight

    If you specify the customization ID (GUID) of a custom language model with the recognition request, the customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for the current request. Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained. The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of out-of-vocabulary (OOV) words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model’s domain, but it can negatively affect performance on non-domain phrases. See Using customization weight.

    inactivityTimeout

    The time in seconds after which, if only silence (no speech) is detected in streaming audio, the connection is closed with a 400 error. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See Inactivity timeout.

    keywords

    An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords. You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive. See Keyword spotting.

    keywordsThreshold

    A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold, you must also specify one or more keywords. The service performs no keyword spotting if you omit either parameter. See Keyword spotting.
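The constraints on keywords and keywordsThreshold can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: `KeywordValidationError` and `validateKeywordSpotting` are hypothetical names, not SDK API, and the function deliberately treats a lone keywords or threshold value as an error even though the service itself simply performs no keyword spotting in that case.

```swift
/// Hypothetical error type for this sketch; not part of the Watson SDK.
enum KeywordValidationError: Error, Equatable {
    case keywordsWithoutThreshold
    case thresholdWithoutKeywords
    case thresholdOutOfRange(Double)
    case tooManyKeywords(Int)
    case keywordTooLong(String)
}

/// Checks the documented keyword-spotting constraints.
func validateKeywordSpotting(keywords: [String]?, threshold: Double?) throws {
    let words = keywords ?? []

    guard let threshold = threshold else {
        if words.isEmpty { return }  // keyword spotting is not requested
        throw KeywordValidationError.keywordsWithoutThreshold
    }
    guard !words.isEmpty else {
        throw KeywordValidationError.thresholdWithoutKeywords
    }
    // The threshold is a probability between 0.0 and 1.0.
    guard (0.0...1.0).contains(threshold) else {
        throw KeywordValidationError.thresholdOutOfRange(threshold)
    }
    // At most 1000 keywords per request, each at most 1024 characters.
    guard words.count <= 1000 else {
        throw KeywordValidationError.tooManyKeywords(words.count)
    }
    if let tooLong = words.first(where: { $0.count > 1024 }) {
        throw KeywordValidationError.keywordTooLong(tooLong)
    }
}

do {
    try validateKeywordSpotting(keywords: ["colorado", "tornado"], threshold: 0.5)
    print("keyword parameters look consistent")
} catch {
    print("invalid keyword parameters: \(error)")
}
```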

    maxAlternatives

    The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of 0, the service uses the default value, 1. See Maximum alternatives.

    wordAlternativesThreshold

    A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See Word alternatives.

    wordConfidence

    If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, the service returns no word confidence scores. See Word confidence.

    timestamps

    If true, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.

    profanityFilter

    If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English and Japanese transcription only. See Profanity filtering.

    smartFormatting

    If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting. Beta: The parameter is beta functionality. Applies to US English, Japanese, and Spanish transcription only. See Smart formatting.

    speakerLabels

    If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter. Beta: The parameter is beta functionality.

    • For previous-generation models, the parameter can be used for Australian English, US English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.
    • For next-generation models, the parameter can be used for English (Australian, Indian, UK, and US), German, Japanese, Korean, and Spanish transcription only.

    Restrictions and limitations apply to the use of speaker labels for both types of models. See Speaker labels.

    customizationID

    Deprecated. Use the language_customization_id parameter to specify the customization ID (GUID) of a custom language model that is to be used with the recognition request. Do not specify both parameters with a request.

    grammarName

    The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the language_customization_id parameter to specify the name of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model’s words resource. Beta: The parameter is beta functionality. See Using a grammar for speech recognition.

    redaction

    If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction. When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1). Beta: The parameter is beta functionality. Applies to US English, Japanese, and Korean transcription only. See Numeric redaction.
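To make the redaction rule concrete, the sketch below applies it to a plain string on the client: every digit in a run of three or more consecutive digits becomes an X. It is only an illustration of the rule as described, with a hypothetical function name; the service applies redaction to its own smart-formatted transcripts, so its output may differ in detail.

```swift
/// Client-side illustration of the documented rule; not the service's code.
/// Replaces each digit with "X" in runs of three or more consecutive digits.
func redactNumericRuns(in text: String) -> String {
    var output = ""
    var run = ""  // digits collected for the current run

    // Emits the pending run, masked when it has three or more digits.
    func flush() {
        output += run.count >= 3 ? String(repeating: "X", count: run.count) : run
        run = ""
    }

    for character in text {
        if character.isASCII && character.isNumber {
            run.append(character)
        } else {
            flush()
            output.append(character)
        }
    }
    flush()  // handle a run that ends the string
    return output
}

print(redactNumericRuns(in: "card 4111 1111, ext 42"))  // prints card XXXX XXXX, ext 42
```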

    audioMetrics

    If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics. See Audio metrics.

    endOfPhraseSilenceTime

    Specifies the duration, in seconds, of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases. Specify a value for the pause interval in the range of 0.0 to 120.0.

    • A value greater than 0 specifies the interval that the service is to use for speech recognition.
    • A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.

    The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds. See End of phrase silence time.

    splitTranscriptAtPhraseEnd

    If true, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript. By default, the service splits transcripts based solely on the pause interval. See Split transcript at phrase end.

    speechDetectorSensitivity

    The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity. Specify a value between 0.0 and 1.0:

    • 0.0 suppresses all audio (no speech is transcribed).
    • 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
    • 1.0 suppresses no audio (speech detection sensitivity is disabled).

    The values increase on a monotonic curve. See Speech detector sensitivity.

    backgroundAudioSuppression

    The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise. Specify a value in the range of 0.0 to 1.0:

    • 0.0 (the default) provides no suppression (background audio suppression is disabled).
    • 0.5 provides a reasonable level of audio suppression for general usage.
    • 1.0 suppresses all audio (no audio is transcribed).

    The values increase on a monotonic curve. See Background audio suppression.

    lowLatency

    If true for next-generation Multimedia and Telephony models that support low latency, directs the service to produce results even more quickly than it usually does. Next-generation models produce transcription results faster than previous-generation models. The low_latency parameter causes the models to produce results even more quickly, though the results might be less accurate when the parameter is used. The parameter is not available for previous-generation Broadband and Narrowband models. It is available only for some next-generation models. For a list of next-generation models that support low latency, see Supported next-generation language models.

    For more information about the low_latency parameter, see Low latency.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error.
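Several of the numeric parameters above have documented ranges, and checking them on the client can catch mistakes before a request is made. The following sketch assumes a hypothetical `NumericRecognitionOptions` container that mirrors a few of the parameter names; it is not an SDK type, and omitted values are left for the service defaults to handle.

```swift
/// Hypothetical container for a few numeric recognition parameters.
struct NumericRecognitionOptions {
    var customizationWeight: Double?
    var keywordsThreshold: Double?
    var wordAlternativesThreshold: Double?
    var endOfPhraseSilenceTime: Double?
    var speechDetectorSensitivity: Double?
    var backgroundAudioSuppression: Double?

    /// Returns the names of parameters whose values fall outside
    /// the ranges given in the parameter descriptions.
    func outOfRangeParameters() -> [String] {
        let unit = 0.0...1.0
        let checks: [(String, Double?, ClosedRange<Double>)] = [
            ("customizationWeight", customizationWeight, unit),
            ("keywordsThreshold", keywordsThreshold, unit),
            ("wordAlternativesThreshold", wordAlternativesThreshold, unit),
            ("endOfPhraseSilenceTime", endOfPhraseSilenceTime, 0.0...120.0),
            ("speechDetectorSensitivity", speechDetectorSensitivity, unit),
            ("backgroundAudioSuppression", backgroundAudioSuppression, unit),
        ]
        var failures: [String] = []
        for (name, value, range) in checks {
            // Omitted parameters are skipped; the service default applies.
            if let value = value, !range.contains(value) {
                failures.append(name)
            }
        }
        return failures
    }
}

let options = NumericRecognitionOptions(
    customizationWeight: 0.3,
    endOfPhraseSilenceTime: 150.0,
    speechDetectorSensitivity: 0.5
)
print(options.outOfRangeParameters())  // prints ["endOfPhraseSilenceTime"]
```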

  • Register a callback.

    Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or allowlist, the callback URL if it is not already registered by sending a GET request to the callback URL. The service passes a random alphanumeric challenge string via the challenge_string parameter of the request. The request includes an Accept header that specifies text/plain as the required response type.

    To be registered successfully, the callback URL must respond to the GET request from the service. The response must send status code 200 and must include the challenge string in its body. Set the Content-Type response header to text/plain. Upon receiving this response, the service responds to the original registration request with response code 201. The service sends only a single GET request to the callback URL. If the service does not receive a reply with a response code of 200 and a body that echoes the challenge string sent by the service within five seconds, it does not allowlist the URL; it instead sends status code 400 in response to the request to register a callback. If the requested callback URL is already allowlisted, the service responds to the initial registration request with response code 200.

    If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of the challenge string in its response to the POST request. It sends this signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.

    After you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time. See also: Registering a callback URL.
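On the receiving side, the endpoint behind the callback URL has to echo the challenge string. The sketch below shows one way that reply could be built from the incoming request URL. `CallbackResponse` and `verificationResponse(forRequestURL:)` are hypothetical names that are not tied to the SDK or to any web framework, answering 400 when no challenge is present is a choice made for this sketch, and verifying the X-Callback-Signature header is not shown.

```swift
import Foundation

/// Hypothetical response value for this sketch.
struct CallbackResponse: Equatable {
    let status: Int
    let headers: [String: String]
    let body: String
}

/// Builds the reply to the service's verification GET request by echoing
/// the value of the challenge_string query parameter as plain text.
func verificationResponse(forRequestURL url: URL) -> CallbackResponse {
    let items = URLComponents(url: url, resolvingAgainstBaseURL: false)?.queryItems ?? []
    guard let challenge = items.first(where: { $0.name == "challenge_string" })?.value else {
        // Nothing to echo, so registration cannot succeed (sketch-specific choice).
        return CallbackResponse(status: 400, headers: [:], body: "")
    }
    return CallbackResponse(
        status: 200,
        headers: ["Content-Type": "text/plain"],
        body: challenge
    )
}

// Illustrative URL only.
let request = URL(string: "https://example.com/results?challenge_string=abc123")!
let response = verificationResponse(forRequestURL: request)
print(response.status, response.body)  // prints 200 abc123
```

The reply has to reach the service within five seconds, so the handler should avoid slow work before responding.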

    Declaration

    Swift

    public func registerCallback(
        callbackURL: String,
        userSecret: String? = nil,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<RegisterStatus>?, WatsonError?) -> Void)

    Parameters

    callbackURL

    An HTTP or HTTPS URL to which callback notifications are to be sent. To be allowlisted, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.

    userSecret

    A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error.

  • Unregister a callback.

    Unregisters a callback URL that was previously allowlisted with a Register a callback request for use with the asynchronous interface. Once unregistered, the URL can no longer be used with asynchronous recognition requests. See also: Unregistering a callback URL.

    Declaration

    Swift

    public func unregisterCallback(
        callbackURL: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    callbackURL

    The callback URL that is to be unregistered.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error.

  • Create a job.

    Creates a job for a new asynchronous recognition request. The job is owned by the instance of the service whose credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

    • By callback notification: Include the callback_url parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.
    • By polling the service: Omit the callback_url, events, and user_token parameters. You must then use the Check jobs or Check a job methods to check the status of the job, using the latter to retrieve the results when the job is complete.

    The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Using the HTTPS Check a job method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

    The method supports the same basic parameters as other HTTP and WebSocket recognition requests. It also supports the following parameters specific to the asynchronous interface:

    • callback_url
    • events
    • user_token
    • results_ttl

    You can pass a maximum of 1 GB and a minimum of 100 bytes of audio with a request. The service automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding. The method returns only final results; to enable interim results, use the WebSocket API. (With the curl command, use the --data-binary option to upload the file for the request.) See also: Creating a job.

    Streaming mode

    For requests to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the service closes the connection (status code 408) if it does not receive at least 15 seconds of audio (including silence) in any 30-second period. The service also closes the connection (status code 400) if it detects no speech for inactivity_timeout seconds of streaming audio; use the inactivity_timeout parameter to change the default of 30 seconds. See also:

    • Audio transmission
    • Timeouts

    Audio formats (content types)

    The service accepts audio in the following formats (MIME types).

    • For formats that are labeled Required, you must use the Content-Type header with the request to specify the format of the audio.
    • For all other formats, you can omit the Content-Type header or specify application/octet-stream with the header to have the service automatically detect the format of the audio. (With the curl command, you can specify either "Content-Type:" or "Content-Type: application/octet-stream".)

    Where indicated, the format that you specify must include the sampling rate and can optionally include the number of channels and the endianness of the audio.

    • audio/alaw (Required. Specify the sampling rate (rate) of the audio.)
    • audio/basic (Required. Use only with narrowband models.)
    • audio/flac
    • audio/g729 (Use only with narrowband models.)
    • audio/l16 (Required. Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
    • audio/mp3
    • audio/mpeg
    • audio/mulaw (Required. Specify the sampling rate (rate) of the audio.)
    • audio/ogg (The service automatically detects the codec of the input audio.)
    • audio/ogg;codecs=opus
    • audio/ogg;codecs=vorbis
    • audio/wav (Provide audio with a maximum of nine channels.)
    • audio/webm (The service automatically detects the codec of the input audio.)
    • audio/webm;codecs=opus
    • audio/webm;codecs=vorbis

    The sampling rate of the audio must match the sampling rate of the model for the recognition request: for broadband models, at least 16 kHz; for narrowband models, at least 8 kHz. If the sampling rate of the audio is higher than the minimum required rate, the service down-samples the audio to the appropriate rate. If the sampling rate of the audio is lower than the minimum required rate, the request fails. See also: Supported audio formats.

      Next-generation models

      The service supports next-generation Multimedia (16 kHz) and Telephony (8 kHz) models for many languages. Next-generation models have higher throughput than the service’s previous generation of Broadband and Narrowband models. When you use next-generation models, the service can return transcriptions more quickly and also provide noticeably better transcription accuracy. You specify a next-generation model by using the model query parameter, as you do a previous-generation model. Many next-generation models also support the low_latency parameter, which is not available with previous-generation models. But next-generation models do not support all of the parameters that are available for use with previous-generation models. For more information about all parameters that are supported for use with next-generation models, see Supported features for next-generation models. See also: Next-generation languages and models.

    Declaration

    Swift

    public func createJob(
        audio: Data,
        contentType: String? = nil,
        model: String? = nil,
        callbackURL: String? = nil,
        events: String? = nil,
        userToken: String? = nil,
        resultsTtl: Int? = nil,
        languageCustomizationID: String? = nil,
        acousticCustomizationID: String? = nil,
        baseModelVersion: String? = nil,
        customizationWeight: Double? = nil,
        inactivityTimeout: Int? = nil,
        keywords: [String]? = nil,
        keywordsThreshold: Double? = nil,
        maxAlternatives: Int? = nil,
        wordAlternativesThreshold: Double? = nil,
        wordConfidence: Bool? = nil,
        timestamps: Bool? = nil,
        profanityFilter: Bool? = nil,
        smartFormatting: Bool? = nil,
        speakerLabels: Bool? = nil,
        customizationID: String? = nil,
        grammarName: String? = nil,
        redaction: Bool? = nil,
        processingMetrics: Bool? = nil,
        processingMetricsInterval: Double? = nil,
        audioMetrics: Bool? = nil,
        endOfPhraseSilenceTime: Double? = nil,
        splitTranscriptAtPhraseEnd: Bool? = nil,
        speechDetectorSensitivity: Double? = nil,
        backgroundAudioSuppression: Double? = nil,
        lowLatency: Bool? = nil,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<RecognitionJob>?, WatsonError?) -> Void)

    Parameters

    audio

    The audio to transcribe.

    contentType

    The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.

    model

    The identifier of the model that is to be used for the recognition request. (Note: The model ar-AR_BroadbandModel is deprecated; use ar-MS_BroadbandModel instead.) See Previous-generation languages and models and Next-generation languages and models.

    callbackURL

    A URL to which callback notifications are to be sent. The URL must already be successfully allowlisted by using the Register a callback method. You can include the same callback URL with any number of job creation requests. Omit the parameter to poll the service for job completion and results. Use the user_token parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.

    events

    If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are

    • recognitions.started generates a callback notification when the service begins to process the job.
    • recognitions.completed generates a callback notification when the job is complete. You must use the Check a job method to retrieve the results before they time out or are deleted.
    • recognitions.completed_with_results generates a callback notification when the job is complete. The notification includes the results of the request.
    • recognitions.failed generates a callback notification if the service experiences an error while processing the job.

    The recognitions.completed and recognitions.completed_with_results events are incompatible. You can specify only one of the two events. If the job includes a callback URL, omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. If the job does not include a callback URL, omit the parameter.

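
    The rules for the events value can be checked on the client before a job is created. The sketch below is one possible approach; validateEvents is a hypothetical helper, not an SDK function, and the service performs its own validation regardless.

```swift
import Foundation

/// Hypothetical helper: returns a list of problems found in a comma-separated
/// `events` value. An empty result means the value follows the documented rules.
func validateEvents(_ events: String) -> [String] {
    let validEvents: Set<String> = [
        "recognitions.started",
        "recognitions.completed",
        "recognitions.completed_with_results",
        "recognitions.failed"
    ]
    let requested = events
        .split(separator: ",")
        .map { $0.trimmingCharacters(in: .whitespaces) }

    var problems: [String] = []
    for event in requested where !validEvents.contains(event) {
        problems.append("unknown event: \(event)")
    }
    // The two completion events are mutually exclusive.
    if requested.contains("recognitions.completed")
        && requested.contains("recognitions.completed_with_results") {
        problems.append(
            "recognitions.completed and recognitions.completed_with_results cannot be combined")
    }
    return problems
}

print(validateEvents("recognitions.started,recognitions.completed_with_results"))
print(validateEvents("recognitions.completed,recognitions.completed_with_results"))
```
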
    userToken

    If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.

    resultsTtl

    The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

    languageCustomizationID

    The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom language model is used. See Using a custom language model for speech recognition. Note: Use this parameter instead of the deprecated customization_id parameter.

    acousticCustomizationID

    The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom acoustic model is used. See Using a custom acoustic model for speech recognition.

    baseModelVersion

    The version of the specified base model that is to be used with the recognition request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Making speech recognition requests with upgraded custom models.

    customizationWeight

    If you specify the customization ID (GUID) of a custom language model with the recognition request, the customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for the current request. Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained. The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model’s domain, but it can negatively affect performance on non-domain phrases. See Using customization weight.

    inactivityTimeout

    The time in seconds after which, if only silence (no speech) is detected in streaming audio, the connection is closed with a 400 error. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See Inactivity timeout.

    keywords

    An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords. You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive. See Keyword spotting.

    keywordsThreshold

    A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold, you must also specify one or more keywords. The service performs no keyword spotting if you omit either parameter. See Keyword spotting.
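
    Because keywords and keywordsThreshold must be supplied together, a client-side check can catch mistakes before a request is sent. The following is a minimal sketch under the limits stated above (1000 keywords, 1024 characters each, threshold between 0.0 and 1.0); keywordRequestProblems is a hypothetical helper, not an SDK function.

```swift
/// Hypothetical helper: returns a list of problems with the keyword-spotting
/// arguments. An empty result means the documented rules are satisfied.
func keywordRequestProblems(keywords: [String]?, keywordsThreshold: Double?) -> [String] {
    var problems: [String] = []
    let words = keywords ?? []

    // The two parameters must be specified together.
    if !words.isEmpty && keywordsThreshold == nil {
        problems.append("keywords require a keywordsThreshold")
    }
    if keywordsThreshold != nil && words.isEmpty {
        problems.append("keywordsThreshold requires at least one keyword")
    }
    if let threshold = keywordsThreshold, !(0.0...1.0).contains(threshold) {
        problems.append("keywordsThreshold must be between 0.0 and 1.0")
    }
    if words.count > 1000 {
        problems.append("at most 1000 keywords are allowed per request")
    }
    for word in words where word.count > 1024 {
        problems.append("keyword exceeds 1024 characters: \(word.prefix(20))...")
    }
    return problems
}

print(keywordRequestProblems(keywords: ["colorado", "tornado"], keywordsThreshold: 0.5))
print(keywordRequestProblems(keywords: ["colorado"], keywordsThreshold: nil))
```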

    maxAlternatives

    The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of 0, the service uses the default value, 1. See Maximum alternatives.

    wordAlternativesThreshold

    A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See Word alternatives.

    wordConfidence

    If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, the service returns no word confidence scores. See Word confidence.

    timestamps

    If true, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.

    profanityFilter

    If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English and Japanese transcription only. See Profanity filtering.

    smartFormatting

    If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting. Beta: The parameter is beta functionality. Applies to US English, Japanese, and Spanish transcription only. See Smart formatting.

    speakerLabels

    If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter. Beta: The parameter is beta functionality.

    • For previous-generation models, the parameter can be used for Australian English, US English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.
    • For next-generation models, the parameter can be used for English (Australian, Indian, UK, and US), German, Japanese, Korean, and Spanish transcription only.

    Restrictions and limitations apply to the use of speaker labels for both types of models. See Speaker labels.

    customizationID

    Deprecated. Use the language_customization_id parameter to specify the customization ID (GUID) of a custom language model that is to be used with the recognition request. Do not specify both parameters with a request.

    grammarName

    The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the language_customization_id parameter to specify the name of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model’s words resource. Beta: The parameter is beta functionality. See Using a grammar for speech recognition.

    redaction

    If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction. When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1). Beta: The parameter is beta functionality. Applies to US English, Japanese, and Korean transcription only. See Numeric redaction.
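
    To make the masking rule concrete, the sketch below applies it locally to a string: every run of three or more consecutive digits has each digit replaced with an X. This only illustrates the rule as described here. It is not the service's implementation, which runs on the server after smart formatting, and redactNumbers is a hypothetical name.

```swift
import Foundation

/// Illustration only: masks every run of three or more consecutive ASCII digits
/// by replacing each digit with "X". Shorter runs are left unchanged.
func redactNumbers(in transcript: String) -> String {
    // The pattern is a constant, so a failure here would be a programming error.
    let regex = try! NSRegularExpression(pattern: "[0-9]{3,}")
    let fullRange = NSRange(transcript.startIndex..., in: transcript)

    var result = transcript
    for match in regex.matches(in: transcript, range: fullRange) {
        // Each replacement has the same length as the match, so the ranges
        // computed on the original string remain valid in `result`.
        guard let range = Range(match.range, in: result) else { continue }
        result.replaceSubrange(range, with: String(repeating: "X", count: match.range.length))
    }
    return result
}

print(redactNumbers(in: "my card is 4111 1111 and I am 42"))
// my card is XXXX XXXX and I am 42
```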

    processingMetrics

    If true, requests processing metrics about the service’s transcription of the input audio. The service returns processing metrics at the interval specified by the processing_metrics_interval parameter. It also returns processing metrics for transcription events, for example, for final and interim results. By default, the service returns no processing metrics. See Processing metrics.

    processingMetricsInterval

    Specifies the interval in real wall-clock seconds at which the service is to return processing metrics. The parameter is ignored unless the processing_metrics parameter is set to true. The parameter accepts a minimum value of 0.1 seconds. The level of precision is not restricted, so you can specify values such as 0.25 and 0.125. The service does not impose a maximum value. If you want to receive processing metrics only for transcription events instead of at periodic intervals, set the value to a large number. If the value is larger than the duration of the audio, the service returns processing metrics only for transcription events. See Processing metrics.

    audioMetrics

    If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics. See Audio metrics.

    endOfPhraseSilenceTime

    Specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases. Specify a value for the pause interval in the range of 0.0 to 120.0.

    • A value greater than 0 specifies the interval that the service is to use for speech recognition.
    • A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.

    The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds. See End of phrase silence time.

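
    The way the requested value resolves to an effective pause interval can be sketched as follows. This is a client-side illustration of the rules above, not SDK code: effectivePauseInterval is a hypothetical helper, and identifying Chinese models by a "zh-" prefix in the model name is an assumption.

```swift
/// Hypothetical helper: returns the pause interval (in seconds) that would apply
/// for a request, or nil if the requested value is outside 0.0 to 120.0.
func effectivePauseInterval(requested: Double?, modelName: String) -> Double? {
    // Assumption: Chinese models are identified by a "zh-" prefix.
    let defaultInterval = modelName.hasPrefix("zh-") ? 0.6 : 0.8

    // Omitting the parameter selects the default interval.
    guard let value = requested else { return defaultInterval }
    guard (0.0...120.0).contains(value) else { return nil }

    // A value of 0 is equivalent to omitting the parameter.
    return value == 0 ? defaultInterval : value
}

print(effectivePauseInterval(requested: nil, modelName: "en-US_BroadbandModel") as Any)
print(effectivePauseInterval(requested: 0, modelName: "zh-CN_BroadbandModel") as Any)
print(effectivePauseInterval(requested: 1.5, modelName: "en-US_BroadbandModel") as Any)
```
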
    splitTranscriptAtPhraseEnd

    If true, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript. By default, the service splits transcripts based solely on the pause interval. See Split transcript at phrase end.

    speechDetectorSensitivity

    The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity. Specify a value between 0.0 and 1.0:

    • 0.0 suppresses all audio (no speech is transcribed).
    • 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
    • 1.0 suppresses no audio (speech detection sensitivity is disabled).

    The values increase on a monotonic curve. See Speech detector sensitivity.

    backgroundAudioSuppression

    The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise. Specify a value in the range of 0.0 to 1.0:

    • 0.0 (the default) provides no suppression (background audio suppression is disabled).
    • 0.5 provides a reasonable level of audio suppression for general usage.
    • 1.0 suppresses all audio (no audio is transcribed).

    The values increase on a monotonic curve. See Background audio suppression.

    lowLatency

    If true for next-generation Multimedia and Telephony models that support low latency, directs the service to produce results even more quickly than it usually does. Next-generation models produce transcription results faster than previous-generation models. The low_latency parameter causes the models to produce results even more quickly, though the results might be less accurate when the parameter is used. The parameter is not available for previous-generation Broadband and Narrowband models. It is available only for some next-generation models. For a list of next-generation models that support low latency, see Supported next-generation language models.

    • For more information about the low_latency parameter, see Low latency.
    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Check jobs.

    Returns the ID and status of the latest 100 outstanding jobs associated with the credentials with which it is called. The method also returns the creation and update times of each job, and, if a job was created with a callback URL and a user token, the user token for the job. To obtain the results for a job whose status is completed or not one of the latest 100 outstanding jobs, use the Check a job method. A job and its results remain available until you delete them with the Delete a job method or until the job’s time to live expires, whichever comes first. See also: Checking the status of the latest jobs.

    Declaration

    Swift

    public func checkJobs(
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<RecognitionJobs>?, WatsonError?) -> Void)

    Parameters

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Check a job.

    Returns information about the specified job. The response always includes the status of the job and its creation and update times. If the status is completed, the response includes the results of the recognition request. You must use credentials for the instance of the service that owns a job to list information about it. You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available. Use the Check jobs method to request information about the most recent jobs associated with the calling credentials. See also: Checking the status and retrieving the results of a job.

    Declaration

    Swift

    public func checkJob(
        id: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<RecognitionJob>?, WatsonError?) -> Void)

    Parameters

    id

    The identifier of the asynchronous job that is to be used for the request. You must make the request with credentials for the instance of the service that owns the job.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Delete a job.

    Deletes the specified job. You cannot delete a job that the service is actively processing. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must use credentials for the instance of the service that owns a job to delete it. See also: Deleting a job.

    Declaration

    Swift

    public func deleteJob(
        id: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    id

    The identifier of the asynchronous job that is to be used for the request. You must make the request with credentials for the instance of the service that owns the job.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Create a custom language model.

    Creates a new custom language model for a specified base model. The custom language model can be used only with the base model for which it is created. The model is owned by the instance of the service whose credentials are used to create it. You can create a maximum of 1024 custom language models per owning credentials. The service returns an error if you attempt to create more than 1024 models. You do not lose any models, but you cannot create any more until your model count is below the limit. See also: Create a custom language model.

    Declaration

    Swift

    public func createLanguageModel(
        name: String,
        baseModelName: String,
        dialect: String? = nil,
        description: String? = nil,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<LanguageModel>?, WatsonError?) -> Void)

    Parameters

    name

    A user-defined name for the new custom language model. Use a name that is unique among all custom language models that you own. Use a localized name that matches the language of the custom model. Use a name that describes the domain of the custom model, such as Medical custom model or Legal custom model.

    baseModelName

    The name of the base language model that is to be customized by the new custom language model. The new custom model can be used only with the base model that it customizes. To determine whether a base model supports language model customization, use the Get a model method and check that the attribute custom_language_model is set to true. You can also refer to Language support for customization.

    dialect

    The dialect of the specified language that is to be used with the custom language model. For most languages, the dialect matches the language of the base model by default. For example, en-US is used for the US English language models. All dialect values are case-insensitive. The parameter is meaningful only for Spanish language models, for which you can always safely omit the parameter to have the service create the correct mapping. For Spanish, the service creates a custom language model that is suited for speech in one of the following dialects:

    • es-ES for Castilian Spanish (es-ES models)
    • es-LA for Latin American Spanish (es-AR, es-CL, es-CO, and es-PE models)
    • es-US for Mexican (North American) Spanish (es-MX models)

    If you specify the dialect parameter for a non-Spanish language model, its value must match the language of the base model. If you specify the dialect for a Spanish language model, its value must match one of the defined mappings (es-ES, es-LA, or es-MX).
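
    The mapping in the list above can be written as a small lookup. The sketch below is a hypothetical helper, not part of the SDK. It assumes that base model names begin with a five-character language identifier such as es-AR, and it mirrors the three list items only; the service applies the mapping itself when the parameter is omitted.

```swift
/// Hypothetical helper: returns the dialect that the list above associates with
/// a Spanish base model, or nil if the model is not one of the listed Spanish models.
func defaultSpanishDialect(forBaseModel name: String) -> String? {
    // Assumption: the model name starts with a language identifier such as "es-AR".
    let language = String(name.prefix(5)).lowercased()
    switch language {
    case "es-es":
        return "es-ES"   // Castilian Spanish
    case "es-ar", "es-cl", "es-co", "es-pe":
        return "es-LA"   // Latin American Spanish
    case "es-mx":
        return "es-US"   // Mexican (North American) Spanish
    default:
        return nil
    }
}

print(defaultSpanishDialect(forBaseModel: "es-CL_NarrowbandModel") as Any)
print(defaultSpanishDialect(forBaseModel: "en-US_BroadbandModel") as Any)
```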

    description

    A description of the new custom language model. Use a localized description that matches the language of the custom model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • List custom language models.

    Lists information about all custom language models that are owned by an instance of the service. Use the language parameter to see all custom language models for the specified language. Omit the parameter to see all custom language models for all languages. You must use credentials for the instance of the service that owns a model to list information about it. See also: Listing custom language models.

    Declaration

    Swift

    public func listLanguageModels(
        language: String? = nil,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<LanguageModels>?, WatsonError?) -> Void)

    Parameters

    language

    The identifier of the language for which custom language or custom acoustic models are to be returned. Omit the parameter to see all custom language or custom acoustic models that are owned by the requesting credentials. (Note: The identifier ar-AR is deprecated; use ar-MS instead.) To determine the languages for which customization is available, see Language support for customization.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Get a custom language model.

    Gets information about a specified custom language model. You must use credentials for the instance of the service that owns a model to list information about it. See also: Listing custom language models.

    Declaration

    Swift

    public func getLanguageModel(
        customizationID: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<LanguageModel>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Delete a custom language model.

    Deletes an existing custom language model. The custom model cannot be deleted if another request, such as adding a corpus or grammar to the model, is currently being processed. You must use credentials for the instance of the service that owns a model to delete it. See also: Deleting a custom language model.

    Declaration

    Swift

    public func deleteLanguageModel(
        customizationID: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Train a custom language model.

    Initiates the training of a custom language model with new resources such as corpora, grammars, and custom words. After adding, modifying, or deleting resources for a custom language model, use this method to begin the actual training of the model on the latest data. You can specify whether the custom language model is to be trained with all words from its words resource or only with words that were added or modified by the user directly. You must use credentials for the instance of the service that owns a model to train it. The training method is asynchronous. It can take on the order of minutes to complete depending on the amount of data on which the service is being trained and the current load on the service. The method returns an HTTP 200 response code to indicate that the training process has begun. You can monitor the status of the training by using the Get a custom language model method to poll the model’s status. Use a loop to check the status every 10 seconds. The method returns a LanguageModel object that includes status and progress fields. A status of available means that the custom model is trained and ready to use. The service cannot accept subsequent training requests or requests to add new resources until the existing request completes. See also: Train the custom language model.
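
    The polling loop described above can be sketched independently of the SDK. In the example below, pollStatus and PollOutcome are hypothetical names, and fetchStatus is a stand-in closure; in a real client it would wrap a call to getLanguageModel and return the status field of the resulting LanguageModel. The status value "training" used in the stub is an assumption for illustration.

```swift
import Foundation

/// Result of a polling run (hypothetical type for this sketch).
enum PollOutcome: Equatable {
    case reached(String)
    case timedOut(lastStatus: String)
}

/// Calls `fetchStatus` up to `maxAttempts` times, waiting `interval` seconds
/// between calls, until it returns one of `terminalStatuses`.
func pollStatus(
    until terminalStatuses: Set<String>,
    maxAttempts: Int,
    interval: TimeInterval = 10,
    wait: (TimeInterval) -> Void = { Thread.sleep(forTimeInterval: $0) },
    fetchStatus: () -> String
) -> PollOutcome {
    var last = ""
    for attempt in 0..<max(maxAttempts, 0) {
        last = fetchStatus()
        if terminalStatuses.contains(last) {
            return .reached(last)
        }
        // Do not wait after the final attempt.
        if attempt + 1 < maxAttempts {
            wait(interval)
        }
    }
    return .timedOut(lastStatus: last)
}

// Stubbed status sequence standing in for repeated getLanguageModel calls.
var stubbedStatuses = ["training", "training", "available"]
let outcome = pollStatus(until: ["available"], maxAttempts: 5, wait: { _ in }) {
    stubbedStatuses.isEmpty ? "available" : stubbedStatuses.removeFirst()
}
print(outcome)
```

    Passing the wait step as a closure keeps the loop testable without real delays; a production client would leave the default 10-second sleep in place and cap the number of attempts to suit its expected training time.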

    Training failures

    Training can fail to start for the following reasons:

    • The service is currently handling another request for the custom model, such as another training request or a request to add a corpus or grammar to the model.
    • No training data have been added to the custom model.
    • The custom model contains one or more invalid corpora, grammars, or words (for example, a custom word has an invalid sounds-like pronunciation). You can correct the invalid resources or set the strict parameter to false to exclude the invalid resources from the training. The model must contain at least one valid resource for training to succeed.

    Declaration

    Swift

    public func trainLanguageModel(
        customizationID: String,
        wordTypeToAdd: String? = nil,
        customizationWeight: Double? = nil,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<TrainingResponse>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    wordTypeToAdd

    For custom models that are based on previous-generation models, the type of words from the custom language model’s words resource on which to train the model:

    • all (the default) trains the model on all new words, regardless of whether they were extracted from corpora or grammars or were added or modified by the user.
    • user trains the model only on custom words that were added or modified by the user directly. The model is not trained on new words extracted from corpora or grammars.

    For custom models that are based on next-generation models, the service ignores the parameter. The words resource contains only custom words that the user adds or modifies directly, so the parameter is unnecessary.

    customizationWeight

    Specifies a customization weight for the custom language model. The customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for speech recognition. Specify a value between 0.0 and 1.0; the default is 0.3. The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model’s domain, but it can negatively affect performance on non-domain phrases. The value that you assign is used for all recognition requests that use the model. You can override it for any recognition request by specifying a customization weight for that request. See Using customization weight.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Reset a custom language model.

    Resets a custom language model by removing all corpora, grammars, and words from the model. Resetting a custom language model initializes the model to its state when it was first created. Metadata such as the name and language of the model are preserved, but the model’s words resource is removed and must be re-created. You must use credentials for the instance of the service that owns a model to reset it. See also: Resetting a custom language model.

    Declaration

    Swift

    public func resetLanguageModel(
        customizationID: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Upgrade a custom language model.

    Initiates the upgrade of a custom language model to the latest version of its base language model. The upgrade method is asynchronous. It can take on the order of minutes to complete depending on the amount of data in the custom model and the current load on the service. A custom model must be in the ready or available state to be upgraded. You must use credentials for the instance of the service that owns a model to upgrade it. The method returns an HTTP 200 response code to indicate that the upgrade process has begun successfully. You can monitor the status of the upgrade by using the Get a custom language model method to poll the model’s status. The method returns a LanguageModel object that includes status and progress fields. Use a loop to check the status every 10 seconds. While it is being upgraded, the custom model has the status upgrading. When the upgrade is complete, the model resumes the status that it had prior to upgrade. The service cannot accept subsequent requests for the model until the upgrade completes. Note: Upgrading is necessary only for custom language models that are based on previous-generation models. Only a single version of a custom model that is based on a next-generation model is ever available. See also: Upgrading a custom language model.

    Declaration

    Swift

    public func upgradeLanguageModel(
        customizationID: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error
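
    The polling loop that the description recommends might look like the following sketch. It does not call the SDK: fetchStatus is a hypothetical closure that stands in for a call to the Get a custom language model method and returns the model's status string, or nil if the status could not be read. The names waitForUpgrade, maxAttempts, and sleep are also assumptions made for illustration.

```swift
import Foundation

/// Polls until the model is no longer in the "upgrading" status or the attempts run out.
/// Returns the last status that was read, or nil if none could be read.
func waitForUpgrade(
    maxAttempts: Int = 60,
    interval: TimeInterval = 10,  // the description suggests checking every 10 seconds
    sleep: (TimeInterval) -> Void = { Thread.sleep(forTimeInterval: $0) },
    fetchStatus: () -> String?
) -> String? {
    var lastStatus: String? = nil
    for attempt in 1...max(1, maxAttempts) {
        lastStatus = fetchStatus()
        if let status = lastStatus, status != "upgrading" {
            // The model has resumed the status that it had before the upgrade.
            return status
        }
        // Wait before the next check, but not after the final attempt.
        if attempt < maxAttempts { sleep(interval) }
    }
    return lastStatus
}
```

    The sleep parameter is injected so that the loop can be exercised without real delays. Because the SDK methods report results through completion handlers, a real fetchStatus would need to bridge that callback to a return value.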

  • List corpora.

    Lists information about all corpora from a custom language model. The information includes the name, status, and total number of words for each corpus. For custom models that are based on previous-generation models, it also includes the number of out-of-vocabulary (OOV) words from the corpus. You must use credentials for the instance of the service that owns a model to list its corpora. See also: Listing corpora for a custom language model.

    Declaration

    Swift

    public func listCorpora(
        customizationID: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Corpora>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Add a corpus.

    Adds a single corpus text file of new training data to a custom language model. Use multiple requests to submit multiple corpus text files. You must use credentials for the instance of the service that owns a model to add a corpus to it. Adding a corpus does not affect the custom language model until you train the model for the new data by using the Train a custom language model method.

    Submit a plain text file that contains sample sentences from the domain of interest to enable the service to parse the words in context. The more sentences you add that represent the context in which speakers use words from the domain, the better the service’s recognition accuracy.

    The call returns an HTTP 201 response code if the corpus is valid. The service then asynchronously processes and automatically extracts data from the contents of the corpus. This operation can take on the order of minutes to complete depending on the current load on the service, the total number of words in the corpus, and, for custom models that are based on previous-generation models, the number of new (out-of-vocabulary) words in the corpus. You cannot submit requests to add additional resources to the custom model or to train the model until the service’s analysis of the corpus for the current request completes. Use the Get a corpus method to check the status of the analysis.

    For custom models that are based on previous-generation models, the service auto-populates the model’s words resource with words from the corpus that are not found in its base vocabulary. These words are referred to as out-of-vocabulary (OOV) words. After adding a corpus, you must validate the words resource to ensure that each OOV word’s definition is complete and valid. You can use the List custom words method to examine the words resource. You can use other words-related methods to eliminate typos and modify how words are pronounced as needed.

    To add a corpus file that has the same name as an existing corpus, set the allow_overwrite parameter to true; otherwise, the request fails. Overwriting an existing corpus causes the service to process the corpus text file and extract its data anew. For a custom model that is based on a previous-generation model, the service first removes any OOV words that are associated with the existing corpus from the model’s words resource unless they were also added by another corpus or grammar, or they have been modified in some way with the Add custom words or Add a custom word method.

    The service limits the overall amount of data that you can add to a custom model to a maximum of 10 million total words from all sources combined. For a custom model that is based on a previous-generation model, you can add no more than 90 thousand custom (OOV) words to a model. This includes words that the service extracts from corpora and grammars, and words that you add directly.

    Declaration

    Swift

    public func addCorpus(
        customizationID: String,
        corpusName: String,
        corpusFile: Data,
        allowOverwrite: Bool? = nil,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    corpusName

    The name of the new corpus for the custom language model. Use a localized name that matches the language of the custom model and reflects the contents of the corpus.

    • Include a maximum of 128 characters in the name.
    • Do not use characters that need to be URL-encoded. For example, do not use spaces, slashes, backslashes, colons, ampersands, double quotes, plus signs, equals signs, question marks, and so on in the name. (The service does not prevent the use of these characters. But because they must be URL-encoded wherever used, their use is strongly discouraged.)
    • Do not use the name of an existing corpus or grammar that is already defined for the custom model.
    • Do not use the name user, which is reserved by the service to denote custom words that are added or modified by the user.
    • Do not use the name base_lm or default_lm. Both names are reserved for future use by the service.

    corpusFile

    A plain text file that contains the training data for the corpus. Encode the file in UTF-8 if it contains non-ASCII characters; the service assumes UTF-8 encoding if it encounters non-ASCII characters. Make sure that you know the character encoding of the file. You must use that same encoding when working with the words in the custom language model. For more information, see Character encoding for custom words. With the curl command, use the --data-binary option to upload the file for the request.

    allowOverwrite

    If true, the specified corpus overwrites an existing corpus with the same name. If false, the request fails if a corpus with the same name already exists. The parameter has no effect if a corpus with the same name does not already exist.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error
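
    The naming rules for corpusName lend themselves to a client-side check before the request is made. The sketch below is an illustration only: ResourceNameIssue and resourceNameIssues are hypothetical names, and the character check covers only the characters that the description lists explicitly, not everything that would need URL encoding.

```swift
import Foundation

/// Problems described by the naming rules. Hypothetical type for this sketch.
enum ResourceNameIssue: Equatable {
    case tooLong
    case reserved
    case discouragedCharacter(Character)
    case alreadyInUse
}

/// Names that the description marks as reserved.
let reservedResourceNames: Set<String> = ["user", "base_lm", "default_lm"]

/// Only the characters named in the description; "and so on" is not expanded.
let discouragedNameCharacters: Set<Character> = [" ", "/", "\\", ":", "&", "\"", "+", "=", "?"]

/// Returns every rule that the name breaks; an empty array means no issue was found.
func resourceNameIssues(_ name: String, existingNames: Set<String> = []) -> [ResourceNameIssue] {
    var issues: [ResourceNameIssue] = []
    if name.count > 128 { issues.append(.tooLong) }
    if reservedResourceNames.contains(name) { issues.append(.reserved) }
    if let bad = name.first(where: { discouragedNameCharacters.contains($0) }) {
        issues.append(.discouragedCharacter(bad))
    }
    if existingNames.contains(name) { issues.append(.alreadyInUse) }
    return issues
}
```

    An alreadyInUse result matters only when allowOverwrite is not true. Note that Swift counts user-perceived characters, which may differ from how the service measures the 128-character limit for non-ASCII names.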

  • Get a corpus.

    Gets information about a corpus from a custom language model. The information includes the name, status, and total number of words for the corpus. For custom models that are based on previous-generation models, it also includes the number of out-of-vocabulary (OOV) words from the corpus. You must use credentials for the instance of the service that owns a model to list its corpora. See also: Listing corpora for a custom language model.

    Declaration

    Swift

    public func getCorpus(
        customizationID: String,
        corpusName: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Corpus>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    corpusName

    The name of the corpus for the custom language model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Delete a corpus.

    Deletes an existing corpus from a custom language model. Removing a corpus does not affect the custom model until you train the model with the Train a custom language model method. You must use credentials for the instance of the service that owns a model to delete its corpora. For custom models that are based on previous-generation models, the service removes any out-of-vocabulary (OOV) words that are associated with the corpus from the custom model’s words resource unless they were also added by another corpus or grammar, or they were modified in some way with the Add custom words or Add a custom word method. See also: Deleting a corpus from a custom language model.

    Declaration

    Swift

    public func deleteCorpus(
        customizationID: String,
        corpusName: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    corpusName

    The name of the corpus for the custom language model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • List custom words.

    Lists information about custom words from a custom language model. You can list all words from the custom model’s words resource, only custom words that were added or modified by the user, or, for a custom model that is based on a previous-generation model, only out-of-vocabulary (OOV) words that were extracted from corpora or are recognized by grammars. You can also indicate the order in which the service is to return words; by default, the service lists words in ascending alphabetical order. You must use credentials for the instance of the service that owns a model to list information about its words. See also: Listing words from a custom language model.

    Declaration

    Swift

    public func listWords(
        customizationID: String,
        wordType: String? = nil,
        sort: String? = nil,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Words>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    wordType

    The type of words to be listed from the custom language model’s words resource:

    • all (the default) shows all words.
    • user shows only custom words that were added or modified by the user directly.
    • corpora shows only OOV words that were extracted from corpora.
    • grammars shows only OOV words that are recognized by grammars.

    For a custom model that is based on a next-generation model, only all and user apply. Both options return the same results. Words from other sources are not added to custom models that are based on next-generation models.

    sort

    Indicates the order in which the words are to be listed, alphabetical or by count. You can prepend an optional + or - to an argument to indicate whether the results are to be sorted in ascending or descending order. By default, words are sorted in ascending alphabetical order. For alphabetical ordering, the lexicographical precedence is numeric values, uppercase letters, and lowercase letters. For count ordering, values with the same count are ordered alphabetically. With the curl command, URL-encode the + symbol as %2B.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error
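
    The sort argument and the %2B encoding of the plus sign can be illustrated with a short sketch. It assumes that the argument values are the literal strings alphabetical and count; WordSortKey, WordSortDirection, sortArgument, and queryEncoded are hypothetical names, and whether the SDK encodes the sign on your behalf is not stated here.

```swift
import Foundation

/// The two orderings named in the description (assumed literal values).
enum WordSortKey: String {
    case alphabetical
    case count
}

enum WordSortDirection {
    case ascending
    case descending
}

/// Builds the sort argument, with an optional + or - prefix.
func sortArgument(_ key: WordSortKey, _ direction: WordSortDirection? = nil) -> String {
    switch direction {
    case .some(.ascending): return "+" + key.rawValue
    case .some(.descending): return "-" + key.rawValue
    case .none: return key.rawValue
    }
}

/// Percent-encodes a value for manual use in a URL query, so that + becomes %2B.
func queryEncoded(_ value: String) -> String {
    var allowed = CharacterSet.urlQueryAllowed
    // urlQueryAllowed leaves these characters as-is, so exclude them explicitly.
    allowed.remove(charactersIn: "+&=")
    return value.addingPercentEncoding(withAllowedCharacters: allowed) ?? value
}
```

    The encoding step is relevant when a URL is assembled by hand, as with curl; an unencoded + in a query string is commonly read as a space.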

  • Add custom words.

    Adds one or more custom words to a custom language model. You can use this method to add words or to modify existing words in a custom model’s words resource. For custom models that are based on previous-generation models, the service populates the words resource for a custom model with out-of-vocabulary (OOV) words from each corpus or grammar that is added to the model. You can use this method to modify OOV words in the model’s words resource. For a custom model that is based on a previous-generation model, the words resource for a model can contain a maximum of 90 thousand custom (OOV) words. This includes words that the service extracts from corpora and grammars and words that you add directly. You must use credentials for the instance of the service that owns a model to add or modify custom words for the model. Adding or modifying custom words does not affect the custom model until you train the model for the new data by using the Train a custom language model method. You add custom words by providing a CustomWords object, which is an array of CustomWord objects, one per word. Use the object’s word parameter to identify the word that is to be added. You can also provide one or both of the optional display_as or sounds_like fields for each word.

    • The display_as field provides a different way of spelling the word in a transcript. Use the parameter when you want the word to appear different from its usual representation or from its spelling in training data. For example, you might indicate that the word IBM is to be displayed as IBM™.
    • The sounds_like field, which can be used only with a custom model that is based on a previous-generation model, provides an array of one or more pronunciations for the word. Use the parameter to specify how the word can be pronounced by users. Use the parameter for words that are difficult to pronounce, foreign words, acronyms, and so on. For example, you might specify that the word IEEE can sound like i triple e. You can specify a maximum of five sounds-like pronunciations for a word. If you omit the sounds_like field, the service attempts to set the field to its pronunciation of the word. It cannot generate a pronunciation for all words, so you must review the word’s definition to ensure that it is complete and valid.

    If you add a custom word that already exists in the words resource for the custom model, the new definition overwrites the existing data for the word. If the service encounters an error with the input data, it returns a failure code and does not add any of the words to the words resource.

    The call returns an HTTP 201 response code if the input data is valid. It then asynchronously processes the words to add them to the model’s words resource. The time that it takes for the analysis to complete depends on the number of new words that you add but is generally faster than adding a corpus or grammar.

    You can monitor the status of the request by using the Get a custom language model method to poll the model’s status. Use a loop to check the status every 10 seconds. The method returns a Customization object that includes a status field. A status of ready means that the words have been added to the custom model. The service cannot accept requests to add new data or to train the model until the existing request completes.

    You can use the List custom words or Get a custom word method to review the words that you add. Words with an invalid sounds_like field include an error field that describes the problem. You can use other words-related methods to correct errors, eliminate typos, and modify how words are pronounced as needed.

    See also:

    • Add words to the custom language model
    • Working with custom words for previous-generation models
    • Working with custom words for next-generation models
    • Validating a words resource for previous-generation models
    • Validating a words resource for next-generation models.

    Declaration

    Swift

    public func addWords(
        customizationID: String,
        words: [CustomWord],
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    words

    An array of CustomWord objects that provides information about each custom word that is to be added to or updated in the custom language model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error
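
    To make the word, sounds_like, and display_as fields concrete, the following sketch encodes an array of words as JSON. CustomWordSketch is a hypothetical stand-in for the SDK’s CustomWord type, limited to the three fields that the description names; the SDK’s actual type and its initializer may differ.

```swift
import Foundation

/// Hypothetical stand-in for CustomWord, with only the fields named in the description.
struct CustomWordSketch: Codable, Equatable {
    var word: String?
    var soundsLike: [String]?
    var displayAs: String?

    // Map Swift property names to the snake_case field names used by the service.
    enum CodingKeys: String, CodingKey {
        case word
        case soundsLike = "sounds_like"
        case displayAs = "display_as"
    }
}

/// Encodes the words as a JSON array; fields that are nil are omitted.
func encodeWords(_ words: [CustomWordSketch]) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]  // stable key order for readability
    let data = try encoder.encode(words)
    return String(decoding: data, as: UTF8.self)
}
```

    For example, encoding a single word with word set to IEEE and soundsLike set to ["i triple e"] yields an array with one object that carries only the sounds_like and word fields, because displayAs is nil.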

  • Add a custom word.

    Adds a custom word to a custom language model. You can use this method to add a word or to modify an existing word in the words resource. For custom models that are based on previous-generation models, the service populates the words resource for a custom model with out-of-vocabulary (OOV) words from each corpus or grammar that is added to the model. You can use this method to modify OOV words in the model’s words resource. For a custom model that is based on a previous-generation model, the words resource for a model can contain a maximum of 90 thousand custom (OOV) words. This includes words that the service extracts from corpora and grammars and words that you add directly. You must use credentials for the instance of the service that owns a model to add or modify a custom word for the model. Adding or modifying a custom word does not affect the custom model until you train the model for the new data by using the Train a custom language model method. Use the word_name parameter to specify the custom word that is to be added or modified. Use the CustomWord object to provide one or both of the optional display_as or sounds_like fields for the word.

    • The display_as field provides a different way of spelling the word in a transcript. Use the parameter when you want the word to appear different from its usual representation or from its spelling in training data. For example, you might indicate that the word IBM is to be displayed as IBM™.
    • The sounds_like field, which can be used only with a custom model that is based on a previous-generation model, provides an array of one or more pronunciations for the word. Use the parameter to specify how the word can be pronounced by users. Use the parameter for words that are difficult to pronounce, foreign words, acronyms, and so on. For example, you might specify that the word IEEE can sound like i triple e. You can specify a maximum of five sounds-like pronunciations for a word. If you omit the sounds_like field, the service attempts to set the field to its pronunciation of the word. It cannot generate a pronunciation for all words, so you must review the word’s definition to ensure that it is complete and valid.

    If you add a custom word that already exists in the words resource for the custom model, the new definition overwrites the existing data for the word. If the service encounters an error, it does not add the word to the words resource. Use the Get a custom word method to review the word that you add.

    See also:

    • Add words to the custom language model
    • Working with custom words for previous-generation models
    • Working with custom words for next-generation models
    • Validating a words resource for previous-generation models
    • Validating a words resource for next-generation models.

    Declaration

    Swift

    public func addWord(
        customizationID: String,
        wordName: String,
        word: String? = nil,
        soundsLike: [String]? = nil,
        displayAs: String? = nil,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    wordName

    The custom word that is to be added to or updated in the custom language model. Do not include spaces in the word. Use a - (dash) or _ (underscore) to connect the tokens of compound words. URL-encode the word if it includes non-ASCII characters. For more information, see Character encoding.

    word

    For the Add custom words method, you must specify the custom word that is to be added to or updated in the custom model. Do not include spaces in the word. Use a - (dash) or _ (underscore) to connect the tokens of compound words. Omit this parameter for the Add a custom word method.

    soundsLike

    For a custom model that is based on a previous-generation model, an array of sounds-like pronunciations for the custom word. Specify how words that are difficult to pronounce, foreign words, acronyms, and so on can be pronounced by users.

    • For a word that is not in the service’s base vocabulary, omit the parameter to have the service automatically generate a sounds-like pronunciation for the word.
    • For a word that is in the service’s base vocabulary, use the parameter to specify additional pronunciations for the word. You cannot override the default pronunciation of a word; pronunciations you add augment the pronunciation from the base vocabulary.

    A word can have at most five sounds-like pronunciations. A pronunciation can include at most 40 characters not including spaces. For a custom model that is based on a next-generation model, omit this field. Custom models based on next-generation models do not support the sounds_like field. The service ignores the field.

    displayAs

    An alternative spelling for the custom word when it appears in a transcript. Use the parameter when you want the word to have a spelling that is different from its usual representation or from its spelling in corpora training data.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error
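
    The limits on soundsLike (at most five pronunciations, each at most 40 characters not counting spaces) can be checked locally before calling addWord. This is a sketch under stated assumptions: isAcceptableSoundsLike is a hypothetical helper, and it counts Swift characters, which may not match how the service counts non-ASCII text.

```swift
import Foundation

/// Hypothetical helper: checks the two documented limits for sounds-like pronunciations.
func isAcceptableSoundsLike(_ pronunciations: [String]) -> Bool {
    // A word can have at most five sounds-like pronunciations.
    guard pronunciations.count <= 5 else { return false }
    // Each pronunciation can include at most 40 characters, not including spaces.
    return pronunciations.allSatisfy { pronunciation in
        pronunciation.filter { $0 != " " }.count <= 40
    }
}
```

    The check says nothing about whether a pronunciation is valid for the model’s language; the service reports that through the error field of the word.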

  • Get a custom word.

    Gets information about a custom word from a custom language model. You must use credentials for the instance of the service that owns a model to list information about its words. See also: Listing words from a custom language model.

    Declaration

    Swift

    public func getWord(
        customizationID: String,
        wordName: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Word>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    wordName

    The custom word that is to be read from the custom language model. URL-encode the word if it includes non-ASCII characters. For more information, see Character encoding.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Delete a custom word.

    Deletes a custom word from a custom language model. You can remove any word that you added to the custom model’s words resource via any means. However, if the word also exists in the service’s base vocabulary, the service removes the word only from the words resource; the word remains in the base vocabulary. Removing a custom word does not affect the custom model until you train the model with the Train a custom language model method. You must use credentials for the instance of the service that owns a model to delete its words. See also: Deleting a word from a custom language model.

    Declaration

    Swift

    public func deleteWord(
        customizationID: String,
        wordName: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    wordName

    The custom word that is to be deleted from the custom language model. URL-encode the word if it includes non-ASCII characters. For more information, see Character encoding.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • List grammars.

    Lists information about all grammars from a custom language model. The information includes the total number of out-of-vocabulary (OOV) words, name, and status of each grammar. You must use credentials for the instance of the service that owns a model to list its grammars. Grammars are available for all languages and models that support language customization. Note: Grammars are supported only for use with previous-generation models. They are not supported for next-generation models. See also: Listing grammars from a custom language model.

    Declaration

    Swift

    public func listGrammars(
        customizationID: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Grammars>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Add a grammar.

    Adds a single grammar file to a custom language model. Submit a plain text file in UTF-8 format that defines the grammar. Use multiple requests to submit multiple grammar files. You must use credentials for the instance of the service that owns a model to add a grammar to it. Adding a grammar does not affect the custom language model until you train the model for the new data by using the Train a custom language model method.

    The call returns an HTTP 201 response code if the grammar is valid. The service then asynchronously processes the contents of the grammar and automatically extracts new words that it finds. This operation can take a few seconds or minutes to complete depending on the size and complexity of the grammar, as well as the current load on the service. You cannot submit requests to add additional resources to the custom model or to train the model until the service’s analysis of the grammar for the current request completes. Use the Get a grammar method to check the status of the analysis.

    The service populates the model’s words resource with any word that is recognized by the grammar that is not found in the model’s base vocabulary. These are referred to as out-of-vocabulary (OOV) words. You can use the List custom words method to examine the words resource and use other words-related methods to eliminate typos and modify how words are pronounced as needed.

    To add a grammar that has the same name as an existing grammar, set the allow_overwrite parameter to true; otherwise, the request fails. Overwriting an existing grammar causes the service to process the grammar file and extract OOV words anew. Before doing so, it removes any OOV words associated with the existing grammar from the model’s words resource unless they were also added by another resource or they have been modified in some way with the Add custom words or Add a custom word method.

    The service limits the overall amount of data that you can add to a custom model to a maximum of 10 million total words from all sources combined. Also, you can add no more than 90 thousand OOV words to a model. This includes words that the service extracts from corpora and grammars and words that you add directly.

    Grammars are available for all languages and models that support language customization. Note: Grammars are supported only for use with previous-generation models. They are not supported for next-generation models.

    Declaration

    Swift

    public func addGrammar(
        customizationID: String,
        grammarName: String,
        grammarFile: String,
        contentType: String,
        allowOverwrite: Bool? = nil,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    grammarName

    The name of the new grammar for the custom language model. Use a localized name that matches the language of the custom model and reflects the contents of the grammar.

    • Include a maximum of 128 characters in the name.
    • Do not use characters that need to be URL-encoded. For example, do not use spaces, slashes, backslashes, colons, ampersands, double quotes, plus signs, equals signs, question marks, and so on in the name. (The service does not prevent the use of these characters. But because they must be URL-encoded wherever used, their use is strongly discouraged.)
    • Do not use the name of an existing grammar or corpus that is already defined for the custom model.
    • Do not use the name user, which is reserved by the service to denote custom words that are added or modified by the user.
    • Do not use the name base_lm or default_lm. Both names are reserved for future use by the service.
    grammarFile

    A plain text file that contains the grammar in the format specified by the Content-Type header. Encode the file in UTF-8 (ASCII is a subset of UTF-8). Using any other encoding can lead to issues when compiling the grammar or to unexpected results in decoding. The service ignores an encoding that is specified in the header of the grammar. With the curl command, use the --data-binary option to upload the file for the request.

    contentType

    The format (MIME type) of the grammar file:

    • application/srgs for Augmented Backus-Naur Form (ABNF), which uses a plain-text representation that is similar to traditional BNF grammars.
    • application/srgs+xml for XML Form, which uses XML elements to represent the grammar.
    allowOverwrite

    If true, the specified grammar overwrites an existing grammar with the same name. If false, the request fails if a grammar with the same name already exists. The parameter has no effect if a grammar with the same name does not already exist.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error
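
    The naming rules listed for grammarName can be checked locally before calling addGrammar. The following is a minimal sketch, not part of the SDK: the GrammarNameIssue type and the grammarNameIssues function are hypothetical, and the choice to treat only ASCII letters, digits, "-", ".", "_", and "~" as characters that need no URL encoding is an assumption. Names already defined for the model must be supplied by the caller.

```swift
/// Ways in which a proposed grammar name can break the documented naming rules.
/// Hypothetical helper type; it does not exist in the SDK.
enum GrammarNameIssue: Equatable {
    case tooLong(count: Int)
    case needsURLEncoding(Character)
    case reserved(String)
    case alreadyDefined(String)
}

/// Names that the parameter description reserves for the service.
let reservedGrammarNames: Set<String> = ["user", "base_lm", "default_lm"]

/// Assumption: characters outside this set would have to be URL-encoded.
let urlSafeCharacters = Set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")

/// Returns every rule the name violates; an empty array means the name passes.
/// `existingNames` holds the grammar and corpus names already defined for the model.
func grammarNameIssues(_ name: String,
                       existingNames: Set<String> = []) -> [GrammarNameIssue] {
    var issues: [GrammarNameIssue] = []
    if name.count > 128 {
        issues.append(.tooLong(count: name.count))
    }
    // Report the first character that would need URL encoding, if any.
    if let bad = name.first(where: { !urlSafeCharacters.contains($0) }) {
        issues.append(.needsURLEncoding(bad))
    }
    if reservedGrammarNames.contains(name) {
        issues.append(.reserved(name))
    }
    if existingNames.contains(name) {
        issues.append(.alreadyDefined(name))
    }
    return issues
}
```

    A name that collides with an existing grammar is still usable when allowOverwrite is true, so a caller may choose to ignore the alreadyDefined case in that situation.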

  • Get a grammar.

    Gets information about a grammar from a custom language model. The information includes the total number of out-of-vocabulary (OOV) words, name, and status of the grammar. You must use credentials for the instance of the service that owns a model to list its grammars. Grammars are available for all languages and models that support language customization. Note: Grammars are supported only for use with previous-generation models. They are not supported for next-generation models. See also: Listing grammars from a custom language model.

    Declaration

    Swift

    public func getGrammar(
        customizationID: String,
        grammarName: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Grammar>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    grammarName

    The name of the grammar for the custom language model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Delete a grammar.

    Deletes an existing grammar from a custom language model. The service removes any out-of-vocabulary (OOV) words associated with the grammar from the custom model’s words resource unless they were also added by another resource or they were modified in some way with the Add custom words or Add a custom word method. Removing a grammar does not affect the custom model until you train the model with the Train a custom language model method. You must use credentials for the instance of the service that owns a model to delete its grammar. Grammars are available for all languages and models that support language customization. Note: Grammars are supported only for use with previous-generation models. They are not supported for next-generation models. See also: Deleting a grammar from a custom language model.

    Declaration

    Swift

    public func deleteGrammar(
        customizationID: String,
        grammarName: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    grammarName

    The name of the grammar for the custom language model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Create a custom acoustic model.

    Creates a new custom acoustic model for a specified base model. The custom acoustic model can be used only with the base model for which it is created. The model is owned by the instance of the service whose credentials are used to create it. You can create a maximum of 1024 custom acoustic models per set of owning credentials. The service returns an error if you attempt to create more than 1024 models. You do not lose any models, but you cannot create any more until your model count is below the limit. Note: Acoustic model customization is supported only for use with previous-generation models. It is not supported for next-generation models. See also: Create a custom acoustic model.

    Declaration

    Swift

    public func createAcousticModel(
        name: String,
        baseModelName: String,
        description: String? = nil,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<AcousticModel>?, WatsonError?) -> Void)

    Parameters

    name

    A user-defined name for the new custom acoustic model. Use a name that is unique among all custom acoustic models that you own. Use a localized name that matches the language of the custom model. Use a name that describes the acoustic environment of the custom model, such as Mobile custom model or Noisy car custom model.

    baseModelName

    The name of the base language model that is to be customized by the new custom acoustic model. The new custom model can be used only with the base model that it customizes. (Note: The model ar-AR_BroadbandModel is deprecated; use ar-MS_BroadbandModel instead.) To determine whether a base model supports acoustic model customization, refer to Language support for customization.

    description

    A description of the new custom acoustic model. Use a localized description that matches the language of the custom model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • List custom acoustic models.

    Lists information about all custom acoustic models that are owned by an instance of the service. Use the language parameter to see all custom acoustic models for the specified language. Omit the parameter to see all custom acoustic models for all languages. You must use credentials for the instance of the service that owns a model to list information about it. Note: Acoustic model customization is supported only for use with previous-generation models. It is not supported for next-generation models. See also: Listing custom acoustic models.

    Declaration

    Swift

    public func listAcousticModels(
        language: String? = nil,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<AcousticModels>?, WatsonError?) -> Void)

    Parameters

    language

    The identifier of the language for which custom acoustic models are to be returned. Omit the parameter to see all custom acoustic models that are owned by the requesting credentials. (Note: The identifier ar-AR is deprecated; use ar-MS instead.) To determine the languages for which customization is available, see Language support for customization.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Get a custom acoustic model.

    Gets information about a specified custom acoustic model. You must use credentials for the instance of the service that owns a model to list information about it. Note: Acoustic model customization is supported only for use with previous-generation models. It is not supported for next-generation models. See also: Listing custom acoustic models.

    Declaration

    Swift

    public func getAcousticModel(
        customizationID: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<AcousticModel>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Delete a custom acoustic model.

    Deletes an existing custom acoustic model. The custom model cannot be deleted if another request, such as adding an audio resource to the model, is currently being processed. You must use credentials for the instance of the service that owns a model to delete it. Note: Acoustic model customization is supported only for use with previous-generation models. It is not supported for next-generation models. See also: Deleting a custom acoustic model.

    Declaration

    Swift

    public func deleteAcousticModel(
        customizationID: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Train a custom acoustic model.

    Initiates the training of a custom acoustic model with new or changed audio resources. After adding or deleting audio resources for a custom acoustic model, use this method to begin the actual training of the model on the latest audio data. The custom acoustic model does not reflect its changed data until you train it. You must use credentials for the instance of the service that owns a model to train it.

    The training method is asynchronous. Training time depends on the cumulative amount of audio data that the custom acoustic model contains and the current load on the service. When you train or retrain a model, the service uses all of the model’s audio data in the training. Training a custom acoustic model takes approximately as long as the length of its cumulative audio data. For example, it takes approximately 2 hours to train a model that contains a total of 2 hours of audio. The method returns an HTTP 200 response code to indicate that the training process has begun.

    You can monitor the status of the training by using the Get a custom acoustic model method to poll the model’s status. Use a loop to check the status once a minute. The method returns an AcousticModel object that includes status and progress fields. A status of available indicates that the custom model is trained and ready to use. The service cannot train a model while it is handling another request for the model. The service cannot accept subsequent training requests, or requests to add new audio resources, until the existing training request completes.

    You can use the optional custom_language_model_id parameter to specify the GUID of a separately created custom language model that is to be used during training. Train with a custom language model if you have verbatim transcriptions of the audio files that you have added to the custom model or you have either corpora (text files) or a list of words that are relevant to the contents of the audio files. For training to succeed, both of the custom models must be based on the same version of the same base model, and the custom language model must be fully trained and available.

    Note: Acoustic model customization is supported only for use with previous-generation models. It is not supported for next-generation models.

    See also:

    • Train the custom acoustic model
    • Using custom acoustic and custom language models together

    Training failures

    Training can fail to start for the following reasons:

    • The service is currently handling another request for the custom model, such as another training request or a request to add audio resources to the model.
    • The custom model contains less than 10 minutes or more than 200 hours of audio data.
    • You passed a custom language model with the custom_language_model_id query parameter that is not in the available state. A custom language model must be fully trained and available to be used to train a custom acoustic model.
    • You passed an incompatible custom language model with the custom_language_model_id query parameter. Both custom models must be based on the same version of the same base model.
    • The custom model contains one or more invalid audio resources. You can correct the invalid audio resources or set the strict parameter to false to exclude the invalid resources from the training. The model must contain at least one valid resource for training to succeed.
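
    Several of the failure reasons above can be detected on the client before a training request is sent. The sketch below is illustrative only: AcousticModelSnapshot and trainingBlockers are hypothetical names, the snapshot is assumed to be assembled by the caller from the model's audio resources, and strict is assumed to default to true. The two checks that involve a custom language model are left out because they need information about a second model.

```swift
/// Hypothetical local summary of a custom acoustic model; not an SDK type.
struct AcousticModelSnapshot {
    var totalAudioSeconds: Double
    var validResourceCount: Int
    var invalidResourceCount: Int
    var hasPendingRequest: Bool
}

/// Conditions from the list of training failures that can be checked locally.
enum TrainingBlocker: Equatable {
    case busy
    case tooLittleAudio
    case tooMuchAudio
    case invalidResources
    case noValidResources
}

let minimumAudioSeconds = 10.0 * 60         // 10 minutes
let maximumAudioSeconds = 200.0 * 60 * 60   // 200 hours

/// Returns the reasons training would fail to start; an empty array means none were found.
/// With `strict` set to false, invalid resources are excluded instead of blocking training.
func trainingBlockers(for model: AcousticModelSnapshot,
                      strict: Bool = true) -> [TrainingBlocker] {
    var blockers: [TrainingBlocker] = []
    if model.hasPendingRequest {
        blockers.append(.busy)
    }
    if model.totalAudioSeconds < minimumAudioSeconds {
        blockers.append(.tooLittleAudio)
    }
    if model.totalAudioSeconds > maximumAudioSeconds {
        blockers.append(.tooMuchAudio)
    }
    if strict && model.invalidResourceCount > 0 {
        blockers.append(.invalidResources)
    }
    // At least one valid resource is required regardless of `strict`.
    if model.validResourceCount == 0 {
        blockers.append(.noValidResources)
    }
    return blockers
}
```

    A local check like this only saves a round trip; the service remains the authority on whether training can start.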

    Declaration

    Swift

    public func trainAcousticModel(
        customizationID: String,
        customLanguageModelID: String? = nil,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<TrainingResponse>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    customLanguageModelID

    The customization ID (GUID) of a custom language model that is to be used during training of the custom acoustic model. Specify a custom language model that has been trained with verbatim transcriptions of the audio resources or that contains words that are relevant to the contents of the audio resources. The custom language model must be based on the same version of the same base model as the custom acoustic model, and the custom language model must be fully trained and available. The credentials specified with the request must own both custom models.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error
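
    The method description recommends polling the model's status once a minute until it becomes available. One way to structure such a loop is sketched here. The pollStatus function and PollOutcome type are hypothetical, and fetchStatus stands in for caller-supplied code that wraps getAcousticModel and returns the status field of the resulting model. The sketch blocks the calling thread, so it is suited to a background queue rather than the main thread.

```swift
import Foundation

/// Result of a bounded polling loop. Hypothetical type; not part of the SDK.
enum PollOutcome: Equatable {
    case reached(String)        // the wanted status was reported
    case gaveUp(last: String)   // attempts ran out; carries the last status seen
}

/// Calls `fetchStatus` until it returns `target` or `maxAttempts` calls have been made,
/// sleeping `interval` seconds between calls.
func pollStatus(
    until target: String = "available",
    every interval: TimeInterval = 60,
    maxAttempts: Int,
    fetchStatus: () -> String
) -> PollOutcome {
    guard maxAttempts > 0 else { return .gaveUp(last: "") }
    var last = ""
    for attempt in 1...maxAttempts {
        last = fetchStatus()
        if last == target {
            return .reached(last)
        }
        // Do not sleep after the final attempt.
        if attempt < maxAttempts {
            Thread.sleep(forTimeInterval: interval)
        }
    }
    return .gaveUp(last: last)
}
```

    Bounding the number of attempts keeps the loop from running forever if the model never reaches the target status; a production loop would also want to stop early on any status that signals an error.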

  • Reset a custom acoustic model.

    Resets a custom acoustic model by removing all audio resources from the model. Resetting a custom acoustic model initializes the model to its state when it was first created. Metadata such as the name and language of the model are preserved, but the model’s audio resources are removed and must be re-created. The service cannot reset a model while it is handling another request for the model. The service cannot accept subsequent requests for the model until the existing reset request completes. You must use credentials for the instance of the service that owns a model to reset it. Note: Acoustic model customization is supported only for use with previous-generation models. It is not supported for next-generation models. See also: Resetting a custom acoustic model.

    Declaration

    Swift

    public func resetAcousticModel(
        customizationID: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Upgrade a custom acoustic model.

    Initiates the upgrade of a custom acoustic model to the latest version of its base language model. The upgrade method is asynchronous. It can take on the order of minutes or hours to complete depending on the amount of data in the custom model and the current load on the service; typically, upgrade takes approximately twice the length of the total audio contained in the custom model. A custom model must be in the ready or available state to be upgraded. You must use credentials for the instance of the service that owns a model to upgrade it.

    The method returns an HTTP 200 response code to indicate that the upgrade process has begun successfully. You can monitor the status of the upgrade by using the Get a custom acoustic model method to poll the model’s status. The method returns an AcousticModel object that includes status and progress fields. Use a loop to check the status once a minute. While it is being upgraded, the custom model has the status upgrading. When the upgrade is complete, the model resumes the status that it had prior to upgrade. The service cannot upgrade a model while it is handling another request for the model. The service cannot accept subsequent requests for the model until the existing upgrade request completes.

    If the custom acoustic model was trained with a separately created custom language model, you must use the custom_language_model_id parameter to specify the GUID of that custom language model. The custom language model must be upgraded before the custom acoustic model can be upgraded. Omit the parameter if the custom acoustic model was not trained with a custom language model.

    Note: Acoustic model customization is supported only for use with previous-generation models. It is not supported for next-generation models. See also: Upgrading a custom acoustic model.

    Declaration

    Swift

    public func upgradeAcousticModel(
        customizationID: String,
        customLanguageModelID: String? = nil,
        force: Bool? = nil,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    customLanguageModelID

    If the custom acoustic model was trained with a custom language model, the customization ID (GUID) of that custom language model. The custom language model must be upgraded before the custom acoustic model can be upgraded. The custom language model must be fully trained and available. The credentials specified with the request must own both custom models.

    force

    If true, forces the upgrade of a custom acoustic model for which no input data has been modified since it was last trained. Use this parameter only to force the upgrade of a custom acoustic model that is trained with a custom language model, and only if you receive a 400 response code and the message No input data modified since last training. See Upgrading a custom acoustic model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error
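
    The documentation gives two rules of thumb for how long these operations take: training takes about as long as the model's cumulative audio, and an upgrade about twice that. The sketch below turns them into a rough estimate, for example to size a polling timeout. The names are hypothetical, and the result ignores service load, which the documentation says also affects the duration.

```swift
import Foundation

/// Operations whose duration the documentation relates to total audio length.
/// Hypothetical helper type; not part of the SDK.
enum AcousticModelOperation {
    case train
    case upgrade

    /// Multiplier applied to the total audio length, per the method descriptions.
    var audioLengthMultiplier: Double {
        switch self {
        case .train: return 1.0     // about as long as the audio itself
        case .upgrade: return 2.0   // about twice the total audio length
        }
    }
}

/// Rough estimate in seconds; service load is not modeled.
func estimatedDuration(of operation: AcousticModelOperation,
                       totalAudioSeconds: Double) -> TimeInterval {
    return totalAudioSeconds * operation.audioLengthMultiplier
}
```

    For a model that holds 2 hours of audio, this gives about 2 hours for training and about 4 hours for an upgrade.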

  • List audio resources.

    Lists information about all audio resources from a custom acoustic model. The information includes the name of the resource and information about its audio data, such as its duration. It also includes the status of the audio resource, which is important for checking the service’s analysis of the resource in response to a request to add it to the custom acoustic model. You must use credentials for the instance of the service that owns a model to list its audio resources. Note: Acoustic model customization is supported only for use with previous-generation models. It is not supported for next-generation models. See also: Listing audio resources for a custom acoustic model.

    Declaration

    Swift

    public func listAudio(
        customizationID: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<AudioResources>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Add an audio resource.

    Adds an audio resource to a custom acoustic model. Add audio content that reflects the acoustic characteristics of the audio that you plan to transcribe. You must use credentials for the instance of the service that owns a model to add an audio resource to it. Adding audio data does not affect the custom acoustic model until you train the model for the new data by using the Train a custom acoustic model method.

    You can add individual audio files or an archive file that contains multiple audio files. Adding multiple audio files via a single archive file is significantly more efficient than adding each file individually. You can add audio resources in any format that the service supports for speech recognition. You can use this method to add any number of audio resources to a custom model by calling the method once for each audio or archive file. You can add multiple different audio resources at the same time. You must add a minimum of 10 minutes and a maximum of 200 hours of audio that includes speech, not just silence, to a custom acoustic model before you can train it. No audio resource, audio- or archive-type, can be larger than 100 MB. To add an audio resource that has the same name as an existing audio resource, set the allow_overwrite parameter to true; otherwise, the request fails.

    The method is asynchronous. It can take several seconds or minutes to complete depending on the duration of the audio and, in the case of an archive file, the total number of audio files being processed. The service returns a 201 response code if the audio is valid. It then asynchronously analyzes the contents of the audio file or files and automatically extracts information about the audio such as its length, sampling rate, and encoding. You cannot submit requests to train or upgrade the model until the service’s analysis of all audio resources for current requests completes.

    To determine the status of the service’s analysis of the audio, use the Get an audio resource method to poll the status of the audio. The method accepts the customization ID of the custom model and the name of the audio resource, and it returns the status of the resource. Use a loop to check the status of the audio every few seconds until it becomes ok.

    Note: Acoustic model customization is supported only for use with previous-generation models. It is not supported for next-generation models. See also: Add audio to the custom acoustic model.

    Content types for audio-type resources

    You can add an individual audio file in any format that the service supports for speech recognition. For an audio-type resource, use the Content-Type parameter to specify the audio format (MIME type) of the audio file, including specifying the sampling rate, channels, and endianness where indicated.

    • audio/alaw (Specify the sampling rate (rate) of the audio.)
    • audio/basic (Use only with narrowband models.)
    • audio/flac
    • audio/g729 (Use only with narrowband models.)
    • audio/l16 (Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
    • audio/mp3
    • audio/mpeg
    • audio/mulaw (Specify the sampling rate (rate) of the audio.)
    • audio/ogg (The service automatically detects the codec of the input audio.)
    • audio/ogg;codecs=opus
    • audio/ogg;codecs=vorbis
    • audio/wav (Provide audio with a maximum of nine channels.)
    • audio/webm (The service automatically detects the codec of the input audio.)
    • audio/webm;codecs=opus
    • audio/webm;codecs=vorbis

    The sampling rate of an audio file must match the sampling rate of the base model for the custom model: for broadband models, at least 16 kHz; for narrowband models, at least 8 kHz. If the sampling rate of the audio is higher than the minimum required rate, the service down-samples the audio to the appropriate rate. If the sampling rate of the audio is lower than the minimum required rate, the service labels the audio file as invalid.

    See also: Supported audio formats.

    Content types for archive-type resources

    You can add an archive file (.zip or .tar.gz file) that contains audio files in any format that the service supports for speech recognition. For an archive-type resource, use the Content-Type parameter to specify the media type of the archive file:

    • application/zip for a .zip file
    • application/gzip for a .tar.gz file

    When you add an archive-type resource, the Contained-Content-Type header is optional depending on the format of the files that you are adding:

    • For audio files of type audio/alaw, audio/basic, audio/l16, or audio/mulaw, you must use the Contained-Content-Type header to specify the format of the contained audio files. Include the rate, channels, and endianness parameters where necessary. In this case, all audio files contained in the archive file must have the same audio format.
    • For audio files of all other types, you can omit the Contained-Content-Type header. In this case, the audio files contained in the archive file can have any of the formats not listed in the previous bullet. The audio files do not need to have the same format.

    Do not use the Contained-Content-Type header when adding an audio-type resource.

    Naming restrictions for embedded audio files

    The name of an audio file that is contained in an archive-type resource can include a maximum of 128 characters. This includes the file extension and all elements of the name (for example, slashes).
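
    The content-type and sampling-rate rules in this description lend themselves to small client-side helpers. The following sketch is illustrative and uses hypothetical names throughout. It assumes that content-type parameters are joined with semicolons in the usual MIME style (for example audio/l16;rate=16000) and that endianness values are passed through as the caller supplies them.

```swift
import Foundation

/// Formats that, inside an archive, require the Contained-Content-Type header.
let formatsNeedingContainedContentType: Set<String> =
    ["audio/alaw", "audio/basic", "audio/l16", "audio/mulaw"]

/// Strips parameters such as ";rate=16000" and lowercases the base MIME type.
func baseMIMEType(_ contentType: String) -> String {
    let base = contentType
        .split(separator: ";", maxSplits: 1, omittingEmptySubsequences: false)
        .first ?? ""
    return base.trimmingCharacters(in: .whitespaces).lowercased()
}

/// True when an archive that holds files of this format needs Contained-Content-Type.
func requiresContainedContentType(archiveHolding audioFormat: String) -> Bool {
    return formatsNeedingContainedContentType.contains(baseMIMEType(audioFormat))
}

/// Builds an audio/l16 content type; rate is required, the other parameters are optional.
func l16ContentType(rate: Int, channels: Int? = nil, endianness: String? = nil) -> String {
    var parts = ["audio/l16", "rate=\(rate)"]
    if let channels = channels { parts.append("channels=\(channels)") }
    if let endianness = endianness { parts.append("endianness=\(endianness)") }
    return parts.joined(separator: ";")
}

/// Base-model families and their documented minimum sampling rates.
enum ModelBand {
    case broadband, narrowband
    var minimumRateHz: Int {
        switch self {
        case .broadband: return 16_000
        case .narrowband: return 8_000
        }
    }
}

/// What the description says happens to audio at a given sampling rate.
enum RateHandling: Equatable {
    case acceptedAsIs, downSampled, markedInvalid
}

func handling(ofRateHz rate: Int, for band: ModelBand) -> RateHandling {
    if rate < band.minimumRateHz { return .markedInvalid }
    return rate > band.minimumRateHz ? .downSampled : .acceptedAsIs
}
```

    A caller could use requiresContainedContentType to decide whether to pass containedContentType to addAudio for an archive, and handling(ofRateHz:for:) to reject audio that the service would label as invalid.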

    Declaration

    Swift

    public func addAudio(
        customizationID: String,
        audioName: String,
        audioResource: Data,
        contentType: String? = nil,
        containedContentType: String? = nil,
        allowOverwrite: Bool? = nil,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    audioName

    The name of the new audio resource for the custom acoustic model. Use a localized name that matches the language of the custom model and reflects the contents of the resource.

    • Include a maximum of 128 characters in the name.
    • Do not use characters that need to be URL-encoded. For example, do not use spaces, slashes, backslashes, colons, ampersands, double quotes, plus signs, equals signs, question marks, and so on in the name. (The service does not prevent the use of these characters. But because they must be URL-encoded wherever used, their use is strongly discouraged.)
    • Do not use the name of an audio resource that has already been added to the custom model.
    audioResource

    The audio resource that is to be added to the custom acoustic model, an individual audio file or an archive file. With the curl command, use the --data-binary option to upload the file for the request.

    contentType

    For an audio-type resource, the format (MIME type) of the audio. For more information, see Content types for audio-type resources in the method description. For an archive-type resource, the media type of the archive file. For more information, see Content types for archive-type resources in the method description.

    containedContentType

    For an archive-type resource, specify the format of the audio files that are contained in the archive file if they are of type audio/alaw, audio/basic, audio/l16, or audio/mulaw. Include the rate, channels, and endianness parameters where necessary. In this case, all audio files that are contained in the archive file must be of the indicated type.

    For all other audio formats, you can omit the header. In this case, the audio files can be of multiple types as long as they are not of the types listed in the previous paragraph.

    The parameter accepts all of the audio formats that are supported for use with speech recognition. For more information, see Content types for audio-type resources in the method description.

    For an audio-type resource, omit the header.

    allowOverwrite

    If true, the specified audio resource overwrites an existing audio resource with the same name. If false, the request fails if an audio resource with the same name already exists. The parameter has no effect if an audio resource with the same name does not already exist.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error

  • Get an audio resource.

    Gets information about an audio resource from a custom acoustic model. The method returns an AudioListing object whose fields depend on the type of audio resource that you specify with the method’s audio_name parameter:

    • For an audio-type resource, the object’s fields match those of an AudioResource object: duration, name, details, and status.
    • For an archive-type resource, the object includes a container field whose fields match those of an AudioResource object. It also includes an audio field, which contains an array of AudioResource objects that provides information about the audio files that are contained in the archive.

    The information includes the status of the specified audio resource. The status is important for checking the service’s analysis of a resource that you add to the custom model.

    • For an audio-type resource, the status field is located in the AudioListing object.
    • For an archive-type resource, the status field is located in the AudioResource object that is returned in the container field.

    You must use credentials for the instance of the service that owns a model to list its audio resources. Note: Acoustic model customization is supported only for use with previous-generation models. It is not supported for next-generation models. See also: Listing audio resources for a custom acoustic model.
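
    Because the status field sits in a different place for audio-type and archive-type resources, code that polls a resource has to look in both. The sketch below shows one way to do that. AudioResourceInfo and AudioListingInfo are hypothetical stand-ins reduced to the fields named above; they are not the SDK's AudioResource and AudioListing types, whose exact shape should be checked in the SDK itself.

```swift
/// Hypothetical stand-in for an AudioResource, reduced to two fields.
struct AudioResourceInfo {
    var name: String
    var status: String
}

/// Hypothetical stand-in for an AudioListing.
struct AudioListingInfo {
    var status: String?                 // present for an audio-type resource
    var container: AudioResourceInfo?   // present for an archive-type resource
    var audio: [AudioResourceInfo]?     // files contained in an archive
}

/// Returns the analysis status of the resource itself, wherever it is stored:
/// in `container` for an archive, otherwise on the listing.
func analysisStatus(of listing: AudioListingInfo) -> String? {
    return listing.container?.status ?? listing.status
}
```

    A polling loop can then compare the returned value with ok without needing to know which kind of resource it is inspecting.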

    Declaration

    Swift

    public func getAudio(
        customizationID: String,
        audioName: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<AudioListing>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    audioName

    The name of the audio resource for the custom acoustic model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error.
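    The description of getAudio places the status field in a different location for each resource type. The sketch below shows one way to read it uniformly. AudioResourceSketch and AudioListingSketch are hypothetical stand-ins that mirror only the fields named in the description; they are not the SDK's AudioResource and AudioListing types, and the status strings are example values.

```swift
// Hypothetical stand-ins that mirror only the fields named in the description.
// They are not the SDK's AudioResource and AudioListing types.
struct AudioResourceSketch {
    var name: String
    var status: String
}

struct AudioListingSketch {
    var name: String?                     // set for an audio-type resource
    var status: String?                   // set for an audio-type resource
    var container: AudioResourceSketch?   // set for an archive-type resource
    var audio: [AudioResourceSketch]?     // files inside an archive-type resource
}

/// Returns the status of the requested resource: the top-level field for an
/// audio-type resource, otherwise the status of the archive's container.
func resolvedStatus(of listing: AudioListingSketch) -> String? {
    return listing.status ?? listing.container?.status
}

// Example values only; real listings come from the getAudio completion handler.
let audioType = AudioListingSketch(name: "audio1", status: "ok", container: nil, audio: nil)
let archiveType = AudioListingSketch(
    name: nil,
    status: nil,
    container: AudioResourceSketch(name: "archive1", status: "being_processed"),
    audio: [AudioResourceSketch(name: "clip1.wav", status: "ok")]
)

print(resolvedStatus(of: audioType) ?? "unknown")
print(resolvedStatus(of: archiveType) ?? "unknown")
```

    In application code the same fallback would be applied to the result delivered to the completionHandler, rather than to hand-built values.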

  • Delete an audio resource.

    Deletes an existing audio resource from a custom acoustic model. Deleting an archive-type audio resource removes the entire archive of files. The service does not allow deletion of individual files from an archive resource.

    Removing an audio resource does not affect the custom model until you train the model on its updated data by using the Train a custom acoustic model method. You can delete an existing audio resource from a model while a different resource is being added to the model. You must use credentials for the instance of the service that owns a model to delete its audio resources.

    Note: Acoustic model customization is supported only for use with previous-generation models. It is not supported for next-generation models.

    See also: Deleting an audio resource from a custom acoustic model.

    Declaration

    Swift

    public func deleteAudio(
        customizationID: String,
        audioName: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    customizationID

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

    audioName

    The name of the audio resource for the custom acoustic model.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error.
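    A minimal sketch of a deleteAudio call, based only on the declaration above. It assumes that the SDK module is imported as SpeechToText and that speechToText is an already authenticated instance; removeAudio is a hypothetical wrapper name used for illustration.

```swift
import SpeechToText  // assumption: module name exported by the SDK version in use

/// Deletes one audio resource and reports the outcome.
/// `speechToText` is assumed to be an already authenticated instance.
func removeAudio(named audioName: String,
                 fromModel customizationID: String,
                 using speechToText: SpeechToText) {
    speechToText.deleteAudio(customizationID: customizationID, audioName: audioName) { _, error in
        if let error = error {
            print("Delete failed: \(error)")
            return
        }
        // The custom model is unchanged until it is trained again on its updated data.
        print("Deleted \(audioName); retrain the model to apply the change.")
    }
}
```

    The completion handler ignores the response value because the declaration types it as WatsonResponse<Void>; only the error carries information here.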

  • Delete labeled data.

    Deletes all data that is associated with a specified customer ID. The method deletes all data for the customer ID, regardless of the method by which the information was added. The method has no effect if no data is associated with the customer ID. You must issue the request with credentials for the same instance of the service that was used to associate the customer ID with the data. You associate a customer ID with data by passing the X-Watson-Metadata header with a request that passes the data.

    Note: If you delete an instance of the service from the service console, all data associated with that service instance is automatically deleted. This includes all custom language models, corpora, grammars, and words; all custom acoustic models and audio resources; all registered endpoints for the asynchronous HTTP interface; and all data related to speech recognition requests.

    See also: Information security.

    Declaration

    Swift

    public func deleteUserData(
        customerID: String,
        headers: [String: String]? = nil,
        completionHandler: @escaping (WatsonResponse<Void>?, WatsonError?) -> Void)

    Parameters

    customerID

    The customer ID for which all data is to be deleted.

    headers

    A dictionary of request headers to be sent with this request.

    completionHandler

    A function executed when the request completes with a successful result or error.
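    The description says that a customer ID is attached to data through the X-Watson-Metadata header. The sketch below builds a headers dictionary of the [String: String] shape that the methods in this class accept. It assumes the header value takes the form customer_id=<id>; customerMetadataHeaders is a hypothetical helper name, not part of the SDK.

```swift
/// Builds the header dictionary that associates a customer ID with request data.
/// Assumption: the header value takes the form "customer_id=<id>".
func customerMetadataHeaders(customerID: String) -> [String: String] {
    return ["X-Watson-Metadata": "customer_id=\(customerID)"]
}

// Example ID only; use the identifier your application assigns to the customer.
let headers = customerMetadataHeaders(customerID: "my_customer_ID")
print(headers)
```

    A dictionary built this way could be passed as the headers argument of a request that sends data. Calling deleteUserData with the same customer ID later removes the data that was labeled with it.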

  • Perform speech recognition for audio data using WebSockets.

    Declaration

    Swift

    public func recognizeUsingWebSocket(
        audio: Data,
        settings: RecognitionSettings,
        model: String? = nil,
        baseModelVersion: String? = nil,
        languageCustomizationID: String? = nil,
        acousticCustomizationID: String? = nil,
        learningOptOut: Bool? = nil,
        endOfPhraseSilenceTime: Double? = nil,
        splitTranscriptAtPhraseEnd: Bool? = nil,
        customerID: String? = nil,
        headers: [String: String]? = nil,
        callback: RecognizeCallback)

    Parameters

    audio

    The audio data to transcribe.

    settings

    The configuration to use for this recognition request.

    model

    The language and sample rate of the audio. For supported models, visit https://cloud.ibm.com/docs/services/speech-to-text/input.html#models.

    baseModelVersion

    The version of the specified base model that is to be used for all requests sent over the connection. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Base model version.

    languageCustomizationID

    The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom language model is used. See Custom models.

    acousticCustomizationID

    The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the model parameter. By default, no custom acoustic model is used.

    learningOptOut

    If true, then this request will not be logged for training.

    customerID

    Associates a customer ID with all data that is passed over the connection. By default, no customer ID is associated with the data.

    headers

    A dictionary of request headers to be sent with this request.

    callback

    The callback that is invoked when the request produces a result or an error.
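    A minimal sketch of a recognizeUsingWebSocket call, following the declaration above. Several details are assumptions rather than facts taken from this page: that the module is imported as SpeechToText, that RecognitionSettings can be created from a content type, and that RecognizeCallback has an empty initializer with optional onResults and onError closures. The transcribe function name and the model name are illustrative.

```swift
import Foundation
import SpeechToText  // assumption: module name exported by the SDK version in use

/// Transcribes a WAV file over the WebSocket interface.
/// `speechToText` is assumed to be an already authenticated instance.
func transcribe(fileAt url: URL, using speechToText: SpeechToText) throws {
    let audio = try Data(contentsOf: url)

    // Assumption: RecognitionSettings can be created from a content type.
    let settings = RecognitionSettings(contentType: "audio/wav")

    // Assumption: RecognizeCallback exposes optional onResults and onError closures.
    var callback = RecognizeCallback()
    callback.onResults = { results in print(results) }
    callback.onError = { error in print("Recognition failed: \(error)") }

    // Parameters with default values in the declaration are omitted.
    speechToText.recognizeUsingWebSocket(
        audio: audio,
        settings: settings,
        model: "en-US_BroadbandModel",
        learningOptOut: true,
        callback: callback)
}
```

    Results arrive asynchronously through the callback, so the function returns before any transcript is available.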

  • Perform speech recognition for microphone audio. To stop the microphone, invoke stopRecognizeMicrophone().

    Microphone audio is compressed to Opus format unless otherwise specified by the compress parameter. With compression enabled, the settings should specify a contentType of “audio/ogg;codecs=opus”. With compression disabled, the settings should specify a contentType of “audio/l16;rate=16000;channels=1”.

    This function may cause the system to automatically prompt the user for permission to access the microphone. Use AVAudioSession.requestRecordPermission(_:) if you would prefer to ask for the user’s permission in advance.

    Declaration

    Swift

    public func recognizeMicrophone(
        settings: RecognitionSettings,
        model: String? = nil,
        baseModelVersion: String? = nil,
        languageCustomizationID: String? = nil,
        acousticCustomizationID: String? = nil,
        learningOptOut: Bool? = nil,
        customerID: String? = nil,
        compress: Bool = true,
        configureSession: Bool = true,
        headers: [String: String]? = nil,
        callback: RecognizeCallback)

    Parameters

    settings

    The configuration for this transcription request.

    model

    The language and sample rate of the audio. For supported models, visit https://cloud.ibm.com/docs/services/speech-to-text/input.html#models.

    baseModelVersion

    The version of the specified base model that is to be used for all requests sent over the connection. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Base model version.

    languageCustomizationID

    The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom language model is used. See Custom models.

    acousticCustomizationID

    The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the model parameter. By default, no custom acoustic model is used.

    learningOptOut

    If true, then this request will not be logged for training.

    customerID

    Associates a customer ID with all data that is passed over the connection. By default, no customer ID is associated with the data.

    compress

    A Boolean value that specifies whether microphone audio is compressed to Opus format. Opus compression reduces latency and bandwidth. Default is true.

    configureSession

    A Boolean value that specifies whether to configure the AVAudioSession. When true, the AVAudioSession is set to a standard configuration for microphone input. When false, the current AVAudioSession configuration is used. To use an AVAudioSession configuration other than the standard microphone configuration, set the configuration in your application and specify false for the configureSession parameter. Default is true.

    headers

    A dictionary of request headers to be sent with this request.

    callback

    The callback that is invoked when the request produces a result or an error.
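    The description of recognizeMicrophone pairs each value of the compress parameter with a contentType for the settings. The sketch below captures that pairing in one place so the two values cannot drift apart; microphoneContentType is a hypothetical helper name, not part of the SDK.

```swift
/// Returns the contentType that the description pairs with each compress setting.
func microphoneContentType(compress: Bool) -> String {
    return compress ? "audio/ogg;codecs=opus" : "audio/l16;rate=16000;channels=1"
}

print(microphoneContentType(compress: true))
print(microphoneContentType(compress: false))
```

    The returned string is intended for the contentType of the RecognitionSettings value that is passed together with the same compress flag.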

  • Stop performing speech recognition for microphone audio.

    When invoked, this function will:

    1. Stop recording audio from the microphone.
    2. Send a stop message to stop the current recognition request.
    3. Wait to receive all recognition results, then disconnect from the service.

    Declaration

    Swift

    public func stopRecognizeMicrophone()