RecognizeParams | ibm-watson

Parameters for the recognize operation.

Hierarchy

RecognizeParams

Properties

Optional acousticCustomizationId

acousticCustomizationId: string

The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom acoustic model is used. See Using a custom acoustic model for speech recognition.

audio

audio: ReadableStream | Buffer

The audio to transcribe.

Optional audioMetrics

audioMetrics: boolean

If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics.

See Audio metrics.

Optional backgroundAudioSuppression

backgroundAudioSuppression: number

The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.

Specify a value in the range of 0.0 to 1.0:

0.0 (the default) provides no suppression (background audio suppression is disabled).
0.5 provides a reasonable level of audio suppression for general usage.
1.0 suppresses all audio (no audio is transcribed).

The values increase on a monotonic curve. See [Background audio suppression](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-parameters-suppression).
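As a minimal sketch of the range described above, the value can be checked on the client before it is sent. The helper name `checkUnitInterval` is hypothetical and is not part of the ibm-watson SDK; the object below mirrors only the one field discussed here.

```typescript
// Hypothetical client-side guard; not an ibm-watson API.
// Rejects anything outside the documented 0.0 to 1.0 range (including NaN).
function checkUnitInterval(paramName: string, value: number): number {
  if (!(value >= 0.0 && value <= 1.0)) {
    throw new RangeError(`${paramName} must be between 0.0 and 1.0, got ${value}`);
  }
  return value;
}

// Only the field discussed here; the real RecognizeParams has many more.
const suppressionParams: { backgroundAudioSuppression?: number } = {
  // 0.5 is the "reasonable level for general usage" mentioned above.
  backgroundAudioSuppression: checkUnitInterval('backgroundAudioSuppression', 0.5),
};
```

The same guard would apply to any other parameter documented with a 0.0 to 1.0 range.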

Optional baseModelVersion

baseModelVersion: string

The version of the specified base model that is to be used with the recognition request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Making speech recognition requests with upgraded custom models.

Optional contentType

contentType: ContentType | string

The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.

Optional customizationId

customizationId: string

Deprecated. Use the language_customization_id parameter to specify the customization ID (GUID) of a custom language model that is to be used with the recognition request. Do not specify both parameters with a request.

Optional customizationWeight

customizationWeight: number

If you specify the customization ID (GUID) of a custom language model with the recognition request, the customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for the current request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained.

The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of out-of-vocabulary (OOV) words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.

See Using customization weight.
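The precedence described above (a request weight overrides a trained weight, which overrides the 0.3 default) can be sketched as follows. The function name `resolveCustomizationWeight` is hypothetical; the service performs this resolution itself, so the code only illustrates the rule.

```typescript
// Default stated in the description above.
const DEFAULT_CUSTOMIZATION_WEIGHT = 0.3;

// Hypothetical illustration of the precedence rule; not an ibm-watson API.
function resolveCustomizationWeight(requestWeight?: number, trainedWeight?: number): number {
  // A weight given with the request wins over everything else.
  if (requestWeight !== undefined) return requestWeight;
  // Otherwise a weight set when the custom model was trained applies.
  if (trainedWeight !== undefined) return trainedWeight;
  // Otherwise the documented default is used.
  return DEFAULT_CUSTOMIZATION_WEIGHT;
}
```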

Optional endOfPhraseSilenceTime

endOfPhraseSilenceTime: number

Specifies the duration, in seconds, of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.

Specify a value for the pause interval in the range of 0.0 to 120.0.

A value greater than 0 specifies the interval that the service is to use for speech recognition.
A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.

The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds.

See [End of phrase silence time](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-parsing#silence-time).
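The rules above can be summarized in a small sketch that computes the pause interval the service would be expected to use. The function name is hypothetical, and detecting Chinese by a `zh-` prefix on the model identifier is an assumption made for illustration. The description says the 0.8 second default holds for "most" languages, so the fallback here is an approximation.

```typescript
// Hypothetical illustration; not an ibm-watson API.
function effectivePauseInterval(model: string, endOfPhraseSilenceTime?: number): number {
  // Assumption: Chinese model identifiers start with "zh-".
  const defaultInterval = model.indexOf('zh-') === 0 ? 0.6 : 0.8;

  // Omitting the parameter and passing 0 are documented as equivalent.
  if (endOfPhraseSilenceTime === undefined || endOfPhraseSilenceTime === 0) {
    return defaultInterval;
  }
  // Any other value must fall in the documented range (0.0, 120.0].
  if (!(endOfPhraseSilenceTime > 0 && endOfPhraseSilenceTime <= 120.0)) {
    throw new RangeError('endOfPhraseSilenceTime must be between 0.0 and 120.0');
  }
  return endOfPhraseSilenceTime;
}
```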

Optional grammarName

grammarName: string

The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the language_customization_id parameter to specify the customization ID of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model's words resource.

Beta: The parameter is beta functionality.

See Using a grammar for speech recognition.

Optional headers

headers: OutgoingHttpHeaders

Optional inactivityTimeout

inactivityTimeout: number

The time in seconds after which, if only silence (no speech) is detected in streaming audio, the connection is closed with a 400 error. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See Inactivity timeout.

Optional keywords

keywords: string[]

An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.

You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive.

See Keyword spotting.

Optional keywordsThreshold

keywordsThreshold: number

A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold, you must also specify one or more keywords. The service performs no keyword spotting if you omit either parameter. See Keyword spotting.
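The constraints on keywords and keywordsThreshold described above can be checked together before a request is made. The following is a sketch only: the function name `keywordParamProblems` and its message strings are hypothetical, and the SDK does not necessarily perform these checks itself.

```typescript
// Hypothetical pre-flight check; not an ibm-watson API.
// Returns a list of human-readable problems; an empty list means the pair looks valid.
function keywordParamProblems(keywords: string[] = [], keywordsThreshold?: number): string[] {
  const problems: string[] = [];
  const hasKeywords = keywords.length > 0;

  // Each parameter requires the other.
  if (hasKeywords && keywordsThreshold === undefined) {
    problems.push('keywords requires keywordsThreshold');
  }
  if (!hasKeywords && keywordsThreshold !== undefined) {
    problems.push('keywordsThreshold requires at least one keyword');
  }
  // Documented limits: at most 1000 keywords, each at most 1024 characters.
  if (keywords.length > 1000) {
    problems.push('at most 1000 keywords are allowed');
  }
  // Note: length counts UTF-16 code units, which is only an approximation of "characters".
  if (keywords.some((k) => k.length > 1024)) {
    problems.push('a keyword exceeds 1024 characters');
  }
  // The threshold is a probability.
  if (keywordsThreshold !== undefined && !(keywordsThreshold >= 0.0 && keywordsThreshold <= 1.0)) {
    problems.push('keywordsThreshold must be between 0.0 and 1.0');
  }
  return problems;
}
```

For example, `keywordParamProblems(['colorado', 'tornado'], 0.5)` returns an empty array, while passing keywords without a threshold returns one problem.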

Optional languageCustomizationId

languageCustomizationId: string

The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom language model is used. See Using a custom language model for speech recognition.

Note: Use this parameter instead of the deprecated customization_id parameter.

Optional lowLatency

lowLatency: boolean

If true for next-generation Multimedia and Telephony models that support low latency, directs the service to produce results even more quickly than it usually does. Next-generation models produce transcription results faster than previous-generation models. The low_latency parameter causes the models to produce results even more quickly, though the results might be less accurate when the parameter is used.

The parameter is not available for previous-generation Broadband and Narrowband models. It is available only for some next-generation models. For a list of next-generation models that support low latency, see Supported next-generation language models.

For more information about the low_latency parameter, see [Low latency](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-interim#low-latency).

Optional maxAlternatives

maxAlternatives: number

The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of 0, the service uses the default value, 1. See Maximum alternatives.

Optional model

model: Model | string

The identifier of the model that is to be used for the recognition request. (Note: The model ar-AR_BroadbandModel is deprecated; use ar-MS_BroadbandModel instead.) See Previous-generation languages and models and Next-generation languages and models.

Optional profanityFilter

profanityFilter: boolean

If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English and Japanese transcription only. See Profanity filtering.

Optional redaction

redaction: boolean

If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.

When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1).

Beta: The parameter is beta functionality. Applies to US English, Japanese, and Korean transcription only.

See Numeric redaction.
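The side effects of enabling redaction can be sketched as a function that maps the parameters you send to the values the service is described as using. The interface and function names are hypothetical and mirror only the fields involved; the service applies these rules itself.

```typescript
// Hypothetical subset of the request parameters touched by redaction.
interface RedactionRelatedParams {
  redaction?: boolean;
  smartFormatting?: boolean;
  keywords?: string[];
  keywordsThreshold?: number;
  maxAlternatives?: number;
}

// Hypothetical illustration of the documented behavior; not an ibm-watson API.
function effectiveWithRedaction(p: RedactionRelatedParams): RedactionRelatedParams {
  if (!p.redaction) {
    // Without redaction, nothing is overridden.
    return { ...p };
  }
  return {
    redaction: true,
    smartFormatting: true, // enabled even if explicitly set to false
    maxAlternatives: 1,    // only a single final transcript is returned
    // keywords and keywordsThreshold are ignored, so they are left out.
  };
}
```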

Optional smartFormatting

smartFormatting: boolean

If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting.

Beta: The parameter is beta functionality. Applies to US English, Japanese, and Spanish transcription only.

See Smart formatting.

Optional speakerLabels

speakerLabels: boolean

If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.

Beta: The parameter is beta functionality.

For previous-generation models, the parameter can be used for Australian English, US English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.
For next-generation models, the parameter can be used for English (Australian, Indian, UK, and US), German, Japanese, Korean, and Spanish transcription only.

Restrictions and limitations apply to the use of speaker labels for both types of models. See [Speaker labels](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-speaker-labels).
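The interaction between speaker_labels and timestamps described above reduces to a single rule, sketched here with a hypothetical helper name. It assumes the documented default of no timestamps when the parameter is omitted.

```typescript
// Hypothetical illustration; not an ibm-watson API.
// Speaker labels force timestamps on, even when timestamps is explicitly false.
function effectiveTimestamps(speakerLabels?: boolean, timestamps?: boolean): boolean {
  return speakerLabels === true || timestamps === true;
}
```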

Optional speechDetectorSensitivity

speechDetectorSensitivity: number

The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.

Specify a value between 0.0 and 1.0:

0.0 suppresses all audio (no speech is transcribed).
0.5 (the default) provides a reasonable compromise for the level of sensitivity.
1.0 suppresses no audio (speech detection sensitivity is disabled).

The values increase on a monotonic curve. See [Speech detector sensitivity](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-parameters-sensitivity).

Optional splitTranscriptAtPhraseEnd

splitTranscriptAtPhraseEnd: boolean

If true, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript. By default, the service splits transcripts based solely on the pause interval.

See Split transcript at phrase end.

Optional timestamps

timestamps: boolean

If true, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.

Optional wordAlternativesThreshold

wordAlternativesThreshold: number

A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See Word alternatives.
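The threshold rule above (confidence greater than or equal to the threshold) can be illustrated with a small filter. The `WordHypothesis` shape and the sample values are invented for illustration and are simpler than the structure the service actually returns.

```typescript
// Hypothetical, simplified shape for one candidate word.
interface WordHypothesis {
  word: string;
  confidence: number;
}

// Keeps candidates whose confidence meets the threshold; the comparison is inclusive.
function alternativesAtOrAbove(hypotheses: WordHypothesis[], threshold: number): WordHypothesis[] {
  return hypotheses.filter((h) => h.confidence >= threshold);
}

// Made-up sample data for illustration only.
const sampleHypotheses: WordHypothesis[] = [
  { word: 'their', confidence: 0.9 },
  { word: 'there', confidence: 0.4 },
  { word: 'they', confidence: 0.1 },
];
const keptAlternatives = alternativesAtOrAbove(sampleHypotheses, 0.4);
```

With a threshold of 0.4, the candidate at exactly 0.4 is kept because the bound is inclusive.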

Optional wordConfidence

wordConfidence: boolean

If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, the service returns no word confidence scores. See Word confidence.
