watson_developer_cloud.speech_to_text_v1 module

### Service Overview
The service transcribes speech from various languages and audio formats to text with low latency. The service supports transcription of the following languages: Brazilian Portuguese, French, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English. For most languages, the service supports two sampling rates, broadband and narrowband.

class SpeechToTextV1(url='https://stream.watsonplatform.net/speech-to-text/api', username=None, password=None)[source]

Bases: watson_developer_cloud.watson_service.WatsonService

The Speech to Text V1 service.

default_url = 'https://stream.watsonplatform.net/speech-to-text/api'
get_model(model_id, **kwargs)[source]

Retrieves information about the model.

Returns information about a single specified language model that is available for use with the service. The information includes the name of the model and its minimum sampling rate in Hertz, among other things.

Parameters:
  • model_id (str) – The identifier of the desired model in the form of its name from the output of Get models.
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the SpeechModel response.

Return type:

dict
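
As a minimal sketch of reading the returned dict, the helper below pulls out the model name and minimum sampling rate mentioned above. The function name `summarize_model` is hypothetical, and the `name` and `rate` keys are assumed field names of the SpeechModel response.

```python
def summarize_model(service, model_id):
    """Return (name, rate) for one model.

    `service` is any object exposing get_model(model_id), such as a
    SpeechToTextV1 instance. The 'name' and 'rate' keys are assumptions
    about the SpeechModel dict, so .get() is used to tolerate their absence.
    """
    model = service.get_model(model_id)
    return model.get("name"), model.get("rate")


# Example with a real client (needs valid credentials; not executed here):
# from watson_developer_cloud import SpeechToTextV1
# service = SpeechToTextV1(username="YOUR_USERNAME", password="YOUR_PASSWORD")
# print(summarize_model(service, "en-US_BroadbandModel"))
```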

list_models(**kwargs)[source]

Retrieves the models available for the service.

Returns a list of all language models that are available for use with the service. The information includes the name of the model and its minimum sampling rate in Hertz, among other things.

Parameters: headers (dict) – A dict containing the request headers
Returns: A dict containing the SpeechModels response.
Return type: dict
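
The list can be filtered on the client side, for example to separate broadband from narrowband models. This is a sketch only: `models_with_min_rate` is a hypothetical helper, and it assumes that the SpeechModels dict holds its entries under a `models` key, each with `name` and `rate` fields.

```python
def models_with_min_rate(service, min_rate):
    """Return sorted names of models whose sampling rate is >= min_rate.

    Assumes list_models() returns {"models": [{"name": ..., "rate": ...}, ...]};
    entries without a 'rate' are skipped.
    """
    response = service.list_models()
    models = response.get("models", [])
    return sorted(
        m["name"] for m in models
        if m.get("rate") is not None and m["rate"] >= min_rate
    )
```

With a real client, `models_with_min_rate(service, 16000)` would be expected to list only the broadband models, provided the response has the assumed shape.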
models(**kwargs)
recognize(model=None, customization_id=None, acoustic_customization_id=None, customization_weight=None, version=None, audio=None, content_type=None, inactivity_timeout=None, keywords=None, keywords_threshold=None, max_alternatives=None, word_alternatives_threshold=None, word_confidence=None, timestamps=None, profanity_filter=None, smart_formatting=None, speaker_labels=None, **kwargs)[source]

Sends audio for speech recognition in sessionless mode.

Parameters:
  • model (str) – The identifier of the model that is to be used for the recognition request.
  • customization_id (str) – The GUID of a custom language model that is to be used with the request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom language model is used.
  • acoustic_customization_id (str) – The GUID of a custom acoustic model that is to be used with the request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom acoustic model is used.
  • customization_weight (float) – NON-MULTIPART ONLY: If you specify a customization ID with the request, you can use the customization weight to tell the service how much weight to give to words from the custom language model compared to those from the base model for speech recognition. Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained. The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model’s domain, but it can negatively affect performance on non-domain phrases.
  • version (str) – The version of the specified base model that is to be used for speech recognition. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. For more information, see [Base model version](https://console.bluemix.net/docs/services/speech-to-text/input.html#version).
  • audio (str) – NON-MULTIPART ONLY: Audio to transcribe in the format specified by the Content-Type header. Required for a non-multipart request.
  • content_type (str) – The type of the input: audio/basic, audio/flac, audio/l16, audio/mp3, audio/mpeg, audio/mulaw, audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis, audio/wav, audio/webm, audio/webm;codecs=opus, audio/webm;codecs=vorbis, or multipart/form-data.
  • inactivity_timeout (int) – NON-MULTIPART ONLY: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.
  • keywords (list[str]) – NON-MULTIPART ONLY: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. If you specify any keywords, you must also specify a keywords threshold. You can spot a maximum of 1000 keywords. Omit the parameter or specify an empty array if you do not need to spot keywords.
  • keywords_threshold (float) – NON-MULTIPART ONLY: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
  • max_alternatives (int) – NON-MULTIPART ONLY: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
  • word_alternatives_threshold (float) – NON-MULTIPART ONLY: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
  • word_confidence (bool) – NON-MULTIPART ONLY: If true, confidence measure per word is returned.
  • timestamps (bool) – NON-MULTIPART ONLY: If true, time alignment for each word is returned.
  • profanity_filter (bool) – NON-MULTIPART ONLY: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.
  • smart_formatting (bool) – NON-MULTIPART ONLY: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.
  • speaker_labels (bool) – NON-MULTIPART ONLY: Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. The default is false; no speaker labels are returned. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter. To determine whether a language model supports speaker labels, use the Get models method and check that the attribute speaker_labels is set to true. You can also refer to [Speaker labels](https://console.bluemix.net/docs/services/speech-to-text/output.html#speaker_labels).
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the SpeechRecognitionResults response.

Return type:

dict
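
Several of the parameters above constrain each other: keywords and keywords_threshold must be given together, at most 1000 keywords are allowed, and the thresholds and weight are bounded by 0 and 1. The sketch below checks these rules before a request is made and then reads the top transcript from each result. Both helper names are hypothetical, and the `results` / `alternatives` / `transcript` keys are assumptions about the SpeechRecognitionResults dict.

```python
def build_recognize_options(keywords=None, keywords_threshold=None,
                            customization_weight=None, **extra):
    """Validate interdependent recognize() arguments and return them as a dict."""
    options = dict(extra)
    has_keywords = bool(keywords)
    has_threshold = keywords_threshold is not None
    if has_keywords != has_threshold:
        raise ValueError("keywords and keywords_threshold must be specified together")
    if has_keywords:
        if len(keywords) > 1000:
            raise ValueError("at most 1000 keywords can be spotted")
        if not 0.0 <= keywords_threshold <= 1.0:
            raise ValueError("keywords_threshold must be between 0 and 1 inclusive")
        options["keywords"] = list(keywords)
        options["keywords_threshold"] = keywords_threshold
    if customization_weight is not None:
        if not 0.0 <= customization_weight <= 1.0:
            raise ValueError("customization_weight must be between 0.0 and 1.0")
        options["customization_weight"] = customization_weight
    return options


def best_transcripts(response):
    """Return the first alternative's transcript for each result (assumed shape)."""
    return [
        result["alternatives"][0]["transcript"]
        for result in response.get("results", [])
        if result.get("alternatives")
    ]


# Example with a real client (not executed here):
# with open("audio.flac", "rb") as audio_file:
#     options = build_recognize_options(keywords=["colorado"], keywords_threshold=0.5)
#     response = service.recognize(audio=audio_file, content_type="audio/flac", **options)
#     print(best_transcripts(response))
```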

recognize_with_websocket(audio=None, content_type='audio/l16; rate=44100', model='en-US_BroadbandModel', recognize_callback=None, customization_id=None, acoustic_customization_id=None, customization_weight=None, version=None, inactivity_timeout=None, interim_results=True, keywords=None, keywords_threshold=None, max_alternatives=1, word_alternatives_threshold=None, word_confidence=False, timestamps=False, profanity_filter=None, smart_formatting=False, speaker_labels=None, **kwargs)[source]

Sends audio for speech recognition using web sockets.

Parameters:
  • audio (str) – Audio to transcribe in the format specified by the Content-Type header.
  • content_type (str) – The type of the input: audio/basic, audio/flac, audio/l16, audio/mp3, audio/mpeg, audio/mulaw, audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis, audio/wav, audio/webm, audio/webm;codecs=opus, audio/webm;codecs=vorbis, or multipart/form-data.
  • model (str) – The identifier of the model to be used for the recognition request.
  • recognize_callback (RecognizeCallback) – The instance handling events returned from the service.
  • customization_id (str) – The GUID of a custom language model that is to be used with the request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom language model is used.
  • acoustic_customization_id (str) – The GUID of a custom acoustic model that is to be used with the request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom acoustic model is used.
  • customization_weight (float) – If you specify a customization_id with the request, you can use the customization_weight parameter to tell the service how much weight to give to words from the custom language model compared to those from the base model for speech recognition. Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained. The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model’s domain, but it can negatively affect performance on non-domain phrases.
  • version (str) – The version of the specified base model that is to be used for speech recognition. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. For more information, see [Base model version](https://console.bluemix.net/docs/services/speech-to-text/input.html#version).
  • inactivity_timeout (int) – The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.
  • interim_results (bool) – Send back non-final previews of each “sentence” as it is being processed. These results are ignored in text mode.
  • keywords (list[str]) – Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.
  • keywords_threshold (float) – Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
  • max_alternatives (int) – Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
  • word_alternatives_threshold (float) – Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
  • word_confidence (bool) – If true, confidence measure per word is returned.
  • timestamps (bool) – If true, time alignment for each word is returned.
  • profanity_filter (bool) – If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.
  • smart_formatting (bool) – If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.
  • speaker_labels (bool) – Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. The default is false; no speaker labels are returned. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter. To determine whether a language model supports speaker labels, use the GET /v1/models method and check that the attribute speaker_labels is set to true. You can also refer to [Speaker labels](https://console.bluemix.net/docs/services/speech-to-text/output.html#speaker_labels).
  • headers (dict) – A dict containing the request headers
Returns:

check_job(id, **kwargs)[source]

Checks the status of the specified asynchronous job.

Parameters:
  • id (str) – The ID of the job whose status is to be checked.
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the RecognitionJob response.

Return type:

dict
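
When no callback URL is used, a job is typically polled until it finishes. The following is a sketch of such a loop built on check_job. The helper name `wait_for_job` and its polling parameters are hypothetical, and the `status` key with the values `completed` and `failed` is an assumption about the RecognitionJob dict.

```python
import time


def wait_for_job(service, job_id, poll_interval=5.0, max_polls=60, sleep=time.sleep):
    """Poll check_job() until the job reaches a terminal status.

    Returns the last RecognitionJob dict. Raises RuntimeError if the job is
    still running after max_polls checks. `sleep` is injectable for testing.
    """
    for attempt in range(max_polls):
        job = service.check_job(job_id)
        # 'completed' and 'failed' are assumed to be the terminal status values.
        if job.get("status") in ("completed", "failed"):
            return job
        if attempt < max_polls - 1:
            sleep(poll_interval)
    raise RuntimeError("job %s did not finish after %d polls" % (job_id, max_polls))
```

A fixed interval keeps the sketch short; a production client might prefer a growing delay between polls.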

check_jobs(**kwargs)[source]

Checks the status of all asynchronous jobs.

Parameters: headers (dict) – A dict containing the request headers
Returns: A dict containing the RecognitionJobs response.
Return type: dict
create_job(audio, content_type, model=None, callback_url=None, events=None, user_token=None, results_ttl=None, customization_id=None, acoustic_customization_id=None, customization_weight=None, version=None, inactivity_timeout=None, keywords=None, keywords_threshold=None, max_alternatives=None, word_alternatives_threshold=None, word_confidence=None, timestamps=None, profanity_filter=None, smart_formatting=None, speaker_labels=None, **kwargs)[source]

Creates a job for an asynchronous recognition request.

Parameters:
  • audio (str) – Audio to transcribe in the format specified by the Content-Type header.
  • content_type (str) – The type of the input: audio/basic, audio/flac, audio/l16, audio/mp3, audio/mpeg, audio/mulaw, audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis, audio/wav, audio/webm, audio/webm;codecs=opus, or audio/webm;codecs=vorbis.
  • model (str) – The identifier of the model that is to be used for the recognition request.
  • callback_url (str) – A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the Register a callback method. Omit the parameter to poll the service for job completion and results. You can include the same callback URL with any number of job creation requests. Use the user_token parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.
  • events (str) – If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are: recognitions.started, which generates a callback notification when the service begins to process the job; recognitions.completed, which generates a callback notification when the job is complete (you must use the Check a job method to retrieve the results before they time out or are deleted); recognitions.completed_with_results, which generates a callback notification when the job is complete and includes the results of the request in the notification; and recognitions.failed, which generates a callback notification if the service experiences an error while processing the job. Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events. If the job does not include a callback URL, omit the parameter.
  • user_token (str) – If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.
  • results_ttl (int) – The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.
  • customization_id (str) – The GUID of a custom language model that is to be used with the request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom language model is used.
  • acoustic_customization_id (str) – The GUID of a custom acoustic model that is to be used with the request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom acoustic model is used.
  • customization_weight (float) – If you specify a customization ID with the request, you can use the customization weight to tell the service how much weight to give to words from the custom language model compared to those from the base model for speech recognition. Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained. The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model’s domain, but it can negatively affect performance on non-domain phrases.
  • version (str) – The version of the specified base model that is to be used with the request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. For more information, see [Base model version](https://console.bluemix.net/docs/services/speech-to-text/input.html#version).
  • inactivity_timeout (int) – The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.
  • keywords (list[str]) – Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. If you specify any keywords, you must also specify a keywords threshold. You can spot a maximum of 1000 keywords. Omit the parameter or specify an empty array if you do not need to spot keywords.
  • keywords_threshold (float) – Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
  • max_alternatives (int) – Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
  • word_alternatives_threshold (float) – Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
  • word_confidence (bool) – If true, confidence measure per word is returned.
  • timestamps (bool) – If true, time alignment for each word is returned.
  • profanity_filter (bool) – If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.
  • smart_formatting (bool) – If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.
  • speaker_labels (bool) – Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. The default is false; no speaker labels are returned. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter. To determine whether a language model supports speaker labels, use the Get models method and check that the attribute speaker_labels is set to true. You can also refer to [Speaker labels](https://console.bluemix.net/docs/services/speech-to-text/output.html#speaker_labels).
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the RecognitionJob response.

Return type:

dict
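
The events parameter is a comma-separated string with one exclusion rule, so it can be convenient to build it from a checked list. The sketch below does that; `build_events` is a hypothetical helper, not part of the SDK.

```python
VALID_EVENTS = (
    "recognitions.started",
    "recognitions.completed",
    "recognitions.completed_with_results",
    "recognitions.failed",
)


def build_events(*events):
    """Return a comma-separated events string, or None to use the defaults."""
    if not events:
        return None  # omit the parameter: the service applies its default events
    unknown = [event for event in events if event not in VALID_EVENTS]
    if unknown:
        raise ValueError("unknown events: %s" % ", ".join(unknown))
    if ("recognitions.completed" in events
            and "recognitions.completed_with_results" in events):
        raise ValueError("recognitions.completed and "
                         "recognitions.completed_with_results are incompatible")
    return ",".join(events)


# Example with a real client (not executed here; the URL is a placeholder):
# job = service.create_job(audio_file, "audio/flac",
#                          callback_url="https://example.com/results",
#                          events=build_events("recognitions.completed_with_results"),
#                          user_token="job-42")
```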

delete_job(id, **kwargs)[source]

Deletes the specified asynchronous job.

Deletes the specified job. You cannot delete a job that the service is actively processing. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.

Parameters:
  • id (str) – The ID of the job that is to be deleted.
  • headers (dict) – A dict containing the request headers
Return type:

None

register_callback(callback_url, user_secret=None, **kwargs)[source]

Registers a callback URL for use with the asynchronous interface.

Parameters:
  • callback_url (str) – An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.
  • user_secret (str) – A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the RegisterStatus response.

Return type:

dict
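
On the receiving side, the callback endpoint can recompute the HMAC-SHA1 signature with the same user secret and compare it with the X-Callback-Signature header. The sketch below uses only the Python standard library. It assumes that the header carries the base64-encoded digest, which the text above does not state, and both function names are hypothetical.

```python
import base64
import hashlib
import hmac


def expected_signature(user_secret, payload):
    """Compute the HMAC-SHA1 of payload (bytes) keyed by user_secret (str).

    The base64 encoding of the digest is an assumption about the header format.
    """
    digest = hmac.new(user_secret.encode("utf-8"), payload, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_signature(user_secret, payload, header_value):
    """Return True if header_value matches the signature computed over payload."""
    if header_value is None:
        return False  # the service sends no header when no user secret was set
    expected = expected_signature(user_secret, payload).encode("utf-8")
    # compare_digest avoids leaking the position of the first mismatch via timing.
    return hmac.compare_digest(expected, header_value.encode("utf-8"))
```

During URL verification the payload would be the challenge string; for notifications it would be the raw request body.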

unregister_callback(callback_url, **kwargs)[source]

Removes the registration for an asynchronous callback URL.

Unregisters a callback URL that was previously white-listed with the register_callback method for use with the asynchronous interface. Once unregistered, the URL can no longer be used with asynchronous recognition requests.

Parameters:
  • callback_url (str) – The callback URL that is to be unregistered.
  • headers (dict) – A dict containing the request headers
Return type:

None

create_language_model(name, base_model_name, dialect=None, description=None, **kwargs)[source]

Creates a custom language model.

Creates a new custom language model for a specified base model. The custom language model can be used only with the base model for which it is created. The model is owned by the instance of the service whose credentials are used to create it.

Parameters:
  • name (str) – A user-defined name for the new custom language model. Use a name that is unique among all custom language models that you own. Use a localized name that matches the language of the custom model. Use a name that describes the domain of the custom model, such as Medical custom model or Legal custom model.
  • base_model_name (str) – The name of the base language model that is to be customized by the new custom language model. The new custom model can be used only with the base model that it customizes. To determine whether a base model supports language model customization, request information about the base model and check that the attribute custom_language_model is set to true, or refer to [Language support for customization](https://console.bluemix.net/docs/services/speech-to-text/custom.html#languageSupport).
  • dialect (str) – The dialect of the specified language that is to be used with the custom language model. The parameter is meaningful only for Spanish models, for which the service creates a custom language model that is suited for speech in one of the following dialects: es-ES for Castilian Spanish (the default), es-LA for Latin American Spanish, or es-US for North American (Mexican) Spanish. A specified dialect must be valid for the base model. By default, the dialect matches the language of the base model; for example, en-US for either of the US English language models.
  • description (str) – A description of the new custom language model. Use a localized description that matches the language of the custom model.
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the LanguageModel response.

Return type:

dict
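
The dialect rule can be checked before the request is sent. This sketch rests on two assumptions that the text does not confirm: that the language of a base model is the part of its name before the first underscore (as in en-US_BroadbandModel), and that non-Spanish models accept only their own language as a dialect. `resolve_dialect` is a hypothetical helper.

```python
SPANISH_DIALECTS = ("es-ES", "es-LA", "es-US")


def resolve_dialect(base_model_name, dialect=None):
    """Return the dialect to send, defaulting to the base model's language."""
    # Assumption: names look like '<language>_<Bandwidth>Model'.
    language = base_model_name.split("_", 1)[0]
    if dialect is None:
        return language
    allowed = SPANISH_DIALECTS if language.startswith("es-") else (language,)
    if dialect not in allowed:
        raise ValueError("dialect %r is not valid for base model %r"
                         % (dialect, base_model_name))
    return dialect


# Example with a real client (not executed here):
# model = service.create_language_model(
#     "Medical custom model", "es-ES_BroadbandModel",
#     dialect=resolve_dialect("es-ES_BroadbandModel", "es-LA"))
```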

create_custom_model(**kwargs)
delete_language_model(customization_id, **kwargs)[source]

Deletes a custom language model.

Deletes an existing custom language model. The custom model cannot be deleted if another request, such as adding a corpus to the model, is currently being processed. You must use credentials for the instance of the service that owns a model to delete it.

Parameters:
  • customization_id (str) – The GUID of the custom language model that is to be deleted. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • headers (dict) – A dict containing the request headers
Return type:

None

delete_custom_model(**kwargs)
get_language_model(customization_id, **kwargs)[source]

Lists information about a custom language model.

Lists information about a specified custom language model. You must use credentials for the instance of the service that owns a model to list information about it.

Parameters:
  • customization_id (str) – The GUID of the custom language model for which information is to be returned. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the LanguageModel response.

Return type:

dict

get_custom_model(**kwargs)
list_language_models(language=None, **kwargs)[source]

Lists information about all custom language models.

Lists information about all custom language models that are owned by an instance of the service. Use the language parameter to see all custom language models for the specified language; omit the parameter to see all custom language models for all languages. You must use credentials for the instance of the service that owns a model to list information about it.

Parameters:
  • language (str) – The identifier of the language for which custom language models are to be returned (for example, en-US). Omit the parameter to see all custom language models owned by the requesting service credentials.
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the LanguageModels response.

Return type:

dict
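
Because names are only advised to be unique, it can be useful to check for an existing model before creating a new one. A minimal sketch follows, assuming that the LanguageModels dict lists its entries under a `customizations` key and that each entry has a `name` field. `find_language_models` is a hypothetical helper.

```python
def find_language_models(service, name, language=None):
    """Return all custom language models whose 'name' equals `name`.

    Passing language=None lists the models for all languages.
    """
    response = service.list_language_models(language=language)
    return [
        model for model in response.get("customizations", [])
        if model.get("name") == name
    ]
```

An empty list means that no model with that name is owned by the credentials in use.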

list_custom_models(**kwargs)
reset_language_model(customization_id, **kwargs)[source]

Resets a custom language model.

Resets a custom language model by removing all corpora and words from the model. Resetting a custom language model initializes the model to its state when it was first created. Metadata such as the name and language of the model are preserved, but the model’s words resource is removed and must be re-created. You must use credentials for the instance of the service that owns a model to reset it.

Parameters:
  • customization_id (str) – The GUID of the custom language model that is to be reset. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • headers (dict) – A dict containing the request headers
Return type:

None

train_language_model(customization_id, word_type_to_add=None, customization_weight=None, **kwargs)[source]

Trains a custom language model.

Parameters:
  • customization_id (str) – The GUID of the custom language model that is to be trained. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • word_type_to_add (str) – The type of words from the custom language model’s words resource on which to train the model. Specify all (the default) to train the model on all new words, regardless of whether they were extracted from corpora or were added or modified by the user. Specify user to train the model only on new words that were added or modified by the user; the model is not trained on new words extracted from corpora.
  • customization_weight (float) – Specifies a customization weight for the custom language model. The customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for speech recognition. Specify a value between 0.0 and 1.0. The default value is 0.3. The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model’s domain, but it can negatively affect performance on non-domain phrases. The value that you assign is used for all recognition requests that use the model. You can override it for any recognition request by specifying a customization weight for that request.
  • headers (dict) – A dict containing the request headers
Return type:

None
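
The weight and word-type constraints described above can be checked locally before a training request is sent. The following is a minimal sketch, not part of the SDK: train_with_weight is a hypothetical helper, speech_to_text is assumed to be a SpeechToTextV1 instance (or any object with the same train_language_model method), and treating the 0.0 to 1.0 range as inclusive is an assumption.

```python
def train_with_weight(speech_to_text, customization_id,
                      customization_weight=None, word_type_to_add=None):
    """Validate arguments locally, then start training (hypothetical helper)."""
    # The documented values for word_type_to_add are 'all' and 'user'.
    if word_type_to_add not in (None, 'all', 'user'):
        raise ValueError("word_type_to_add must be 'all' or 'user'")
    # Assumption: the documented range 0.0 to 1.0 is inclusive at both ends.
    if customization_weight is not None:
        if not 0.0 <= customization_weight <= 1.0:
            raise ValueError('customization_weight must be between 0.0 and 1.0')
    return speech_to_text.train_language_model(
        customization_id,
        word_type_to_add=word_type_to_add,
        customization_weight=customization_weight)
```

Leaving customization_weight as None lets the service apply its default of 0.3.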

train_custom_model(**kwargs)
upgrade_language_model(customization_id, **kwargs)[source]

Upgrades a custom language model.

Parameters:
  • customization_id (str) – The GUID of the custom language model that is to be upgraded. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • headers (dict) – A dict containing the request headers
Return type:

None

add_corpus(customization_id, corpus_name, corpus_file, allow_overwrite=None, corpus_file_content_type=None, corpus_filename=None, **kwargs)[source]

Adds a corpus text file to a custom language model.

Parameters:
  • customization_id (str) – The GUID of the custom language model to which a corpus is to be added. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • corpus_name (str) – The name of the corpus that is to be added to the custom language model. The name cannot contain spaces and cannot be the string user, which is reserved by the service to denote custom words added or modified by the user. Use a localized name that matches the language of the custom model.
  • corpus_file (file) – A plain text file that contains the training data for the corpus. Encode the file in UTF-8 if it contains non-ASCII characters; the service assumes UTF-8 encoding if it encounters non-ASCII characters. With cURL, use the --data-binary option to upload the file for the request.
  • allow_overwrite (bool) – Indicates whether the specified corpus is to overwrite an existing corpus with the same name. If a corpus with the same name already exists, the request fails unless allow_overwrite is set to true; by default, the parameter is false. The parameter has no effect if a corpus with the same name does not already exist.
  • corpus_file_content_type (str) – The content type of corpus_file.
  • corpus_filename (str) – The filename for corpus_file.
  • headers (dict) – A dict containing the request headers
Return type:

None
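
The naming rules for corpus_name can be enforced before the file is opened. A minimal sketch, assuming speech_to_text is a SpeechToTextV1 instance; upload_corpus is a hypothetical helper, and opening the file in binary mode is an assumption about what the underlying HTTP layer expects.

```python
def upload_corpus(speech_to_text, customization_id, corpus_name, path,
                  allow_overwrite=False):
    """Check the corpus name, then upload the file at `path` (hypothetical helper)."""
    # The name cannot contain spaces; other whitespace is rejected here as well.
    if any(ch.isspace() for ch in corpus_name):
        raise ValueError('corpus_name cannot contain spaces')
    # 'user' is reserved by the service for words added or modified by the user.
    if corpus_name == 'user':
        raise ValueError("corpus_name cannot be the reserved string 'user'")
    # Assumption: binary mode, so UTF-8 bytes are sent unchanged.
    with open(path, 'rb') as corpus_file:
        return speech_to_text.add_corpus(
            customization_id, corpus_name, corpus_file,
            allow_overwrite=allow_overwrite)
```

With allow_overwrite left at False, a second upload under the same name fails, as described for the allow_overwrite parameter.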

delete_corpus(customization_id, corpus_name, **kwargs)[source]

Deletes a corpus from a custom language model.

Parameters:
  • customization_id (str) – The GUID of the custom language model from which a corpus is to be deleted. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • corpus_name (str) – The name of the corpus that is to be deleted from the custom language model.
  • headers (dict) – A dict containing the request headers
Return type:

None

get_corpus(customization_id, corpus_name, **kwargs)[source]

Lists information about a corpus for a custom language model.

Lists information about a corpus from a custom language model. The information includes the total number of words and out-of-vocabulary (OOV) words, name, and status of the corpus. You must use credentials for the instance of the service that owns a model to list its corpora.

Parameters:
  • customization_id (str) – The GUID of the custom language model for which a corpus is to be listed. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • corpus_name (str) – The name of the corpus about which information is to be listed.
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the Corpus response.

Return type:

dict

list_corpora(customization_id, **kwargs)[source]

Lists information about all corpora for a custom language model.

Lists information about all corpora from a custom language model. The information includes the total number of words and out-of-vocabulary (OOV) words, name, and status of each corpus. You must use credentials for the instance of the service that owns a model to list its corpora.

Parameters:
  • customization_id (str) – The GUID of the custom language model for which corpora are to be listed. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the Corpora response.

Return type:

dict
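
Because the service cannot accept new corpora, new words, or training requests while a corpus is being processed, callers typically poll list_corpora until analysis finishes. A minimal sketch: wait_for_corpora is a hypothetical helper, the interval and max_checks defaults are arbitrary, and the response is assumed to be a dict with a corpora list whose entries carry the name and status fields of the Corpus class.

```python
import time

def wait_for_corpora(speech_to_text, customization_id, interval=5.0,
                     max_checks=60, sleep=time.sleep):
    """Poll until every corpus is 'analyzed' (hypothetical helper)."""
    for _ in range(max_checks):
        # Assumption: the dict mirrors the Corpora class, with a 'corpora' list.
        corpora = speech_to_text.list_corpora(customization_id)['corpora']
        failed = [c['name'] for c in corpora if c['status'] == 'undetermined']
        if failed:
            raise RuntimeError('corpus analysis failed for: %s' % ', '.join(failed))
        # A model with no corpora passes this check immediately.
        if all(c['status'] == 'analyzed' for c in corpora):
            return corpora
        sleep(interval)
    raise TimeoutError('corpora were still being processed after %d checks' % max_checks)
```

The sleep argument exists only so that the wait can be replaced in tests.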

add_word(customization_id, word_name, sounds_like=None, display_as=None, **kwargs)[source]

Adds a custom word to a custom language model.

Parameters:
  • customization_id (str) – The GUID of the custom language model to which a word is to be added. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • word_name (str) – The custom word that is to be added to or updated in the custom model. Do not include spaces in the word. Use a - (dash) or _ (underscore) to connect the tokens of compound words.
  • sounds_like (list[str]) – An array of sounds-like pronunciations for the custom word. Specify how words that are difficult to pronounce, foreign words, acronyms, and so on can be pronounced by users. For a word that is not in the service’s base vocabulary, omit the parameter to have the service automatically generate a sounds-like pronunciation for the word. For a word that is in the service’s base vocabulary, use the parameter to specify additional pronunciations for the word. You cannot override the default pronunciation of a word; pronunciations you add augment the pronunciation from the base vocabulary. A word can have at most five sounds-like pronunciations, and a pronunciation can include at most 40 characters not including spaces.
  • display_as (str) – An alternative spelling for the custom word when it appears in a transcript. Use the parameter when you want the word to have a spelling that is different from its usual representation or from its spelling in corpora training data.
  • headers (dict) – A dict containing the request headers
Return type:

None
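
The limits on word_name and sounds_like can be verified on the client before calling add_word. A minimal sketch; check_custom_word is a hypothetical helper and is not part of the SDK.

```python
def check_custom_word(word_name, sounds_like=None):
    """Return sounds_like as a list after checking the documented limits."""
    # The word itself must not contain spaces; use '-' or '_' for compounds.
    if any(ch.isspace() for ch in word_name):
        raise ValueError('word_name must not contain spaces')
    pronunciations = list(sounds_like or [])
    # A word can have at most five sounds-like pronunciations.
    if len(pronunciations) > 5:
        raise ValueError('at most five sounds-like pronunciations are allowed')
    for pronunciation in pronunciations:
        # At most 40 characters, not counting spaces.
        if len(pronunciation.replace(' ', '')) > 40:
            raise ValueError('pronunciation too long: %r' % pronunciation)
    return pronunciations
```

The returned list can then be passed as the sounds_like argument, for example add_word(customization_id, 'IEEE', sounds_like=check_custom_word('IEEE', ['I triple E'])).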

add_custom_word(**kwargs)
add_words(customization_id, words, **kwargs)[source]

Adds one or more custom words to a custom language model.

Parameters:
  • customization_id (str) – The GUID of the custom language model to which words are to be added. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • words (list[CustomWord]) – An array of objects that provides information about each custom word that is to be added to or updated in the custom language model.
  • headers (dict) – A dict containing the request headers
Return type:

None
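
The words argument is a list with one entry per custom word. A minimal sketch of building such a list from simple tuples: make_word_entries is a hypothetical helper, and it assumes that add_words accepts plain dicts keyed by the CustomWord field names (word, sounds_like, display_as). If your version of the SDK requires CustomWord objects, build those instead.

```python
def make_word_entries(words):
    """Build word entries from (word, sounds_like, display_as) tuples."""
    entries = []
    for word, sounds_like, display_as in words:
        entry = {'word': word}
        # Optional fields are omitted when empty so the service can
        # generate a pronunciation or use the default spelling.
        if sounds_like:
            entry['sounds_like'] = list(sounds_like)
        if display_as:
            entry['display_as'] = display_as
        entries.append(entry)
    return entries
```

The result would be passed as add_words(customization_id, make_word_entries([...])).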

add_custom_words(**kwargs)
delete_word(customization_id, word_name, **kwargs)[source]

Deletes a custom word from a custom language model.

Parameters:
  • customization_id (str) – The GUID of the custom language model from which a word is to be deleted. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • word_name (str) – The custom word that is to be deleted from the custom language model.
  • headers (dict) – A dict containing the request headers
Return type:

None

delete_custom_word(**kwargs)
get_word(customization_id, word_name, **kwargs)[source]

Lists a custom word from a custom language model.

Lists information about a custom word from a custom language model. You must use credentials for the instance of the service that owns a model to query information about its words.

Parameters:
  • customization_id (str) – The GUID of the custom language model from which a word is to be queried. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • word_name (str) – The custom word that is to be queried from the custom language model.
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the Word response.

Return type:

dict

get_custom_word(**kwargs)
list_words(customization_id, word_type=None, sort=None, **kwargs)[source]

Lists all custom words from a custom language model.

Lists information about custom words from a custom language model. You can list all words from the custom model’s words resource, only custom words that were added or modified by the user, or only out-of-vocabulary (OOV) words that were extracted from corpora. You can also indicate the order in which the service is to return words; by default, words are listed in ascending alphabetical order. You must use credentials for the instance of the service that owns a model to query information about its words.

Parameters:
  • customization_id (str) – The GUID of the custom language model from which words are to be queried. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • word_type (str) – The type of words to be listed from the custom language model’s words resource:
      * all (the default) shows all words.
      * user shows only custom words that were added or modified by the user.
      * corpora shows only OOV words that were extracted from corpora.
  • sort (str) – Indicates the order in which the words are to be listed, alphabetical or by count. You can prepend an optional + or - to an argument to indicate whether the results are to be sorted in ascending or descending order. By default, words are sorted in ascending alphabetical order. For alphabetical ordering, the lexicographical precedence is numeric values, uppercase letters, and lowercase letters. For count ordering, values with the same count are ordered alphabetically. With cURL, URL encode the + symbol as %2B.
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the Words response.

Return type:

dict
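
The sort argument combines an optional + or - prefix with either alphabetical or count. A minimal sketch of composing it: list_words_sorted is a hypothetical helper, and speech_to_text is assumed to be a SpeechToTextV1 instance.

```python
def list_words_sorted(speech_to_text, customization_id, word_type='all',
                      order='alphabetical', descending=False):
    """Call list_words with a validated word_type and sort value."""
    if word_type not in ('all', 'user', 'corpora'):
        raise ValueError("word_type must be 'all', 'user', or 'corpora'")
    if order not in ('alphabetical', 'count'):
        raise ValueError("order must be 'alphabetical' or 'count'")
    # '+' requests ascending order and '-' descending order.
    sort = ('-' if descending else '+') + order
    return speech_to_text.list_words(
        customization_id, word_type=word_type, sort=sort)
```

For example, list_words_sorted(service, customization_id, word_type='corpora', order='count', descending=True) requests OOV words from corpora with the highest counts first.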

list_custom_words(customization_id, word_type=None, sort=None)[source]
create_acoustic_model(name, base_model_name, description=None, **kwargs)[source]

Creates a custom acoustic model.

Creates a new custom acoustic model for a specified base model. The custom acoustic model can be used only with the base model for which it is created. The model is owned by the instance of the service whose credentials are used to create it.

Parameters:
  • name (str) – A user-defined name for the new custom acoustic model. Use a name that is unique among all custom acoustic models that you own. Use a localized name that matches the language of the custom model. Use a name that describes the acoustic environment of the custom model, such as Mobile custom model or Noisy car custom model.
  • base_model_name (str) – The name of the base language model that is to be customized by the new custom acoustic model. The new custom model can be used only with the base model that it customizes. To determine whether a base model supports acoustic model customization, refer to [Language support for customization](https://console.bluemix.net/docs/services/speech-to-text/custom.html#languageSupport).
  • description (str) – A description of the new custom acoustic model. Use a localized description that matches the language of the custom model.
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the AcousticModel response.

Return type:

dict

delete_acoustic_model(customization_id, **kwargs)[source]

Deletes a custom acoustic model.

Deletes an existing custom acoustic model. The custom model cannot be deleted if another request, such as adding an audio resource to the model, is currently being processed. You must use credentials for the instance of the service that owns a model to delete it.

Parameters:
  • customization_id (str) – The GUID of the custom acoustic model that is to be deleted. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • headers (dict) – A dict containing the request headers
Return type:

None

get_acoustic_model(customization_id, **kwargs)[source]

Lists information about a custom acoustic model.

Lists information about a specified custom acoustic model. You must use credentials for the instance of the service that owns a model to list information about it.

Parameters:
  • customization_id (str) – The GUID of the custom acoustic model for which information is to be returned. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the AcousticModel response.

Return type:

dict

list_acoustic_models(language=None, **kwargs)[source]

Lists information about all custom acoustic models.

Lists information about all custom acoustic models that are owned by an instance of the service. Use the language parameter to see all custom acoustic models for the specified language; omit the parameter to see all custom acoustic models for all languages. You must use credentials for the instance of the service that owns a model to list information about it.

Parameters:
  • language (str) – The identifier of the language for which custom acoustic models are to be returned (for example, en-US). Omit the parameter to see all custom acoustic models owned by the requesting service credentials.
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the AcousticModels response.

Return type:

dict

reset_acoustic_model(customization_id, **kwargs)[source]

Resets a custom acoustic model.

Resets a custom acoustic model by removing all audio resources from the model. Resetting a custom acoustic model initializes the model to its state when it was first created. Metadata such as the name and language of the model are preserved, but the model’s audio resources are removed and must be re-created. You must use credentials for the instance of the service that owns a model to reset it.

Parameters:
  • customization_id (str) – The GUID of the custom acoustic model that is to be reset. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • headers (dict) – A dict containing the request headers
Return type:

None

train_acoustic_model(customization_id, custom_language_model_id=None, **kwargs)[source]

Trains a custom acoustic model.

Parameters:
  • customization_id (str) – The GUID of the custom acoustic model that is to be trained. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • custom_language_model_id (str) – The GUID of a custom language model that is to be used during training of the custom acoustic model. Specify a custom language model that has been trained with verbatim transcriptions of the audio resources or that contains words that are relevant to the contents of the audio resources.
  • headers (dict) – A dict containing the request headers
Return type:

None

upgrade_acoustic_model(customization_id, custom_language_model_id=None, **kwargs)[source]

Upgrades a custom acoustic model.

Parameters:
  • customization_id (str) – The GUID of the custom acoustic model that is to be upgraded. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • custom_language_model_id (str) – If the custom acoustic model was trained with a custom language model, the GUID of that custom language model. The custom language model must be upgraded before the custom acoustic model can be upgraded.
  • headers (dict) – A dict containing the request headers
Return type:

None

add_audio(customization_id, audio_name, audio_resource, content_type, contained_content_type=None, allow_overwrite=None, **kwargs)[source]

Adds an audio resource to a custom acoustic model.

Parameters:
  • customization_id (str) – The GUID of the custom acoustic model to which an audio resource is to be added. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • audio_name (str) – The name of the audio resource that is to be added to the custom acoustic model. The name cannot contain spaces. Use a localized name that matches the language of the custom model.
  • audio_resource (list[str]) – The audio resource that is to be added to the custom acoustic model, an individual audio file or an archive file.
  • content_type (str) – The type of the input: application/zip, application/gzip, audio/basic, audio/flac, audio/l16, audio/mp3, audio/mpeg, audio/mulaw, audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis, audio/wav, audio/webm, audio/webm;codecs=opus, or audio/webm;codecs=vorbis.
  • contained_content_type (str) – For an archive-type resource that contains audio files whose format is not audio/wav, specifies the format of the audio files. The header accepts all of the audio formats supported for use with speech recognition and with the Content-Type header, including the rate, channels, and endianness parameters that are used with some formats. For a complete list of supported audio formats, see [Audio formats](/docs/services/speech-to-text/input.html#formats).
  • allow_overwrite (bool) – Indicates whether the specified audio resource is to overwrite an existing resource with the same name. If a resource with the same name already exists, the request fails unless allow_overwrite is set to true; by default, the parameter is false. The parameter has no effect if a resource with the same name does not already exist.
  • headers (dict) – A dict containing the request headers
Return type:

None
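
Since content_type is a required argument, it can be convenient to derive it from the file name. A minimal sketch: guess_content_type and the extension table are hypothetical, cover only some of the types listed above, and ignore format parameters such as rate or codecs.

```python
import os

# Hypothetical mapping from file extension to a content type listed above.
_CONTENT_TYPES = {
    '.wav': 'audio/wav',
    '.flac': 'audio/flac',
    '.mp3': 'audio/mp3',
    '.ogg': 'audio/ogg',
    '.webm': 'audio/webm',
    '.zip': 'application/zip',
}

def guess_content_type(path):
    """Return a content type for add_audio based on the file extension."""
    name = os.path.basename(path).lower()
    # .tar.gz archives have a two-part extension, so check them first.
    if name.endswith('.tar.gz'):
        return 'application/gzip'
    extension = os.path.splitext(name)[1]
    if extension not in _CONTENT_TYPES:
        raise ValueError('no known content type for %r' % path)
    return _CONTENT_TYPES[extension]
```

For archives whose audio files are not audio/wav, contained_content_type still has to be supplied separately.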

delete_audio(customization_id, audio_name, **kwargs)[source]

Deletes an audio resource from a custom acoustic model.

Parameters:
  • customization_id (str) – The GUID of the custom acoustic model from which an audio resource is to be deleted. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • audio_name (str) – The name of the audio resource that is to be deleted from the custom acoustic model.
  • headers (dict) – A dict containing the request headers
Return type:

None

get_audio(customization_id, audio_name, **kwargs)[source]

Lists information about an audio resource for a custom acoustic model.

Parameters:
  • customization_id (str) – The GUID of the custom acoustic model for which an audio resource is to be listed. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • audio_name (str) – The name of the audio resource about which information is to be listed.
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the AudioListing response.

Return type:

dict

list_audio(customization_id, **kwargs)[source]

Lists information about all audio resources for a custom acoustic model.

Lists information about all audio resources from a custom acoustic model. The information includes the name of the resource and information about its audio data, such as its duration. It also includes the status of the audio resource, which is important for checking the service’s analysis of the resource in response to a request to add it to the custom acoustic model. You must use credentials for the instance of the service that owns a model to list its audio resources.

Parameters:
  • customization_id (str) – The GUID of the custom acoustic model for which audio resources are to be listed. You must make the request with service credentials created for the instance of the service that owns the custom model.
  • headers (dict) – A dict containing the request headers
Returns:

A dict containing the AudioResources response.

Return type:

dict
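
The status of each audio resource and the accumulated minutes of audio are the two values usually checked before training. A minimal sketch that condenses them: summarize_audio is a hypothetical helper, and the response is assumed to be a dict with the total_minutes_of_audio and audio fields of the AudioResources class.

```python
from collections import Counter

def summarize_audio(audio_resources):
    """Count audio resources by status and report the total minutes."""
    # Assumption: each entry in 'audio' has a 'status' of ok,
    # being_processed, or invalid, as described for AudioResource.
    counts = Counter(item['status'] for item in audio_resources['audio'])
    return {
        'total_minutes': audio_resources['total_minutes_of_audio'],
        'ok': counts['ok'],
        'being_processed': counts['being_processed'],
        'invalid': counts['invalid'],
    }
```

A summary with being_processed equal to 0 and invalid equal to 0 suggests the service has finished analyzing every resource.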

class AcousticModel(customization_id, created=None, language=None, versions=None, owner=None, name=None, description=None, base_model_name=None, status=None, progress=None, warnings=None)[source]

Bases: object

AcousticModel.

Attr str customization_id:
 The customization ID (GUID) of the custom acoustic model. Note: When you create a new custom acoustic model, the service returns only the GUID of the new model; it does not return the other fields of this object.
Attr str created:
 (optional) The date and time in Coordinated Universal Time (UTC) at which the custom acoustic model was created. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
Attr str language:
 (optional) The language identifier of the custom acoustic model (for example, en-US).
Attr list[str] versions:
 (optional) A list of the available versions of the custom acoustic model. Each element of the array indicates a version of the base model with which the custom model can be used. Multiple versions exist only if the custom model has been upgraded; otherwise, only a single version is shown.
Attr str owner:
 (optional) The GUID of the service credentials for the instance of the service that owns the custom acoustic model.
Attr str name:
 (optional) The name of the custom acoustic model.
Attr str description:
 (optional) The description of the custom acoustic model.
Attr str base_model_name:
 (optional) The name of the language model for which the custom acoustic model was created.
Attr str status:
 (optional) The current status of the custom acoustic model:
   * pending indicates that the model was created but is waiting either for training data to be added or for the service to finish analyzing added data.
   * ready indicates that the model contains data and is ready to be trained.
   * training indicates that the model is currently being trained.
   * available indicates that the model is trained and ready to use.
   * upgrading indicates that the model is currently being upgraded.
   * failed indicates that training of the model failed.
Attr int progress:
 (optional) A percentage that indicates the progress of the custom acoustic model’s current training. A value of 100 means that the model is fully trained. Note: The progress field does not currently reflect the progress of the training; the field changes from 0 to 100 when training is complete.
Attr str warnings:
 (optional) If the request included unknown parameters, the following message: Unexpected query parameter(s) [‘parameters’] detected, where parameters is a list that includes a quoted string for each unknown parameter.
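
Because the progress field jumps from 0 to 100 only when training completes, the status field is the more useful value to poll. A minimal sketch: wait_until_available is a hypothetical helper, the interval and max_checks defaults are arbitrary, and get_acoustic_model is assumed to return a dict with a status key.

```python
import time

def wait_until_available(speech_to_text, customization_id, interval=10.0,
                         max_checks=120, sleep=time.sleep):
    """Poll a custom acoustic model until its status is 'available'."""
    for _ in range(max_checks):
        model = speech_to_text.get_acoustic_model(customization_id)
        status = model['status']
        if status == 'available':
            return model
        if status == 'failed':
            raise RuntimeError('training failed for model %s' % customization_id)
        # pending, ready, training, and upgrading all mean "check again later".
        sleep(interval)
    raise TimeoutError('model %s not available after %d checks'
                       % (customization_id, max_checks))
```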
class AcousticModels(customizations)[source]

Bases: object

AcousticModels.

Attr list[AcousticModel] customizations:
 An array of objects that provides information about each available custom acoustic model. The array is empty if the requesting service credentials own no custom acoustic models (if no language is specified) or own no custom acoustic models for the specified language.
class AudioDetails(type, codec=None, frequency=None, compression=None)[source]

Bases: object

AudioDetails.

Attr str type:
 The type of the audio resource:
   * audio for an individual audio file.
   * archive for an archive (.zip or .tar.gz) file that contains audio files.
Attr str codec:
 (optional) For an audio-type resource, the codec in which the audio is encoded. Omitted for an archive-type resource.
Attr int frequency:
 (optional) For an audio-type resource, the sampling rate of the audio in Hertz (samples per second). Omitted for an archive-type resource.
Attr str compression:
 (optional) For an archive-type resource, the format of the compressed archive:
   * zip for a .zip file.
   * gzip for a .tar.gz file.
 Omitted for an audio-type resource.
class AudioListing(duration=None, name=None, details=None, status=None, container=None, audio=None)[source]

Bases: object

AudioListing.

Attr float duration:
 (optional) For an audio-type resource, the total seconds of audio in the resource. Omitted for an archive-type resource.
Attr str name:
 (optional) For an audio-type resource, the name of the resource. Omitted for an archive-type resource.
Attr AudioDetails details:
 (optional) For an audio-type resource, an AudioDetails object that provides detailed information about the resource. The object is empty until the service finishes processing the audio. Omitted for an archive-type resource.
Attr str status:
 (optional) For an audio-type resource, the status of the resource:
   * ok indicates that the service has successfully analyzed the audio data. The data can be used to train the custom model.
   * being_processed indicates that the service is still analyzing the audio data. The service cannot accept requests to add new audio resources or to train the custom model until its analysis is complete.
   * invalid indicates that the audio data is not valid for training the custom model (possibly because it has the wrong format or sampling rate, or because it is corrupted).
 Omitted for an archive-type resource.
Attr AudioResource container:
 (optional) For an archive-type resource, an object of type AudioResource that provides information about the resource. Omitted for an audio-type resource.
Attr list[AudioResource] audio:
 (optional) For an archive-type resource, an array of AudioResource objects that provides information about the audio-type resources that are contained in the resource. Omitted for an audio-type resource.
class AudioResource(duration, name, details, status)[source]

Bases: object

AudioResource.

Attr float duration:
 The total seconds of audio in the audio resource.
Attr str name:
 The name of the audio resource.
Attr AudioDetails details:
 An AudioDetails object that provides detailed information about the audio resource. The object is empty until the service finishes processing the audio.
Attr str status:
 The status of the audio resource:
   * ok indicates that the service has successfully analyzed the audio data. The data can be used to train the custom model.
   * being_processed indicates that the service is still analyzing the audio data. The service cannot accept requests to add new audio resources or to train the custom model until its analysis is complete.
   * invalid indicates that the audio data is not valid for training the custom model (possibly because it has the wrong format or sampling rate, or because it is corrupted). For an archive file, the entire archive is invalid if any of its audio files are invalid.
class AudioResources(total_minutes_of_audio, audio)[source]

Bases: object

AudioResources.

Attr float total_minutes_of_audio:
 The total minutes of accumulated audio summed over all of the valid audio resources for the custom acoustic model. You can use this value to determine whether the custom model has too little or too much audio to begin training.
Attr list[AudioResource] audio:
 An array of AudioResource objects that provides information about the audio resources of the custom acoustic model. The array is empty if the custom model has no audio resources.
class Corpora(corpora)[source]

Bases: object

Corpora.

Attr list[Corpus] corpora:
 Information about corpora of the custom model. The array is empty if the custom model has no corpora.
class Corpus(name, total_words, out_of_vocabulary_words, status, error=None)[source]

Bases: object

Corpus.

Attr str name:
 The name of the corpus.
Attr int total_words:
 The total number of words in the corpus. The value is 0 while the corpus is being processed.
Attr int out_of_vocabulary_words:
 The number of OOV words in the corpus. The value is 0 while the corpus is being processed.
Attr str status:
 The status of the corpus:
   * analyzed indicates that the service has successfully analyzed the corpus; the custom model can be trained with data from the corpus.
   * being_processed indicates that the service is still analyzing the corpus; the service cannot accept requests to add new corpora or words, or to train the custom model.
   * undetermined indicates that the service encountered an error while processing the corpus.
Attr str error:
 (optional) If the status of the corpus is undetermined, the following message: Analysis of corpus ‘name’ failed. Please try adding the corpus again by setting the ‘allow_overwrite’ flag to ‘true’.
class CustomWord(word=None, sounds_like=None, display_as=None)[source]

Bases: object

CustomWord.

Attr str word:
 (optional) The custom word that is to be added to or updated in the custom model. The field is required when you specify an array of one or more words; omit it when adding or updating a single word directly. Do not include spaces in the word. Use a - (dash) or _ (underscore) to connect the tokens of compound words.
Attr list[str] sounds_like:
 (optional) An array of sounds-like pronunciations for the custom word. Specify how words that are difficult to pronounce, foreign words, acronyms, and so on can be pronounced by users. For a word that is not in the service’s base vocabulary, omit the parameter to have the service automatically generate a sounds-like pronunciation for the word. For a word that is in the service’s base vocabulary, use the parameter to specify additional pronunciations for the word. You cannot override the default pronunciation of a word; pronunciations you add augment the pronunciation from the base vocabulary. A word can have at most five sounds-like pronunciations, and a pronunciation can include at most 40 characters not including spaces.
Attr str display_as:
 (optional) An alternative spelling for the custom word when it appears in a transcript. Use the parameter when you want the word to have a spelling that is different from its usual representation or from its spelling in corpora training data.
class KeywordResult(normalized_text, start_time, end_time, confidence)[source]

Bases: object

KeywordResult.

Attr str normalized_text:
 A specified keyword normalized to the spoken phrase that matched in the audio input.
Attr float start_time:
 The start time in seconds of the keyword match.
Attr float end_time:
 The end time in seconds of the keyword match.
Attr float confidence:
 A confidence score for the keyword match in the range of 0 to 1.
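
Keyword matches are often filtered by confidence and ordered by position in the audio. A minimal sketch: confident_matches is a hypothetical helper, the 0.5 default threshold is arbitrary, and each match is assumed to be a dict keyed by the KeywordResult field names.

```python
def confident_matches(keyword_results, threshold=0.5):
    """Keep matches at or above `threshold`, ordered by start time."""
    kept = [match for match in keyword_results
            if match['confidence'] >= threshold]
    return sorted(kept, key=lambda match: match['start_time'])
```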
class LanguageModel(customization_id, created=None, language=None, dialect=None, versions=None, owner=None, name=None, description=None, base_model_name=None, status=None, progress=None, warnings=None)[source]

Bases: object

LanguageModel.

Attr str customization_id:
 The customization ID (GUID) of the custom language model. Note: When you create a new custom language model, the service returns only the GUID of the new model; it does not return the other fields of this object.
Attr str created:
 (optional) The date and time in Coordinated Universal Time (UTC) at which the custom language model was created. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
Attr str language:
 (optional) The language identifier of the custom language model (for example, en-US).
Attr str dialect:
 (optional) The dialect of the language for the custom language model. By default, the dialect matches the language of the base model; for example, en-US for either of the US English language models. For Spanish models, the field indicates the dialect for which the model was created:
   * es-ES for Castilian Spanish (the default).
   * es-LA for Latin American Spanish.
   * es-US for North American (Mexican) Spanish.
Attr list[str] versions:
 (optional) A list of the available versions of the custom language model. Each element of the array indicates a version of the base model with which the custom model can be used. Multiple versions exist only if the custom model has been upgraded; otherwise, only a single version is shown.
Attr str owner:
 (optional) The GUID of the service credentials for the instance of the service that owns the custom language model.
Attr str name:
 (optional) The name of the custom language model.
Attr str description:
 (optional) The description of the custom language model.
Attr str base_model_name:
 (optional) The name of the language model for which the custom language model was created.
Attr str status:
 (optional) The current status of the custom language model:
   * pending indicates that the model was created but is waiting either for training data to be added or for the service to finish analyzing added data.
   * ready indicates that the model contains data and is ready to be trained.
   * training indicates that the model is currently being trained.
   * available indicates that the model is trained and ready to use.
   * upgrading indicates that the model is currently being upgraded.
   * failed indicates that training of the model failed.
Attr int progress:
 (optional) A percentage that indicates the progress of the custom language model’s current training. A value of 100 means that the model is fully trained. Note: The progress field does not currently reflect the progress of the training; the field changes from 0 to 100 when training is complete.
Attr str warnings:
 (optional) If the request included unknown parameters, the following message: Unexpected query parameter(s) [‘parameters’] detected, where parameters is a list that includes a quoted string for each unknown parameter.
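The status values above suggest a simple lookup when deciding what to do with a custom model. The following is a minimal sketch, not part of the library: it assumes the model is available as a plain dict whose keys match the attribute names listed here, and the helper names next_step and is_usable as well as the hint strings are hypothetical.

```python
# Hints keyed by the documented status values. The hint wording is
# illustrative only and does not come from the service.
NEXT_STEP = {
    "pending": "add training data or wait for the service to analyze it",
    "ready": "train the model",
    "training": "wait for training to finish",
    "available": "use the model for recognition",
    "upgrading": "wait for the upgrade to finish",
    "failed": "training failed; review the model before retraining",
}


def next_step(model):
    """Return a hint for a LanguageModel-shaped dict (hypothetical helper)."""
    status = model.get("status")
    return NEXT_STEP.get(status, "unknown status: %r" % (status,))


def is_usable(model):
    """Only a model whose status is 'available' is trained and ready to use."""
    return model.get("status") == "available"
```

Because the progress field is documented as jumping from 0 to 100 only when training completes, the sketch checks status rather than progress.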
class LanguageModels(customizations)[source]

Bases: object

LanguageModels.

Attr list[LanguageModel] customizations:
 An array of objects that provides information about each available custom language model. The array is empty if the requesting service credentials own no custom language models at all (when no language is specified) or none for the specified language.
class RecognitionJob(id, status, created, updated=None, url=None, user_token=None, results=None, warnings=None)[source]

Bases: object

RecognitionJob.

Attr str id:
 The ID of the job.
Attr str status:
 The current status of the job:
  • waiting: The service is preparing the job for processing. The service returns this status when the job is initially created or when it is waiting for capacity to process the job. The job remains in this state until the service has the capacity to begin processing it.
  • processing: The service is actively processing the job.
  • completed: The service has finished processing the job. If the job specified a callback URL and the event recognitions.completed_with_results, the service sent the results with the callback notification; otherwise, you must retrieve the results by checking the individual job.
  • failed: The job failed.
Attr str created:
 The date and time in Coordinated Universal Time (UTC) at which the job was created. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
Attr str updated:
 (optional) The date and time in Coordinated Universal Time (UTC) at which the job was last updated by the service. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD). Note: This field is returned only when you list information about a specific job or about all existing jobs.
Attr str url:
 (optional) The URL to use to request information about the job with the Check a job method. Note: This field is returned only when you create a new job.
Attr str user_token:
 (optional) The user token associated with a job that was created with a callback URL and a user token. Note: This field can be returned only when you list information about all existing jobs.
Attr list[SpeechRecognitionResults] results:
 (optional) If the status is completed, the results of the recognition request as an array that includes a single instance of a SpeechRecognitionResults object. Note: This field can be returned only when you list information about a specific existing job.
Attr list[str] warnings:
 (optional) An array of warning messages about invalid parameters included with the request. Each warning includes a descriptive message and a list of invalid argument strings, for example, “unexpected query parameter ‘user_token’, query parameter ‘callback_url’ was not specified”. The request succeeds despite the warnings. Note: This field can be returned only when you create a new job.
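When no callback URL is used, the job has to be checked until it reaches completed or failed. The sketch below shows one way to poll, under stated assumptions: fetch_job is a hypothetical caller-supplied function that returns the current RecognitionJob as a dict with a status key, and wait_for_job, poll_seconds, and max_polls are illustrative names rather than library API.

```python
import time


def wait_for_job(fetch_job, poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Poll until the job status is 'completed' or 'failed'.

    fetch_job is a hypothetical zero-argument callable that returns the
    job as a dict; how it talks to the service is up to the caller.
    """
    for _ in range(max_polls):
        job = fetch_job()
        # 'waiting' and 'processing' are the two non-terminal statuses.
        if job.get("status") in ("completed", "failed"):
            return job
        sleep(poll_seconds)
    raise RuntimeError("job did not finish after %d polls" % max_polls)
```

The sleep parameter is injectable so that the loop can be exercised without real delays.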
class RecognitionJobs(recognitions)[source]

Bases: object

RecognitionJobs.

Attr list[RecognitionJob] recognitions:
 An array of objects that provides the status for each of the user’s current jobs. The array is empty if the user has no current jobs.
class RegisterStatus(status, url)[source]

Bases: object

RegisterStatus.

Attr str status:
 The current status of the job:
  • created if the callback URL was successfully white-listed as a result of the call.
  • already created if the URL was already white-listed.
Attr str url:
 The callback URL that is successfully registered.
class SpeakerLabelsResult(_from, to, speaker, confidence, final_results)[source]

Bases: object

SpeakerLabelsResult.

Attr float _from:
 The start time of a word from the transcript. The value matches the start time of a word from the timestamps array.
Attr float to:
 The end time of a word from the transcript. The value matches the end time of a word from the timestamps array.
Attr int speaker:
 The numeric identifier that the service assigns to a speaker from the audio. Speaker IDs begin at 0 initially but can evolve and change across interim results (if supported by the method) and between interim and final results as the service processes the audio. They are not guaranteed to be sequential, contiguous, or ordered.
Attr float confidence:
 A score that indicates the service’s confidence in its identification of the speaker in the range of 0 to 1.
Attr bool final_results:
 An indication of whether the service might further change word and speaker-label results. A value of true means that the service guarantees not to send any further updates for the current or any preceding results; false means that the service might send further updates to the results.
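Because the start and end times of each speaker label match a word in the timestamps array, the two can be joined to produce per-speaker turns. The following is a rough sketch under assumptions: words_by_speaker is a hypothetical helper, speaker labels are plain dicts, and the start-time key is assumed to be either from or _from.

```python
def words_by_speaker(timestamps, speaker_labels):
    """Group consecutive words by speaker (hypothetical helper).

    timestamps: list of [word, start, end] lists.
    speaker_labels: list of dicts with 'from' (or '_from'), 'to', 'speaker'.
    """
    # Index speaker ids by the (start, end) pair of each labelled word.
    speaker_at = {}
    for label in speaker_labels:
        start = label.get("from", label.get("_from"))
        speaker_at[(start, label["to"])] = label["speaker"]

    turns = []
    for word, start, end in timestamps:
        speaker = speaker_at.get((start, end))  # None if the word is unlabelled
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(word)
        else:
            turns.append((speaker, [word]))
    return [(speaker, " ".join(words)) for speaker, words in turns]
```

Since speaker IDs can change until final_results is true, a grouping like this is best applied only to final results.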
class SpeechModel(name, language, rate, url, supported_features, description, sessions=None)[source]

Bases: object

SpeechModel.

Attr str name:
 The name of the model for use as an identifier in calls to the service (for example, en-US_BroadbandModel).
Attr str language:
 The language identifier for the model (for example, en-US).
Attr int rate:
 The sampling rate (minimum acceptable rate for audio) used by the model in Hertz.
Attr str url:
 The URI for the model.
Attr SupportedFeatures supported_features:
 Describes the additional service features supported with the model.
Attr str description:
 Brief description of the model.
Attr str sessions:
 (optional) The URI for the model for use with the Create a session method. (Returned only for requests for a single model with the Get a model method.)
class SpeechModels(models)[source]

Bases: object

SpeechModels.

Attr list[SpeechModel] models:
 Information about each available model.
class SpeechRecognitionAlternative(transcript, confidence=None, timestamps=None, word_confidence=None)[source]

Bases: object

SpeechRecognitionAlternative.

Attr str transcript:
 A transcription of the audio.
Attr float confidence:
 (optional) A score that indicates the service’s confidence in the transcript in the range of 0 to 1. Available only for the best alternative and only in results marked as final.
Attr list[str] timestamps:
 (optional) Time alignments for each word from the transcript as a list of lists. Each inner list consists of three elements: the word followed by its start and end time in seconds. Example: [[“hello”,0.0,1.2],[“world”,1.2,2.5]]. Available only for the best alternative.
Attr list[str] word_confidence:
 (optional) A confidence score for each word of the transcript as a list of lists. Each inner list consists of two elements: the word and its confidence score in the range of 0 to 1. Example: [[“hello”,0.95],[“world”,0.866]]. Available only for the best alternative and only in results marked as final.
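The list-of-lists layout of timestamps and word_confidence can be unpacked directly. The following sketch uses the example values from the descriptions above; word_durations, low_confidence_words, and the 0.9 threshold are hypothetical and chosen only for illustration.

```python
def word_durations(timestamps):
    """Return (word, duration in seconds) for each [word, start, end] entry."""
    return [(word, round(end - start, 3)) for word, start, end in timestamps]


def low_confidence_words(word_confidence, threshold=0.9):
    """Return the words whose confidence score falls below the threshold."""
    return [word for word, score in word_confidence if score < threshold]


timestamps = [["hello", 0.0, 1.2], ["world", 1.2, 2.5]]
word_confidence = [["hello", 0.95], ["world", 0.866]]

durations = word_durations(timestamps)
uncertain = low_confidence_words(word_confidence)
```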
class SpeechRecognitionResult(final_results, alternatives, keywords_result=None, word_alternatives=None)[source]

Bases: object

SpeechRecognitionResult.

Attr bool final_results:
 An indication of whether the transcription results are final. If true, the results for this utterance are not updated further; no additional results are sent for a result_index once its results are indicated as final.
Attr list[SpeechRecognitionAlternative] alternatives:
 An array of alternative transcripts. The alternatives array can include additional requested output such as word confidence or timestamps.
Attr dict keywords_result:
 (optional) A dictionary (or associative array) whose keys are the strings specified for keywords if both that parameter and keywords_threshold are specified. A keyword for which no matches are found is omitted from the array. You can spot a maximum of 1000 keywords. The array is omitted if no keywords are found.
Attr list[WordAlternativeResults] word_alternatives:
 (optional) An array of alternative hypotheses found for words of the input audio if a word_alternatives_threshold is specified.
class SpeechRecognitionResults(results=None, result_index=None, speaker_labels=None, warnings=None)[source]

Bases: object

SpeechRecognitionResults.

Attr list[SpeechRecognitionResult] results:
 (optional) An array that can include interim and final results (interim results are returned only if supported by the method). Final results are guaranteed not to change; interim results might be replaced by further interim results and final results. The service periodically sends updates to the results list. With each update, result_index is set to the lowest index in the array that has changed, and it is incremented for new results.
Attr int result_index:
 (optional) An index that indicates a change point in the results array. The service increments the index only for additional results that it sends for new audio for the same request.
Attr list[SpeakerLabelsResult] speaker_labels:
 (optional) An array that identifies which words were spoken by which speakers in a multi-person exchange. Returned in the response only if speaker_labels is true. When interim results are also requested for methods that support them, it is possible for a SpeechRecognitionResults object to include only the speaker_labels field.
Attr list[str] warnings:
 (optional) An array of warning messages associated with the request:
  • Warnings for invalid parameters or JSON fields can include a descriptive message and a list of invalid argument strings, for example, “Unknown arguments:” or “Unknown url query arguments:” followed by a list of the form “invalid_arg_1, invalid_arg_2.”
  • The following warning is returned if the request passes a custom model that is based on an older version of a base model for which an updated version is available: “Using previous version of base model, because your custom model has been built with it. Please note that this version will be supported only for a limited time. Consider updating your custom model to the new base model. If you do not do that you will be automatically switched to base model when you used the non-updated custom model.”
 In both cases, the request succeeds despite the warnings.
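A common use of a SpeechRecognitionResults response is to join the transcripts of the final results. The following is a minimal sketch under assumptions: the response is a plain dict, the finality flag is stored under final_results as in the attribute name here (the helper also accepts final in case the raw response uses that key), and the first entry in alternatives is taken to be the best one. The name final_transcript is hypothetical.

```python
def final_transcript(response):
    """Join the first transcript of every final result (hypothetical helper)."""
    pieces = []
    for result in response.get("results") or []:
        # Accept either key name for the finality flag (assumption).
        is_final = result.get("final_results", result.get("final", False))
        alternatives = result.get("alternatives") or []
        if is_final and alternatives:
            # Assumes the first alternative is the best one.
            pieces.append(alternatives[0]["transcript"].strip())
    return " ".join(pieces)
```

Interim results are skipped because they might still be replaced by later updates.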
class SupportedFeatures(custom_language_model, speaker_labels)[source]

Bases: object

SupportedFeatures.

Attr bool custom_language_model:
 Indicates whether the customization interface can be used to create a custom language model based on the language model.
Attr bool speaker_labels:
 Indicates whether the speaker_labels parameter can be used with the language model.
class Word(word, sounds_like, display_as, count, source, error=None)[source]

Bases: object

Word.

Attr str word:
 A word from the custom model’s words resource. The spelling of the word is used to train the model.
Attr list[str] sounds_like:
 An array of pronunciations for the word. The array can include the sounds-like pronunciation automatically generated by the service if none is provided for the word; the service adds this pronunciation when it finishes processing the word.
Attr str display_as:
 The spelling of the word that the service uses to display the word in a transcript. The field contains an empty string if no display-as value is provided for the word, in which case the word is displayed as it is spelled.
Attr int count:
 A sum of the number of times the word is found across all corpora. For example, if the word occurs five times in one corpus and seven times in another, its count is 12. If you add a custom word to a model before it is added by any corpora, the count begins at 1; if the word is added from a corpus first and later modified, the count reflects only the number of times it is found in corpora.
Attr list[str] source:
 An array of sources that describes how the word was added to the custom model’s words resource. For OOV words added from a corpus, includes the name of the corpus; if the word was added by multiple corpora, the names of all corpora are listed. If the word was modified or added by the user directly, the field includes the string user.
Attr list[WordError] error:
 (optional) If the service discovered one or more problems that you need to correct for the word’s definition, an array that describes each of the errors.
class WordAlternativeResult(confidence, word)[source]

Bases: object

WordAlternativeResult.

Attr float confidence:
 A confidence score for the word alternative hypothesis in the range of 0 to 1.
Attr str word:
 An alternative hypothesis for a word from the input audio.
class WordAlternativeResults(start_time, end_time, alternatives)[source]

Bases: object

WordAlternativeResults.

Attr float start_time:
 The start time in seconds of the word from the input audio that corresponds to the word alternatives.
Attr float end_time:
 The end time in seconds of the word from the input audio that corresponds to the word alternatives.
Attr list[WordAlternativeResult] alternatives:
 An array of alternative hypotheses for a word from the input audio.
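Each WordAlternativeResults entry covers one time span and carries several scored hypotheses, so picking the highest-confidence word per span is a natural first step. The following sketch assumes the entries are plain dicts keyed by the attribute names above; best_word_alternatives is a hypothetical helper name.

```python
def best_word_alternatives(word_alternatives):
    """Return (start_time, end_time, word, confidence) for the top
    hypothesis of each time span (hypothetical helper)."""
    best = []
    for slot in word_alternatives:
        alternatives = slot.get("alternatives") or []
        if not alternatives:
            continue  # nothing to choose from for this span
        top = max(alternatives, key=lambda alt: alt["confidence"])
        best.append(
            (slot["start_time"], slot["end_time"], top["word"], top["confidence"])
        )
    return best
```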
class WordError(element)[source]

Bases: object

WordError.

Attr str element:
 A key-value pair that describes an error associated with the definition of a word in the words resource. Each pair has the format “element”: “message”, where element is the aspect of the definition that caused the problem and message describes the problem. The following example describes a problem with one of the word’s sounds-like definitions: “sounds_like_string”: “Numbers are not allowed in sounds-like. You can try for example ‘suggested_string’.” You must correct the error before you can train the model.
class Words(words)[source]

Bases: object

Words.

Attr list[Word] words:
 Information about each word in the custom model’s words resource. The array is empty if the custom model has no words.
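Since word errors must be corrected before the model can be trained, it can help to collect them from a Words listing first. The following is a rough sketch under assumptions: the listing is a plain dict with a words key, each word entry is a dict, and each error is a dict holding one element-to-message pair. The name words_with_errors is hypothetical.

```python
def words_with_errors(words_response):
    """Map each problematic word to its error messages (hypothetical helper)."""
    problems = {}
    for entry in words_response.get("words") or []:
        errors = entry.get("error") or []
        messages = []
        for error in errors:
            # Each error is assumed to be a dict of element -> message.
            for element, message in error.items():
                messages.append("%s: %s" % (element, message))
        if messages:
            problems[entry["word"]] = messages
    return problems
```

An empty result means that no word in the listing reports an error.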