
SpeechToText Class

This class wraps the Watson Speech to Text service.
Inheritance Hierarchy
System.Object
  IBM.Watson.DeveloperCloud.Services.SpeechToText.v1.SpeechToText

Namespace:  IBM.Watson.DeveloperCloud.Services.SpeechToText.v1
Assembly:  unity-documentation (in unity-documentation.exe) Version: 1.0.0.0 (1.0.0.0)
Syntax
C#
public class SpeechToText : IWatsonService

The SpeechToText type exposes the following members.

Constructors
  Name | Description
Public method SpeechToText
Speech to Text constructor.
Top
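A minimal construction sketch in C#. This assumes the constructor accepts a Credentials instance (as the Credentials property below suggests); the username, password, and endpoint URL are placeholders, not values from this document:

```csharp
using IBM.Watson.DeveloperCloud.Services.SpeechToText.v1;
using IBM.Watson.DeveloperCloud.Utilities;

// Placeholder credentials; substitute your own service values and endpoint.
Credentials credentials = new Credentials("<username>", "<password>", "https://stream.watsonplatform.net/speech-to-text/api");
SpeechToText speechToText = new SpeechToText(credentials);
```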
Properties
  Name | Description
Public property AcousticCustomizationId
Specifies the Globally Unique Identifier (GUID) of a custom acoustic model that is to be used for all requests sent over the connection. The base model of the custom acoustic model must match the value of the model parameter. By default, no custom acoustic model is used. For more information, see https://console.bluemix.net/docs/services/speech-to-text/custom.html.
Public property AudioSent
True if AudioData has been sent and we are recognizing speech.
Public property Credentials
Gets and sets the credentials of the service. Replaces the default endpoint if one is defined.
Public property CustomizationId
Specifies the Globally Unique Identifier (GUID) of a custom language model that is to be used for all requests sent over the connection. The base model of the custom language model must match the value of the model parameter. By default, no custom language model is used. For more information, see https://console.bluemix.net/docs/services/speech-to-text/custom.html.
Public property CustomizationWeight
Specifies the weight the service gives to words from a specified custom language model compared to those from the base model for all requests sent over the connection. Specify a value between 0.0 and 1.0; the default value is 0.3. For more information, see https://console.bluemix.net/docs/services/speech-to-text/language-use.html#weight.
Public property DetectSilence
If true, then we will try not to send silent audio clips to the server. This can save bandwidth when no sound is happening.
Public property EnableInterimResults
If true, then we will get interim results while recognizing. The user will then need to check the Final flag on the results.
Public property EnableTimestamps
True to return timestamps of words with results.
Public property EnableWordConfidence
True to return word confidence with results.
Public property InactivityTimeout
NON-MULTIPART ONLY: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.
Public property IsListening
True if StartListening() has been called.
Public property Keywords
NON-MULTIPART ONLY: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.
Public property KeywordsThreshold
NON-MULTIPART ONLY: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
Public property LoadFile
Set this property to override the internal file loading of this class.
Public property MaxAlternatives
Returns the maximum number of alternatives returned by recognize.
Public property OnError
This delegate is invoked when an error occurs.
Public property ProfanityFilter
NON-MULTIPART ONLY: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.
Public property RecognizeModel
This property controls which recognize model we use when making recognize requests of the server.
Public property SilenceThreshold
A value from 0.0 to 1.0 that determines what is considered silence. If the absolute value of the audio level is below this value, the audio is considered silence.
Public property SmartFormatting
NON-MULTIPART ONLY: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.
Public property SpeakerLabels
NON-MULTIPART ONLY: Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. If true, speaker labels are returned; if false (the default), they are not. Speaker labels can be returned only for the following language models: en-US_NarrowbandModel, en-US_BroadbandModel, es-ES_NarrowbandModel, es-ES_BroadbandModel, ja-JP_NarrowbandModel, and ja-JP_BroadbandModel. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.
Public property StreamMultipart
If true, sets the `Transfer-Encoding` request header to `chunked`, causing the audio to be streamed to the service. By default, audio is sent all at once as a one-shot delivery. See https://console.bluemix.net/docs/services/speech-to-text/input.html#transmission.
Public property Url
Gets and sets the endpoint URL for the service.
Public property WordAlternativesThreshold
NON-MULTIPART ONLY: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
Top
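Several of the properties above are typically set before starting recognition. A minimal configuration sketch; the property names come from the table above, but the literal values and exact numeric types are illustrative assumptions:

```csharp
speechToText.RecognizeModel = "en-US_BroadbandModel"; // base model for recognize requests
speechToText.DetectSilence = true;                    // skip sending silent clips
speechToText.SilenceThreshold = 0.03f;                // audio levels below this count as silence
speechToText.EnableInterimResults = true;             // check the Final flag on results
speechToText.EnableTimestamps = true;                 // return word timestamps
speechToText.Keywords = new string[] { "watson", "unity" }; // keywords to spot (final hypothesis only)
speechToText.KeywordsThreshold = 0.5f;                // minimum confidence for a keyword match
```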
Methods
  Name | Description
Public method AddAcousticResource
Adds an audio resource to a custom acoustic model.
Public method AddCustomCorpus
Overload method for AddCustomCorpus that takes string training data.
Public method AddCustomWords(SpeechToText.SuccessCallback<Boolean>, SpeechToText.FailCallback, String, Words, Dictionary<String, Object>)
Adds one or more custom words to a custom language model. The service populates the words resource for a custom model with out-of-vocabulary (OOV) words found in each corpus added to the model. You can use this method to add additional words or to modify existing words in the words resource. Only the owner of a custom model can use this method to add or modify custom words associated with the model. Adding or modifying custom words does not affect the custom model until you train the model on the new data by using the POST /v1/customizations/{customization_id}/train method. You add custom words by providing a Words object, which is an array of Word objects, one per word. You must use the object's word parameter to identify the word that is to be added. You can also provide one or both of the optional sounds_like and display_as fields for each word. The sounds_like field provides an array of one or more pronunciations for the word. Use the parameter to specify how the word can be pronounced by users. Use the parameter for words that are difficult to pronounce, foreign words, acronyms, and so on. For example, you might specify that the word IEEE can sound like i triple e. You can specify a maximum of five sounds-like pronunciations for a word, and each pronunciation must adhere to the following rules: Use English alphabetic characters: a-z and A-Z. To pronounce a single letter, use the letter followed by a period, for example, N.C.A.A. for the word NCAA. Use real or made-up words that are pronounceable in the native language, for example, shuchensnie for the word Sczcesny. Substitute equivalent English letters for non-English letters, for example, s for ç or ny for ñ. Substitute non-accented letters for accented letters, for example, a for à or e for è. Use the spelling of numbers, for example, seventy-five for 75. You can include multiple words separated by spaces, but the service enforces a maximum of 40 total characters not including spaces.
The display_as field provides a different way of spelling the word in a transcript. Use the parameter when you want the word to appear different from its usual representation or from its spelling in corpora training data. For example, you might indicate that the word IBM (trademark) is to be displayed as IBM™. If you add a custom word that already exists in the words resource for the custom model, the new definition overrides the existing data for the word. If the service encounters an error with the input data, it returns a failure code and does not add any of the words to the words resource. The call returns an HTTP 201 response code if the input data is valid. It then asynchronously pre-processes the words to add them to the model's words resource. The time that it takes for the analysis to complete depends on the number of new words that you add but is generally faster than adding a corpus or training a model. You can use the GET /v1/customizations/{customization_id}/words or GET /v1/customizations/{customization_id}/words/{word_name} method to review the words that you add. Words with an invalid sounds_like field include an error field that describes the problem. You can use other words methods to correct errors, eliminate typos, and modify how words are pronounced as needed. Note: This method is currently a beta release that is available for US English only.
Public method AddCustomWords(SpeechToText.SuccessCallback<Boolean>, SpeechToText.FailCallback, String, String, Dictionary<String, Object>)
Adds one or more custom words to a custom language model. The service populates the words resource for a custom model with out-of-vocabulary (OOV) words found in each corpus added to the model. You can use this method to add additional words or to modify existing words in the words resource. Only the owner of a custom model can use this method to add or modify custom words associated with the model. Adding or modifying custom words does not affect the custom model until you train the model on the new data by using the POST /v1/customizations/{customization_id}/train method. You add custom words by providing a Words object, which is an array of Word objects, one per word. You must use the object's word parameter to identify the word that is to be added. You can also provide one or both of the optional sounds_like and display_as fields for each word. The sounds_like field provides an array of one or more pronunciations for the word. Use the parameter to specify how the word can be pronounced by users. Use the parameter for words that are difficult to pronounce, foreign words, acronyms, and so on. For example, you might specify that the word IEEE can sound like i triple e. You can specify a maximum of five sounds-like pronunciations for a word, and each pronunciation must adhere to the following rules: Use English alphabetic characters: a-z and A-Z. To pronounce a single letter, use the letter followed by a period, for example, N.C.A.A. for the word NCAA. Use real or made-up words that are pronounceable in the native language, for example, shuchensnie for the word Sczcesny. Substitute equivalent English letters for non-English letters, for example, s for ç or ny for ñ. Substitute non-accented letters for accented letters, for example, a for à or e for è. Use the spelling of numbers, for example, seventy-five for 75. You can include multiple words separated by spaces, but the service enforces a maximum of 40 total characters not including spaces.
The display_as field provides a different way of spelling the word in a transcript. Use the parameter when you want the word to appear different from its usual representation or from its spelling in corpora training data. For example, you might indicate that the word IBM (trademark) is to be displayed as IBM™. If you add a custom word that already exists in the words resource for the custom model, the new definition overrides the existing data for the word. If the service encounters an error with the input data, it returns a failure code and does not add any of the words to the words resource. The call returns an HTTP 201 response code if the input data is valid. It then asynchronously pre-processes the words to add them to the model's words resource. The time that it takes for the analysis to complete depends on the number of new words that you add but is generally faster than adding a corpus or training a model. You can use the GET /v1/customizations/{customization_id}/words or GET /v1/customizations/{customization_id}/words/{word_name} method to review the words that you add. Words with an invalid sounds_like field include an error field that describes the problem. You can use other words methods to correct errors, eliminate typos, and modify how words are pronounced as needed. Note: This method is currently a beta release that is available for US English only.
Public method AddCustomWords(SpeechToText.SuccessCallback<Boolean>, SpeechToText.FailCallback, String, Boolean, String, Dictionary<String, Object>)
Adds one or more custom words to a custom language model. The service populates the words resource for a custom model with out-of-vocabulary (OOV) words found in each corpus added to the model. You can use this method to add additional words or to modify existing words in the words resource. Only the owner of a custom model can use this method to add or modify custom words associated with the model. Adding or modifying custom words does not affect the custom model until you train the model on the new data by using the POST /v1/customizations/{customization_id}/train method. You add custom words by providing a Words object, which is an array of Word objects, one per word. You must use the object's word parameter to identify the word that is to be added. You can also provide one or both of the optional sounds_like and display_as fields for each word. The sounds_like field provides an array of one or more pronunciations for the word. Use the parameter to specify how the word can be pronounced by users. Use the parameter for words that are difficult to pronounce, foreign words, acronyms, and so on. For example, you might specify that the word IEEE can sound like i triple e. You can specify a maximum of five sounds-like pronunciations for a word, and each pronunciation must adhere to the following rules: Use English alphabetic characters: a-z and A-Z. To pronounce a single letter, use the letter followed by a period, for example, N.C.A.A. for the word NCAA. Use real or made-up words that are pronounceable in the native language, for example, shuchensnie for the word Sczcesny. Substitute equivalent English letters for non-English letters, for example, s for ç or ny for ñ. Substitute non-accented letters for accented letters, for example, a for à or e for è. Use the spelling of numbers, for example, seventy-five for 75. You can include multiple words separated by spaces, but the service enforces a maximum of 40 total characters not including spaces.
The display_as field provides a different way of spelling the word in a transcript. Use the parameter when you want the word to appear different from its usual representation or from its spelling in corpora training data. For example, you might indicate that the word IBM (trademark) is to be displayed as IBM™. If you add a custom word that already exists in the words resource for the custom model, the new definition overrides the existing data for the word. If the service encounters an error with the input data, it returns a failure code and does not add any of the words to the words resource. The call returns an HTTP 201 response code if the input data is valid. It then asynchronously pre-processes the words to add them to the model's words resource. The time that it takes for the analysis to complete depends on the number of new words that you add but is generally faster than adding a corpus or training a model. You can use the GET /v1/customizations/{customization_id}/words or GET /v1/customizations/{customization_id}/words/{word_name} method to review the words that you add. Words with an invalid sounds_like field include an error field that describes the problem. You can use other words methods to correct errors, eliminate typos, and modify how words are pronounced as needed. Note: This method is currently a beta release that is available for US English only.
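As a sketch of the Words payload described above, assuming the SDK's Words and Word model classes expose lowercase word, sounds_like, and display_as members matching the service fields; the callback names and customizationId are placeholders:

```csharp
// Hypothetical payload: one Word with a sounds-like pronunciation, per the rules above.
Words words = new Words()
{
    words = new Word[]
    {
        new Word()
        {
            word = "IEEE",
            sounds_like = new string[] { "i triple e" },
            display_as = "IEEE"
        }
    }
};

// OnAddWordsSuccess / OnAddWordsFail are placeholder callbacks.
speechToText.AddCustomWords(OnAddWordsSuccess, OnAddWordsFail, customizationId, words);
```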
Public method CreateAcousticCustomization
Creates a custom acoustic model.
Public method CreateCustomization
Creates a new custom language model for a specified base language model. The custom language model can be used only with the base language model for which it is created. The new model is owned by the individual whose service credentials are used to create it. Note: This method is currently a beta release that is available for US English only.
Public method DeleteAcousticCustomization
Deletes a custom acoustic model.
Public method DeleteAcousticResource
Deletes an audio resource from a custom acoustic model.
Public method DeleteCustomCorpus
Deletes an existing corpus from a custom language model. The service removes any out-of-vocabulary (OOV) words associated with the corpus from the custom model's words resource unless they were also added by another corpus or they have been modified in some way with the POST /v1/customizations/{customization_id}/words or PUT /v1/customizations/{customization_id}/words/{word_name} method. Removing a corpus does not affect the custom model until you train the model with the POST /v1/customizations/{customization_id}/train method. Only the owner of a custom model can use this method to delete a corpus from the model. Note: This method is currently a beta release that is available for US English only.
Public method DeleteCustomization
Deletes an existing custom language model. Only the owner of a custom model can use this method to delete the model. Note: This method is currently a beta release that is available for US English only.
Public method DeleteCustomWord
Deletes a custom word from a custom language model. You can remove any word that you added to the custom model's words resource via any means. However, if the word also exists in the service's base vocabulary, the service removes only the custom pronunciation for the word; the word remains in the base vocabulary. Removing a custom word does not affect the custom model until you train the model with the POST /v1/customizations/{customization_id}/train method. Only the owner of a custom model can use this method to delete a word from the model. Note: This method is currently a beta release that is available for US English only.
Public method GetCustomAcousticModel
Lists information about a custom acoustic model.
Public method GetCustomAcousticModels
Lists information about all custom acoustic models.
Public method GetCustomAcousticResource
Lists information about an audio resource for a custom acoustic model.
Public method GetCustomAcousticResources
Lists information about all audio resources for a custom acoustic model.
Public method GetCustomCorpora
Lists information about all corpora that have been added to the specified custom language model. The information includes the total number of words and out-of-vocabulary (OOV) words, name, and status of each corpus. Only the owner of a custom model can use this method to list the model's corpora. Note: This method is currently a beta release that is available for US English only.
Public method GetCustomCorpus
Lists information about a corpus that has been added to the specified custom language model. The information includes the total number of words and out-of-vocabulary (OOV) words, the name, and the status of the corpus. Only the owner of a custom model can use this method to query a corpus of the model. Note: This method is currently a beta release that is available for US English only.
Public method GetCustomization
Lists information about a custom language model. Only the owner of a custom model can use this method to query information about the model. Note: This method is currently a beta release that is available for US English only.
Public method GetCustomizations
Lists information about all custom language models that are owned by the calling user. Use the language query parameter to see all custom models for the specified language; omit the parameter to see all custom models for all languages. Note: This method is currently a beta release that is available for US English only.
Public method GetCustomWord
Lists information about a custom word from a custom language model. Only the owner of a custom model can use this method to query a word from the model. Note: This method is currently a beta release that is available for US English only.
Public method GetCustomWords
Lists information about all custom words from a custom language model. You can list all words from the custom model's words resource, only custom words that were added or modified by the user, or only OOV words that were extracted from corpora. Only the owner of a custom model can use this method to query the words from the model. Note: This method is currently a beta release that is available for US English only.
Public method GetModel
This function retrieves a specified language model.
Public method GetModels
This function retrieves all the language models that the user may use by setting the RecognizeModel public property.
Public method OnListen
This function should be invoked with AudioData input after the StartListening() method has been invoked. The user should continue to invoke this function until they are ready to call StopListening(); typically, microphone input is sent to this function.
Public method Recognize(SpeechToText.SuccessCallback<SpeechRecognitionEvent>, SpeechToText.FailCallback, AudioClip, Dictionary<String, Object>)
This function POSTs the given audio clip to the recognize endpoint and converts speech into text. It should be used only on AudioClips under 4 MB once they have been converted into WAV format. Use StartListening() for continuous recognition of speech.
Public method Recognize(SpeechToText.SuccessCallback<SpeechRecognitionEvent>, SpeechToText.FailCallback, Byte[], String, Dictionary<String, Object>)
This function POSTs the given audio data to the recognize endpoint and converts speech into text. It should be used only on audio data under 4 MB once it has been converted into WAV format. Use StartListening() for continuous recognition of speech.
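A hedged one-shot recognition sketch for the AudioClip overload. The callback shapes are inferred from the success and fail callback types in the signatures above and may differ by SDK version; audioClip is a placeholder AudioClip under the 4 MB limit:

```csharp
void OnRecognizeResult(SpeechRecognitionEvent results, Dictionary<string, object> customData)
{
    // Inspect results here; check the Final flag if interim results are enabled.
}

void OnRecognizeFail(RESTConnector.Error error, Dictionary<string, object> customData)
{
    // Handle the service error here.
}

// audioClip must be under 4 MB once converted to WAV.
speechToText.Recognize(OnRecognizeResult, OnRecognizeFail, audioClip);
```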
Public method ResetAcousticCustomization
Resets a custom acoustic model.
Public method ResetCustomization
Resets a custom language model by removing all corpora and words from the model. Resetting a custom model initializes the model to its state when it was first created. Metadata such as the name and language of the model are preserved. Only the owner of a custom model can use this method to reset the model. Note: This method is currently a beta release that is available for US English only.
Public method StartListening
This starts the service listening; it will invoke the callback for any recognized speech. OnListen() must be called by the user to queue audio data to send to the service. StopListening() should be called when you want to stop listening.
Public method StopListening
Invoke this function to stop this service from listening.
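The streaming lifecycle described by StartListening(), OnListen(), and StopListening() might be sketched as follows. The microphone capture that produces each AudioData chunk is omitted, and OnRecognize is a placeholder callback:

```csharp
// Begin a listening session; recognized speech is delivered to OnRecognize.
speechToText.StartListening(OnRecognize);

// Call repeatedly with captured microphone audio while listening.
speechToText.OnListen(audioData);

// End the session when the user is done speaking.
speechToText.StopListening();
```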
Public method TrainAcousticCustomization
Trains a custom acoustic model.
Public method TrainCustomization
Initiates the training of a custom language model with new corpora, words, or both. After adding training data to the custom model with the corpora or words methods, use this method to begin the actual training of the model on the new data. You can specify whether the custom model is to be trained with all words from its words resource or only with words that were added or modified by the user. Only the owner of a custom model can use this method to train the model. This method is asynchronous and can take on the order of minutes to complete depending on the amount of data on which the service is being trained and the current load on the service. The method returns an HTTP 200 response code to indicate that the training process has begun. You can monitor the status of the training by using the GET /v1/customizations/{customization_id} method to poll the model's status. Use a loop to check the status every 10 seconds. The method returns a Customization object that includes status and progress fields. A status of available means that the custom model is trained and ready to use. If training is in progress, the progress field indicates the progress of the training as a percentage complete. Note: For this beta release, the progress field does not reflect the current progress of the training. The field changes from 0 to 100 when training is complete. Training can fail to start for the following reasons: No training data (corpora or words) have been added to the custom model. Pre-processing of corpora to generate a list of out-of-vocabulary (OOV) words is not complete. Pre-processing of words to validate or auto-generate sounds-like pronunciations is not complete. One or more words that were added to the custom model have invalid sounds-like pronunciations that you must fix. Note: This method is currently a beta release that is available for US English only.
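The 10-second polling loop suggested above could be sketched as a Unity coroutine. The Customization type, its status field, and the callback parameter shapes are assumptions based on the description; OnFail is a placeholder fail callback:

```csharp
IEnumerator PollTrainingStatus(string customizationId)
{
    bool available = false;
    while (!available)
    {
        speechToText.GetCustomization((Customization customization, Dictionary<string, object> customData) =>
        {
            // A status of "available" means training has completed.
            available = customization.status == "available";
        }, OnFail, customizationId);

        // Check the status every 10 seconds, as the description recommends.
        yield return new WaitForSeconds(10f);
    }
}
```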
Public method UpgradeCustomization
Upgrades a custom language model to the latest release level of the Speech to Text service. The method bases the upgrade on the latest trained data stored for the custom model. If the corpora or words for the model have changed since the model was last trained, you must use the POST /v1/customizations/{customization_id}/train method to train the model on the new data. Only the owner of a custom model can use this method to upgrade the model. Note: This method is not currently implemented. It will be added in a future release of the API.
Top
See Also