public class SpeechToText
extends com.ibm.cloud.sdk.core.service.BaseService
The service supports two types of models: previous-generation models that include the terms `Broadband` and `Narrowband` in their names, and next-generation models that include the terms `Multimedia` and `Telephony` in their names. Broadband and multimedia models have minimum sampling rates of 16 kHz. Narrowband and telephony models have minimum sampling rates of 8 kHz. The next-generation models offer higher throughput and greater transcription accuracy than the previous-generation models.
Effective **31 July 2023**, all previous-generation models will be removed from the service and the documentation. Most previous-generation models were deprecated on 15 March 2022. You must migrate to the equivalent large speech model or next-generation model by 31 July 2023. For more information, see [Migrating to large speech models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-migrate).
For speech recognition, the service supports synchronous and asynchronous HTTP Representational State Transfer (REST) interfaces. It also supports a WebSocket interface that provides a full-duplex, low-latency communication channel: Clients send requests and audio to the service and receive results over a single connection asynchronously.
The service also offers two customization interfaces. Use language model customization to expand the vocabulary of a base model with domain-specific terminology. Use acoustic model customization to adapt a base model for the acoustic characteristics of your audio. For language model customization, the service also supports grammars. A grammar is a formal language specification that lets you restrict the phrases that the service can recognize.
Language model customization and grammars are available for most previous- and next-generation models. Acoustic model customization is available for all previous-generation models.
API Version: 1.0.0
See: https://cloud.ibm.com/docs/speech-to-text
Modifier and Type | Field and Description
---|---
static String | DEFAULT_SERVICE_NAME: Default service name used when configuring the `SpeechToText` client.
static String | DEFAULT_SERVICE_URL: Default service endpoint URL.
Constructor and Description
---
SpeechToText(): Constructs an instance of the `SpeechToText` client with the default service name.
SpeechToText(com.ibm.cloud.sdk.core.security.Authenticator authenticator): Constructs an instance of the `SpeechToText` client with the default service name and the specified authenticator.
SpeechToText(String serviceName): Constructs an instance of the `SpeechToText` client with the specified service name.
SpeechToText(String serviceName, com.ibm.cloud.sdk.core.security.Authenticator authenticator): Constructs an instance of the `SpeechToText` client with the specified service name and authenticator.
Modifier and Type | Method and Description
---|---
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | addAudio(AddAudioOptions addAudioOptions): Add an audio resource.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | addCorpus(AddCorpusOptions addCorpusOptions): Add a corpus.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | addGrammar(AddGrammarOptions addGrammarOptions): Add a grammar.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | addWord(AddWordOptions addWordOptions): Add a custom word.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | addWords(AddWordsOptions addWordsOptions): Add custom words.
com.ibm.cloud.sdk.core.http.ServiceCall<RecognitionJob> | checkJob(CheckJobOptions checkJobOptions): Check a job.
com.ibm.cloud.sdk.core.http.ServiceCall<RecognitionJobs> | checkJobs(): Check jobs.
com.ibm.cloud.sdk.core.http.ServiceCall<RecognitionJobs> | checkJobs(CheckJobsOptions checkJobsOptions): Check jobs.
com.ibm.cloud.sdk.core.http.ServiceCall<AcousticModel> | createAcousticModel(CreateAcousticModelOptions createAcousticModelOptions): Create a custom acoustic model.
com.ibm.cloud.sdk.core.http.ServiceCall<RecognitionJob> | createJob(CreateJobOptions createJobOptions): Create a job.
com.ibm.cloud.sdk.core.http.ServiceCall<LanguageModel> | createLanguageModel(CreateLanguageModelOptions createLanguageModelOptions): Create a custom language model.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | deleteAcousticModel(DeleteAcousticModelOptions deleteAcousticModelOptions): Delete a custom acoustic model.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | deleteAudio(DeleteAudioOptions deleteAudioOptions): Delete an audio resource.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | deleteCorpus(DeleteCorpusOptions deleteCorpusOptions): Delete a corpus.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | deleteGrammar(DeleteGrammarOptions deleteGrammarOptions): Delete a grammar.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | deleteJob(DeleteJobOptions deleteJobOptions): Delete a job.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | deleteLanguageModel(DeleteLanguageModelOptions deleteLanguageModelOptions): Delete a custom language model.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | deleteUserData(DeleteUserDataOptions deleteUserDataOptions): Delete labeled data.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | deleteWord(DeleteWordOptions deleteWordOptions): Delete a custom word.
com.ibm.cloud.sdk.core.http.ServiceCall<AcousticModel> | getAcousticModel(GetAcousticModelOptions getAcousticModelOptions): Get a custom acoustic model.
com.ibm.cloud.sdk.core.http.ServiceCall<AudioListing> | getAudio(GetAudioOptions getAudioOptions): Get an audio resource.
com.ibm.cloud.sdk.core.http.ServiceCall<Corpus> | getCorpus(GetCorpusOptions getCorpusOptions): Get a corpus.
com.ibm.cloud.sdk.core.http.ServiceCall<Grammar> | getGrammar(GetGrammarOptions getGrammarOptions): Get a grammar.
com.ibm.cloud.sdk.core.http.ServiceCall<LanguageModel> | getLanguageModel(GetLanguageModelOptions getLanguageModelOptions): Get a custom language model.
com.ibm.cloud.sdk.core.http.ServiceCall<SpeechModel> | getModel(GetModelOptions getModelOptions): Get a model.
com.ibm.cloud.sdk.core.http.ServiceCall<Word> | getWord(GetWordOptions getWordOptions): Get a custom word.
com.ibm.cloud.sdk.core.http.ServiceCall<AcousticModels> | listAcousticModels(): List custom acoustic models.
com.ibm.cloud.sdk.core.http.ServiceCall<AcousticModels> | listAcousticModels(ListAcousticModelsOptions listAcousticModelsOptions): List custom acoustic models.
com.ibm.cloud.sdk.core.http.ServiceCall<AudioResources> | listAudio(ListAudioOptions listAudioOptions): List audio resources.
com.ibm.cloud.sdk.core.http.ServiceCall<Corpora> | listCorpora(ListCorporaOptions listCorporaOptions): List corpora.
com.ibm.cloud.sdk.core.http.ServiceCall<Grammars> | listGrammars(ListGrammarsOptions listGrammarsOptions): List grammars.
com.ibm.cloud.sdk.core.http.ServiceCall<LanguageModels> | listLanguageModels(): List custom language models.
com.ibm.cloud.sdk.core.http.ServiceCall<LanguageModels> | listLanguageModels(ListLanguageModelsOptions listLanguageModelsOptions): List custom language models.
com.ibm.cloud.sdk.core.http.ServiceCall<SpeechModels> | listModels(): List models.
com.ibm.cloud.sdk.core.http.ServiceCall<SpeechModels> | listModels(ListModelsOptions listModelsOptions): List models.
com.ibm.cloud.sdk.core.http.ServiceCall<Words> | listWords(ListWordsOptions listWordsOptions): List custom words.
com.ibm.cloud.sdk.core.http.ServiceCall<SpeechRecognitionResults> | recognize(RecognizeOptions recognizeOptions): Recognize audio.
okhttp3.WebSocket | recognizeUsingWebSocket(RecognizeWithWebsocketsOptions recognizeOptions, RecognizeCallback callback): Sends audio and returns transcription results for recognition requests over a WebSocket connection.
com.ibm.cloud.sdk.core.http.ServiceCall<RegisterStatus> | registerCallback(RegisterCallbackOptions registerCallbackOptions): Register a callback.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | resetAcousticModel(ResetAcousticModelOptions resetAcousticModelOptions): Reset a custom acoustic model.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | resetLanguageModel(ResetLanguageModelOptions resetLanguageModelOptions): Reset a custom language model.
com.ibm.cloud.sdk.core.http.ServiceCall<TrainingResponse> | trainAcousticModel(TrainAcousticModelOptions trainAcousticModelOptions): Train a custom acoustic model.
com.ibm.cloud.sdk.core.http.ServiceCall<TrainingResponse> | trainLanguageModel(TrainLanguageModelOptions trainLanguageModelOptions): Train a custom language model.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | unregisterCallback(UnregisterCallbackOptions unregisterCallbackOptions): Unregister a callback.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | upgradeAcousticModel(UpgradeAcousticModelOptions upgradeAcousticModelOptions): Upgrade a custom acoustic model.
com.ibm.cloud.sdk.core.http.ServiceCall<Void> | upgradeLanguageModel(UpgradeLanguageModelOptions upgradeLanguageModelOptions): Upgrade a custom language model.
Methods inherited from class com.ibm.cloud.sdk.core.service.BaseService: configureClient, configureService, constructServiceUrl, constructServiceURL, disableRetries, enableGzipCompression, enableRetries, getAuthenticator, getClient, getDefaultHeaders, getEndPoint, getName, getServiceUrl, isJsonMimeType, isJsonPatchMimeType, setClient, setDefaultHeaders, setEndPoint, setServiceUrl, toString
public static final String DEFAULT_SERVICE_NAME
public static final String DEFAULT_SERVICE_URL
public SpeechToText()
Constructs an instance of the `SpeechToText` client with the default service name.

public SpeechToText(com.ibm.cloud.sdk.core.security.Authenticator authenticator)
Constructs an instance of the `SpeechToText` client with the default service name and the specified authenticator.
Parameters:
authenticator - the Authenticator instance to be configured for this client

public SpeechToText(String serviceName)
Constructs an instance of the `SpeechToText` client with the specified service name.
Parameters:
serviceName - the service name to be used when configuring the client instance

public SpeechToText(String serviceName, com.ibm.cloud.sdk.core.security.Authenticator authenticator)
Constructs an instance of the `SpeechToText` client with the specified service name and authenticator.
Parameters:
serviceName - the service name to be used when configuring the client instance
authenticator - the Authenticator instance to be configured for this client
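For illustration, a minimal construction sketch. It assumes IAM authentication, a recent version of the IBM Cloud Java SDK core, and the `com.ibm.watson.speech_to_text.v1` package of the open-source Watson Java SDK; the `{apikey}` and `{url}` placeholders are your own credentials and instance URL:

```java
import com.ibm.cloud.sdk.core.security.IamAuthenticator;
import com.ibm.watson.speech_to_text.v1.SpeechToText;

// Authenticate with an IAM API key and point the client at your instance.
IamAuthenticator authenticator = new IamAuthenticator.Builder()
    .apikey("{apikey}")
    .build();
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");
```

The sketches for the methods below assume this `speechToText` client.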
public okhttp3.WebSocket recognizeUsingWebSocket(RecognizeWithWebsocketsOptions recognizeOptions, RecognizeCallback callback)
The service imposes a data size limit of 100 MB per utterance (per recognition request). You can send multiple utterances over a single WebSocket connection. The service automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding. (For the audio/l16 format, you can specify the endianness.)
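A sketch of a WebSocket recognition request, assuming the SDK's `BaseRecognizeCallback` helper, the `speechToText` client from the constructor example, and a local FLAC file (imports and exception handling omitted):

```java
InputStream audio = new FileInputStream("audio-file.flac");
RecognizeWithWebsocketsOptions options = new RecognizeWithWebsocketsOptions.Builder()
    .audio(audio)
    .contentType("audio/flac")
    .interimResults(true)
    .build();

CountDownLatch done = new CountDownLatch(1);
speechToText.recognizeUsingWebSocket(options, new BaseRecognizeCallback() {
  @Override
  public void onTranscription(SpeechRecognitionResults results) {
    // Interim and final results arrive here as the service produces them.
    System.out.println(results);
  }

  @Override
  public void onDisconnected() {
    done.countDown(); // the service closed the connection
  }
});
done.await(); // block until transcription completes
```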
Parameters:
recognizeOptions - the recognize options
callback - the RecognizeCallback instance where results will be sent
Returns:
the WebSocket
public com.ibm.cloud.sdk.core.http.ServiceCall<SpeechModels> listModels(ListModelsOptions listModelsOptions)
Lists all language models that are available for use with the service. The information includes the name of the model and its minimum sampling rate in Hertz, among other things. The ordering of the list of models can change from call to call; do not rely on an alphabetized or static list of models.
**See also:** [Listing all models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-list#models-list-all).
Parameters:
listModelsOptions - the ListModelsOptions containing the options for the call
Returns:
a ServiceCall with a result of type SpeechModels
public com.ibm.cloud.sdk.core.http.ServiceCall<SpeechModels> listModels()
Lists all language models that are available for use with the service. The information includes the name of the model and its minimum sampling rate in Hertz, among other things. The ordering of the list of models can change from call to call; do not rely on an alphabetized or static list of models.
**See also:** [Listing all models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-list#models-list-all).
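For example, with the client from the constructor example:

```java
// List all available base models; ordering can vary between calls.
SpeechModels models = speechToText.listModels().execute().getResult();
System.out.println(models);
```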
Returns:
a ServiceCall with a result of type SpeechModels
public com.ibm.cloud.sdk.core.http.ServiceCall<SpeechModel> getModel(GetModelOptions getModelOptions)
Gets information for a single specified language model that is available for use with the service. The information includes the name of the model and its minimum sampling rate in Hertz, among other things.
**See also:** [Listing a specific model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-list#models-list-specific).
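A short sketch (the model name is illustrative; any supported model ID works):

```java
GetModelOptions options = new GetModelOptions.Builder()
    .modelId("en-US_Multimedia")
    .build();
SpeechModel model = speechToText.getModel(options).execute().getResult();
System.out.println(model);
```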
Parameters:
getModelOptions - the GetModelOptions containing the options for the call
Returns:
a ServiceCall with a result of type SpeechModel
public com.ibm.cloud.sdk.core.http.ServiceCall<SpeechRecognitionResults> recognize(RecognizeOptions recognizeOptions)
Sends audio and returns transcription results for a recognition request. You can pass a maximum of 100 MB and a minimum of 100 bytes of audio with a request. The service automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding. The method returns only final results; to enable interim results, use the WebSocket API. (With the `curl` command, use the `--data-binary` option to upload the file for the request.)
**See also:** [Making a basic HTTP request](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-http#HTTP-basic).
### Streaming mode
For requests to transcribe live audio as it becomes available, you must set the `Transfer-Encoding` header to `chunked` to use streaming mode. In streaming mode, the service closes the connection (status code 408) if it does not receive at least 15 seconds of audio (including silence) in any 30-second period. The service also closes the connection (status code 400) if it detects no speech for `inactivity_timeout` seconds of streaming audio; use the `inactivity_timeout` parameter to change the default of 30 seconds.
**See also:**
* [Audio transmission](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-input#transmission)
* [Timeouts](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-input#timeouts)
### Audio formats (content types)
The service accepts audio in the following formats (MIME types).
* For formats that are labeled **Required**, you must use the `Content-Type` header with the request to specify the format of the audio.
* For all other formats, you can omit the `Content-Type` header or specify `application/octet-stream` with the header to have the service automatically detect the format of the audio. (With the `curl` command, you can specify either `"Content-Type:"` or `"Content-Type: application/octet-stream"`.)

Where indicated, the format that you specify must include the sampling rate and can optionally include the number of channels and the endianness of the audio.
* `audio/alaw` (**Required.** Specify the sampling rate (`rate`) of the audio.)
* `audio/basic` (**Required.** Use only with narrowband models.)
* `audio/flac`
* `audio/g729` (Use only with narrowband models.)
* `audio/l16` (**Required.** Specify the sampling rate (`rate`) and optionally the number of channels (`channels`) and endianness (`endianness`) of the audio.)
* `audio/mp3`
* `audio/mpeg`
* `audio/mulaw` (**Required.** Specify the sampling rate (`rate`) of the audio.)
* `audio/ogg` (The service automatically detects the codec of the input audio.)
* `audio/ogg;codecs=opus`
* `audio/ogg;codecs=vorbis`
* `audio/wav` (Provide audio with a maximum of nine channels.)
* `audio/webm` (The service automatically detects the codec of the input audio.)
* `audio/webm;codecs=opus`
* `audio/webm;codecs=vorbis`
The sampling rate of the audio must match the sampling rate of the model for the recognition request: for broadband models, at least 16 kHz; for narrowband models, at least 8 kHz. If the sampling rate of the audio is higher than the minimum required rate, the service down-samples the audio to the appropriate rate. If the sampling rate of the audio is lower than the minimum required rate, the request fails.
**See also:** [Supported audio formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats).
### Large speech models and next-generation models
The service supports large speech models and next-generation `Multimedia` (16 kHz) and `Telephony` (8 kHz) models for many languages. Large speech models and next-generation models have higher throughput than the service's previous generation of `Broadband` and `Narrowband` models. When you use large speech models and next-generation models, the service can return transcriptions more quickly and also provide noticeably better transcription accuracy.
You specify a large speech model or next-generation model by using the `model` query parameter, as you do a previous-generation model. Only the next-generation models support the `low_latency` parameter, and all large speech models and next-generation models support the `character_insertion_bias` parameter. These parameters are not available with previous-generation models.
Large speech models and next-generation models do not support all of the speech recognition parameters that are available for use with previous-generation models. Next-generation models do not support the following parameters:
* `acoustic_customization_id`
* `keywords` and `keywords_threshold`
* `processing_metrics` and `processing_metrics_interval`
* `word_alternatives_threshold`

**Important:** Effective **31 July 2023**, all previous-generation models will be removed from the service and the documentation. Most previous-generation models were deprecated on 15 March 2022. You must migrate to the equivalent large speech model or next-generation model by 31 July 2023. For more information, see [Migrating to large speech models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-migrate).
**See also:**
* [Large speech languages and models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-large-speech-languages)
* [Supported features for large speech models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-large-speech-languages#models-lsm-supported-features)
* [Next-generation languages and models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng)
* [Supported features for next-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng#models-ng-features)
### Multipart speech recognition
**Note:** The asynchronous HTTP interface, WebSocket interface, and Watson SDKs do not support multipart speech recognition.
The HTTP `POST` method of the service also supports multipart speech recognition. With multipart requests, you pass all audio data as multipart form data. You specify some parameters as request headers and query parameters, but you pass JSON metadata as form data to control most aspects of the transcription. You can use multipart recognition to pass multiple audio files with a single request.
Use the multipart approach with browsers for which JavaScript is disabled or when the parameters used with the request are greater than the 8 KB limit imposed by most HTTP servers and proxies. You can encounter this limit, for example, if you want to spot a very large number of keywords.
**See also:** [Making a multipart HTTP request](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-http#HTTP-multi).
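A minimal synchronous recognition sketch, assuming a local FLAC file and the `speechToText` client from the constructor example (imports and exception handling omitted):

```java
try (InputStream audio = new FileInputStream("audio-file.flac")) {
  RecognizeOptions options = new RecognizeOptions.Builder()
      .audio(audio)
      .contentType("audio/flac")
      .model("en-US_Multimedia")
      .build();
  SpeechRecognitionResults results =
      speechToText.recognize(options).execute().getResult();
  System.out.println(results);
}
```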
Parameters:
recognizeOptions - the RecognizeOptions containing the options for the call
Returns:
a ServiceCall with a result of type SpeechRecognitionResults
public com.ibm.cloud.sdk.core.http.ServiceCall<RegisterStatus> registerCallback(RegisterCallbackOptions registerCallbackOptions)
Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or allowlist, the callback URL if it is not already registered by sending a `GET` request to the callback URL. The service passes a random alphanumeric challenge string via the `challenge_string` parameter of the request. The request includes an `Accept` header that specifies `text/plain` as the required response type.
To be registered successfully, the callback URL must respond to the `GET` request from the service. The response must send status code 200 and must include the challenge string in its body. Set the `Content-Type` response header to `text/plain`. Upon receiving this response, the service responds to the original registration request with response code 201.
The service sends only a single `GET` request to the callback URL. If the service does not receive a reply with a response code of 200 and a body that echoes the challenge string sent by the service within five seconds, it does not allowlist the URL; it instead sends status code 400 in response to the request to register a callback. If the requested callback URL is already allowlisted, the service responds to the initial registration request with response code 200.
If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of the challenge string in its response to the `POST` request. It sends this signature in the `X-Callback-Signature` header of its `GET` request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.
After you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time.
**See also:** [Registering a callback URL](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-async#register).
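A sketch of registering a callback URL; the URL and secret are placeholders for your own values:

```java
RegisterCallbackOptions options = new RegisterCallbackOptions.Builder()
    .callbackUrl("https://example.com/job_results")
    .userSecret("ThisIsMySecret")
    .build();
RegisterStatus status = speechToText.registerCallback(options).execute().getResult();
System.out.println(status.getStatus());
```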
Parameters:
registerCallbackOptions - the RegisterCallbackOptions containing the options for the call
Returns:
a ServiceCall with a result of type RegisterStatus
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> unregisterCallback(UnregisterCallbackOptions unregisterCallbackOptions)
Unregisters a callback URL that was previously allowlisted with a [Register a callback](#registercallback) request for use with the asynchronous interface. Once unregistered, the URL can no longer be used with asynchronous recognition requests.
**See also:** [Unregistering a callback URL](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-async#unregister).
Parameters:
unregisterCallbackOptions - the UnregisterCallbackOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<RecognitionJob> createJob(CreateJobOptions createJobOptions)
Creates a job for a new asynchronous recognition request. The job is owned by the instance of the service whose credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:
* By callback notification: Include the `callback_url` parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the `events` and `user_token` parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.
* By polling the service: Omit the `callback_url`, `events`, and `user_token` parameters. You must then use the [Check jobs](#checkjobs) or [Check a job](#checkjob) methods to check the status of the job, using the latter to retrieve the results when the job is complete.
The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the `results_ttl` parameter to specify how long the results are to remain available after the job is complete. Using the HTTPS [Check a job](#checkjob) method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.
The method supports the same basic parameters as other HTTP and WebSocket recognition requests. It also supports the following parameters specific to the asynchronous interface:
* `callback_url`
* `events`
* `user_token`
* `results_ttl`
You can pass a maximum of 1 GB and a minimum of 100 bytes of audio with a request. The service automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding. The method returns only final results; to enable interim results, use the WebSocket API. (With the `curl` command, use the `--data-binary` option to upload the file for the request.)
**See also:** [Creating a job](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-async#create).
### Streaming mode
For requests to transcribe live audio as it becomes available, you must set the `Transfer-Encoding` header to `chunked` to use streaming mode. In streaming mode, the service closes the connection (status code 408) if it does not receive at least 15 seconds of audio (including silence) in any 30-second period. The service also closes the connection (status code 400) if it detects no speech for `inactivity_timeout` seconds of streaming audio; use the `inactivity_timeout` parameter to change the default of 30 seconds.
**See also:**
* [Audio transmission](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-input#transmission)
* [Timeouts](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-input#timeouts)
### Audio formats (content types)
The service accepts audio in the following formats (MIME types).
* For formats that are labeled **Required**, you must use the `Content-Type` header with the request to specify the format of the audio.
* For all other formats, you can omit the `Content-Type` header or specify `application/octet-stream` with the header to have the service automatically detect the format of the audio. (With the `curl` command, you can specify either `"Content-Type:"` or `"Content-Type: application/octet-stream"`.)

Where indicated, the format that you specify must include the sampling rate and can optionally include the number of channels and the endianness of the audio.
* `audio/alaw` (**Required.** Specify the sampling rate (`rate`) of the audio.)
* `audio/basic` (**Required.** Use only with narrowband models.)
* `audio/flac`
* `audio/g729` (Use only with narrowband models.)
* `audio/l16` (**Required.** Specify the sampling rate (`rate`) and optionally the number of channels (`channels`) and endianness (`endianness`) of the audio.)
* `audio/mp3`
* `audio/mpeg`
* `audio/mulaw` (**Required.** Specify the sampling rate (`rate`) of the audio.)
* `audio/ogg` (The service automatically detects the codec of the input audio.)
* `audio/ogg;codecs=opus`
* `audio/ogg;codecs=vorbis`
* `audio/wav` (Provide audio with a maximum of nine channels.)
* `audio/webm` (The service automatically detects the codec of the input audio.)
* `audio/webm;codecs=opus`
* `audio/webm;codecs=vorbis`
The sampling rate of the audio must match the sampling rate of the model for the recognition request: for broadband models, at least 16 kHz; for narrowband models, at least 8 kHz. If the sampling rate of the audio is higher than the minimum required rate, the service down-samples the audio to the appropriate rate. If the sampling rate of the audio is lower than the minimum required rate, the request fails.
**See also:** [Supported audio formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats).
### Large speech models and next-generation models
The service supports large speech models and next-generation `Multimedia` (16 kHz) and `Telephony` (8 kHz) models for many languages. Large speech models and next-generation models have higher throughput than the service's previous generation of `Broadband` and `Narrowband` models. When you use large speech models and next-generation models, the service can return transcriptions more quickly and also provide noticeably better transcription accuracy.
You specify a large speech model or next-generation model by using the `model` query parameter, as you do a previous-generation model. Only the next-generation models support the `low_latency` parameter, and all large speech models and next-generation models support the `character_insertion_bias` parameter. These parameters are not available with previous-generation models.
Large speech models and next-generation models do not support all of the speech recognition parameters that are available for use with previous-generation models. Next-generation models do not support the following parameters:
* `acoustic_customization_id`
* `keywords` and `keywords_threshold`
* `processing_metrics` and `processing_metrics_interval`
* `word_alternatives_threshold`

**Important:** Effective **31 July 2023**, all previous-generation models will be removed from the service and the documentation. Most previous-generation models were deprecated on 15 March 2022. You must migrate to the equivalent large speech model or next-generation model by 31 July 2023. For more information, see [Migrating to large speech models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-migrate).
**See also:**
* [Large speech languages and models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-large-speech-languages)
* [Supported features for large speech models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-large-speech-languages#models-lsm-supported-features)
* [Next-generation languages and models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng)
* [Supported features for next-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng#models-ng-features)
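A sketch that creates a job with callback notification, assuming the callback URL registered in the [Register a callback](#registercallback) example (imports and exception handling omitted):

```java
try (InputStream audio = new FileInputStream("audio-file.flac")) {
  CreateJobOptions options = new CreateJobOptions.Builder()
      .audio(audio)
      .contentType("audio/flac")
      .callbackUrl("https://example.com/job_results")
      .userToken("job25")
      .build();
  RecognitionJob job = speechToText.createJob(options).execute().getResult();
  System.out.println(job.getId()); // keep the ID for later status checks
}
```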
Parameters:
createJobOptions - the CreateJobOptions containing the options for the call
Returns:
a ServiceCall with a result of type RecognitionJob
public com.ibm.cloud.sdk.core.http.ServiceCall<RecognitionJobs> checkJobs(CheckJobsOptions checkJobsOptions)
Returns the ID and status of the latest 100 outstanding jobs associated with the credentials with which it is called. The method also returns the creation and update times of each job, and, if a job was created with a callback URL and a user token, the user token for the job. To obtain the results for a job whose status is `completed` or not one of the latest 100 outstanding jobs, use the [Check a job](#checkjob) method. A job and its results remain available until you delete them with the [Delete a job](#deletejob) method or until the job's time to live expires, whichever comes first.
**See also:** [Checking the status of the latest jobs](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-async#jobs).
Parameters:
checkJobsOptions - the CheckJobsOptions containing the options for the call
Returns:
a ServiceCall with a result of type RecognitionJobs
public com.ibm.cloud.sdk.core.http.ServiceCall<RecognitionJobs> checkJobs()
Returns the ID and status of the latest 100 outstanding jobs associated with the credentials with which it is called. The method also returns the creation and update times of each job, and, if a job was created with a callback URL and a user token, the user token for the job. To obtain the results for a job whose status is `completed` or not one of the latest 100 outstanding jobs, use the [Check a job](#checkjob) method. A job and its results remain available until you delete them with the [Delete a job](#deletejob) method or until the job's time to live expires, whichever comes first.
**See also:** [Checking the status of the latest jobs](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-async#jobs).
Returns:
a ServiceCall with a result of type RecognitionJobs
public com.ibm.cloud.sdk.core.http.ServiceCall<RecognitionJob> checkJob(CheckJobOptions checkJobOptions)
Returns information about the specified job. The response always includes the status of the job and its creation and update times. If the status is `completed`, the response includes the results of the recognition request. You must use credentials for the instance of the service that owns a job to list information about it.
You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the `recognitions.completed_with_results` event, and you can retrieve the results multiple times for as long as they remain available. Use the [Check jobs](#checkjobs) method to request information about the most recent jobs associated with the calling credentials.
**See also:** [Checking the status and retrieving the results of a job](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-async#job).
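A sketch that checks a job by ID and prints its results once complete; `{jobId}` stands in for the ID returned by [Create a job](#createjob):

```java
CheckJobOptions options = new CheckJobOptions.Builder()
    .id("{jobId}")
    .build();
RecognitionJob job = speechToText.checkJob(options).execute().getResult();
if ("completed".equals(job.getStatus())) {
  System.out.println(job.getResults());
}
```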
Parameters:
checkJobOptions - the CheckJobOptions containing the options for the call
Returns:
a ServiceCall with a result of type RecognitionJob
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> deleteJob(DeleteJobOptions deleteJobOptions)
Deletes the specified job. You cannot delete a job that the service is actively processing. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must use credentials for the instance of the service that owns a job to delete it.
**See also:** [Deleting a job](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-async#delete-async).
Parameters:
deleteJobOptions - the DeleteJobOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<LanguageModel> createLanguageModel(CreateLanguageModelOptions createLanguageModelOptions)
Creates a new custom language model for a specified base model. The custom language model can be used only with the base model for which it is created. The model is owned by the instance of the service whose credentials are used to create it.
You can create a maximum of 1024 custom language models per owning credentials. The service returns an error if you attempt to create more than 1024 models. You do not lose any models, but you cannot create any more until your model count is below the limit.
**Important:** Effective **31 July 2023**, all previous-generation models will be removed from the service and the documentation. Most previous-generation models were deprecated on 15 March 2022. You must migrate to the equivalent large speech model or next-generation model by 31 July 2023. For more information, see [Migrating to large speech models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-migrate).
**See also:**
* [Create a custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageCreate#createModel-language)
* [Language support for customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support)
### Large speech models and next-generation models
The service supports large speech models and next-generation `Multimedia` (16 kHz) and `Telephony` (8 kHz) models for many languages. Large speech models and next-generation models have higher throughput than the service's previous generation of `Broadband` and `Narrowband` models. When you use large speech models and next-generation models, the service can return transcriptions more quickly and also provide noticeably better transcription accuracy.
You specify a large speech model or next-generation model by using the `model` query parameter, as you do a previous-generation model. Only the next-generation models support the `low_latency` parameter, and all large speech models and next-generation models support the `character_insertion_bias` parameter. These parameters are not available with previous-generation models.
Large speech models and next-generation models do not support all of the speech recognition parameters that are available for use with previous-generation models. Next-generation models do not support the following parameters:
* `acoustic_customization_id`
* `keywords` and `keywords_threshold`
* `processing_metrics` and `processing_metrics_interval`
* `word_alternatives_threshold`

**Important:** Effective **31 July 2023**, all previous-generation models will be removed from the service and the documentation. Most previous-generation models were deprecated on 15 March 2022. You must migrate to the equivalent large speech model or next-generation model by 31 July 2023. For more information, see [Migrating to large speech models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-migrate).
**See also:**
* [Large speech languages and models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-large-speech-languages)
* [Supported features for large speech models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-large-speech-languages#models-lsm-supported-features)
* [Next-generation languages and models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng)
* [Supported features for next-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng#models-ng-features)
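A sketch that creates a custom model on a next-generation base model; the name and description are illustrative:

```java
CreateLanguageModelOptions options = new CreateLanguageModelOptions.Builder()
    .name("Example model")
    .baseModelName("en-US_Telephony")
    .description("Example custom language model")
    .build();
LanguageModel model = speechToText.createLanguageModel(options).execute().getResult();
System.out.println(model.getCustomizationId()); // keep the ID for later requests
```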
Parameters:
createLanguageModelOptions - the CreateLanguageModelOptions containing the options for the call
Returns:
a ServiceCall with a result of type LanguageModel
public com.ibm.cloud.sdk.core.http.ServiceCall<LanguageModels> listLanguageModels(ListLanguageModelsOptions listLanguageModelsOptions)
Lists information about all custom language models that are owned by an instance of the service. Use the `language` parameter to see all custom language models for the specified language. Omit the parameter to see all custom language models for all languages. You must use credentials for the instance of the service that owns a model to list information about it.
**See also:**
* [Listing custom language models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageLanguageModels#listModels-language)
* [Language support for customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support)
Parameters:
listLanguageModelsOptions - the ListLanguageModelsOptions containing the options for the call
Returns:
a ServiceCall with a result of type LanguageModels
public com.ibm.cloud.sdk.core.http.ServiceCall<LanguageModels> listLanguageModels()
Lists information about all custom language models that are owned by an instance of the service. Use the `language` parameter to see all custom language models for the specified language. Omit the parameter to see all custom language models for all languages. You must use credentials for the instance of the service that owns a model to list information about it.
**See also:**
* [Listing custom language models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageLanguageModels#listModels-language)
* [Language support for customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support)
Returns:
a ServiceCall with a result of type LanguageModels
public com.ibm.cloud.sdk.core.http.ServiceCall<LanguageModel> getLanguageModel(GetLanguageModelOptions getLanguageModelOptions)
Gets information about a specified custom language model. You must use credentials for the instance of the service that owns a model to list information about it.
**See also:**
* [Listing custom language models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageLanguageModels#listModels-language)
* [Language support for customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support)
Parameters:
getLanguageModelOptions - the GetLanguageModelOptions containing the options for the call
Returns:
a ServiceCall with a result of type LanguageModel
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> deleteLanguageModel(DeleteLanguageModelOptions deleteLanguageModelOptions)
Deletes an existing custom language model. The custom model cannot be deleted if another request, such as adding a corpus or grammar to the model, is currently being processed. You must use credentials for the instance of the service that owns a model to delete it.
**See also:**
* [Deleting a custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageLanguageModels#deleteModel-language)
* [Language support for customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support)
Parameters:
deleteLanguageModelOptions - the DeleteLanguageModelOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<TrainingResponse> trainLanguageModel(TrainLanguageModelOptions trainLanguageModelOptions)
Initiates the training of a custom language model with new resources such as corpora, grammars, and custom words. After adding, modifying, or deleting resources for a custom language model, use this method to begin the actual training of the model on the latest data. You can specify whether the custom language model is to be trained with all words from its words resource or only with words that were added or modified by the user directly. You must use credentials for the instance of the service that owns a model to train it.
The training method is asynchronous. It can take on the order of minutes to complete depending on the amount of data on which the service is being trained and the current load on the service. The method returns an HTTP 200 response code to indicate that the training process has begun.
You can monitor the status of the training by using the [Get a custom language model](#getlanguagemodel) method to poll the model's status. Use a loop to check the status every 10 seconds. If you added custom words directly to a custom model that is based on a next-generation model, allow for some minutes of extra training time for the model.
The method returns a `LanguageModel` object that includes `status` and `progress` fields. A status of `available` means that the custom model is trained and ready to use. The service cannot accept subsequent training requests or requests to add new resources until the existing request completes.
For custom models that are based on improved base language models, training also performs an automatic upgrade to a newer version of the base model. You do not need to use the [Upgrade a custom language model](#upgradelanguagemodel) method to perform the upgrade.
**See also:**
* [Language support for customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support)
* [Train the custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageCreate#trainModel-language)
* [Upgrading custom language models that are based on improved next-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade#custom-upgrade-language-ng)
### Training failures
Training can fail to start for the following reasons:
* The service is currently handling another request for the custom model, such as another training request or a request to add a corpus or grammar to the model.
* No training data have been added to the custom model.
* The custom model contains one or more invalid corpora, grammars, or words (for example, a custom word has an invalid sounds-like pronunciation). You can correct the invalid resources or set the `strict` parameter to `false` to exclude the invalid resources from the training. The model must contain at least one valid resource for training to succeed.
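A polling sketch along the lines described above; `{customizationId}` stands in for your model's ID, and a production loop should also stop on a `failed` status (exception handling omitted):

```java
TrainLanguageModelOptions trainOptions = new TrainLanguageModelOptions.Builder()
    .customizationId("{customizationId}")
    .build();
speechToText.trainLanguageModel(trainOptions).execute();

// Poll the model's status every 10 seconds until training completes.
GetLanguageModelOptions getOptions = new GetLanguageModelOptions.Builder()
    .customizationId("{customizationId}")
    .build();
LanguageModel model = speechToText.getLanguageModel(getOptions).execute().getResult();
while (!"available".equals(model.getStatus())) {
  Thread.sleep(10_000);
  model = speechToText.getLanguageModel(getOptions).execute().getResult();
}
```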
Parameters:
trainLanguageModelOptions - the TrainLanguageModelOptions containing the options for the call
Returns:
a ServiceCall with a result of type TrainingResponse
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> resetLanguageModel(ResetLanguageModelOptions resetLanguageModelOptions)
Resets a custom language model by removing all corpora, grammars, and words from the model. Resetting a custom language model initializes the model to its state when it was first created. Metadata such as the name and language of the model are preserved, but the model's words resource is removed and must be re-created. You must use credentials for the instance of the service that owns a model to reset it.
**See also:**
* [Resetting a custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageLanguageModels#resetModel-language)
* [Language support for customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support)
Parameters:
resetLanguageModelOptions - the ResetLanguageModelOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<Void> upgradeLanguageModel(UpgradeLanguageModelOptions upgradeLanguageModelOptions)
Initiates the upgrade of a custom language model to the latest version of its base language model. The upgrade method is asynchronous. It can take on the order of minutes to complete depending on the amount of data in the custom model and the current load on the service. A custom model must be in the `ready` or `available` state to be upgraded. You must use credentials for the instance of the service that owns a model to upgrade it.
The method returns an HTTP 200 response code to indicate that the upgrade process has begun successfully. You can monitor the status of the upgrade by using the [Get a custom language model](#getlanguagemodel) method to poll the model's status. The method returns a `LanguageModel` object that includes `status` and `progress` fields. Use a loop to check the status every 10 seconds.
While it is being upgraded, the custom model has the status `upgrading`. When the upgrade is complete, the model resumes the status that it had prior to upgrade. The service cannot accept subsequent requests for the model until the upgrade completes.
For custom models that are based on improved base language models, the [Train a custom language model](#trainlanguagemodel) method also performs an automatic upgrade to a newer version of the base model. You do not need to use the upgrade method.
**See also:**
* [Language support for customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support)
* [Upgrading a custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade#custom-upgrade-language)
* [Upgrading custom language models that are based on improved next-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade#custom-upgrade-language-ng)
Parameters:
upgradeLanguageModelOptions - the UpgradeLanguageModelOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<Corpora> listCorpora(ListCorporaOptions listCorporaOptions)
Lists information about all corpora from a custom language model. The information includes the name, status, and total number of words for each corpus. _For custom models that are based on previous-generation models_, it also includes the number of out-of-vocabulary (OOV) words from the corpus. You must use credentials for the instance of the service that owns a model to list its corpora.
**See also:** [Listing corpora for a custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageCorpora#listCorpora).
Parameters:
listCorporaOptions - the ListCorporaOptions containing the options for the call
Returns:
a ServiceCall with a result of type Corpora
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> addCorpus(AddCorpusOptions addCorpusOptions)
Adds a single corpus text file of new training data to a custom language model. Use multiple requests to submit multiple corpus text files. You must use credentials for the instance of the service that owns a model to add a corpus to it. Adding a corpus does not affect the custom language model until you train the model for the new data by using the [Train a custom language model](#trainlanguagemodel) method.
Submit a plain text file that contains sample sentences from the domain of interest to enable the service to parse the words in context. The more sentences you add that represent the context in which speakers use words from the domain, the better the service's recognition accuracy.
The call returns an HTTP 201 response code if the corpus is valid. The service then asynchronously processes and automatically extracts data from the contents of the corpus. This operation can take on the order of minutes to complete depending on the current load on the service, the total number of words in the corpus, and, _for custom models that are based on previous-generation models_, the number of new (out-of-vocabulary) words in the corpus. You cannot submit requests to add additional resources to the custom model or to train the model until the service's analysis of the corpus for the current request completes. Use the [Get a corpus](#getcorpus) method to check the status of the analysis.
_For custom models that are based on large speech models_, the service parses and extracts word sequences from one or multiple corpora files. The characters help the service learn and predict character sequences from audio.
_For custom models that are based on previous-generation models_, the service auto-populates the model's words resource with words from the corpus that are not found in its base vocabulary. These words are referred to as out-of-vocabulary (OOV) words. After adding a corpus, you must validate the words resource to ensure that each OOV word's definition is complete and valid. You can use the [List custom words](#listwords) method to examine the words resource. You can use the other words-related methods to eliminate typos and modify how words are pronounced and displayed as needed.
To add a corpus file that has the same name as an existing corpus, set the `allow_overwrite` parameter to `true`; otherwise, the request fails. Overwriting an existing corpus causes the service to process the corpus text file and extract its data anew. _For a custom model that is based on a previous-generation model_, the service first removes any OOV words that are associated with the existing corpus from the model's words resource unless they were also added by another corpus or grammar, or they have been modified in some way with the [Add custom words](#addwords) or [Add a custom word](#addword) method.
The service limits the overall amount of data that you can add to a custom model to a maximum of 10 million total words from all sources combined. _For a custom model that is based on a previous-generation model_, you can add no more than 90 thousand custom (OOV) words to a model. This includes words that the service extracts from corpora and grammars, and words that you add directly.
**See also:**
* [Add a corpus to the custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageCreate#addCorpus)
* [Working with corpora for previous-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords#workingCorpora)
* [Working with corpora for large speech models and next-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords-ng#workingCorpora-ng)
* [Validating a words resource for previous-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords#validateModel)
* [Validating a words resource for large speech models and next-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords-ng#validateModel-ng)
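A sketch that adds a corpus text file to a custom model; the file and corpus names are illustrative (imports and exception handling omitted):

```java
try (InputStream corpus = new FileInputStream("healthcare-corpus.txt")) {
  AddCorpusOptions options = new AddCorpusOptions.Builder()
      .customizationId("{customizationId}")
      .corpusName("healthcare")
      .corpusFile(corpus)
      .allowOverwrite(false)
      .build();
  speechToText.addCorpus(options).execute();
  // Poll with getCorpus until the corpus analysis completes.
}
```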
Parameters:
addCorpusOptions - the AddCorpusOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<Corpus> getCorpus(GetCorpusOptions getCorpusOptions)
Gets information about a corpus from a custom language model. The information includes the name, status, and total number of words for the corpus. _For custom models that are based on previous-generation models_, it also includes the number of out-of-vocabulary (OOV) words from the corpus. You must use credentials for the instance of the service that owns a model to list its corpora.
**See also:** [Listing corpora for a custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageCorpora#listCorpora).
Parameters:
getCorpusOptions - the GetCorpusOptions containing the options for the call
Returns:
a ServiceCall with a result of type Corpus
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> deleteCorpus(DeleteCorpusOptions deleteCorpusOptions)
Deletes an existing corpus from a custom language model. Removing a corpus does not affect the custom model until you train the model with the [Train a custom language model](#trainlanguagemodel) method. You must use credentials for the instance of the service that owns a model to delete its corpora.
_For custom models that are based on previous-generation models_, the service removes any out-of-vocabulary (OOV) words that are associated with the corpus from the custom model's words resource unless they were also added by another corpus or grammar, or they were modified in some way with the [Add custom words](#addwords) or [Add a custom word](#addword) method.
**See also:** [Deleting a corpus from a custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageCorpora#deleteCorpus).
Parameters:
deleteCorpusOptions - the DeleteCorpusOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<Words> listWords(ListWordsOptions listWordsOptions)
Lists information about custom words from a custom language model. You can list all words from the custom model's words resource, only custom words that were added or modified by the user, or, _for a custom model that is based on a previous-generation model_, only out-of-vocabulary (OOV) words that were extracted from corpora or are recognized by grammars. _For a custom model that is based on a next-generation model_, you can list all words or only those words that were added directly by a user, both of which return the same results.
You can also indicate the order in which the service is to return words; by default, the service lists words in ascending alphabetical order. You must use credentials for the instance of the service that owns a model to list information about its words.
**See also:** [Listing words from a custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageWords#listWords).
Parameters:
listWordsOptions - the ListWordsOptions containing the options for the call
Returns:
a ServiceCall with a result of type Words
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> addWords(AddWordsOptions addWordsOptions)
Adds one or more custom words to a custom language model. You can use this method to add words or to modify existing words in a custom model's words resource. _For custom models that are based on previous-generation models_, the service populates the words resource for a custom model with out-of-vocabulary (OOV) words from each corpus or grammar that is added to the model. You can use this method to modify OOV words in the model's words resource.
_For a custom model that is based on a previous-generation model_, the words resource for a model can contain a maximum of 90 thousand custom (OOV) words. This includes words that the service extracts from corpora and grammars and words that you add directly.
You must use credentials for the instance of the service that owns a model to add or modify custom words for the model. Adding or modifying custom words does not affect the custom model until you train the model for the new data by using the [Train a custom language model](#trainlanguagemodel) method.
You add custom words by providing a `CustomWords` object, which is an array of `CustomWord` objects, one per word. Use the object's `word` parameter to identify the word that is to be added. You can also provide one or both of the optional `display_as` or `sounds_like` fields for each word.
* The `display_as` field provides a different way of spelling the word in a transcript. Use the parameter when you want the word to appear different from its usual representation or from its spelling in training data. For example, you might indicate that the word `IBM` is to be displayed as `IBM™`.
* The `sounds_like` field provides an array of one or more pronunciations for the word. Use the parameter to specify how the word can be pronounced by users. Use the parameter for words that are difficult to pronounce, foreign words, acronyms, and so on. For example, you might specify that the word `IEEE` can sound like `I triple E`. You can specify a maximum of five sounds-like pronunciations for a word. _For a custom model that is based on a previous-generation model_, if you omit the `sounds_like` field, the service attempts to set the field to its pronunciation of the word. It cannot generate a pronunciation for all words, so you must review the word's definition to ensure that it is complete and valid.
* The `mapping_only` field supports a form of post-processing for custom words. This boolean parameter determines whether the `sounds_like` value (for non-Japanese models) or the word itself (for Japanese models) is used not to fine-tune the model but only as a replacement for the `display_as` value. Set the field when you use custom words exclusively to map `sounds_like` (or the word) to the `display_as` value and do not need fine-tuning.
If you add a custom word that already exists in the words resource for the custom model, the new definition overwrites the existing data for the word. If the service encounters an error with the input data, it returns a failure code and does not add any of the words to the words resource.
The call returns an HTTP 201 response code if the input data is valid. It then asynchronously processes the words to add them to the model's words resource. The time that it takes for the analysis to complete depends on the number of new words that you add but is generally faster than adding a corpus or grammar.
You can monitor the status of the request by using the [Get a custom language model](#getlanguagemodel) method to poll the model's status. Use a loop to check the status every 10 seconds. The method returns a `Customization` object that includes a `status` field. A status of `ready` means that the words have been added to the custom model. The service cannot accept requests to add new data or to train the model until the existing request completes.
You can use the [List custom words](#listwords) or [Get a custom word](#getword) method to review the words that you add. Words with an invalid `sounds_like` field include an `error` field that describes the problem. You can use other words-related methods to correct errors, eliminate typos, and modify how words are pronounced as needed.
**See also:**
* [Add words to the custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageCreate#addWords)
* [Working with custom words for previous-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords#workingWords)
* [Working with custom words for large speech models and next-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords-ng#workingWords-ng)
* [Validating a words resource for previous-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords#validateModel)
* [Validating a words resource for large speech models and next-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords-ng#validateModel-ng).
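As a sketch, the following fragment adds two custom words, one with a `display_as` form and one with a `sounds_like` pronunciation; the word values are illustrative:

```java
import java.util.Arrays;

import com.ibm.watson.speech_to_text.v1.model.AddWordsOptions;
import com.ibm.watson.speech_to_text.v1.model.CustomWord;

// One CustomWord per word; display_as and sounds_like are optional.
CustomWord ibm = new CustomWord.Builder()
    .word("IBM")
    .displayAs("IBM™")
    .build();
CustomWord ieee = new CustomWord.Builder()
    .word("IEEE")
    .soundsLike(Arrays.asList("I triple E"))
    .build();

AddWordsOptions addWordsOptions = new AddWordsOptions.Builder()
    .customizationId("{customizationId}")
    .words(Arrays.asList(ibm, ieee))
    .build();
// Returns HTTP 201 on valid input; the service then processes the
// words asynchronously. Poll the model's status until it is `ready`.
speechToText.addWords(addWordsOptions).execute();
```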
Parameters:
addWordsOptions - the AddWordsOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<Void> addWord(AddWordOptions addWordOptions)
Adds a custom word to a custom language model. You can use this method to add a word or to modify an existing word in the words resource. _For custom models that are based on previous-generation models_, the service populates the words resource for a custom model with out-of-vocabulary (OOV) words from each corpus or grammar that is added to the model. You can use this method to modify OOV words in the model's words resource.
_For a custom model that is based on a previous-generation model_, the words resource for a model can contain a maximum of 90 thousand custom (OOV) words. This includes words that the service extracts from corpora and grammars and words that you add directly.
You must use credentials for the instance of the service that owns a model to add or modify a custom word for the model. Adding or modifying a custom word does not affect the custom model until you train the model for the new data by using the [Train a custom language model](#trainlanguagemodel) method.
Use the `word_name` parameter to specify the custom word that is to be added or modified. Use the `CustomWord` object to provide one or both of the optional `display_as` or `sounds_like` fields for the word.
* The `display_as` field provides a different way of spelling the word in a transcript. Use the parameter when you want the word to appear different from its usual representation or from its spelling in training data. For example, you might indicate that the word `IBM` is to be displayed as `IBM™`.
* The `sounds_like` field provides an array of one or more pronunciations for the word. Use the parameter to specify how the word can be pronounced by users. Use the parameter for words that are difficult to pronounce, foreign words, acronyms, and so on. For example, you might specify that the word `IEEE` can sound like `i triple e`. You can specify a maximum of five sounds-like pronunciations for a word. _For custom models that are based on previous-generation models_, if you omit the `sounds_like` field, the service attempts to set the field to its pronunciation of the word. It cannot generate a pronunciation for all words, so you must review the word's definition to ensure that it is complete and valid.
If you add a custom word that already exists in the words resource for the custom model, the new definition overwrites the existing data for the word. If the service encounters an error, it does not add the word to the words resource. Use the [Get a custom word](#getword) method to review the word that you add.
**See also:**
* [Add words to the custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageCreate#addWords)
* [Working with custom words for previous-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords#workingWords)
* [Working with custom words for large speech models and next-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords-ng#workingWords-ng)
* [Validating a words resource for previous-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords#validateModel)
* [Validating a words resource for large speech models and next-generation models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords-ng#validateModel-ng).
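A sketch of adding a single word, again reusing the configured `speechToText` client; the `NCAA` example values are illustrative:

```java
import java.util.Arrays;

import com.ibm.watson.speech_to_text.v1.model.AddWordOptions;

// wordName identifies the word to add or modify.
AddWordOptions addWordOptions = new AddWordOptions.Builder()
    .customizationId("{customizationId}")
    .wordName("NCAA")
    .soundsLike(Arrays.asList("N C double A", "N C A A"))
    .displayAs("NCAA")
    .build();
speechToText.addWord(addWordOptions).execute();
```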
Parameters:
addWordOptions - the AddWordOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<Word> getWord(GetWordOptions getWordOptions)
Gets information about a custom word from a custom language model. You must use credentials for the instance of the service that owns a model to list information about its words.
**See also:** [Listing words from a custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageWords#listWords).
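A minimal sketch, reusing the configured client; printing the returned `Word` shows its `error` field, if any, for an invalid `sounds_like` value:

```java
import com.ibm.watson.speech_to_text.v1.model.GetWordOptions;
import com.ibm.watson.speech_to_text.v1.model.Word;

GetWordOptions getWordOptions = new GetWordOptions.Builder()
    .customizationId("{customizationId}")
    .wordName("NCAA")
    .build();
Word word = speechToText.getWord(getWordOptions).execute().getResult();
System.out.println(word);
```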
Parameters:
getWordOptions - the GetWordOptions containing the options for the call
Returns:
a ServiceCall with a result of type Word
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> deleteWord(DeleteWordOptions deleteWordOptions)
Deletes a custom word from a custom language model. You can remove any word that you added to the custom model's words resource via any means. However, if the word also exists in the service's base vocabulary, the service removes the word only from the words resource; the word remains in the base vocabulary. Removing a custom word does not affect the custom model until you train the model with the [Train a custom language model](#trainlanguagemodel) method. You must use credentials for the instance of the service that owns a model to delete its words.
**See also:** [Deleting a word from a custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageWords#deleteWord).
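A minimal sketch, reusing the configured client:

```java
import com.ibm.watson.speech_to_text.v1.model.DeleteWordOptions;

DeleteWordOptions deleteWordOptions = new DeleteWordOptions.Builder()
    .customizationId("{customizationId}")
    .wordName("NCAA")
    .build();
// The deletion does not take effect until you retrain the model.
speechToText.deleteWord(deleteWordOptions).execute();
```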
Parameters:
deleteWordOptions - the DeleteWordOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<Grammars> listGrammars(ListGrammarsOptions listGrammarsOptions)
Lists information about all grammars from a custom language model. For each grammar, the information includes the name, status, and (for grammars that are based on previous-generation models) the total number of out-of-vocabulary (OOV) words. You must use credentials for the instance of the service that owns a model to list its grammars.
**See also:**
* [Listing grammars from a custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageGrammars#listGrammars)
* [Language support for customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).
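A minimal sketch, reusing the configured client:

```java
import com.ibm.watson.speech_to_text.v1.model.Grammars;
import com.ibm.watson.speech_to_text.v1.model.ListGrammarsOptions;

ListGrammarsOptions listGrammarsOptions = new ListGrammarsOptions.Builder()
    .customizationId("{customizationId}")
    .build();
Grammars grammars =
    speechToText.listGrammars(listGrammarsOptions).execute().getResult();
System.out.println(grammars);
```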
Parameters:
listGrammarsOptions - the ListGrammarsOptions containing the options for the call
Returns:
a ServiceCall with a result of type Grammars
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> addGrammar(AddGrammarOptions addGrammarOptions)
Adds a single grammar file to a custom language model. Submit a plain text file in UTF-8 format that defines the grammar. Use multiple requests to submit multiple grammar files. You must use credentials for the instance of the service that owns a model to add a grammar to it. Adding a grammar does not affect the custom language model until you train the model for the new data by using the [Train a custom language model](#trainlanguagemodel) method.
The call returns an HTTP 201 response code if the grammar is valid. The service then asynchronously processes the contents of the grammar and automatically extracts new words that it finds. This operation can take a few seconds or minutes to complete depending on the size and complexity of the grammar, as well as the current load on the service. You cannot submit requests to add additional resources to the custom model or to train the model until the service's analysis of the grammar for the current request completes. Use the [Get a grammar](#getgrammar) method to check the status of the analysis.
_For grammars that are based on previous-generation models,_ the service populates the model's words resource with any word that is recognized by the grammar that is not found in the model's base vocabulary. These are referred to as out-of-vocabulary (OOV) words. You can use the [List custom words](#listwords) method to examine the words resource and use other words-related methods to eliminate typos and modify how words are pronounced as needed. _For grammars that are based on next-generation models,_ the service extracts no OOV words from the grammars.
To add a grammar that has the same name as an existing grammar, set the `allow_overwrite` parameter to `true`; otherwise, the request fails. Overwriting an existing grammar causes the service to process the grammar file and extract OOV words anew. Before doing so, it removes any OOV words associated with the existing grammar from the model's words resource unless they were also added by another resource or they have been modified in some way with the [Add custom words](#addwords) or [Add a custom word](#addword) method.
_For grammars that are based on previous-generation models,_ the service limits the overall amount of data that you can add to a custom model to a maximum of 10 million total words from all sources combined. Also, you can add no more than 90 thousand OOV words to a model. This includes words that the service extracts from corpora and grammars and words that you add directly.
**See also:**
* [Understanding grammars](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-grammarUnderstand#grammarUnderstand)
* [Add a grammar to the custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-grammarAdd#addGrammar)
* [Language support for customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).
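A sketch that submits a local ABNF grammar file, assuming a recent SDK version in which `grammarFile` accepts an `InputStream` and that checked I/O exceptions are handled by the caller; the file and grammar names are illustrative (use `application/srgs+xml` for XML-form grammars):

```java
import java.io.FileInputStream;

import com.ibm.watson.speech_to_text.v1.model.AddGrammarOptions;

AddGrammarOptions addGrammarOptions = new AddGrammarOptions.Builder()
    .customizationId("{customizationId}")
    .grammarName("confirm-abnf")
    .grammarFile(new FileInputStream("confirm.abnf"))
    .contentType("application/srgs")
    .allowOverwrite(true)
    .build();
speechToText.addGrammar(addGrammarOptions).execute();
```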
Parameters:
addGrammarOptions - the AddGrammarOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<Grammar> getGrammar(GetGrammarOptions getGrammarOptions)
Gets information about a grammar from a custom language model. For each grammar, the information includes the name, status, and (for grammars that are based on previous-generation models) the total number of out-of-vocabulary (OOV) words. You must use credentials for the instance of the service that owns a model to list its grammars.
**See also:**
* [Listing grammars from a custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageGrammars#listGrammars)
* [Language support for customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).
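A sketch that polls the grammar's status until the service finishes analyzing it, reusing the configured client; the `being_processed` status value and the five-second interval are assumptions based on the asynchronous flow described above, and `InterruptedException` handling is left to the caller:

```java
import com.ibm.watson.speech_to_text.v1.model.GetGrammarOptions;
import com.ibm.watson.speech_to_text.v1.model.Grammar;

GetGrammarOptions getGrammarOptions = new GetGrammarOptions.Builder()
    .customizationId("{customizationId}")
    .grammarName("confirm-abnf")
    .build();

// Poll until the service's analysis of the grammar completes.
Grammar grammar;
do {
  Thread.sleep(5000);
  grammar = speechToText.getGrammar(getGrammarOptions).execute().getResult();
} while ("being_processed".equals(grammar.getStatus()));
System.out.println(grammar);
```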
Parameters:
getGrammarOptions - the GetGrammarOptions containing the options for the call
Returns:
a ServiceCall with a result of type Grammar
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> deleteGrammar(DeleteGrammarOptions deleteGrammarOptions)
Deletes an existing grammar from a custom language model. _For grammars that are based on previous-generation models,_ the service removes any out-of-vocabulary (OOV) words associated with the grammar from the custom model's words resource unless they were also added by another resource or they were modified in some way with the [Add custom words](#addwords) or [Add a custom word](#addword) method. Removing a grammar does not affect the custom model until you train the model with the [Train a custom language model](#trainlanguagemodel) method. You must use credentials for the instance of the service that owns a model to delete its grammar.
**See also:**
* [Deleting a grammar from a custom language model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageGrammars#deleteGrammar)
* [Language support for customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).
Parameters:
deleteGrammarOptions - the DeleteGrammarOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<AcousticModel> createAcousticModel(CreateAcousticModelOptions createAcousticModelOptions)
Creates a new custom acoustic model for a specified base model. The custom acoustic model can be used only with the base model for which it is created. The model is owned by the instance of the service whose credentials are used to create it.
You can create a maximum of 1024 custom acoustic models per owning credentials. The service returns an error if you attempt to create more than 1024 models. You do not lose any models, but you cannot create any more until your model count is below the limit.
**Note:** Acoustic model customization is supported only for use with previous-generation models. It is not supported for large speech models and next-generation models.
**Important:** Effective **31 July 2023**, all previous-generation models will be removed from the service and the documentation. Most previous-generation models were deprecated on 15 March 2022. You must migrate to the equivalent large speech model or next-generation model by 31 July 2023. For more information, see [Migrating to large speech models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-migrate).
**See also:** [Create a custom acoustic model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-acoustic#createModel-acoustic).
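A minimal sketch, reusing the configured client; the name, base model, and description are illustrative:

```java
import com.ibm.watson.speech_to_text.v1.model.AcousticModel;
import com.ibm.watson.speech_to_text.v1.model.CreateAcousticModelOptions;

CreateAcousticModelOptions createAcousticModelOptions =
    new CreateAcousticModelOptions.Builder()
        .name("Example acoustic model")
        .baseModelName("en-US_BroadbandModel")
        .description("Example custom acoustic model")
        .build();
AcousticModel acousticModel =
    speechToText.createAcousticModel(createAcousticModelOptions).execute().getResult();
// Save the customization ID for use in later requests.
System.out.println(acousticModel.getCustomizationId());
```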
Parameters:
createAcousticModelOptions - the CreateAcousticModelOptions containing the options for the call
Returns:
a ServiceCall with a result of type AcousticModel
public com.ibm.cloud.sdk.core.http.ServiceCall<AcousticModels> listAcousticModels(ListAcousticModelsOptions listAcousticModelsOptions)
Lists information about all custom acoustic models that are owned by an instance of the service. Use the `language` parameter to see all custom acoustic models for the specified language. Omit the parameter to see all custom acoustic models for all languages. You must use credentials for the instance of the service that owns a model to list information about it.
**Note:** Acoustic model customization is supported only for use with previous-generation models. It is not supported for large speech models and next-generation models.
**See also:** [Listing custom acoustic models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAcousticModels#listModels-acoustic).
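A minimal sketch that filters by language, reusing the configured client:

```java
import com.ibm.watson.speech_to_text.v1.model.AcousticModels;
import com.ibm.watson.speech_to_text.v1.model.ListAcousticModelsOptions;

// Omit the language option (or use the no-argument overload below)
// to list custom acoustic models for all languages.
ListAcousticModelsOptions listAcousticModelsOptions =
    new ListAcousticModelsOptions.Builder()
        .language("en-US")
        .build();
AcousticModels acousticModels =
    speechToText.listAcousticModels(listAcousticModelsOptions).execute().getResult();
System.out.println(acousticModels);
```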
Parameters:
listAcousticModelsOptions - the ListAcousticModelsOptions containing the options for the call
Returns:
a ServiceCall with a result of type AcousticModels
public com.ibm.cloud.sdk.core.http.ServiceCall<AcousticModels> listAcousticModels()
Lists information about all custom acoustic models that are owned by an instance of the service. Use the `language` parameter to see all custom acoustic models for the specified language. Omit the parameter to see all custom acoustic models for all languages. You must use credentials for the instance of the service that owns a model to list information about it.
**Note:** Acoustic model customization is supported only for use with previous-generation models. It is not supported for large speech models and next-generation models.
**See also:** [Listing custom acoustic models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAcousticModels#listModels-acoustic).
Returns:
a ServiceCall with a result of type AcousticModels
public com.ibm.cloud.sdk.core.http.ServiceCall<AcousticModel> getAcousticModel(GetAcousticModelOptions getAcousticModelOptions)
Gets information about a specified custom acoustic model. You must use credentials for the instance of the service that owns a model to list information about it.
**Note:** Acoustic model customization is supported only for use with previous-generation models. It is not supported for large speech models and next-generation models.
**See also:** [Listing custom acoustic models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAcousticModels#listModels-acoustic).
Parameters:
getAcousticModelOptions - the GetAcousticModelOptions containing the options for the call
Returns:
a ServiceCall with a result of type AcousticModel
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> deleteAcousticModel(DeleteAcousticModelOptions deleteAcousticModelOptions)
Deletes an existing custom acoustic model. The custom model cannot be deleted if another request, such as adding an audio resource to the model, is currently being processed. You must use credentials for the instance of the service that owns a model to delete it.
**Note:** Acoustic model customization is supported only for use with previous-generation models. It is not supported for large speech models and next-generation models.
**See also:** [Deleting a custom acoustic model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAcousticModels#deleteModel-acoustic).
Parameters:
deleteAcousticModelOptions - the DeleteAcousticModelOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<TrainingResponse> trainAcousticModel(TrainAcousticModelOptions trainAcousticModelOptions)
Initiates the training of a custom acoustic model with new or changed audio resources. After adding or deleting audio resources for a custom acoustic model, use this method to begin the actual training of the model on the latest audio data. The custom acoustic model does not reflect its changed data until you train it. You must use credentials for the instance of the service that owns a model to train it.
The training method is asynchronous. Training time depends on the cumulative amount of audio data that the custom acoustic model contains and the current load on the service. When you train or retrain a model, the service uses all of the model's audio data in the training. Training a custom acoustic model takes approximately as long as the length of its cumulative audio data. For example, it takes approximately 2 hours to train a model that contains a total of 2 hours of audio. The method returns an HTTP 200 response code to indicate that the training process has begun.
You can monitor the status of the training by using the [Get a custom acoustic model](#getacousticmodel) method to poll the model's status. Use a loop to check the status once a minute. The method returns an `AcousticModel` object that includes `status` and `progress` fields. A status of `available` indicates that the custom model is trained and ready to use. The service cannot train a model while it is handling another request for the model. The service cannot accept subsequent training requests, or requests to add new audio resources, until the existing training request completes.
You can use the optional `custom_language_model_id` parameter to specify the GUID of a separately created custom language model that is to be used during training. Train with a custom language model if you have verbatim transcriptions of the audio files that you have added to the custom model or you have either corpora (text files) or a list of words that are relevant to the contents of the audio files. For training to succeed, both of the custom models must be based on the same version of the same base model, and the custom language model must be fully trained and available.
**Note:** Acoustic model customization is supported only for use with previous-generation models. It is not supported for large speech models and next-generation models.
**See also:**
* [Train the custom acoustic model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-acoustic#trainModel-acoustic)
* [Using custom acoustic and custom language models together](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-useBoth#useBoth)
### Training failures
Training can fail to start for the following reasons:
* The service is currently handling another request for the custom model, such as another training request or a request to add audio resources to the model.
* The custom model contains less than 10 minutes of audio that includes speech, not silence.
* The custom model contains more than 50 hours of audio (for IBM Cloud) or more than 200 hours of audio (for IBM Cloud Pak for Data). **Note:** For IBM Cloud, the maximum hours of audio for a custom acoustic model was reduced from 200 to 50 hours in August and September 2022. For more information, see [Maximum hours of audio](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audioResources#audioMaximum).
* You passed a custom language model with the `custom_language_model_id` query parameter that is not in the available state. A custom language model must be fully trained and available to be used to train a custom acoustic model.
* You passed an incompatible custom language model with the `custom_language_model_id` query parameter. Both custom models must be based on the same version of the same base model.
* The custom model contains one or more invalid audio resources. You can correct the invalid audio resources or set the `strict` parameter to `false` to exclude the invalid resources from the training. The model must contain at least one valid resource for training to succeed.
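A sketch of the typical flow: start training, then poll [Get a custom acoustic model](#getacousticmodel) once a minute until the model is `available`. It reuses the configured client; `InterruptedException` handling is left to the caller:

```java
import com.ibm.watson.speech_to_text.v1.model.AcousticModel;
import com.ibm.watson.speech_to_text.v1.model.GetAcousticModelOptions;
import com.ibm.watson.speech_to_text.v1.model.TrainAcousticModelOptions;

TrainAcousticModelOptions trainAcousticModelOptions =
    new TrainAcousticModelOptions.Builder()
        .customizationId("{customizationId}")
        .build();
speechToText.trainAcousticModel(trainAcousticModelOptions).execute();

// Poll the model's status once a minute until training completes.
GetAcousticModelOptions getAcousticModelOptions =
    new GetAcousticModelOptions.Builder()
        .customizationId("{customizationId}")
        .build();
AcousticModel model;
do {
  Thread.sleep(60000);
  model = speechToText.getAcousticModel(getAcousticModelOptions).execute().getResult();
} while (!"available".equals(model.getStatus()));
```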
Parameters:
trainAcousticModelOptions - the TrainAcousticModelOptions containing the options for the call
Returns:
a ServiceCall with a result of type TrainingResponse
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> resetAcousticModel(ResetAcousticModelOptions resetAcousticModelOptions)
Resets a custom acoustic model by removing all audio resources from the model. Resetting a custom acoustic model initializes the model to its state when it was first created. Metadata such as the name and language of the model are preserved, but the model's audio resources are removed and must be re-created. The service cannot reset a model while it is handling another request for the model. The service cannot accept subsequent requests for the model until the existing reset request completes. You must use credentials for the instance of the service that owns a model to reset it.
**Note:** Acoustic model customization is supported only for use with previous-generation models. It is not supported for large speech models and next-generation models.
**See also:** [Resetting a custom acoustic model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAcousticModels#resetModel-acoustic).
Parameters:
resetAcousticModelOptions - the ResetAcousticModelOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<Void> upgradeAcousticModel(UpgradeAcousticModelOptions upgradeAcousticModelOptions)
Initiates the upgrade of a custom acoustic model to the latest version of its base language model. The upgrade method is asynchronous. It can take on the order of minutes or hours to complete depending on the amount of data in the custom model and the current load on the service; typically, upgrade takes approximately twice the length of the total audio contained in the custom model. A custom model must be in the `ready` or `available` state to be upgraded. You must use credentials for the instance of the service that owns a model to upgrade it.
The method returns an HTTP 200 response code to indicate that the upgrade process has begun successfully. You can monitor the status of the upgrade by using the [Get a custom acoustic model](#getacousticmodel) method to poll the model's status. The method returns an `AcousticModel` object that includes `status` and `progress` fields. Use a loop to check the status once a minute.
While it is being upgraded, the custom model has the status `upgrading`. When the upgrade is complete, the model resumes the status that it had prior to upgrade. The service cannot upgrade a model while it is handling another request for the model. The service cannot accept subsequent requests for the model until the existing upgrade request completes.
If the custom acoustic model was trained with a separately created custom language model, you must use the `custom_language_model_id` parameter to specify the GUID of that custom language model. The custom language model must be upgraded before the custom acoustic model can be upgraded. Omit the parameter if the custom acoustic model was not trained with a custom language model.
**Note:** Acoustic model customization is supported only for use with previous-generation models. It is not supported for large speech models and next-generation models.
**See also:** [Upgrading a custom acoustic model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade#custom-upgrade-acoustic).
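A minimal sketch, reusing the configured client; include `customLanguageModelId` only if the acoustic model was trained with a custom language model (the placeholder is illustrative):

```java
import com.ibm.watson.speech_to_text.v1.model.UpgradeAcousticModelOptions;

UpgradeAcousticModelOptions upgradeAcousticModelOptions =
    new UpgradeAcousticModelOptions.Builder()
        .customizationId("{customizationId}")
        .customLanguageModelId("{languageCustomizationId}")
        .build();
speechToText.upgradeAcousticModel(upgradeAcousticModelOptions).execute();
```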
Parameters:
upgradeAcousticModelOptions - the UpgradeAcousticModelOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<AudioResources> listAudio(ListAudioOptions listAudioOptions)
Lists information about all audio resources from a custom acoustic model. The information includes the name of the resource and information about its audio data, such as its duration. It also includes the status of the audio resource, which is important for checking the service's analysis of the resource in response to a request to add it to the custom acoustic model. You must use credentials for the instance of the service that owns a model to list its audio resources.
**Note:** Acoustic model customization is supported only for use with previous-generation models. It is not supported for large speech models and next-generation models.
**See also:** [Listing audio resources for a custom acoustic model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAudio#listAudio).
Parameters:
listAudioOptions - the ListAudioOptions containing the options for the call
Returns:
a ServiceCall with a result of type AudioResources
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> addAudio(AddAudioOptions addAudioOptions)
Adds an audio resource to a custom acoustic model. Add audio content that reflects the acoustic characteristics of the audio that you plan to transcribe. You must use credentials for the instance of the service that owns a model to add an audio resource to it. Adding audio data does not affect the custom acoustic model until you train the model for the new data by using the [Train a custom acoustic model](#trainacousticmodel) method.
You can add individual audio files or an archive file that contains multiple audio files. Adding multiple audio files via a single archive file is significantly more efficient than adding each file individually. You can add audio resources in any format that the service supports for speech recognition.
You can use this method to add any number of audio resources to a custom model by calling the method once for each audio or archive file. You can add multiple different audio resources at the same time. You must add a minimum of 10 minutes of audio that includes speech, not just silence, to a custom acoustic model before you can train it. No audio resource, whether an audio-type or archive-type resource, can be larger than 100 MB. To add an audio resource that has the same name as an existing audio resource, set the `allow_overwrite` parameter to `true`; otherwise, the request fails. A custom model can contain no more than 50 hours of audio (for IBM Cloud) or 200 hours of audio (for IBM Cloud Pak for Data). **Note:** For IBM Cloud, the maximum hours of audio for a custom acoustic model was reduced from 200 to 50 hours in August and September 2022. For more information, see [Maximum hours of audio](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audioResources#audioMaximum).
The method is asynchronous. It can take several seconds or minutes to complete depending on the duration of the audio and, in the case of an archive file, the total number of audio files being processed. The service returns a 201 response code if the audio is valid. It then asynchronously analyzes the contents of the audio file or files and automatically extracts information about the audio such as its length, sampling rate, and encoding. You cannot submit requests to train or upgrade the model until the service's analysis of all audio resources for current requests completes.
To determine the status of the service's analysis of the audio, use the [Get an audio resource](#getaudio) method to poll the status of the audio. The method accepts the customization ID of the custom model and the name of the audio resource, and it returns the status of the resource. Use a loop to check the status of the audio every few seconds until it becomes `ok`.
**Note:** Acoustic model customization is supported only for use with previous-generation models. It is not supported for large speech models and next-generation models.
**See also:** [Add audio to the custom acoustic model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-acoustic#addAudio).
### Content types for audio-type resources
You can add an individual audio file in any format that the service supports for speech recognition. For an audio-type resource, use the `Content-Type` parameter to specify the audio format (MIME type) of the audio file, including specifying the sampling rate, channels, and endianness where indicated.
* `audio/alaw` (Specify the sampling rate (`rate`) of the audio.)
* `audio/basic` (Use only with narrowband models.)
* `audio/flac`
* `audio/g729` (Use only with narrowband models.)
* `audio/l16` (Specify the sampling rate (`rate`) and optionally the number of channels (`channels`) and endianness (`endianness`) of the audio.)
* `audio/mp3`
* `audio/mpeg`
* `audio/mulaw` (Specify the sampling rate (`rate`) of the audio.)
* `audio/ogg` (The service automatically detects the codec of the input audio.)
* `audio/ogg;codecs=opus`
* `audio/ogg;codecs=vorbis`
* `audio/wav` (Provide audio with a maximum of nine channels.)
* `audio/webm` (The service automatically detects the codec of the input audio.)
* `audio/webm;codecs=opus`
* `audio/webm;codecs=vorbis`
The sampling rate of an audio file must match the sampling rate of the base model for the custom model: for broadband models, at least 16 kHz; for narrowband models, at least 8 kHz. If the sampling rate of the audio is higher than the minimum required rate, the service down-samples the audio to the appropriate rate. If the sampling rate of the audio is lower than the minimum required rate, the service labels the audio file as `invalid`.
**See also:** [Supported audio formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats).
### Content types for archive-type resources
You can add an archive file (**.zip** or **.tar.gz** file) that contains audio files in any format that the service supports for speech recognition. For an archive-type resource, use the `Content-Type` parameter to specify the media type of the archive file:
* `application/zip` for a **.zip** file
* `application/gzip` for a **.tar.gz** file.
When you add an archive-type resource, the `Contained-Content-Type` header is optional depending on the format of the files that you are adding:
* For audio files of type `audio/alaw`, `audio/basic`, `audio/l16`, or `audio/mulaw`, you must use the `Contained-Content-Type` header to specify the format of the contained audio files. Include the `rate`, `channels`, and `endianness` parameters where necessary. In this case, all audio files contained in the archive file must have the same audio format.
* For audio files of all other types, you can omit the `Contained-Content-Type` header. In this case, the audio files contained in the archive file can have any of the formats not listed in the previous bullet. The audio files do not need to have the same format.
Do not use the `Contained-Content-Type` header when adding an audio-type resource.
### Naming restrictions for embedded audio files
The name of an audio file that is contained in an archive-type resource can include a maximum of 128 characters. This includes the file extension and all elements of the name (for example, slashes).
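Putting these pieces together, a sketch that adds a WAV file as an audio-type resource and polls [Get an audio resource](#getaudio) until analysis completes. It reuses the configured client and assumes a local `audio1.wav` file; checked exceptions are left to the caller, and the `ok` status test follows the polling flow described above:

```java
import java.io.FileInputStream;

import com.ibm.watson.speech_to_text.v1.model.AddAudioOptions;
import com.ibm.watson.speech_to_text.v1.model.AudioListing;
import com.ibm.watson.speech_to_text.v1.model.GetAudioOptions;

AddAudioOptions addAudioOptions = new AddAudioOptions.Builder()
    .customizationId("{customizationId}")
    .audioName("audio1")
    .audioResource(new FileInputStream("audio1.wav"))
    .contentType("audio/wav")
    .build();
speechToText.addAudio(addAudioOptions).execute();

// Poll every few seconds until the service finishes analyzing the audio.
GetAudioOptions getAudioOptions = new GetAudioOptions.Builder()
    .customizationId("{customizationId}")
    .audioName("audio1")
    .build();
AudioListing audio;
do {
  Thread.sleep(5000);
  audio = speechToText.getAudio(getAudioOptions).execute().getResult();
} while (!"ok".equals(audio.getStatus()));
```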
Parameters:
addAudioOptions - the AddAudioOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<AudioListing> getAudio(GetAudioOptions getAudioOptions)
Gets information about an audio resource from a custom acoustic model. The method returns an `AudioListing` object whose fields depend on the type of audio resource that you specify with the method's `audio_name` parameter:
* _For an audio-type resource_, the object's fields match those of an `AudioResource` object: `duration`, `name`, `details`, and `status`.
* _For an archive-type resource_, the object includes a `container` field whose fields match those of an `AudioResource` object. It also includes an `audio` field, which contains an array of `AudioResource` objects that provides information about the audio files that are contained in the archive.
The information includes the status of the specified audio resource. The status is important for checking the service's analysis of a resource that you add to the custom model.
* _For an audio-type resource_, the `status` field is located in the `AudioListing` object.
* _For an archive-type resource_, the `status` field is located in the `AudioResource` object that is returned in the `container` field.
You must use credentials for the instance of the service that owns a model to list its audio resources.
**Note:** Acoustic model customization is supported only for use with previous-generation models. It is not supported for large speech models and next-generation models.
**See also:** [Listing audio resources for a custom acoustic model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAudio#listAudio).
Parameters:
getAudioOptions - the GetAudioOptions containing the options for the call
Returns:
a ServiceCall with a result of type AudioListing
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> deleteAudio(DeleteAudioOptions deleteAudioOptions)
Deletes an existing audio resource from a custom acoustic model. Deleting an archive-type audio resource removes the entire archive of files. The service does not allow deletion of individual files from an archive resource.
Removing an audio resource does not affect the custom model until you train the model on its updated data by using the [Train a custom acoustic model](#trainacousticmodel) method. You can delete an existing audio resource from a model while a different resource is being added to the model. You must use credentials for the instance of the service that owns a model to delete its audio resources.
**Note:** Acoustic model customization is supported only for use with previous-generation models. It is not supported for large speech models and next-generation models.
**See also:** [Deleting an audio resource from a custom acoustic model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAudio#deleteAudio).
Parameters:
deleteAudioOptions - the DeleteAudioOptions containing the options for the call
Returns:
a ServiceCall with a void result

public com.ibm.cloud.sdk.core.http.ServiceCall<Void> deleteUserData(DeleteUserDataOptions deleteUserDataOptions)
Deletes all data that is associated with a specified customer ID. The method deletes all data for the customer ID, regardless of the method by which the information was added. The method has no effect if no data is associated with the customer ID. You must issue the request with credentials for the same instance of the service that was used to associate the customer ID with the data. You associate a customer ID with data by passing the `X-Watson-Metadata` header with a request that passes the data.
**Note:** If you delete an instance of the service from the service console, all data associated with that service instance is automatically deleted. This includes all custom language models, corpora, grammars, and words; all custom acoustic models and audio resources; all registered endpoints for the asynchronous HTTP interface; and all data related to speech recognition requests.
**See also:** [Information security](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-information-security#information-security).
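A minimal sketch, reusing the configured client; the customer ID is whatever value you previously passed in the `X-Watson-Metadata` header:

```java
import com.ibm.watson.speech_to_text.v1.model.DeleteUserDataOptions;

DeleteUserDataOptions deleteUserDataOptions = new DeleteUserDataOptions.Builder()
    .customerId("{customerId}")
    .build();
speechToText.deleteUserData(deleteUserDataOptions).execute();
```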
Parameters:
deleteUserDataOptions - the DeleteUserDataOptions containing the options for the call
Returns:
a ServiceCall with a void result

Copyright © 2024 IBM Cloud. All rights reserved.