Class TextToSpeech
public class TextToSpeech
extends com.ibm.cloud.sdk.core.service.BaseService
For speech synthesis, the service supports a synchronous HTTP Representational State Transfer (REST) interface and a WebSocket interface. Both interfaces support plain text and SSML input. SSML is an XML-based markup language that provides text annotation for speech-synthesis applications. The WebSocket interface also supports the SSML `<mark>` element and word timings.
The service offers a customization interface that you can use to define sounds-like or phonetic translations for words. A sounds-like translation consists of one or more words that, when combined, sound like the word. A phonetic translation is based on the SSML phoneme format for representing a word. You can specify a phonetic translation in standard International Phonetic Alphabet (IPA) representation or in the proprietary IBM Symbolic Phonetic Representation (SPR).
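The two translation styles can be made concrete with a small sketch. `WordTranslations` is a hypothetical helper, not part of the SDK. It only builds the translation string for a word, using the `ipa` and `ibm` (SPR) alphabet values from the phoneme examples in the custom-word methods. The `N C double A` sounds-like spelling is an illustrative assumption.

```java
import java.util.List;

/**
 * Hypothetical helper (not an SDK class) that renders a custom word's
 * translation as either a sounds-like spelling or an SSML phoneme element.
 */
final class WordTranslations {
    private WordTranslations() {}

    /** Sounds-like: one or more words that, spoken together, sound like the word. */
    static String soundsLike(List<String> words) {
        if (words.isEmpty()) {
            throw new IllegalArgumentException("a sounds-like translation needs at least one word");
        }
        return String.join(" ", words);
    }

    /** Phonetic: an SSML phoneme element in IPA ("ipa") or IBM SPR ("ibm") notation. */
    static String phonetic(String alphabet, String pronunciation) {
        if (!alphabet.equals("ipa") && !alphabet.equals("ibm")) {
            throw new IllegalArgumentException("unsupported alphabet: " + alphabet);
        }
        // Escape XML-significant characters so the attribute value stays well-formed.
        String ph = pronunciation.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
        return "<phoneme alphabet=\"" + alphabet + "\" ph=\"" + ph + "\"></phoneme>";
    }
}
```

For example, `WordTranslations.phonetic("ibm", "1gAstroEntxrYFXs")` yields the SPR element shown for the custom-word methods. Remember that, on IBM Cloud, some languages accept only IPA.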
The service also offers a Tune by Example feature that lets you define custom prompts. You can also define speaker models to improve the quality of your custom prompts. The service supports custom prompts only for US English custom models and voices.
**IBM Cloud®.** The Arabic, Chinese, Dutch, Australian English, and Korean languages and voices are supported only for IBM Cloud. For phonetic translation, they support only IPA, not SPR.
API Version: 1.0.0
See: https://cloud.ibm.com/docs/text-to-speech
-
Field Summary
Fields
| Modifier and Type | Field |
|---|---|
| `static String` | `DEFAULT_SERVICE_NAME` |
| `static String` | `DEFAULT_SERVICE_URL` |
Fields inherited from class com.ibm.cloud.sdk.core.service.BaseService
PROPNAME_DISABLE_SSL, PROPNAME_ENABLE_GZIP, PROPNAME_URL
-
Constructor Summary
Constructors
| Constructor | Description |
|---|---|
| `TextToSpeech()` | Constructs an instance of the `TextToSpeech` client. |
| `TextToSpeech(com.ibm.cloud.sdk.core.security.Authenticator authenticator)` | Constructs an instance of the `TextToSpeech` client. |
| `TextToSpeech(String serviceName)` | Constructs an instance of the `TextToSpeech` client. |
| `TextToSpeech(String serviceName, com.ibm.cloud.sdk.core.security.Authenticator authenticator)` | Constructs an instance of the `TextToSpeech` client. |
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| `com.ibm.cloud.sdk.core.http.ServiceCall<Prompt>` | `addCustomPrompt(AddCustomPromptOptions addCustomPromptOptions)` | Add a custom prompt. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Void>` | `addWord(AddWordOptions addWordOptions)` | Add a custom word. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Void>` | `addWords(AddWordsOptions addWordsOptions)` | Add custom words. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<CustomModel>` | `createCustomModel(CreateCustomModelOptions createCustomModelOptions)` | Create a custom model. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<SpeakerModel>` | `createSpeakerModel(CreateSpeakerModelOptions createSpeakerModelOptions)` | Create a speaker model. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Void>` | `deleteCustomModel(DeleteCustomModelOptions deleteCustomModelOptions)` | Delete a custom model. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Void>` | `deleteCustomPrompt(DeleteCustomPromptOptions deleteCustomPromptOptions)` | Delete a custom prompt. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Void>` | `deleteSpeakerModel(DeleteSpeakerModelOptions deleteSpeakerModelOptions)` | Delete a speaker model. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Void>` | `deleteUserData(DeleteUserDataOptions deleteUserDataOptions)` | Delete labeled data. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Void>` | `deleteWord(DeleteWordOptions deleteWordOptions)` | Delete a custom word. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<CustomModel>` | `getCustomModel(GetCustomModelOptions getCustomModelOptions)` | Get a custom model. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Prompt>` | `getCustomPrompt(GetCustomPromptOptions getCustomPromptOptions)` | Get a custom prompt. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Pronunciation>` | `getPronunciation(GetPronunciationOptions getPronunciationOptions)` | Get pronunciation. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<SpeakerCustomModels>` | `getSpeakerModel(GetSpeakerModelOptions getSpeakerModelOptions)` | Get a speaker model. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Voice>` | `getVoice(GetVoiceOptions getVoiceOptions)` | Get a voice. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Translation>` | `getWord(GetWordOptions getWordOptions)` | Get a custom word. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<CustomModels>` | `listCustomModels()` | List custom models. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<CustomModels>` | `listCustomModels(ListCustomModelsOptions listCustomModelsOptions)` | List custom models. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Prompts>` | `listCustomPrompts(ListCustomPromptsOptions listCustomPromptsOptions)` | List custom prompts. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Speakers>` | `listSpeakerModels()` | List speaker models. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Speakers>` | `listSpeakerModels(ListSpeakerModelsOptions listSpeakerModelsOptions)` | List speaker models. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Voices>` | `listVoices()` | List voices. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Voices>` | `listVoices(ListVoicesOptions listVoicesOptions)` | List voices. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Words>` | `listWords(ListWordsOptions listWordsOptions)` | List custom words. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<InputStream>` | `synthesize(SynthesizeOptions synthesizeOptions)` | Synthesize audio. |
| `okhttp3.WebSocket` | `synthesizeUsingWebSocket(SynthesizeOptions synthesizeOptions, SynthesizeCallback callback)` | Synthesize audio. |
| `com.ibm.cloud.sdk.core.http.ServiceCall<Void>` | `updateCustomModel(UpdateCustomModelOptions updateCustomModelOptions)` | Update a custom model. |

Methods inherited from class com.ibm.cloud.sdk.core.service.BaseService
configureClient, configureService, constructServiceUrl, constructServiceURL, enableGzipCompression, getAuthenticator, getClient, getEndPoint, getName, getServiceUrl, isJsonMimeType, isJsonPatchMimeType, setClient, setDefaultHeaders, setEndPoint, setServiceUrl, toString
-
Field Details
-
DEFAULT_SERVICE_NAME
- See Also:
- Constant Field Values
-
DEFAULT_SERVICE_URL
- See Also:
- Constant Field Values
-
Constructor Details
-
TextToSpeech
public TextToSpeech()
Constructs an instance of the `TextToSpeech` client. The default service name is used to configure the client instance.
-
TextToSpeech
public TextToSpeech(com.ibm.cloud.sdk.core.security.Authenticator authenticator)
Constructs an instance of the `TextToSpeech` client. The default service name and specified authenticator are used to configure the client instance.
- Parameters:
  - `authenticator` - the `Authenticator` instance to be configured for this client
-
TextToSpeech
public TextToSpeech(String serviceName)
Constructs an instance of the `TextToSpeech` client. The specified service name is used to configure the client instance.
- Parameters:
  - `serviceName` - the service name to be used when configuring the client instance
-
TextToSpeech
public TextToSpeech(String serviceName, com.ibm.cloud.sdk.core.security.Authenticator authenticator)
Constructs an instance of the `TextToSpeech` client. The specified service name and authenticator are used to configure the client instance.
- Parameters:
  - `serviceName` - the service name to be used when configuring the client instance
  - `authenticator` - the `Authenticator` instance to be configured for this client
-
Method Details
-
listVoices
public com.ibm.cloud.sdk.core.http.ServiceCall<Voices> listVoices(ListVoicesOptions listVoicesOptions)
List voices.
Lists all voices available for use with the service. The information includes the name, language, gender, and other details about the voice. The ordering of the list of voices can change from call to call; do not rely on an alphabetized or static list of voices. To see information about a specific voice, use the [Get a voice](#getvoice) method.
**See also:** [Listing all available voices](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-voices#listVoices).
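Because the service does not guarantee an order, a client that displays or compares voices can impose its own. The sketch below is a hypothetical helper that works on plain voice names (in the SDK these would presumably come from each `Voice` in the `Voices` result). Filtering by language assumes the `{language}_{Name}` naming pattern seen in names such as `en-US_AllisonV3Voice`; the filter is a convenience, not a service rule.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical client-side helpers for voice names returned by "List voices". */
final class VoiceOrdering {
    private VoiceOrdering() {}

    /** Returns a sorted copy; the caller's list (in service order) is left untouched. */
    static List<String> sortedVoiceNames(List<String> namesFromService) {
        List<String> copy = new ArrayList<>(namesFromService);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    /** Selects names that start with the language prefix, e.g. "en-US" matches "en-US_AllisonV3Voice". */
    static List<String> voicesForLanguage(List<String> names, String language) {
        String prefix = language + "_"; // assumed naming convention
        return names.stream()
                .filter(name -> name.startsWith(prefix))
                .sorted()
                .collect(Collectors.toList());
    }
}
```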
- Parameters:
  - `listVoicesOptions` - the `ListVoicesOptions` containing the options for the call
- Returns:
  - a `ServiceCall` with a result of type `Voices`
-
listVoices
public com.ibm.cloud.sdk.core.http.ServiceCall<Voices> listVoices()
List voices.
Lists all voices available for use with the service. The information includes the name, language, gender, and other details about the voice. The ordering of the list of voices can change from call to call; do not rely on an alphabetized or static list of voices. To see information about a specific voice, use the [Get a voice](#getvoice) method.
**See also:** [Listing all available voices](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-voices#listVoices).
- Returns:
  - a `ServiceCall` with a result of type `Voices`
-
getVoice
public com.ibm.cloud.sdk.core.http.ServiceCall<Voice> getVoice(GetVoiceOptions getVoiceOptions)
Get a voice.
Gets information about the specified voice. The information includes the name, language, gender, and other details about the voice. Specify a customization ID to obtain information for a custom model that is defined for the language of the specified voice. To list information about all available voices, use the [List voices](#listvoices) method.
**See also:** [Listing a specific voice](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-voices#listVoice).
### Important voice updates for IBM Cloud
The service's voices underwent significant change on 2 December 2020.
* The Arabic, Chinese, Dutch, Australian English, and Korean voices are now neural instead of concatenative.
* The `ar-AR_OmarVoice` voice is deprecated. Use the `ar-MS_OmarVoice` voice instead.
* The `ar-AR` language identifier cannot be used to create a custom model. Use the `ar-MS` identifier instead.
* The standard concatenative voices for the following languages are now deprecated: Brazilian Portuguese, United Kingdom and United States English, French, German, Italian, Japanese, and Spanish (all dialects).
* The expressive SSML and voice transformation SSML features, and the `volume` attribute of the `<prosody>` element, are deprecated and are not supported with any of the service's neural voices.
* All of the service's voices are now customizable and generally available (GA) for production use.
The deprecated voices and features will continue to function for at least one year but might be removed at a future date. You are encouraged to migrate to the equivalent neural voices at your earliest convenience. For more information about all voice updates, see the [2 December 2020 service update](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-release-notes#December2020) in the release notes for IBM Cloud.
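The two replacements that the update names explicitly can be applied in client code before a request is built. This is a minimal, hypothetical sketch: it covers only the `ar-AR_OmarVoice` to `ar-MS_OmarVoice` and `ar-AR` to `ar-MS` mappings stated above, and it deliberately does not guess neural equivalents for the other deprecated concatenative voices, since the text gives no one-to-one mapping for them.

```java
import java.util.Map;

/** Hypothetical migration helper based only on the replacements listed in the 2 December 2020 update. */
final class VoiceMigration {
    private static final Map<String, String> VOICE_REPLACEMENTS =
            Map.of("ar-AR_OmarVoice", "ar-MS_OmarVoice");
    private static final Map<String, String> CUSTOM_MODEL_LANGUAGE_REPLACEMENTS =
            Map.of("ar-AR", "ar-MS");

    private VoiceMigration() {}

    /** Returns the documented replacement voice, or the input unchanged if none is documented. */
    static String migrateVoice(String voice) {
        return VOICE_REPLACEMENTS.getOrDefault(voice, voice);
    }

    /** Returns a language identifier that can still be used to create a custom model. */
    static String customModelLanguage(String language) {
        return CUSTOM_MODEL_LANGUAGE_REPLACEMENTS.getOrDefault(language, language);
    }
}
```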
- Parameters:
  - `getVoiceOptions` - the `GetVoiceOptions` containing the options for the call
- Returns:
  - a `ServiceCall` with a result of type `Voice`
-
synthesize
public com.ibm.cloud.sdk.core.http.ServiceCall<InputStream> synthesize(SynthesizeOptions synthesizeOptions)
Synthesize audio.
Synthesizes text to audio that is spoken in the specified voice. The service bases its understanding of the language for the input text on the specified voice. Use a voice that matches the language of the input text.
The method accepts a maximum of 5 KB of input text in the body of the request, and 8 KB for the URL and headers. The 5 KB limit includes any SSML tags that you specify. The service returns the synthesized audio stream as an array of bytes.
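Because the 5 KB limit includes SSML tags, it is worth measuring the exact string you send before calling `synthesize`. The sketch below is a hypothetical pre-flight check. It assumes that "5 KB" means 5 × 1,024 bytes of UTF-8 (adjust `MAX_BODY_BYTES` if the service counts differently), and its chunking splits plain text on whitespace only; SSML input would need a structure-aware split so that tags are not broken.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical request-size helper; not part of the SDK. */
final class SynthesisLimits {
    static final int MAX_BODY_BYTES = 5 * 1024; // assumed interpretation of "5 KB"

    private SynthesisLimits() {}

    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the full text, including any SSML tags, fits in one request body. */
    static boolean fitsInOneRequest(String textOrSsml) {
        return utf8Length(textOrSsml) <= MAX_BODY_BYTES;
    }

    /** Splits plain text on whitespace into chunks of at most maxBytes UTF-8 bytes each. */
    static List<String> chunkPlainText(String text, int maxBytes) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue; // input was blank
            if (utf8Length(word) > maxBytes) {
                throw new IllegalArgumentException("single word exceeds the limit: " + word);
            }
            String candidate = current.length() == 0 ? word : current + " " + word;
            if (utf8Length(candidate) > maxBytes) {
                chunks.add(current.toString()); // close the full chunk, start a new one
                current = new StringBuilder(word);
            } else {
                current = new StringBuilder(candidate);
            }
        }
        if (current.length() > 0) chunks.add(current.toString());
        return chunks;
    }
}
```

Each chunk would then be sent as a separate synthesis request, and the resulting audio concatenated by the caller.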
**See also:** [The HTTP interface](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-usingHTTP#usingHTTP).
### Audio formats (accept types)
The service can return audio in the following formats (MIME types).
* Where indicated, you can optionally specify the sampling rate (`rate`) of the audio. You must specify a sampling rate for the `audio/l16` and `audio/mulaw` formats. A specified sampling rate must lie in the range of 8 kHz to 192 kHz. Some formats restrict the sampling rate to certain values, as noted.
* For the `audio/l16` format, you can optionally specify the endianness (`endianness`) of the audio: `endianness=big-endian` or `endianness=little-endian`.
Use the `Accept` header or the `accept` parameter to specify the requested format of the response audio. If you omit an audio format altogether, the service returns the audio in Ogg format with the Opus codec (`audio/ogg;codecs=opus`). The service always returns single-channel audio.
* `audio/basic` - The service returns audio with a sampling rate of 8000 Hz.
* `audio/flac` - You can optionally specify the `rate` of the audio. The default sampling rate is 22,050 Hz.
* `audio/l16` - You must specify the `rate` of the audio. You can optionally specify the `endianness` of the audio. The default endianness is `little-endian`.
* `audio/mp3` - You can optionally specify the `rate` of the audio. The default sampling rate is 22,050 Hz.
* `audio/mpeg` - You can optionally specify the `rate` of the audio. The default sampling rate is 22,050 Hz.
* `audio/mulaw` - You must specify the `rate` of the audio.
* `audio/ogg` - The service returns the audio in the `vorbis` codec. You can optionally specify the `rate` of the audio. The default sampling rate is 22,050 Hz.
* `audio/ogg;codecs=opus` - You can optionally specify the `rate` of the audio. Only the following values are valid sampling rates: `48000`, `24000`, `16000`, `12000`, or `8000`. If you specify a value other than one of these, the service returns an error. The default sampling rate is 48,000 Hz.
* `audio/ogg;codecs=vorbis` - You can optionally specify the `rate` of the audio. The default sampling rate is 22,050 Hz.
* `audio/wav` - You can optionally specify the `rate` of the audio. The default sampling rate is 22,050 Hz.
* `audio/webm` - The service returns the audio in the `opus` codec. The service returns audio with a sampling rate of 48,000 Hz.
* `audio/webm;codecs=opus` - The service returns audio with a sampling rate of 48,000 Hz.
* `audio/webm;codecs=vorbis` - You can optionally specify the `rate` of the audio. The default sampling rate is 22,050 Hz.
For more information about specifying an audio format, including additional details about some of the formats, see [Using audio formats](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-audio-formats).
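The rate and endianness rules above can be checked before a request is sent. `AcceptTypes` is a hypothetical helper, not an SDK class. It assembles a value such as `audio/l16;rate=16000;endianness=big-endian`, which is assumed to be the form the `accept` parameter expects. It treats `rate` as not allowed for the formats the list describes as fixed-rate (`audio/basic`, `audio/webm`, `audio/webm;codecs=opus`); that reading of "where indicated" is an assumption.

```java
import java.util.Set;

/** Hypothetical builder that applies the documented rate and endianness rules to an accept value. */
final class AcceptTypes {
    private static final Set<String> RATE_REQUIRED = Set.of("audio/l16", "audio/mulaw");
    private static final Set<String> FIXED_RATE = Set.of("audio/basic", "audio/webm", "audio/webm;codecs=opus");
    private static final Set<Integer> OPUS_RATES = Set.of(48000, 24000, 16000, 12000, 8000);

    private AcceptTypes() {}

    /**
     * @param format     base MIME type, e.g. "audio/wav" or "audio/ogg;codecs=opus"
     * @param rate       sampling rate in Hz, or null to use the format's default
     * @param endianness "big-endian", "little-endian", or null (audio/l16 only)
     */
    static String build(String format, Integer rate, String endianness) {
        if (rate == null && RATE_REQUIRED.contains(format)) {
            throw new IllegalArgumentException(format + " requires a rate");
        }
        StringBuilder accept = new StringBuilder(format);
        if (rate != null) {
            if (FIXED_RATE.contains(format)) {
                throw new IllegalArgumentException(format + " uses a fixed sampling rate");
            }
            if (rate < 8000 || rate > 192000) {
                throw new IllegalArgumentException("rate must lie between 8000 and 192000 Hz");
            }
            if (format.equals("audio/ogg;codecs=opus") && !OPUS_RATES.contains(rate)) {
                throw new IllegalArgumentException("unsupported Opus rate: " + rate);
            }
            accept.append(";rate=").append(rate);
        }
        if (endianness != null) {
            if (!format.equals("audio/l16")) {
                throw new IllegalArgumentException("endianness applies only to audio/l16");
            }
            if (!endianness.equals("big-endian") && !endianness.equals("little-endian")) {
                throw new IllegalArgumentException("unknown endianness: " + endianness);
            }
            accept.append(";endianness=").append(endianness);
        }
        return accept.toString();
    }
}
```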
### Important voice updates for IBM Cloud
The 2 December 2020 voice updates described for the `getVoice` method also apply to this method. Several concatenative voices, the `ar-AR_OmarVoice` voice, the `ar-AR` custom-model language identifier, and the expressive SSML, voice transformation SSML, and `<prosody>` `volume` features are deprecated; migrate to the equivalent neural voices at your earliest convenience. For details, see the [2 December 2020 service update](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-release-notes#December2020) in the release notes for IBM Cloud.
### Warning messages
If a request includes invalid query parameters, the service returns a `Warnings` response header that provides messages about the invalid parameters. The warning includes a descriptive message and a list of invalid argument strings: for example, `"Unknown arguments:"` or `"Unknown url query arguments:"` followed by a list of the form `"{invalid_arg_1}, {invalid_arg_2}."` The request succeeds despite the warnings.
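Since the request still succeeds, a client has to inspect the header itself to notice a misspelled parameter. The following is a minimal, hypothetical parser for one `Warnings` header value. It assumes the shape described above (a label ending in `:` followed by a comma-separated list that ends with `.`); reading the header from the HTTP response is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for a Warnings header such as "Unknown arguments: foo, bar." */
final class WarningsHeader {
    private WarningsHeader() {}

    static List<String> invalidArguments(String headerValue) {
        List<String> args = new ArrayList<>();
        if (headerValue == null) return args;
        int colon = headerValue.indexOf(':');
        if (colon < 0) return args; // not in the expected shape; nothing to extract
        String list = headerValue.substring(colon + 1).trim();
        if (list.endsWith(".")) list = list.substring(0, list.length() - 1);
        for (String part : list.split(",")) {
            String arg = part.trim();
            if (!arg.isEmpty()) args.add(arg);
        }
        return args;
    }
}
```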
- Parameters:
  - `synthesizeOptions` - the `SynthesizeOptions` containing the options for the call
- Returns:
  - a `ServiceCall` with a result of type `InputStream`
-
synthesizeUsingWebSocket
public okhttp3.WebSocket synthesizeUsingWebSocket(SynthesizeOptions synthesizeOptions, SynthesizeCallback callback)
Synthesize audio.
Synthesizes text to audio that is spoken in the specified voice. The service bases its understanding of the language for the input text on the specified voice. Use a voice that matches the language of the input text.
The method accepts a maximum of 5 KB of input text in the body of the request, and 8 KB for the URL and headers. The 5 KB limit includes any SSML tags that you specify. The service returns the synthesized audio stream as an array of bytes.
### Audio formats (accept types)
For more information about specifying an audio format, including additional details about some of the formats, see [Audio formats](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-audioFormats#audioFormats).
- Parameters:
  - `synthesizeOptions` - the `SynthesizeOptions` containing the options for the call
  - `callback` - the `SynthesizeCallback` callback
- Returns:
  - a `WebSocket` instance
-
getPronunciation
public com.ibm.cloud.sdk.core.http.ServiceCall<Pronunciation> getPronunciation(GetPronunciationOptions getPronunciationOptions)
Get pronunciation.
Gets the phonetic pronunciation for the specified word. You can request the pronunciation for a specific format. You can also request the pronunciation for a specific voice to see the default translation for the language of that voice or for a specific custom model to see the translation for that model.
**See also:** [Querying a word from a language](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customWords#cuWordsQueryLanguage).
### Important voice updates for IBM Cloud
The 2 December 2020 voice updates described for the `getVoice` method also apply to this method. Several concatenative voices, the `ar-AR_OmarVoice` voice, the `ar-AR` custom-model language identifier, and the expressive SSML, voice transformation SSML, and `<prosody>` `volume` features are deprecated; migrate to the equivalent neural voices at your earliest convenience. For details, see the [2 December 2020 service update](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-release-notes#December2020) in the release notes for IBM Cloud.
- Parameters:
  - `getPronunciationOptions` - the `GetPronunciationOptions` containing the options for the call
- Returns:
  - a `ServiceCall` with a result of type `Pronunciation`
-
createCustomModel
public com.ibm.cloud.sdk.core.http.ServiceCall<CustomModel> createCustomModel(CreateCustomModelOptions createCustomModelOptions)
Create a custom model.
Creates a new empty custom model. You must specify a name for the new custom model. You can optionally specify the language and a description for the new model. The model is owned by the instance of the service whose credentials are used to create it.
**See also:** [Creating a custom model](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customModels#cuModelsCreate).
### Important voice updates for IBM Cloud
The 2 December 2020 voice updates described for the `getVoice` method also apply to this method. In particular, the `ar-AR` language identifier cannot be used to create a custom model; use the `ar-MS` identifier instead. Several concatenative voices, the `ar-AR_OmarVoice` voice, and the expressive SSML, voice transformation SSML, and `<prosody>` `volume` features are also deprecated; migrate to the equivalent neural voices at your earliest convenience. For details, see the [2 December 2020 service update](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-release-notes#December2020) in the release notes for IBM Cloud.
- Parameters:
  - `createCustomModelOptions` - the `CreateCustomModelOptions` containing the options for the call
- Returns:
  - a `ServiceCall` with a result of type `CustomModel`
-
listCustomModels
public com.ibm.cloud.sdk.core.http.ServiceCall<CustomModels> listCustomModels(ListCustomModelsOptions listCustomModelsOptions)
List custom models.
Lists metadata such as the name and description for all custom models that are owned by an instance of the service. Specify a language to list the custom models for that language only. To see the words and prompts in addition to the metadata for a specific custom model, use the [Get a custom model](#getcustommodel) method. You must use credentials for the instance of the service that owns a model to list information about it.
**See also:** [Querying all custom models](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customModels#cuModelsQueryAll).
- Parameters:
  - `listCustomModelsOptions` - the `ListCustomModelsOptions` containing the options for the call
- Returns:
  - a `ServiceCall` with a result of type `CustomModels`
-
listCustomModels
public com.ibm.cloud.sdk.core.http.ServiceCall<CustomModels> listCustomModels()
List custom models.
Lists metadata such as the name and description for all custom models that are owned by an instance of the service. Specify a language to list the custom models for that language only. To see the words and prompts in addition to the metadata for a specific custom model, use the [Get a custom model](#getcustommodel) method. You must use credentials for the instance of the service that owns a model to list information about it.
**See also:** [Querying all custom models](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customModels#cuModelsQueryAll).
- Returns:
  - a `ServiceCall` with a result of type `CustomModels`
-
updateCustomModel
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> updateCustomModel(UpdateCustomModelOptions updateCustomModelOptions)
Update a custom model.
Updates information for the specified custom model. You can update metadata such as the name and description of the model. You can also update the words in the model and their translations. Adding a new translation for a word that already exists in a custom model overwrites the word's existing translation. A custom model can contain no more than 20,000 entries. You must use credentials for the instance of the service that owns a model to update it.
You can define sounds-like or phonetic translations for words. A sounds-like translation consists of one or more words that, when combined, sound like the word. Phonetic translations are based on the SSML phoneme format for representing a word. You can specify them in standard International Phonetic Alphabet (IPA) representation
`<phoneme alphabet="ipa" ph="təmˈɑto"></phoneme>`
or in the proprietary IBM Symbolic Phonetic Representation (SPR)
`<phoneme alphabet="ibm" ph="1gAstroEntxrYFXs"></phoneme>`
**See also:**
* [Updating a custom model](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customModels#cuModelsUpdate)
* [Adding words to a Japanese custom model](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customWords#cuJapaneseAdd)
* [Understanding customization](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customIntro#customIntro)
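The overwrite rule and the 20,000-entry cap interact: an update that only replaces existing words never grows the model, while new words do. The sketch below is a hypothetical local mirror of a model's word list that predicts the effect of an update before it is sent. It assumes words are compared exactly as written (the service's case handling is not stated here). The all-or-nothing behavior belongs to this mirror and says nothing about how the service applies a partially valid update.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical local mirror of a custom model's words and translations. */
final class CustomWordList {
    static final int MAX_ENTRIES = 20_000;
    private final Map<String, String> translations = new LinkedHashMap<>();

    /** Applies a batch only if the result stays within MAX_ENTRIES; otherwise changes nothing. */
    void applyUpdate(Map<String, String> batch) {
        int newWords = 0;
        for (String word : batch.keySet()) {
            if (!translations.containsKey(word)) newWords++; // overwrites do not add entries
        }
        if (translations.size() + newWords > MAX_ENTRIES) {
            throw new IllegalStateException("update would exceed " + MAX_ENTRIES + " entries");
        }
        translations.putAll(batch); // an existing word's translation is replaced
    }

    int size() { return translations.size(); }

    String translationOf(String word) { return translations.get(word); }
}
```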
- Parameters:
  - `updateCustomModelOptions` - the `UpdateCustomModelOptions` containing the options for the call
- Returns:
  - a `ServiceCall` with a void result
-
getCustomModel
public com.ibm.cloud.sdk.core.http.ServiceCall<CustomModel> getCustomModel(GetCustomModelOptions getCustomModelOptions)
Get a custom model.
Gets all information about a specified custom model. In addition to metadata such as the name and description of the custom model, the output includes the words and their translations that are defined for the model, as well as any prompts that are defined for the model. To see just the metadata for a model, use the [List custom models](#listcustommodels) method.
**See also:** [Querying a custom model](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customModels#cuModelsQuery).
- Parameters:
  - `getCustomModelOptions` - the `GetCustomModelOptions` containing the options for the call
- Returns:
  - a `ServiceCall` with a result of type `CustomModel`
-
deleteCustomModel
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> deleteCustomModel(DeleteCustomModelOptions deleteCustomModelOptions)
Delete a custom model.
Deletes the specified custom model. You must use credentials for the instance of the service that owns a model to delete it.
**See also:** [Deleting a custom model](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customModels#cuModelsDelete).
- Parameters:
  - `deleteCustomModelOptions` - the `DeleteCustomModelOptions` containing the options for the call
- Returns:
  - a `ServiceCall` with a void result
-
addWords
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> addWords(AddWordsOptions addWordsOptions)
Add custom words.
Adds one or more words and their translations to the specified custom model. Adding a new translation for a word that already exists in a custom model overwrites the word's existing translation. A custom model can contain no more than 20,000 entries. You must use credentials for the instance of the service that owns a model to add words to it.
You can define sounds-like or phonetic translations for words. A sounds-like translation consists of one or more words that, when combined, sound like the word. Phonetic translations are based on the SSML phoneme format for representing a word. You can specify them in standard International Phonetic Alphabet (IPA) representation
`<phoneme alphabet="ipa" ph="təmˈɑto"></phoneme>`
or in the proprietary IBM Symbolic Phonetic Representation (SPR)
`<phoneme alphabet="ibm" ph="1gAstroEntxrYFXs"></phoneme>`
**See also:**
* [Adding multiple words to a custom model](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customWords#cuWordsAdd)
* [Adding words to a Japanese custom model](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customWords#cuJapaneseAdd)
* [Understanding customization](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customIntro#customIntro)
- Parameters:
  - `addWordsOptions` - the `AddWordsOptions` containing the options for the call
- Returns:
  - a `ServiceCall` with a void result
-
listWords
public com.ibm.cloud.sdk.core.http.ServiceCall<Words> listWords(ListWordsOptions listWordsOptions)
List custom words.
Lists all of the words and their translations for the specified custom model. The output shows the translations as they are defined in the model. You must use credentials for the instance of the service that owns a model to list its words.
**See also:** [Querying all words from a custom model](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customWords#cuWordsQueryModel).
- Parameters:
  - `listWordsOptions` - the `ListWordsOptions` containing the options for the call
- Returns:
  - a `ServiceCall` with a result of type `Words`
-
addWord
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> addWord(AddWordOptions addWordOptions)
Add a custom word.
Adds a single word and its translation to the specified custom model. Adding a new translation for a word that already exists in a custom model overwrites the word's existing translation. A custom model can contain no more than 20,000 entries. You must use credentials for the instance of the service that owns a model to add a word to it.
You can define sounds-like or phonetic translations for words. A sounds-like translation consists of one or more words that, when combined, sound like the word. Phonetic translations are based on the SSML phoneme format for representing a word. You can specify them in standard International Phonetic Alphabet (IPA) representation
`<phoneme alphabet="ipa" ph="təmˈɑto"></phoneme>`
or in the proprietary IBM Symbolic Phonetic Representation (SPR)
`<phoneme alphabet="ibm" ph="1gAstroEntxrYFXs"></phoneme>`
**See also:**
* [Adding a single word to a custom model](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customWords#cuWordAdd)
* [Adding words to a Japanese custom model](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customWords#cuJapaneseAdd)
* [Understanding customization](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customIntro#customIntro)
- Parameters:
  - `addWordOptions` - the `AddWordOptions` containing the options for the call
- Returns:
  - a `ServiceCall` with a void result
-
getWord
public com.ibm.cloud.sdk.core.http.ServiceCall<Translation> getWord(GetWordOptions getWordOptions)
Get a custom word.
Gets the translation for a single word from the specified custom model. The output shows the translation as it is defined in the model. You must use credentials for the instance of the service that owns a model to list its words.
**See also:** [Querying a single word from a custom model](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customWords#cuWordQueryModel).
- Parameters:
  - `getWordOptions` - the `GetWordOptions` containing the options for the call
- Returns:
  - a `ServiceCall` with a result of type `Translation`
-
deleteWord
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> deleteWord(DeleteWordOptions deleteWordOptions)
Delete a custom word.
Deletes a single word from the specified custom model. You must use credentials for the instance of the service that owns a model to delete its words.
**See also:** [Deleting a word from a custom model](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-customWords#cuWordDelete).
- Parameters:
  - `deleteWordOptions` - the `DeleteWordOptions` containing the options for the call
- Returns:
  - a `ServiceCall` with a void result
-
listCustomPrompts
public com.ibm.cloud.sdk.core.http.ServiceCall<Prompts> listCustomPrompts(ListCustomPromptsOptions listCustomPromptsOptions)
List custom prompts.
Lists information about all custom prompts that are defined for a custom model. The information includes the prompt ID, prompt text, status, and optional speaker ID for each prompt of the custom model. You must use credentials for the instance of the service that owns the custom model. The same information about all of the prompts for a custom model is also provided by the [Get a custom model](#getcustommodel) method. That method provides complete details about a specified custom model, including its language, owner, custom words, and more. Custom prompts are supported only for use with US English custom models and voices.
**See also:** [Listing custom prompts](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-tbe-custom-prompts#tbe-custom-prompts-list).
- Parameters:
  - `listCustomPromptsOptions` - the `ListCustomPromptsOptions` containing the options for the call
- Returns:
  - a `ServiceCall` with a result of type `Prompts`
-
addCustomPrompt
public com.ibm.cloud.sdk.core.http.ServiceCall<Prompt> addCustomPrompt(AddCustomPromptOptions addCustomPromptOptions)
Add a custom prompt.
Adds a custom prompt to a custom model. A prompt is defined by the text that is to be spoken, the audio for that text, a unique user-specified ID for the prompt, and an optional speaker ID. The information is used to generate prosodic data that is not visible to the user. This data is used by the service to produce the synthesized audio upon request. You must use credentials for the instance of the service that owns a custom model to add a prompt to it. You can add a maximum of 1000 custom prompts to a single custom model.
Assign meaningful values to prompt IDs. For example, use `goodbye` to identify a prompt that speaks a farewell message. Prompt IDs must be unique within a given custom model: you cannot define two prompts with the same name for the same custom model. If you provide the ID of an existing prompt, the new information replaces the previously uploaded prompt. The service reprocesses the existing prompt by using the new text and audio and, if provided, the new speaker model, and it updates the prosody data that is associated with the prompt.
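The replacement rule and the 1000-prompt cap can be mirrored on the client side to catch mistakes before a request is sent. The following is a minimal sketch under stated assumptions: `LocalPromptCatalog` is a hypothetical helper, not an SDK class, and it only knows about prompt IDs that your own application has recorded; the service remains the authority on what a custom model actually contains.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side bookkeeping for one custom model (not part of the SDK).
// It mirrors two documented rules: prompt IDs are unique per custom model, and a
// custom model holds at most 1000 custom prompts.
final class LocalPromptCatalog {
    static final int MAX_PROMPTS = 1000;

    // promptId -> prompt text, kept in insertion order
    private final Map<String, String> prompts = new LinkedHashMap<>();

    /**
     * Records an add-prompt request. Returns true if the ID was already present,
     * which means the service would replace (and reprocess) the existing prompt.
     */
    boolean add(String promptId, String promptText) {
        boolean replacing = prompts.containsKey(promptId);
        // Replacing an existing ID never increases the count, so only new IDs hit the cap.
        if (!replacing && prompts.size() >= MAX_PROMPTS) {
            throw new IllegalStateException(
                "A custom model can hold at most " + MAX_PROMPTS + " prompts");
        }
        prompts.put(promptId, promptText);
        return replacing;
    }

    int size() {
        return prompts.size();
    }
}
```

Tracking replacements explicitly is useful because a re-used ID silently overwrites the earlier prompt rather than failing.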
The quality of a prompt is undefined if the language of a prompt does not match the language of its custom model. The same is true of any text or SSML that is specified for a speech synthesis request. The service makes a best-effort attempt to render the specified text for the prompt; it does not validate that the language of the text matches the language of the model.
Adding a prompt is an asynchronous operation. Although it accepts less audio than speaker enrollment, the service must align the audio with the provided text. The time that it takes to process a prompt depends on the prompt itself. The processing time for a reasonably sized prompt generally matches the length of the audio (for example, it takes 20 seconds to process a 20-second prompt).
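Because processing time roughly tracks the length of the audio, the recording's duration gives a first estimate of how long to wait before checking the prompt's status. The sketch below reads that duration from a WAV header with the JDK's `javax.sound.sampled` API; `PromptTiming` is a hypothetical helper, and treating duration as processing time is only the rule of thumb stated above, not a guarantee.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.time.Duration;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical planning helper (not part of the SDK).
final class PromptTiming {
    // Returns the playing time of a WAV recording. A reasonably sized prompt takes
    // roughly this long to process, so use it only as a starting estimate.
    // The caller keeps ownership of the stream and is responsible for closing it.
    static Duration audioLength(InputStream wav)
            throws IOException, UnsupportedAudioFileException {
        // AudioSystem needs mark/reset support to inspect the header.
        AudioInputStream in = AudioSystem.getAudioInputStream(new BufferedInputStream(wav));
        AudioFormat format = in.getFormat();
        long frames = in.getFrameLength();
        if (frames == AudioSystem.NOT_SPECIFIED
                || format.getFrameRate() == AudioSystem.NOT_SPECIFIED) {
            throw new IOException("WAV header does not declare a length");
        }
        double seconds = frames / (double) format.getFrameRate();
        return Duration.ofMillis(Math.round(seconds * 1000));
    }
}
```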
For shorter prompts, you can wait for a reasonable amount of time and then check the status of the prompt with the [Get a custom prompt](#getcustomprompt) method. For longer prompts, consider using that method to poll the service every few seconds to determine when the prompt becomes available. No prompt can be used for speech synthesis if it is in the `processing` or `failed` state. Only prompts that are in the `available` state can be used for speech synthesis.
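The polling advice can be sketched as a small loop. This is an illustration under stated assumptions: the status source is a plain `Supplier<String>`, and wiring it to the SDK (for example, something like `service.getCustomPrompt(options).execute().getResult().getStatus()`) is left to the caller. `PromptPoller` is a hypothetical name.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Minimal polling sketch (hypothetical helper, not part of the SDK).
final class PromptPoller {
    // Polls until the prompt leaves the "processing" state or the timeout elapses.
    // Any status other than "processing" (normally "available" or "failed") is
    // returned to the caller, who decides whether the prompt is usable.
    static String waitForPrompt(Supplier<String> statusSource,
                                Duration interval,
                                Duration timeout) throws InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        while (true) {
            String status = statusSource.get();
            if (!"processing".equals(status)) {
                return status;
            }
            // Give up rather than sleep past the deadline.
            if (Instant.now().plus(interval).isAfter(deadline)) {
                throw new IllegalStateException("Prompt still processing after " + timeout);
            }
            Thread.sleep(interval.toMillis());
        }
    }
}
```

Only a returned value of `available` means the prompt can be used for synthesis; `failed` typically calls for re-recording or revising the prompt text.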
When it processes a request, the service attempts to align the text and the audio that are provided for the prompt. The text that is passed with a prompt must match the spoken audio as closely as possible. Optimally, the text and audio match exactly. The service does its best to align the specified text with the audio, and it can often compensate for mismatches between the two. But if the service cannot effectively align the text and the audio, possibly because the magnitude of mismatches between the two is too great, processing of the prompt fails.
### Evaluating a prompt
Always listen to and evaluate a prompt to determine its quality before using it in production. To evaluate a prompt, include only the single prompt in a speech synthesis request by using the following SSML extension, in this case for a prompt whose ID is `goodbye`:
`<ibm:prompt id="goodbye"/>`
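A helper that builds this evaluation text keeps the attribute value well-formed when prompt IDs come from user input. The sketch below is hypothetical (`PromptSsml` is not an SDK class); the element syntax is copied from the text above. The resulting string would be passed as the text of a synthesis request that names the custom model, for example through the customization ID setting of the SDK's synthesis options, assuming your SDK version exposes it.

```java
// Hypothetical helper (not part of the SDK) that builds the single-prompt SSML
// used to audition a custom prompt.
final class PromptSsml {
    static String evaluationText(String promptId) {
        return "<ibm:prompt id=\"" + escapeAttribute(promptId) + "\"/>";
    }

    // Escapes characters that would break a double-quoted XML attribute value.
    private static String escapeAttribute(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```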
In some cases, you might need to rerecord and resubmit a prompt as many as five times to address the following possible problems:
* The service might fail to detect a mismatch between the prompt’s text and audio. The longer the prompt, the greater the chance for misalignment between its text and audio. Therefore, multiple shorter prompts are preferable to a single long prompt.
* The text of a prompt might include a word that the service does not recognize. In this case, you can create a custom word and pronunciation pair to tell the service how to pronounce the word. You must then re-create the prompt.
* The quality of the input audio might be insufficient or the service’s processing of the audio might fail to detect the intended prosody. Submitting new audio for the prompt can correct these issues.
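Since shorter prompts align more reliably, a long script can be broken into sentence-sized pieces, each of which becomes its own prompt with its own matching recording. The sketch below uses the JDK's `java.text.BreakIterator` for sentence boundaries; `ScriptSplitter` is a hypothetical helper, and whether each sentence is short enough still has to be judged by listening to the resulting prompts.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of the SDK): splits a script into sentences,
// each a candidate for a separate custom prompt.
final class ScriptSplitter {
    static List<String> sentences(String script) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(script);
        List<String> out = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundary pairs [start, end).
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = script.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                out.add(sentence);
            }
        }
        return out;
    }
}
```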
If a prompt that is created without a speaker ID does not adequately reflect the intended prosody, enrolling the speaker and providing a speaker ID for the prompt is one recommended means of potentially improving the quality of the prompt. This is especially important for shorter prompts such as "good-bye" or "thank you," where less audio data makes it more difficult to match the prosody of the speaker. Custom prompts are supported only for use with US English custom models and voices.
**See also:**
* [Add a custom prompt](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-tbe-create#tbe-create-add-prompt)
* [Evaluate a custom prompt](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-tbe-create#tbe-create-evaluate-prompt)
* [Rules for creating custom prompts](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-tbe-rules#tbe-rules-prompts)
- Parameters:
addCustomPromptOptions - the AddCustomPromptOptions containing the options for the call
- Returns:
a ServiceCall with a result of type Prompt
-
getCustomPrompt
public com.ibm.cloud.sdk.core.http.ServiceCall<Prompt> getCustomPrompt(GetCustomPromptOptions getCustomPromptOptions)
Get a custom prompt.
Gets information about a specified custom prompt for a specified custom model. The information includes the prompt ID, prompt text, status, and optional speaker ID of the prompt. You must use credentials for the instance of the service that owns the custom model. Custom prompts are supported only for use with US English custom models and voices.
**See also:** [Listing custom prompts](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-tbe-custom-prompts#tbe-custom-prompts-list).
- Parameters:
getCustomPromptOptions - the GetCustomPromptOptions containing the options for the call
- Returns:
a ServiceCall with a result of type Prompt
-
deleteCustomPrompt
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> deleteCustomPrompt(DeleteCustomPromptOptions deleteCustomPromptOptions)
Delete a custom prompt.
Deletes an existing custom prompt from a custom model. The service deletes the prompt with the specified ID. You must use credentials for the instance of the service that owns the custom model from which the prompt is to be deleted.
**Caution:** Deleting a custom prompt elicits a 400 response code from synthesis requests that attempt to use the prompt. Make sure that you do not attempt to use a deleted prompt in a production application. Custom prompts are supported only for use with US English custom models and voices.
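One way to guard against the 400 error is a pre-flight check that every prompt referenced in the synthesis text is still available. This is a sketch under stated assumptions: `PromptReferenceCheck` is hypothetical, the set of available IDs is supplied by the caller (for example, from a listing of the model's prompts filtered to the `available` status), and the regular expression handles only the simple `<ibm:prompt id="..."/>` form without decoding XML entities.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-flight check (not part of the SDK): reports prompt IDs that
// the text references but that are not in the caller's set of available prompts.
final class PromptReferenceCheck {
    // Matches <ibm:prompt id="..."/> with either single or double quotes.
    private static final Pattern PROMPT_REF =
        Pattern.compile("<ibm:prompt\\s+id\\s*=\\s*([\"'])(.*?)\\1\\s*/>");

    static List<String> missingPrompts(String text, Set<String> availablePromptIds) {
        List<String> missing = new ArrayList<>();
        Matcher m = PROMPT_REF.matcher(text);
        while (m.find()) {
            String id = m.group(2);
            // Report each missing ID once, in order of first appearance.
            if (!availablePromptIds.contains(id) && !missing.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }
}
```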
**See also:** [Deleting a custom prompt](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-tbe-custom-prompts#tbe-custom-prompts-delete).
- Parameters:
deleteCustomPromptOptions - the DeleteCustomPromptOptions containing the options for the call
- Returns:
a ServiceCall with a void result
-
listSpeakerModels
public com.ibm.cloud.sdk.core.http.ServiceCall<Speakers> listSpeakerModels(ListSpeakerModelsOptions listSpeakerModelsOptions)
List speaker models.
Lists information about all speaker models that are defined for a service instance. The information includes the speaker ID and speaker name of each defined speaker. You must use credentials for the instance of a service to list its speakers. Speaker models and the custom prompts with which they are used are supported only for use with US English custom models and voices.
**See also:** [Listing speaker models](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-tbe-speaker-models#tbe-speaker-models-list).
- Parameters:
listSpeakerModelsOptions - the ListSpeakerModelsOptions containing the options for the call
- Returns:
a ServiceCall with a result of type Speakers
-
listSpeakerModels
public com.ibm.cloud.sdk.core.http.ServiceCall<Speakers> listSpeakerModels()
List speaker models.
Lists information about all speaker models that are defined for a service instance. The information includes the speaker ID and speaker name of each defined speaker. You must use credentials for the instance of a service to list its speakers. Speaker models and the custom prompts with which they are used are supported only for use with US English custom models and voices.
**See also:** [Listing speaker models](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-tbe-speaker-models#tbe-speaker-models-list).
- Returns:
a ServiceCall with a result of type Speakers
-
createSpeakerModel
public com.ibm.cloud.sdk.core.http.ServiceCall<SpeakerModel> createSpeakerModel(CreateSpeakerModelOptions createSpeakerModelOptions)
Create a speaker model.
Creates a new speaker model, which is an optional enrollment token for users who are to add prompts to custom models. A speaker model contains information about a user's voice. The service extracts this information from a WAV audio sample that you pass as the body of the request. Associating a speaker model with a prompt is optional, but the information that is extracted from the speaker model helps the service learn about the speaker's voice.
A speaker model can make an appreciable difference in the quality of prompts, especially short prompts with relatively little audio, that are associated with that speaker. A speaker model can help the service produce a prompt with more confidence; the lack of a speaker model can potentially compromise the quality of a prompt.
The gender of the speaker who creates a speaker model does not need to match the gender of a voice that is used with prompts that are associated with that speaker model. For example, a speaker model that is created by a male speaker can be associated with prompts that are spoken by female voices.
You create a speaker model for a given instance of the service. The new speaker model is owned by the service instance whose credentials are used to create it. That same speaker can then be used to create prompts for all custom models within that service instance. No language is associated with a speaker model, but each custom model has a single specified language. You can add prompts only to US English models.
You specify a name for the speaker when you create it. The name must be unique among all speaker names for the owning service instance. To re-create a speaker model for an existing speaker name, you must first delete the existing speaker model that has that name.
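The delete-then-re-create rule for speaker names can be sketched as follows. `SpeakerDirectory` is a hypothetical interface standing in for the service; a real implementation might wrap `listSpeakerModels`, `deleteSpeakerModel`, and `createSpeakerModel`, but that wiring is an assumption and is not shown.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical abstraction over the speaker-model operations (not an SDK type).
interface SpeakerDirectory {
    Map<String, String> speakerIdsByName();              // speaker name -> speaker ID
    void delete(String speakerId);
    String create(String speakerName, byte[] wavAudio);  // returns the new speaker ID
}

final class SpeakerEnrollment {
    // Names must be unique within a service instance, so any existing speaker
    // model with this name is deleted before the new one is created.
    static String recreate(SpeakerDirectory dir, String speakerName, byte[] wavAudio) {
        Optional.ofNullable(dir.speakerIdsByName().get(speakerName)).ifPresent(dir::delete);
        return dir.create(speakerName, wavAudio);
    }
}
```

The new model gets a new speaker ID. Prompts created with the old speaker keep working, but, as described for speaker deletion, they can no longer be resubmitted with that deleted speaker.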
Speaker enrollment is a synchronous operation. Although it accepts more audio data than a prompt, the process of adding a speaker is very fast. The service simply extracts information about the speaker’s voice from the audio. Unlike prompts, speaker models neither need nor accept a transcription of the audio. When the call returns, the audio is fully processed and the speaker enrollment is complete.
The service returns a speaker ID with the request. A speaker ID is a globally unique identifier (GUID) that you use to identify the speaker in subsequent requests to the service. Speaker models and the custom prompts with which they are used are supported only for use with US English custom models and voices.
**See also:**
* [Create a speaker model](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-tbe-create#tbe-create-speaker-model)
* [Rules for creating speaker models](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-tbe-rules#tbe-rules-speakers)
- Parameters:
createSpeakerModelOptions - the CreateSpeakerModelOptions containing the options for the call
- Returns:
a ServiceCall with a result of type SpeakerModel
-
getSpeakerModel
public com.ibm.cloud.sdk.core.http.ServiceCall<SpeakerCustomModels> getSpeakerModel(GetSpeakerModelOptions getSpeakerModelOptions)
Get a speaker model.
Gets information about all prompts that are defined by a specified speaker for all custom models that are owned by a service instance. The information is grouped by the customization IDs of the custom models. For each custom model, it lists each prompt that the speaker defined for that model. You must use credentials for the instance of the service that owns a speaker model to list its prompts. Speaker models and the custom prompts with which they are used are supported only for use with US English custom models and voices.
**See also:** [Listing the custom prompts for a speaker model](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-tbe-speaker-models#tbe-speaker-models-list-prompts).
- Parameters:
getSpeakerModelOptions - the GetSpeakerModelOptions containing the options for the call
- Returns:
a ServiceCall with a result of type SpeakerCustomModels
-
deleteSpeakerModel
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> deleteSpeakerModel(DeleteSpeakerModelOptions deleteSpeakerModelOptions)
Delete a speaker model.
Deletes an existing speaker model from the service instance. The service deletes the enrolled speaker with the specified speaker ID. You must use credentials for the instance of the service that owns a speaker model to delete the speaker.
Any prompts that are associated with the deleted speaker are not affected by the speaker's deletion. The prosodic data that defines the quality of a prompt is established when the prompt is created. A prompt is static and remains unaffected by deletion of its associated speaker. However, the prompt cannot be resubmitted or updated with its original speaker once that speaker is deleted. Speaker models and the custom prompts with which they are used are supported only for use with US English custom models and voices.
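Because deleted speakers cannot be used to resubmit their prompts, it can help to report which prompts a deletion affects before issuing it. The sketch below assumes the grouped result of `getSpeakerModel` has already been flattened into a simple map of customization ID to prompt IDs; that map shape and the `SpeakerDeletionReport` name are hypothetical simplifications, not the SDK's model classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the SDK): lists prompts that will keep working
// but can no longer be resubmitted with their speaker once it is deleted.
final class SpeakerDeletionReport {
    static List<String> affectedPrompts(Map<String, List<String>> promptIdsByCustomization) {
        List<String> affected = new ArrayList<>();
        promptIdsByCustomization.forEach((customizationId, promptIds) -> {
            for (String promptId : promptIds) {
                affected.add(customizationId + "/" + promptId);
            }
        });
        affected.sort(null); // natural ordering gives stable, readable output
        return affected;
    }
}
```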
**See also:** [Deleting a speaker model](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-tbe-speaker-models#tbe-speaker-models-delete).
- Parameters:
deleteSpeakerModelOptions - the DeleteSpeakerModelOptions containing the options for the call
- Returns:
a ServiceCall with a void result
-
deleteUserData
public com.ibm.cloud.sdk.core.http.ServiceCall<Void> deleteUserData(DeleteUserDataOptions deleteUserDataOptions)
Delete labeled data.
Deletes all data that is associated with a specified customer ID. The method deletes all data for the customer ID, regardless of the method by which the information was added. The method has no effect if no data is associated with the customer ID. You must issue the request with credentials for the same instance of the service that was used to associate the customer ID with the data. You associate a customer ID with data by passing the `X-Watson-Metadata` header with a request that passes the data.
**Note:** If you delete an instance of the service from the service console, all data associated with that service instance is automatically deleted. This includes all custom models and word/translation pairs, and all data related to speech synthesis requests.
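The two halves of labeled data, tagging a request and later deleting by customer ID, can be illustrated with JDK types alone. This sketch builds but does not send the requests; it assumes the REST form `DELETE {service_url}/v1/user_data?customer_id=...` with API-key basic authentication, which you should confirm against your service instance. With the SDK, the `deleteUserData` method wraps the same operation, and a header can typically be attached to a call through the `addHeader` method of `ServiceCall`. `LabeledData` is a hypothetical name.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper (not part of the SDK) that shows the shape of the requests.
final class LabeledData {
    static final String METADATA_HEADER = "X-Watson-Metadata";

    // Header value that labels the data in a request with a customer ID.
    static String metadataValue(String customerId) {
        return "customer_id=" + customerId;
    }

    // DELETE request that removes all data labeled with the customer ID.
    static HttpRequest deleteUserDataRequest(String serviceUrl, String apiKey, String customerId) {
        String query = "customer_id=" + URLEncoder.encode(customerId, StandardCharsets.UTF_8);
        String credentials = Base64.getEncoder()
            .encodeToString(("apikey:" + apiKey).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder()
            .uri(URI.create(serviceUrl + "/v1/user_data?" + query))
            .header("Authorization", "Basic " + credentials)
            .DELETE()
            .build();
    }
}
```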
**See also:** [Information security](https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-information-security#information-security).
- Parameters:
deleteUserDataOptions - the DeleteUserDataOptions containing the options for the call
- Returns:
a ServiceCall with a void result
-