ibm_watson.text_to_speech_v1 module

The IBM® Text to Speech service provides APIs that use IBM’s speech-synthesis capabilities to synthesize text into natural-sounding speech in a variety of languages, dialects, and voices. The service supports at least one male or female voice, sometimes both, for each language. The audio is streamed back to the client with minimal delay. For speech synthesis, the service supports a synchronous HTTP Representational State Transfer (REST) interface. It also supports a WebSocket interface that provides both plain text and SSML input, including the SSML <mark> element and word timings. SSML is an XML-based markup language that provides text annotation for speech-synthesis applications. The service also offers a customization interface. You can use the interface to define sounds-like or phonetic translations for words. A sounds-like translation consists of one or more words that, when combined, sound like the word. A phonetic translation is based on the SSML phoneme format for representing a word. You can specify a phonetic translation in standard International Phonetic Alphabet (IPA) representation or in the proprietary IBM Symbolic Phonetic Representation (SPR).
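A phonetic translation is just a string holding an SSML phoneme element. The following is a minimal sketch of building one with the standard library's XML attribute quoting. The helper name `phoneme_translation` is hypothetical and is not part of the SDK.

```python
from xml.sax.saxutils import quoteattr


def phoneme_translation(ph: str, alphabet: str = "ipa") -> str:
    """Wrap a phonetic string in an SSML <phoneme> element.

    alphabet is "ipa" (International Phonetic Alphabet) or "ibm"
    (IBM Symbolic Phonetic Representation, SPR).
    """
    if alphabet not in ("ipa", "ibm"):
        raise ValueError("alphabet must be 'ipa' or 'ibm'")
    # quoteattr escapes &, < and > and wraps each value in safe quotes.
    return f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}></phoneme>"


print(phoneme_translation("təmˈɑto"))
# <phoneme alphabet="ipa" ph="təmˈɑto"></phoneme>
print(phoneme_translation("1gAstroEntxrYFXs", alphabet="ibm"))
# <phoneme alphabet="ibm" ph="1gAstroEntxrYFXs"></phoneme>
```

A sounds-like translation needs no such wrapping. It is passed as plain words, for example "I triple E" for "IEEE".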

class TextToSpeechV1(url='https://stream.watsonplatform.net/text-to-speech/api', username=None, password=None, iam_apikey=None, iam_access_token=None, iam_url=None, iam_client_id=None, iam_client_secret=None, icp4d_access_token=None, icp4d_url=None, authentication_type=None)

Bases: ibm_cloud_sdk_core.base_service.BaseService

The Text to Speech V1 service.

default_url = 'https://stream.watsonplatform.net/text-to-speech/api'

list_voices(**kwargs)

List voices.

Lists all voices available for use with the service. The information includes the name, language, gender, and other details about the voice. To see information about a specific voice, use the Get a voice method. See also: [Listing all available voices](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-voices#listVoices).

Parameters: headers (dict) – A dict containing the request headers
Returns: A DetailedResponse containing the result, headers and HTTP status code.
Return type: DetailedResponse
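A minimal sketch of filtering the voice list. It assumes `service` is a configured `TextToSpeechV1` instance and that `service.list_voices().get_result()` returns a dict shaped like the `Voices` class documented below, that is, a `voices` list of `Voice` entries. The helper `voices_for_language` is hypothetical.

```python
from typing import Any, Dict, List


def voices_for_language(voices_result: Dict[str, Any], language: str,
                        customizable_only: bool = False) -> List[str]:
    """Return sorted voice names from a list_voices() result for one language."""
    names = []
    for voice in voices_result.get("voices", []):
        if voice.get("language") != language:
            continue
        # customizable mirrors supported_features.custom_pronunciation.
        if customizable_only and not voice.get("customizable", False):
            continue
        names.append(voice["name"])
    return sorted(names)
```

For example, `voices_for_language(service.list_voices().get_result(), "en-US", customizable_only=True)` would list the US English voices that accept custom models.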

get_voice(voice, customization_id=None, **kwargs)

Get a voice.

Gets information about the specified voice. The information includes the name, language, gender, and other details about the voice. Specify a customization ID to obtain information for a custom voice model that is defined for the language of the specified voice. To list information about all available voices, use the List voices method. See also: [Listing a specific voice](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-voices#listVoice).

Parameters

voice (str) – The voice for which information is to be returned.
customization_id (str) – The customization ID (GUID) of a custom voice model for which information is to be returned. You must make the request with credentials for the instance of the service that owns the custom model. Omit the parameter to see information about the specified voice with no customization.
headers (dict) – A dict containing the request headers

Returns: A DetailedResponse containing the result, headers and HTTP status code.
Return type: DetailedResponse
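A minimal sketch of reading the result. It assumes `service` is a configured `TextToSpeechV1` and that the result dict carries the `Voice` fields documented below, including `customization` when a customization ID is passed. The helper `describe_voice` is hypothetical.

```python
from typing import Optional


def describe_voice(service, voice: str,
                   customization_id: Optional[str] = None) -> str:
    """Summarize get_voice() output, noting the custom model if one was requested."""
    result = service.get_voice(voice, customization_id=customization_id).get_result()
    summary = f"{result['name']} ({result['language']}, {result['gender']})"
    # The customization field appears only when a customization ID is passed.
    custom = result.get("customization")
    if custom:
        summary += f" with custom model {custom['customization_id']}"
    return summary
```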

synthesize(text, voice=None, customization_id=None, accept=None, **kwargs)

Synthesize audio.

Synthesizes text to audio that is spoken in the specified voice. The service bases its understanding of the language for the input text on the specified voice. Use a voice that matches the language of the input text. The method accepts a maximum of 5 KB of input text in the body of the request, and 8 KB for the URL and headers. The 5 KB limit includes any SSML tags that you specify. The service returns the synthesized audio stream as an array of bytes. See also: [The HTTP interface](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-usingHTTP#usingHTTP).

Audio formats (accept types)

The service can return audio in the following formats (MIME types).

* Where indicated, you can optionally specify the sampling rate (rate) of the audio. You must specify a sampling rate for the audio/l16 and audio/mulaw formats. A specified sampling rate must lie in the range of 8 kHz to 192 kHz.
* For the audio/l16 format, you can optionally specify the endianness (endianness) of the audio: endianness=big-endian or endianness=little-endian.

Use the Accept header or the accept parameter to specify the requested format of the response audio. If you omit an audio format altogether, the service returns the audio in Ogg format with the Opus codec (audio/ogg;codecs=opus). The service always returns single-channel audio.

* audio/basic: The service returns audio with a sampling rate of 8000 Hz.
* audio/flac: You can optionally specify the rate of the audio. The default sampling rate is 22,050 Hz.
* audio/l16: You must specify the rate of the audio. You can optionally specify the endianness of the audio. The default endianness is little-endian.
* audio/mp3: You can optionally specify the rate of the audio. The default sampling rate is 22,050 Hz.
* audio/mpeg: You can optionally specify the rate of the audio. The default sampling rate is 22,050 Hz.
* audio/mulaw: You must specify the rate of the audio.
* audio/ogg: The service returns the audio in the Vorbis codec. You can optionally specify the rate of the audio. The default sampling rate is 22,050 Hz.
* audio/ogg;codecs=opus: You can optionally specify the rate of the audio. The default sampling rate is 22,050 Hz.
* audio/ogg;codecs=vorbis: You can optionally specify the rate of the audio. The default sampling rate is 22,050 Hz.
* audio/wav: You can optionally specify the rate of the audio. The default sampling rate is 22,050 Hz.
* audio/webm: The service returns the audio in the Opus codec. The service returns audio with a sampling rate of 48,000 Hz.
* audio/webm;codecs=opus: The service returns audio with a sampling rate of 48,000 Hz.
* audio/webm;codecs=vorbis: You can optionally specify the rate of the audio. The default sampling rate is 22,050 Hz.

For more information about specifying an audio format, including additional details about some of the formats, see [Audio formats](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-audioFormats#audioFormats).

Warning messages

If a request includes invalid query parameters, the service returns a Warnings response header that describes them. The warning consists of a descriptive message, such as “Unknown arguments:” or “Unknown url query arguments:”, followed by a list of the invalid arguments in the form “{invalid_arg_1}, {invalid_arg_2}.” The request succeeds despite the warnings.
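The format rules under Audio formats can be checked before a request is sent. This is a minimal sketch: the helper `build_accept` is hypothetical, and appending `;rate=N` and `;endianness=...` as MIME parameters is an assumption based on the `endianness=big-endian` notation above. Check the linked Audio formats page for the authoritative syntax.

```python
from typing import Optional

# Rate rules taken from the format list above.
RATE_REQUIRED = {"audio/l16", "audio/mulaw"}
RATE_OPTIONAL = {
    "audio/flac", "audio/mp3", "audio/mpeg", "audio/ogg",
    "audio/ogg;codecs=opus", "audio/ogg;codecs=vorbis",
    "audio/wav", "audio/webm;codecs=vorbis",
}


def build_accept(fmt: str, rate: Optional[int] = None,
                 endianness: Optional[str] = None) -> str:
    """Return an accept value such as 'audio/l16;rate=22050;endianness=big-endian'."""
    if fmt in RATE_REQUIRED and rate is None:
        raise ValueError(f"{fmt} requires a sampling rate")
    accept = fmt
    if rate is not None:
        if fmt not in RATE_REQUIRED | RATE_OPTIONAL:
            raise ValueError(f"{fmt} does not take a rate parameter")
        if not 8000 <= rate <= 192000:
            raise ValueError("rate must lie between 8 kHz and 192 kHz")
        accept += f";rate={rate}"
    if endianness is not None:
        if fmt != "audio/l16":
            raise ValueError("endianness applies only to audio/l16")
        if endianness not in ("big-endian", "little-endian"):
            raise ValueError("endianness must be big-endian or little-endian")
        accept += f";endianness={endianness}"
    return accept
```

Failing locally on a missing rate is cheaper than a round trip. Unknown query arguments, by contrast, only produce a Warnings header and do not fail the request.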

Parameters

text (str) – The text to synthesize.
voice (str) – The voice to use for synthesis.
customization_id (str) – The customization ID (GUID) of a custom voice model to use for the synthesis. If a custom voice model is specified, it is guaranteed to work only if it matches the language of the indicated voice. You must make the request with credentials for the instance of the service that owns the custom model. Omit the parameter to use the specified voice with no customization.
accept (str) – The requested format (MIME type) of the audio. You can use the Accept header or the accept parameter to specify the audio format. For more information about specifying an audio format, see Audio formats (accept types) in the method description.
headers (dict) – A dict containing the request headers

Returns: A DetailedResponse containing the result, headers and HTTP status code.
Return type: DetailedResponse
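A minimal sketch of saving synthesized audio. It assumes `service` is a configured `TextToSpeechV1`, that `get_result()` on a `synthesize` response returns the underlying HTTP response whose `.content` holds the audio bytes, and that `get_headers()` returns the response headers. The 5,000-byte guard is a conservative reading of the 5 KB limit, not a value confirmed by the service. The helper name and any voice name you pass are illustrative.

```python
from typing import Optional

# Conservative reading of the documented 5 KB body limit (an assumption).
MAX_TEXT_BYTES = 5000


def synthesize_to_file(service, text: str, path: str, voice: str,
                       accept: str = "audio/wav") -> Optional[str]:
    """Write synthesized audio to path; return the Warnings header, if any."""
    size = len(text.encode("utf-8"))
    if size > MAX_TEXT_BYTES:
        raise ValueError(f"text is {size} bytes; split it into smaller requests")
    response = service.synthesize(text, voice=voice, accept=accept)
    # get_result() wraps the HTTP response; .content is the raw audio.
    with open(path, "wb") as audio_file:
        audio_file.write(response.get_result().content)
    # Invalid query parameters do not fail the request; they show up here.
    return response.get_headers().get("Warnings")
```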

get_pronunciation(text, voice=None, format=None, customization_id=None, **kwargs)

Get pronunciation.

Gets the phonetic pronunciation for the specified word. You can request the pronunciation for a specific format. You can also request the pronunciation for a specific voice to see the default translation for the language of that voice or for a specific custom voice model to see the translation for that voice model. Note: This method is currently a beta release. See also: [Querying a word from a language](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customWords#cuWordsQueryLanguage).

Parameters

text (str) – The word for which the pronunciation is requested.
voice (str) – A voice that specifies the language in which the pronunciation is to be returned. All voices for the same language (for example, en-US) return the same translation.
format (str) – The phoneme format in which to return the pronunciation. Omit the parameter to obtain the pronunciation in the default format.
customization_id (str) – The customization ID (GUID) of a custom voice model for which the pronunciation is to be returned. The language of a specified custom model must match the language of the specified voice. If the word is not defined in the specified custom model, the service returns the default translation for the custom model’s language. You must make the request with credentials for the instance of the service that owns the custom model. Omit the parameter to see the translation for the specified voice with no customization.
headers (dict) – A dict containing the request headers

Returns: A DetailedResponse containing the result, headers and HTTP status code.
Return type: DetailedResponse
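A minimal sketch of requesting an IPA pronunciation. It assumes `service` is a configured `TextToSpeechV1`, that `"ipa"` and `"ibm"` are the accepted `format` values (matching the phoneme alphabet names), and that the result has the shape of the `Pronunciation` class below. The helper `ipa_pronunciation` is hypothetical.

```python
from typing import Optional


def ipa_pronunciation(service, word: str, voice: str,
                      customization_id: Optional[str] = None) -> str:
    """Return the IPA pronunciation of word for the voice's language."""
    response = service.get_pronunciation(
        word, voice=voice, format="ipa", customization_id=customization_id)
    # The result mirrors the Pronunciation class: {"pronunciation": "..."}.
    return response.get_result()["pronunciation"]
```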

create_voice_model(name, language=None, description=None, **kwargs)

Create a custom model.

Creates a new empty custom voice model. You must specify a name for the new custom model. You can optionally specify the language and a description for the new model. The model is owned by the instance of the service whose credentials are used to create it. Note: This method is currently a beta release. See also: [Creating a custom model](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customModels#cuModelsCreate).

Parameters

name (str) – The name of the new custom voice model.
language (str) – The language of the new custom voice model. Omit the parameter to use the default language, en-US.
description (str) – A description of the new custom voice model. Specifying a description is recommended.
headers (dict) – A dict containing the request headers

Returns: A DetailedResponse containing the result, headers and HTTP status code.
Return type: DetailedResponse
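A minimal sketch of creating a model and keeping its ID for later calls. It assumes `service` is a configured `TextToSpeechV1` and relies on the `VoiceModel` note below that creation returns only `customization_id`. The helper `create_model` is hypothetical.

```python
from typing import Optional


def create_model(service, name: str, language: str = "en-US",
                 description: Optional[str] = None) -> str:
    """Create an empty custom voice model and return its customization ID."""
    response = service.create_voice_model(
        name, language=language, description=description)
    # Creation returns only customization_id (see the VoiceModel class).
    return response.get_result()["customization_id"]
```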

list_voice_models(language=None, **kwargs)

List custom models.

Lists metadata such as the name and description for all custom voice models that are owned by an instance of the service. Specify a language to list the voice models for that language only. To see the words in addition to the metadata for a specific voice model, use the List a custom model method. You must use credentials for the instance of the service that owns a model to list information about it. Note: This method is currently a beta release. See also: [Querying all custom models](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customModels#cuModelsQueryAll).

Parameters

language (str) – The language for which custom voice models that are owned by the requesting credentials are to be returned. Omit the parameter to see all custom voice models that are owned by the requester.
headers (dict) – A dict containing the request headers

Returns: A DetailedResponse containing the result, headers and HTTP status code.
Return type: DetailedResponse
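A minimal sketch of listing the caller's models. It assumes `service` is a configured `TextToSpeechV1` and that the result has the shape of the `VoiceModels` class below. The helper `list_models` is hypothetical.

```python
from typing import List, Optional, Tuple


def list_models(service, language: Optional[str] = None) -> List[Tuple[str, Optional[str]]]:
    """Return (customization_id, name) pairs for the caller's custom models."""
    result = service.list_voice_models(language=language).get_result()
    # An empty list means the credentials own no (matching) custom models.
    return [(m["customization_id"], m.get("name"))
            for m in result.get("customizations", [])]
```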

update_voice_model(customization_id, name=None, description=None, words=None, **kwargs)

Update a custom model.

Updates information for the specified custom voice model. You can update metadata such as the name and description of the voice model. You can also update the words in the model and their translations. Adding a new translation for a word that already exists in a custom model overwrites the word’s existing translation. A custom model can contain no more than 20,000 entries. You must use credentials for the instance of the service that owns a model to update it. You can define sounds-like or phonetic translations for words. A sounds-like translation consists of one or more words that, when combined, sound like the word. Phonetic translations are based on the SSML phoneme format for representing a word. You can specify them in standard International Phonetic Alphabet (IPA) representation:

    <phoneme alphabet="ipa" ph="təmˈɑto"></phoneme>

or in the proprietary IBM Symbolic Phonetic Representation (SPR):

    <phoneme alphabet="ibm" ph="1gAstroEntxrYFXs"></phoneme>

Note: This method is currently a beta release.

See also:

* [Updating a custom model](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customModels#cuModelsUpdate)
* [Adding words to a Japanese custom model](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customWords#cuJapaneseAdd)
* [Understanding customization](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customIntro#customIntro)

Parameters

customization_id (str) – The customization ID (GUID) of the custom voice model. You must make the request with credentials for the instance of the service that owns the custom model.
name (str) – A new name for the custom voice model.
description (str) – A new description for the custom voice model.
words (list[Word]) – An array of Word objects that provides the words and their translations that are to be added or updated for the custom voice model. Pass an empty array to make no additions or updates.
headers (dict) – A dict containing the request headers

Returns: A DetailedResponse containing the result, headers and HTTP status code.
Return type: DetailedResponse
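A minimal sketch of renaming a model and updating words in one call. It assumes `service` is a configured `TextToSpeechV1` and that plain dicts with the `Word` keys (`word`, `translation`) are accepted in place of `Word` objects. If in doubt, pass `Word` instances instead. The helper `rename_and_add` is hypothetical.

```python
from typing import Dict


def rename_and_add(service, customization_id: str, new_name: str,
                   translations: Dict[str, str]) -> int:
    """Rename a custom model and add or overwrite word translations in one call."""
    # Assumption: plain dicts with Word's keys are accepted in place of Word objects.
    words = [{"word": w, "translation": t} for w, t in sorted(translations.items())]
    response = service.update_voice_model(customization_id, name=new_name, words=words)
    return response.get_status_code()
```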

get_voice_model(customization_id, **kwargs)

Get a custom model.

Gets all information about a specified custom voice model. In addition to metadata such as the name and description of the voice model, the output includes the words and their translations as defined in the model. To see just the metadata for a voice model, use the List custom models method. Note: This method is currently a beta release. See also: [Querying a custom model](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customModels#cuModelsQuery).

Parameters

customization_id (str) – The customization ID (GUID) of the custom voice model. You must make the request with credentials for the instance of the service that owns the custom model.
headers (dict) – A dict containing the request headers

Returns: A DetailedResponse containing the result, headers and HTTP status code.
Return type: DetailedResponse

delete_voice_model(customization_id, **kwargs)

Delete a custom model.

Deletes the specified custom voice model. You must use credentials for the instance of the service that owns a model to delete it. Note: This method is currently a beta release. See also: [Deleting a custom model](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customModels#cuModelsDelete).

Parameters

customization_id (str) – The customization ID (GUID) of the custom voice model. You must make the request with credentials for the instance of the service that owns the custom model.
headers (dict) – A dict containing the request headers

Returns: A DetailedResponse containing the result, headers and HTTP status code.
Return type: DetailedResponse

add_words(customization_id, words, **kwargs)

Add custom words.

Adds one or more words and their translations to the specified custom voice model. Adding a new translation for a word that already exists in a custom model overwrites the word’s existing translation. A custom model can contain no more than 20,000 entries. You must use credentials for the instance of the service that owns a model to add words to it. You can define sounds-like or phonetic translations for words. A sounds-like translation consists of one or more words that, when combined, sound like the word. Phonetic translations are based on the SSML phoneme format for representing a word. You can specify them in standard International Phonetic Alphabet (IPA) representation:

    <phoneme alphabet="ipa" ph="təmˈɑto"></phoneme>

or in the proprietary IBM Symbolic Phonetic Representation (SPR):

    <phoneme alphabet="ibm" ph="1gAstroEntxrYFXs"></phoneme>

Note: This method is currently a beta release.

See also:

* [Adding multiple words to a custom model](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customWords#cuWordsAdd)
* [Adding words to a Japanese custom model](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customWords#cuJapaneseAdd)
* [Understanding customization](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customIntro#customIntro)

Parameters

customization_id (str) – The customization ID (GUID) of the custom voice model. You must make the request with credentials for the instance of the service that owns the custom model.
words (list[Word]) – The Add custom words method accepts an array of Word objects. Each object provides a word that is to be added or updated for the custom voice model and the word’s translation. The List custom words method returns an array of Word objects. Each object shows a word and its translation from the custom voice model. The words are listed in alphabetical order, with uppercase letters listed before lowercase letters. The array is empty if the custom model contains no words.
headers (dict) – A dict containing the request headers

Returns: A DetailedResponse containing the result, headers and HTTP status code.
Return type: DetailedResponse
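Because re-adding an existing word overwrites it, only genuinely new words count toward the 20,000-entry cap. This is a minimal sketch of checking the cap before a batch add. It assumes `service` is a configured `TextToSpeechV1`, that `list_words` returns a dict shaped like the `Words` class below, and that plain dicts with the `Word` keys are accepted in place of `Word` objects. The helper `add_translations` is hypothetical.

```python
from typing import Dict

MAX_ENTRIES = 20000  # per-model cap stated above


def add_translations(service, customization_id: str,
                     translations: Dict[str, str]) -> int:
    """Add words after checking the entry cap; return the resulting entry count."""
    current = service.list_words(customization_id).get_result().get("words", [])
    existing = {entry["word"] for entry in current}
    # Re-adding an existing word overwrites it, so only new words grow the model.
    total = len(existing | set(translations))
    if total > MAX_ENTRIES:
        raise ValueError(f"model would hold {total} entries (max {MAX_ENTRIES})")
    # Assumption: plain dicts with Word's keys are accepted in place of Word objects.
    words = [{"word": w, "translation": t} for w, t in sorted(translations.items())]
    service.add_words(customization_id, words)
    return total
```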

list_words(customization_id, **kwargs)

List custom words.

Lists all of the words and their translations for the specified custom voice model. The output shows the translations as they are defined in the model. You must use credentials for the instance of the service that owns a model to list its words. Note: This method is currently a beta release. See also: [Querying all words from a custom model](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customWords#cuWordsQueryModel).

Parameters

customization_id (str) – The customization ID (GUID) of the custom voice model. You must make the request with credentials for the instance of the service that owns the custom model.
headers (dict) – A dict containing the request headers

Returns: A DetailedResponse containing the result, headers and HTTP status code.
Return type: DetailedResponse

add_word(customization_id, word, translation, part_of_speech=None, **kwargs)

Add a custom word.

Adds a single word and its translation to the specified custom voice model. Adding a new translation for a word that already exists in a custom model overwrites the word’s existing translation. A custom model can contain no more than 20,000 entries. You must use credentials for the instance of the service that owns a model to add a word to it. You can define sounds-like or phonetic translations for words. A sounds-like translation consists of one or more words that, when combined, sound like the word. Phonetic translations are based on the SSML phoneme format for representing a word. You can specify them in standard International Phonetic Alphabet (IPA) representation:

    <phoneme alphabet="ipa" ph="təmˈɑto"></phoneme>

or in the proprietary IBM Symbolic Phonetic Representation (SPR):

    <phoneme alphabet="ibm" ph="1gAstroEntxrYFXs"></phoneme>

Note: This method is currently a beta release.

See also:

* [Adding a single word to a custom model](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customWords#cuWordAdd)
* [Adding words to a Japanese custom model](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customWords#cuJapaneseAdd)
* [Understanding customization](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customIntro#customIntro)

Parameters

customization_id (str) – The customization ID (GUID) of the custom voice model. You must make the request with credentials for the instance of the service that owns the custom model.
word (str) – The word that is to be added or updated for the custom voice model.
translation (str) – The phonetic or sounds-like translation for the word. A phonetic translation is based on the SSML format for representing the phonetic string of a word either as an IPA translation or as an IBM SPR translation. A sounds-like translation consists of one or more words that, when combined, sound like the word.
part_of_speech (str) – Japanese only. The part of speech for the word. The service uses the value to produce the correct intonation for the word. You can create only a single entry, with or without a single part of speech, for any word; you cannot create multiple entries with different parts of speech for the same word. For more information, see [Working with Japanese entries](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-rules#jaNotes).
headers (dict) – A dict containing the request headers

Returns: A DetailedResponse containing the result, headers and HTTP status code.
Return type: DetailedResponse
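A minimal sketch showing that both translation kinds go through the same `translation` argument. It assumes `service` is a configured `TextToSpeechV1`. The helper `add_custom_word` is hypothetical, and its phonetic/sounds-like check is only a heuristic based on the `<phoneme>` format described above.

```python
from typing import Optional


def add_custom_word(service, customization_id: str, word: str,
                    translation: str, part_of_speech: Optional[str] = None) -> str:
    """Add one word and report which kind of translation was sent."""
    # Phonetic translations are SSML <phoneme> elements; anything else is sounds-like.
    kind = "phonetic" if translation.lstrip().startswith("<phoneme") else "sounds-like"
    service.add_word(customization_id, word, translation,
                     part_of_speech=part_of_speech)
    return kind
```

For example, `add_custom_word(service, cid, "IEEE", "I triple E")` adds a sounds-like entry. `add_custom_word(service, cid, "tomato", '<phoneme alphabet="ipa" ph="təmˈɑto"></phoneme>')` adds a phonetic one. Leave `part_of_speech` as `None` except for Japanese models.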

get_word(customization_id, word, **kwargs)

Get a custom word.

Gets the translation for a single word from the specified custom model. The output shows the translation as it is defined in the model. You must use credentials for the instance of the service that owns a model to list its words. Note: This method is currently a beta release. See also: [Querying a single word from a custom model](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customWords#cuWordQueryModel).

Parameters

customization_id (str) – The customization ID (GUID) of the custom voice model. You must make the request with credentials for the instance of the service that owns the custom model.
word (str) – The word that is to be queried from the custom voice model.
headers (dict) – A dict containing the request headers

Returns: A DetailedResponse containing the result, headers and HTTP status code.
Return type: DetailedResponse

delete_word(customization_id, word, **kwargs)

Delete a custom word.

Deletes a single word from the specified custom voice model. You must use credentials for the instance of the service that owns a model to delete its words. Note: This method is currently a beta release. See also: [Deleting a word from a custom model](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-customWords#cuWordDelete).

Parameters

customization_id (str) – The customization ID (GUID) of the custom voice model. You must make the request with credentials for the instance of the service that owns the custom model.
word (str) – The word that is to be deleted from the custom voice model.
headers (dict) – A dict containing the request headers

Returns: A DetailedResponse containing the result, headers and HTTP status code.
Return type: DetailedResponse

delete_user_data(customer_id, **kwargs)

Delete labeled data.

Deletes all data that is associated with a specified customer ID. The method deletes all data for the customer ID, regardless of the method by which the information was added. The method has no effect if no data is associated with the customer ID. You must issue the request with credentials for the same instance of the service that was used to associate the customer ID with the data. You associate a customer ID with data by passing the X-Watson-Metadata header with a request that passes the data. See also: [Information security](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-information-security#information-security).

Parameters

customer_id (str) – The customer ID for which all data is to be deleted.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
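A minimal sketch of the label-then-delete flow. It assumes `service` is a configured `TextToSpeechV1`, that the documented `headers` keyword is forwarded with the request, and that the header value takes the form `customer_id={id}` as described on IBM's Information security page. Both helper names are hypothetical.

```python
def synthesize_for_customer(service, text: str, voice: str, customer_id: str):
    """Synthesize text while labeling the request's data with a customer ID."""
    # Assumed header format, per IBM's information-security documentation.
    headers = {"X-Watson-Metadata": f"customer_id={customer_id}"}
    return service.synthesize(text, voice=voice, headers=headers)


def forget_customer(service, customer_id: str) -> int:
    """Delete everything associated with customer_id; return the HTTP status."""
    return service.delete_user_data(customer_id).get_status_code()
```

Use the same service credentials for both calls, because deletion works only from the instance that associated the customer ID with the data.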

class Pronunciation(pronunciation)

Bases: object

The pronunciation of the specified text.

Attr str pronunciation: The pronunciation of the specified text in the requested voice and format. If a custom voice model is specified, the pronunciation also reflects that custom voice.

class SupportedFeatures(custom_pronunciation, voice_transformation)

Bases: object

Additional service features that are supported with the voice.

Attr bool custom_pronunciation: If true, the voice can be customized; if false, the voice cannot be customized. (Same as customizable.)
Attr bool voice_transformation: If true, the voice can be transformed by using the SSML <voice-transformation> element; if false, the voice cannot be transformed.

class Translation(translation, part_of_speech=None)

Bases: object

Information about the translation for the specified text.

Attr str translation: The phonetic or sounds-like translation for the word. A phonetic translation is based on the SSML format for representing the phonetic string of a word either as an IPA translation or as an IBM SPR translation. A sounds-like translation consists of one or more words that, when combined, sound like the word.
Attr str part_of_speech: (optional) Japanese only. The part of speech for the word. The service uses the value to produce the correct intonation for the word. You can create only a single entry, with or without a single part of speech, for any word; you cannot create multiple entries with different parts of speech for the same word. For more information, see [Working with Japanese entries](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-rules#jaNotes).

class Voice(url, gender, name, language, description, customizable, supported_features, customization=None)

Bases: object

Information about an available voice model.

Attr str url: The URI of the voice.
Attr str gender: The gender of the voice: male or female.
Attr str name: The name of the voice. Use this as the voice identifier in all requests.
Attr str language: The language and region of the voice (for example, en-US).
Attr str description: A textual description of the voice.
Attr bool customizable: If true, the voice can be customized; if false, the voice cannot be customized. (Same as custom_pronunciation; maintained for backward compatibility.)
Attr SupportedFeatures supported_features: Additional service features that are supported with the voice.
Attr VoiceModel customization: (optional) Returns information about a specified custom voice model. This field is returned only by the Get a voice method and only when you specify the customization ID of a custom voice model.

class VoiceModel(customization_id, name=None, language=None, owner=None, created=None, last_modified=None, description=None, words=None)

Bases: object

Information about an existing custom voice model.

Attr str customization_id: The customization ID (GUID) of the custom voice model. The Create a custom model method returns only this field. It does not return the other fields of this object.
Attr str name: (optional) The name of the custom voice model.
Attr str language: (optional) The language identifier of the custom voice model (for example, en-US).
Attr str owner: (optional) The GUID of the credentials for the instance of the service that owns the custom voice model.
Attr str created: (optional) The date and time in Coordinated Universal Time (UTC) at which the custom voice model was created. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
Attr str last_modified: (optional) The date and time in Coordinated Universal Time (UTC) at which the custom voice model was last modified. The created and last_modified fields are equal when a voice model is first added but has yet to be updated. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
Attr str description: (optional) The description of the custom voice model.
Attr list[Word] words: (optional) An array of Word objects that lists the words and their translations from the custom voice model. The words are listed in alphabetical order, with uppercase letters listed before lowercase letters. The array is empty if the custom model contains no words. This field is returned only by the Get a voice method and only when you specify the customization ID of a custom voice model.
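A minimal sketch of using the two timestamps. It compares parsed instants rather than raw strings to tell whether a model has been modified since creation. The helpers are hypothetical, and the sample timestamp format (`2019-01-01T12:00:00.123Z`) is an assumed instance of the YYYY-MM-DDThh:mm:ss.sTZD pattern.

```python
from datetime import datetime


def parse_model_time(value: str) -> datetime:
    """Parse a created/last_modified value such as '2019-01-01T12:00:00.123Z'."""
    # %f takes 1 to 6 fractional digits; since Python 3.7, %z also accepts 'Z'.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def never_updated(model: dict) -> bool:
    """True when a VoiceModel dict has not been modified since creation."""
    return parse_model_time(model["created"]) == parse_model_time(model["last_modified"])
```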

class VoiceModels(customizations)

Bases: object

Information about existing custom voice models.

Attr list[VoiceModel] customizations: An array of VoiceModel objects that provides information about each available custom voice model. The array is empty if the requesting credentials own no custom voice models (if no language is specified) or own no custom voice models for the specified language.

class Voices(voices)

Bases: object

Information about all available voice models.

Attr list[Voice] voices: A list of available voices.

class Word(word, translation, part_of_speech=None)

Bases: object

Information about a word for the custom voice model.

Attr str word: The word for the custom voice model.
Attr str translation: The phonetic or sounds-like translation for the word. A phonetic translation is based on the SSML format for representing the phonetic string of a word either as an IPA or IBM SPR translation. A sounds-like translation consists of one or more words that, when combined, sound like the word.
Attr str part_of_speech: (optional) Japanese only. The part of speech for the word. The service uses the value to produce the correct intonation for the word. You can create only a single entry, with or without a single part of speech, for any word; you cannot create multiple entries with different parts of speech for the same word. For more information, see [Working with Japanese entries](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-rules#jaNotes).

class Words(words)

Bases: object

For the Add custom words method, one or more words that are to be added or updated for the custom voice model and the translation for each specified word. For the List custom words method, the words and their translations from the custom voice model.

Attr list[Word] words: The Add custom words method accepts an array of Word objects. Each object provides a word that is to be added or updated for the custom voice model and the word’s translation. The List custom words method returns an array of Word objects. Each object shows a word and its translation from the custom voice model. The words are listed in alphabetical order, with uppercase letters listed before lowercase letters. The array is empty if the custom model contains no words.
