watson_developer_cloud.discovery_v1 module

The IBM Watson Discovery Service is a cognitive search and content analytics engine that you can add to applications to identify patterns, trends and actionable insights to drive better decision-making. Securely unify structured and unstructured data with pre-enriched content, and use a simplified query language to eliminate the need for manual filtering of results.

class DiscoveryV1(version, url='https://gateway.watsonplatform.net/discovery/api', username=None, password=None)[source]

Bases: watson_developer_cloud.watson_service.WatsonService

The Discovery V1 service.

default_url = 'https://gateway.watsonplatform.net/discovery/api'
VERSION_DATE_2017_09_01 = '2017-09-01'
VERSION_DATE_2017_08_01 = '2017-08-01'
VERSION_DATE_2017_07_19 = '2017-07-19'
VERSION_DATE_2017_06_25 = '2017-06-25'
VERSION_DATE_2016_12_01 = '2016-12-01'
create_environment(name, description=None, size=None)[source]

Add an environment.

Creates a new environment. You can create only one environment per service instance. An attempt to create another environment results in an error.

Parameters:
  • name (str) – Name that identifies the environment.
  • description (str) – Description of the environment.
  • size (int) – Deprecated: Size of the environment.
Returns:

A dict containing the Environment response.

Return type:

dict
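Because a service instance allows only one environment, a caller may prefer to reuse an existing environment instead of attempting a second creation. The following is a minimal sketch under stated assumptions: `get_or_create_environment` and `FakeDiscovery` are hypothetical names, and the `environments` key of the listing response is assumed rather than confirmed by this page. `FakeDiscovery` only mimics the two method signatures documented here so that the sketch runs without credentials.

```python
def get_or_create_environment(discovery, name, description=None):
    """Return the environment named `name`, creating it only if it is absent."""
    # ASSUMPTION: the listing response looks like {"environments": [ {...}, ... ]}.
    listing = discovery.list_environments(name=name)
    for env in listing.get("environments", []):
        if env.get("name") == name:
            return env
    return discovery.create_environment(name, description=description)


class FakeDiscovery:
    """Hypothetical in-memory stand-in for DiscoveryV1 (not part of the SDK)."""

    def __init__(self):
        self.environments = []

    def list_environments(self, name=None):
        matches = [e for e in self.environments
                   if name is None or e["name"] == name]
        return {"environments": matches}

    def create_environment(self, name, description=None, size=None):
        # Mimic the documented rule: a second environment is an error.
        if self.environments:
            raise RuntimeError("only one environment per service instance")
        env = {"environment_id": "env-1", "name": name,
               "description": description}
        self.environments.append(env)
        return env
```

With a real client, the same helper would be called as `get_or_create_environment(discovery, 'my_environment')`; the second call returns the existing environment rather than raising.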

delete_environment(environment_id)[source]

Delete environment.

Parameters: environment_id (str) – The ID of the environment.
Returns: A dict containing the DeleteEnvironmentResponse response.
Return type: dict
get_environment(environment_id)[source]

Get environment info.

Parameters: environment_id (str) – The ID of the environment.
Returns: A dict containing the Environment response.
Return type: dict
list_environments(name=None)[source]

List environments.

List existing environments for the service instance.

Parameters: name (str) – Show only the environment with the given name.
Returns: A dict containing the ListEnvironmentsResponse response.
Return type: dict
list_fields(environment_id, collection_ids)[source]

List fields in specified collections.

Gets a list of the unique fields (and their types) stored in the indexes of the specified collections.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_ids (list[str]) – A comma-separated list of collection IDs to be queried against.
Returns:

A dict containing the ListCollectionFieldsResponse response.

Return type:

dict

update_environment(environment_id, name=None, description=None)[source]

Update an environment.

Updates an environment. The environment’s name and description parameters can be changed. You must specify a name for the environment.

Parameters:
  • environment_id (str) – The ID of the environment.
  • name (str) – Name that identifies the environment.
  • description (str) – Description of the environment.
Returns:

A dict containing the Environment response.

Return type:

dict

create_configuration(environment_id, name, description=None, conversions=None, enrichments=None, normalizations=None)[source]

Add configuration.

Creates a new configuration. If the input configuration contains the configuration_id, created, or updated properties, then they are ignored and overridden by the system, and an error is not returned so that the overridden fields do not need to be removed when copying a configuration. The configuration can contain unrecognized JSON fields. Any such fields are ignored and do not generate an error. This makes it easier to use newer configuration files with older versions of the API and the service. It also makes it possible for the tooling to add additional metadata and information to the configuration.

Parameters:
  • environment_id (str) – The ID of the environment.
  • name (str) – The name of the configuration.
  • description (str) – The description of the configuration, if available.
  • conversions (Conversions) – The document conversion settings for the configuration.
  • enrichments (list[Enrichment]) – An array of document enrichment settings for the configuration.
  • normalizations (list[NormalizationOperation]) – Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
Returns:

A dict containing the Configuration response.

Return type:

dict

delete_configuration(environment_id, configuration_id)[source]

Delete a configuration.

The deletion is performed unconditionally. A configuration deletion request succeeds even if the configuration is referenced by a collection or document ingestion. However, documents that have already been submitted for processing continue to use the deleted configuration. Documents are always processed with a snapshot of the configuration as it existed at the time the document was submitted.

Parameters:
  • environment_id (str) – The ID of the environment.
  • configuration_id (str) – The ID of the configuration.
Returns:

A dict containing the DeleteConfigurationResponse response.

Return type:

dict

get_configuration(environment_id, configuration_id)[source]

Get configuration details.

Parameters:
  • environment_id (str) – The ID of the environment.
  • configuration_id (str) – The ID of the configuration.
Returns:

A dict containing the Configuration response.

Return type:

dict

list_configurations(environment_id, name=None)[source]

List configurations.

Lists existing configurations for the service instance.

Parameters:
  • environment_id (str) – The ID of the environment.
  • name (str) – Find configurations with the given name.
Returns:

A dict containing the ListConfigurationsResponse response.

Return type:

dict

update_configuration(environment_id, configuration_id, name, description=None, conversions=None, enrichments=None, normalizations=None)[source]

Update a configuration.

Replaces an existing configuration.

  • Completely replaces the original configuration.
  • The configuration_id, updated, and created fields are accepted in the request, but they are ignored, and an error is not generated. It is also acceptable for users to submit an updated configuration with none of the three properties.
  • Documents are processed with a snapshot of the configuration as it was at the time the document was submitted to be ingested. This means that already submitted documents will not see any updates made to the configuration.

Parameters:
  • environment_id (str) – The ID of the environment.
  • configuration_id (str) – The ID of the configuration.
  • name (str) – The name of the configuration.
  • description (str) – The description of the configuration, if available.
  • conversions (Conversions) – The document conversion settings for the configuration.
  • enrichments (list[Enrichment]) – An array of document enrichment settings for the configuration.
  • normalizations (list[NormalizationOperation]) – Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
Returns:

A dict containing the Configuration response.

Return type:

dict
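Since an update completely replaces the stored configuration, changing a single setting means resubmitting every other setting as well. The sketch below shows one way to assemble the full argument set from an existing configuration plus a few changes. It is illustrative only: `replacement_kwargs` is a hypothetical helper, and it assumes that the dict returned by `get_configuration` uses the same key names as the keyword arguments of `update_configuration`.

```python
# Keyword arguments accepted by update_configuration (from its signature).
CONFIG_FIELDS = ("name", "description", "conversions",
                 "enrichments", "normalizations")


def replacement_kwargs(existing, **changes):
    """Merge `changes` over `existing` and return a complete kwargs dict."""
    unknown = set(changes) - set(CONFIG_FIELDS)
    if unknown:
        raise TypeError("unknown configuration fields: %s" % sorted(unknown))
    # Start from the stored values so nothing is lost by the full replace.
    # System-managed keys (configuration_id, created, updated) are left out.
    kwargs = {field: existing.get(field) for field in CONFIG_FIELDS}
    kwargs.update(changes)
    if not kwargs["name"]:
        raise ValueError("name is a required argument of update_configuration")
    return kwargs
```

A caller would then pass the result on, for example `discovery.update_configuration(environment_id, configuration_id, **replacement_kwargs(current, description='new text'))`. Whether plain dicts are accepted for `conversions`, `enrichments`, and `normalizations` in place of the model classes is not stated on this page.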

test_configuration_in_environment(environment_id, configuration=None, step=None, configuration_id=None, file=None, metadata=None, file_content_type=None, filename=None)[source]

Test configuration.

Runs a sample document through the default configuration or a configuration that you supply, and returns diagnostic information designed to help you understand how the document was processed. The document is not added to the index.

Parameters:
  • environment_id (str) – The ID of the environment.
  • configuration (str) – The configuration to use to process the document. If this part is provided, then the provided configuration is used to process the document. If the configuration_id is also provided (both are present at the same time), then the request is rejected. The maximum supported configuration size is 1 MB. Configuration parts larger than 1 MB are rejected. See the GET /configurations/{configuration_id} operation for an example configuration.
  • step (str) – Specify to only run the input document through the given step instead of running the input document through the entire ingestion workflow. Valid values are convert, enrich, and normalize.
  • configuration_id (str) – The ID of the configuration to use to process the document. If the configuration form part is also provided (both are present at the same time), then the request is rejected.
  • file (file) – The content of the document to ingest. The maximum supported file size is 50 megabytes. Files larger than 50 megabytes are rejected.
  • metadata (str) – If you’re using the Data Crawler to upload your documents, you can test a document against the type of metadata that the Data Crawler might send. The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected. Example: {"Creator": "Johnny Appleseed", "Subject": "Apples"}.
  • file_content_type (str) – The content type of file.
  • filename (str) – The filename for file.
Returns:

A dict containing the TestDocument response.

Return type:

dict
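The parameter rules above can be checked on the client side before a request is sent. The following sketch is one possible pre-flight check, not part of the SDK: `find_request_problems` is a hypothetical name, and treating "1 MB" as 1024 * 1024 bytes is an assumption.

```python
VALID_STEPS = ("convert", "enrich", "normalize")
# ASSUMPTION: "1 MB" is interpreted as 1024 * 1024 bytes.
MAX_CONFIGURATION_BYTES = 1024 * 1024


def find_request_problems(configuration=None, configuration_id=None, step=None):
    """Return a list of reasons the service is documented to reject the call."""
    problems = []
    if configuration is not None and configuration_id is not None:
        problems.append("configuration and configuration_id are both set")
    if step is not None and step not in VALID_STEPS:
        problems.append("step must be one of %s" % ", ".join(VALID_STEPS))
    if configuration is not None:
        size = len(configuration.encode("utf-8"))
        if size > MAX_CONFIGURATION_BYTES:
            problems.append("configuration is larger than 1 MB")
    return problems
```

An empty list means that none of the documented rejection rules apply; the service may still reject the request for other reasons.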

create_collection(environment_id, name, description=None, configuration_id=None, language=None)[source]

Create a collection.

Parameters:
  • environment_id (str) – The ID of the environment.
  • name (str) – The name of the collection to be created.
  • description (str) – A description of the collection.
  • configuration_id (str) – The ID of the configuration in which the collection is to be created.
  • language (str) – The language of the documents stored in the collection, in the form of an ISO 639-1 language code.
Returns:

A dict containing the Collection response.

Return type:

dict

delete_collection(environment_id, collection_id)[source]

Delete a collection.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
Returns:

A dict containing the DeleteCollectionResponse response.

Return type:

dict

get_collection(environment_id, collection_id)[source]

Get collection details.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
Returns:

A dict containing the Collection response.

Return type:

dict

list_collection_fields(environment_id, collection_id)[source]

List unique fields.

Gets a list of the unique fields (and their types) stored in the index.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
Returns:

A dict containing the ListCollectionFieldsResponse response.

Return type:

dict

list_collections(environment_id, name=None)[source]

List collections.

Lists existing collections for the service instance.

Parameters:
  • environment_id (str) – The ID of the environment.
  • name (str) – Find collections with the given name.
Returns:

A dict containing the ListCollectionsResponse response.

Return type:

dict

update_collection(environment_id, collection_id, name, description=None, configuration_id=None)[source]

Update a collection.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
  • name (str) – The name of the collection.
  • description (str) – A description of the collection.
  • configuration_id (str) – The ID of the configuration in which the collection is to be updated.
Returns:

A dict containing the Collection response.

Return type:

dict

add_document(environment_id, collection_id, file=None, metadata=None, file_content_type=None, filename=None)[source]

Add a document.

Add a document to a collection with optional metadata.

  • The version query parameter is still required.
  • Returns immediately after the system has accepted the document for processing.
  • The user must provide document content, metadata, or both. If the request is missing both document content and metadata, it is rejected.
  • The user can set the Content-Type parameter on the file part to indicate the media type of the document. If the Content-Type parameter is missing or is one of the generic media types (for example, application/octet-stream), then the service attempts to automatically detect the document’s media type.
  • The following field names are reserved and will be filtered out if present after normalization: id, score, highlight, and any field with the prefix of: _, +, or -
  • Fields with empty name values after normalization are filtered out before indexing.
  • Fields containing the following characters after normalization are filtered out before indexing: # and ,.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
  • file (file) – The content of the document to ingest. The maximum supported file size is 50 megabytes. Files larger than 50 megabytes are rejected.
  • metadata (str) – If you’re using the Data Crawler to upload your documents, you can test a document against the type of metadata that the Data Crawler might send. The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected. Example: {"Creator": "Johnny Appleseed", "Subject": "Apples"}.
  • file_content_type (str) – The content type of file.
  • filename (str) – The filename for file.
Returns:

A dict containing the DocumentAccepted response.

Return type:

dict
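The field-name rules and the metadata size limit described for this method can be approximated locally. The sketch below is an illustration under assumptions: `would_be_filtered` and `build_metadata` are hypothetical helpers, the check is applied to the name as given (the service applies it after normalization, which is not reproduced here), and "1 MB" is taken to mean 1024 * 1024 bytes.

```python
import json

RESERVED_NAMES = {"id", "score", "highlight"}
RESERVED_PREFIXES = ("_", "+", "-")
FORBIDDEN_CHARS = ("#", ",")
# ASSUMPTION: "1 MB" is interpreted as 1024 * 1024 bytes.
MAX_METADATA_BYTES = 1024 * 1024


def would_be_filtered(field_name):
    """Return True if the documented rules would drop this field name."""
    if field_name == "" or field_name in RESERVED_NAMES:
        return True
    if field_name.startswith(RESERVED_PREFIXES):
        return True
    return any(ch in field_name for ch in FORBIDDEN_CHARS)


def build_metadata(fields):
    """Serialize a dict to the JSON string expected by the metadata argument."""
    text = json.dumps(fields)
    if len(text.encode("utf-8")) > MAX_METADATA_BYTES:
        raise ValueError("metadata is larger than 1 MB")
    return text
```

The string returned by `build_metadata` is what would be passed as `metadata=` to `add_document` or `update_document`.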

delete_document(environment_id, collection_id, document_id)[source]

Delete a document.

If the given document ID is invalid, or if the document is not found, then a success response is returned (HTTP status code 200) with the status set to ‘deleted’.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
  • document_id (str) – The ID of the document.
Returns:

A dict containing the DeleteDocumentResponse response.

Return type:

dict

get_document_status(environment_id, collection_id, document_id)[source]

Get document details.

Fetch status details about a submitted document. Note: this operation does not return the document itself. Instead, it returns only the document’s processing status and any notices (warnings or errors) that were generated when the document was ingested. Use the query API to retrieve the actual document content.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
  • document_id (str) – The ID of the document.
Returns:

A dict containing the DocumentStatus response.

Return type:

dict
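Because documents are accepted first and processed afterwards, a caller often polls this method until processing finishes. The following is a minimal polling sketch, not an SDK feature. `wait_for_document` and `FakeDiscovery` are hypothetical, and the `status` key with the value `processing` is an assumption about the DocumentStatus response that this page does not spell out.

```python
import time


def wait_for_document(discovery, environment_id, collection_id, document_id,
                      interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll get_document_status until the document leaves 'processing'."""
    for _ in range(max_attempts):
        status = discovery.get_document_status(
            environment_id, collection_id, document_id)
        # ASSUMPTION: the response carries a "status" string.
        if status.get("status") != "processing":
            return status
        sleep(interval)
    raise RuntimeError("document %s is still processing" % document_id)


class FakeDiscovery:
    """Hypothetical stand-in that replays a fixed sequence of statuses."""

    def __init__(self, statuses):
        self._statuses = list(statuses)
        self.calls = 0

    def get_document_status(self, environment_id, collection_id, document_id):
        self.calls += 1
        return {"document_id": document_id,
                "status": self._statuses.pop(0)}
```

The `sleep` argument is injectable so that the loop can be exercised without real delays; the returned dict still needs to be inspected for notices.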

update_document(environment_id, collection_id, document_id, file=None, metadata=None, file_content_type=None, filename=None)[source]

Update a document.

Replace an existing document. Starts ingesting a document with optional metadata.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
  • document_id (str) – The ID of the document.
  • file (file) – The content of the document to ingest. The maximum supported file size is 50 megabytes. Files larger than 50 megabytes are rejected.
  • metadata (str) – If you’re using the Data Crawler to upload your documents, you can test a document against the type of metadata that the Data Crawler might send. The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected. Example: {"Creator": "Johnny Appleseed", "Subject": "Apples"}.
  • file_content_type (str) – The content type of file.
  • filename (str) – The filename for file.
Returns:

A dict containing the DocumentAccepted response.

Return type:

dict

federated_query(environment_id, collection_ids, filter=None, query=None, natural_language_query=None, aggregation=None, count=None, return_fields=None, offset=None, sort=None, highlight=None, deduplicate=None, deduplicate_field=None)[source]

Query documents in multiple collections.

See the [Discovery service documentation](https://console.bluemix.net/docs/services/discovery/using.html) for more details.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_ids (list[str]) – A comma-separated list of collection IDs to be queried against.
  • filter (str) – A cacheable query that limits the documents returned to exclude any documents that don’t mention the query content. Filter searches are better for metadata type searches and when you are trying to get a sense of concepts in the data set.
  • query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.
  • natural_language_query (str) – A natural language query that returns relevant documents by utilizing training data and natural language understanding. You cannot use natural_language_query and query at the same time.
  • aggregation (str) – An aggregation search uses combinations of filters and query search to return an exact answer. Aggregations are useful for building applications, because you can use them to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.
  • count (int) – Number of documents to return.
  • return_fields (list[str]) – A comma-separated list of the portion of the document hierarchy to return.
  • offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10, and the offset is 8, it returns the last two results.
  • sort (list[str]) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.
  • highlight (bool) – When true, a highlight field is returned for each result. It contains the fields that match the query, with <em></em> tags around the matching query terms. Defaults to false.
  • deduplicate (bool) – When true and used with a Watson Discovery News collection, duplicate results (based on the contents of the title field) are removed. Duplicate comparison is limited to the current query only; offset is not considered. Defaults to false. This parameter is currently Beta functionality.
  • deduplicate_field (str) – When specified, duplicate results based on the field specified are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.
Returns:

A dict containing the QueryResponse response.

Return type:

dict
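Two of the rules in the parameter list lend themselves to a small helper: `query` and `natural_language_query` are mutually exclusive, and sort fields take an optional `-` or `+` prefix. The sketch below assembles keyword arguments for `federated_query` under those rules. `sort_key` and `federated_query_kwargs` are hypothetical helpers, not SDK functions.

```python
def sort_key(field, descending=False):
    """Prefix a field name with '-' (descending) or '+' (ascending)."""
    return ("-" if descending else "+") + field


def federated_query_kwargs(collection_ids, query=None,
                           natural_language_query=None, sort=None, **extra):
    """Build keyword arguments, rejecting the documented invalid combination."""
    if query is not None and natural_language_query is not None:
        raise ValueError(
            "query and natural_language_query cannot be used together")
    kwargs = {"collection_ids": list(collection_ids)}
    if query is not None:
        kwargs["query"] = query
    if natural_language_query is not None:
        kwargs["natural_language_query"] = natural_language_query
    if sort:
        kwargs["sort"] = list(sort)
    kwargs.update(extra)  # e.g. count, offset, highlight
    return kwargs
```

The result would be passed on as `discovery.federated_query(environment_id, **kwargs)`. The explicit `+` prefix is optional, because ascending is the default direction.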

federated_query_notices(environment_id, collection_ids, filter=None, query=None, natural_language_query=None, aggregation=None, count=None, return_fields=None, offset=None, sort=None, highlight=None, deduplicate_field=None)[source]

Query multiple collection system notices.

Queries for notices (errors or warnings) that might have been generated by the system. Notices are generated when ingesting documents and performing relevance training. See the [Discovery service documentation](https://console.bluemix.net/docs/services/discovery/using.html) for more details on the query language.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_ids (list[str]) – A comma-separated list of collection IDs to be queried against.
  • filter (str) – A cacheable query that limits the documents returned to exclude any documents that don’t mention the query content. Filter searches are better for metadata type searches and when you are trying to get a sense of concepts in the data set.
  • query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.
  • natural_language_query (str) – A natural language query that returns relevant documents by utilizing training data and natural language understanding. You cannot use natural_language_query and query at the same time.
  • aggregation (str) – An aggregation search uses combinations of filters and query search to return an exact answer. Aggregations are useful for building applications, because you can use them to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.
  • count (int) – Number of documents to return.
  • return_fields (list[str]) – A comma-separated list of the portion of the document hierarchy to return.
  • offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10, and the offset is 8, it returns the last two results.
  • sort (list[str]) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.
  • highlight (bool) – When true, a highlight field is returned for each result. It contains the fields that match the query, with <em></em> tags around the matching query terms. Defaults to false.
  • deduplicate_field (str) – When specified, duplicate results based on the field specified are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.
Returns:

A dict containing the QueryNoticesResponse response.

Return type:

dict

query(environment_id, collection_id, filter=None, query=None, natural_language_query=None, passages=None, aggregation=None, count=None, return_fields=None, offset=None, sort=None, highlight=None, passages_fields=None, passages_count=None, passages_characters=None, deduplicate=None, deduplicate_field=None)[source]

Query documents.

See the [Discovery service documentation](https://console.bluemix.net/docs/services/discovery/using.html) for more details.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
  • filter (str) – A cacheable query that limits the documents returned to exclude any documents that don’t mention the query content. Filter searches are better for metadata type searches and when you are trying to get a sense of concepts in the data set.
  • query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.
  • natural_language_query (str) – A natural language query that returns relevant documents by utilizing training data and natural language understanding. You cannot use natural_language_query and query at the same time.
  • passages (bool) – A passages query that returns the most relevant passages from the results.
  • aggregation (str) – An aggregation search uses combinations of filters and query search to return an exact answer. Aggregations are useful for building applications, because you can use them to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.
  • count (int) – Number of documents to return.
  • return_fields (list[str]) – A comma-separated list of the portion of the document hierarchy to return.
  • offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10, and the offset is 8, it returns the last two results.
  • sort (list[str]) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.
  • highlight (bool) – When true, a highlight field is returned for each result. It contains the fields that match the query, with <em></em> tags around the matching query terms. Defaults to false.
  • passages_fields (list[str]) – A comma-separated list of fields that passages are drawn from. If this parameter is not specified, then all top-level fields are included.
  • passages_count (int) – The maximum number of passages to return. The search returns fewer passages if the requested total is not found. The default is 10. The maximum is 100.
  • passages_characters (int) – The approximate number of characters that any one passage will have. The default is 400. The minimum is 50. The maximum is 2000.
  • deduplicate (bool) – When true and used with a Watson Discovery News collection, duplicate results (based on the contents of the title field) are removed. Duplicate comparison is limited to the current query only; offset is not considered. Defaults to false. This parameter is currently Beta functionality.
  • deduplicate_field (str) – When specified, duplicate results based on the field specified are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.
Returns:

A dict containing the QueryResponse response.

Return type:

dict
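The `count` and `offset` parameters together support paging, and the passage parameters have documented bounds. The sketch below works through that arithmetic. The helper names are hypothetical, and clamping out-of-range values on the client side is a design choice of this example, not documented service behaviour.

```python
def page_offsets(total_results, count):
    """Offsets needed to walk `total_results` results in pages of `count`."""
    if count <= 0:
        raise ValueError("count must be positive")
    return list(range(0, total_results, count))


def results_after_offset(total_results, offset):
    """How many results remain once `offset` results are skipped."""
    return max(total_results - offset, 0)


def clamp_passage_settings(passages_count=10, passages_characters=400):
    """Keep passage settings inside the documented limits."""
    count = min(passages_count, 100)                        # maximum is 100
    characters = min(max(passages_characters, 50), 2000)    # 50 to 2000
    return count, characters
```

`results_after_offset(10, 8)` reproduces the example in the `offset` description: skipping eight of ten results leaves the last two.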

query_notices(environment_id, collection_id, filter=None, query=None, natural_language_query=None, passages=None, aggregation=None, count=None, return_fields=None, offset=None, sort=None, highlight=None, passages_fields=None, passages_count=None, passages_characters=None, deduplicate_field=None)[source]

Query system notices.

Queries for notices (errors or warnings) that might have been generated by the system. Notices are generated when ingesting documents and performing relevance training. See the [Discovery service documentation](https://console.bluemix.net/docs/services/discovery/using.html) for more details on the query language.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
  • filter (str) – A cacheable query that limits the documents returned to exclude any documents that don’t mention the query content. Filter searches are better for metadata type searches and when you are trying to get a sense of concepts in the data set.
  • query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.
  • natural_language_query (str) – A natural language query that returns relevant documents by utilizing training data and natural language understanding. You cannot use natural_language_query and query at the same time.
  • passages (bool) – A passages query that returns the most relevant passages from the results.
  • aggregation (str) – An aggregation search uses combinations of filters and query search to return an exact answer. Aggregations are useful for building applications, because you can use them to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.
  • count (int) – Number of documents to return.
  • return_fields (list[str]) – A comma-separated list of the portion of the document hierarchy to return.
  • offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10, and the offset is 8, it returns the last two results.
  • sort (list[str]) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.
  • highlight (bool) – When true, a highlight field is returned for each result. It contains the fields that match the query, with <em></em> tags around the matching query terms. Defaults to false.
  • passages_fields (list[str]) – A comma-separated list of fields that passages are drawn from. If this parameter is not specified, then all top-level fields are included.
  • passages_count (int) – The maximum number of passages to return. The search returns fewer passages if the requested total is not found. The default is 10. The maximum is 100.
  • passages_characters (int) – The approximate number of characters that any one passage will have. The default is 400. The minimum is 50. The maximum is 2000.
  • deduplicate_field (str) – When specified, duplicate results based on the field specified are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.
Returns:

A dict containing the QueryNoticesResponse response.

Return type:

dict
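Since notices are either warnings or errors, a quick tally by severity can show whether ingestion needs attention. The sketch below is a guess at how such a tally might look. The response layout (`results`, each with a `notices` list whose items carry a `severity` string) is an assumption that this page does not document, and `count_notices_by_severity` is a hypothetical helper. The sample response in the usage note is made up for illustration.

```python
from collections import Counter


def count_notices_by_severity(response):
    """Tally notices per severity in a QueryNoticesResponse-like dict."""
    counts = Counter()
    # ASSUMPTION: response["results"][i]["notices"][j]["severity"] exists.
    for result in response.get("results", []):
        for notice in result.get("notices", []):
            counts[notice.get("severity", "unknown")] += 1
    return dict(counts)
```

For an invented response such as `{"results": [{"notices": [{"severity": "warning"}, {"severity": "error"}]}]}`, the helper returns one warning and one error.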

add_training_data(environment_id, collection_id, natural_language_query=None, filter=None, examples=None)[source]

Adds a query to the training data for this collection. The query can contain a filter and natural language query.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
  • natural_language_query (str) –
  • filter (str) –
  • examples (list[TrainingExample]) –
Returns:

A dict containing the TrainingQuery response.

Return type:

dict
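A training query pairs a natural language query (and optional filter) with rated example documents. The sketch below assembles such a payload as plain dicts. It is illustrative only: `training_example` and `training_query_kwargs` are hypothetical helpers, the example keys are taken from the `create_training_example` parameters on this page, and whether plain dicts are accepted in place of `TrainingExample` objects is an assumption.

```python
def training_example(document_id, relevance, cross_reference=None):
    """Describe one rated document for a training query."""
    example = {"document_id": document_id, "relevance": int(relevance)}
    if cross_reference is not None:
        example["cross_reference"] = cross_reference
    return example


def training_query_kwargs(natural_language_query, examples, filter=None):
    """Build keyword arguments for add_training_data."""
    kwargs = {"natural_language_query": natural_language_query,
              "examples": list(examples)}
    if filter is not None:
        kwargs["filter"] = filter
    return kwargs
```

The result would be passed on as `discovery.add_training_data(environment_id, collection_id, **kwargs)`.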

create_training_example(environment_id, collection_id, query_id, document_id=None, cross_reference=None, relevance=None)[source]

Adds a new example to this training data query.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
  • query_id (str) – The ID of the query used for training.
  • document_id (str) –
  • cross_reference (str) –
  • relevance (int) –
Returns:

A dict containing the TrainingExample response.

Return type:

dict

delete_all_training_data(environment_id, collection_id)[source]

Clears all training data for this collection.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
Return type:

None

delete_training_data(environment_id, collection_id, query_id)[source]

Removes the training data and all associated examples from the training data set.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
  • query_id (str) – The ID of the query used for training.
Return type:

None

delete_training_example(environment_id, collection_id, query_id, example_id)[source]

Removes the example with the given ID for the training data query.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
  • query_id (str) – The ID of the query used for training.
  • example_id (str) – The ID of the document as it is indexed.
Return type:

None

get_training_data(environment_id, collection_id, query_id)[source]

Shows details for a specific training data query, including the query string and all examples.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
  • query_id (str) – The ID of the query used for training.
Returns:

A dict containing the TrainingQuery response.

Return type:

dict

get_training_example(environment_id, collection_id, query_id, example_id)[source]

Gets the details for this training example.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
  • query_id (str) – The ID of the query used for training.
  • example_id (str) – The ID of the document as it is indexed.
Returns:

A dict containing the TrainingExample response.

Return type:

dict

list_training_data(environment_id, collection_id)[source]

Lists the training data for this collection.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
Returns:

A dict containing the TrainingDataSet response.

Return type:

dict

list_training_examples(environment_id, collection_id, query_id)[source]

List all examples for this training data query.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
  • query_id (str) – The ID of the query used for training.
Returns:

A dict containing the TrainingExampleList response.

Return type:

dict
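
The two list calls can be combined to walk every training query and collect its examples. The following is a minimal sketch, assuming the returned dicts use keys named after the TrainingDataSet, TrainingQuery, and TrainingExampleList attributes (queries, query_id, examples). StubDiscovery and its canned data are hypothetical and exist only so the snippet runs without credentials; in practice a DiscoveryV1 instance would be passed instead.

```python
def examples_by_query(discovery, environment_id, collection_id):
    """Map each training query ID to its list of example dicts."""
    data_set = discovery.list_training_data(environment_id, collection_id)
    collected = {}
    for query in data_set.get("queries") or []:
        query_id = query["query_id"]
        listing = discovery.list_training_examples(
            environment_id, collection_id, query_id)
        collected[query_id] = listing.get("examples") or []
    return collected


class StubDiscovery(object):
    """Hypothetical stand-in that mimics the two list calls with canned data."""

    def list_training_data(self, environment_id, collection_id):
        return {"environment_id": environment_id,
                "collection_id": collection_id,
                "queries": [{"query_id": "q1"}, {"query_id": "q2"}]}

    def list_training_examples(self, environment_id, collection_id, query_id):
        canned = {"q1": [{"document_id": "d1", "relevance": 10}], "q2": []}
        return {"examples": canned[query_id]}


result = examples_by_query(StubDiscovery(), "env-id", "coll-id")
```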

update_training_example(environment_id, collection_id, query_id, example_id, cross_reference=None, relevance=None)[source]

Changes the label or cross reference query for this training example.

Parameters:
  • environment_id (str) – The ID of the environment.
  • collection_id (str) – The ID of the collection.
  • query_id (str) – The ID of the query used for training.
  • example_id (str) – The ID of the document as it is indexed.
  • cross_reference (str) –
  • relevance (int) –
Returns:

A dict containing the TrainingExample response.

Return type:

dict
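
Because cross_reference and relevance are both keyword arguments, a caller can pass only the one it wants to set. The sketch below wraps update_training_example to pass relevance alone; RecordingStub is a hypothetical stand-in that mirrors the documented signature and echoes its arguments, so it says nothing about how the service treats the omitted cross_reference.

```python
def relabel_example(discovery, environment_id, collection_id, query_id,
                    example_id, relevance):
    """Pass only the relevance label; cross_reference keeps its default of None."""
    return discovery.update_training_example(
        environment_id, collection_id, query_id, example_id,
        relevance=relevance)


class RecordingStub(object):
    """Hypothetical stand-in that records calls instead of contacting the service."""

    def __init__(self):
        self.calls = []

    def update_training_example(self, environment_id, collection_id, query_id,
                                example_id, cross_reference=None,
                                relevance=None):
        self.calls.append((query_id, example_id, cross_reference, relevance))
        return {"document_id": example_id,
                "cross_reference": cross_reference,
                "relevance": relevance}


stub = RecordingStub()
updated = relabel_example(stub, "env-id", "coll-id", "q1", "d1", relevance=10)
```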

class AggregationResult(key=None, matching_results=None, aggregations=None)[source]

Bases: object

AggregationResult.

Attr str key:(optional) Key that matched the aggregation type.
Attr int matching_results:
 (optional) Number of matching results.
Attr list[QueryAggregation] aggregations:
 (optional) Aggregations returned in the case of chained aggregations.
class Collection(collection_id=None, name=None, description=None, created=None, updated=None, status=None, configuration_id=None, language=None, document_counts=None, disk_usage=None, training_status=None)[source]

Bases: object

A collection for storing documents.

Attr str collection_id:
 (optional) The unique identifier of the collection.
Attr str name:(optional) The name of the collection.
Attr str description:
 (optional) The description of the collection.
Attr datetime created:
 (optional) The creation date of the collection in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr datetime updated:
 (optional) The timestamp of when the collection was last updated in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr str status:
 (optional) The status of the collection.
Attr str configuration_id:
 (optional) The unique identifier of the collection’s configuration.
Attr str language:
 (optional) The language of the documents stored in the collection. Permitted values include en_us (U.S. English), de (German), and es (Spanish).
Attr DocumentCounts document_counts:
 (optional) The object providing information about the documents in the collection. Present only when retrieving details of a collection.
Attr CollectionDiskUsage disk_usage:
 (optional) The object providing information about the disk usage of the collection. Present only when retrieving details of a collection.
Attr TrainingStatus training_status:
 (optional) Provides information about the status of relevance training for the collection.
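
The created and updated timestamps use the pattern yyyy-MM-dd’T’HH:mm:ss.SSS’Z’. When such a value arrives as a string in a response dict, it can be parsed with the standard library. This is a minimal sketch; the sample value and the parse_timestamp helper are illustrative and not part of the module.

```python
from datetime import datetime

# strptime equivalent of the documented pattern yyyy-MM-dd'T'HH:mm:ss.SSS'Z'
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_timestamp(value):
    """Parse a timestamp string into a naive datetime (the trailing Z denotes UTC)."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


created = parse_timestamp("2017-09-01T12:34:56.789Z")
```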
class CollectionDiskUsage(used_bytes=None)[source]

Bases: object

Summary of the disk usage statistics for this collection.

Attr int used_bytes:
 (optional) Number of bytes used by the collection.
class Configuration(name, configuration_id=None, created=None, updated=None, description=None, conversions=None, enrichments=None, normalizations=None)[source]

Bases: object

A custom configuration for the environment.

Attr str configuration_id:
 (optional) The unique identifier of the configuration.
Attr str name:The name of the configuration.
Attr datetime created:
 (optional) The creation date of the configuration in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr datetime updated:
 (optional) The timestamp of when the configuration was last updated in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr str description:
 (optional) The description of the configuration, if available.
Attr Conversions conversions:
 (optional) The document conversion settings for the configuration.
Attr list[Enrichment] enrichments:
 (optional) An array of document enrichment settings for the configuration.
Attr list[NormalizationOperation] normalizations:
 (optional) Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
class Conversions(pdf=None, word=None, html=None, json_normalizations=None)[source]

Bases: object

Document conversion settings.

Attr PdfSettings pdf:
 (optional) A list of PDF conversion settings.
Attr WordSettings word:
 (optional) A list of Word conversion settings.
Attr HtmlSettings html:
 (optional) A list of HTML conversion settings.
Attr list[NormalizationOperation] json_normalizations:
 (optional) Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
class DeleteCollectionResponse(collection_id, status)[source]

Bases: object

DeleteCollectionResponse.

Attr str collection_id:
 The unique identifier of the collection that is being deleted.
Attr str status:
 The status of the collection. The status of a successful deletion operation is deleted.
class DeleteConfigurationResponse(configuration_id, status, notices=None)[source]

Bases: object

DeleteConfigurationResponse.

Attr str configuration_id:
 The unique identifier for the configuration.
Attr str status:
 Status of the configuration. A deleted configuration has the status deleted.
Attr list[Notice] notices:
 (optional) An array of notice messages, if any.
class DeleteDocumentResponse(document_id=None, status=None)[source]

Bases: object

DeleteDocumentResponse.

Attr str document_id:
 (optional) The unique identifier of the document.
Attr str status:
 (optional) Status of the document. A deleted document has the status deleted.
class DeleteEnvironmentResponse(environment_id, status)[source]

Bases: object

DeleteEnvironmentResponse.

Attr str environment_id:
 The unique identifier for the environment.
Attr str status:
 Status of the environment.
class DiskUsage(used_bytes=None, maximum_allowed_bytes=None, total_bytes=None, used=None, total=None, percent_used=None)[source]

Bases: object

Summary of the disk usage statistics for the environment.

Attr int used_bytes:
 (optional) Number of bytes used on the environment’s disk capacity.
Attr int maximum_allowed_bytes:
 (optional) Total number of bytes available in the environment’s disk capacity.
Attr int total_bytes:
 (optional) Deprecated: Total number of bytes available in the environment’s disk capacity.
Attr str used:(optional) Deprecated: Amount of disk capacity used, in KB or GB format.
Attr str total:(optional) Deprecated: Total amount of the environment’s disk capacity, in KB or GB format.
Attr float percent_used:
 (optional) Deprecated: Percentage of the environment’s disk capacity that is being used.
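
Since percent_used is deprecated, a percentage can be derived from the two non-deprecated fields instead. The following sketch assumes the disk usage information is available as a dict keyed by the attribute names above; disk_percent_used is a hypothetical helper.

```python
def disk_percent_used(disk_usage):
    """Return used_bytes as a percentage of maximum_allowed_bytes, or None."""
    used = disk_usage.get("used_bytes")
    allowed = disk_usage.get("maximum_allowed_bytes")
    if used is None or not allowed:
        # Either field is missing, or the allowance is zero.
        return None
    return 100.0 * used / allowed


percent = disk_percent_used({"used_bytes": 256, "maximum_allowed_bytes": 1024})
```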
class DocumentAccepted(document_id=None, status=None, notices=None)[source]

Bases: object

DocumentAccepted.

Attr str document_id:
 (optional) The unique identifier of the ingested document.
Attr str status:
 (optional) Status of the document in the ingestion process.
Attr list[Notice] notices:
 (optional) Array of notices produced by the document-ingestion process.
class DocumentCounts(available=None, processing=None, failed=None)[source]

Bases: object

DocumentCounts.

Attr int available:
 (optional) The total number of available documents in the collection.
Attr int processing:
 (optional) The number of documents in the collection that are currently being processed.
Attr int failed:
 (optional) The number of documents in the collection that failed to be ingested.
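
The three counters are enough to tell whether ingestion has settled. A minimal sketch, assuming the counts are available as a dict keyed by the attribute names above and that a missing counter can be read as zero; summarize_counts is a hypothetical helper.

```python
def summarize_counts(document_counts):
    """Report whether processing has finished and whether any documents failed."""
    processing = document_counts.get("processing") or 0
    failed = document_counts.get("failed") or 0
    return {"finished": processing == 0, "has_failures": failed > 0}


summary = summarize_counts({"available": 8, "processing": 0, "failed": 2})
```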
class DocumentSnapshot(step=None, snapshot=None)[source]

Bases: object

DocumentSnapshot.

Attr str step:(optional)
Attr object snapshot:
 (optional)
class DocumentStatus(document_id, configuration_id, created, updated, status, status_description, notices, filename=None, file_type=None, sha1=None)[source]

Bases: object

Status information about a submitted document.

Attr str document_id:
 The unique identifier of the document.
Attr str configuration_id:
 The unique identifier for the configuration.
Attr datetime created:
 The creation date of the document in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr datetime updated:
 Date of the most recent document update, in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr str status:
 Status of the document in the ingestion process.
Attr str status_description:
 Description of the document status.
Attr str filename:
 (optional) Name of the original source file (if available).
Attr str file_type:
 (optional) The type of the original source file.
Attr str sha1:(optional) The SHA-1 hash of the original source file (formatted as a hexadecimal string).
Attr list[Notice] notices:
 Array of notices produced by the document-ingestion process.
class Enrichment(destination_field, source_field, enrichment_name, description=None, overwrite=None, ignore_downstream_errors=None, options=None)[source]

Bases: object

Enrichment.

Attr str description:
 (optional) Describes what the enrichment step does.
Attr str destination_field:
 Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.
Attr str source_field:
 Field to be enriched.
Attr bool overwrite:
 (optional) Indicates that the enrichments will overwrite the destination_field field if it already exists.
Attr str enrichment_name:
 Name of the enrichment service to call. Currently the only valid value is alchemy_language.
Attr bool ignore_downstream_errors:
 (optional) If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.
Attr EnrichmentOptions options:
 (optional) A list of options specific to the enrichment.
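
The destination_field rule (the field must already exist or be at most one level deeper than an existing field) can be checked locally before a configuration is submitted. The sketch below is illustrative only: is_valid_destination is a hypothetical helper, and treating a brand-new top-level field as valid is an assumption the description does not confirm.

```python
def is_valid_destination(destination_field, existing_fields):
    """Check the 'exists or at most one level deeper' rule for a dotted field path."""
    existing = set(existing_fields)
    if destination_field in existing:
        return True
    parent, _, leaf = destination_field.rpartition(".")
    if not leaf:
        # Empty name or a trailing dot.
        return False
    if not parent:
        # Assumption: a new top-level field counts as one level below the root.
        return True
    return parent in existing


fields = ["text"]
checks = [is_valid_destination(name, fields)
          for name in ("text", "text.foo", "text.foo.bar")]
```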
class EnrichmentOptions(extract=None, sentiment=None, quotations=None, show_source_text=None, hierarchical_typed_relations=None, model=None, language=None)[source]

Bases: object

Options which are specific to a particular enrichment.

Attr list[str] extract:
 (optional) A comma-separated list of analyses that will be applied when using the alchemy_language enrichment. See the service documentation for details on each extract option. Possible values include: entity, keyword, taxonomy, concept, relation, doc-sentiment, doc-emotion, and typed-rels.
Attr bool sentiment:
 (optional)
Attr bool quotations:
 (optional)
Attr bool show_source_text:
 (optional)
Attr bool hierarchical_typed_relations:
 (optional)
Attr str model:(optional) Required when using the typed-rels extract option. Should be set to the ID of a previously published custom Watson Knowledge Studio model.
Attr str language:
 (optional) If provided, the service does not attempt to detect the language of the input document and instead assumes the language specified in this field. You can set this property to work around unsupported-text-language errors. Supported languages include English, German, French, Italian, Portuguese, Russian, Spanish, and Swedish. A language can be specified by its ISO-639-1, ISO-639-2, or ISO-639-3 code, or by the plain English name of the language (for example, “russian”).
class Environment(environment_id=None, name=None, description=None, created=None, updated=None, status=None, read_only=None, size=None, index_capacity=None)[source]

Bases: object

Details about an environment.

Attr str environment_id:
 (optional) Unique identifier for the environment.
Attr str name:(optional) Name that identifies the environment.
Attr str description:
 (optional) Description of the environment.
Attr datetime created:
 (optional) Creation date of the environment, in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr datetime updated:
 (optional) Date of most recent environment update, in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr str status:
 (optional) Status of the environment.
Attr bool read_only:
 (optional) If true, then the environment contains read-only collections which are maintained by IBM.
Attr int size:(optional) Deprecated: Size of the environment.
Attr IndexCapacity index_capacity:
 (optional) Details about the resource usage and capacity of the environment.
class EnvironmentDocuments(indexed=None, maximum_allowed=None)[source]

Bases: object

Summary of the document usage statistics for the environment.

Attr int indexed:
 (optional) Number of documents indexed for the environment.
Attr int maximum_allowed:
 (optional) Total number of documents allowed in the environment’s capacity.
class Field(field_name=None, field_type=None)[source]

Bases: object

Field.

Attr str field_name:
 (optional) The name of the field.
Attr str field_type:
 (optional) The type of the field.
class FontSetting(level=None, min_size=None, max_size=None, bold=None, italic=None, name=None)[source]

Bases: object

FontSetting.

Attr int level:(optional)
Attr int min_size:
 (optional)
Attr int max_size:
 (optional)
Attr bool bold:(optional)
Attr bool italic:
 (optional)
Attr str name:(optional)
class HtmlSettings(exclude_tags_completely=None, exclude_tags_keep_content=None, keep_content=None, exclude_content=None, keep_tag_attributes=None, exclude_tag_attributes=None)[source]

Bases: object

A list of HTML conversion settings.

Attr list[str] exclude_tags_completely:
 (optional)
Attr list[str] exclude_tags_keep_content:
 (optional)
Attr XPathPatterns keep_content:
 (optional)
Attr XPathPatterns exclude_content:
 (optional)
Attr list[str] keep_tag_attributes:
 (optional)
Attr list[str] exclude_tag_attributes:
 (optional)
class IndexCapacity(documents=None, disk_usage=None, memory_usage=None)[source]

Bases: object

Details about the resource usage and capacity of the environment.

Attr EnvironmentDocuments documents:
 (optional) Summary of the document usage statistics for the environment.
Attr DiskUsage disk_usage:
 (optional) Summary of the disk usage of the environment.
Attr MemoryUsage memory_usage:
 (optional) Deprecated: Summary of the memory usage of the environment.
class ListCollectionFieldsResponse(fields=None)[source]

Bases: object

The list of fetched fields. The fields are returned using a fully qualified name format; however, the format differs slightly from that used by the query operations.

  • Fields which contain nested JSON objects are assigned a type of “nested”.
  • Fields which belong to a nested object are prefixed with .properties (for example, warnings.properties.severity means that the warnings object has a property called severity).
  • Fields returned from the News collection are prefixed with v{N}-fullnews-t3-{YEAR}.mappings (for example, v5-fullnews-t3-2016.mappings.text.properties.author).

Attr list[Field] fields:
 (optional) An array containing information about each field in the collections.
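
The .properties convention can be undone mechanically to recover the plain object path (warnings.properties.severity becomes warnings.severity). This is a sketch under an assumption: that the plain dotted path is the form wanted elsewhere, which the description implies but does not state. to_plain_path is a hypothetical helper, and it does not handle the News collection prefix.

```python
def to_plain_path(field_name):
    """Drop the .properties marker that separates each object from its property."""
    parts = field_name.split(".")
    # The marker sits at every odd position; a real field named "properties"
    # at an even position is kept.
    kept = [part for index, part in enumerate(parts)
            if not (index % 2 == 1 and part == "properties")]
    return ".".join(kept)


plain = to_plain_path("warnings.properties.severity")
```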
class ListCollectionsResponse(collections=None)[source]

Bases: object

ListCollectionsResponse.

Attr list[Collection] collections:
 (optional) An array containing information about each collection in the environment.
class ListConfigurationsResponse(configurations=None)[source]

Bases: object

ListConfigurationsResponse.

Attr list[Configuration] configurations:
 (optional) An array of Configurations that are available for the service instance.
class ListEnvironmentsResponse(environments=None)[source]

Bases: object

ListEnvironmentsResponse.

Attr list[Environment] environments:
 (optional) An array of environments that are available for the service instance.
class MemoryUsage(used_bytes=None, total_bytes=None, used=None, total=None, percent_used=None)[source]

Bases: object

Deprecated: Summary of the memory usage statistics for this environment.

Attr int used_bytes:
 (optional) Deprecated: Number of bytes used in the environment’s memory capacity.
Attr int total_bytes:
 (optional) Deprecated: Total number of bytes available in the environment’s memory capacity.
Attr str used:(optional) Deprecated: Amount of memory capacity used, in KB or GB format.
Attr str total:(optional) Deprecated: Total amount of the environment’s memory capacity, in KB or GB format.
Attr float percent_used:
 (optional) Deprecated: Percentage of the environment’s memory capacity that is being used.
class NormalizationOperation(operation=None, source_field=None, destination_field=None)[source]

Bases: object

NormalizationOperation.

Attr str operation:
 (optional) Identifies what type of operation to perform.
  • copy - Copies the value of the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.
  • move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. A move is identical to a copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).
  • merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This ensures that the type of the destination_field is consistent across all documents.
  • remove - Deletes the source_field. The destination_field is ignored for this operation.
  • remove_nulls - Removes all nested null (blank) leaf values from the JSON tree. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire JSON tree. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, because it can be time-expensive).
Attr str source_field:
 (optional) The source field for the operation.
Attr str destination_field:
 (optional) The destination field for the operation.
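
The effect of a chain of normalization operations can be previewed locally. The sketch below is a simplified simulation of the semantics described for the operation attribute, not the service's implementation. It assumes top-level field names only (no dotted paths), that merging into a missing destination_field starts from an empty array, and that remove_nulls also drops null entries inside arrays. apply_normalizations is a hypothetical helper.

```python
import copy


def _strip_nulls(value):
    """Recursively drop None leaves from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: _strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [_strip_nulls(v) for v in value if v is not None]
    return value


def apply_normalizations(document, operations):
    """Apply the operations in order to a copy of the document and return it."""
    doc = copy.deepcopy(document)
    for op in operations:
        kind = op["operation"]
        src = op.get("source_field")
        dst = op.get("destination_field")
        if kind == "copy":
            if src in doc:
                doc[dst] = copy.deepcopy(doc[src])
        elif kind == "move":
            if src in doc:
                doc[dst] = doc.pop(src)
        elif kind == "merge":
            current = doc.get(dst, [])
            merged = list(current) if isinstance(current, list) else [current]
            if src in doc:
                merged.append(doc.pop(src))
            # The destination becomes an array even when the source is absent.
            doc[dst] = merged
        elif kind == "remove":
            doc.pop(src, None)
        elif kind == "remove_nulls":
            doc = _strip_nulls(doc)
        else:
            raise ValueError("unknown operation: %r" % kind)
    return doc


original = {"title": "A", "headline": "B", "tags": "x", "extra_tag": "y",
            "draft": None, "meta": {"author": None, "lang": "en"}}
normalized = apply_normalizations(original, [
    {"operation": "copy", "source_field": "title", "destination_field": "headline"},
    {"operation": "merge", "source_field": "extra_tag", "destination_field": "tags"},
    {"operation": "move", "source_field": "title", "destination_field": "name"},
    {"operation": "remove_nulls"},
])
```

Running remove_nulls last mirrors the typical ordering described above.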
class Notice(notice_id=None, created=None, document_id=None, query_id=None, severity=None, step=None, description=None)[source]

Bases: object

A notice produced for the collection.

Attr str notice_id:
 (optional) Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action.
Attr datetime created:
 (optional) The creation date of the notice in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr str document_id:
 (optional) Unique identifier of the document.
Attr str query_id:
 (optional) Unique identifier of the query used for relevance training.
Attr str severity:
 (optional) Severity level of the notice.
Attr str step:(optional) Ingestion or training step in which the notice occurred.
Attr str description:
 (optional) The description of the notice.
class PdfHeadingDetection(fonts=None)[source]

Bases: object

PdfHeadingDetection.

Attr list[FontSetting] fonts:
 (optional)
class PdfSettings(heading=None)[source]

Bases: object

A list of PDF conversion settings.

Attr PdfHeadingDetection heading:
 (optional)
class QueryAggregation(type=None, field=None, results=None, match=None, matching_results=None, aggregations=None)[source]

Bases: object

An aggregation produced by the Discovery service to analyze the input provided.

Attr str type:(optional) The type of aggregation command used. For example: term, filter, max, min, etc.
Attr str field:(optional) The field where the aggregation is located in the document.
Attr list[AggregationResult] results:
 (optional)
Attr str match:(optional) The match that the aggregated results were queried for.
Attr int matching_results:
 (optional) Number of matching results.
Attr list[QueryAggregation] aggregations:
 (optional) Aggregations returned by the Discovery service.
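
QueryAggregation and AggregationResult refer to each other, so chained aggregations form a tree. One way to flatten that tree is a recursive walk. This is a sketch, assuming the aggregations arrive as nested dicts keyed by the attribute names above; flatten_aggregations and the sample data are hypothetical.

```python
def flatten_aggregations(aggregations, path=()):
    """Yield (key_path, matching_results) for every AggregationResult in the tree."""
    for aggregation in aggregations or []:
        for result in aggregation.get("results") or []:
            here = path + (result.get("key"),)
            yield here, result.get("matching_results")
            # Chained aggregations hang off each result.
            for item in flatten_aggregations(result.get("aggregations"), here):
                yield item
        # An aggregation can also carry nested aggregations directly.
        for item in flatten_aggregations(aggregation.get("aggregations"), path):
            yield item


sample = [{
    "type": "term", "field": "language",
    "results": [
        {"key": "en", "matching_results": 7, "aggregations": [
            {"type": "term", "field": "author",
             "results": [{"key": "ann", "matching_results": 4}]}]},
        {"key": "de", "matching_results": 2},
    ],
}]
rows = list(flatten_aggregations(sample))
```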
class QueryNoticesResponse(matching_results=None, results=None, aggregations=None, passages=None, duplicates_removed=None)[source]

Bases: object

QueryNoticesResponse.

Attr int matching_results:
 (optional)
Attr list[QueryNoticesResult] results:
 (optional)
Attr list[QueryAggregation] aggregations:
 (optional)
Attr list[QueryPassages] passages:
 (optional)
Attr int duplicates_removed:
 (optional)
class QueryNoticesResult(id=None, score=None, metadata=None, collection_id=None, **kwargs)[source]

Bases: object

QueryNoticesResult.

Attr str id:(optional) The unique identifier of the document.
Attr float score:
 (optional) The confidence score of the result’s analysis. Scores range from 0 to 1, with a higher score indicating greater confidence.
Attr object metadata:
 (optional) Metadata of the document.
Attr str collection_id:
 (optional) The collection ID of the collection containing the document for this result.
class QueryPassages(document_id=None, passage_score=None, passage_text=None, start_offset=None, end_offset=None, field=None)[source]

Bases: object

QueryPassages.

Attr str document_id:
 (optional) The unique identifier of the document from which the passage has been extracted.
Attr float passage_score:
 (optional) The confidence score of the passage’s analysis. A higher score indicates greater confidence.
Attr str passage_text:
 (optional) The content of the extracted passage.
Attr float start_offset:
 (optional) The position of the first character of the extracted passage in the originating field.
Attr float end_offset:
 (optional) The position of the last character of the extracted passage in the originating field.
Attr str field:(optional) The label of the field from which the passage has been extracted.
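
Since each passage carries its document_id and passage_score, the strongest passage per document can be picked out with a single pass. A minimal sketch, assuming the passages are available as a list of dicts keyed by the attribute names above; best_passage_per_document and the sample data are hypothetical.

```python
def best_passage_per_document(passages):
    """Keep, for each document_id, the passage with the highest passage_score."""
    best = {}
    for passage in passages or []:
        score = passage.get("passage_score")
        if score is None:
            continue
        doc_id = passage.get("document_id")
        if doc_id not in best or score > best[doc_id]["passage_score"]:
            best[doc_id] = passage
    return best


sample_passages = [
    {"document_id": "d1", "passage_score": 1.5, "passage_text": "first"},
    {"document_id": "d1", "passage_score": 3.0, "passage_text": "second"},
    {"document_id": "d2", "passage_score": 0.5, "passage_text": "third"},
]
best = best_passage_per_document(sample_passages)
```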
class QueryResponse(matching_results=None, results=None, aggregations=None, passages=None, duplicates_removed=None)[source]

Bases: object

A response containing the documents and aggregations for the query.

Attr int matching_results:
 (optional)
Attr list[QueryResult] results:
 (optional)
Attr list[QueryAggregation] aggregations:
 (optional)
Attr list[QueryPassages] passages:
 (optional)
Attr int duplicates_removed:
 (optional)
class QueryResult(id=None, score=None, metadata=None, collection_id=None, **kwargs)[source]

Bases: object

QueryResult.

Attr str id:(optional) The unique identifier of the document.
Attr float score:
 (optional) The confidence score of the result’s analysis. Scores range from 0 to 1, with a higher score indicating greater confidence.
Attr object metadata:
 (optional) Metadata of the document.
Attr str collection_id:
 (optional) The collection ID of the collection containing the document for this result.
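
Because score ranges from 0 to 1 with higher values indicating greater confidence, results can be thresholded and ordered on the client side. The sketch below assumes the query response is a dict with a results list of dicts keyed by the attribute names above; confident_results and the sample data are hypothetical.

```python
def confident_results(response, minimum_score):
    """Return results scoring at least minimum_score, highest score first."""
    kept = [result for result in response.get("results") or []
            if result.get("score") is not None
            and result["score"] >= minimum_score]
    return sorted(kept, key=lambda result: result["score"], reverse=True)


sample_response = {
    "matching_results": 3,
    "results": [{"id": "a", "score": 0.4},
                {"id": "b", "score": 0.9},
                {"id": "c", "score": 0.7}],
}
top_ids = [result["id"] for result in confident_results(sample_response, 0.5)]
```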
class TestDocument(configuration_id=None, status=None, enriched_field_units=None, original_media_type=None, snapshots=None, notices=None)[source]

Bases: object

TestDocument.

Attr str configuration_id:
 (optional) The unique identifier for the configuration.
Attr str status:
 (optional) Status of the preview operation.
Attr int enriched_field_units:
 (optional) The number of 10-kB chunks of field data that were enriched. This can be used to estimate the cost of running a real ingestion.
Attr str original_media_type:
 (optional) Format of the test document.
Attr list[DocumentSnapshot] snapshots:
 (optional) An array of objects that describe each step in the preview process.
Attr list[Notice] notices:
 (optional) An array of notice messages about the preview operation.
class TrainingDataSet(environment_id=None, collection_id=None, queries=None)[source]

Bases: object

TrainingDataSet.

Attr str environment_id:
 (optional)
Attr str collection_id:
 (optional)
Attr list[TrainingQuery] queries:
 (optional)
class TrainingExample(document_id=None, cross_reference=None, relevance=None)[source]

Bases: object

TrainingExample.

Attr str document_id:
 (optional)
Attr str cross_reference:
 (optional)
Attr int relevance:
 (optional)
class TrainingExampleList(examples=None)[source]

Bases: object

TrainingExampleList.

Attr list[TrainingExample] examples:
 (optional)
class TrainingQuery(query_id=None, natural_language_query=None, filter=None, examples=None)[source]

Bases: object

TrainingQuery.

Attr str query_id:
 (optional)
Attr str natural_language_query:
 (optional)
Attr str filter:
 (optional)
Attr list[TrainingExample] examples:
 (optional)
class TrainingStatus(total_examples=None, available=None, processing=None, minimum_queries_added=None, minimum_examples_added=None, sufficient_label_diversity=None, notices=None, successfully_trained=None, data_updated=None)[source]

Bases: object

TrainingStatus.

Attr int total_examples:
 (optional)
Attr bool available:
 (optional)
Attr bool processing:
 (optional)
Attr bool minimum_queries_added:
 (optional)
Attr bool minimum_examples_added:
 (optional)
Attr bool sufficient_label_diversity:
 (optional)
Attr int notices:
 (optional)
Attr datetime successfully_trained:
 (optional)
Attr datetime data_updated:
 (optional)
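
The boolean attributes of TrainingStatus are not described here, but their names suggest preconditions for relevance training. Under that assumption, the sketch below lists which of the three flags are not yet true. unmet_training_requirements is a hypothetical helper, and the dict keys are assumed to match the attribute names.

```python
# Assumption: these three flags act as readiness checks for relevance training.
READINESS_FLAGS = ("minimum_queries_added",
                   "minimum_examples_added",
                   "sufficient_label_diversity")


def unmet_training_requirements(training_status):
    """Return the readiness flags that are missing or false."""
    return [flag for flag in READINESS_FLAGS if not training_status.get(flag)]


unmet = unmet_training_requirements(
    {"minimum_queries_added": True, "minimum_examples_added": False})
```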
class WordHeadingDetection(fonts=None, styles=None)[source]

Bases: object

WordHeadingDetection.

Attr list[FontSetting] fonts:
 (optional)
Attr list[WordStyle] styles:
 (optional)
class WordSettings(heading=None)[source]

Bases: object

A list of Word conversion settings.

Attr WordHeadingDetection heading:
 (optional)
class WordStyle(level=None, names=None)[source]

Bases: object

WordStyle.

Attr int level:(optional)
Attr list[str] names:
 (optional)
class XPathPatterns(xpaths=None)[source]

Bases: object

XPathPatterns.

Attr list[str] xpaths:
 (optional)