ibm_watson.discovery_v1 module

The IBM Watson™ Discovery Service is a cognitive search and content analytics engine that you can add to applications to identify patterns, trends and actionable insights to drive better decision-making. Securely unify structured and unstructured data with pre-enriched content, and use a simplified query language to eliminate the need for manual filtering of results.

class DiscoveryV1(version, url='https://gateway.watsonplatform.net/discovery/api', username=None, password=None, iam_apikey=None, iam_access_token=None, iam_url=None)

Bases: ibm_cloud_sdk_core.base_service.BaseService

The Discovery V1 service.

default_url = 'https://gateway.watsonplatform.net/discovery/api'

create_environment(name, description=None, size=None, **kwargs)

Create an environment.

Creates a new environment for private data. An environment must be created before collections can be created. Note: You can create only one environment for private data per service instance. An attempt to create another environment results in an error.

Parameters

name (str) – Name that identifies the environment.
description (str) – Description of the environment.
size (str) – Size of the environment. In the Lite plan the default and only accepted value is LT; in all other plans the default is S.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
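Because only one private environment is allowed per service instance, a script that may run more than once can reuse an existing environment instead of triggering the error described above. The following is a minimal sketch, assuming discovery is an already configured DiscoveryV1 client and that get_result() on the returned DetailedResponse yields the JSON body as a dict. The helper name get_or_create_environment and the environments, environment_id and read_only keys it reads are assumptions about the response shape, not something this page documents.

```python
def get_or_create_environment(discovery, name, description=None):
    """Return the ID of the private environment called `name`, creating it if needed.

    Hypothetical helper: assumes list_environments() results look like
    {"environments": [{"environment_id": ..., "name": ..., "read_only": ...}]}.
    """
    listing = discovery.list_environments(name=name).get_result()
    for env in listing.get("environments", []):
        # Skip read-only environments; only a writable one can hold private data.
        if not env.get("read_only", False):
            return env["environment_id"]
    # Nothing reusable was found, so create the single private environment.
    created = discovery.create_environment(name=name, description=description)
    return created.get_result()["environment_id"]
```

Listing by name first keeps the call idempotent: a second run returns the same ID rather than attempting a second create.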

delete_environment(environment_id, **kwargs)

Delete environment.

Parameters

environment_id (str) – The ID of the environment.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_environment(environment_id, **kwargs)

Get environment info.

Parameters

environment_id (str) – The ID of the environment.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_environments(name=None, **kwargs)

List environments.

List existing environments for the service instance.

Parameters

name (str) – Show only the environment with the given name.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_fields(environment_id, collection_ids, **kwargs)

List fields across collections.

Gets a list of the unique fields (and their types) stored in the indexes of the specified collections.

Parameters

environment_id (str) – The ID of the environment.
collection_ids (list[str]) – A comma-separated list of collection IDs to be queried against.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

update_environment(environment_id, name=None, description=None, size=None, **kwargs)

Update an environment.

Updates an environment. The environment’s name and description parameters can be changed. You must specify a name for the environment.

Parameters

environment_id (str) – The ID of the environment.
name (str) – Name that identifies the environment.
description (str) – Description of the environment.
size (str) – Size that the environment should be increased to. Environment size cannot be modified when using a Lite plan. Environment size can only be increased, not decreased.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
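Since a size change can only go upward and is not allowed on the Lite plan, a client can reject an invalid request before calling update_environment. This is a sketch only: the ENVIRONMENT_SIZES ordering below is an assumption for illustration (this page names only LT and S), and check_size_increase is a hypothetical helper, so confirm the sizes your plan supports in the service's API reference.

```python
# ASSUMED ordering of environment sizes, smallest to largest; verify against
# the service's API reference before relying on it.
ENVIRONMENT_SIZES = ["LT", "XS", "S", "MS", "M", "ML", "L", "XL", "XXL", "XXXL"]

def check_size_increase(current, requested):
    """Return `requested` if it is strictly larger than `current`, else raise ValueError."""
    if current == "LT":
        # LT is the only size on the Lite plan, where size cannot be modified.
        raise ValueError("environment size cannot be modified on the Lite plan")
    if current not in ENVIRONMENT_SIZES or requested not in ENVIRONMENT_SIZES:
        raise ValueError("unknown environment size")
    if ENVIRONMENT_SIZES.index(requested) <= ENVIRONMENT_SIZES.index(current):
        raise ValueError("environment size can only be increased")
    return requested
```

A caller would then pass the checked value on, for example update_environment(env_id, name=current_name, size=check_size_increase("S", "M")).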

create_configuration(environment_id, name, description=None, conversions=None, enrichments=None, normalizations=None, source=None, **kwargs)

Add configuration.

Creates a new configuration. If the input configuration contains the configuration_id, created, or updated properties, then they are ignored and overridden by the system, and an error is not returned so that the overridden fields do not need to be removed when copying a configuration. The configuration can contain unrecognized JSON fields. Any such fields are ignored and do not generate an error. This makes it easier to use newer configuration files with older versions of the API and the service. It also makes it possible for the tooling to add additional metadata and information to the configuration.

Parameters

environment_id (str) – The ID of the environment.
name (str) – The name of the configuration.
description (str) – The description of the configuration, if available.
conversions (Conversions) – Document conversion settings.
enrichments (list[Enrichment]) – An array of document enrichment settings for the configuration.
normalizations (list[NormalizationOperation]) – Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
source (Source) – Object containing source parameters for the configuration.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
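The tolerance for system-managed fields described above makes copying a configuration straightforward: fetch it, then hand its editable sections to create_configuration. A minimal sketch, assuming discovery is a configured DiscoveryV1 client, that get_result() returns dicts keyed like the parameter names above, and that the SDK accepts plain dicts in place of Conversions, Enrichment, NormalizationOperation and Source objects. copy_configuration is a hypothetical helper name.

```python
def copy_configuration(discovery, environment_id, configuration_id, new_name):
    """Create a copy of an existing configuration under a new name; return the new ID.

    Hypothetical helper. Only the editable sections are forwarded, and passing
    them as plain dicts (rather than model objects) is assumed to be accepted.
    """
    original = discovery.get_configuration(environment_id, configuration_id).get_result()
    created = discovery.create_configuration(
        environment_id,
        new_name,
        description=original.get("description"),
        conversions=original.get("conversions"),
        enrichments=original.get("enrichments"),
        normalizations=original.get("normalizations"),
        source=original.get("source"),
    ).get_result()
    # The service assigns a fresh configuration_id for the new configuration.
    return created["configuration_id"]
```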

delete_configuration(environment_id, configuration_id, **kwargs)

Delete a configuration.

The deletion is performed unconditionally. A configuration deletion request succeeds even if the configuration is referenced by a collection or document ingestion. However, documents that have already been submitted for processing continue to use the deleted configuration. Documents are always processed with a snapshot of the configuration as it existed at the time the document was submitted.

Parameters

environment_id (str) – The ID of the environment.
configuration_id (str) – The ID of the configuration.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_configuration(environment_id, configuration_id, **kwargs)

Get configuration details.

Parameters

environment_id (str) – The ID of the environment.
configuration_id (str) – The ID of the configuration.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_configurations(environment_id, name=None, **kwargs)

List configurations.

Lists existing configurations for the service instance.

Parameters

environment_id (str) – The ID of the environment.
name (str) – Find configurations with the given name.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

update_configuration(environment_id, configuration_id, name, description=None, conversions=None, enrichments=None, normalizations=None, source=None, **kwargs)

Update a configuration.

Replaces an existing configuration.

Completely replaces the original configuration. The configuration_id, updated, and created fields are accepted in the request, but they are ignored, and an error is not generated. It is also acceptable to submit an updated configuration with none of these three properties.

Documents are processed with a snapshot of the configuration as it was at the time the document was submitted for ingestion. This means that documents that were already submitted do not see any updates made to the configuration.

Parameters

environment_id (str) – The ID of the environment.
configuration_id (str) – The ID of the configuration.
name (str) – The name of the configuration.
description (str) – The description of the configuration, if available.
conversions (Conversions) – Document conversion settings.
enrichments (list[Enrichment]) – An array of document enrichment settings for the configuration.
normalizations (list[NormalizationOperation]) – Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
source (Source) – Object containing source parameters for the configuration.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

test_configuration_in_environment(environment_id, configuration=None, file=None, filename=None, file_content_type=None, metadata=None, step=None, configuration_id=None, **kwargs)

Test configuration.

Runs a sample document through the default or your configuration and returns diagnostic information designed to help you understand how the document was processed. The document is not added to the index.

Parameters

environment_id (str) – The ID of the environment.
configuration (str) – The configuration to use to process the document. If this part is provided, then the provided configuration is used to process the document. If configuration_id is also provided (both are present at the same time), then the request is rejected. The maximum supported configuration size is 1 MB; configuration parts larger than 1 MB are rejected. See the GET /configurations/{configuration_id} operation for an example configuration.
file (file) – The content of the document to ingest. The maximum supported file size when adding a file to a collection is 50 megabytes; the maximum supported file size when testing a configuration is 1 megabyte. Files larger than the supported size are rejected.
filename (str) – The filename for file.
file_content_type (str) – The content type of file.
metadata (str) – If you’re using the Data Crawler to upload your documents, you can test a document against the type of metadata that the Data Crawler might send. The maximum supported metadata file size is 1 MB; metadata parts larger than 1 MB are rejected. Example: { "Creator": "Johnny Appleseed", "Subject": "Apples" }
step (str) – Specify to run the input document through only the given step instead of through the entire ingestion workflow. Valid values are convert, enrich, and normalize.
configuration_id (str) – The ID of the configuration to use to process the document. If the configuration form part is also provided (both are present at the same time), then the request is rejected.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
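Several of the rejection rules above (configuration and configuration_id are mutually exclusive, step has three valid values, each part is capped at 1 MB) can be checked locally before any upload. The sketch below is a hypothetical pre-flight helper, not part of the SDK; it assumes "1 MB" means 1024 × 1024 bytes and measures text parts as UTF-8.

```python
VALID_STEPS = {"convert", "enrich", "normalize"}
MAX_PART_BYTES = 1024 * 1024  # "1 MB" ASSUMED to mean 1024 * 1024 bytes

def validate_test_configuration_args(configuration=None, configuration_id=None,
                                     file_bytes=None, metadata=None, step=None):
    """Raise ValueError for argument combinations the description says are rejected."""
    if configuration is not None and configuration_id is not None:
        raise ValueError("pass configuration or configuration_id, not both")
    if step is not None and step not in VALID_STEPS:
        raise ValueError("step must be convert, enrich, or normalize")
    parts = (("configuration", configuration), ("metadata", metadata), ("file", file_bytes))
    for label, part in parts:
        if part is None:
            continue
        # Measure str parts as UTF-8 bytes; bytes parts are measured directly.
        size = len(part.encode("utf-8")) if isinstance(part, str) else len(part)
        if size > MAX_PART_BYTES:
            raise ValueError(f"{label} is larger than 1 MB")
```

Calling it with the same arguments you are about to pass to test_configuration_in_environment surfaces the mistake as a local exception instead of a rejected request.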

create_collection(environment_id, name, description=None, configuration_id=None, language=None, **kwargs)

Create a collection.

Parameters

environment_id (str) – The ID of the environment.
name (str) – The name of the collection to be created.
description (str) – A description of the collection.
configuration_id (str) – The ID of the configuration in which the collection is to be created.
language (str) – The language of the documents stored in the collection, in the form of an ISO 639-1 language code.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_collection(environment_id, collection_id, **kwargs)

Delete a collection.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_collection(environment_id, collection_id, **kwargs)

Get collection details.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_collection_fields(environment_id, collection_id, **kwargs)

List collection fields.

Gets a list of the unique fields (and their types) stored in the index.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_collections(environment_id, name=None, **kwargs)

List collections.

Lists existing collections for the service instance.

Parameters

environment_id (str) – The ID of the environment.
name (str) – Find collections with the given name.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

update_collection(environment_id, collection_id, name, description=None, configuration_id=None, **kwargs)

Update a collection.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
name (str) – The name of the collection.
description (str) – A description of the collection.
configuration_id (str) – The ID of the configuration in which the collection is to be updated.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

create_expansions(environment_id, collection_id, expansions, **kwargs)

Create or update expansion list.

Create or replace the expansion list for this collection. The maximum number of expanded terms per collection is 500. The current expansion list is replaced with the uploaded content.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
expansions (list[Expansion]) – An array of query expansion definitions. Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional. Bidirectional means that all terms are expanded to all other terms in the object. Unidirectional means that a set list of terms can be expanded into a second list of terms.

To create a bidirectional expansion, specify an expanded_terms array. When found in a query, all items in the expanded_terms array are then expanded to the other items in the same array.

To create a unidirectional expansion, specify both an array of input_terms and an array of expanded_terms. When items in the input_terms array are present in a query, they are expanded using the items listed in the expanded_terms array.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
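The two expansion shapes differ only in whether input_terms is present. The sketch below builds both as plain dicts and enforces the 500-term limit; the helper names are hypothetical, passing dicts instead of Expansion objects to create_expansions is an assumption, and counting every term in both arrays is a conservative guess because the text does not say exactly how terms are counted.

```python
MAX_EXPANSION_TERMS = 500  # per collection, as stated above

def bidirectional(*terms):
    """Every term expands to every other term in the same object."""
    return {"expanded_terms": list(terms)}

def unidirectional(input_terms, expanded_terms):
    """Terms from input_terms expand to the terms in expanded_terms."""
    return {"input_terms": list(input_terms), "expanded_terms": list(expanded_terms)}

def build_expansions(*expansions):
    """Collect expansion objects, rejecting lists over the per-collection limit.

    Counting every term in both arrays is a conservative ASSUMPTION.
    """
    total = sum(len(e.get("input_terms", [])) + len(e["expanded_terms"]) for e in expansions)
    if total > MAX_EXPANSION_TERMS:
        raise ValueError(f"{total} terms exceeds the limit of {MAX_EXPANSION_TERMS}")
    return list(expansions)
```

For example, build_expansions(bidirectional("car", "automobile"), unidirectional(["ibm"], ["International Business Machines"])) could then be passed as the expansions argument.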

create_stopword_list(environment_id, collection_id, stopword_file, stopword_filename=None, **kwargs)

Create stopword list.

Upload a custom stopword list to use with the specified collection.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
stopword_file (file) – The content of the stopword list to ingest.
stopword_filename (str) – The filename for stopword_file.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
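A stopword list is uploaded as a file object. The sketch below, assuming discovery is a configured DiscoveryV1 client, opens the file in binary mode and keeps it open for the duration of the call so the SDK can read it; upload_stopwords is a hypothetical helper, and using the file's base name as stopword_filename is a choice made for illustration.

```python
import os

def upload_stopwords(discovery, environment_id, collection_id, path):
    """Upload the stopword file at `path` and return the parsed JSON result."""
    # The with-block keeps the file open while create_stopword_list reads it.
    with open(path, "rb") as stopword_file:
        response = discovery.create_stopword_list(
            environment_id,
            collection_id,
            stopword_file,
            stopword_filename=os.path.basename(path),
        )
    return response.get_result()
```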

create_tokenization_dictionary(environment_id, collection_id, tokenization_rules=None, **kwargs)

Create tokenization dictionary.

Upload a custom tokenization dictionary to use with the specified collection.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
tokenization_rules (list[TokenDictRule]) – An array of tokenization rules. Each rule contains the original text string, component tokens, any alternate character set readings, and which part_of_speech the text is from.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
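Each rule bundles the four pieces described above. The sketch below builds one rule as a plain dict; the key names text, tokens, readings and part_of_speech are assumed to match the TokenDictRule fields, passing dicts instead of TokenDictRule objects is an assumption, and the two consistency checks are illustrative sanity checks rather than documented service constraints. token_rule is a hypothetical helper.

```python
def token_rule(text, tokens, readings=None, part_of_speech=None):
    """Build one tokenization rule as a plain dict (hypothetical helper).

    The checks below are illustrative sanity checks, not constraints
    documented by the service.
    """
    if "".join(tokens) != text:
        raise ValueError("tokens should concatenate to the original text")
    if readings is not None and len(readings) != len(tokens):
        raise ValueError("expected one reading per token")
    rule = {"text": text, "tokens": list(tokens)}
    # Optional fields are included only when supplied.
    if readings is not None:
        rule["readings"] = list(readings)
    if part_of_speech is not None:
        rule["part_of_speech"] = part_of_speech
    return rule
```

A list of such rules would be passed as tokenization_rules=[token_rule(...), ...].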

delete_expansions(environment_id, collection_id, **kwargs)

Delete the expansion list.

Remove the expansion information for this collection. The expansion list must be deleted to disable query expansion for a collection.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_stopword_list(environment_id, collection_id, **kwargs)

Delete a custom stopword list.

Delete a custom stopword list from the collection. After a custom stopword list is deleted, the default list is used for the collection.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_tokenization_dictionary(environment_id, collection_id, **kwargs)

Delete tokenization dictionary.

Delete the tokenization dictionary from the collection.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_stopword_list_status(environment_id, collection_id, **kwargs)

Get stopword list status.

Returns the current status of the stopword list for the specified collection.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_tokenization_dictionary_status(environment_id, collection_id, **kwargs)

Get tokenization dictionary status.

Returns the current status of the tokenization dictionary for the specified collection.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
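Both status methods report on an upload that is processed asynchronously, so a caller typically polls until the list or dictionary is ready. A minimal sketch: get_status is any zero-argument callable returning a DetailedResponse (for example a lambda wrapping get_stopword_list_status or get_tokenization_dictionary_status), and the status strings "active", "pending" and "not found" are assumptions about the response body. wait_until_active is a hypothetical helper; sleep and clock are injectable so it can be tested without waiting.

```python
import time

def wait_until_active(get_status, timeout=300.0, interval=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the reported status is "active" and return it."""
    deadline = clock() + timeout
    while True:
        status = get_status().get_result().get("status")
        if status == "active":
            return status
        if status == "not found":
            # Nothing has been uploaded for this collection.
            raise LookupError("no custom list or dictionary for this collection")
        if clock() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout} seconds")
        sleep(interval)
```

Usage: wait_until_active(lambda: discovery.get_tokenization_dictionary_status(env_id, coll_id)).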

list_expansions(environment_id, collection_id, **kwargs)

Get the expansion list.

Returns the current expansion list for the specified collection. If an expansion list is not specified, an object with empty expansion arrays is returned.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
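Because list_expansions returns an object with empty arrays rather than an error when no list is defined, checking for terms is the way to tell whether query expansion is active. The sketch below combines it with delete_expansions, which is what disables expansion. It assumes discovery is a configured DiscoveryV1 client and that the result has an expansions key holding objects with optional input_terms and expanded_terms arrays; both helper names are hypothetical.

```python
def has_expansions(result):
    """Return True if a list_expansions() result defines at least one term."""
    return any(
        exp.get("input_terms") or exp.get("expanded_terms")
        for exp in result.get("expansions", [])
    )

def disable_query_expansion(discovery, environment_id, collection_id):
    """Delete the collection's expansion list if it has terms; report whether it did."""
    current = discovery.list_expansions(environment_id, collection_id).get_result()
    if not has_expansions(current):
        return False
    discovery.delete_expansions(environment_id, collection_id)
    return True
```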

add_document(environment_id, collection_id, file=None, filename=None, file_content_type=None, metadata=None, **kwargs)

Add a document.

Add a document to a collection with optional metadata.

The version query parameter is still required.
Returns immediately after the system has accepted the document for processing.
The user must provide document content, metadata, or both. If the request is missing both document content and metadata, it is rejected.
The user can set the Content-Type parameter on the file part to indicate the media type of the document. If the Content-Type parameter is missing or is one of the generic media types (for example, application/octet-stream), then the service attempts to automatically detect the document’s media type.
The following field names are reserved and are filtered out if present after normalization: id, score, highlight, and any field with the prefix _, +, or -.
Fields with empty name values after normalization are filtered out before indexing.
Fields containing the following characters after normalization are filtered out before indexing: # and ,

Note: Documents can be added with a specific document_id by using the update_document method (/v1/environments/{environment_id}/collections/{collection_id}/documents/{document_id}).

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
file (file) – The content of the document to ingest. The maximum supported file size when adding a file to a collection is 50 megabytes; the maximum supported file size when testing a configuration is 1 megabyte. Files larger than the supported size are rejected.
filename (str) – The filename for file.
file_content_type (str) – The content type of file.
metadata (str) – If you’re using the Data Crawler to upload your documents, you can test a document against the type of metadata that the Data Crawler might send. The maximum supported metadata file size is 1 MB; metadata parts larger than 1 MB are rejected. Example: { "Creator": "Johnny Appleseed", "Subject": "Apples" }
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
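The metadata parameter is a JSON string, and the filtering rules above silently drop some field names. The sketch below checks top-level metadata keys against those rules before serializing them, so a dropped field is noticed locally. It reads the "# and ," rule as applying to field names, which is an interpretation, and it does not reproduce the service's own normalization step; both helper names are hypothetical.

```python
import json

RESERVED_NAMES = {"id", "score", "highlight"}
RESERVED_PREFIXES = ("_", "+", "-")
FILTERED_CHARS = ("#", ",")

def dropped_metadata_fields(metadata):
    """Return the top-level keys of `metadata` that the rules above would filter out.

    Only top-level names are checked, so this is a pre-flight hint, not a guarantee.
    """
    dropped = []
    for name in metadata:
        if (name.strip() == ""
                or name in RESERVED_NAMES
                or name.startswith(RESERVED_PREFIXES)
                or any(ch in name for ch in FILTERED_CHARS)):
            dropped.append(name)
    return dropped

def metadata_json(metadata):
    """Serialize a metadata dict to the JSON string the metadata parameter expects."""
    dropped = dropped_metadata_fields(metadata)
    if dropped:
        raise ValueError(f"these fields would be filtered out: {dropped}")
    return json.dumps(metadata)
```

For example, metadata=metadata_json({"Creator": "Johnny Appleseed", "Subject": "Apples"}) produces the string form of the example above.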

delete_document(environment_id, collection_id, document_id, **kwargs)

Delete a document.

If the given document ID is invalid, or if the document is not found, then a success response is returned (HTTP status code 200) with the status set to ‘deleted’.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
document_id (str) – The ID of the document.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_document_status(environment_id, collection_id, document_id, **kwargs)

Get document details.

Fetch status details about a submitted document. Note: this operation does not return the document itself. Instead, it returns only the document’s processing status and any notices (warnings or errors) that were generated when the document was ingested. Use the query API to retrieve the actual document content.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
document_id (str) – The ID of the document.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
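Since add_document and update_document return as soon as a document is accepted, get_document_status is the way to learn when ingestion has finished and whether it produced notices. A minimal polling sketch, assuming discovery is a configured DiscoveryV1 client; the terminal status strings "available", "available with notices" and "failed" are assumptions about the response body, and wait_for_document is a hypothetical helper with an injectable sleep for testing.

```python
import time

# ASSUMED terminal status values; anything else is treated as still in progress.
TERMINAL_STATUSES = {"available", "available with notices", "failed"}

def wait_for_document(discovery, environment_id, collection_id, document_id,
                      interval=5.0, max_polls=60, sleep=time.sleep):
    """Poll get_document_status until ingestion finishes; return the final result dict."""
    for _ in range(max_polls):
        result = discovery.get_document_status(
            environment_id, collection_id, document_id).get_result()
        if result.get("status") in TERMINAL_STATUSES:
            return result
        sleep(interval)
    raise TimeoutError(f"document {document_id} not finished after {max_polls} polls")
```

The returned dict still carries any notices, so callers can inspect warnings even when the status is a success.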

update_document(environment_id, collection_id, document_id, file=None, filename=None, file_content_type=None, metadata=None, **kwargs)

Update a document.

Replace an existing document or add a document with a specified document_id. Starts ingesting a document with optional metadata. Note: When uploading a new document with this method it automatically replaces any document stored with the same document_id if it exists.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
document_id (str) – The ID of the document.
file (file) – The content of the document to ingest. The maximum supported file size when adding a file to a collection is 50 megabytes; the maximum supported file size when testing a configuration is 1 megabyte. Files larger than the supported size are rejected.
filename (str) – The filename for file.
file_content_type (str) – The content type of file.
metadata (str) – If you’re using the Data Crawler to upload your documents, you can test a document against the type of metadata that the Data Crawler might send. The maximum supported metadata file size is 1 MB; metadata parts larger than 1 MB are rejected. Example: { "Creator": "Johnny Appleseed", "Subject": "Apples" }
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
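Because update_document replaces any document stored under the same document_id, it works as an "upsert" keyed by an ID you control. The sketch below sends an in-memory JSON payload; it assumes discovery is a configured DiscoveryV1 client and that the SDK accepts any readable binary file-like object such as io.BytesIO for file. upsert_json_document and the "{document_id}.json" filename convention are hypothetical.

```python
import io
import json

def upsert_json_document(discovery, environment_id, collection_id, document_id, payload):
    """Add or replace the document `document_id` with a JSON payload."""
    # Serialize the payload into an in-memory binary file object.
    body = io.BytesIO(json.dumps(payload).encode("utf-8"))
    response = discovery.update_document(
        environment_id,
        collection_id,
        document_id,
        file=body,
        filename=f"{document_id}.json",
        file_content_type="application/json",
    )
    return response.get_result()
```

Setting file_content_type explicitly avoids relying on the automatic media-type detection described under add_document.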

federated_query(environment_id, filter=None, query=None, natural_language_query=None, passages=None, aggregation=None, count=None, return_fields=None, offset=None, sort=None, highlight=None, passages_fields=None, passages_count=None, passages_characters=None, deduplicate=None, deduplicate_field=None, collection_ids=None, similar=None, similar_document_ids=None, similar_fields=None, bias=None, logging_opt_out=None, **kwargs)

Long environment queries.

Complex queries might be too long for a standard method query. By using this method, you can construct longer queries. However, these queries may take longer to complete than the standard method. For details, see the [Discovery service documentation](https://cloud.ibm.com/docs/services/discovery?topic=discovery-query-concepts#query-concepts).

Parameters

environment_id (str) – The ID of the environment.
filter (str) – A cacheable query that excludes documents that don’t mention the query content. Filter searches are better for metadata-type searches and for assessing the concepts in the data set.
query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.
natural_language_query (str) – A natural language query that returns relevant documents by utilizing training data and natural language understanding. You cannot use natural_language_query and query at the same time.
passages (bool) – A passages query that returns the most relevant passages from the results.
aggregation (str) – An aggregation search that returns an exact answer by combining query search with filters. Useful for applications to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.
count (int) – Number of results to return.
return_fields (str) – A comma-separated list of the portion of the document hierarchy to return.
offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results.
sort (str) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified. This parameter cannot be used in the same query as the bias parameter.
highlight (bool) – When true, a highlight field is returned for each result which contains the fields which match the query with tags around the matching query terms.
passages_fields (str) – A comma-separated list of fields that passages are drawn from. If this parameter is not specified, then all top-level fields are included.
passages_count (int) – The maximum number of passages to return. The search returns fewer passages if the requested total is not found. The default is 10. The maximum is 100.
passages_characters (int) – The approximate number of characters that any one passage will have.
deduplicate (bool) – When true, and used with a Watson Discovery News collection, duplicate results (based on the contents of the title field) are removed. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.
deduplicate_field (str) – When specified, duplicate results based on the field specified are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.
collection_ids (str) – A comma-separated list of collection IDs to be queried against. Required when querying multiple collections, invalid when performing a single collection query.
similar (bool) – When true, results are returned based on their similarity to the document IDs specified in the similar.document_ids parameter.
similar_document_ids (str) – A comma-separated list of document IDs to find similar documents. Tip: Include the natural_language_query parameter to expand the scope of the document similarity search with the natural language query. Other query parameters, such as filter and query, are subsequently applied and reduce the scope.
similar_fields (str) – A comma-separated list of field names that are used as a basis for comparison to identify similar documents. If not specified, the entire document is used for comparison.
bias (str) – Field toward which the returned results are biased. The specified field must be either a date or number format. When a date type field is specified, returned results are biased towards field values closer to the current date. When a number type field is specified, returned results are biased towards higher field values. This parameter cannot be used in the same query as the sort parameter.
logging_opt_out (bool) – If true, queries are not stored in the Discovery Logs endpoint.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
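Several parameters above are mutually exclusive (query with natural_language_query, sort with bias). The sketch below gathers keyword arguments for federated_query, rejects those combinations locally, and drops unset values. build_query_kwargs is a hypothetical helper, and applying the 10000 cap on count plus offset here is an assumption carried over from the federated_query_notices description, which is where this page states it.

```python
MAX_COUNT_PLUS_OFFSET = 10000  # ASSUMED to apply here as it does for notices queries

def build_query_kwargs(query=None, natural_language_query=None, sort=None, bias=None,
                       count=None, offset=None, **extra):
    """Return keyword arguments for federated_query with unset values removed."""
    if query is not None and natural_language_query is not None:
        raise ValueError("query and natural_language_query cannot be used together")
    if sort is not None and bias is not None:
        raise ValueError("sort and bias cannot be used together")
    if (count or 0) + (offset or 0) > MAX_COUNT_PLUS_OFFSET:
        raise ValueError("count + offset must not exceed 10000")
    kwargs = dict(query=query, natural_language_query=natural_language_query,
                  sort=sort, bias=bias, count=count, offset=offset, **extra)
    # Keep only parameters the caller actually set.
    return {k: v for k, v in kwargs.items() if v is not None}
```

A call might then look like discovery.federated_query(env_id, collection_ids="coll-1,coll-2", **build_query_kwargs(natural_language_query="reset password", passages=True)).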

federated_query_notices(environment_id, collection_ids, filter=None, query=None, natural_language_query=None, aggregation=None, count=None, return_fields=None, offset=None, sort=None, highlight=None, deduplicate_field=None, similar=None, similar_document_ids=None, similar_fields=None, **kwargs)

Query multiple collection system notices.

Queries for notices (errors or warnings) that might have been generated by the system. Notices are generated when ingesting documents and performing relevance training. See the [Discovery service documentation](https://cloud.ibm.com/docs/services/discovery?topic=discovery-query-concepts#query-concepts) for more details on the query language.

Parameters

environment_id (str) – The ID of the environment.
collection_ids (list[str]) – A comma-separated list of collection IDs to be queried against.
filter (str) – A cacheable query that excludes documents that don’t mention the query content. Filter searches are better for metadata-type searches and for assessing the concepts in the data set.
query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.
natural_language_query (str) – A natural language query that returns relevant documents by utilizing training data and natural language understanding. You cannot use natural_language_query and query at the same time.
aggregation (str) – An aggregation search that returns an exact answer by combining query search with filters. Useful for applications to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.
count (int) – Number of results to return. The maximum for the count and offset values together in any one query is 10000.
return_fields (list[str]) – A comma-separated list of the portion of the document hierarchy to return.
offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10000.
sort (list[str]) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.
highlight (bool) – When true, a highlight field is returned for each result; it contains the fields that match the query, with tags around the matching query terms.
deduplicate_field (str) – When specified, duplicate results based on the field specified are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.
similar (bool) – When true, results are returned based on their similarity to the document IDs specified in the similar_document_ids parameter.
similar_document_ids (list[str]) – A comma-separated list of document IDs to find similar documents. Tip: Include the natural_language_query parameter to expand the scope of the document similarity search with the natural language query. Other query parameters, such as filter and query, are subsequently applied and reduce the scope.
similar_fields (list[str]) – A comma-separated list of field names that are used as a basis for comparison to identify similar documents. If not specified, the entire document is used for comparison.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
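Because count and offset together are capped at 10000 in any one query, paging through a large result set has to stop at that window. The following is a minimal sketch of a hypothetical helper, page_windows, which is not part of the SDK: it only plans the (offset, count) pairs a caller would pass as the offset and count arguments, one request per pair.

```python
from typing import Iterator, Tuple

# Documented cap on count + offset for any one query.
MAX_WINDOW = 10000


def page_windows(page_size: int, total_wanted: int,
                 max_window: int = MAX_WINDOW) -> Iterator[Tuple[int, int]]:
    """Yield (offset, count) pairs whose sum never exceeds max_window.

    Hypothetical helper: it does not call the service, it only plans
    the offset/count values for successive requests.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Never plan past the service-side window, whatever the caller asks for.
    limit = min(total_wanted, max_window)
    offset = 0
    while offset < limit:
        count = min(page_size, limit - offset)
        yield offset, count
        offset += count
```

For example, list(page_windows(4000, 20000)) plans three requests, [(0, 4000), (4000, 4000), (8000, 2000)], and then stops because results beyond the 10000 window cannot be reached with count and offset alone.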

query(environment_id, collection_id, filter=None, query=None, natural_language_query=None, passages=None, aggregation=None, count=None, return_fields=None, offset=None, sort=None, highlight=None, passages_fields=None, passages_count=None, passages_characters=None, deduplicate=None, deduplicate_field=None, collection_ids=None, similar=None, similar_document_ids=None, similar_fields=None, bias=None, logging_opt_out=None, **kwargs)[source]¶

Long collection queries.

Complex queries might be too long for a standard method query. By using this method, you can construct longer queries. However, these queries may take longer to complete than the standard method. For details, see the [Discovery service documentation](https://cloud.ibm.com/docs/services/discovery?topic=discovery-query-concepts#query-concepts).

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
filter (str) – A cacheable query that excludes documents that don’t mention the query content. Filter searches are better for metadata-type searches and for assessing the concepts in the data set.
query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.
natural_language_query (str) – A natural language query that returns relevant documents by utilizing training data and natural language understanding. You cannot use natural_language_query and query at the same time.
passages (bool) – A passages query that returns the most relevant passages from the results.
aggregation (str) – An aggregation search that returns an exact answer by combining query search with filters. Useful for applications to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.
count (int) – Number of results to return.
return_fields (str) – A comma-separated list of the portion of the document hierarchy to return.
offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results.
sort (str) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified. This parameter cannot be used in the same query as the bias parameter.
highlight (bool) – When true, a highlight field is returned for each result; it contains the fields that match the query, with tags around the matching query terms.
passages_fields (str) – A comma-separated list of fields that passages are drawn from. If this parameter is not specified, then all top-level fields are included.
passages_count (int) – The maximum number of passages to return. The search returns fewer passages if the requested total is not found. The default is 10. The maximum is 100.
passages_characters (int) – The approximate number of characters that any one passage will have.
deduplicate (bool) – When true, and used with a Watson Discovery News collection, duplicate results (based on the contents of the title field) are removed. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.
deduplicate_field (str) – When specified, duplicate results based on the field specified are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.
collection_ids (str) – A comma-separated list of collection IDs to be queried against. Required when querying multiple collections; invalid when performing a single-collection query.
similar (bool) – When true, results are returned based on their similarity to the document IDs specified in the similar_document_ids parameter.
similar_document_ids (str) – A comma-separated list of document IDs to find similar documents. Tip: Include the natural_language_query parameter to expand the scope of the document similarity search with the natural language query. Other query parameters, such as filter and query, are subsequently applied and reduce the scope.
similar_fields (str) – A comma-separated list of field names that are used as a basis for comparison to identify similar documents. If not specified, the entire document is used for comparison.
bias (str) – Field toward which the returned results are biased. The specified field must be either a date or number format. When a date type field is specified, returned results are biased towards field values closer to the current date. When a number type field is specified, returned results are biased towards higher field values. This parameter cannot be used in the same query as the sort parameter.
logging_opt_out (bool) – If true, queries are not stored in the Discovery Logs endpoint.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
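Two pairs of query arguments are mutually exclusive: query with natural_language_query, and sort with bias. The sketch below is a hypothetical client-side helper, build_query_kwargs (not part of the SDK), that rejects those combinations before a request is sent and drops any argument left at None. The service still performs its own validation; this only surfaces the mistake earlier.

```python
from typing import Any, Dict, Optional


def build_query_kwargs(query: Optional[str] = None,
                       natural_language_query: Optional[str] = None,
                       sort: Optional[str] = None,
                       bias: Optional[str] = None,
                       **other: Any) -> Dict[str, Any]:
    """Check the documented mutual exclusions and drop unset arguments.

    Hypothetical helper; it does not contact the service.
    """
    if query is not None and natural_language_query is not None:
        raise ValueError("query and natural_language_query cannot be used together")
    if sort is not None and bias is not None:
        raise ValueError("sort and bias cannot be used together")
    kwargs = dict(query=query, natural_language_query=natural_language_query,
                  sort=sort, bias=bias, **other)
    # Keep only the arguments the caller actually set.
    return {name: value for name, value in kwargs.items() if value is not None}
```

The returned dict could then be unpacked into a call such as discovery.query(environment_id, collection_id, **build_query_kwargs(natural_language_query="...", count=5)), where discovery is assumed to be an already authenticated DiscoveryV1 instance.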

query_entities(environment_id, collection_id, feature=None, entity=None, context=None, count=None, evidence_count=None, **kwargs)[source]¶

Knowledge Graph entity query.

See the [Knowledge Graph documentation](https://cloud.ibm.com/docs/services/discovery?topic=discovery-kg#kg) for more details.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
feature (str) – The entity query feature to perform. Supported features are disambiguate and similar_entities.
entity (QueryEntitiesEntity) – A text string that appears within the entity text field.
context (QueryEntitiesContext) – Entity text to provide context for the queried entity and rank based on that association. For example, if you wanted to query the city of London in England, your query would look for London with the context of England.
count (int) – The number of results to return. The default is 10. The maximum is 1000.
evidence_count (int) – The number of evidence items to return for each result. The default is 0. The maximum number of evidence items per query is 10,000.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

query_notices(environment_id, collection_id, filter=None, query=None, natural_language_query=None, passages=None, aggregation=None, count=None, return_fields=None, offset=None, sort=None, highlight=None, passages_fields=None, passages_count=None, passages_characters=None, deduplicate_field=None, similar=None, similar_document_ids=None, similar_fields=None, **kwargs)[source]¶

Query system notices.

Queries for notices (errors or warnings) that might have been generated by the system. Notices are generated when ingesting documents and performing relevance training. See the [Discovery service documentation](https://cloud.ibm.com/docs/services/discovery?topic=discovery-query-concepts#query-concepts) for more details on the query language.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
filter (str) – A cacheable query that excludes documents that don’t mention the query content. Filter searches are better for metadata-type searches and for assessing the concepts in the data set.
query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.
natural_language_query (str) – A natural language query that returns relevant documents by utilizing training data and natural language understanding. You cannot use natural_language_query and query at the same time.
passages (bool) – A passages query that returns the most relevant passages from the results.
aggregation (str) – An aggregation search that returns an exact answer by combining query search with filters. Useful for applications to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.
count (int) – Number of results to return. The maximum for the count and offset values together in any one query is 10000.
return_fields (list[str]) – A comma-separated list of the portion of the document hierarchy to return.
offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10000.
sort (list[str]) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.
highlight (bool) – When true, a highlight field is returned for each result; it contains the fields that match the query, with tags around the matching query terms.
passages_fields (list[str]) – A comma-separated list of fields that passages are drawn from. If this parameter is not specified, then all top-level fields are included.
passages_count (int) – The maximum number of passages to return. The search returns fewer passages if the requested total is not found.
passages_characters (int) – The approximate number of characters that any one passage will have.
deduplicate_field (str) – When specified, duplicate results based on the field specified are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.
similar (bool) – When true, results are returned based on their similarity to the document IDs specified in the similar_document_ids parameter.
similar_document_ids (list[str]) – A comma-separated list of document IDs to find similar documents. Tip: Include the natural_language_query parameter to expand the scope of the document similarity search with the natural language query. Other query parameters, such as filter and query, are subsequently applied and reduce the scope.
similar_fields (list[str]) – A comma-separated list of field names that are used as a basis for comparison to identify similar documents. If not specified, the entire document is used for comparison.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
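The sort convention above (a leading - for descending, + or no prefix for ascending) is easy to get wrong when sort keys are assembled programmatically. A minimal sketch of a hypothetical helper, sort_fields (not part of the SDK), that builds the list[str] form from (field, descending) pairs; the field names in the usage note are placeholders, not a documented notice schema.

```python
from typing import List, Tuple


def sort_fields(*specs: Tuple[str, bool]) -> List[str]:
    """Turn (field, descending) pairs into sort entries.

    A leading '-' requests descending order. Ascending is the default,
    so ascending fields are emitted without a prefix.
    """
    entries = []
    for field, descending in specs:
        # Reject names that already carry a direction prefix.
        if not field or field[0] in "+-":
            raise ValueError(f"expected a bare field name, got {field!r}")
        entries.append(f"-{field}" if descending else field)
    return entries
```

For instance, sort_fields(("created", True), ("title", False)) gives ["-created", "title"]. Methods whose sort parameter is typed str, such as query, would take the comma-joined form instead: ",".join(sort_fields(...)).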

query_relations(environment_id, collection_id, entities=None, context=None, sort=None, filter=None, count=None, evidence_count=None, **kwargs)[source]¶

Knowledge Graph relationship query.

See the [Knowledge Graph documentation](https://cloud.ibm.com/docs/services/discovery?topic=discovery-kg#kg) for more details.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
entities (list[QueryRelationsEntity]) – An array of entities to find relationships for.
context (QueryEntitiesContext) – Entity text to provide context for the queried entity and rank based on that association. For example, if you wanted to query the city of London in England, your query would look for London with the context of England.
sort (str) – The sorting method for the relationships, either score or frequency. frequency is the number of unique times each entity is identified. The default is score. This parameter cannot be used in the same query as the bias parameter.
filter (QueryRelationsFilter)
count (int) – The number of results to return. The default is 10. The maximum is 1000.
evidence_count (int) – The number of evidence items to return for each result. The default is 0. The maximum number of evidence items per query is 10,000.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

add_training_data(environment_id, collection_id, natural_language_query=None, filter=None, examples=None, **kwargs)[source]¶

Add query to training data.

Adds a query to the training data for this collection. The query can contain a filter and natural language query.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
natural_language_query (str) – The natural text query for the new training query.
filter (str) – The filter used on the collection before the natural_language_query is applied.
examples (list[TrainingExample]) – Array of training examples.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
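A training query pairs one natural_language_query (and an optional filter) with examples, each carrying the document_id, cross_reference, and relevance fields described under create_training_example. The sketch below is a hypothetical helper, training_query (not part of the SDK), that turns a mapping of document IDs to relevance judgments into that shape as plain dicts. Passing dicts rather than TrainingExample objects is an assumption; the relevance values used in the usage note are illustrative only, since the accepted scale is not stated here.

```python
from typing import Dict, List, Optional


def training_query(natural_language_query: str,
                   judgments: Dict[str, int],
                   filter: Optional[str] = None) -> Dict[str, object]:
    """Build a training query with one example per judged document.

    `filter` mirrors the SDK parameter name (it shadows the builtin here).
    """
    examples: List[Dict[str, object]] = []
    for document_id, relevance in sorted(judgments.items()):
        # relevance is documented as an int; reject bools explicitly.
        if isinstance(relevance, bool) or not isinstance(relevance, int):
            raise TypeError(f"relevance for {document_id!r} must be an int")
        examples.append({"document_id": document_id, "relevance": relevance})
    payload: Dict[str, object] = {
        "natural_language_query": natural_language_query,
        "examples": examples,
    }
    if filter is not None:
        payload["filter"] = filter
    return payload
```

The keys of the returned dict line up with the natural_language_query, filter, and examples arguments of add_training_data, so a caller could unpack it after the environment and collection IDs, for example training_query("refund policy", {"doc1": 10, "doc2": 0}).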

create_training_example(environment_id, collection_id, query_id, document_id=None, cross_reference=None, relevance=None, **kwargs)[source]¶

Add example to training data query.

Adds an example to this training data query.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
query_id (str) – The ID of the query used for training.
document_id (str) – The document ID associated with this training example.
cross_reference (str) – The cross reference associated with this training example.
relevance (int) – The relevance of the training example.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_all_training_data(environment_id, collection_id, **kwargs)[source]¶

Delete all training data.

Deletes all training data from a collection.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_training_data(environment_id, collection_id, query_id, **kwargs)[source]¶

Delete a training data query.

Removes the training data query and all associated examples from the training data set.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
query_id (str) – The ID of the query used for training.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_training_example(environment_id, collection_id, query_id, example_id, **kwargs)[source]¶

Delete example for training data query.

Deletes the example document with the given ID from the training data query.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
query_id (str) – The ID of the query used for training.
example_id (str) – The ID of the document as it is indexed.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_training_data(environment_id, collection_id, query_id, **kwargs)[source]¶

Get details about a query.

Gets details for a specific training data query, including the query string and all examples.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
query_id (str) – The ID of the query used for training.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_training_example(environment_id, collection_id, query_id, example_id, **kwargs)[source]¶

Get details for training data example.

Gets the details for this training example.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
query_id (str) – The ID of the query used for training.
example_id (str) – The ID of the document as it is indexed.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_training_data(environment_id, collection_id, **kwargs)[source]¶

List training data.

Lists the training data for the specified collection.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_training_examples(environment_id, collection_id, query_id, **kwargs)[source]¶

List examples for a training data query.

List all examples for this training data query.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
query_id (str) – The ID of the query used for training.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

update_training_example(environment_id, collection_id, query_id, example_id, cross_reference=None, relevance=None, **kwargs)[source]¶

Change label or cross reference for example.

Changes the label or cross reference query for this training data example.

Parameters

environment_id (str) – The ID of the environment.
collection_id (str) – The ID of the collection.
query_id (str) – The ID of the query used for training.
example_id (str) – The ID of the document as it is indexed.
cross_reference (str) – The example to add.
relevance (int) – The relevance value for this example.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_user_data(customer_id, **kwargs)[source]¶

Delete labeled data.

Deletes all data associated with a specified customer ID. The method has no effect if no data is associated with the customer ID. You associate a customer ID with data by passing the X-Watson-Metadata header with a request that passes data. For more information about personal data and customer IDs, see [Information security](https://cloud.ibm.com/docs/services/discovery?topic=discovery-information-security#information-security).

Parameters

customer_id (str) – The customer ID for which all data is to be deleted.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
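Deletion by customer ID only works if the data was tagged with the X-Watson-Metadata header when it was sent. The sketch below is a hypothetical helper, customer_metadata_headers (not part of the SDK), assuming the customer_id=<ID> header value format described on the linked Information security page. The resulting dict is meant for the headers argument that the request methods in this module accept.

```python
from typing import Dict


def customer_metadata_headers(customer_id: str) -> Dict[str, str]:
    """Return request headers that associate the sent data with a customer ID.

    Assumes the 'customer_id=<ID>' value format for X-Watson-Metadata.
    """
    if not customer_id or not customer_id.strip():
        raise ValueError("customer_id must be a non-empty string")
    return {"X-Watson-Metadata": f"customer_id={customer_id}"}
```

Data sent with headers=customer_metadata_headers("cust-42") could later be removed in one call with delete_user_data("cust-42").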

create_event(type, data, **kwargs)[source]¶

Create event.

The Events API can be used to create log entries that are associated with specific queries. For example, you can record which documents in the results set were “clicked” by a user and when that click occurred.

Parameters

type (str) – The event type to be created.
data (EventData) – Query event data object.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_metrics_event_rate(start_time=None, end_time=None, result_type=None, **kwargs)[source]¶

Percentage of queries with an associated event.

The percentage of queries using the natural_language_query parameter that have a corresponding “click” event over a specified time window. This metric requires having integrated event tracking in your application using the Events API.

Parameters

start_time (datetime) – Metric is computed from data recorded after this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.
end_time (datetime) – Metric is computed from data recorded before this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.
result_type (str) – The type of result to consider when calculating the metric.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
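The metric windows are expressed as UTC timestamps in YYYY-MM-DDThh:mm:ssZ form. The start_time and end_time parameters are typed datetime, so the SDK presumably handles serialization itself; the sketch below, a hypothetical helper named metrics_timestamp, only shows how a timezone-aware datetime maps onto the documented string form, for example when logging the window or calling the REST endpoint directly.

```python
from datetime import datetime, timezone


def metrics_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ssZ in UTC."""
    if moment.tzinfo is None:
        # A naive datetime has no offset, so its UTC instant is ambiguous.
        raise ValueError("use a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, metrics_timestamp(datetime(2019, 5, 1, 12, 30, tzinfo=timezone.utc)) returns "2019-05-01T12:30:00Z".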

get_metrics_query(start_time=None, end_time=None, result_type=None, **kwargs)[source]¶

Number of queries over time.

Total number of queries using the natural_language_query parameter over a specific time window.

Parameters

start_time (datetime) – Metric is computed from data recorded after this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.
end_time (datetime) – Metric is computed from data recorded before this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.
result_type (str) – The type of result to consider when calculating the metric.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_metrics_query_event(start_time=None, end_time=None, result_type=None, **kwargs)[source]¶

Number of queries with an event over time.

Total number of queries using the natural_language_query parameter that have a corresponding “click” event over a specified time window. This metric requires having integrated event tracking in your application using the Events API.

Parameters

start_time (datetime) – Metric is computed from data recorded after this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.
end_time (datetime) – Metric is computed from data recorded before this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.
result_type (str) – The type of result to consider when calculating the metric.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_metrics_query_no_results(start_time=None, end_time=None, result_type=None, **kwargs)[source]¶

Number of queries with no search results over time.

Total number of queries using the natural_language_query parameter that have no results returned over a specified time window.

Parameters

start_time (datetime) – Metric is computed from data recorded after this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.
end_time (datetime) – Metric is computed from data recorded before this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.
result_type (str) – The type of result to consider when calculating the metric.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_metrics_query_token_event(count=None, **kwargs)[source]¶

Most frequent query tokens with an event.

The most frequent query tokens parsed from the natural_language_query parameter and their corresponding “click” event rate within the recording period (queries and events are stored for 30 days). A query token is an individual word or unigram within the query string.

Parameters

count (int) – Number of results to return. The maximum for the count and offset values together in any one query is 10000.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
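To make the idea of a per-token event rate concrete, the sketch below computes it locally from (natural_language_query, had_click_event) records. It is an illustration of the concept only: the tokenizer (lowercased whitespace-separated unigrams, each counted once per query) is an assumption, and the service's exact tokenization and formula are not documented here. The helper name token_event_rates is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def token_event_rates(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Per query token, the fraction of queries containing it that had a click.

    Tokens are lowercased whitespace-separated unigrams, counted at most
    once per query.
    """
    queries: Dict[str, int] = defaultdict(int)
    clicks: Dict[str, int] = defaultdict(int)
    for text, clicked in records:
        for token in set(text.lower().split()):
            queries[token] += 1
            if clicked:
                clicks[token] += 1
    return {token: clicks[token] / queries[token] for token in queries}
```

With the records ("Red shoes", True), ("red hat", False), and ("blue shoes", True), the token red appears in two queries with one click, giving a rate of 0.5, while shoes gets 1.0.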

query_log(filter=None, query=None, count=None, offset=None, sort=None, **kwargs)[source]¶

Search the query and event log.

Searches the query and event log to find query sessions that match the specified criteria. Searching the logs endpoint uses the standard Discovery query syntax for the parameters that are supported.

Parameters

filter (str) – A cacheable query that excludes documents that don’t mention the query content. Filter searches are better for metadata-type searches and for assessing the concepts in the data set.
query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.
count (int) – Number of results to return. The maximum for the count and offset values together in any one query is 10000.
offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10000.
sort (list[str]) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

create_credentials(environment_id, source_type=None, credential_details=None, **kwargs)[source]¶

Create credentials.

Creates a set of credentials to connect to a remote source. Created credentials are used in a configuration to associate a collection with the remote source. Note: All credentials are sent over an encrypted connection and encrypted at rest.

Parameters

environment_id (str) – The ID of the environment.
source_type (str) – The source that this credentials object connects to.
- box indicates the credentials are used to connect to an instance of Enterprise Box.
- salesforce indicates the credentials are used to connect to Salesforce.
- sharepoint indicates the credentials are used to connect to Microsoft SharePoint Online.
- web_crawl indicates the credentials are used to perform a web crawl.
- cloud_object_storage indicates the credentials are used to connect to an IBM Cloud Object Store.
credential_details (CredentialDetails) – Object containing details of the stored credentials. Obtain credentials for your source from the administrator of the source.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
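A typo in source_type only surfaces as a service error. The sketch below is a hypothetical client-side check, check_source_type (not part of the SDK), against the five values listed above; it assumes that list is complete for the service version being documented, and later versions may accept more.

```python
# Source types listed for create_credentials and update_credentials.
SOURCE_TYPES = frozenset({
    "box",
    "salesforce",
    "sharepoint",
    "web_crawl",
    "cloud_object_storage",
})


def check_source_type(source_type: str) -> str:
    """Return source_type unchanged if it is one of the documented values."""
    if source_type not in SOURCE_TYPES:
        allowed = ", ".join(sorted(SOURCE_TYPES))
        raise ValueError(f"unknown source_type {source_type!r}; expected one of: {allowed}")
    return source_type
```

A caller could wrap the argument, as in source_type=check_source_type("web_crawl"), so that an unsupported value fails before any request is made.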

delete_credentials(environment_id, credential_id, **kwargs)[source]¶

Delete credentials.

Deletes a set of stored credentials from your Discovery instance.

Parameters

environment_id (str) – The ID of the environment.
credential_id (str) – The unique identifier for a set of source credentials.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_credentials(environment_id, credential_id, **kwargs)[source]¶

View Credentials.

Returns details about the specified credentials. Note: Secure credential information such as a password or SSH key is never returned and must be obtained from the source system.

Parameters

environment_id (str) – The ID of the environment.
credential_id (str) – The unique identifier for a set of source credentials.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_credentials(environment_id, **kwargs)[source]¶

List credentials.

List all the source credentials that have been created for this service instance. Note: All credentials are sent over an encrypted connection and encrypted at rest.

Parameters

environment_id (str) – The ID of the environment.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

update_credentials(environment_id, credential_id, source_type=None, credential_details=None, **kwargs)[source]¶

Update credentials.

Updates an existing set of source credentials. Note: All credentials are sent over an encrypted connection and encrypted at rest.

Parameters

environment_id (str) – The ID of the environment.
credential_id (str) – The unique identifier for a set of source credentials.
source_type (str) – The source that this credentials object connects to.
- box indicates the credentials are used to connect to an instance of Enterprise Box.
- salesforce indicates the credentials are used to connect to Salesforce.
- sharepoint indicates the credentials are used to connect to Microsoft SharePoint Online.
- web_crawl indicates the credentials are used to perform a web crawl.
- cloud_object_storage indicates the credentials are used to connect to an IBM Cloud Object Store.
credential_details (CredentialDetails) – Object containing details of the stored credentials. Obtain credentials for your source from the administrator of the source.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

create_gateway(environment_id, name=None, **kwargs)[source]¶

Create Gateway.

Create a gateway configuration to use with a remotely installed gateway.

Parameters

environment_id (str) – The ID of the environment.
name (str) – User-defined name.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_gateway(environment_id, gateway_id, **kwargs)[source]¶

Delete Gateway.

Delete the specified gateway configuration.

Parameters

environment_id (str) – The ID of the environment.
gateway_id (str) – The requested gateway ID.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_gateway(environment_id, gateway_id, **kwargs)[source]¶

List Gateway Details.

List information about the specified gateway.

Parameters

environment_id (str) – The ID of the environment.
gateway_id (str) – The requested gateway ID.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_gateways(environment_id, **kwargs)[source]¶

List Gateways.

List the currently configured gateways.

Parameters

environment_id (str) – The ID of the environment.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

class AggregationResult(key=None, matching_results=None, aggregations=None)[source]¶