ibm_watson.discovery_v2 module

IBM Watson™ Discovery for IBM Cloud Pak for Data is a cognitive search and content analytics engine that you can add to applications to identify patterns, trends and actionable insights to drive better decision-making. Securely unify structured and unstructured data with pre-enriched content, and use a simplified query language to eliminate the need for manual filtering of results.

class DiscoveryV2(version, authenticator=None)

Bases: ibm_cloud_sdk_core.base_service.BaseService

The Discovery V2 service.

default_service_url = None

list_collections(project_id, **kwargs)

List collections.

Lists existing collections for the specified project.

Parameters

project_id (str) – The ID of the project. This information can be found on the deploy page of the Discovery administrative tooling.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
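Collections are usually referred to by name in application code, while the other methods in this module expect collection IDs. The sketch below builds a name-to-ID lookup from the list_collections result. It assumes `discovery` is an already authenticated DiscoveryV2 instance and that the response body has the shape `{"collections": [{"collection_id": ..., "name": ...}]}`; the helper name `collection_ids_by_name` is hypothetical.

```python
from typing import Any, Dict


def collection_ids_by_name(discovery: Any, project_id: str) -> Dict[str, str]:
    """Map each collection name in a project to its collection ID.

    Hypothetical helper; `discovery` is assumed to be an authenticated
    DiscoveryV2 instance.
    """
    response = discovery.list_collections(project_id=project_id)
    body = response.get_result()  # parsed JSON body of the DetailedResponse
    # Assumed body shape: {"collections": [{"collection_id": ..., "name": ...}]}
    return {c["name"]: c["collection_id"] for c in body.get("collections", [])}
```

For example, `collection_ids_by_name(discovery, "my-project-id")["Docs"]` would return the ID of a collection named Docs, raising KeyError if no such collection exists.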

query(project_id, *, collection_ids=None, filter=None, query=None, natural_language_query=None, aggregation=None, count=None, return_=None, offset=None, sort=None, highlight=None, spelling_suggestions=None, table_results=None, suggested_refinements=None, passages=None, **kwargs)

Query a project.

By using this method, you can construct queries. For details, see the [Discovery documentation](https://cloud.ibm.com/docs/services/discovery-data?topic=discovery-data-query-concepts).

Parameters

project_id (str) – The ID of the project. This information can be found on the deploy page of the Discovery administrative tooling.
collection_ids (list[str]) – (optional) A comma-separated list of collection IDs to be queried against.
filter (str) – (optional) A cacheable query that excludes documents that don’t mention the query content. Filter searches are better for metadata-type searches and for assessing the concepts in the data set.
query (str) – (optional) A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results.
natural_language_query (str) – (optional) A natural language query that returns relevant documents by utilizing training data and natural language understanding.
aggregation (str) – (optional) An aggregation search that returns an exact answer by combining query search with filters. Useful for applications to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.
count (int) – (optional) Number of results to return.
return_ (list[str]) – (optional) A list of the fields in the document hierarchy to return. If this parameter is not specified, then all top-level fields are returned. The trailing underscore in the name avoids a clash with the Python keyword return.
offset (int) – (optional) The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results.
sort (str) – (optional) A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified. This parameter cannot be used in the same query as the bias parameter.
highlight (bool) – (optional) When true, a highlight field is returned for each result. The field contains the fields that match the query, with <em></em> tags around the matching query terms.
spelling_suggestions (bool) – (optional) When true and the natural_language_query parameter is used, the natural_language_query parameter is spell checked. The most likely correction is returned in the suggested_query field of the response (if one exists).
table_results (QueryLargeTableResults) – (optional) Configuration for table retrieval.
suggested_refinements (QueryLargeSuggestedRefinements) – (optional) Configuration for suggested refinements.
passages (QueryLargePassages) – (optional) Configuration for passage retrieval.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
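The count and offset parameters can be combined to page through a long result list. The sketch below is a minimal generator under these assumptions: `discovery` is an authenticated DiscoveryV2 instance, and the response body contains a "results" list. The function `iter_query_results` and its `page_size` and `max_results` arguments are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_query_results(
    discovery: Any,
    project_id: str,
    natural_language_query: str,
    collection_ids: Optional[List[str]] = None,
    page_size: int = 10,
    max_results: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield query results page by page using count and offset."""
    offset = 0
    while offset < max_results:
        response = discovery.query(
            project_id=project_id,
            collection_ids=collection_ids,
            natural_language_query=natural_language_query,
            # Never request more than the remaining budget of results.
            count=min(page_size, max_results - offset),
            offset=offset,
        )
        # Assumed body shape: {"matching_results": N, "results": [{...}, ...]}
        results = response.get_result().get("results", [])
        if not results:
            break  # no more matching documents
        yield from results
        offset += len(results)
```

For example, `for doc in iter_query_results(discovery, "my-project-id", "How do I upgrade?"): print(doc["document_id"])`. Requesting fixed-size pages keeps each response small, and the loop stops as soon as a page comes back empty.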

get_autocompletion(project_id, prefix, *, collection_ids=None, field=None, count=None, **kwargs)

Get Autocomplete Suggestions.

Returns completion query suggestions for the specified prefix.

Parameters

project_id (str) – The ID of the project. This information can be found on the deploy page of the Discovery administrative tooling.
prefix (str) – The prefix to use for autocompletion. For example, the prefix Ho could autocomplete to Hot, Housing, or How do I upgrade.
collection_ids (list[str]) – (optional) Comma-separated list of the collection IDs. If this parameter is not specified, all collections in the project are used.
field (str) – (optional) The field in the result documents that autocompletion suggestions are identified from.
count (int) – (optional) The number of autocompletion suggestions to return.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
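A search box typically calls this method on each keystroke. The sketch below wraps the call and skips the request when the prefix is empty. It assumes an authenticated DiscoveryV2 instance named `discovery` and a response body shaped like `{"completions": [...]}`; the helper `autocomplete` is hypothetical.

```python
from typing import Any, List, Optional


def autocomplete(
    discovery: Any,
    project_id: str,
    prefix: str,
    count: int = 5,
    collection_ids: Optional[List[str]] = None,
) -> List[str]:
    """Return completion suggestions for a typed prefix (hypothetical helper)."""
    if not prefix:
        return []  # nothing to complete, so avoid a network call
    response = discovery.get_autocompletion(
        project_id=project_id,
        prefix=prefix,
        collection_ids=collection_ids,
        count=count,
    )
    # Assumed body shape: {"completions": ["Hot", "Housing", ...]}
    return response.get_result().get("completions", [])
```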

query_notices(project_id, *, filter=None, query=None, natural_language_query=None, count=None, offset=None, **kwargs)

Query system notices.

Queries for notices (errors or warnings) that might have been generated by the system. Notices are generated when ingesting documents and performing relevance training.

Parameters

project_id (str) – The ID of the project. This information can be found on the deploy page of the Discovery administrative tooling.
filter (str) – (optional) A cacheable query that excludes documents that don’t mention the query content. Filter searches are better for metadata-type searches and for assessing the concepts in the data set.
query (str) – (optional) A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first.
natural_language_query (str) – (optional) A natural language query that returns relevant documents by utilizing training data and natural language understanding.
count (int) – (optional) Number of results to return. The maximum for the count and offset values together in any one query is 10000.
offset (int) – (optional) The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10000.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
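Because count and offset together may not exceed 10000 in any one query, it can help to validate the paging window before sending the request. The sketch below does that under these assumptions: `discovery` is an authenticated DiscoveryV2 instance and the response body contains a "notices" list. The helper `fetch_notices` is hypothetical; its `filter` argument keeps the SDK's parameter name even though it shadows the Python builtin.

```python
from typing import Any, Dict, List, Optional

# Documented limit: count + offset may not exceed 10000 in any one query.
MAX_COUNT_PLUS_OFFSET = 10000


def fetch_notices(
    discovery: Any,
    project_id: str,
    count: int = 50,
    offset: int = 0,
    filter: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Fetch one page of system notices, validating the paging window first."""
    if count < 0 or offset < 0:
        raise ValueError("count and offset must be non-negative")
    if count + offset > MAX_COUNT_PLUS_OFFSET:
        raise ValueError(
            f"count + offset = {count + offset} exceeds {MAX_COUNT_PLUS_OFFSET}"
        )
    response = discovery.query_notices(
        project_id=project_id, filter=filter, count=count, offset=offset
    )
    # Assumed body shape: {"matching_results": N, "notices": [{...}, ...]}
    return response.get_result().get("notices", [])
```

A filter such as `"severity::error"` could restrict the result to errors, but that field name and syntax are assumptions to check against the Discovery query reference.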

list_fields(project_id, *, collection_ids=None, **kwargs)

List fields.

Gets a list of the unique fields (and their types) stored in the specified collections.

Parameters

project_id (str) – The ID of the project. This information can be found on the deploy page of the Discovery administrative tooling.
collection_ids (list[str]) – (optional) Comma-separated list of the collection IDs. If this parameter is not specified, all collections in the project are used.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
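The field list is useful for deciding which fields can be sorted, filtered, or aggregated on. The sketch below groups field names by type. It assumes an authenticated DiscoveryV2 instance named `discovery` and a response body shaped like `{"fields": [{"field": ..., "type": ..., "collection_id": ...}]}`; the helper `fields_by_type` is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Set


def fields_by_type(
    discovery: Any,
    project_id: str,
    collection_ids: Optional[List[str]] = None,
) -> Dict[str, List[str]]:
    """Group the unique field names in the given collections by field type."""
    response = discovery.list_fields(
        project_id=project_id, collection_ids=collection_ids
    )
    grouped: Dict[str, Set[str]] = defaultdict(set)
    # Assumed body shape: {"fields": [{"field": ..., "type": ..., "collection_id": ...}]}
    for entry in response.get_result().get("fields", []):
        grouped[entry["type"]].add(entry["field"])
    # Sets drop duplicates of a field that appears in more than one collection.
    return {field_type: sorted(names) for field_type, names in grouped.items()}
```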

get_component_settings(project_id, **kwargs)

Configuration settings for components.

Returns default configuration settings for components.

Parameters

project_id (str) – The ID of the project. This information can be found on the deploy page of the Discovery administrative tooling.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

add_document(project_id, collection_id, *, file=None, filename=None, file_content_type=None, metadata=None, x_watson_discovery_force=None, **kwargs)

Add a document.

Add a document to a collection with optional metadata.

Returns immediately after the system has accepted the document for processing.

The user must provide document content, metadata, or both. If the request is missing both document content and metadata, it is rejected.

The user can set the Content-Type parameter on the file part to indicate the media type of the document. If the Content-Type parameter is missing or is one of the generic media types (for example, application/octet-stream), then the service attempts to automatically detect the document’s media type.

The following field names are reserved and are filtered out if present after normalization: id, score, highlight, and any field with the prefix _, +, or -.

Fields with empty name values after normalization are filtered out before indexing.

Fields containing the number sign (#) or the comma (,) after normalization are filtered out before indexing.

If the document is uploaded to a collection that shares its data with another collection, the X-Watson-Discovery-Force header must be set to true (x_watson_discovery_force=True).

Note: Documents can be added with a specific document_id by using the /v2/projects/{project_id}/collections/{collection_id}/documents method.

Note: This operation only works on collections created to accept direct file uploads. It cannot be used to modify a collection that connects to an external source such as Microsoft SharePoint.

Parameters

project_id (str) – The ID of the project. This information can be found on the deploy page of the Discovery administrative tooling.
collection_id (str) – The ID of the collection.
file (file) – (optional) The content of the document to ingest. The maximum supported file size when adding a file to a collection is 50 megabytes; the maximum supported file size when testing a configuration is 1 megabyte. Files larger than the supported size are rejected.
filename (str) – (optional) The filename for file.
file_content_type (str) – (optional) The content type of file.
metadata (str) – (optional) The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected. Example: {"Creator": "Johnny Appleseed", "Subject": "Apples"}.
x_watson_discovery_force (bool) – (optional) When true, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
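Since metadata is passed as a string, a Python dict has to be serialized to JSON first, and the documented size limits can be checked before uploading. The sketch below assumes an authenticated DiscoveryV2 instance named `discovery` and a response body such as `{"document_id": ..., "status": "processing"}`. The helper `upload_file` is hypothetical, and reading "MB" as 1024 × 1024 bytes is an assumption.

```python
import json
import os
from typing import Any, Dict, Optional

# Documented limits; treating "MB" as 1024 * 1024 bytes is an assumption.
MAX_FILE_BYTES = 50 * 1024 * 1024
MAX_METADATA_BYTES = 1 * 1024 * 1024


def upload_file(
    discovery: Any,
    project_id: str,
    collection_id: str,
    path: str,
    metadata: Optional[Dict[str, Any]] = None,
    force: bool = False,
) -> Dict[str, Any]:
    """Upload a local file, with optional metadata, to a collection (hypothetical helper)."""
    if os.path.getsize(path) > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 50 MB upload limit")
    # The metadata parameter is a string, so serialize the dict as JSON.
    metadata_json = json.dumps(metadata) if metadata is not None else None
    if metadata_json is not None and len(metadata_json.encode("utf-8")) > MAX_METADATA_BYTES:
        raise ValueError("metadata exceeds the 1 MB limit")
    with open(path, "rb") as fh:
        response = discovery.add_document(
            project_id=project_id,
            collection_id=collection_id,
            file=fh,
            filename=os.path.basename(path),
            # file_content_type is omitted so the service detects the media type.
            metadata=metadata_json,
            # Send the force header only when requested; None omits it.
            x_watson_discovery_force=True if force else None,
        )
    return response.get_result()
```

For example, `upload_file(discovery, "my-project-id", "my-collection-id", "apples.pdf", metadata={"Creator": "Johnny Appleseed", "Subject": "Apples"})`.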

update_document(project_id, collection_id, document_id, *, file=None, filename=None, file_content_type=None, metadata=None, x_watson_discovery_force=None, **kwargs)

Update a document.

Replace an existing document or add a document with a specified document_id. Starts ingesting a document with optional metadata. If the document is uploaded to a collection that shares its data with another collection, the X-Watson-Discovery-Force header must be set to true. Note: When you upload a document with this method, it automatically replaces any existing document stored with the same document_id. Note: This operation only works on collections created to accept direct file uploads. It cannot be used to modify a collection that connects to an external source such as Microsoft SharePoint.

Parameters

project_id (str) – The ID of the project. This information can be found on the deploy page of the Discovery administrative tooling.
collection_id (str) – The ID of the collection.
document_id (str) – The ID of the document.
file (file) – (optional) The content of the document to ingest. The maximum supported file size when adding a file to a collection is 50 megabytes; the maximum supported file size when testing a configuration is 1 megabyte. Files larger than the supported size are rejected.
filename (str) – (optional) The filename for file.
file_content_type (str) – (optional) The content type of file.
metadata (str) – (optional) The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected. Example: {"Creator": "Johnny Appleseed", "Subject": "Apples"}.
x_watson_discovery_force (bool) – (optional) When true, the uploaded document is added to the collection even if the data for that collection is shared with other collections.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
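Because this method replaces any document already stored under the same document_id, it can serve as an "upsert" keyed on an ID you control. The sketch below sends in-memory text instead of a file on disk. It assumes an authenticated DiscoveryV2 instance named `discovery`; the helper `upsert_text_document` and its default filename are hypothetical.

```python
import io
from typing import Any, Dict


def upsert_text_document(
    discovery: Any,
    project_id: str,
    collection_id: str,
    document_id: str,
    text: str,
    filename: str = "document.txt",
    force: bool = False,
) -> Dict[str, Any]:
    """Create or replace the document stored under document_id (hypothetical helper)."""
    response = discovery.update_document(
        project_id=project_id,
        collection_id=collection_id,
        document_id=document_id,
        file=io.BytesIO(text.encode("utf-8")),  # in-memory file object
        filename=filename,
        file_content_type="text/plain",
        # Send the force header only when requested; None omits it.
        x_watson_discovery_force=True if force else None,
    )
    return response.get_result()
```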

delete_document(project_id, collection_id, document_id, *, x_watson_discovery_force=None, **kwargs)

Delete a document.

If the given document ID is invalid, or if the document is not found, then a success response is returned (HTTP status code 200) with the status set to ‘deleted’. Note: This operation only works on collections created to accept direct file uploads. It cannot be used to modify a collection that connects to an external source such as Microsoft SharePoint.

Parameters

project_id (str) – The ID of the project. This information can be found on the deploy page of the Discovery administrative tooling.
collection_id (str) – The ID of the collection.
document_id (str) – The ID of the document.
x_watson_discovery_force (bool) – (optional) When true, the document is deleted from the collection even if the data for that collection is shared with other collections.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
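Since an invalid or unknown document ID still yields HTTP 200 with status ‘deleted’, the reply alone does not prove that a document existed. The sketch below deletes a batch of IDs and records each reply as-is. It assumes an authenticated DiscoveryV2 instance named `discovery`; the helper `delete_documents` is hypothetical.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def delete_documents(
    discovery: Any,
    project_id: str,
    collection_id: str,
    document_ids: Iterable[str],
    force: bool = False,
) -> Dict[str, Tuple[int, Optional[str]]]:
    """Delete each document ID and record (HTTP status, reported status)."""
    outcomes: Dict[str, Tuple[int, Optional[str]]] = {}
    for document_id in document_ids:
        response = discovery.delete_document(
            project_id=project_id,
            collection_id=collection_id,
            document_id=document_id,
            x_watson_discovery_force=True if force else None,
        )
        # The service answers 200 with status "deleted" even for unknown IDs,
        # so this records the reply, not proof that the document existed.
        outcomes[document_id] = (
            response.get_status_code(),
            response.get_result().get("status"),
        )
    return outcomes
```

To confirm that a document was present, check for it (for example with a query filtered on its ID) before deleting.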

list_training_queries(project_id, **kwargs)

List training queries.

List the training queries for the specified project.

Parameters

project_id (str) – The ID of the project. This information can be found on the deploy page of the Discovery administrative tooling.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
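A quick way to review existing training data is to map each training query ID to its natural language text. The sketch below assumes an authenticated DiscoveryV2 instance named `discovery` and a response body shaped like `{"queries": [{"query_id": ..., "natural_language_query": ..., "examples": [...]}]}`; the helper `training_query_texts` is hypothetical.

```python
from typing import Any, Dict


def training_query_texts(discovery: Any, project_id: str) -> Dict[str, str]:
    """Map each training query ID to its natural language query text."""
    response = discovery.list_training_queries(project_id=project_id)
    # Assumed body shape:
    # {"queries": [{"query_id": ..., "natural_language_query": ..., "examples": [...]}]}
    return {
        q["query_id"]: q["natural_language_query"]
        for q in response.get_result().get("queries", [])
    }
```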

delete_training_queries(project_id, **kwargs)

Delete training queries.

Removes all training queries for the specified project.

Parameters

project_id (str) – The ID of the project. This information can be found on the deploy page of the Discovery administrative tooling.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

create_training_query(project_id, natural_language_query, examples, *, filter=None, **kwargs)

Create training query.

Add a query to the training data for this project. The query can contain a filter and natural language query.

Parameters

project_id (str) – The ID of the project. This information can be found on the deploy page of the Discovery administrative tooling.
natural_language_query (str) – The natural text query for the training query.
examples (list[TrainingExample]) – Array of training examples.
filter (str) – (optional) The filter used on the collection before the natural_language_query is applied.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
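Training examples pair a document with a relevance judgment for the query. The sketch below turns (collection_id, document_id, relevance) tuples into the examples list. It assumes an authenticated DiscoveryV2 instance named `discovery`, and it passes plain dicts with the keys document_id, collection_id, and relevance on the assumption that the SDK accepts dicts in place of TrainingExample objects; if your SDK version requires model objects, build TrainingExample instances instead. The helper `create_training_query_from_judgments` is hypothetical.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def create_training_query_from_judgments(
    discovery: Any,
    project_id: str,
    natural_language_query: str,
    judgments: Iterable[Tuple[str, str, int]],
    filter: Optional[str] = None,
) -> Dict[str, Any]:
    """Create a training query from (collection_id, document_id, relevance) tuples."""
    # Assumption: dicts with these keys are accepted in place of TrainingExample.
    examples = [
        {"document_id": doc_id, "collection_id": coll_id, "relevance": int(rel)}
        for coll_id, doc_id, rel in judgments
    ]
    if not examples:
        raise ValueError("at least one training example is required")
    response = discovery.create_training_query(
        project_id=project_id,
        natural_language_query=natural_language_query,
        examples=examples,
        filter=filter,
    )
    return response.get_result()
```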

get_training_query(project_id, query_id, **kwargs)

Get a training data query.

Get details for a specific training data query, including the query string and all examples.

Parameters

project_id (str) – The ID of the project. This information can be found on the deploy page of the Discovery administrative tooling.
query_id (str) – The ID of the query used for training.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

update_training_query(project_id, query_id, natural_language_query, examples, *, filter=None, **kwargs)

Update a training query.

Updates an existing training query and its examples.

Parameters

project_id (str) – The ID of the project. This information can be found on the deploy page of the Discovery administrative tooling.
query_id (str) – The ID of the query used for training.
natural_language_query (str) – The natural text query for the training query.
examples (list[TrainingExample]) – Array of training examples.
filter (str) – (optional) The filter used on the collection before the natural_language_query is applied.
headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
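Because update_training_query takes the full examples list, adding one example is naturally a read-modify-write: fetch the query with get_training_query, append, and send everything back. The sketch below assumes an authenticated DiscoveryV2 instance named `discovery`, that the update replaces the stored examples with the list sent, and that document_id, collection_id, and relevance are the writable example fields (server-set fields such as timestamps are dropped). The helper `add_training_example` is hypothetical.

```python
from typing import Any, Dict

# Assumed writable fields of a training example; other keys are dropped.
EXAMPLE_KEYS = ("document_id", "collection_id", "relevance")


def add_training_example(
    discovery: Any,
    project_id: str,
    query_id: str,
    document_id: str,
    collection_id: str,
    relevance: int,
) -> Dict[str, Any]:
    """Append one example to an existing training query (hypothetical helper)."""
    current = discovery.get_training_query(
        project_id=project_id, query_id=query_id
    ).get_result()
    # Keep only the writable fields of the existing examples.
    examples = [
        {k: e[k] for k in EXAMPLE_KEYS if k in e}
        for e in current.get("examples", [])
    ]
    examples.append(
        {"document_id": document_id, "collection_id": collection_id, "relevance": relevance}
    )
    response = discovery.update_training_query(
        project_id=project_id,
        query_id=query_id,
        natural_language_query=current["natural_language_query"],
        examples=examples,
        filter=current.get("filter"),
    )
    return response.get_result()
```

This pattern is not atomic: if two clients update the same training query concurrently, the later write wins.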

class AddDocumentEnums