ibm_watson.discovery_v1 module

IBM Watson™ Discovery is a cognitive search and content analytics engine that you can add to applications to identify patterns, trends and actionable insights to drive better decision-making. Securely unify structured and unstructured data with pre-enriched content, and use a simplified query language to eliminate the need for manual filtering of results.

class DiscoveryV1(version, url='https://gateway.watsonplatform.net/discovery/api', username=None, password=None, iam_apikey=None, iam_access_token=None, iam_url=None, iam_client_id=None, iam_client_secret=None, icp4d_access_token=None, icp4d_url=None, authentication_type=None)

Bases: ibm_cloud_sdk_core.base_service.BaseService

The Discovery V1 service.

default_url = 'https://gateway.watsonplatform.net/discovery/api'

create_environment(name, description=None, size=None, **kwargs)

Create an environment.

Creates a new environment for private data. An environment must be created before collections can be created. Note: You can create only one environment for private data per service instance. An attempt to create another environment results in an error.

Parameters
  • name (str) – Name that identifies the environment.

  • description (str) – Description of the environment.

  • size (str) – Size of the environment. In the Lite plan the default and only accepted value is LT; in all other plans the default is S.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
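Only one private-data environment is allowed per service instance, and creating a second one is an error. A setup script that may run more than once can therefore look for an existing environment before calling create_environment. The sketch below makes these assumptions: `discovery` is an authenticated DiscoveryV1 instance; the list_environments result has an `environments` array whose entries carry `environment_id` and `read_only`; and the shared news environment is the read-only one. Both helper names are hypothetical.

```python
from typing import Optional


def find_private_environment(list_result: dict) -> Optional[dict]:
    """Return the first writable environment in a list_environments result."""
    # Assumed shape: {"environments": [{"environment_id": ..., "read_only": ...}, ...]}
    # The shared news environment is assumed to be flagged read_only.
    for env in list_result.get("environments", []):
        if not env.get("read_only", False):
            return env
    return None


def get_or_create_environment(discovery, name: str, description: Optional[str] = None) -> str:
    """Reuse the instance's private environment, or create it if none exists.

    `discovery` is assumed to be an authenticated DiscoveryV1 instance.
    """
    existing = find_private_environment(discovery.list_environments().get_result())
    if existing is not None:
        return existing["environment_id"]
    # size is left unset so the plan default applies (LT on Lite, S otherwise).
    created = discovery.create_environment(name=name, description=description)
    return created.get_result()["environment_id"]
```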

list_environments(name=None, **kwargs)

List environments.

List existing environments for the service instance.

Parameters
  • name (str) – Show only the environment with the given name.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_environment(environment_id, **kwargs)

Get environment info.

Parameters
  • environment_id (str) – The ID of the environment.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

update_environment(environment_id, name=None, description=None, size=None, **kwargs)

Update an environment.

Updates an environment. The environment’s name and description parameters can be changed. You must specify a name for the environment.

Parameters
  • environment_id (str) – The ID of the environment.

  • name (str) – Name that identifies the environment.

  • description (str) – Description of the environment.

  • size (str) – Size that the environment should be increased to. Environment size cannot be modified when using a Lite plan. Environment size can only be increased, not decreased.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
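Environment size can only grow and cannot change on a Lite plan, so a caller may want to check a target size before calling update_environment. The ordering in `SIZE_ORDER` below is an assumption, based on the size codes in the Discovery service documentation at the time of writing. Verify it against the current docs before relying on it. `resize_allowed` is a hypothetical helper.

```python
# Assumed smallest-to-largest ordering of Discovery environment size codes.
SIZE_ORDER = ["LT", "XS", "S", "MS", "M", "ML", "L", "XL", "XXL", "XXXL"]


def resize_allowed(current: str, target: str) -> bool:
    """Return True if an environment of size `current` may be resized to `target`."""
    if current == "LT":  # Lite-plan environments cannot be resized
        return False
    if current not in SIZE_ORDER or target not in SIZE_ORDER:
        return False
    # Only strictly larger sizes are accepted; shrinking is not supported.
    return SIZE_ORDER.index(target) > SIZE_ORDER.index(current)
```

Because update_environment also requires a name, a guarded call might look like `if resize_allowed("S", "M"): discovery.update_environment(environment_id, name="my-env", size="M")`.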

delete_environment(environment_id, **kwargs)

Delete environment.

Parameters
  • environment_id (str) – The ID of the environment.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_fields(environment_id, collection_ids, **kwargs)

List fields across collections.

Gets a list of the unique fields (and their types) stored in the indexes of the specified collections.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_ids (list[str]) – A comma-separated list of collection IDs to be queried against.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
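A common next step after list_fields is to find which fields have a given type, for example date fields that could be used with sort or bias. The sketch assumes the result has the shape `{"fields": [{"field": ..., "type": ...}]}`. `fields_by_type` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def fields_by_type(list_fields_result: dict) -> Dict[str, List[str]]:
    """Group field names from a list_fields result by their reported type."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    # Assumed shape: {"fields": [{"field": "title", "type": "string"}, ...]}
    for entry in list_fields_result.get("fields", []):
        grouped[entry["type"]].append(entry["field"])
    return dict(grouped)
```

Usage, assuming an authenticated client: `fields_by_type(discovery.list_fields(environment_id, ["coll-1", "coll-2"]).get_result())`.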

create_configuration(environment_id, name, description=None, conversions=None, enrichments=None, normalizations=None, source=None, **kwargs)

Add configuration.

Creates a new configuration. If the input configuration contains the configuration_id, created, or updated properties, then they are ignored and overridden by the system, and an error is not returned so that the overridden fields do not need to be removed when copying a configuration. The configuration can contain unrecognized JSON fields. Any such fields are ignored and do not generate an error. This makes it easier to use newer configuration files with older versions of the API and the service. It also makes it possible for the tooling to add additional metadata and information to the configuration.

Parameters
  • environment_id (str) – The ID of the environment.

  • name (str) – The name of the configuration.

  • description (str) – The description of the configuration, if available.

  • conversions (Conversions) – Document conversion settings.

  • enrichments (list[Enrichment]) – An array of document enrichment settings for the configuration.

  • normalizations (list[NormalizationOperation]) – Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.

  • source (Source) – Object containing source parameters for the configuration.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
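The service ignores configuration_id, created, and updated in the input. This means an existing configuration can be copied by feeding the get_configuration result back into create_configuration under a new name. This is a minimal sketch. It assumes the SDK accepts plain dicts for the model-typed arguments (`Conversions`, `Enrichment`, and so on) and that the create result contains `configuration_id`. The helper names are hypothetical.

```python
# Sections of a configuration that create_configuration accepts as arguments.
CONFIG_SECTIONS = ("conversions", "enrichments", "normalizations", "source")


def configuration_copy_kwargs(existing: dict, new_name: str) -> dict:
    """Build create_configuration keyword arguments from a get_configuration result."""
    kwargs = {"name": new_name, "description": existing.get("description")}
    # configuration_id, created and updated are not copied; the service
    # would ignore them anyway.
    for section in CONFIG_SECTIONS:
        if section in existing:
            kwargs[section] = existing[section]
    return kwargs


def clone_configuration(discovery, environment_id: str, configuration_id: str, new_name: str) -> str:
    """Copy an existing configuration under a new name and return the new ID."""
    existing = discovery.get_configuration(environment_id, configuration_id).get_result()
    kwargs = configuration_copy_kwargs(existing, new_name)
    # Assumes the SDK accepts plain dicts for the model-typed arguments.
    created = discovery.create_configuration(environment_id, **kwargs).get_result()
    return created["configuration_id"]
```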

list_configurations(environment_id, name=None, **kwargs)

List configurations.

Lists existing configurations for the service instance.

Parameters
  • environment_id (str) – The ID of the environment.

  • name (str) – Find configurations with the given name.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_configuration(environment_id, configuration_id, **kwargs)

Get configuration details.

Parameters
  • environment_id (str) – The ID of the environment.

  • configuration_id (str) – The ID of the configuration.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

update_configuration(environment_id, configuration_id, name, description=None, conversions=None, enrichments=None, normalizations=None, source=None, **kwargs)

Update a configuration.

Replaces an existing configuration.
  • Completely replaces the original configuration.

  • The configuration_id, updated, and created fields are accepted in the request, but they are ignored, and an error is not generated. It is also acceptable for users to submit an updated configuration with none of the three properties.

  • Documents are processed with a snapshot of the configuration as it was at the time the document was submitted to be ingested. This means that already submitted documents will not see any updates made to the configuration.

Parameters
  • environment_id (str) – The ID of the environment.

  • configuration_id (str) – The ID of the configuration.

  • name (str) – The name of the configuration.

  • description (str) – The description of the configuration, if available.

  • conversions (Conversions) – Document conversion settings.

  • enrichments (list[Enrichment]) – An array of document enrichment settings for the configuration.

  • normalizations (list[NormalizationOperation]) – Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.

  • source (Source) – Object containing source parameters for the configuration.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_configuration(environment_id, configuration_id, **kwargs)

Delete a configuration.

The deletion is performed unconditionally. A configuration deletion request succeeds even if the configuration is referenced by a collection or document ingestion. However, documents that have already been submitted for processing continue to use the deleted configuration. Documents are always processed with a snapshot of the configuration as it existed at the time the document was submitted.

Parameters
  • environment_id (str) – The ID of the environment.

  • configuration_id (str) – The ID of the configuration.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

test_configuration_in_environment(environment_id, configuration=None, file=None, filename=None, file_content_type=None, metadata=None, step=None, configuration_id=None, **kwargs)

Test configuration.

Deprecated: This method is no longer supported and is scheduled to be removed from service on July 31st 2019.

Runs a sample document through the default or your configuration and returns diagnostic information designed to help you understand how the document was processed. The document is not added to the index.

Parameters
  • environment_id (str) – The ID of the environment.

  • configuration (str) – The configuration to use to process the document. If this part is provided, then the provided configuration is used to process the document. If the configuration_id is also provided (both are present at the same time), then the request is rejected. The maximum supported configuration size is 1 MB. Configuration parts larger than 1 MB are rejected. See the GET /configurations/{configuration_id} operation for an example configuration.

  • file (file) – The content of the document to ingest. The maximum supported file size when adding a file to a collection is 50 megabytes; the maximum supported file size when testing a configuration is 1 megabyte. Files larger than the supported size are rejected.

  • filename (str) – The filename for file.

  • file_content_type (str) – The content type of file.

  • metadata (str) – The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected. Example: {"Creator": "Johnny Appleseed", "Subject": "Apples"}

  • step (str) – Specify to only run the input document through the given step instead of running the input document through the entire ingestion workflow. Valid values are convert, enrich, and normalize.

  • configuration_id (str) – The ID of the configuration to use to process the document. If the configuration form part is also provided (both are present at the same time), then the request is rejected.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

create_collection(environment_id, name, description=None, configuration_id=None, language=None, **kwargs)

Create a collection.

Parameters
  • environment_id (str) – The ID of the environment.

  • name (str) – The name of the collection to be created.

  • description (str) – A description of the collection.

  • configuration_id (str) – The ID of the configuration in which the collection is to be created.

  • language (str) – The language of the documents stored in the collection, in the form of an ISO 639-1 language code.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
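The language argument expects an ISO 639-1 code, so a small helper can normalize the code and drop unset arguments before calling create_collection. This is a minimal sketch. The check covers only the two-letter form and does not confirm that the code is a real ISO 639-1 code or one the service supports. The "en" default is this sketch's choice, and `collection_kwargs` is a hypothetical helper.

```python
import re
from typing import Optional

# Shape check only: exactly two ASCII letters.
_TWO_LETTER_CODE = re.compile(r"[a-z]{2}")


def collection_kwargs(name: str, language: str = "en",
                      configuration_id: Optional[str] = None,
                      description: Optional[str] = None) -> dict:
    """Assemble create_collection keyword arguments with a normalized language code."""
    code = language.strip().lower()
    # Verifies only the two-letter form, not membership in the ISO 639-1 list.
    if not _TWO_LETTER_CODE.fullmatch(code):
        raise ValueError(f"expected a two-letter ISO 639-1 code, got {language!r}")
    kwargs = {"name": name, "language": code,
              "configuration_id": configuration_id, "description": description}
    return {key: value for key, value in kwargs.items() if value is not None}
```

Usage, assuming an authenticated client: `discovery.create_collection(environment_id, **collection_kwargs("manuals", language="de", configuration_id=config_id))`.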

list_collections(environment_id, name=None, **kwargs)

List collections.

Lists existing collections for the service instance.

Parameters
  • environment_id (str) – The ID of the environment.

  • name (str) – Find collections with the given name.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_collection(environment_id, collection_id, **kwargs)

Get collection details.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

update_collection(environment_id, collection_id, name, description=None, configuration_id=None, **kwargs)

Update a collection.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • name (str) – The name of the collection.

  • description (str) – A description of the collection.

  • configuration_id (str) – The ID of the configuration in which the collection is to be updated.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_collection(environment_id, collection_id, **kwargs)

Delete a collection.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_collection_fields(environment_id, collection_id, **kwargs)

List collection fields.

Gets a list of the unique fields (and their types) stored in the index.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_expansions(environment_id, collection_id, **kwargs)

Get the expansion list.

Returns the current expansion list for the specified collection. If an expansion list is not specified, an object with empty expansion arrays is returned.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

create_expansions(environment_id, collection_id, expansions, **kwargs)

Create or update expansion list.

Create or replace the expansion list for this collection. The maximum number of expanded terms per collection is 500. The current expansion list is replaced with the uploaded content.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • expansions (list[Expansion]) – An array of query expansion definitions. Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional. Bidirectional means that all terms are expanded to all other terms in the object. Unidirectional means that a set list of terms can be expanded into a second list of terms.

    To create a bidirectional expansion, specify an expanded_terms array. When found in a query, all items in the expanded_terms array are then expanded to the other items in the same array.

    To create a unidirectional expansion, specify both an array of input_terms and an array of expanded_terms. When items in the input_terms array are present in a query, they are expanded using the items listed in the expanded_terms array.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
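The two expansion shapes described above can be built as plain dicts. A local function can then show which terms a query term would pull in. This is a sketch. `expand_term` is only an illustration of the documented semantics and does not replicate the service's matching, and it always keeps the original term. Counting `expanded_terms` entries against the 500-term limit is an assumption about how the limit is measured. All helper names are hypothetical.

```python
from typing import List

MAX_EXPANDED_TERMS = 500  # per-collection limit stated above


def bidirectional(*terms: str) -> dict:
    """Every term expands to every other term in the group."""
    return {"expanded_terms": list(terms)}


def unidirectional(input_terms: List[str], expanded_terms: List[str]) -> dict:
    """Terms in input_terms expand to the terms in expanded_terms."""
    return {"input_terms": list(input_terms), "expanded_terms": list(expanded_terms)}


def total_expanded_terms(expansions: List[dict]) -> int:
    # Assumption: the 500-term limit counts entries in expanded_terms arrays.
    return sum(len(e["expanded_terms"]) for e in expansions)


def expand_term(term: str, expansions: List[dict]) -> List[str]:
    """Local illustration of how a single query term would be expanded."""
    result = [term]  # this sketch always keeps the original term
    for e in expansions:
        # Bidirectional objects have no input_terms, so any member triggers them.
        triggers = e.get("input_terms", e["expanded_terms"])
        if term in triggers:
            for t in e["expanded_terms"]:
                if t not in result:
                    result.append(t)
    return result
```

With an authenticated client, the list would be uploaded as `discovery.create_expansions(environment_id, collection_id, expansions)` once `total_expanded_terms(expansions) <= MAX_EXPANDED_TERMS` holds.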

delete_expansions(environment_id, collection_id, **kwargs)

Delete the expansion list.

Remove the expansion information for this collection. The expansion list must be deleted to disable query expansion for a collection.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_tokenization_dictionary_status(environment_id, collection_id, **kwargs)

Get tokenization dictionary status.

Returns the current status of the tokenization dictionary for the specified collection.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

create_tokenization_dictionary(environment_id, collection_id, tokenization_rules=None, **kwargs)

Create tokenization dictionary.

Upload a custom tokenization dictionary to use with the specified collection.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • tokenization_rules (list[TokenDictRule]) – An array of tokenization rules. Each rule contains the original text string, its component tokens, any alternate character set readings, and the part_of_speech of the text.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
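Each tokenization rule carries the four pieces listed above: text, tokens, readings, and part_of_speech. This is a minimal sketch that builds rules as plain dicts. The one-reading-per-token check is this sketch's own assumption, not a documented service rule. Passing dicts instead of `TokenDictRule` objects is also assumed to be accepted by the SDK. `token_rule` is a hypothetical helper.

```python
from typing import List, Optional


def token_rule(text: str, tokens: List[str], part_of_speech: str,
               readings: Optional[List[str]] = None) -> dict:
    """Build one tokenization rule in the shape described above."""
    rule = {"text": text, "tokens": list(tokens), "part_of_speech": part_of_speech}
    if readings is not None:
        # Assumption of this sketch: one reading per component token.
        if len(readings) != len(tokens):
            raise ValueError("expected one reading per token")
        rule["readings"] = list(readings)
    return rule
```

Usage, assuming an authenticated client: `discovery.create_tokenization_dictionary(environment_id, collection_id, tokenization_rules=[token_rule("ガンダムセンター", ["ガンダム", "センター"], "カスタム名詞", readings=["ガンダム", "センター"])])`.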

delete_tokenization_dictionary(environment_id, collection_id, **kwargs)

Delete tokenization dictionary.

Delete the tokenization dictionary from the collection.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_stopword_list_status(environment_id, collection_id, **kwargs)

Get stopword list status.

Returns the current status of the stopword list for the specified collection.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

create_stopword_list(environment_id, collection_id, stopword_file, stopword_filename=None, **kwargs)

Create stopword list.

Upload a custom stopword list to use with the specified collection.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • stopword_file (file) – The content of the stopword list to ingest.

  • stopword_filename (str) – The filename for stopword_file.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
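stopword_file takes file content, so a list built in code can be passed as an in-memory file instead of being written to disk first. This sketch assumes a plain-text, one-word-per-line, UTF-8 format. Lower-casing and de-duplicating are this sketch's own choices. `stopword_file` is a hypothetical helper.

```python
import io
from typing import Iterable


def stopword_file(words: Iterable[str]) -> io.BytesIO:
    """Return an in-memory UTF-8 file with one stopword per line."""
    unique = []
    for word in words:
        word = word.strip().lower()  # lower-casing is this sketch's choice
        if word and word not in unique:
            unique.append(word)
    return io.BytesIO(("\n".join(unique) + "\n").encode("utf-8"))
```

Usage, assuming an authenticated client: `discovery.create_stopword_list(environment_id, collection_id, stopword_file(["the", "a", "an"]), stopword_filename="stopwords.txt")`.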

delete_stopword_list(environment_id, collection_id, **kwargs)

Delete a custom stopword list.

Delete a custom stopword list from the collection. After a custom stopword list is deleted, the default list is used for the collection.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

add_document(environment_id, collection_id, file=None, filename=None, file_content_type=None, metadata=None, **kwargs)

Add a document.

Add a document to a collection with optional metadata.
  • The version query parameter is still required.

  • Returns immediately after the system has accepted the document for processing.

  • The user must provide document content, metadata, or both. If the request is missing both document content and metadata, it is rejected.

  • The user can set the Content-Type parameter on the file part to indicate the media type of the document. If the Content-Type parameter is missing or is one of the generic media types (for example, application/octet-stream), then the service attempts to automatically detect the document’s media type.

  • The following field names are reserved and are filtered out if present after normalization: id, score, highlight, and any field with the prefix _, +, or -.

  • Fields with empty name values after normalization are filtered out before indexing.

  • Fields containing the following characters after normalization are filtered out before indexing: # and ,
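The field-name filtering rules above can be checked locally before upload, to catch fields that would silently disappear from the index. This sketch applies the rules to top-level names exactly as given. It assumes the names are already in their post-normalization form, because the service's normalization step is not described here. The helper names are hypothetical.

```python
from typing import List

RESERVED_NAMES = {"id", "score", "highlight"}
RESERVED_PREFIXES = ("_", "+", "-")
FORBIDDEN_CHARS = ("#", ",")


def field_is_kept(name: str) -> bool:
    """Apply the documented filtering rules to one (already normalized) field name."""
    if not name:  # empty names are filtered out
        return False
    if name in RESERVED_NAMES or name.startswith(RESERVED_PREFIXES):
        return False
    return not any(ch in name for ch in FORBIDDEN_CHARS)


def dropped_fields(document: dict) -> List[str]:
    """List the top-level fields of `document` that would be filtered out."""
    return [name for name in document if not field_is_kept(name)]
```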

Note: Documents can be added with a specific document_id by using the /v1/environments/{environment_id}/collections/{collection_id}/documents method.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • file (file) – The content of the document to ingest. The maximum supported file size when adding a file to a collection is 50 megabytes; the maximum supported file size when testing a configuration is 1 megabyte. Files larger than the supported size are rejected.

  • filename (str) – The filename for file.

  • file_content_type (str) – The content type of file.

  • metadata (str) – The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected. Example: {"Creator": "Johnny Appleseed", "Subject": "Apples"}

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
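Because metadata is passed as a JSON string and both parts have size limits, it can help to serialize and check them before calling add_document. This is a minimal sketch. It assumes `discovery` is an authenticated DiscoveryV1 instance. It reads "MB" as decimal megabytes, the conservative interpretation. The helper names are hypothetical. Leaving `content_type` as None lets the service detect the media type, as described above.

```python
import json
import os
from typing import Optional

MAX_DOCUMENT_BYTES = 50_000_000  # "50 megabytes"; decimal MB assumed
MAX_METADATA_BYTES = 1_000_000   # "1 MB"; decimal MB assumed


def metadata_json(metadata: dict) -> str:
    """Serialize metadata for add_document and enforce the 1 MB limit."""
    text = json.dumps(metadata)
    if len(text.encode("utf-8")) > MAX_METADATA_BYTES:
        raise ValueError("metadata exceeds the 1 MB limit")
    return text


def add_document_from_path(discovery, environment_id: str, collection_id: str, path: str,
                           metadata: Optional[dict] = None,
                           content_type: Optional[str] = None) -> dict:
    """Upload one local file with optional metadata; returns the result dict."""
    if os.path.getsize(path) > MAX_DOCUMENT_BYTES:
        raise ValueError(f"{path} exceeds the 50 MB limit")
    with open(path, "rb") as fh:
        response = discovery.add_document(
            environment_id,
            collection_id,
            file=fh,
            filename=os.path.basename(path),
            file_content_type=content_type,  # None lets the service detect the type
            metadata=metadata_json(metadata) if metadata is not None else None,
        )
    return response.get_result()
```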

get_document_status(environment_id, collection_id, document_id, **kwargs)

Get document details.

Fetch status details about a submitted document. Note: this operation does not return the document itself. Instead, it returns only the document’s processing status and any notices (warnings or errors) that were generated when the document was ingested. Use the query API to retrieve the actual document content.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • document_id (str) – The ID of the document.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
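add_document returns as soon as the document is accepted, so callers that need the document to be searchable usually poll get_document_status. This is a minimal polling sketch. The terminal status values in `TERMINAL_STATUSES` are an assumption about the service's `status` field. `sleep` and `clock` are injectable so the loop can be tested without waiting. `wait_for_document` is a hypothetical helper.

```python
import time
from typing import Callable

# Assumed terminal values of the "status" field; anything else (for example
# "processing") is treated as still in progress.
TERMINAL_STATUSES = {"available", "available with notices", "failed"}


def wait_for_document(get_status: Callable[[], dict],
                      timeout_s: float = 300.0,
                      interval_s: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll get_status() until the document reaches a terminal status or time runs out."""
    deadline = clock() + timeout_s
    while True:
        result = get_status()
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"document still {result.get('status')!r} after {timeout_s}s")
        sleep(interval_s)
```

Usage, assuming an authenticated client: `wait_for_document(lambda: discovery.get_document_status(environment_id, collection_id, document_id).get_result())`.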

update_document(environment_id, collection_id, document_id, file=None, filename=None, file_content_type=None, metadata=None, **kwargs)

Update a document.

Replace an existing document or add a document with a specified document_id. Starts ingesting a document with optional metadata. Note: When you upload a new document with this method, it automatically replaces any document already stored with the same document_id.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • document_id (str) – The ID of the document.

  • file (file) – The content of the document to ingest. The maximum supported file size when adding a file to a collection is 50 megabytes; the maximum supported file size when testing a configuration is 1 megabyte. Files larger than the supported size are rejected.

  • filename (str) – The filename for file.

  • file_content_type (str) – The content type of file.

  • metadata (str) – The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected. Example: {"Creator": "Johnny Appleseed", "Subject": "Apples"}

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_document(environment_id, collection_id, document_id, **kwargs)

Delete a document.

If the given document ID is invalid, or if the document is not found, then a success response is returned (HTTP status code 200) with the status set to ‘deleted’.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • document_id (str) – The ID of the document.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

query(environment_id, collection_id, filter=None, query=None, natural_language_query=None, passages=None, aggregation=None, count=None, return_fields=None, offset=None, sort=None, highlight=None, passages_fields=None, passages_count=None, passages_characters=None, deduplicate=None, deduplicate_field=None, collection_ids=None, similar=None, similar_document_ids=None, similar_fields=None, bias=None, logging_opt_out=None, **kwargs)

Query a collection.

By using this method, you can construct long queries. For details, see the [Discovery documentation](https://cloud.ibm.com/docs/services/discovery?topic=discovery-query-concepts#query-concepts).

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • filter (str) – A cacheable query that excludes documents that don’t mention the query content. Filter searches are better for metadata-type searches and for assessing the concepts in the data set.

  • query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results.

  • natural_language_query (str) – A natural language query that returns relevant documents by utilizing training data and natural language understanding.

  • passages (bool) – A passages query that returns the most relevant passages from the results.

  • aggregation (str) – An aggregation search that returns an exact answer by combining query search with filters. Useful for applications to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.

  • count (int) – Number of results to return.

  • return_fields (str) – A comma-separated list of the portion of the document hierarchy to return.

  • offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results.

  • sort (str) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified. This parameter cannot be used in the same query as the bias parameter.

  • highlight (bool) – When true, a highlight field is returned for each result, containing the fields that match the query with <em></em> tags around the matching query terms.

  • passages_fields (str) – A comma-separated list of fields that passages are drawn from. If this parameter is not specified, then all top-level fields are included.

  • passages_count (int) – The maximum number of passages to return. The search returns fewer passages if the requested total is not found. The default is 10. The maximum is 100.

  • passages_characters (int) – The approximate number of characters that any one passage will have.

  • deduplicate (bool) – When true, and used with a Watson Discovery News collection, duplicate results (based on the contents of the title field) are removed. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.

  • deduplicate_field (str) – When specified, duplicate results based on the field specified are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.

  • collection_ids (str) – A comma-separated list of collection IDs to be queried against. Required when querying multiple collections, invalid when performing a single collection query.

  • similar (bool) – When true, results are returned based on their similarity to the document IDs specified in the similar_document_ids parameter.

  • similar_document_ids (str) – A comma-separated list of document IDs to find similar documents. Tip: Include the natural_language_query parameter to expand the scope of the document similarity search with the natural language query. Other query parameters, such as filter and query, are subsequently applied and reduce the scope.

  • similar_fields (str) – A comma-separated list of field names that are used as a basis for comparison to identify similar documents. If not specified, the entire document is used for comparison.

  • bias (str) – The field used to bias the returned results. The specified field must be either a date or number format. When a date type field is specified, returned results are biased towards field values closer to the current date. When a number type field is specified, returned results are biased towards higher field values. This parameter cannot be used in the same query as the sort parameter.

  • logging_opt_out (bool) – If true, queries are not stored in the Discovery Logs endpoint.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
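sort is a comma-separated string with optional +/- prefixes, and it cannot be combined with bias. A small builder can enforce both rules before the request is sent. This is a sketch. The dict-based sort input, the `filter_` spelling (used to avoid shadowing the Python builtin), and the count default of 10 are this helper's own choices. `sort_param` and `query_kwargs` are hypothetical helpers.

```python
from typing import Dict, Optional

_DIRECTION_PREFIX = {"asc": "+", "desc": "-"}


def sort_param(fields: Dict[str, str]) -> str:
    """Turn {"date": "desc", "title": "asc"} into "-date,+title"."""
    return ",".join(_DIRECTION_PREFIX[direction] + field for field, direction in fields.items())


def query_kwargs(natural_language_query: Optional[str] = None,
                 filter_: Optional[str] = None,
                 sort: Optional[Dict[str, str]] = None,
                 bias: Optional[str] = None,
                 count: int = 10,
                 offset: int = 0) -> dict:
    """Assemble keyword arguments for DiscoveryV1.query, omitting unset values."""
    if sort and bias:
        raise ValueError("sort and bias cannot be used in the same query")
    candidates = {
        "natural_language_query": natural_language_query,
        "filter": filter_,  # trailing underscore avoids shadowing the builtin
        "sort": sort_param(sort) if sort else None,
        "bias": bias,
        "count": count,
        "offset": offset,
    }
    return {key: value for key, value in candidates.items() if value is not None}
```

Usage, assuming an authenticated client: `discovery.query(environment_id, collection_id, **query_kwargs(natural_language_query="apple harvest", sort={"date": "desc"}))`.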

query_notices(environment_id, collection_id, filter=None, query=None, natural_language_query=None, passages=None, aggregation=None, count=None, return_fields=None, offset=None, sort=None, highlight=None, passages_fields=None, passages_count=None, passages_characters=None, deduplicate_field=None, similar=None, similar_document_ids=None, similar_fields=None, **kwargs)

Query system notices.

Queries for notices (errors or warnings) that might have been generated by the system. Notices are generated when ingesting documents and performing relevance training. See the [Discovery documentation](https://cloud.ibm.com/docs/services/discovery?topic=discovery-query-concepts#query-concepts) for more details on the query language.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • filter (str) – A cacheable query that excludes documents that don’t mention the query content. Filter searches are better for metadata-type searches and for assessing the concepts in the data set.

  • query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first.

  • natural_language_query (str) – A natural language query that returns relevant documents by utilizing training data and natural language understanding.

  • passages (bool) – A passages query that returns the most relevant passages from the results.

  • aggregation (str) – An aggregation search that returns an exact answer by combining query search with filters. Useful for applications to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.

  • count (int) – Number of results to return. The maximum for the count and offset values together in any one query is 10000.

  • return_fields (list[str]) – A comma-separated list of the portion of the document hierarchy to return.

  • offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10000.

  • sort (list[str]) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.

  • highlight (bool) – When true, a highlight field is returned for each result; it contains the fields that match the query, with <em></em> tags around the matching query terms.

  • passages_fields (list[str]) – A comma-separated list of fields that passages are drawn from. If this parameter is not specified, then all top-level fields are included.

  • passages_count (int) – The maximum number of passages to return. The search returns fewer passages if the requested total is not found.

  • passages_characters (int) – The approximate number of characters that any one passage will have.

  • deduplicate_field (str) – When specified, duplicate results based on the field specified are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.

  • similar (bool) – When true, results are returned based on their similarity to the document IDs specified in the similar_document_ids parameter.

  • similar_document_ids (list[str]) – A comma-separated list of document IDs to find similar documents. Tip: Include the natural_language_query parameter to expand the scope of the document similarity search with the natural language query. Other query parameters, such as filter and query, are subsequently applied and reduce the scope.

  • similar_fields (list[str]) – A comma-separated list of field names that are used as a basis for comparison to identify similar documents. If not specified, the entire document is used for comparison.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
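Because count and offset together may not exceed 10000 in any one query, paging through a large result set needs a stopping rule. The sketch below is a hypothetical helper (page_windows is not part of the SDK) that yields (count, offset) pairs staying inside that documented limit; it only computes the pairs and makes no service calls.

```python
from typing import Iterator, Tuple

# Documented limit: count + offset must not exceed 10000 in any one query.
MAX_WINDOW = 10000


def page_windows(total: int, page_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (count, offset) pairs covering up to `total` results
    without letting count + offset exceed MAX_WINDOW."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    limit = min(total, MAX_WINDOW)
    offset = 0
    while offset < limit:
        # Shrink the final page so it never crosses the limit.
        count = min(page_size, limit - offset)
        yield count, offset
        offset += count
```

Each pair can be passed as the count and offset arguments of the query method above; results beyond position 10000 are simply not reachable this way and would need a narrower filter instead.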

federated_query(environment_id, filter=None, query=None, natural_language_query=None, passages=None, aggregation=None, count=None, return_fields=None, offset=None, sort=None, highlight=None, passages_fields=None, passages_count=None, passages_characters=None, deduplicate=None, deduplicate_field=None, collection_ids=None, similar=None, similar_document_ids=None, similar_fields=None, bias=None, logging_opt_out=None, **kwargs)[source]

Query multiple collections.

By using this method, you can construct long queries that search multiple collections. For details, see the [Discovery documentation](https://cloud.ibm.com/docs/services/discovery?topic=discovery-query-concepts#query-concepts).

Parameters
  • environment_id (str) – The ID of the environment.

  • filter (str) – A cacheable query that excludes documents that don’t mention the query content. Filter searches are better for metadata-type searches and for assessing the concepts in the data set.

  • query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results.

  • natural_language_query (str) – A natural language query that returns relevant documents by utilizing training data and natural language understanding.

  • passages (bool) – A passages query that returns the most relevant passages from the results.

  • aggregation (str) – An aggregation search that returns an exact answer by combining query search with filters. Useful for applications to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.

  • count (int) – Number of results to return.

  • return_fields (str) – A comma-separated list of the portion of the document hierarchy to return.

  • offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results.

  • sort (str) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified. This parameter cannot be used in the same query as the bias parameter.

  • highlight (bool) – When true, a highlight field is returned for each result; it contains the fields that match the query, with <em></em> tags around the matching query terms.

  • passages_fields (str) – A comma-separated list of fields that passages are drawn from. If this parameter is not specified, then all top-level fields are included.

  • passages_count (int) – The maximum number of passages to return. The search returns fewer passages if the requested total is not found. The default is 10. The maximum is 100.

  • passages_characters (int) – The approximate number of characters that any one passage will have.

  • deduplicate (bool) – When true, and used with a Watson Discovery News collection, duplicate results (based on the contents of the title field) are removed. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.

  • deduplicate_field (str) – When specified, duplicate results based on the field specified are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.

  • collection_ids (str) – A comma-separated list of collection IDs to be queried against. Required when querying multiple collections; invalid when performing a single-collection query.

  • similar (bool) – When true, results are returned based on their similarity to the document IDs specified in the similar_document_ids parameter.

  • similar_document_ids (str) – A comma-separated list of document IDs to find similar documents. Tip: Include the natural_language_query parameter to expand the scope of the document similarity search with the natural language query. Other query parameters, such as filter and query, are subsequently applied and reduce the scope.

  • similar_fields (str) – A comma-separated list of field names that are used as a basis for comparison to identify similar documents. If not specified, the entire document is used for comparison.

  • bias (str) – Field on which to bias the returned results. The specified field must be either a date or number format. When a date type field is specified, returned results are biased towards field values closer to the current date. When a number type field is specified, returned results are biased towards higher field values. This parameter cannot be used in the same query as the sort parameter.

  • logging_opt_out (bool) – If true, queries are not stored in the Discovery Logs endpoint.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
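Two rules above are easy to trip over: collection_ids and sort are comma-separated strings for this method, and sort and bias are mutually exclusive. The following is a minimal sketch of a hypothetical builder (build_federated_query_args is illustrative, not an SDK function) that enforces both before the arguments reach federated_query.

```python
from typing import Any, Dict, List, Optional


def build_federated_query_args(
    collection_ids: List[str],
    natural_language_query: Optional[str] = None,
    sort: Optional[List[str]] = None,
    bias: Optional[str] = None,
    count: int = 10,
) -> Dict[str, Any]:
    """Assemble keyword arguments for DiscoveryV1.federated_query."""
    if sort and bias:
        # Documented above: sort and bias cannot be used in the same query.
        raise ValueError("sort and bias cannot be used in the same query")
    args: Dict[str, Any] = {
        # collection_ids is documented as a comma-separated str for this method.
        "collection_ids": ",".join(collection_ids),
        "count": count,
    }
    if natural_language_query is not None:
        args["natural_language_query"] = natural_language_query
    if sort:
        # A leading "-" marks a descending field; no prefix means ascending.
        args["sort"] = ",".join(sort)
    if bias:
        args["bias"] = bias
    return args
```

A call would then look like `discovery.federated_query(environment_id, **build_federated_query_args(["c1", "c2"], natural_language_query="..."))`, where `discovery` is a configured DiscoveryV1 instance and the collection IDs are placeholders.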

federated_query_notices(environment_id, collection_ids, filter=None, query=None, natural_language_query=None, aggregation=None, count=None, return_fields=None, offset=None, sort=None, highlight=None, deduplicate_field=None, similar=None, similar_document_ids=None, similar_fields=None, **kwargs)[source]

Query multiple collection system notices.

Queries for notices (errors or warnings) that might have been generated by the system. Notices are generated when ingesting documents and performing relevance training. See the [Discovery documentation](https://cloud.ibm.com/docs/services/discovery?topic=discovery-query-concepts#query-concepts) for more details on the query language.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_ids (list[str]) – A comma-separated list of collection IDs to be queried against.

  • filter (str) – A cacheable query that excludes documents that don’t mention the query content. Filter searches are better for metadata-type searches and for assessing the concepts in the data set.

  • query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first.

  • natural_language_query (str) – A natural language query that returns relevant documents by utilizing training data and natural language understanding.

  • aggregation (str) – An aggregation search that returns an exact answer by combining query search with filters. Useful for applications to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.

  • count (int) – Number of results to return. The maximum for the count and offset values together in any one query is 10000.

  • return_fields (list[str]) – A comma-separated list of the portion of the document hierarchy to return.

  • offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10000.

  • sort (list[str]) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.

  • highlight (bool) – When true, a highlight field is returned for each result; it contains the fields that match the query, with <em></em> tags around the matching query terms.

  • deduplicate_field (str) – When specified, duplicate results based on the field specified are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.

  • similar (bool) – When true, results are returned based on their similarity to the document IDs specified in the similar_document_ids parameter.

  • similar_document_ids (list[str]) – A comma-separated list of document IDs to find similar documents. Tip: Include the natural_language_query parameter to expand the scope of the document similarity search with the natural language query. Other query parameters, such as filter and query, are subsequently applied and reduce the scope.

  • similar_fields (list[str]) – A comma-separated list of field names that are used as a basis for comparison to identify similar documents. If not specified, the entire document is used for comparison.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
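The sketch below shows one way to pull notices from several collections at once. It assumes that `discovery` is a DiscoveryV1 instance, that DetailedResponse.get_result() returns the parsed JSON body, and that the body lists notices under a "results" key; the filter string "severity::error" in the usage note is likewise an assumption about the notice fields, not something this page documents. fetch_notices is a hypothetical helper name.

```python
from typing import Any, Dict, List


def fetch_notices(discovery: Any, environment_id: str,
                  collection_ids: List[str], filter_expr: str,
                  count: int = 50) -> List[Dict[str, Any]]:
    """Return notice results across several collections.

    `discovery` is expected to be a DiscoveryV1 instance, or any object
    exposing the same federated_query_notices method."""
    response = discovery.federated_query_notices(
        environment_id,
        collection_ids,  # passed as list[str] for this method
        filter=filter_expr,
        count=count,
    )
    # Assumption: the parsed body keeps matching notices under "results".
    return response.get_result().get("results", [])
```

For example, `fetch_notices(discovery, env_id, [coll_a, coll_b], "severity::error")` would ask only for notices whose (assumed) severity field exactly matches error.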

query_entities(environment_id, collection_id, feature=None, entity=None, context=None, count=None, evidence_count=None, **kwargs)[source]

Knowledge Graph entity query.

See the [Knowledge Graph documentation](https://cloud.ibm.com/docs/services/discovery?topic=discovery-kg#kg) for more details.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • feature (str) – The entity query feature to perform. Supported features are disambiguate and similar_entities.

  • entity (QueryEntitiesEntity) – A text string that appears within the entity text field.

  • context (QueryEntitiesContext) – Entity text to provide context for the queried entity and rank based on that association. For example, if you wanted to query the city of London in England, your query would look for London with the context of England.

  • count (int) – The number of results to return. The default is 10. The maximum is 1000.

  • evidence_count (int) – The number of evidence items to return for each result. The default is 0. The maximum number of evidence items per query is 10,000.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
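The limits listed for query_entities can be checked locally before a request is sent. This is a minimal sketch with a hypothetical helper, entity_query_args; one interpretation is labeled as an assumption in the code: because evidence_count is per result while the 10,000 cap is per query, the helper treats count × evidence_count as the quantity being capped.

```python
from typing import Any, Dict, Optional

# Values and limits taken from the query_entities parameter list above.
SUPPORTED_FEATURES = {"disambiguate", "similar_entities"}
DEFAULT_COUNT = 10
MAX_COUNT = 1000
MAX_EVIDENCE_PER_QUERY = 10000


def entity_query_args(feature: str, count: Optional[int] = None,
                      evidence_count: Optional[int] = None) -> Dict[str, Any]:
    """Validate query_entities arguments against the documented limits."""
    if feature not in SUPPORTED_FEATURES:
        raise ValueError(f"unsupported feature: {feature!r}")
    if count is not None and not 1 <= count <= MAX_COUNT:
        raise ValueError(f"count must be between 1 and {MAX_COUNT}")
    effective_count = DEFAULT_COUNT if count is None else count
    # Assumption: the per-query evidence cap applies to count * evidence_count.
    if evidence_count is not None and (
            evidence_count < 0
            or effective_count * evidence_count > MAX_EVIDENCE_PER_QUERY):
        raise ValueError("too many evidence items for one query")
    args: Dict[str, Any] = {"feature": feature}
    if count is not None:
        args["count"] = count
    if evidence_count is not None:
        args["evidence_count"] = evidence_count
    return args
```

The result can be splatted into the call, for example `discovery.query_entities(environment_id, collection_id, entity=entity, **entity_query_args("disambiguate", count=20))`, where `entity` is a QueryEntitiesEntity you have built.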

query_relations(environment_id, collection_id, entities=None, context=None, sort=None, filter=None, count=None, evidence_count=None, **kwargs)[source]

Knowledge Graph relationship query.

See the [Knowledge Graph documentation](https://cloud.ibm.com/docs/services/discovery?topic=discovery-kg#kg) for more details.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • entities (list[QueryRelationsEntity]) – An array of entities to find relationships for.

  • context (QueryEntitiesContext) – Entity text to provide context for the queried entity and rank based on that association. For example, if you wanted to query the city of London in England, your query would look for London with the context of England.

  • sort (str) – The sorting method for the relationships, either score or frequency. frequency is the number of unique times each entity is identified. The default is score. This parameter cannot be used in the same query as the bias parameter.

  • filter (QueryRelationsFilter)

  • count (int) – The number of results to return. The default is 10. The maximum is 1000.

  • evidence_count (int) – The number of evidence items to return for each result. The default is 0. The maximum number of evidence items per query is 10,000.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_training_data(environment_id, collection_id, **kwargs)[source]

List training data.

Lists the training data for the specified collection.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

add_training_data(environment_id, collection_id, natural_language_query=None, filter=None, examples=None, **kwargs)[source]

Add query to training data.

Adds a query to the training data for this collection. The query can contain a filter and natural language query.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • natural_language_query (str) – The natural text query for the new training query.

  • filter (str) – The filter used on the collection before the natural_language_query is applied.

  • examples (list[TrainingExample]) – Array of training examples.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_all_training_data(environment_id, collection_id, **kwargs)[source]

Delete all training data.

Deletes all training data from a collection.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_training_data(environment_id, collection_id, query_id, **kwargs)[source]

Get details about a query.

Gets details for a specific training data query, including the query string and all examples.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • query_id (str) – The ID of the query used for training.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_training_data(environment_id, collection_id, query_id, **kwargs)[source]

Delete a training data query.

Removes the training data query and all associated examples from the training data set.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • query_id (str) – The ID of the query used for training.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_training_examples(environment_id, collection_id, query_id, **kwargs)[source]

List examples for a training data query.

List all examples for this training data query.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • query_id (str) – The ID of the query used for training.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

create_training_example(environment_id, collection_id, query_id, document_id=None, cross_reference=None, relevance=None, **kwargs)[source]

Add example to training data query.

Adds an example to this training data query.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • query_id (str) – The ID of the query used for training.

  • document_id (str) – The document ID associated with this training example.

  • cross_reference (str) – The cross reference associated with this training example.

  • relevance (int) – The relevance of the training example.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
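add_training_data and create_training_example are typically used together: create the training query first, then attach labeled documents to it. The sketch below chains the two documented signatures. add_training_query is a hypothetical helper, and reading the new query's ID from the "query_id" key of the parsed response is an assumption about the response body.

```python
from typing import Any, Iterable, Tuple


def add_training_query(discovery: Any, environment_id: str, collection_id: str,
                       natural_language_query: str,
                       judgments: Iterable[Tuple[str, int]]) -> str:
    """Create a training query, then add one example per (document_id, relevance)."""
    created = discovery.add_training_data(
        environment_id, collection_id,
        natural_language_query=natural_language_query,
    ).get_result()
    query_id = created["query_id"]  # assumed key in the parsed response
    for document_id, relevance in judgments:
        discovery.create_training_example(
            environment_id, collection_id, query_id,
            document_id=document_id, relevance=relevance,
        )
    return query_id
```

Passing the examples one at a time keeps each request small; the examples parameter of add_training_data is an alternative when all judgments are known up front.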

delete_training_example(environment_id, collection_id, query_id, example_id, **kwargs)[source]

Delete example for training data query.

Deletes the example document with the given ID from the training data query.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • query_id (str) – The ID of the query used for training.

  • example_id (str) – The ID of the document as it is indexed.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

update_training_example(environment_id, collection_id, query_id, example_id, cross_reference=None, relevance=None, **kwargs)[source]

Change label or cross reference for example.

Changes the label or cross reference query for this training data example.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • query_id (str) – The ID of the query used for training.

  • example_id (str) – The ID of the document as it is indexed.

  • cross_reference (str) – The example to add.

  • relevance (int) – The relevance value for this example.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_training_example(environment_id, collection_id, query_id, example_id, **kwargs)[source]

Get details for training data example.

Gets the details for this training example.

Parameters
  • environment_id (str) – The ID of the environment.

  • collection_id (str) – The ID of the collection.

  • query_id (str) – The ID of the query used for training.

  • example_id (str) – The ID of the document as it is indexed.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_user_data(customer_id, **kwargs)[source]

Delete labeled data.

Deletes all data associated with a specified customer ID. The method has no effect if no data is associated with the customer ID. You associate a customer ID with data by passing the X-Watson-Metadata header with a request that passes data. For more information about personal data and customer IDs, see [Information security](https://cloud.ibm.com/docs/services/discovery?topic=discovery-information-security#information-security).

Parameters
  • customer_id (str) – The customer ID for which all data is to be deleted.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
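delete_user_data only has an effect if the data was labeled with a customer ID when it was sent. The sketch below builds that label as a headers dict; the value layout "customer_id=<id>" reflects IBM's information-security guidance as best understood here and should be checked against the linked page. watson_metadata_header is a hypothetical helper.

```python
from typing import Dict


def watson_metadata_header(customer_id: str) -> Dict[str, str]:
    """Build the X-Watson-Metadata header that tags request data with a customer ID."""
    if not customer_id:
        raise ValueError("customer_id must be a non-empty string")
    # Assumed value format: customer_id=<id>
    return {"X-Watson-Metadata": f"customer_id={customer_id}"}
```

The dict would be passed as `headers=watson_metadata_header("my_customer")` on calls that send data, and later `discovery.delete_user_data("my_customer")` removes everything tagged with that ID.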

create_event(type, data, **kwargs)[source]

Create event.

The Events API can be used to create log entries that are associated with specific queries. For example, you can record which documents in the result set were “clicked” by a user and when that click occurred.

Parameters
  • type (str) – The event type to be created.

  • data (EventData) – Query event data object.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

query_log(filter=None, query=None, count=None, offset=None, sort=None, **kwargs)[source]

Search the query and event log.

Searches the query and event log to find query sessions that match the specified criteria. Searching the logs endpoint uses the standard Discovery query syntax for the parameters that are supported.

Parameters

  • filter (str) – A cacheable query that excludes documents that don’t mention the query content. Filter searches are better for metadata-type searches and for assessing the concepts in the data set.

  • query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first.

  • count (int) – Number of results to return. The maximum for the count and offset values together in any one query is 10000.

  • offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10000.

  • sort (list[str]) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
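The sort rule above (a "-" prefix for descending, no prefix for the default ascending order) can be captured in a small translation helper. sort_fields is hypothetical, and the field name created_timestamp in the usage note is an assumed log field used only for illustration.

```python
from typing import List, Sequence, Tuple


def sort_fields(spec: Sequence[Tuple[str, str]]) -> List[str]:
    """Turn (field, direction) pairs into Discovery sort strings.

    Descending fields get a "-" prefix; ascending fields are left
    unprefixed because ascending is the default direction."""
    out: List[str] = []
    for field, direction in spec:
        if direction == "desc":
            out.append("-" + field)
        elif direction == "asc":
            out.append(field)
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    return out
```

For instance, `discovery.query_log(sort=sort_fields([("created_timestamp", "desc")]), count=20)` would request the 20 most recent entries, assuming that field exists in the log.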

get_metrics_query(start_time=None, end_time=None, result_type=None, **kwargs)[source]

Number of queries over time.

Total number of queries using the natural_language_query parameter over a specific time window.

Parameters

  • start_time (datetime) – Metric is computed from data recorded after this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.

  • end_time (datetime) – Metric is computed from data recorded before this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.

  • result_type (str) – The type of result to consider when calculating the metric.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
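The metrics methods take datetime values for start_time and end_time, and the service expects UTC timestamps of the form YYYY-MM-DDThh:mm:ssZ. The sketch below shows one way to build a trailing window as timezone-aware UTC datetimes, plus a formatter for checking or logging the exact string. Both helpers are hypothetical, and whether the SDK prefers aware or naive datetimes is an assumption to verify.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def last_n_days(days: int, now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return a timezone-aware (start_time, end_time) window ending at `now` (UTC)."""
    end = now if now is not None else datetime.now(timezone.utc)
    return end - timedelta(days=days), end


def to_metric_timestamp(moment: datetime) -> str:
    """Render an aware datetime in the documented YYYY-MM-DDThh:mm:ssZ form."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Usage might be `start, end = last_n_days(7)` followed by `discovery.get_metrics_query(start_time=start, end_time=end)`. Since queries and events are stored for 30 days, windows longer than that are unlikely to add data.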

get_metrics_query_event(start_time=None, end_time=None, result_type=None, **kwargs)[source]

Number of queries with an event over time.

Total number of queries using the natural_language_query parameter that have a corresponding “click” event over a specified time window. This metric requires having integrated event tracking in your application using the Events API.

Parameters

  • start_time (datetime) – Metric is computed from data recorded after this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.

  • end_time (datetime) – Metric is computed from data recorded before this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.

  • result_type (str) – The type of result to consider when calculating the metric.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_metrics_query_no_results(start_time=None, end_time=None, result_type=None, **kwargs)[source]

Number of queries with no search results over time.

Total number of queries using the natural_language_query parameter that have no results returned over a specified time window.

Parameters

  • start_time (datetime) – Metric is computed from data recorded after this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.

  • end_time (datetime) – Metric is computed from data recorded before this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.

  • result_type (str) – The type of result to consider when calculating the metric.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_metrics_event_rate(start_time=None, end_time=None, result_type=None, **kwargs)[source]

Percentage of queries with an associated event.

The percentage of queries using the natural_language_query parameter that have a corresponding “click” event over a specified time window. This metric requires having integrated event tracking in your application using the Events API.

Parameters

  • start_time (datetime) – Metric is computed from data recorded after this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.

  • end_time (datetime) – Metric is computed from data recorded before this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.

  • result_type (str) – The type of result to consider when calculating the metric.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_metrics_query_token_event(count=None, **kwargs)[source]

Most frequent query tokens with an event.

The most frequent query tokens parsed from the natural_language_query parameter and their corresponding “click” event rate within the recording period (queries and events are stored for 30 days). A query token is an individual word or unigram within the query string.

Parameters

  • count (int) – Number of results to return. The maximum for the count and offset values together in any one query is 10000.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_credentials(environment_id, **kwargs)[source]

List credentials.

List all the source credentials that have been created for this service instance.

Note: All credentials are sent over an encrypted connection and encrypted at rest.

Parameters
  • environment_id (str) – The ID of the environment.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

create_credentials(environment_id, source_type=None, credential_details=None, status=None, **kwargs)[source]

Create credentials.

Creates a set of credentials to connect to a remote source. Created credentials are used in a configuration to associate a collection with the remote source. Note: All credentials are sent over an encrypted connection and encrypted at rest.

Parameters
  • environment_id (str) – The ID of the environment.

  • source_type (str) – The source that this credentials object connects to:

    - box indicates the credentials are used to connect to an instance of Enterprise Box.

    - salesforce indicates the credentials are used to connect to Salesforce.

    - sharepoint indicates the credentials are used to connect to Microsoft SharePoint Online.

    - web_crawl indicates the credentials are used to perform a web crawl.

    - cloud_object_storage indicates the credentials are used to connect to an IBM Cloud Object Store.

  • credential_details (CredentialDetails) – Object containing details of the stored credentials. Obtain credentials for your source from the administrator of the source.

  • status (str) – The current status of this set of credentials. connected indicates that the credentials are available to use with the source configuration of a collection. invalid indicates that the credentials are no longer usable (for example, the password provided has expired) and must be corrected before they can be used with a collection.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
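The source_type and status values above form small closed sets, so a typo can be caught before the request is made. This sketch checks against exactly the values listed on this page; credentials_args is a hypothetical helper, and the CredentialDetails object is left for you to construct.

```python
from typing import Any, Dict, Optional

# Values listed in the create_credentials parameter description above.
SOURCE_TYPES = {"box", "salesforce", "sharepoint", "web_crawl", "cloud_object_storage"}
STATUSES = {"connected", "invalid"}


def credentials_args(source_type: str, status: Optional[str] = None) -> Dict[str, Any]:
    """Validate source_type and status before calling create_credentials."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source_type: {source_type!r}")
    args: Dict[str, Any] = {"source_type": source_type}
    if status is not None:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        args["status"] = status
    return args
```

A call might then read `discovery.create_credentials(environment_id, credential_details=details, **credentials_args("sharepoint"))`, with `details` being a CredentialDetails instance obtained from the source administrator's information.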

get_credentials(environment_id, credential_id, **kwargs)[source]

View Credentials.

Returns details about the specified credentials.

Note: Secure credential information such as a password or SSH key is never returned and must be obtained from the source system.

Parameters
  • environment_id (str) – The ID of the environment.

  • credential_id (str) – The unique identifier for a set of source credentials.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

update_credentials(environment_id, credential_id, source_type=None, credential_details=None, status=None, **kwargs)[source]

Update credentials.

Updates an existing set of source credentials. Note: All credentials are sent over an encrypted connection and encrypted at rest.

Parameters
  • environment_id (str) – The ID of the environment.

  • credential_id (str) – The unique identifier for a set of source credentials.

  • source_type (str) – The source that this credentials object connects to:

    - box indicates the credentials are used to connect to an instance of Enterprise Box.

    - salesforce indicates the credentials are used to connect to Salesforce.

    - sharepoint indicates the credentials are used to connect to Microsoft SharePoint Online.

    - web_crawl indicates the credentials are used to perform a web crawl.

    - cloud_object_storage indicates the credentials are used to connect to an IBM Cloud Object Store.

  • credential_details (CredentialDetails) – Object containing details of the stored credentials. Obtain credentials for your source from the administrator of the source.

  • status (str) – The current status of this set of credentials. connected indicates that the credentials are available to use with the source configuration of a collection. invalid indicates that the credentials are no longer usable (for example, the password provided has expired) and must be corrected before they can be used with a collection.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_credentials(environment_id, credential_id, **kwargs)[source]

Delete credentials.

Deletes a set of stored credentials from your Discovery instance.

Parameters
  • environment_id (str) – The ID of the environment.

  • credential_id (str) – The unique identifier for a set of source credentials.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_gateways(environment_id, **kwargs)[source]

List Gateways.

List the currently configured gateways.

Parameters
  • environment_id (str) – The ID of the environment.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

create_gateway(environment_id, name=None, **kwargs)[source]

Create Gateway.

Create a gateway configuration to use with a remotely installed gateway.

Parameters
  • environment_id (str) – The ID of the environment.

  • name (str) – User-defined name.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
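list_gateways and create_gateway combine naturally into an idempotent "find or create" step, which avoids registering the same gateway twice on repeated runs. The sketch assumes the parsed list response holds entries under a "gateways" key, each with "name" and "gateway_id", and that create_gateway returns the new gateway's "gateway_id"; these response shapes are assumptions, and find_or_create_gateway is a hypothetical helper.

```python
from typing import Any


def find_or_create_gateway(discovery: Any, environment_id: str, name: str) -> str:
    """Return the gateway_id of the gateway called `name`, creating it if absent."""
    listing = discovery.list_gateways(environment_id).get_result()
    for gateway in listing.get("gateways", []):  # assumed response shape
        if gateway.get("name") == name:
            return gateway["gateway_id"]
    created = discovery.create_gateway(environment_id, name=name).get_result()
    return created["gateway_id"]
```

Matching on the user-defined name is a design choice: it works only if names are kept unique within the environment.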

get_gateway(environment_id, gateway_id, **kwargs)[source]

List Gateway Details.

List information about the specified gateway.

Parameters
  • environment_id (str) – The ID of the environment.

  • gateway_id (str) – The requested gateway ID.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_gateway(environment_id, gateway_id, **kwargs)[source]

Delete Gateway.

Delete the specified gateway configuration.

Parameters
  • environment_id (str) – The ID of the environment.

  • gateway_id (str) – The requested gateway ID.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
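A sketch of housekeeping with list_gateways and delete_gateway: remove every gateway configuration whose status is idle (not currently in use, per the Gateway model). It assumes get_result() returns dicts shaped like the GatewayList and GatewayDelete models; the helper name is hypothetical.

```python
from typing import List


def delete_idle_gateways(service, environment_id: str) -> List[str]:
    """Delete gateway configurations whose status is 'idle'; return their IDs."""
    listing = service.list_gateways(environment_id).get_result()
    deleted = []
    for gateway in listing.get("gateways", []):
        if gateway.get("status") != "idle":
            continue  # keep gateways that are connected to a remote install
        confirmation = service.delete_gateway(
            environment_id, gateway["gateway_id"]
        ).get_result()
        # The body mirrors GatewayDelete: gateway_id and status.
        deleted.append(confirmation.get("gateway_id"))
    return deleted
```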

class AggregationResult(key=None, matching_results=None, aggregations=None)[source]

Bases: object

AggregationResult.

Attr str key

(optional) Key that matched the aggregation type.

Attr int matching_results

(optional) Number of matching results.

Attr list[QueryAggregation] aggregations

(optional) Aggregations returned in the case of chained aggregations.

class Calculation(type=None, results=None, matching_results=None, aggregations=None, field=None, value=None)[source]

Bases: object

Calculation.

Attr str field

(optional) The field where the aggregation is located in the document.

Attr float value

(optional) Value of the aggregation.

class Collection(collection_id=None, name=None, description=None, created=None, updated=None, status=None, configuration_id=None, language=None, document_counts=None, disk_usage=None, training_status=None, crawl_status=None, smart_document_understanding=None)[source]

Bases: object

A collection for storing documents.

Attr str collection_id

(optional) The unique identifier of the collection.

Attr str name

(optional) The name of the collection.

Attr str description

(optional) The description of the collection.

Attr datetime created

(optional) The creation date of the collection in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.

Attr datetime updated

(optional) The timestamp of when the collection was last updated in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.

Attr str status

(optional) The status of the collection.

Attr str configuration_id

(optional) The unique identifier of the collection’s configuration.

Attr str language

(optional) The language of the documents stored in the collection. Permitted values include en (English), de (German), and es (Spanish).

Attr DocumentCounts document_counts

(optional)

Attr CollectionDiskUsage disk_usage

(optional) Summary of the disk usage statistics for this collection.

Attr TrainingStatus training_status

(optional)

Attr CollectionCrawlStatus crawl_status

(optional) Object containing information about the crawl status of this collection.

Attr SduStatus smart_document_understanding

(optional) Object containing smart document understanding information for this collection.
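When working with the raw JSON from get_result(), the created and updated values are assumed to arrive as strings in the yyyy-MM-dd’T’HH:mm:ss.SSS’Z’ format described above (the SDK's model classes may already convert them). A minimal parsing sketch with the standard library; both helper names are hypothetical.

```python
from datetime import datetime, timezone

# yyyy-MM-dd'T'HH:mm:ss.SSS'Z' written as a strptime pattern; the literal Z
# marks UTC, so the parsed value is tagged with timezone.utc explicitly.
DISCOVERY_TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_discovery_timestamp(value: str) -> datetime:
    """Parse a string such as '2019-06-13T14:05:21.123Z' into an aware datetime."""
    return datetime.strptime(value, DISCOVERY_TIMESTAMP_FORMAT).replace(
        tzinfo=timezone.utc
    )


def seconds_between_created_and_updated(collection: dict) -> float:
    """Elapsed seconds between a collection's created and updated timestamps."""
    created = parse_discovery_timestamp(collection["created"])
    updated = parse_discovery_timestamp(collection["updated"])
    return (updated - created).total_seconds()
```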

class CollectionCrawlStatus(source_crawl=None)[source]

Bases: object

Object containing information about the crawl status of this collection.

Attr SourceStatus source_crawl

(optional) Object containing source crawl status information.

class CollectionDiskUsage(used_bytes=None)[source]

Bases: object

Summary of the disk usage statistics for this collection.

Attr int used_bytes

(optional) Number of bytes used by the collection.

class CollectionUsage(available=None, maximum_allowed=None)[source]

Bases: object

Summary of the collection usage in the environment.

Attr int available

(optional) Number of active collections in the environment.

Attr int maximum_allowed

(optional) Total number of collections allowed in the environment.

class Configuration(name, configuration_id=None, created=None, updated=None, description=None, conversions=None, enrichments=None, normalizations=None, source=None)[source]

Bases: object

A custom configuration for the environment.

Attr str configuration_id

(optional) The unique identifier of the configuration.

Attr str name

The name of the configuration.

Attr datetime created

(optional) The creation date of the configuration in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.

Attr datetime updated

(optional) The timestamp of when the configuration was last updated in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.

Attr str description

(optional) The description of the configuration, if available.

Attr Conversions conversions

(optional) Document conversion settings.

Attr list[Enrichment] enrichments

(optional) An array of document enrichment settings for the configuration.

Attr list[NormalizationOperation] normalizations

(optional) Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.

Attr Source source

(optional) Object containing source parameters for the configuration.

class Conversions(pdf=None, word=None, html=None, segment=None, json_normalizations=None, image_text_recognition=None)[source]

Bases: object

Document conversion settings.

Attr PdfSettings pdf

(optional) A list of PDF conversion settings.

Attr WordSettings word

(optional) A list of Word conversion settings.

Attr HtmlSettings html

(optional) A list of HTML conversion settings.

Attr SegmentSettings segment

(optional) A list of Document Segmentation settings.

Attr list[NormalizationOperation] json_normalizations

(optional) Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.

Attr bool image_text_recognition

(optional) When true, automatic text extraction from images (this includes images embedded in supported document formats, for example PDF, and supported image formats, for example TIFF) is performed on documents uploaded to the collection. This field is supported on Advanced and higher plans only. Lite plans do not support image text recognition.

class CreateEventResponse(type=None, data=None)[source]

Bases: object

An object defining the event being created.

Attr str type

(optional) The event type that was created.

Attr EventData data

(optional) Query event data object.

class CredentialDetails(credential_type=None, client_id=None, enterprise_id=None, url=None, username=None, organization_url=None, site_collection_path=None, client_secret=None, public_key_id=None, private_key=None, passphrase=None, password=None, gateway_id=None, source_version=None, web_application_url=None, domain=None, endpoint=None, access_key_id=None, secret_access_key=None)[source]

Bases: object

Object containing details of the stored credentials. Obtain credentials for your source from the administrator of the source.

Attr str credential_type

(optional) The authentication method for this credentials definition. The credential_type specified must be supported by the source_type. The following combinations are possible:
    - “source_type”: “box” (valid credential_types: oauth2)
    - “source_type”: “salesforce” (valid credential_types: username_password)
    - “source_type”: “sharepoint” (valid credential_types: saml with source_version of online, or ntlm_v1 with source_version of 2016)
    - “source_type”: “web_crawl” (valid credential_types: noauth or basic)
    - “source_type”: “cloud_object_storage” (valid credential_types: aws4_hmac)

Attr str client_id

(optional) The client_id of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2.

Attr str enterprise_id

(optional) The enterprise_id of the Box site that these credentials connect to. Only valid, and required, with a source_type of box.

Attr str url

(optional) The url of the source that these credentials connect to. Only valid, and required, with a credential_type of username_password, noauth, or basic.

Attr str username

(optional) The username of the source that these credentials connect to. Only valid, and required, with a credential_type of saml, username_password, basic, or ntlm_v1.

Attr str organization_url

(optional) The organization_url of the source that these credentials connect to. Only valid, and required, with a credential_type of saml.

Attr str site_collection_path

(optional) The site_collection.path of the source that these credentials connect to. Only valid, and required, with a source_type of sharepoint.

Attr str client_secret

(optional) The client_secret of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

Attr str public_key_id

(optional) The public_key_id of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

Attr str private_key

(optional) The private_key of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

Attr str passphrase

(optional) The passphrase of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

Attr str password

(optional) The password of the source that these credentials connect to. Only valid, and required, with credential_types of saml, username_password, basic, or ntlm_v1.

Note: When used with a source_type of salesforce, the password consists of the Salesforce password and a valid Salesforce security token concatenated. This value is never returned and is only used when creating or modifying credentials.

Attr str gateway_id

(optional) The ID of the gateway to be connected through (when connecting to intranet sites). Only valid with a credential_type of noauth, basic, or ntlm_v1. Gateways are created using the /v1/environments/{environment_id}/gateways methods.

Attr str source_version

(optional) The type of SharePoint repository to connect to. Only valid, and required, with a source_type of sharepoint.

Attr str web_application_url

(optional) SharePoint OnPrem WebApplication URL. Only valid, and required, with a source_version of 2016. If a port is not supplied, the default ports are used: port 80 for http and port 443 for https connections.

Attr str domain

(optional) The domain used to log in to your OnPrem SharePoint account. Only valid, and required, with a source_version of 2016.

Attr str endpoint

(optional) The endpoint associated with the cloud object store that you are connecting to. Only valid, and required, with a credential_type of aws4_hmac.

Attr str access_key_id

(optional) The access key ID associated with the cloud object store. Only valid, and required, with a credential_type of aws4_hmac. This value is never returned and is only used when creating or modifying credentials. For more information, see the [cloud object store documentation](https://cloud.ibm.com/docs/services/cloud-object-storage?topic=cloud-object-storage-using-hmac-credentials#using-hmac-credentials).

Attr str secret_access_key

(optional) The secret access key associated with the cloud object store. Only valid, and required, with a credential_type of aws4_hmac. This value is never returned and is only used when creating or modifying credentials. For more information, see the [cloud object store documentation](https://cloud.ibm.com/docs/services/cloud-object-storage?topic=cloud-object-storage-using-hmac-credentials#using-hmac-credentials).
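The source_type/credential_type combinations listed for credential_type can be encoded as a lookup table for a client-side pre-check before calling the service. This is a sketch: the table transcribes the list above, and the helper name is hypothetical; the service still performs its own validation.

```python
from typing import Optional

# source_type -> {credential_type: required source_version or None}
VALID_CREDENTIAL_TYPES = {
    "box": {"oauth2": None},
    "salesforce": {"username_password": None},
    "sharepoint": {"saml": "online", "ntlm_v1": "2016"},
    "web_crawl": {"noauth": None, "basic": None},
    "cloud_object_storage": {"aws4_hmac": None},
}


def is_valid_combination(source_type: str, credential_type: str,
                         source_version: Optional[str] = None) -> bool:
    """Return True if credential_type is allowed for source_type
    (and, for SharePoint, matches the required source_version)."""
    allowed = VALID_CREDENTIAL_TYPES.get(source_type, {})
    if credential_type not in allowed:
        return False
    required_version = allowed[credential_type]
    return required_version is None or source_version == required_version
```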

class Credentials(credential_id=None, source_type=None, credential_details=None, status=None)[source]

Bases: object

Object containing credential information.

Attr str credential_id

(optional) Unique identifier for this set of credentials.

Attr str source_type

(optional) The source that this credentials object connects to.

  • box indicates the credentials are used to connect to an instance of Enterprise Box.

  • salesforce indicates the credentials are used to connect to Salesforce.

  • sharepoint indicates the credentials are used to connect to Microsoft SharePoint Online.

  • web_crawl indicates the credentials are used to perform a web crawl.

  • cloud_object_storage indicates the credentials are used to connect to an IBM Cloud Object Store.

Attr CredentialDetails credential_details

(optional) Object containing details of the stored credentials. Obtain credentials for your source from the administrator of the source.

Attr str status

(optional) The current status of this set of credentials. connected indicates that the credentials are available to use with the source configuration of a collection. invalid means that the credentials cannot currently be used (for example, the password provided has expired) and must be corrected before they can be used with a collection.

class CredentialsList(credentials=None)[source]

Bases: object

CredentialsList.

Attr list[Credentials] credentials

(optional) An array of credential definitions that were created for this instance.

class DeleteCollectionResponse(collection_id, status)[source]

Bases: object

DeleteCollectionResponse.

Attr str collection_id

The unique identifier of the collection that is being deleted.

Attr str status

The status of the collection. The status of a successful deletion operation is deleted.

class DeleteConfigurationResponse(configuration_id, status, notices=None)[source]

Bases: object

DeleteConfigurationResponse.

Attr str configuration_id

The unique identifier for the configuration.

Attr str status

Status of the configuration. A deleted configuration has the status deleted.

Attr list[Notice] notices

(optional) An array of notice messages, if any.

class DeleteCredentials(credential_id=None, status=None)[source]

Bases: object

Object returned after credentials are deleted.

Attr str credential_id

(optional) The unique identifier of the credentials that have been deleted.

Attr str status

(optional) The status of the deletion request.

class DeleteDocumentResponse(document_id=None, status=None)[source]

Bases: object

DeleteDocumentResponse.

Attr str document_id

(optional) The unique identifier of the document.

Attr str status

(optional) Status of the document. A deleted document has the status deleted.

class DeleteEnvironmentResponse(environment_id, status)[source]

Bases: object

DeleteEnvironmentResponse.

Attr str environment_id

The unique identifier for the environment.

Attr str status

Status of the environment.

class DiskUsage(used_bytes=None, maximum_allowed_bytes=None)[source]

Bases: object

Summary of the disk usage statistics for the environment.

Attr int used_bytes

(optional) Number of bytes within the environment’s disk capacity that are currently used to store data.

Attr int maximum_allowed_bytes

(optional) Total number of bytes available in the environment’s disk capacity.

class DocumentAccepted(document_id=None, status=None, notices=None)[source]

Bases: object

DocumentAccepted.

Attr str document_id

(optional) The unique identifier of the ingested document.

Attr str status

(optional) Status of the document in the ingestion process. A status of processing is returned for documents that are ingested with a version date before 2019-01-01. The pending status is returned for all others.

Attr list[Notice] notices

(optional) Array of notices produced by the document-ingestion process.

class DocumentCounts(available=None, processing=None, failed=None, pending=None)[source]

Bases: object

DocumentCounts.

Attr int available

(optional) The total number of available documents in the collection.

Attr int processing

(optional) The number of documents in the collection that are currently being processed.

Attr int failed

(optional) The number of documents in the collection that failed to be ingested.

Attr int pending

(optional) The number of documents that have been uploaded to the collection, but have not yet started processing.

class DocumentSnapshot(step=None, snapshot=None)[source]

Bases: object

DocumentSnapshot.

Attr str step

(optional) The step in the document conversion process that the snapshot object represents.

Attr dict snapshot

(optional) Snapshot of the conversion.

class DocumentStatus(document_id, status, status_description, notices, configuration_id=None, filename=None, file_type=None, sha1=None)[source]

Bases: object

Status information about a submitted document.

Attr str document_id

The unique identifier of the document.

Attr str configuration_id

(optional) The unique identifier for the configuration.

Attr str status

Status of the document in the ingestion process.

Attr str status_description

Description of the document status.

Attr str filename

(optional) Name of the original source file (if available).

Attr str file_type

(optional) The type of the original source file.

Attr str sha1

(optional) The SHA-1 hash of the original source file (formatted as a hexadecimal string).

Attr list[Notice] notices

Array of notices produced by the document-ingestion process.

class Enrichment(destination_field, source_field, enrichment_name, description=None, overwrite=None, ignore_downstream_errors=None, options=None)[source]

Bases: object

Enrichment.

Attr str description

(optional) Describes what the enrichment step does.

Attr str destination_field

Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.

Attr str source_field

Field to be enriched. Arrays can be specified as the source_field if the enrichment service for this enrichment is set to natural_language_understanding.

Attr bool overwrite

(optional) Indicates that the enrichments will overwrite the destination_field field if it already exists.

Attr str enrichment_name

Name of the enrichment service to call. Current options are natural_language_understanding and elements.

When using natural_language_understanding, the options object must contain Natural Language Understanding options.

When using elements, the options object must contain Element Classification options. Additionally, when using the elements enrichment, the configuration specified and files ingested must meet all the criteria specified in [the documentation](https://cloud.ibm.com/docs/services/discovery?topic=discovery-element-classification#element-classification).

Attr bool ignore_downstream_errors

(optional) If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.

Attr EnrichmentOptions options

(optional) Options which are specific to a particular enrichment.
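The destination_field rule can be checked locally on dotted field paths: a destination is acceptable if it already exists or if its parent path exists. This is a sketch with a hypothetical helper name; it also assumes that a new top-level field (such as enriched_text) counts as one level below the document root, which the rule above does not state explicitly.

```python
def is_valid_destination(destination_field: str, existing_fields: set) -> bool:
    """Check that destination_field exists or is at most one level deeper
    than an existing field (paths are dot-separated)."""
    if destination_field in existing_fields:
        return True
    parent, separator, _ = destination_field.rpartition(".")
    if not separator:
        # Assumption: a new top-level field is one level below the document root.
        return True
    return parent in existing_fields
```

With existing_fields = {"text"}, this reproduces the example above: text.foo passes and text.foo.bar does not.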

class EnrichmentOptions(features=None, language=None, model=None)[source]

Bases: object

Options which are specific to a particular enrichment.

Attr NluEnrichmentFeatures features

(optional)

Attr str language

(optional) ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages; automatic detection is recommended.

Attr str model

(optional) For use with elements enrichments only. The element extraction model to use. Models available are: contract.

class Environment(environment_id=None, name=None, description=None, created=None, updated=None, status=None, read_only=None, size=None, requested_size=None, index_capacity=None, search_status=None)[source]

Bases: object

Details about an environment.

Attr str environment_id

(optional) Unique identifier for the environment.

Attr str name

(optional) Name that identifies the environment.

Attr str description

(optional) Description of the environment.

Attr datetime created

(optional) Creation date of the environment, in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.

Attr datetime updated

(optional) Date of most recent environment update, in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.

Attr str status

(optional) Current status of the environment. resizing is displayed when a request to increase the environment size has been made, but is still in the process of being completed.

Attr bool read_only

(optional) If true, the environment contains read-only collections that are maintained by IBM.

Attr str size

(optional) Current size of the environment.

Attr str requested_size

(optional) The new size requested for this environment. Only returned when the environment status is resizing.

Note: Querying and indexing can still be performed during an environment upsize.

Attr IndexCapacity index_capacity

(optional) Details about the resource usage and capacity of the environment.

Attr SearchStatus search_status

(optional) Information about the Continuous Relevancy Training for this environment.

class EnvironmentDocuments(indexed=None, maximum_allowed=None)[source]

Bases: object

Summary of the document usage statistics for the environment.

Attr int indexed

(optional) Number of documents indexed for the environment.

Attr int maximum_allowed

(optional) Total number of documents allowed in the environment’s capacity.

class EventData(environment_id, session_token, collection_id, document_id, client_timestamp=None, display_rank=None, query_id=None)[source]

Bases: object

Query event data object.

Attr str environment_id

The environment_id associated with the query that the event is associated with.

Attr str session_token

The session token that was returned as part of the query results that this event is associated with.

Attr datetime client_timestamp

(optional) The optional timestamp for the event that was created. If not provided, the time that the event was created in the log is used.

Attr int display_rank

(optional) The rank of the result item which the event is associated with.

Attr str collection_id

The collection_id of the document that this event is associated with.

Attr str document_id

The document_id of the document that this event is associated with.

Attr str query_id

(optional) The query identifier stored in the log. The query and any events associated with that query are stored with the same query_id.
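A sketch of assembling an EventData-shaped dict for a result that a user acted on. The required keys follow the constructor signature above, and the session_token comes from the query results. The helper name is hypothetical, and how the dict is then submitted to the service is outside this sketch. client_timestamp is deliberately omitted so that the log creation time is used.

```python
REQUIRED_EVENT_FIELDS = ("environment_id", "session_token", "collection_id", "document_id")


def build_event_data(environment_id, session_token, collection_id, document_id,
                     display_rank=None):
    """Assemble an EventData-shaped dict (hypothetical helper).

    client_timestamp is left out, so the service records the time at
    which the event was created in the log.
    """
    data = {
        "environment_id": environment_id,
        "session_token": session_token,
        "collection_id": collection_id,
        "document_id": document_id,
    }
    missing = [key for key in REQUIRED_EVENT_FIELDS if not data[key]]
    if missing:
        raise ValueError("missing required event fields: " + ", ".join(missing))
    if display_rank is not None:
        data["display_rank"] = display_rank
    return data
```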

class Expansion(expanded_terms, input_terms=None)[source]

Bases: object

An expansion definition. Each object represents one set of expandable strings. For example, you could have expansions for the word hot in one object, and expansions for the word cold in another.

Attr list[str] input_terms

(optional) A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.

Attr list[str] expanded_terms

A list of terms that this expansion will be expanded to. If specified without input_terms, it also functions as the input term list.

class Expansions(expansions)[source]

Bases: object

The query expansion definitions for the specified collection.

Attr list[Expansion] expansions

An array of query expansion definitions. Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional. Bidirectional means that all terms are expanded to all other terms in the object. Unidirectional means that a set list of terms can be expanded into a second list of terms.

To create a bidirectional expansion, specify an expanded_terms array. When found in a query, all items in the expanded_terms array are then expanded to the other items in the same array.

To create a unidirectional expansion, specify both an array of input_terms and an array of expanded_terms. When items in the input_terms array are present in a query, they are expanded using the items listed in the expanded_terms array.
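The two expansion shapes can be built as plain dicts, and a local function can model which terms a query term would expand to. This is an illustration of the documented semantics only, not the service's implementation; all helper names are hypothetical.

```python
from typing import Dict, List


def bidirectional(terms: List[str]) -> Dict:
    """Every term in the list expands to every other term in the list."""
    return {"expanded_terms": list(terms)}


def unidirectional(input_terms: List[str], expanded_terms: List[str]) -> Dict:
    """Only the input_terms are expanded, into the expanded_terms."""
    return {"input_terms": list(input_terms), "expanded_terms": list(expanded_terms)}


def expand_term(term: str, expansions: List[Dict]) -> List[str]:
    """Local model of the documented semantics (not the service's implementation)."""
    expanded: List[str] = []
    for expansion in expansions:
        # Without input_terms, expanded_terms doubles as the input list.
        inputs = expansion.get("input_terms", expansion["expanded_terms"])
        if term not in inputs:
            continue
        for candidate in expansion["expanded_terms"]:
            if candidate != term and candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

For example, {"expansions": [bidirectional(["hot", "warm", "scorching"]), unidirectional(["cold"], ["chilly", "freezing"])]} expands warm to hot and scorching, and cold to chilly and freezing, while chilly is not expanded.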

class Field(field_name=None, field_type=None)[source]

Bases: object

Field.

Attr str field_name

(optional) The name of the field.

Attr str field_type

(optional) The type of the field.

class Filter(type=None, results=None, matching_results=None, aggregations=None, match=None)[source]

Bases: object

Filter.

Attr str match

(optional) The match the aggregated results queried for.

class FontSetting(level=None, min_size=None, max_size=None, bold=None, italic=None, name=None)[source]

Bases: object

FontSetting.

Attr int level

(optional) The HTML heading level that any content with the matching font is converted to.

Attr int min_size

(optional) The minimum size of the font to match.

Attr int max_size

(optional) The maximum size of the font to match.

Attr bool bold

(optional) When true, the font is matched if it is bold.

Attr bool italic

(optional) When true, the font is matched if it is italic.

Attr str name

(optional) The name of the font.

class Gateway(gateway_id=None, name=None, status=None, token=None, token_id=None)[source]

Bases: object

Object describing a specific gateway.

Attr str gateway_id

(optional) The gateway ID of the gateway.

Attr str name

(optional) The user defined name of the gateway.

Attr str status

(optional) The current status of the gateway. connected means the gateway is connected to the remotely installed gateway. idle means this gateway is not currently in use.

Attr str token

(optional) The generated token for this gateway. The value of this field is used when configuring the remotely installed gateway.

Attr str token_id

(optional) The generated token_id for this gateway. The value of this field is used when configuring the remotely installed gateway.

class GatewayDelete(gateway_id=None, status=None)[source]

Bases: object

Gateway deletion confirmation.

Attr str gateway_id

(optional) The gateway ID of the deleted gateway.

Attr str status

(optional) The status of the request.

class GatewayList(gateways=None)[source]

Bases: object

Object containing gateways array.

Attr list[Gateway] gateways

(optional) Array of configured gateway connections.

class Histogram(type=None, results=None, matching_results=None, aggregations=None, field=None, interval=None)[source]

Bases: object

Histogram.

Attr str field

(optional) The field where the aggregation is located in the document.

Attr int interval

(optional) Interval of the aggregation. (For ‘histogram’ type).

class HtmlSettings(exclude_tags_completely=None, exclude_tags_keep_content=None, keep_content=None, exclude_content=None, keep_tag_attributes=None, exclude_tag_attributes=None)[source]

Bases: object

A list of HTML conversion settings.

Attr list[str] exclude_tags_completely

(optional) Array of HTML tags that are excluded completely.

Attr list[str] exclude_tags_keep_content

(optional) Array of HTML tags which are excluded but still retain content.

Attr XPathPatterns keep_content

(optional)

Attr XPathPatterns exclude_content

(optional)

Attr list[str] keep_tag_attributes

(optional) An array of HTML tag attributes to keep in the converted document.

Attr list[str] exclude_tag_attributes

(optional) Array of HTML tag attributes to exclude.

class IndexCapacity(documents=None, disk_usage=None, collections=None)[source]

Bases: object

Details about the resource usage and capacity of the environment.

Attr EnvironmentDocuments documents

(optional) Summary of the document usage statistics for the environment.

Attr DiskUsage disk_usage

(optional) Summary of the disk usage statistics for the environment.

Attr CollectionUsage collections

(optional) Summary of the collection usage in the environment.
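The documents and disk_usage objects inside index_capacity pair a used value with a maximum, so utilization can be computed as a fraction. A sketch assuming the raw IndexCapacity JSON (for example, from an environment's index_capacity) is available as a dict; the helper names are hypothetical.

```python
from typing import Dict, Optional


def _ratio(used: float, maximum: Optional[float]) -> Optional[float]:
    # Return None when the maximum is missing or zero rather than dividing by it.
    if not maximum:
        return None
    return used / maximum


def capacity_report(index_capacity: Dict) -> Dict[str, Optional[float]]:
    """Usage fractions (0.0 to 1.0) from an IndexCapacity JSON object."""
    documents = index_capacity.get("documents", {})
    disk = index_capacity.get("disk_usage", {})
    return {
        "documents": _ratio(documents.get("indexed", 0), documents.get("maximum_allowed")),
        "disk": _ratio(disk.get("used_bytes", 0), disk.get("maximum_allowed_bytes")),
    }
```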

class ListCollectionFieldsResponse(fields=None)[source]

Bases: object

The list of fetched fields. The fields are returned using a fully qualified name format; however, the format differs slightly from the one used by the query operations.

  • Fields which contain nested JSON objects are assigned a type of “nested”.

  • Fields which belong to a nested object are prefixed with .properties (for example, warnings.properties.severity means that the warnings object has a property called severity).

  • Fields returned from the News collection are prefixed with v{N}-fullnews-t3-{YEAR}.mappings (for example, v5-fullnews-t3-2016.mappings.text.properties.author).
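A sketch of post-processing the fields list: picking out fields of type nested, and dropping the .properties markers to approximate the dotted path used elsewhere (for example, warnings.properties.severity becomes warnings.severity). That mapping to query syntax is an assumption based on the description above, and the helper names are hypothetical. The simple filter would also drop a real field that is literally named properties, and it does not strip the News collection prefix.

```python
from typing import Dict, List


def to_query_path(field_name: str) -> str:
    """Drop '.properties' segments from a fetched field name (hypothetical helper)."""
    return ".".join(
        segment for segment in field_name.split(".") if segment != "properties"
    )


def nested_fields(fields: List[Dict]) -> List[str]:
    """Names of fields whose type is 'nested' (they contain nested JSON objects)."""
    return [f["field_name"] for f in fields if f.get("field_type") == "nested"]
```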

Attr list[Field] fields

(optional) An array containing information about each field in the collections.

class ListCollectionsResponse(collections=None)[source]

Bases: object

ListCollectionsResponse.

Attr list[Collection] collections

(optional) An array containing information about each collection in the environment.

class ListConfigurationsResponse(configurations=None)[source]

Bases: object

ListConfigurationsResponse.

Attr list[Configuration] configurations

(optional) An array of Configurations that are available for the service instance.

class ListEnvironmentsResponse(environments=None)[source]

Bases: object

ListEnvironmentsResponse.

Attr list[Environment] environments

(optional) An array of environments that are available for the service instance.

class LogQueryResponse(matching_results=None, results=None)[source]

Bases: object

Object containing results that match the requested logs query.

Attr int matching_results

(optional) Number of matching results.

Attr list[LogQueryResponseResult] results

(optional) Array of log query response results.

class LogQueryResponseResult(environment_id=None, customer_id=None, document_type=None, natural_language_query=None, document_results=None, created_timestamp=None, client_timestamp=None, query_id=None, session_token=None, collection_id=None, display_rank=None, document_id=None, event_type=None, result_type=None)[source]

Bases: object

Individual result object for a logs query. Each object represents either a query to a Discovery collection or an event that is associated with a query.

Attr str environment_id

(optional) The environment ID that is associated with this

log entry. :attr str customer_id: (optional) The customer_id label that was specified in the header of the query or event API call that corresponds to this log entry. :attr str document_type: (optional) The type of log entry returned.

query indicates that the log represents the results of a call to the single

collection query method.

event indicates that the log represents a call to the events API.

Attr str natural_language_query

(optional) The value of the

natural_language_query query parameter that was used to create these results. Only returned with logs of type query. Note: Other query parameters (such as filter or deduplicate) might have been used with this query, but are not recorded.

Attr LogQueryResponseResultDocuments document_results

(optional) Object containing result information that was returned by the query used to create this log entry. Only returned with logs of type query.

Attr datetime created_timestamp

(optional) Date that the log result was created. Returned in YYYY-MM-DDThh:mm:ssZ format.

Attr datetime client_timestamp

(optional) Date specified by the user when recording an event. Returned in YYYY-MM-DDThh:mm:ssZ format. Only returned with logs of type event.

Attr str query_id

(optional) Identifier that corresponds to the natural_language_query string used in the original or associated query. All event and query log entries that have the same original natural_language_query string also have the same query_id. This field can be used to recall all event and query log results that have the same original query (event logs do not contain the original natural_language_query field).

Attr str session_token

(optional) Unique identifier (within a 24-hour period) that identifies a single query log and any event logs that were created for it. Note: If the exact same query is run at the exact same time on different days, the session_token for those queries might be identical. However, the created_timestamp differs. Note: Session tokens are case sensitive. To avoid matching on session tokens that are identical except for case, use the exact match operator (::) when you query for a specific session token.

Attr str collection_id

(optional) The collection ID of the document associated with this event. Only returned with logs of type event.

Attr int display_rank

(optional) The original display rank of the document associated with this event. Only returned with logs of type event.

Attr str document_id

(optional) The document ID of the document associated with this event. Only returned with logs of type event.

Attr str event_type

(optional) The type of event that this object represents. Possible values are:

  • query: the log of a query to a collection.

  • click: the result of a call to the events endpoint.

Attr str result_type

(optional) The type of result that this event is associated with. Only returned with logs of type event.
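Because query and event log entries that share an original natural_language_query string also share a query_id, log entries can be grouped on the client side. The following is a minimal sketch. It assumes the log entries have already been retrieved and converted to plain dicts that use the attribute names above. The helper group_logs_by_query_id and the sample values are hypothetical and are not part of the SDK.

```python
from collections import defaultdict


def group_logs_by_query_id(log_entries):
    """Group log entries (plain dicts shaped like LogQueryResponseResult) by query_id.

    Entries whose event_type is "query" go into "queries"; every other entry
    (for example "click") goes into "events". Entries without a query_id are skipped.
    """
    groups = defaultdict(lambda: {"queries": [], "events": []})
    for entry in log_entries:
        query_id = entry.get("query_id")
        if query_id is None:
            continue
        bucket = "queries" if entry.get("event_type") == "query" else "events"
        groups[query_id][bucket].append(entry)
    return dict(groups)


# Hypothetical sample entries; real IDs and tokens come from the service.
logs = [
    {"query_id": "q1", "event_type": "query", "session_token": "abc"},
    {"query_id": "q1", "event_type": "click", "document_id": "doc-7", "display_rank": 2},
    {"query_id": "q2", "event_type": "query", "session_token": "xyz"},
]
grouped = group_logs_by_query_id(logs)
print(len(grouped["q1"]["events"]))  # 1
```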

class LogQueryResponseResultDocuments(results=None, count=None)[source]

Bases: object

Object containing result information that was returned by the query used to create this log entry. Only returned with logs of type query.

Attr list[LogQueryResponseResultDocumentsResult] results

(optional) Array of log query response results.

Attr int count

(optional) The number of results returned in the query associated with this log.

class LogQueryResponseResultDocumentsResult(position=None, document_id=None, score=None, confidence=None, collection_id=None)[source]

Bases: object

Each object in the results array corresponds to an individual document returned by the original query.

Attr int position

(optional) The result rank of this document. A position of 1 indicates that it was the first returned result.

Attr str document_id

(optional) The document_id of the document that this result represents.

Attr float score

(optional) The raw score of this result. A higher score indicates a greater match to the query parameters.

Attr float confidence

(optional) The confidence score of the result’s analysis. A higher score indicates greater confidence.

Attr str collection_id

(optional) The collection_id of the document represented by this result.

class MetricAggregation(interval=None, event_type=None, results=None)[source]

Bases: object

An aggregation analyzing log information for queries and events.

Attr str interval

(optional) The measurement interval for this metric. Metric intervals are always 1 day (1d).

Attr str event_type

(optional) The event type associated with this metric result. This field, when present, will always be click.

Attr list[MetricAggregationResult] results

(optional) Array of metric aggregation query results.

class MetricAggregationResult(key_as_string=None, key=None, matching_results=None, event_rate=None)[source]

Bases: object

Aggregation result data for the requested metric.

Attr datetime key_as_string

(optional) Date in string form representing the start of this interval.

Attr int key

(optional) Unix epoch time equivalent of the key_as_string that represents the start of this interval.

Attr int matching_results

(optional) Number of matching results.

Attr float event_rate

(optional) The number of queries with associated events divided by the total number of queries for the interval. Only returned with event_rate metrics.
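The event_rate definition above (queries with associated events divided by total queries, per 1-day interval) can be reproduced from raw query records. The following is a minimal sketch. It assumes each query is given as a (date, has_event) pair and treats matching_results as the number of queries logged that day, which is an assumption. key_as_string here is a simplified ISO date and is not necessarily the exact string format the service returns. The helper name daily_event_rates is hypothetical.

```python
from datetime import date


def daily_event_rates(queries):
    """Compute per-day buckets shaped like MetricAggregationResult.

    `queries` is an iterable of (day, has_event) pairs, where `day` is a
    datetime.date and `has_event` says whether at least one event was
    recorded for that query.
    """
    totals = {}
    with_events = {}
    for day, has_event in queries:
        totals[day] = totals.get(day, 0) + 1
        if has_event:
            with_events[day] = with_events.get(day, 0) + 1
    return [
        {
            "key_as_string": day.isoformat(),  # simplified: date only
            "matching_results": totals[day],
            # queries with associated events / total queries in the 1d interval
            "event_rate": with_events.get(day, 0) / totals[day],
        }
        for day in sorted(totals)
    ]


sample = [
    (date(2019, 5, 1), True),
    (date(2019, 5, 1), False),
    (date(2019, 5, 1), False),
    (date(2019, 5, 1), True),
    (date(2019, 5, 2), False),
]
for bucket in daily_event_rates(sample):
    print(bucket)
```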

class MetricResponse(aggregations=None)[source]

Bases: object

The response generated from a call to a metrics method.

Attr list[MetricAggregation] aggregations

(optional) Array of metric aggregations.

class MetricTokenAggregation(event_type=None, results=None)[source]

Bases: object

An aggregation analyzing log information for queries and events.

Attr str event_type

(optional) The event type associated with this metric result. This field, when present, will always be click.

Attr list[MetricTokenAggregationResult] results

(optional) Array of results for the metric token aggregation.

class MetricTokenAggregationResult(key=None, matching_results=None, event_rate=None)[source]

Bases: object

Aggregation result data for the requested metric.

Attr str key

(optional) The content of the natural_language_query parameter used in the query that this result represents.

Attr int matching_results

(optional) Number of matching results.

Attr float event_rate

(optional) The number of queries with associated events divided by the total number of queries currently stored (queries and events are stored in the log for 30 days).

class MetricTokenResponse(aggregations=None)[source]

Bases: object

The response generated from a call to a metrics method that evaluates tokens.

Attr list[MetricTokenAggregation] aggregations

(optional) Array of metric token aggregations.

class Nested(type=None, results=None, matching_results=None, aggregations=None, path=None)[source]

Bases: object

Nested.

Attr str path

(optional) The area of the results the aggregation was restricted to.

class NluEnrichmentCategories(**kwargs)[source]

Bases: object

An object that indicates the Categories enrichment will be applied to the specified field.

class NluEnrichmentConcepts(limit=None)[source]

Bases: object

An object specifying the concepts enrichment and related parameters.

Attr int limit

(optional) The maximum number of concepts enrichments to extract from each instance of the specified field.

class NluEnrichmentEmotion(document=None, targets=None)[source]

Bases: object

An object specifying the emotion detection enrichment and related parameters.

Attr bool document

(optional) When true, emotion detection is performed on the entire field.

Attr list[str] targets

(optional) A comma-separated list of target strings that will have any associated emotions detected.

class NluEnrichmentEntities(sentiment=None, emotion=None, limit=None, mentions=None, mention_types=None, sentence_locations=None, model=None)[source]

Bases: object

An object specifying the Entities enrichment and related parameters.

Attr bool sentiment

(optional) When true, sentiment analysis of entities will be performed on the specified field.

Attr bool emotion

(optional) When true, emotion detection of entities will be performed on the specified field.

Attr int limit

(optional) The maximum number of entities to extract for each instance of the specified field.

Attr bool mentions

(optional) When true, the number of mentions of each identified entity is recorded. The default is false.

Attr bool mention_types

(optional) When true, the types of mentions for each identified entity are recorded. The default is false.

Attr bool sentence_locations

(optional) When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.

Attr str model

(optional) The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, the public model for use with Knowledge Graph en-news, or the default public model alchemy.

class NluEnrichmentFeatures(keywords=None, entities=None, sentiment=None, emotion=None, categories=None, semantic_roles=None, relations=None, concepts=None)[source]

Bases: object

NluEnrichmentFeatures.

Attr NluEnrichmentKeywords keywords

(optional) An object specifying the Keyword enrichment and related parameters.

Attr NluEnrichmentEntities entities

(optional) An object specifying the Entities enrichment and related parameters.

Attr NluEnrichmentSentiment sentiment

(optional) An object specifying the sentiment extraction enrichment and related parameters.

Attr NluEnrichmentEmotion emotion

(optional) An object specifying the emotion detection enrichment and related parameters.

Attr NluEnrichmentCategories categories

(optional) An object that indicates the Categories enrichment will be applied to the specified field.

Attr NluEnrichmentSemanticRoles semantic_roles

(optional) An object specifying the semantic roles enrichment and related parameters.

Attr NluEnrichmentRelations relations

(optional) An object specifying the relations enrichment and related parameters.

Attr NluEnrichmentConcepts concepts

(optional) An object specifying the concepts enrichment and related parameters.
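Each attribute of NluEnrichmentFeatures is one enrichment, and its value carries that enrichment's own parameters. The following is a minimal sketch of the equivalent JSON as plain Python dicts. The limits (50, 50, 8) are arbitrary illustrative values. The surrounding enrichment entry (enrichment, source_field, destination_field, options) reflects the Discovery v1 configuration shape as best understood and should be checked against the API reference. The helper build_nlu_features is hypothetical.

```python
import json


def build_nlu_features(keyword_limit=50, entity_limit=50, concept_limit=8):
    """Return a features dict mirroring NluEnrichmentFeatures.

    Each key is one enrichment and its value holds that enrichment's
    parameters. Categories takes no parameters, so it is an empty object.
    """
    return {
        "keywords": {"sentiment": True, "emotion": False, "limit": keyword_limit},
        "entities": {
            "sentiment": True,
            "emotion": False,
            "limit": entity_limit,
            "mentions": False,  # documented default is false
            "sentence_locations": False,  # documented default is false
        },
        "sentiment": {"document": True},
        "categories": {},
        "concepts": {"limit": concept_limit},
    }


# Hypothetical enrichment entry; field names outside "features" are an assumption.
enrichment = {
    "enrichment": "natural_language_understanding",
    "source_field": "text",
    "destination_field": "enriched_text",
    "options": {"features": build_nlu_features()},
}
print(json.dumps(enrichment, indent=2))
```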

class NluEnrichmentKeywords(sentiment=None, emotion=None, limit=None)[source]

Bases: object

An object specifying the Keyword enrichment and related parameters.

Attr bool sentiment

(optional) When true, sentiment analysis of keywords will be performed on the specified field.

Attr bool emotion

(optional) When true, emotion detection of keywords will be performed on the specified field.

Attr int limit

(optional) The maximum number of keywords to extract for each instance of the specified field.

class NluEnrichmentRelations(model=None)[source]

Bases: object

An object specifying the relations enrichment and related parameters.

Attr str model

(optional) For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio or the public model for use with Knowledge Graph, en-news. The default is en-news.

class NluEnrichmentSemanticRoles(entities=None, keywords=None, limit=None)[source]

Bases: object

An object specifying the semantic roles enrichment and related parameters.

Attr bool entities

(optional) When true, entities are extracted from the identified sentence parts.

Attr bool keywords

(optional) When true, keywords are extracted from the identified sentence parts.

Attr int limit

(optional) The maximum number of semantic roles enrichments to extract from each instance of the specified field.

class NluEnrichmentSentiment(document=None, targets=None)[source]

Bases: object

An object specifying the sentiment extraction enrichment and related parameters.

Attr bool document

(optional) When true, sentiment analysis is performed on the entire field.

Attr list[str] targets

(optional) A comma-separated list of target strings that will have any associated sentiment analyzed.

class NormalizationOperation(operation=None, source_field=None, destination_field=None)[source]

Bases: object

NormalizationOperation.

Attr str operation

(optional) Identifies what type of operation to perform.

  • copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.

  • move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. move is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).

  • merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.

  • remove - Deletes the source_field field. The destination_field is ignored for this operation.

  • remove_nulls - Removes all nested null (blank) field values from the ingested document. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire ingested document. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, it can be time-expensive).

Attr str source_field

(optional) The source field for the operation.

Attr str destination_field

(optional) The destination field for the operation.
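The operation semantics above can be illustrated on a flat Python dict. The following is a minimal sketch of the documented behavior only. It is not the service's implementation and does not handle nested field paths. In particular, it treats a missing destination_field in merge as an empty array, which is an assumption. The helper name apply_normalization is hypothetical.

```python
def apply_normalization(doc, operation, source_field=None, destination_field=None):
    """Apply one normalization operation to a flat dict in place and return it."""
    if operation == "copy":
        if source_field in doc:
            doc[destination_field] = doc[source_field]
    elif operation == "move":
        # move is a copy followed by a remove of the source field
        if source_field in doc:
            doc[destination_field] = doc.pop(source_field)
    elif operation == "merge":
        dest = doc.get(destination_field)
        if dest is None:
            dest = []  # assumption: a missing destination becomes an empty array
        elif not isinstance(dest, list):
            dest = [dest]
        if source_field in doc:
            dest.append(doc.pop(source_field))
        doc[destination_field] = dest  # always an array afterwards
    elif operation == "remove":
        doc.pop(source_field, None)
    elif operation == "remove_nulls":
        _remove_nulls(doc)  # ignores source_field and destination_field
    else:
        raise ValueError(f"unknown operation: {operation}")
    return doc


def _remove_nulls(value):
    """Recursively drop None values from nested dicts and lists."""
    if isinstance(value, dict):
        for key in [k for k, v in value.items() if v is None]:
            del value[key]
        for v in value.values():
            _remove_nulls(v)
    elif isinstance(value, list):
        value[:] = [v for v in value if v is not None]
        for v in value:
            _remove_nulls(v)


print(apply_normalization({"a": 1, "b": 2}, "merge", "a", "b"))  # {'b': [2, 1]}
```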

class Notice(notice_id=None, created=None, document_id=None, query_id=None, severity=None, step=None, description=None)[source]

Bases: object

A notice produced for the collection.

Attr str notice_id

(optional) Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list; other values might be returned.

Attr datetime created

(optional) The creation date of the collection in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.

Attr str document_id

(optional) Unique identifier of the document.

Attr str query_id

(optional) Unique identifier of the query used for relevance training.

Attr str severity

(optional) Severity level of the notice.

Attr str step

(optional) Ingestion or training step in which the notice occurred. Typical step values include: classify_elements, smartDocumentUnderstanding, ingestion, indexing, convert. Note: This is not a complete list; other values might be returned.

Attr str description

(optional) The description of the notice.
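Because notice_id is meant for programmatic corrective action, a client might count notices and pick out documents to re-submit. The following is a minimal sketch over notices given as plain dicts. Which notice IDs count as transient, and therefore worth retrying, is a hypothetical policy chosen for illustration and is not defined by the service. The helper triage_notices is also hypothetical.

```python
from collections import Counter

# Hypothetical policy: notice IDs treated as transient and worth re-submitting.
RETRYABLE_NOTICE_IDS = {
    "index_failed_too_many_requests",
    "index_failed_cluster_unavailable",
    "ingestion_timeout",
}


def triage_notices(notices):
    """Count notices by notice_id and collect document IDs worth retrying.

    Unknown notice IDs are counted but never retried, because the
    documented list of IDs is not exhaustive.
    """
    counts = Counter(n.get("notice_id") for n in notices)
    retry = sorted({
        n["document_id"]
        for n in notices
        if n.get("notice_id") in RETRYABLE_NOTICE_IDS and n.get("document_id")
    })
    return counts, retry


sample_notices = [
    {"notice_id": "ingestion_timeout", "document_id": "doc-2"},
    {"notice_id": "index_failed_too_many_requests", "document_id": "doc-1"},
    {"notice_id": "ingestion_timeout", "document_id": "doc-2"},
    {"notice_id": "missing_model", "document_id": "doc-3"},
]
counts, retry = triage_notices(sample_notices)
print(retry)  # ['doc-1', 'doc-2']
```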

class PdfHeadingDetection(fonts=None)[source]

Bases: object

PdfHeadingDetection.

Attr list[FontSetting] fonts

(optional)

class PdfSettings(heading=None)[source]

Bases: object

A list of PDF conversion settings.

Attr PdfHeadingDetection heading

(optional)

class QueryAggregation(type=None, results=None, matching_results=None, aggregations=None)[source]

Bases: object

An aggregation produced by Discovery to analyze the input provided.

Attr str type

(optional) The type of aggregation command used. For example: term, filter, max, min, etc.

Attr list[AggregationResult] results

(optional) Array of aggregation results.

Attr int matching_results

(optional) Number of matching results.

Attr list[QueryAggregation] aggregations

(optional) Aggregations returned by Discovery.

class QueryEntitiesContext(text=None)[source]

Bases: object

Entity text to provide context for the queried entity and rank based on that association. For example, if you wanted to query the city of London in England, your query would look for London with the context of England.

Attr str text

(optional) Entity text to provide context for the queried entity and rank based on that association. For example, if you wanted to query the city of London in England, your query would look for London with the context of England.

class QueryEntitiesEntity(text=None, type=None)[source]

Bases: object

A text string that appears within the entity text field.

Attr str text

(optional) Entity text content.

Attr str type

(optional) The type of the specified entity.

class QueryEntitiesResponse(entities=None)[source]

Bases: object

An object that contains an array of entities resulting from the query.

Attr list[QueryEntitiesResponseItem] entities

(optional) Array of entities that result from the query.

class QueryEntitiesResponseItem(text=None, type=None, evidence=None)[source]

Bases: object

Object containing Entity query response information.

Attr str text

(optional) Entity text content.

Attr str type

(optional) The type of the result entity.

Attr list[QueryEvidence] evidence

(optional) List of different evidentiary items to support the result.

class QueryEvidence(document_id=None, field=None, start_offset=None, end_offset=None, entities=None)[source]

Bases: object

Description of evidence location supporting Knowledge Graph query result.

Attr str document_id

(optional) The document ID (as indexed in Discovery) of the evidence location.

Attr str field

(optional) The field of the document where the supporting evidence was identified.

Attr int start_offset

(optional) The start location of the evidence in the identified field. This value is inclusive.

Attr int end_offset

(optional) The end location of the evidence in the identified field. This value is inclusive.

Attr list[QueryEvidenceEntity] entities

(optional) An array of entity objects that show evidence of the result.

class QueryEvidenceEntity(type=None, text=None, start_offset=None, end_offset=None)[source]

Bases: object

Entity description and location within evidence field.

Attr str type

(optional) The entity type for this entity. Possible types vary based on model used.

Attr str text

(optional) The original text of this entity as found in the evidence field.

Attr int start_offset

(optional) The start location of the entity text in the identified field. This value is inclusive.

Attr int end_offset

(optional) The end location of the entity text in the identified field. This value is exclusive.
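The two evidence classes use different end_offset conventions. QueryEvidence documents its end_offset as inclusive, and QueryEvidenceEntity documents its end_offset as exclusive. That difference changes how the text is sliced. The following is a minimal sketch. It assumes offsets are character indexes into the field's string, which is an assumption: they could instead be byte or code-unit offsets. The helper names are hypothetical.

```python
def evidence_span(field_text, start_offset, end_offset):
    """Text covered by a QueryEvidence location (end_offset is inclusive)."""
    return field_text[start_offset:end_offset + 1]


def entity_span(field_text, start_offset, end_offset):
    """Text covered by a QueryEvidenceEntity (end_offset is exclusive)."""
    return field_text[start_offset:end_offset]


text = "IBM opened an office in London, England."
print(entity_span(text, 24, 30))  # London
print(evidence_span(text, 24, 29))  # London
```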

class QueryFilterType(exclude=None, include=None)[source]

Bases: object

QueryFilterType.

Attr list[str] exclude

(optional) A comma-separated list of types to exclude.

Attr list[str] include

(optional) A comma-separated list of types to include. All other types are excluded.
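The include and exclude lists of QueryFilterType translate naturally into a predicate on entity or relation types. The following is a minimal sketch. It assumes that exclude wins when a type appears in both lists, because the reference above does not say what happens in that case. The helper name type_filter and the type names in the example are hypothetical.

```python
def type_filter(include=None, exclude=None):
    """Build a predicate mirroring QueryFilterType.

    When `include` is given, only those types pass (all others are excluded).
    Types listed in `exclude` never pass (assumed to take precedence).
    """
    include_set = set(include) if include else None
    exclude_set = set(exclude or ())

    def allowed(entity_type):
        if entity_type in exclude_set:
            return False
        return include_set is None or entity_type in include_set

    return allowed


only_people_and_places = type_filter(include=["Person", "Location"])
print(only_people_and_places("Company"))  # False
```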

class QueryNoticesResponse(matching_results=None, results=None, aggregations=None, passages=None, duplicates_removed=None)[source]

Bases: object

QueryNoticesResponse.

Attr int matching_results

(optional) The number of matching results.

Attr list[QueryNoticesResult] results

(optional) Array of document results that match the query.

Attr list[QueryAggregation] aggregations

(optional) Array of aggregation results that match the query.

Attr list[QueryPassages] passages

(optional) Array of passage results that match the query.

Attr int duplicates_removed

(optional) The number of duplicates removed from this notices query.

class QueryNoticesResult(id=None, metadata=None, collection_id=None, result_metadata=None, title=None, code=None, filename=None, file_type=None, sha1=None, notices=None, **kwargs)[source]

Bases: object

QueryNoticesResult.

Attr str id

(optional) The unique identifier of the document.

Attr dict metadata

(optional) Metadata of the document.

Attr str collection_id

(optional) The collection ID of the collection containing the document for this result.

Attr QueryResultMetadata result_metadata

(optional) Metadata of a query result.

Attr str title

(optional) Automatically extracted result title.

Attr int code

(optional) The internal status code returned by the ingestion subsystem indicating the overall result of ingesting the source document.

Attr str filename

(optional) Name of the original source file (if available).

Attr str file_type

(optional) The type of the original source file.

Attr str sha1

(optional) The SHA-1 hash of the original source file (formatted as a hexadecimal string).

Attr list[Notice] notices

(optional) Array of notices for the document.

class QueryPassages(document_id=None, passage_score=None, passage_text=None, start_offset=None, end_offset=None, field=None)[source]

Bases: object

QueryPassages.

Attr str document_id

(optional) The unique identifier of the document from which the passage has been extracted.

Attr float passage_score

(optional) The confidence score of the passage’s analysis. A higher score indicates greater confidence.

Attr str passage_text

(optional) The content of the extracted passage.

Attr int start_offset

(optional) The position of the first character of the extracted passage in the originating field.

Attr int end_offset

(optional) The position of the last character of the extracted passage in the originating field.

Attr str field

(optional) The label of the field from which the passage has been extracted.

class QueryRelationsArgument(entities=None)[source]

Bases: object

QueryRelationsArgument.

Attr list[QueryEntitiesEntity] entities

(optional) Array of query entities.

class QueryRelationsEntity(text=None, type=None, exact=None)[source]

Bases: object

QueryRelationsEntity.

Attr str text

(optional) Entity text content.

Attr str type

(optional) The type of the specified entity.

Attr bool exact

(optional) If false, implicit querying is performed. The default is false.

class QueryRelationsFilter(relation_types=None, entity_types=None, document_ids=None)[source]

Bases: object

QueryRelationsFilter.

Attr QueryFilterType relation_types

(optional)

Attr QueryFilterType entity_types

(optional)

Attr list[str] document_ids

(optional) A comma-separated list of document IDs to include in the query.

class QueryRelationsRelationship(type=None, frequency=None, arguments=None, evidence=None)[source]

Bases: object

QueryRelationsRelationship.

Attr str type

(optional) The identified relationship type.

Attr int frequency

(optional) The number of times the relationship is mentioned.

Attr list[QueryRelationsArgument] arguments

(optional) Information about the relationship.

Attr list[QueryEvidence] evidence

(optional) List of different evidentiary items to support the result.

class QueryRelationsResponse(relations=None)[source]

Bases: object

QueryRelationsResponse.

Attr list[QueryRelationsRelationship] relations

(optional) Array of relationships for the relations query.

class QueryResponse(matching_results=None, results=None, aggregations=None, passages=None, duplicates_removed=None, session_token=None, retrieval_details=None)[source]

Bases: object

A response containing the documents and aggregations for the query.

Attr int matching_results

(optional) The number of matching results for the query.

Attr list[QueryResult] results

(optional) Array of document results for the query.

Attr list[QueryAggregation] aggregations

(optional) Array of aggregation results for the query.

Attr list[QueryPassages] passages

(optional) Array of passage results for the query.

Attr int duplicates_removed

(optional) The number of duplicate results removed.

Attr str session_token

(optional) The session token for this query. The session token can be used to add events associated with this query to the query and event log. Important: Session tokens are case sensitive.

Attr RetrievalDetails retrieval_details

(optional) An object containing retrieval type information.

class QueryResult(id=None, metadata=None, collection_id=None, result_metadata=None, title=None, **kwargs)[source]

Bases: object

QueryResult.

Attr str id

(optional) The unique identifier of the document.

Attr dict metadata

(optional) Metadata of the document.

Attr str collection_id

(optional) The collection ID of the collection containing the document for this result.

Attr QueryResultMetadata result_metadata

(optional) Metadata of a query result.

Attr str title

(optional) Automatically extracted result title.

class QueryResultMetadata(score, confidence=None)[source]

Bases: object

Metadata of a query result.

Attr float score

An unbounded measure of the relevance of a particular result, dependent on the query and matching document. A higher score indicates a greater match to the query parameters.

Attr float confidence

(optional) The confidence score for the given result. Calculated based on how relevant the result is estimated to be. confidence can range from 0.0 to 1.0. The higher the number, the more relevant the document. The confidence value for a result was calculated using the model specified in the document_retrieval_strategy field of the result set.
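score is unbounded and depends on the query, but confidence is confined to 0.0 through 1.0. That makes confidence the more natural value to threshold on. The following is a minimal sketch over query results given as plain dicts. The 0.5 cutoff is an arbitrary illustrative value. Results without a confidence value are dropped, which is a design choice rather than documented behavior. The helper confident_results is hypothetical.

```python
def _confidence(result):
    """Return result_metadata.confidence, or None when it is absent."""
    return (result.get("result_metadata") or {}).get("confidence")


def confident_results(results, min_confidence=0.5):
    """Keep results with confidence >= min_confidence, most confident first."""
    kept = [
        r for r in results
        if _confidence(r) is not None and _confidence(r) >= min_confidence
    ]
    return sorted(kept, key=_confidence, reverse=True)


sample_results = [
    {"id": "a", "result_metadata": {"score": 12.3, "confidence": 0.41}},
    {"id": "b", "result_metadata": {"score": 9.8, "confidence": 0.87}},
    {"id": "c", "result_metadata": {"score": 15.0}},
    {"id": "d", "result_metadata": {"score": 11.0, "confidence": 0.62}},
]
print([r["id"] for r in confident_results(sample_results)])  # ['b', 'd']
```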

class RetrievalDetails(document_retrieval_strategy=None)[source]

Bases: object

An object containing retrieval type information.

Attr str document_retrieval_strategy

(optional) Identifies the document retrieval strategy used for this query. relevancy_training indicates that the results were returned using a relevancy trained model. continuous_relevancy_training indicates that the results were returned using the continuous relevancy training model created by result feedback analysis. untrained means the results were returned using the standard untrained model.

Note: If a trained collection is queried but the trained model is not used to return results, the document_retrieval_strategy is listed as untrained.

class SduStatus(enabled=None, total_annotated_pages=None, total_pages=None, total_documents=None, custom_fields=None)[source]

Bases: object

Object containing smart document understanding information for this collection.

Attr bool enabled

(optional) When true, smart document understanding conversion is enabled for this collection. All collections created with a version date after 2019-04-30 have smart document understanding enabled. If false, documents added to the collection are converted using the conversion settings specified in the configuration associated with the collection.

Attr int total_annotated_pages

(optional) The total number of pages annotated using smart document understanding in this collection.

Attr int total_pages

(optional) The current number of pages that can be used for training smart document understanding. The total_pages number is calculated as the total number of pages identified from the documents listed in the total_documents field.

Attr int total_documents

(optional) The total number of documents in this collection that can be used to train smart document understanding. For Lite plan collections, the maximum is the first 20 uploaded documents (not including HTML or JSON documents). For other plans, the maximum is the first 40 uploaded documents (not including HTML or JSON documents). When the maximum is reached, additional documents uploaded to the collection are not considered for training smart document understanding.

Attr SduStatusCustomFields custom_fields

(optional) Information about custom smart document understanding fields that exist in this collection.
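The total_documents rule amounts to: take the uploads in order, skip HTML and JSON documents, and count only the first 20 (Lite) or 40 (other plans). The following is a minimal sketch. It identifies HTML and JSON documents by file extension, which is an assumption, since the service may detect type differently. The helper sdu_training_candidates is hypothetical.

```python
def sdu_training_candidates(filenames, lite_plan):
    """Return the uploaded files that count toward total_documents.

    `filenames` must be in upload order. HTML and JSON files are skipped,
    and only the first 20 (Lite) or 40 (other plans) remaining files count.
    """
    maximum = 20 if lite_plan else 40
    eligible = [
        name for name in filenames
        # assumption: file type is inferred from the extension
        if not name.lower().endswith((".html", ".htm", ".json"))
    ]
    return eligible[:maximum]


uploads = ["index.html", "data.json"] + [f"doc{i}.pdf" for i in range(25)]
print(len(sdu_training_candidates(uploads, lite_plan=True)))  # 20
```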

class SduStatusCustomFields(defined=None, maximum_allowed=None)[source]

Bases: object

Information about custom smart document understanding fields that exist in this collection.

Attr int defined

(optional) The number of custom fields defined for this collection.

Attr int maximum_allowed

(optional) The maximum number of custom fields that are allowed in this collection.

class SearchStatus(scope=None, status=None, status_description=None, last_trained=None)[source]

Bases: object

Information about the Continuous Relevancy Training for this environment.

Attr str scope

(optional) Current scope of the training. Always returned as environment.

Attr str status

(optional) The current status of Continuous Relevancy Training for this environment.

Attr str status_description

(optional) Long description of the current Continuous Relevancy Training status.

Attr date last_trained

(optional) The date stamp of the most recent completed training for this environment.

class SegmentSettings(enabled=None, selector_tags=None, annotated_fields=None)[source]

Bases: object

A list of Document Segmentation settings.

Attr bool enabled

(optional) Enables/disables the Document Segmentation feature.

Attr list[str] selector_tags

(optional) Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6. The content of the header field that the segmentation splits at is used as the title field for that segmented result. Only valid if used with a collection that has enabled set to false in the smart_document_understanding object.

Attr list[str] annotated_fields

(optional) Defines the annotated smart document understanding fields that the document is split on. The content of the annotated field that the segmentation splits at is used as the title field for that segmented result. For example, if the field sub-title is specified, then each time the smart document understanding conversion encounters a field of type sub-title in an uploaded document, the document is split at that point and the content of the field is used as the title of the remaining content. This split is performed for all instances of the listed fields in the uploaded document. Only valid if used with a collection that has enabled set to true in the smart_document_understanding object.
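The selector_tags behavior can be pictured as splitting a flat sequence of elements at each matching heading and using that heading's text as the segment title. The following is a minimal sketch over (tag, text) pairs rather than real HTML. Keeping content before the first matching heading as an untitled leading segment is an assumption, not documented service behavior. The helper segment_by_headings is hypothetical.

```python
def segment_by_headings(elements, selector_tags=("h2",)):
    """Split a flat list of (tag, text) elements into titled segments.

    A new segment starts at every element whose tag is in selector_tags,
    and that heading's text becomes the segment title. Other elements,
    including non-selected headings, become segment content.
    """
    segments = []
    current = {"title": None, "content": []}
    for tag, text in elements:
        if tag in selector_tags:
            if current["title"] is not None or current["content"]:
                segments.append(current)
            current = {"title": text, "content": []}
        else:
            current["content"].append(text)
    if current["title"] is not None or current["content"]:
        segments.append(current)
    return segments


page = [
    ("p", "intro"),
    ("h2", "Install"),
    ("p", "pip install"),
    ("h3", "Sub"),
    ("p", "more"),
    ("h2", "Usage"),
    ("p", "call it"),
]
for segment in segment_by_headings(page):
    print(segment["title"], segment["content"])
```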

class Source(type=None, credential_id=None, schedule=None, options=None)[source]

Bases: object

Object containing source parameters for the configuration.

Attr str type

(optional) The type of source to connect to.

  • box indicates the configuration is to connect to an instance of Enterprise Box.

  • salesforce indicates the configuration is to connect to Salesforce.

  • sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.

  • web_crawl indicates the configuration is to perform a web page crawl.

  • cloud_object_storage indicates the configuration is to connect to a cloud object store.

Attr str credential_id

(optional) The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.

Attr SourceSchedule schedule

(optional) Object containing the schedule information for the source.

Attr SourceOptions options

(optional) The options object defines which items to crawl from the source system.

class SourceOptions(folders=None, objects=None, site_collections=None, urls=None, buckets=None, crawl_all_buckets=None)[source]

Bases: object

The options object defines which items to crawl from the source system.

Attr list[SourceOptionsFolder] folders

(optional) Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.

Attr list[SourceOptionsObject] objects

(optional) Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.

Attr list[SourceOptionsSiteColl] site_collections

(optional) Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid and required when the type field of the source object is set to sharepoint.

Attr list[SourceOptionsWebCrawl] urls

(optional) Array of Web page URLs to begin crawling the web from. Only valid and required when the type field of the source object is set to web_crawl.

Attr list[SourceOptionsBuckets] buckets

(optional) Array of cloud object store buckets to begin crawling. Only valid and required when the type field of the source object is set to cloud_object_storage, and the crawl_all_buckets field is false or not specified.

Attr bool crawl_all_buckets

(optional) When true, all buckets in the specified cloud object store are crawled. If set to true, the buckets array must not be specified.
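The SourceOptions rules can be checked before a configuration is sent. Each source type accepts exactly one options array, and for cloud object storage, buckets and crawl_all_buckets are mutually exclusive. The following is a minimal client-side sketch over plain dicts. It uses the cloud_object_storage type value listed for Source. The helper validate_source_options is hypothetical, and the service performs its own validation.

```python
ALLOWED_OPTIONS_BY_TYPE = {
    "box": {"folders"},
    "salesforce": {"objects"},
    "sharepoint": {"site_collections"},
    "web_crawl": {"urls"},
    "cloud_object_storage": {"buckets", "crawl_all_buckets"},
}


def validate_source_options(source_type, options):
    """Return a list of problems with `options` for `source_type` (empty if none)."""
    if source_type not in ALLOWED_OPTIONS_BY_TYPE:
        return [f"unknown source type: {source_type}"]
    allowed = ALLOWED_OPTIONS_BY_TYPE[source_type]
    problems = []
    # Options are "only valid" for their own source type.
    for key in options:
        if key not in allowed:
            problems.append(f"'{key}' is not valid when type is {source_type}")
    if source_type == "cloud_object_storage":
        crawl_all = bool(options.get("crawl_all_buckets"))
        if crawl_all and "buckets" in options:
            problems.append("'buckets' must not be specified when crawl_all_buckets is true")
        elif not crawl_all and not options.get("buckets"):
            problems.append("'buckets' is required unless crawl_all_buckets is true")
    else:
        (required,) = allowed  # exactly one options array per other type
        if not options.get(required):
            problems.append(f"'{required}' is required when type is {source_type}")
    return problems


print(validate_source_options("web_crawl", {}))
```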

class SourceOptionsBuckets(name, limit=None)[source]

Bases: object

Object defining a cloud object store bucket to crawl.

Attr str name

The name of the cloud object store bucket to crawl.

Attr int limit

(optional) The number of documents to crawl from this cloud object store bucket. If not specified, all documents in the bucket are crawled.

class SourceOptionsFolder(owner_user_id, folder_id, limit=None)[source]

Bases: object

Object that defines a Box folder to crawl with this configuration.

Attr str owner_user_id

The Box user ID of the user who owns the folder to crawl.

Attr str folder_id

The Box folder ID of the folder to crawl.

Attr int limit

(optional) The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.

class SourceOptionsObject(name, limit=None)[source]

Bases: object

Object that defines a Salesforce document object type to crawl with this configuration.

Attr str name

The name of the Salesforce document object to crawl. For example, case.

Attr int limit

(optional) The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.

class SourceOptionsSiteColl(site_collection_path, limit=None)[source]

Bases: object

Object that defines a Microsoft SharePoint site collection to crawl with this configuration.

Attr str site_collection_path

The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.

Attr int limit

(optional) The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.

class SourceOptionsWebCrawl(url, limit_to_starting_hosts=None, crawl_speed=None, allow_untrusted_certificate=None, maximum_hops=None, request_timeout=None, override_robots_txt=None, blacklist=None)[source]

Bases: object

Object defining which URL to crawl and how to crawl it.

Attr str url

The starting URL to crawl.

Attr bool limit_to_starting_hosts

(optional) When true, crawls of the specified URL are limited to the host part of the url field.

Attr str crawl_speed

(optional) The number of concurrent URLs to fetch. gentle means one URL is fetched at a time with a delay between each call. normal means as many as two URLs are fetched concurrently with a short delay between fetch calls. aggressive means that up to ten URLs are fetched concurrently with a short delay between fetch calls.

Attr bool allow_untrusted_certificate

(optional) When true, allows the crawl to interact with HTTPS sites whose SSL certificates have untrusted signers.

Attr int maximum_hops

(optional) The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within maximum_hops of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.

Attr int request_timeout

(optional) The maximum number of milliseconds to wait for a response from the web server.

Attr bool override_robots_txt

(optional) When true, the crawler ignores any robots.txt file it encounters. Do this only when crawling a website that you own. This must be set to true when a gateway_id is specified in the credentials.

Attr list[str] blacklist

(optional) Array of URLs to be excluded while crawling. The crawler does not follow links that contain any of these strings. For example, listing https://ibm.com/watson also excludes https://ibm.com/watson/discovery.
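The maximum_hops and blacklist descriptions together define which pages a crawl can reach. The sketch below models those two rules as a breadth-first search over an in-memory link graph; it is an illustration of the documented semantics, not the service's crawler. The links mapping and the reachable_urls helper are hypothetical, and blacklist matching is a plain substring test, following the wording above.

```python
from collections import deque


def reachable_urls(links, start, maximum_hops, blacklist=()):
    """Return {url: hops} for pages a hop-limited crawl would visit.

    links: mapping of each URL to the URLs it links to (stands in for fetched pages).
    A URL is skipped if it contains any blacklist entry as a substring.
    """
    def excluded(url):
        return any(entry in url for entry in blacklist)

    if excluded(start):
        return {}
    hops = {start: 0}  # the first page crawled is 0 hops
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if hops[url] == maximum_hops:
            continue  # links on this page would exceed maximum_hops
        for target in links.get(url, []):
            if target not in hops and not excluded(target):
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops
```

Breadth-first order guarantees that each page is recorded with its smallest hop count. With maximum_hops=2, a chain root, a, b, c stops at b (2 hops), and any link containing a blacklisted string such as https://ibm.com/watson is never followed.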

class SourceSchedule(enabled=None, time_zone=None, frequency=None)[source]

Bases: object

Object containing the schedule information for the source.

Attr bool enabled

(optional) When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled; when false and connecting to Salesforce, the source is crawled annually.

Attr str time_zone

(optional) The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.

Attr str frequency

(optional) The crawl schedule in the specified time_zone.

- five_minutes: Runs every five minutes.
- hourly: Runs every hour.
- daily: Runs every day between 00:00 and 06:00.
- weekly: Runs every week on Sunday between 00:00 and 06:00.
- monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
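A small sketch of the schedule rules, assuming the schedule is sent as a plain dict with the attribute names above. build_schedule checks frequency against the documented values, and first_sunday computes the day on which a monthly crawl window opens. Both helpers are hypothetical, and the time_zone string is passed through without validation.

```python
import datetime

# Frequency values documented for SourceSchedule.
FREQUENCIES = {"five_minutes", "hourly", "daily", "weekly", "monthly"}


def build_schedule(frequency, time_zone, enabled=True):
    """Build a SourceSchedule-shaped dict after validating frequency (hypothetical helper)."""
    if frequency not in FREQUENCIES:
        raise ValueError(f"unsupported frequency: {frequency!r}")
    return {"enabled": enabled, "time_zone": time_zone, "frequency": frequency}


def first_sunday(year, month):
    """Return the first Sunday of a month, the day a monthly crawl window opens."""
    first = datetime.date(year, month, 1)
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    return first + datetime.timedelta(days=(6 - first.weekday()) % 7)
```

For example, first_sunday(2024, 10) returns datetime.date(2024, 10, 6), so a monthly schedule would run that day between 00:00 and 06:00 in the configured time_zone.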

class SourceStatus(status=None, next_crawl=None)[source]

Bases: object

Object containing source crawl status information.

Attr str status

(optional) The current status of the source crawl for this collection. This field returns not_configured if the default configuration for this source does not have a source object defined.

- running indicates that a crawl to fetch more documents is in progress.
- complete indicates that the crawl has completed with no errors.
- queued indicates that the crawl has been paused by the system and will automatically restart when possible.
- unknown indicates that an unidentified error has occurred in the service.

Attr datetime next_crawl

(optional) Date in RFC 3339 format indicating the time of the next crawl attempt.
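When polling a collection, the status values above fall into three practical groups. The sketch below classifies a SourceStatus-shaped dict and parses next_crawl. It assumes the raw JSON response, where next_crawl arrives as an RFC 3339 string (the SDK model types it as datetime); crawl_state and next_crawl_time are hypothetical helpers.

```python
import datetime

IN_PROGRESS = {"running", "queued"}


def crawl_state(source_status):
    """Classify a SourceStatus-shaped dict as 'waiting', 'done', or 'attention'."""
    status = source_status.get("status")
    if status in IN_PROGRESS:
        return "waiting"  # crawling now, or paused by the system and due to restart
    if status == "complete":
        return "done"
    return "attention"  # not_configured, unknown, or an unexpected value


def next_crawl_time(source_status):
    """Parse next_crawl (RFC 3339) into an aware datetime, or return None."""
    value = source_status.get("next_crawl")
    if not value:
        return None
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    return datetime.datetime.fromisoformat(value.replace("Z", "+00:00"))
```

Treating queued as "waiting" follows the description above: the system restarts the crawl on its own, so a poller should keep waiting rather than report a failure.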

class Term(type=None, results=None, matching_results=None, aggregations=None, field=None, count=None)[source]

Bases: object

Term.

Attr str field

(optional) The field where the aggregation is located in the document.

Attr int count

(optional)

class TestDocument(configuration_id=None, status=None, enriched_field_units=None, original_media_type=None, snapshots=None, notices=None)[source]

Bases: object

TestDocument.

Attr str configuration_id

(optional) The unique identifier for the configuration.

Attr str status

(optional) Status of the preview operation.

Attr int enriched_field_units

(optional) The number of 10-kB chunks of field data that were enriched. This can be used to estimate the cost of running a real ingestion.

Attr str original_media_type

(optional) Format of the test document.

Attr list[DocumentSnapshot] snapshots

(optional) An array of objects that describe each step in the preview process.

Attr list[Notice] notices

(optional) An array of notice messages about the preview operation.
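As a worked example of the cost estimate mentioned above: if previews of three sample documents report enriched_field_units of 3, 5 and 4, the average is 4 units per document, so ingesting 1,000 similar documents would enrich roughly 4 × 1,000 = 4,000 chunks. The sketch below performs that extrapolation. It assumes the samples are representative of the full ingestion; estimate_ingestion_units is a hypothetical helper and does not model any pricing.

```python
import math


def estimate_ingestion_units(preview_units, document_count):
    """Extrapolate enriched_field_units from test previews to a full ingestion.

    preview_units: enriched_field_units values from previews of sample documents.
    Assumes the samples are representative of the documents to be ingested.
    """
    if not preview_units:
        raise ValueError("need at least one preview result")
    average = sum(preview_units) / len(preview_units)
    # Round up so the estimate errs on the high side.
    return math.ceil(average * document_count)
```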

class Timeslice(type=None, results=None, matching_results=None, aggregations=None, field=None, interval=None, anomaly=None)[source]

Bases: object

Timeslice.

Attr str field

(optional) The field where the aggregation is located in the document.

Attr str interval

(optional) Interval of the aggregation. Valid date interval values are second/seconds, minute/minutes, hour/hours, day/days, week/weeks, month/months, and year/years.

Attr bool anomaly

(optional) Used to indicate that anomaly detection should be performed. Anomaly detection is used to locate unusual data points within a time series.
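The sketch below builds a timeslice aggregation string from the field, interval and anomaly attributes. It assumes the Discovery query language form timeslice(field,interval) with an optional anomaly:true argument, and assumes an interval is written as a count followed by one of the units listed above (for example, 1day or 12hours). timeslice_aggregation is a hypothetical helper, not an SDK function.

```python
import re

# A positive count followed by a documented unit, singular or plural.
_INTERVAL = re.compile(r"^(\d+)(second|minute|hour|day|week|month|year)s?$")


def timeslice_aggregation(field, interval, anomaly=False):
    """Build a timeslice aggregation string (hypothetical helper)."""
    if not _INTERVAL.match(interval):
        raise ValueError(f"invalid interval: {interval!r}")
    parts = [field, interval]
    if anomaly:
        parts.append("anomaly:true")  # request anomaly detection on the series
    return "timeslice(" + ",".join(parts) + ")"
```

For example, timeslice_aggregation("publication_date", "1day", anomaly=True) returns "timeslice(publication_date,1day,anomaly:true)".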

class TokenDictRule(text, tokens, part_of_speech, readings=None)[source]

Bases: object

An object defining a single tokenization rule.

Attr str text

The string to tokenize.

Attr list[str] tokens

Array of tokens that the text field is split into when found.

Attr list[str] readings

(optional) Array of tokens that represent the content of the text field in an alternate character set.

Attr str part_of_speech

The part of speech that the text string belongs to. For example, noun. Custom parts of speech can be specified.
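A sketch of a single rule built as a plain dict with the TokenDictRule attribute names. The Japanese example splits すしネコ ("sushi cat") into two tokens, gives kanji readings, and uses the custom part of speech カスタム名詞 ("custom noun"). The tokenization_rule helper and the wrapping "tokenization_rules" key are assumptions for illustration.

```python
def tokenization_rule(text, tokens, part_of_speech, readings=None):
    """Build one TokenDictRule-shaped dict (hypothetical helper)."""
    if not tokens:
        raise ValueError("tokens must contain at least one token")
    rule = {"text": text, "tokens": list(tokens), "part_of_speech": part_of_speech}
    if readings is not None:
        rule["readings"] = list(readings)  # optional alternate-character-set tokens
    return rule


# Assumed request body shape: a list of rules under "tokenization_rules".
body = {
    "tokenization_rules": [
        tokenization_rule("すしネコ", ["すし", "ネコ"], "カスタム名詞",
                          readings=["寿司", "ネコ"]),
    ]
}
```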

class TokenDictStatusResponse(status=None, type=None)[source]

Bases: object

Object describing the current status of the wordlist.

Attr str status

(optional) Current wordlist status for the specified collection.

Attr str type

(optional) The type for this wordlist. Can be tokenization_dictionary or stopwords.

class TopHits(type=None, results=None, matching_results=None, aggregations=None, size=None, hits=None)[source]

Bases: object

TopHits.

Attr int size

(optional) Number of top hits returned by the aggregation.

Attr TopHitsResults hits

(optional)

class TopHitsResults(matching_results=None, hits=None)[source]

Bases: object

TopHitsResults.

Attr int matching_results

(optional) Number of matching results.

Attr list[QueryResult] hits

(optional) Top results returned by the aggregation.
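A TopHits aggregation nests its results one level down: the aggregation's hits attribute is a TopHitsResults object, whose own hits list holds the documents. The sketch below unpacks that nesting from a raw JSON dict. It assumes each hit is a QueryResult-shaped dict with an "id" key; top_hit_ids is a hypothetical helper.

```python
def top_hit_ids(aggregation):
    """Return the document IDs in a top_hits aggregation result (hypothetical helper).

    Assumes each hit is a QueryResult-shaped dict carrying an "id" key.
    """
    results = aggregation.get("hits") or {}  # the TopHitsResults object
    return [hit.get("id") for hit in results.get("hits") or []]
```

For example, given {"type": "top_hits", "size": 2, "hits": {"matching_results": 40, "hits": [{"id": "doc-1"}, {"id": "doc-2"}]}}, the helper returns ["doc-1", "doc-2"], while matching_results still reports all 40 matches.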

class TrainingDataSet(environment_id=None, collection_id=None, queries=None)[source]

Bases: object

TrainingDataSet.

Attr str environment_id

(optional) The environment ID associated with this training data set.

Attr str collection_id

(optional) The collection ID associated with this training data set.

Attr list[TrainingQuery] queries

(optional) Array of training queries.

class TrainingExample(document_id=None, cross_reference=None, relevance=None)[source]

Bases: object

TrainingExample.

Attr str document_id

(optional) The document ID associated with this training example.

Attr str cross_reference

(optional) The cross reference associated with this training example.

Attr int relevance

(optional) The relevance of the training example.

class TrainingExampleList(examples=None)[source]

Bases: object

TrainingExampleList.

Attr list[TrainingExample] examples

(optional) Array of training examples.

class TrainingQuery(query_id=None, natural_language_query=None, filter=None, examples=None)[source]

Bases: object

TrainingQuery.

Attr str query_id

(optional) The query ID associated with the training query.

Attr str natural_language_query

(optional) The natural text query for the training query.

Attr str filter

(optional) The filter used on the collection before the natural_language_query is applied.

Attr list[TrainingExample] examples

(optional) Array of training examples.
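A sketch of a training query built as a plain dict with the TrainingQuery and TrainingExample attribute names. The filter string, document IDs and relevance values are hypothetical; check the service documentation for the relevance scale it expects. The helper renames filter to query_filter to avoid shadowing Python's built-in.

```python
def training_query(natural_language_query, examples, query_filter=None):
    """Build a TrainingQuery-shaped dict (hypothetical helper).

    examples: iterable of (document_id, relevance) pairs.
    """
    query = {
        "natural_language_query": natural_language_query,
        "examples": [
            {"document_id": document_id, "relevance": relevance}
            for document_id, relevance in examples
        ],
    }
    if query_filter is not None:
        # Applied to the collection before natural_language_query.
        query["filter"] = query_filter
    return query
```

For example, training_query("reset password", [("doc-17", 10), ("doc-42", 0)], query_filter="language:en") pairs one question with a relevant and a non-relevant document.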

class TrainingStatus(total_examples=None, available=None, processing=None, minimum_queries_added=None, minimum_examples_added=None, sufficient_label_diversity=None, notices=None, successfully_trained=None, data_updated=None)[source]

Bases: object

TrainingStatus.

Attr int total_examples

(optional) The total number of training examples uploaded to this collection.

Attr bool available

(optional) When true, the collection has been successfully trained.

Attr bool processing

(optional) When true, the collection is currently processing training.

Attr bool minimum_queries_added

(optional) When true, the collection has a sufficient number of queries added for training to occur.

Attr bool minimum_examples_added

(optional) When true, the collection has a sufficient number of examples added for training to occur.

Attr bool sufficient_label_diversity

(optional) When true, the collection has sufficient diversity in labeled results for training to occur.

Attr int notices

(optional) The number of notices associated with this data set.

Attr datetime successfully_trained

(optional) The timestamp of when the collection was successfully trained.

Attr datetime data_updated

(optional) The timestamp of when the data was uploaded.
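Three of the TrainingStatus flags describe preconditions for training. The sketch below reports which of them are still false in a TrainingStatus-shaped dict, treating a missing flag as false. training_blockers and is_trained are hypothetical helpers.

```python
# Flags that, per the descriptions above, must be true for training to occur.
REQUIREMENTS = (
    "minimum_queries_added",
    "minimum_examples_added",
    "sufficient_label_diversity",
)


def training_blockers(status):
    """Return the TrainingStatus requirement flags that are not yet true."""
    return [name for name in REQUIREMENTS if not status.get(name)]


def is_trained(status):
    """True when the collection reports it has been successfully trained."""
    return bool(status.get("available"))
```

For example, a status with only minimum_queries_added set to true yields ["minimum_examples_added", "sufficient_label_diversity"], which points to adding more labeled examples with a mix of relevance values.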

class WordHeadingDetection(fonts=None, styles=None)[source]

Bases: object

WordHeadingDetection.

Attr list[FontSetting] fonts

(optional)

Attr list[WordStyle] styles

(optional)

class WordSettings(heading=None)[source]

Bases: object

Microsoft Word conversion settings.

Attr WordHeadingDetection heading

(optional)

class WordStyle(level=None, names=None)[source]

Bases: object

WordStyle.

Attr int level

(optional) The HTML heading level that content matching this style is tagged with.

Attr list[str] names

(optional) Array of Word style names to convert.
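WordSettings, WordHeadingDetection and WordStyle nest into a single heading-detection structure. The sketch below builds that structure as a plain dict from a mapping of HTML heading levels to Word style names, assuming the JSON keys match the attribute names documented here. word_heading_settings is a hypothetical helper, and the style names in the example are illustrative.

```python
def word_heading_settings(styles, fonts=None):
    """Build a WordSettings-shaped dict (hypothetical helper).

    styles: mapping of HTML heading level -> list of Word style names.
    fonts: optional list of FontSetting-shaped dicts, passed through unchanged.
    """
    heading = {
        "styles": [
            {"level": level, "names": list(names)}
            for level, names in sorted(styles.items())  # emit levels in order
        ]
    }
    if fonts is not None:
        heading["fonts"] = fonts
    return {"heading": heading}
```

For example, word_heading_settings({1: ["Title", "Heading 1"], 2: ["Heading 2"]}) tags paragraphs in the Title or Heading 1 styles as level-1 headings and Heading 2 paragraphs as level-2 headings.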

class XPathPatterns(xpaths=None)[source]

Bases: object

XPathPatterns.

Attr list[str] xpaths

(optional) An array of XPaths.
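An XPathPatterns object is only a list of XPath strings. To show what such a list selects, the sketch below removes matching elements from a small, well-formed page using Python's standard ElementTree. This is an illustration only: ElementTree supports a limited XPath subset (paths must be relative, such as .//div rather than //div) and is not the engine the service uses. remove_matching is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def remove_matching(markup, xpaths):
    """Remove every element matched by any of the XPaths; return the new markup.

    Uses ElementTree's limited XPath subset, so the markup must be
    well-formed XML and the paths must be relative.
    """
    root = ET.fromstring(markup)
    for path in xpaths:
        matched = {id(element) for element in root.findall(path)}
        # ElementTree elements have no parent pointer, so scan every parent.
        for parent in list(root.iter()):
            for child in list(parent):
                if id(child) in matched:
                    parent.remove(child)  # also drops the child's tail text
    return ET.tostring(root, encoding="unicode")
```

For example, applying [".//div[@class='nav']"] to <html><body><div class='nav'>menu</div><p>Body text</p></body></html> leaves <html><body><p>Body text</p></body></html>.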