ibm_watson.discovery_v2 module

IBM Watson™ Discovery for IBM Cloud Pak for Data is a cognitive search and content analytics engine that you can add to applications to identify patterns, trends and actionable insights to drive better decision-making. Securely unify structured and unstructured data with pre-enriched content, and use a simplified query language to eliminate the need for manual filtering of results.

class DiscoveryV2(version, authenticator=None)[source]

Bases: ibm_cloud_sdk_core.base_service.BaseService

The Discovery V2 service.

default_service_url = None
list_collections(project_id, **kwargs)[source]

List collections.

Lists existing collections for the specified project.

Parameters
  • project_id (str) – The ID of the project. This information can be found from the deploy page of the Discovery administrative tooling.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
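
As a minimal sketch of consuming this call, the helper below maps collection IDs to names. It assumes the parsed result is a dict shaped like ListCollectionsResponse (a collections array whose items carry collection_id and name). The helper name collection_names is illustrative and not part of the SDK; discovery stands for a DiscoveryV2 instance or any object with a compatible list_collections method.

```python
def collection_names(discovery, project_id):
    """Return {collection_id: name} for every collection in the project.

    `discovery` is assumed to be a DiscoveryV2 instance (or any object
    exposing a compatible list_collections method).
    """
    response = discovery.list_collections(project_id=project_id)
    result = response.get_result()  # parsed JSON body of the DetailedResponse
    return {
        item.get("collection_id"): item.get("name")
        for item in result.get("collections", [])
    }
```

The same DetailedResponse object also carries the headers and HTTP status code, which can be inspected when the result alone is not enough.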

query(project_id, *, collection_ids=None, filter=None, query=None, natural_language_query=None, aggregation=None, count=None, return_=None, offset=None, sort=None, highlight=None, spelling_suggestions=None, table_results=None, suggested_refinements=None, passages=None, **kwargs)[source]

Query a project.

By using this method, you can construct queries. For details, see the [Discovery documentation](https://cloud.ibm.com/docs/services/discovery-data?topic=discovery-data-query-concepts).

Parameters
  • project_id (str) – The ID of the project. This information can be found from the deploy page of the Discovery administrative tooling.

  • collection_ids (list[str]) – (optional) A comma-separated list of collection IDs to be queried against.

  • filter (str) – (optional) A cacheable query that excludes documents that don’t mention the query content. Filter searches are better for metadata-type searches and for assessing the concepts in the data set.

  • query (str) – (optional) A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results.

  • natural_language_query (str) – (optional) A natural language query that returns relevant documents by utilizing training data and natural language understanding.

  • aggregation (str) – (optional) An aggregation search that returns an exact answer by combining query search with filters. Useful for applications to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.

  • count (int) – (optional) Number of results to return.

  • return_ (list[str]) – (optional) A list of the fields in the document hierarchy to return. If this parameter is not specified, then all top-level fields are returned.

  • offset (int) – (optional) The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results.

  • sort (str) – (optional) A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified. This parameter cannot be used in the same query as the bias parameter.

  • highlight (bool) – (optional) When true, a highlight field is returned for each result. It contains the fields that match the query, with <em></em> tags around the matching query terms.

  • spelling_suggestions (bool) – (optional) When true and the natural_language_query parameter is used, the natural_language_query parameter is spell checked. The most likely correction is returned in the suggested_query field of the response (if one exists).

  • table_results (QueryLargeTableResults) – (optional) Configuration for table retrieval.

  • suggested_refinements (QueryLargeSuggestedRefinements) – (optional) Configuration for suggested refinements.

  • passages (QueryLargePassages) – (optional) Configuration for passage retrieval.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
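
The sort and paging parameters described above can be assembled as in the following sketch. The helper names build_sort and search are hypothetical. The result keys matching_results and results are taken from the QueryResponse class documented in this module, and discovery is assumed to be a DiscoveryV2 instance or a compatible object.

```python
def build_sort(fields):
    """Build the comma-separated sort string from (field, descending) pairs.

    Descending fields get a "-" prefix; ascending is the default, so
    ascending fields are emitted without a prefix.
    """
    return ",".join(
        ("-" + name) if descending else name
        for name, descending in fields
    )


def search(discovery, project_id, text, *, collection_ids=None,
           sort_fields=None, count=10, offset=0):
    """Run a natural language query and return (matching_results, results)."""
    kwargs = {
        "project_id": project_id,
        "natural_language_query": text,
        "count": count,
        "offset": offset,
    }
    # Only send optional parameters that were actually supplied.
    if collection_ids:
        kwargs["collection_ids"] = list(collection_ids)
    if sort_fields:
        kwargs["sort"] = build_sort(sort_fields)
    result = discovery.query(**kwargs).get_result()
    return result.get("matching_results", 0), result.get("results", [])
```

For example, build_sort([("date", True), ("title", False)]) produces the string -date,title.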

get_autocompletion(project_id, prefix, *, collection_ids=None, field=None, count=None, **kwargs)[source]

Get Autocomplete Suggestions.

Returns completion query suggestions for the specified prefix.

Parameters
  • project_id (str) – The ID of the project. This information can be found from the deploy page of the Discovery administrative tooling.

  • prefix (str) – The prefix to use for autocompletion. For example, the prefix Ho could autocomplete to Hot, Housing, or How do I upgrade. Possible completions are.

  • collection_ids (list[str]) – (optional) Comma separated list of the collection IDs. If this parameter is not specified, all collections in the project are used.

  • field (str) – (optional) The field in the result documents that autocompletion suggestions are identified from.

  • count (int) – (optional) The number of autocompletion suggestions to return.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
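
A small sketch of wiring this call into a type-ahead feature follows. The helper name suggest is hypothetical, and the completions key is assumed from the Completions class documented in this module.

```python
def suggest(discovery, project_id, prefix, limit=5):
    """Return up to `limit` autocompletion suggestions for `prefix`."""
    if not prefix:
        # Nothing typed yet, so skip the round trip entirely.
        return []
    response = discovery.get_autocompletion(
        project_id=project_id,
        prefix=prefix,
        count=limit,
    )
    return response.get_result().get("completions", [])
```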

query_notices(project_id, *, filter=None, query=None, natural_language_query=None, count=None, offset=None, **kwargs)[source]

Query system notices.

Queries for notices (errors or warnings) that might have been generated by the system. Notices are generated when ingesting documents and performing relevance training.

Parameters
  • project_id (str) – The ID of the project. This information can be found from the deploy page of the Discovery administrative tooling.

  • filter (str) – (optional) A cacheable query that excludes documents that don’t mention the query content. Filter searches are better for metadata-type searches and for assessing the concepts in the data set.

  • query (str) – (optional) A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first.

  • natural_language_query (str) – (optional) A natural language query that returns relevant documents by utilizing training data and natural language understanding.

  • count (int) – (optional) Number of results to return. The maximum for the count and offset values together in any one query is 10000.

  • offset (int) – (optional) The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results. The maximum for the count and offset values together in any one query is 10000.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
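
Because count and offset together may not exceed 10000 in any one query, paging code has to stop at that ceiling. The generator below is one way to sketch this; the names MAX_WINDOW and page_windows are illustrative only.

```python
MAX_WINDOW = 10000  # documented ceiling for count + offset in one query


def page_windows(total, page_size):
    """Yield (offset, count) pairs that cover up to `total` results.

    Iteration stops early so that offset + count never exceeds MAX_WINDOW.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    reachable = min(total, MAX_WINDOW)
    for offset in range(0, reachable, page_size):
        yield offset, min(page_size, reachable - offset)
```

Each yielded pair can be passed as the offset and count arguments of query_notices; results beyond the ceiling are not reachable by paging alone and would need a narrower filter.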

list_fields(project_id, *, collection_ids=None, **kwargs)[source]

List fields.

Gets a list of the unique fields (and their types) stored in the specified collections.

Parameters
  • project_id (str) – The ID of the project. This information can be found from the deploy page of the Discovery administrative tooling.

  • collection_ids (list[str]) – (optional) Comma separated list of the collection IDs. If this parameter is not specified, all collections in the project are used.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

get_component_settings(project_id, **kwargs)[source]

Configuration settings for components.

Returns default configuration settings for components.

Parameters
  • project_id (str) – The ID of the project. This information can be found from the deploy page of the Discovery administrative tooling.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

add_document(project_id, collection_id, *, file=None, filename=None, file_content_type=None, metadata=None, x_watson_discovery_force=None, **kwargs)[source]

Add a document.

Add a document to a collection with optional metadata.

Returns immediately after the system has accepted the document for processing.

  • The user must provide document content, metadata, or both. If the request is missing both document content and metadata, it is rejected.

  • The user can set the Content-Type parameter on the file part to indicate the media type of the document. If the Content-Type parameter is missing or is one of the generic media types (for example, application/octet-stream), then the service attempts to automatically detect the document’s media type.

  • The following field names are reserved and will be filtered out if present after normalization: id, score, highlight, and any field with the prefix of: _, +, or -

  • Fields with empty name values after normalization are filtered out before indexing.

  • Fields containing the following characters after normalization are filtered out before indexing: # and ,

If the document is uploaded to a collection that has its data shared with another collection, the X-Watson-Discovery-Force header must be set to true.

Note: Documents can be added with a specific document_id by using the /v2/projects/{project_id}/collections/{collection_id}/documents method.

Note: This operation only works on collections created to accept direct file uploads. It cannot be used to modify a collection that connects to an external source such as Microsoft SharePoint.

Parameters
  • project_id (str) – The ID of the project. This information can be found from the deploy page of the Discovery administrative tooling.

  • collection_id (str) – The ID of the collection.

  • file (file) – (optional) The content of the document to ingest. The maximum supported file size when adding a file to a collection is 50 megabytes; the maximum supported file size when testing a configuration is 1 megabyte. Files larger than the supported size are rejected.

  • filename (str) – (optional) The filename for file.

  • file_content_type (str) – (optional) The content type of file.

  • metadata (str) – (optional) The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected. Example: { "Creator": "Johnny Appleseed", "Subject": "Apples" }.

  • x_watson_discovery_force (bool) – (optional) When true, the uploaded document is added to the collection even if the data for that collection is shared with other collections.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
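
The field-name rules and the metadata size limit for add_document can be checked on the client before uploading. The sketch below is a rough pre-flight check, not the service's own logic: the documented rules apply after normalization, which this code does not reproduce, and "1 MB" is assumed here to mean 1024 * 1024 bytes. All names (filtered_field_names, encode_metadata, and the constants) are hypothetical.

```python
import json

RESERVED_NAMES = {"id", "score", "highlight"}
RESERVED_PREFIXES = ("_", "+", "-")
FORBIDDEN_CHARS = ("#", ",")
MAX_METADATA_BYTES = 1024 * 1024  # assumption: "1 MB" read as 1 MiB


def filtered_field_names(metadata):
    """Return the top-level names the service is documented to filter out.

    Checks raw names only; server-side normalization is not modeled.
    """
    dropped = []
    for name in metadata:
        if (not name
                or name in RESERVED_NAMES
                or name.startswith(RESERVED_PREFIXES)
                or any(ch in name for ch in FORBIDDEN_CHARS)):
            dropped.append(name)
    return dropped


def encode_metadata(metadata):
    """Serialize metadata to the JSON string passed as the metadata argument."""
    payload = json.dumps(metadata)
    if len(payload.encode("utf-8")) > MAX_METADATA_BYTES:
        raise ValueError("metadata exceeds the 1 MB limit")
    return payload
```

The string returned by encode_metadata is what would be passed as the metadata parameter, since that parameter is typed as str rather than dict.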

update_document(project_id, collection_id, document_id, *, file=None, filename=None, file_content_type=None, metadata=None, x_watson_discovery_force=None, **kwargs)[source]

Update a document.

Replace an existing document or add a document with a specified document_id. Starts ingesting a document with optional metadata. If the document is uploaded to a collection that has its data shared with another collection, the X-Watson-Discovery-Force header must be set to true. Note: When uploading a new document with this method, it automatically replaces any document stored with the same document_id if it exists. Note: This operation only works on collections created to accept direct file uploads. It cannot be used to modify a collection that connects to an external source such as Microsoft SharePoint.

Parameters
  • project_id (str) – The ID of the project. This information can be found from the deploy page of the Discovery administrative tooling.

  • collection_id (str) – The ID of the collection.

  • document_id (str) – The ID of the document.

  • file (file) – (optional) The content of the document to ingest. The maximum supported file size when adding a file to a collection is 50 megabytes; the maximum supported file size when testing a configuration is 1 megabyte. Files larger than the supported size are rejected.

  • filename (str) – (optional) The filename for file.

  • file_content_type (str) – (optional) The content type of file.

  • metadata (str) – (optional) The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected. Example: { "Creator": "Johnny Appleseed", "Subject": "Apples" }.

  • x_watson_discovery_force (bool) – (optional) When true, the uploaded document is added to the collection even if the data for that collection is shared with other collections.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
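
As a sketch of uploading a local file under a caller-chosen document_id, the helper below guesses the content type from the file extension and only sends it when it is one of the values listed in the FileContentType enum for this method. Otherwise it passes None and leaves detection to the service. The helper name upsert_document and the SUPPORTED_TYPES constant are illustrative.

```python
import mimetypes
import os

# Values taken from the FileContentType enum documented for this method.
SUPPORTED_TYPES = {
    "application/json",
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/pdf",
    "text/html",
    "application/xhtml+xml",
}


def upsert_document(discovery, project_id, collection_id, document_id, path):
    """Upload `path` as `document_id`, replacing any existing document."""
    guessed, _ = mimetypes.guess_type(path)
    content_type = guessed if guessed in SUPPORTED_TYPES else None
    # Open in binary mode and keep the handle open for the whole request.
    with open(path, "rb") as handle:
        response = discovery.update_document(
            project_id=project_id,
            collection_id=collection_id,
            document_id=document_id,
            file=handle,
            filename=os.path.basename(path),
            file_content_type=content_type,
        )
    return response.get_result()
```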

delete_document(project_id, collection_id, document_id, *, x_watson_discovery_force=None, **kwargs)[source]

Delete a document.

If the given document ID is invalid, or if the document is not found, then a success response is returned (HTTP status code 200) with the status set to ‘deleted’. Note: This operation only works on collections created to accept direct file uploads. It cannot be used to modify a collection that connects to an external source such as Microsoft SharePoint.

Parameters
  • project_id (str) – The ID of the project. This information can be found from the deploy page of the Discovery administrative tooling.

  • collection_id (str) – The ID of the collection.

  • document_id (str) – The ID of the document.

  • x_watson_discovery_force (bool) – (optional) When true, the uploaded document is added to the collection even if the data for that collection is shared with other collections.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

list_training_queries(project_id, **kwargs)[source]

List training queries.

List the training queries for the specified project.

Parameters
  • project_id (str) – The ID of the project. This information can be found from the deploy page of the Discovery administrative tooling.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

delete_training_queries(project_id, **kwargs)[source]

Delete training queries.

Removes all training queries for the specified project.

Parameters
  • project_id (str) – The ID of the project. This information can be found from the deploy page of the Discovery administrative tooling.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

create_training_query(project_id, natural_language_query, examples, *, filter=None, **kwargs)[source]

Create training query.

Add a query to the training data for this project. The query can contain a filter and natural language query.

Parameters
  • project_id (str) – The ID of the project. This information can be found from the deploy page of the Discovery administrative tooling.

  • natural_language_query (str) – The natural text query for the training query.

  • examples (list[TrainingExample]) – Array of training examples.

  • filter (str) – (optional) The filter used on the collection before the natural_language_query is applied.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse
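
The following sketch shows one way to assemble the examples argument. The TrainingExample class is not described in this section, so the keys used here (document_id, collection_id, relevance) are an assumption about its shape, as is the idea that plain dicts are accepted in place of TrainingExample objects. The helper names are hypothetical.

```python
def make_examples(rated_documents):
    """Turn (document_id, collection_id, relevance) tuples into example dicts.

    Assumption: these three keys mirror the TrainingExample model.
    """
    return [
        {
            "document_id": document_id,
            "collection_id": collection_id,
            "relevance": relevance,
        }
        for document_id, collection_id, relevance in rated_documents
    ]


def add_training_query(discovery, project_id, question, rated_documents,
                       filter_expression=None):
    """Create a training query from a question and rated documents."""
    response = discovery.create_training_query(
        project_id=project_id,
        natural_language_query=question,
        examples=make_examples(rated_documents),
        filter=filter_expression,
    )
    return response.get_result()
```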

get_training_query(project_id, query_id, **kwargs)[source]

Get a training data query.

Get details for a specific training data query, including the query string and all examples.

Parameters
  • project_id (str) – The ID of the project. This information can be found from the deploy page of the Discovery administrative tooling.

  • query_id (str) – The ID of the query used for training.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

update_training_query(project_id, query_id, natural_language_query, examples, *, filter=None, **kwargs)[source]

Update a training query.

Updates an existing training query and its examples.

Parameters
  • project_id (str) – The ID of the project. This information can be found from the deploy page of the Discovery administrative tooling.

  • query_id (str) – The ID of the query used for training.

  • natural_language_query (str) – The natural text query for the training query.

  • examples (list[TrainingExample]) – Array of training examples.

  • filter (str) – (optional) The filter used on the collection before the natural_language_query is applied.

  • headers (dict) – A dict containing the request headers

Returns

A DetailedResponse containing the result, headers and HTTP status code.

Return type

DetailedResponse

class AddDocumentEnums[source]

Bases: object

class FileContentType[source]

Bases: enum.Enum

The content type of file.

APPLICATION_JSON = 'application/json'
APPLICATION_MSWORD = 'application/msword'
APPLICATION_VND_OPENXMLFORMATS_OFFICEDOCUMENT_WORDPROCESSINGML_DOCUMENT = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
APPLICATION_PDF = 'application/pdf'
TEXT_HTML = 'text/html'
APPLICATION_XHTML_XML = 'application/xhtml+xml'
class UpdateDocumentEnums[source]

Bases: object

class FileContentType[source]

Bases: enum.Enum

The content type of file.

APPLICATION_JSON = 'application/json'
APPLICATION_MSWORD = 'application/msword'
APPLICATION_VND_OPENXMLFORMATS_OFFICEDOCUMENT_WORDPROCESSINGML_DOCUMENT = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
APPLICATION_PDF = 'application/pdf'
TEXT_HTML = 'text/html'
APPLICATION_XHTML_XML = 'application/xhtml+xml'
class Collection(*, collection_id=None, name=None)[source]

Bases: object

A collection for storing documents.

Attr str collection_id

(optional) The unique identifier of the collection.

Attr str name

(optional) The name of the collection.

class Completions(*, completions=None)[source]

Bases: object

An object containing an array of autocompletion suggestions.

Attr list[str] completions

(optional) Array of autocomplete suggestions based on the provided prefix.

class ComponentSettingsAggregation(*, name=None, label=None, multiple_selections_allowed=None, visualization_type=None)[source]

Bases: object

Display settings for aggregations.

Attr str name

(optional) Identifier used to map aggregation settings to aggregation configuration.

Attr str label

(optional) User-friendly alias for the aggregation.

Attr bool multiple_selections_allowed

(optional) Whether users are allowed to select more than one of the aggregation terms.

Attr str visualization_type

(optional) Type of visualization to use when rendering the aggregation.

class VisualizationTypeEnum[source]

Bases: enum.Enum

Type of visualization to use when rendering the aggregation.

AUTO = 'auto'
FACET_TABLE = 'facet_table'
WORD_CLOUD = 'word_cloud'
MAP = 'map'
class ComponentSettingsFieldsShown(*, body=None, title=None)[source]

Bases: object

Fields shown in the results section of the UI.

Attr ComponentSettingsFieldsShownBody body

(optional) Body label.

Attr ComponentSettingsFieldsShownTitle title

(optional) Title label.

class ComponentSettingsFieldsShownBody(*, use_passage=None, field=None)[source]

Bases: object

Body label.

Attr bool use_passage

(optional) Use the whole passage as the body.

Attr str field

(optional) Use a specific field as the body.

class ComponentSettingsFieldsShownTitle(*, field=None)[source]

Bases: object

Title label.

Attr str field

(optional) Use a specific field as the title.

class ComponentSettingsResponse(*, fields_shown=None, autocomplete=None, structured_search=None, results_per_page=None, aggregations=None)[source]

Bases: object

A response containing the default component settings.

Attr ComponentSettingsFieldsShown fields_shown

(optional) Fields shown in the results section of the UI.

Attr bool autocomplete

(optional) Whether or not autocomplete is enabled.

Attr bool structured_search

(optional) Whether or not structured search is enabled.

Attr int results_per_page

(optional) Number of results shown per page.

Attr list[ComponentSettingsAggregation] aggregations

(optional) A list of component setting aggregations.

class DeleteDocumentResponse(*, document_id=None, status=None)[source]

Bases: object

Information returned when a document is deleted.

Attr str document_id

(optional) The unique identifier of the document.

Attr str status

(optional) Status of the document. A deleted document has the status deleted.

class StatusEnum[source]

Bases: enum.Enum

Status of the document. A deleted document has the status deleted.

DELETED = 'deleted'
class DocumentAccepted(*, document_id=None, status=None)[source]

Bases: object

Information returned after an uploaded document is accepted.

Attr str document_id

(optional) The unique identifier of the ingested document.

Attr str status

(optional) Status of the document in the ingestion process. A status of processing is returned for documents that are ingested with a version date before 2019-01-01. The pending status is returned for all others.

class StatusEnum[source]

Bases: enum.Enum

Status of the document in the ingestion process. A status of processing is returned for documents that are ingested with a version date before 2019-01-01. The pending status is returned for all others.

PROCESSING = 'processing'
PENDING = 'pending'
class DocumentAttribute(*, type=None, text=None, location=None)[source]

Bases: object

List of document attributes.

Attr str type

(optional) The type of attribute.

Attr str text

(optional) The text associated with the attribute.

Attr TableElementLocation location

(optional) The numeric location of the identified element in the document, represented with two integers labeled begin and end.

class Field(*, field=None, type=None, collection_id=None)[source]

Bases: object

Object containing field details.

Attr str field

(optional) The name of the field.

Attr str type

(optional) The type of the field.

Attr str collection_id

(optional) The collection ID of the collection where the field was found.

class TypeEnum[source]

Bases: enum.Enum

The type of the field.

NESTED = 'nested'
STRING = 'string'
DATE = 'date'
LONG = 'long'
INTEGER = 'integer'
SHORT = 'short'
BYTE = 'byte'
DOUBLE = 'double'
FLOAT = 'float'
BOOLEAN = 'boolean'
BINARY = 'binary'
class ListCollectionsResponse(*, collections=None)[source]

Bases: object

Response object containing an array of collection details.

Attr list[Collection] collections

(optional) An array containing information about each collection in the project.

class ListFieldsResponse(*, fields=None)[source]

Bases: object

The list of fetched fields. The fields are returned using a fully qualified name format; however, the format differs slightly from that used by the query operations.

  • Fields which contain nested objects are assigned a type of “nested”.

  • Fields which belong to a nested object are prefixed with .properties (for example, warnings.properties.severity means that the warnings object has a property called severity).

Attr list[Field] fields

(optional) An array containing information about each field in the collections.
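
To illustrate the naming convention, the sketch below splits a fully qualified name at its .properties markers and groups the returned fields by collection. It assumes the parsed result is a dict with a fields array whose items carry field, type, and collection_id, as in the Field class. The function names are hypothetical.

```python
from collections import defaultdict


def split_nested(field_name):
    """Split a list_fields style name into its object path.

    "warnings.properties.severity" -> ["warnings", "severity"]
    """
    return field_name.split(".properties.")


def fields_by_collection(result):
    """Group {field name: type} mappings under each collection_id."""
    grouped = defaultdict(dict)
    for item in result.get("fields", []):
        grouped[item.get("collection_id")][item.get("field")] = item.get("type")
    return dict(grouped)
```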

class Notice(*, notice_id=None, created=None, document_id=None, collection_id=None, query_id=None, severity=None, step=None, description=None)[source]

Bases: object

A notice produced for the collection.

Attr str notice_id

(optional) Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action. Typical notice IDs include: index_failed, index_failed_too_many_requests, index_failed_incompatible_field, index_failed_cluster_unavailable, ingestion_timeout, ingestion_error, bad_request, internal_error, missing_model, unsupported_model, smart_document_understanding_failed_incompatible_field, smart_document_understanding_failed_internal_error, smart_document_understanding_failed_warning, smart_document_understanding_page_error, smart_document_understanding_page_warning. Note: This is not a complete list; other values might be returned.

Attr datetime created

(optional) The creation date of the collection in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.

Attr str document_id

(optional) Unique identifier of the document.

Attr str collection_id

(optional) Unique identifier of the collection.

Attr str query_id

(optional) Unique identifier of the query used for relevance training.

Attr str severity

(optional) Severity level of the notice.

Attr str step

(optional) Ingestion or training step in which the notice occurred.

Attr str description

(optional) The description of the notice.

class SeverityEnum[source]

Bases: enum.Enum

Severity level of the notice.

WARNING = 'warning'
ERROR = 'error'
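
Since many notices can share a notice_id, a quick tally is often the first step in acting on them. The sketch below counts notices by severity and by notice_id. It assumes the input is the parsed result of query_notices, a dict with a notices array as in QueryNoticesResponse; the function name is hypothetical.

```python
from collections import Counter


def summarize_notices(result):
    """Return (counts by severity, counts by notice_id) for a notices result."""
    notices = result.get("notices", [])
    by_severity = Counter(notice.get("severity") for notice in notices)
    by_id = Counter(notice.get("notice_id") for notice in notices)
    return by_severity, by_id
```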
class QueryAggregation(type)[source]

Bases: object

An abstract aggregation type produced by Discovery to analyze the input provided.

Attr str type

The type of aggregation command used. Options include: term, histogram, timeslice, nested, filter, min, max, sum, average, unique_count, and top_hits.

class QueryCalculationAggregation(type, field, *, value=None)[source]

Bases: object

Returns a scalar calculation across all documents for the field specified. Possible calculations include min, max, sum, average, and unique_count.

Attr str field

The field to perform the calculation on.

Attr float value

(optional) The value of the calculation.

class QueryFilterAggregation(type, match, matching_results, *, aggregations=None)[source]

Bases: object

A modifier that will narrow down the document set of the sub aggregations it precedes.

Attr str match

The filter written in Discovery Query Language syntax applied to the documents before sub aggregations are run.

Attr int matching_results

Number of documents matching the filter.

Attr list[QueryAggregation] aggregations

(optional) An array of sub aggregations.

class QueryHistogramAggregation(type, field, interval, *, results=None)[source]

Bases: object

Numeric interval segments to categorize documents by using field values from a single numeric field to describe the category.

Attr str field

The numeric field name used to create the histogram.

Attr int interval

The size of the sections the results are split into.

Attr list[QueryHistogramAggregationResult] results

(optional) Array of numeric intervals.

class QueryHistogramAggregationResult(key, matching_results, *, aggregations=None)[source]

Bases: object

Histogram numeric interval result.

Attr int key

The value of the upper bound for the numeric segment.

Attr int matching_results

Number of documents with the specified key as the upper bound.

Attr list[QueryAggregation] aggregations

(optional) An array of sub aggregations.
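
A histogram aggregation in a query result can be flattened into simple pairs for charting. This sketch assumes the aggregation arrives as a dict with type, field, interval, and results keys, matching the attributes documented above. The function name is hypothetical.

```python
def histogram_buckets(aggregation):
    """Return [(key, matching_results), ...] for a histogram aggregation."""
    if aggregation.get("type") != "histogram":
        raise ValueError("expected a histogram aggregation")
    return [
        (bucket["key"], bucket["matching_results"])
        for bucket in aggregation.get("results", [])
    ]
```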

class QueryLargePassages(*, enabled=None, per_document=None, max_per_document=None, fields=None, count=None, characters=None)[source]

Bases: object

Configuration for passage retrieval.

Attr bool enabled

(optional) A passages query that returns the most relevant passages from the results.

Attr bool per_document

(optional) When true, passages will be returned within their respective result.

Attr int max_per_document

(optional) Maximum number of passages to return per result.

Attr list[str] fields

(optional) A list of fields that passages are drawn from. If this parameter is not specified, then all top-level fields are included.

Attr int count

(optional) The maximum number of passages to return. The search returns fewer passages if the requested total is not found. The default is 10. The maximum is 100.

Attr int characters

(optional) The approximate number of characters that any one passage will have.
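
The passage settings can be gathered into a plain dict and validated before use, as in this sketch. The keys mirror the keyword arguments in the QueryLargePassages signature, so the dict could be expanded as QueryLargePassages(**config). The helper name passages_config is hypothetical, and the check covers only the documented maximum for count.

```python
def passages_config(*, fields=None, count=10, characters=None,
                    per_document=None, max_per_document=None):
    """Build a passage-retrieval configuration with passages enabled."""
    if count > 100:
        raise ValueError("count cannot exceed the documented maximum of 100")
    config = {"enabled": True, "count": count}
    optional = {
        "fields": list(fields) if fields else None,
        "characters": characters,
        "per_document": per_document,
        "max_per_document": max_per_document,
    }
    # Leave unset options out so the service defaults apply.
    config.update({k: v for k, v in optional.items() if v is not None})
    return config
```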

class QueryLargeSuggestedRefinements(*, enabled=None, count=None)[source]

Bases: object

Configuration for suggested refinements.

Attr bool enabled

(optional) Whether to perform suggested refinements.

Attr int count

(optional) Maximum number of suggested refinement texts to be returned. The default is 10. The maximum is 100.

class QueryLargeTableResults(*, enabled=None, count=None)[source]

Bases: object

Configuration for table retrieval.

Attr bool enabled

(optional) Whether to enable table retrieval.

Attr int count

(optional) Maximum number of tables to return.

class QueryNestedAggregation(type, path, matching_results, *, aggregations=None)[source]

Bases: object

A restriction that alters the document set used for the sub aggregations it precedes to the nested documents found in the specified field.

Attr str path

The path to the document field to scope sub aggregations to.

Attr int matching_results

Number of nested documents found in the specified field.

Attr list[QueryAggregation] aggregations

(optional) An array of sub aggregations.

class QueryNoticesResponse(*, matching_results=None, notices=None)[source]

Bases: object

Object containing notice query results.

Attr int matching_results

(optional) The number of matching results.

Attr list[Notice] notices

(optional) Array of document results that match the query.

class QueryResponse(*, matching_results=None, results=None, aggregations=None, retrieval_details=None, suggested_query=None, suggested_refinements=None, table_results=None)[source]

Bases: object

A response containing the documents and aggregations for the query.

Attr int matching_results

(optional) The number of matching results for the query.

Attr list[QueryResult] results

(optional) Array of document results for the query.

Attr list[QueryAggregation] aggregations

(optional) Array of aggregations for the query.

Attr RetrievalDetails retrieval_details

(optional) An object containing retrieval type information.

Attr str suggested_query

(optional) Suggested correction to the submitted natural_language_query value.

Attr list[QuerySuggestedRefinement] suggested_refinements

(optional) Array of suggested refinements.

Attr list[QueryTableResult] table_results

(optional) Array of table results.

class QueryResult(document_id, result_metadata, *, metadata=None, document_passages=None, **kwargs)[source]

Bases: object

Result document for the specified query.

Attr str document_id

The unique identifier of the document.

Attr dict metadata

(optional) Metadata of the document.

Attr QueryResultMetadata result_metadata

Metadata of a query result.

Attr list[QueryResultPassage] document_passages

(optional) Passages returned by Discovery.

class QueryResultMetadata(collection_id, *, document_retrieval_source=None, confidence=None)[source]

Bases: object

Metadata of a query result.

Attr str document_retrieval_source

(optional) The document retrieval source that produced this search result.

Attr str collection_id

The collection id associated with this training data set.

Attr float confidence

(optional) The confidence score for the given result. Calculated based on how relevant the result is estimated to be. confidence can range from 0.0 to 1.0. The higher the number, the more relevant the document. The confidence value for a result was calculated using the model specified in the document_retrieval_strategy field of the result set. This field is only returned if the natural_language_query parameter is specified in the query.

class DocumentRetrievalSourceEnum[source]

Bases: enum.Enum

The document retrieval source that produced this search result.

SEARCH = 'search'
CURATION = 'curation'
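
Because confidence ranges from 0.0 to 1.0 and is only returned for natural language queries, client code should tolerate its absence. The sketch below keeps results at or above a threshold and orders them by confidence. It assumes each result is a dict with a result_metadata entry, as in QueryResult. The function name and the 0.5 default threshold are arbitrary choices for illustration.

```python
def confident_results(results, threshold=0.5):
    """Return results with confidence >= threshold, highest first.

    Results without a confidence value are dropped.
    """
    kept = []
    for item in results:
        confidence = item.get("result_metadata", {}).get("confidence")
        if confidence is not None and confidence >= threshold:
            kept.append(item)
    return sorted(
        kept,
        key=lambda item: item["result_metadata"]["confidence"],
        reverse=True,
    )
```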
class QueryResultPassage(*, passage_text=None, start_offset=None, end_offset=None, field=None)[source]

Bases: object

A passage query result.

Attr str passage_text

(optional) The content of the extracted passage.

Attr int start_offset

(optional) The position of the first character of the extracted passage in the originating field.

Attr int end_offset

(optional) The position of the last character of the extracted passage in the originating field.

Attr str field

(optional) The label of the field from which the passage has been extracted.
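
One common use of the field attribute is to group passages by the field they came from. The sketch below assumes a query result represented as a plain dict with a document_passages list; passages_by_field is a hypothetical helper name and the sample data is invented.

```python
from collections import defaultdict


def passages_by_field(result):
    """Group passage_text values by the field they were extracted from."""
    grouped = defaultdict(list)
    for passage in result.get("document_passages") or []:
        grouped[passage.get("field")].append(passage.get("passage_text"))
    return dict(grouped)


# Hypothetical sample result with three passages from two fields.
sample_result = {
    "document_id": "doc-1",
    "document_passages": [
        {"passage_text": "first", "start_offset": 0, "end_offset": 5, "field": "text"},
        {"passage_text": "second", "start_offset": 10, "end_offset": 16, "field": "text"},
        {"passage_text": "heading", "start_offset": 0, "end_offset": 7, "field": "title"},
    ],
}
grouped_passages = passages_by_field(sample_result)
print(grouped_passages)
```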

class QuerySuggestedRefinement(*, text=None)[source]

Bases: object

A suggested additional query term or terms used to filter results.

Attr str text

(optional) The text used to filter.

class QueryTableResult(*, table_id=None, source_document_id=None, collection_id=None, table_html=None, table_html_offset=None, table=None)[source]

Bases: object

A table whose content or context matches a search query.

Attr str table_id

(optional) The identifier for the retrieved table.

Attr str source_document_id

(optional) The identifier of the document the table was retrieved from.

Attr str collection_id

(optional) The identifier of the collection the table was retrieved from.

Attr str table_html

(optional) HTML snippet of the table info.

Attr int table_html_offset

(optional) The offset of the table html snippet in the original document html.

Attr TableResultTable table

(optional) Full table object retrieved from Table Understanding Enrichment.

class QueryTermAggregation(type, field, *, count=None, results=None)[source]

Bases: object

Returns the top values for the field specified.

Attr str field

The field in the document used to generate top values from.

Attr int count

(optional) The number of top values returned.

Attr list[QueryTermAggregationResult] results

(optional) Array of top values for the field.

class QueryTermAggregationResult(key, matching_results, *, aggregations=None)[source]

Bases: object

Top value result for the term aggregation.

Attr str key

Value of the field with a non-zero frequency in the document set.

Attr int matching_results

Number of documents containing the ‘key’.

Attr list[QueryAggregation] aggregations

(optional) An array of sub aggregations.
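
Since each term result can carry its own sub aggregations, the structure is naturally walked recursively. The sketch below works on plain dicts; flatten_term_aggregation is a hypothetical helper, and the assumption that nested term aggregations carry type == "term" should be checked against the actual service response.

```python
def flatten_term_aggregation(aggregation, prefix=()):
    """Yield (path, matching_results) pairs for a term aggregation dict.

    path is the tuple of keys leading from the top-level aggregation
    down to the current result.
    """
    for bucket in aggregation.get("results") or []:
        path = prefix + (bucket["key"],)
        yield path, bucket["matching_results"]
        for sub in bucket.get("aggregations") or []:
            # Assumption: only nested aggregations typed "term" are followed.
            if sub.get("type") == "term":
                yield from flatten_term_aggregation(sub, path)


# Hypothetical sample aggregation with one level of nesting.
sample_aggregation = {
    "type": "term",
    "field": "category",
    "results": [
        {
            "key": "news",
            "matching_results": 5,
            "aggregations": [
                {
                    "type": "term",
                    "field": "language",
                    "results": [{"key": "en", "matching_results": 3}],
                }
            ],
        },
        {"key": "blog", "matching_results": 2},
    ],
}
flat = list(flatten_term_aggregation(sample_aggregation))
print(flat)
```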

class QueryTimesliceAggregation(type, field, interval, *, results=None)[source]

Bases: object

A specialized histogram aggregation that uses dates to create interval segments.

Attr str field

The date field name used to create the timeslice.

Attr str interval

The date interval value. Valid values are seconds, minutes, hours, days, weeks, and years.

Attr list[QueryTimesliceAggregationResult] results

(optional) Array of aggregation results.

class QueryTimesliceAggregationResult(key_as_string, key, matching_results, *, aggregations=None)[source]

Bases: object

A timeslice interval segment.

Attr str key_as_string

String date value of the upper bound for the timeslice interval in ISO-8601 format.

Attr int key

Numeric date value of the upper bound for the timeslice interval in UNIX milliseconds since epoch.

Attr int matching_results

Number of documents with the specified key as the upper bound.

Attr list[QueryAggregation] aggregations

(optional) An array of sub aggregations.
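
Because key is expressed in milliseconds since the UNIX epoch, it has to be divided by 1000 before it is passed to Python's datetime functions. A minimal sketch, assuming timeslice results are plain dicts; timeslice_bounds is a hypothetical helper and the sample values are invented.

```python
from datetime import datetime, timezone


def timeslice_bounds(results):
    """Convert each result's millisecond key to an aware UTC datetime."""
    return [
        (
            datetime.fromtimestamp(bucket["key"] / 1000, tz=timezone.utc),
            bucket["matching_results"],
        )
        for bucket in results
    ]


# Hypothetical sample timeslice results.
sample_timeslices = [
    {"key_as_string": "2020-01-01T00:00:00.000Z", "key": 1577836800000, "matching_results": 4},
    {"key_as_string": "2020-01-02T00:00:00.000Z", "key": 1577923200000, "matching_results": 7},
]
bounds = timeslice_bounds(sample_timeslices)
for moment, count in bounds:
    print(moment.isoformat(), count)
```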

class QueryTopHitsAggregation(type, size, *, hits=None)[source]

Bases: object

Returns the top documents ranked by the score of the query.

Attr int size

The number of documents to return.

Attr QueryTopHitsAggregationResult hits

(optional) A query response containing the matching documents for the preceding aggregations.

class QueryTopHitsAggregationResult(matching_results, *, hits=None)[source]

Bases: object

A query response containing the matching documents for the preceding aggregations.

Attr int matching_results

Number of matching results.

Attr list[dict] hits

(optional) An array of the document results.

class RetrievalDetails(*, document_retrieval_strategy=None)[source]

Bases: object

An object containing retrieval type information.

Attr str document_retrieval_strategy

(optional) Identifies the document retrieval strategy used for this query. relevancy_training indicates that the results were returned using a relevancy trained model.

Note: If a trained collection is queried but the trained model is not used to return results, the document_retrieval_strategy is listed as untrained.

class DocumentRetrievalStrategyEnum[source]

Bases: enum.Enum

Identifies the document retrieval strategy used for this query. relevancy_training indicates that the results were returned using a relevancy trained model.

Note: If a trained collection is queried but the trained model is not used to return results, the document_retrieval_strategy is listed as untrained.

UNTRAINED = 'untrained'
RELEVANCY_TRAINING = 'relevancy_training'
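
A caller can compare document_retrieval_strategy against these two values to see whether a trained model produced the results. The following is a minimal sketch on a plain response dict; used_relevancy_training is a hypothetical helper name.

```python
def used_relevancy_training(response):
    """Return True when the response reports the relevancy_training strategy.

    A missing retrieval_details object is treated the same as 'untrained'.
    """
    details = response.get("retrieval_details") or {}
    return details.get("document_retrieval_strategy") == "relevancy_training"


# Hypothetical sample responses.
trained = {"retrieval_details": {"document_retrieval_strategy": "relevancy_training"}}
untrained = {"retrieval_details": {"document_retrieval_strategy": "untrained"}}
print(used_relevancy_training(trained), used_relevancy_training(untrained))
```
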
class TableBodyCells(*, cell_id=None, location=None, text=None, row_index_begin=None, row_index_end=None, column_index_begin=None, column_index_end=None, row_header_ids=None, row_header_texts=None, row_header_texts_normalized=None, column_header_ids=None, column_header_texts=None, column_header_texts_normalized=None, attributes=None)[source]

Bases: object

Cells that are not table header, column header, or row header cells.

Attr str cell_id

(optional) The unique ID of the cell in the current table.

Attr TableElementLocation location

(optional) The numeric location of the identified element in the document, represented with two integers labeled begin and end.

Attr str text

(optional) The textual contents of this cell from the input document without associated markup content.

Attr int row_index_begin

(optional) The begin index of this cell’s row location in the current table.

Attr int row_index_end

(optional) The end index of this cell’s row location in the current table.

Attr int column_index_begin

(optional) The begin index of this cell’s column location in the current table.

Attr int column_index_end

(optional) The end index of this cell’s column location in the current table.

Attr list[TableRowHeaderIds] row_header_ids

(optional) A list of table row header ids.

Attr list[TableRowHeaderTexts] row_header_texts

(optional) A list of table row header texts.

Attr list[TableRowHeaderTextsNormalized] row_header_texts_normalized

(optional) A list of table row header texts normalized.

Attr list[TableColumnHeaderIds] column_header_ids

(optional) A list of table column header ids.

Attr list[TableColumnHeaderTexts] column_header_texts

(optional) A list of table column header texts.

Attr list[TableColumnHeaderTextsNormalized] column_header_texts_normalized

(optional) A list of table column header texts normalized.

Attr list[DocumentAttribute] attributes

(optional) A list of document attributes.
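
The row and column indexes make it possible to lay body cells out as a two-dimensional grid. The sketch below is one way to do that with plain dicts. It places each cell at its begin indexes only, so cells that span several rows or columns are not expanded, and it shifts indexes by their minimum so that no particular index base is assumed. body_cells_to_grid is a hypothetical helper.

```python
def body_cells_to_grid(body_cells):
    """Place each cell's text at (row_index_begin, column_index_begin)."""
    cells = [
        c for c in body_cells
        if c.get("row_index_begin") is not None
        and c.get("column_index_begin") is not None
    ]
    if not cells:
        return []
    min_row = min(c["row_index_begin"] for c in cells)
    min_col = min(c["column_index_begin"] for c in cells)
    rows = max(c["row_index_begin"] for c in cells) - min_row + 1
    cols = max(c["column_index_begin"] for c in cells) - min_col + 1
    grid = [[None] * cols for _ in range(rows)]
    for c in cells:
        grid[c["row_index_begin"] - min_row][c["column_index_begin"] - min_col] = c.get("text")
    return grid


# Hypothetical sample: a 2x2 block of body cells below a header row.
sample_cells = [
    {"cell_id": "b1", "text": "A", "row_index_begin": 1, "column_index_begin": 0},
    {"cell_id": "b2", "text": "B", "row_index_begin": 1, "column_index_begin": 1},
    {"cell_id": "b3", "text": "C", "row_index_begin": 2, "column_index_begin": 0},
    {"cell_id": "b4", "text": "D", "row_index_begin": 2, "column_index_begin": 1},
]
grid = body_cells_to_grid(sample_cells)
print(grid)
```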

class TableCellKey(*, cell_id=None, location=None, text=None)[source]

Bases: object

A key in a key-value pair.

Attr str cell_id

(optional) The unique ID of the key in the table.

Attr TableElementLocation location

(optional) The numeric location of the identified element in the document, represented with two integers labeled begin and end.

Attr str text

(optional) The text content of the table cell without HTML markup.

class TableCellValues(*, cell_id=None, location=None, text=None)[source]

Bases: object

A value in a key-value pair.

Attr str cell_id

(optional) The unique ID of the value in the table.

Attr TableElementLocation location

(optional) The numeric location of the identified element in the document, represented with two integers labeled begin and end.

Attr str text

(optional) The text content of the table cell without HTML markup.

class TableColumnHeaderIds(*, id=None)[source]

Bases: object

An array of values, each being the id value of a column header that is applicable to the current cell.

Attr str id

(optional) The id value of a column header.

class TableColumnHeaderTexts(*, text=None)[source]

Bases: object

An array of values, each being the text value of a column header that is applicable to the current cell.

Attr str text

(optional) The text value of a column header.

class TableColumnHeaderTextsNormalized(*, text_normalized=None)[source]

Bases: object

If you provide customization input, the normalized version of the column header texts according to the customization; otherwise, the same value as column_header_texts.

Attr str text_normalized

(optional) The normalized version of a column header text.

class TableColumnHeaders(*, cell_id=None, location=None, text=None, text_normalized=None, row_index_begin=None, row_index_end=None, column_index_begin=None, column_index_end=None)[source]

Bases: object

Column-level cells, each applicable as a header to other cells in the same column as itself, of the current table.

Attr str cell_id

(optional) The unique ID of the cell in the current table.

Attr object location

(optional) The location of the column header cell in the current table as defined by its begin and end offsets, respectively, in the input document.

Attr str text

(optional) The textual contents of this cell from the input document without associated markup content.

Attr str text_normalized

(optional) If you provide customization input, the normalized version of the cell text according to the customization; otherwise, the same value as text.

Attr int row_index_begin

(optional) The begin index of this cell’s row location in the current table.

Attr int row_index_end

(optional) The end index of this cell’s row location in the current table.

Attr int column_index_begin

(optional) The begin index of this cell’s column location in the current table.

Attr int column_index_end

(optional) The end index of this cell’s column location in the current table.

class TableElementLocation(begin, end)[source]

Bases: object

The numeric location of the identified element in the document, represented with two integers labeled begin and end.

Attr int begin

The element’s begin index.

Attr int end

The element’s end index.

class TableHeaders(*, cell_id=None, location=None, text=None, row_index_begin=None, row_index_end=None, column_index_begin=None, column_index_end=None)[source]

Bases: object

The contents of the current table’s header.

Attr str cell_id

(optional) The unique ID of the cell in the current table.

Attr object location

(optional) The location of the table header cell in the current table as defined by its begin and end offsets, respectively, in the input document.

Attr str text

(optional) The textual contents of the cell from the input document without associated markup content.

Attr int row_index_begin

(optional) The begin index of this cell’s row location in the current table.

Attr int row_index_end

(optional) The end index of this cell’s row location in the current table.

Attr int column_index_begin

(optional) The begin index of this cell’s column location in the current table.

Attr int column_index_end

(optional) The end index of this cell’s column location in the current table.

class TableKeyValuePairs(*, key=None, value=None)[source]

Bases: object

Key-value pairs detected across cell boundaries.

Attr TableCellKey key

(optional) A key in a key-value pair.

Attr list[TableCellValues] value

(optional) A list of values in a key-value pair.
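
Since key is a single cell and value is a list of cells, a key-value pair maps naturally onto a dict of lists. A minimal sketch, assuming plain dicts shaped like the attributes above; key_value_pairs_to_dict is a hypothetical helper and the sample data is invented.

```python
def key_value_pairs_to_dict(pairs):
    """Map each key's text to the list of its value texts.

    Pairs without a key text are skipped. Repeated keys are merged.
    """
    mapping = {}
    for pair in pairs:
        key_text = (pair.get("key") or {}).get("text")
        if key_text is None:
            continue
        values = [v.get("text") for v in pair.get("value") or []]
        mapping.setdefault(key_text, []).extend(values)
    return mapping


# Hypothetical sample key-value pairs.
sample_pairs = [
    {"key": {"cell_id": "k1", "text": "Total"}, "value": [{"cell_id": "v1", "text": "42"}]},
    {"key": {"cell_id": "k2", "text": "Region"}, "value": [{"text": "EU"}, {"text": "US"}]},
    {"value": [{"text": "orphan"}]},
]
pair_map = key_value_pairs_to_dict(sample_pairs)
print(pair_map)
```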

class TableResultTable(*, location=None, text=None, section_title=None, title=None, table_headers=None, row_headers=None, column_headers=None, key_value_pairs=None, body_cells=None, contexts=None)[source]

Bases: object

Full table object retrieved from Table Understanding Enrichment.

Attr TableElementLocation location

(optional) The numeric location of the identified element in the document, represented with two integers labeled begin and end.

Attr str text

(optional) The textual contents of the current table from the input document without associated markup content.

Attr TableTextLocation section_title

(optional) Text and associated location within a table.

Attr TableTextLocation title

(optional) Text and associated location within a table.

Attr list[TableHeaders] table_headers

(optional) An array of table-level cells that apply as headers to all the other cells in the current table.

Attr list[TableRowHeaders] row_headers

(optional) An array of row-level cells, each applicable as a header to other cells in the same row as itself, of the current table.

Attr list[TableColumnHeaders] column_headers

(optional) An array of column-level cells, each applicable as a header to other cells in the same column as itself, of the current table.

Attr list[TableKeyValuePairs] key_value_pairs

(optional) An array of key-value pairs identified in the current table.

Attr list[TableBodyCells] body_cells

(optional) An array of cells that are neither table header nor column header nor row header cells, of the current table with corresponding row and column header associations.

Attr list[TableTextLocation] contexts

(optional) An array of lists of textual entries across the document related to the current table being parsed.

class TableRowHeaderIds(*, id=None)[source]

Bases: object

An array of values, each being the id value of a row header that is applicable to this body cell.

Attr str id

(optional) The id value of a row header.

class TableRowHeaderTexts(*, text=None)[source]

Bases: object

An array of values, each being the text value of a row header that is applicable to this body cell.

Attr str text

(optional) The text value of a row header.

class TableRowHeaderTextsNormalized(*, text_normalized=None)[source]

Bases: object

If you provide customization input, the normalized version of the row header texts according to the customization; otherwise, the same value as row_header_texts.

Attr str text_normalized

(optional) The normalized version of a row header text.

class TableRowHeaders(*, cell_id=None, location=None, text=None, text_normalized=None, row_index_begin=None, row_index_end=None, column_index_begin=None, column_index_end=None)[source]

Bases: object

Row-level cells, each applicable as a header to other cells in the same row as itself, of the current table.

Attr str cell_id

(optional) The unique ID of the cell in the current table.

Attr TableElementLocation location

(optional) The numeric location of the identified element in the document, represented with two integers labeled begin and end.

Attr str text

(optional) The textual contents of this cell from the input document without associated markup content.

Attr str text_normalized

(optional) If you provide customization input, the normalized version of the cell text according to the customization; otherwise, the same value as text.

Attr int row_index_begin

(optional) The begin index of this cell’s row location in the current table.

Attr int row_index_end

(optional) The end index of this cell’s row location in the current table.

Attr int column_index_begin

(optional) The begin index of this cell’s column location in the current table.

Attr int column_index_end

(optional) The end index of this cell’s column location in the current table.

class TableTextLocation(*, text=None, location=None)[source]

Bases: object

Text and associated location within a table.

Attr str text

(optional) The text retrieved.

Attr TableElementLocation location

(optional) The numeric location of the identified element in the document, represented with two integers labeled begin and end.

class TrainingExample(document_id, collection_id, relevance, *, created=None, updated=None)[source]

Bases: object

Object containing example response details for a training query.

Attr str document_id

The document ID associated with this training example.

Attr str collection_id

The collection ID associated with this training example.

Attr int relevance

The relevance of the training example.

Attr date created

(optional) The date and time the example was created.

Attr date updated

(optional) The date and time the example was updated.

class TrainingQuery(natural_language_query, examples, *, query_id=None, filter=None, created=None, updated=None)[source]

Bases: object

Object containing training query details.

Attr str query_id

(optional) The query ID associated with the training query.

Attr str natural_language_query

The natural text query for the training query.

Attr str filter

(optional) The filter used on the collection before the natural_language_query is applied.

Attr date created

(optional) The date and time the query was created.

Attr date updated

(optional) The date and time the query was updated.

Attr list[TrainingExample] examples

Array of training examples.
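
To show how the required and optional attributes relate, the sketch below assembles a training query as a plain dict and checks that each example carries the three required TrainingExample fields. build_training_query is a hypothetical helper, not part of the SDK, and the sample values are invented.

```python
REQUIRED_EXAMPLE_FIELDS = ("document_id", "collection_id", "relevance")


def build_training_query(natural_language_query, examples, filter_=None):
    """Build a training query dict after validating its examples."""
    for example in examples:
        missing = [name for name in REQUIRED_EXAMPLE_FIELDS if name not in example]
        if missing:
            raise ValueError(f"training example is missing: {missing}")
    payload = {
        "natural_language_query": natural_language_query,
        "examples": list(examples),
    }
    # filter is optional, so it is only included when provided.
    if filter_ is not None:
        payload["filter"] = filter_
    return payload


# Hypothetical sample training query.
training_query = build_training_query(
    "how do I reset my password",
    [{"document_id": "doc-1", "collection_id": "c1", "relevance": 10}],
)
print(training_query)
```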

class TrainingQuerySet(*, queries=None)[source]

Bases: object

Object specifying the training queries contained in the identified training set.

Attr list[TrainingQuery] queries

(optional) Array of training queries.