watson_developer_cloud.discovery_v1 module
The IBM Watson™ Discovery Service is a cognitive search and content analytics engine that you can add to applications to identify patterns, trends and actionable insights to drive better decision-making. Securely unify structured and unstructured data with pre-enriched content, and use a simplified query language to eliminate the need for manual filtering of results.
-
class DiscoveryV1(version, url='https://gateway.watsonplatform.net/discovery/api', username=None, password=None, iam_api_key=None, iam_access_token=None, iam_url=None)
Bases: watson_developer_cloud.watson_service.WatsonService
The Discovery V1 service.
-
default_url = 'https://gateway.watsonplatform.net/discovery/api'
-
create_environment(name, description=None, size=None, **kwargs)
Create an environment.
Creates a new environment for private data. An environment must be created before collections can be created. Note: You can create only one environment for private data per service instance. An attempt to create another environment results in an error.
Returns: A dict containing the Environment response.
Return type: dict
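Because an instance allows only one private environment, a setup script that may run more than once can look for an existing one before calling create_environment. The sketch below is a hypothetical helper, not part of the SDK. It assumes `discovery` is a configured `DiscoveryV1` instance whose methods return plain dicts as documented here, and that each entry of the ListEnvironmentsResponse `environments` list carries `environment_id` and `read_only` keys (the read-only entry being the shared News environment).

```python
def get_or_create_private_environment(discovery, name, description=None):
    """Return the ID of the instance's private environment, creating it if needed.

    Hypothetical helper. Assumption: each item in the "environments" list has
    "environment_id" and "read_only" keys; read-only environments (such as the
    shared News environment) cannot hold private collections, so they are skipped.
    """
    response = discovery.list_environments()
    for env in response.get("environments", []):
        if not env.get("read_only", False):
            return env["environment_id"]
    # No private environment yet: create the single one the instance allows.
    created = discovery.create_environment(name=name, description=description)
    return created["environment_id"]
```

Calling it twice with the same client returns the same ID and creates at most one environment.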
-
delete_environment(environment_id, **kwargs)
Delete environment.
Returns: A dict containing the DeleteEnvironmentResponse response.
Return type: dict
-
get_environment(environment_id, **kwargs)
Get environment info.
Returns: A dict containing the Environment response.
Return type: dict
-
list_environments(name=None, **kwargs)
List environments.
List existing environments for the service instance.
Returns: A dict containing the ListEnvironmentsResponse response.
Return type: dict
-
list_fields(environment_id, collection_ids, **kwargs)
List fields across collections.
Gets a list of the unique fields (and their types) stored in the indexes of the specified collections.
Parameters: - collection_ids – … queried against.
- headers (dict) – A dict containing the request headers.
Returns: A dict containing the ListCollectionFieldsResponse response.
Return type: dict
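The hypothetical helper below sketches one way to use list_fields to see which fields several collections share, grouped by type. It assumes the ListCollectionFieldsResponse dict holds a `fields` list of `{"field": ..., "type": ...}` items and that `collection_ids` is passed as a list of ID strings; check both against the response you actually receive.

```python
from collections import defaultdict


def fields_by_type(discovery, environment_id, collection_ids):
    """Group the unique field names of several collections by field type.

    Hypothetical helper. Assumption: the response dict has a "fields" list
    whose items carry "field" (the name) and "type" keys.
    """
    response = discovery.list_fields(environment_id, collection_ids)
    grouped = defaultdict(set)
    for item in response.get("fields", []):
        grouped[item["type"]].add(item["field"])
    # Sorted lists make the result stable and easy to print or compare.
    return {field_type: sorted(names) for field_type, names in grouped.items()}
```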
-
update_environment(environment_id, name=None, description=None, **kwargs)
Update an environment.
Updates an environment. The environment’s name and description parameters can be changed. You must specify a name for the environment.
Returns: A dict containing the Environment response.
Return type: dict
-
create_configuration(environment_id, name, description=None, conversions=None, enrichments=None, normalizations=None, source=None, **kwargs)
Add configuration.
Creates a new configuration. If the input configuration contains the configuration_id, created, or updated properties, then they are ignored and overridden by the system, and an error is not returned so that the overridden fields do not need to be removed when copying a configuration. The configuration can contain unrecognized JSON fields. Any such fields are ignored and do not generate an error. This makes it easier to use newer configuration files with older versions of the API and the service. It also makes it possible for the tooling to add additional metadata and information to the configuration.
Parameters: - environment_id (str) – The ID of the environment.
- name (str) – The name of the configuration.
- description (str) – The description of the configuration, if available.
- conversions (Conversions) – The document conversion settings for the configuration.
- enrichments (list[Enrichment]) – An array of document enrichment settings for the configuration.
- normalizations (list[NormalizationOperation]) – Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- source (Source) – Object containing source parameters for the configuration.
- headers (dict) – A dict containing the request headers.
Returns: A dict containing the Configuration response.
Return type: dict
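Since system-managed fields are ignored, copying a configuration amounts to reading it and passing its content back under a new name. A minimal sketch, assuming the Configuration dict returned by get_configuration uses the same key names as the create_configuration parameters; `clone_configuration` is a hypothetical helper, not an SDK method.

```python
# Assumption: these Configuration dict keys match create_configuration's
# keyword parameters one-to-one.
_CONTENT_KEYS = ("description", "conversions", "enrichments", "normalizations", "source")


def clone_configuration(discovery, environment_id, configuration_id, new_name):
    """Copy an existing configuration under a new name and return the new ID.

    configuration_id, created and updated are simply not forwarded; the
    service would ignore them anyway.
    """
    original = discovery.get_configuration(environment_id, configuration_id)
    content = {key: original[key] for key in _CONTENT_KEYS if key in original}
    created = discovery.create_configuration(environment_id, new_name, **content)
    return created["configuration_id"]
```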
-
delete_configuration(environment_id, configuration_id, **kwargs)
Delete a configuration.
The deletion is performed unconditionally. A configuration deletion request succeeds even if the configuration is referenced by a collection or document ingestion. However, documents that have already been submitted for processing continue to use the deleted configuration. Documents are always processed with a snapshot of the configuration as it existed at the time the document was submitted.
Returns: A dict containing the DeleteConfigurationResponse response.
Return type: dict
-
get_configuration(environment_id, configuration_id, **kwargs)
Get configuration details.
Returns: A dict containing the Configuration response.
Return type: dict
-
list_configurations(environment_id, name=None, **kwargs)
List configurations.
Lists existing configurations for the service instance.
Returns: A dict containing the ListConfigurationsResponse response.
Return type: dict
-
update_configuration(environment_id, configuration_id, name, description=None, conversions=None, enrichments=None, normalizations=None, source=None, **kwargs)
Update a configuration.
- Completely replaces an existing configuration.
- The configuration_id, updated, and created fields are accepted in the request, but they are ignored, and an error is not generated. It is also acceptable for users to submit an updated configuration with none of the three properties.
- Documents are processed with a snapshot of the configuration as it was at the time the document was submitted to be ingested. This means that documents already submitted will not see any updates made to the configuration.
Parameters: - environment_id (str) – The ID of the environment.
- configuration_id (str) – The ID of the configuration.
- name (str) – The name of the configuration.
- description (str) – The description of the configuration, if available.
- conversions (Conversions) – The document conversion settings for the configuration.
- enrichments (list[Enrichment]) – An array of document enrichment settings for the configuration.
- normalizations (list[NormalizationOperation]) – Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
- source (Source) – Object containing source parameters for the configuration.
- headers (dict) – A dict containing the request headers.
Returns: A dict containing the Configuration response.
Return type: dict
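Because update_configuration replaces the whole configuration, changing one property safely means sending every other property back unchanged. The sketch below, a hypothetical `rename_configuration` helper, does this read-modify-write, under the assumption that the Configuration dict keys match the update_configuration parameter names.

```python
def rename_configuration(discovery, environment_id, configuration_id, new_name):
    """Rename a configuration without discarding its other settings.

    Hypothetical helper. update_configuration replaces the whole
    configuration, so the current content is read first and sent back.
    Assumption: Configuration dict keys match the parameter names.
    """
    current = discovery.get_configuration(environment_id, configuration_id)
    content = {key: current[key]
               for key in ("description", "conversions", "enrichments",
                           "normalizations", "source")
               if key in current}
    return discovery.update_configuration(
        environment_id, configuration_id, new_name, **content)
```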
-
test_configuration_in_environment(environment_id, configuration=None, step=None, configuration_id=None, file=None, metadata=None, file_content_type=None, filename=None, **kwargs)
Test configuration.
Runs a sample document through the default or your configuration and returns diagnostic information designed to help you understand how the document was processed. The document is not added to the index.
Parameters: - configuration – … this part is provided, then the provided configuration is used to process the document. If configuration_id is also provided (both are present at the same time), then the request is rejected. The maximum supported configuration size is 1 MB. Configuration parts larger than 1 MB are rejected. See the GET /configurations/{configuration_id} operation for an example configuration.
- step (str) – Runs the input document only through the given step instead of through the entire ingestion workflow. Valid values are convert, enrich, and normalize.
- configuration_id (str) – The ID of the configuration to use to process the document. If the configuration form part is also provided (both are present at the same time), then the request is rejected.
- file (file) – The content of the document to ingest. The maximum supported file size is 50 megabytes. Files larger than 50 megabytes are rejected.
- metadata (str) – If you’re using the Data Crawler to upload your documents, you can test a document against the type of metadata that the Data Crawler might send. The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected. Example: {"Creator": "Johnny Appleseed", "Subject": "Apples"}
- file_content_type (str) – The content type of file.
- filename (str) – The filename for file.
- headers (dict) – A dict containing the request headers.
Returns: A dict containing the TestDocument response.
Return type: dict
-
create_collection(environment_id, name, description=None, configuration_id=None, language=None, **kwargs)
Create a collection.
Parameters: - configuration_id – … is to be created.
- language (str) – The language of the documents stored in the collection, in the form of an ISO 639-1 language code.
- headers (dict) – A dict containing the request headers.
Returns: A dict containing the Collection response.
Return type: dict
-
delete_collection(environment_id, collection_id, **kwargs)
Delete a collection.
Returns: A dict containing the DeleteCollectionResponse response.
Return type: dict
-
get_collection(environment_id, collection_id, **kwargs)
Get collection details.
Returns: A dict containing the Collection response.
Return type: dict
-
list_collection_fields(environment_id, collection_id, **kwargs)
List collection fields.
Gets a list of the unique fields (and their types) stored in the index.
Returns: A dict containing the ListCollectionFieldsResponse response.
Return type: dict
-
list_collections(environment_id, name=None, **kwargs)
List collections.
Lists existing collections for the service instance.
Returns: A dict containing the ListCollectionsResponse response.
Return type: dict
-
update_collection(environment_id, collection_id, name, description=None, configuration_id=None, **kwargs)
Update a collection.
Parameters: - configuration_id – … is to be updated.
- headers (dict) – A dict containing the request headers.
Returns: A dict containing the Collection response.
Return type: dict
-
create_expansions(environment_id, collection_id, expansions, **kwargs)
Create or update expansion list.
Create or replace the Expansion list for this collection. The maximum number of expanded terms per collection is 500. The current expansion list is replaced with the uploaded content.
Parameters: - expansions – … will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional. Bidirectional means that all terms are expanded to all other terms in the object. Unidirectional means that a set list of terms can be expanded into a second list of terms.
To create a bidirectional expansion, specify an expanded_terms array. When found in a query, all items in the expanded_terms array are then expanded to the other items in the same array.
To create a unidirectional expansion, specify both an array of input_terms and an array of expanded_terms. When items in the input_terms array are present in a query, they are expanded using the items listed in the expanded_terms array.
- headers (dict) – A dict containing the request headers.
Returns: A dict containing the Expansions response.
Return type: dict
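The two expansion shapes described above map onto plain dicts. A minimal sketch of a hypothetical builder for the `expansions` argument; it counts only `expanded_terms` entries against the 500-term limit, which is an assumption about how the service counts.

```python
def build_expansions(bidirectional=(), unidirectional=(), max_terms=500):
    """Build the list passed as the expansions argument of create_expansions.

    bidirectional: iterable of term groups, each expanded to all the others.
    unidirectional: iterable of (input_terms, expanded_terms) pairs.
    Assumption: the limit applies to the total number of expanded_terms entries.
    """
    expansions = [{"expanded_terms": list(group)} for group in bidirectional]
    expansions += [{"input_terms": list(inputs), "expanded_terms": list(outputs)}
                   for inputs, outputs in unidirectional]
    total = sum(len(e["expanded_terms"]) for e in expansions)
    if total > max_terms:
        raise ValueError("%d expanded terms exceeds the limit of %d" % (total, max_terms))
    return expansions
```

For example, `discovery.create_expansions(environment_id, collection_id, build_expansions(bidirectional=[["car", "automobile"]]))` would upload a single bidirectional group, replacing the current list.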
-
delete_expansions(environment_id, collection_id, **kwargs)
Delete the expansion list.
Remove the expansion information for this collection. The expansion list must be deleted to disable query expansion for a collection.
Return type: None
-
list_expansions(environment_id, collection_id, **kwargs)
Get the expansion list.
Returns the current expansion list for the specified collection. If an expansion list is not specified, an object with empty expansion arrays is returned.
Returns: A dict containing the Expansions response.
Return type: dict
-
add_document(environment_id, collection_id, file=None, metadata=None, file_content_type=None, filename=None, **kwargs)
Add a document.
- Add a document to a collection with optional metadata.
- The version query parameter is still required.
- Returns immediately after the system has accepted the document for processing.
- The user must provide document content, metadata, or both. If the request is missing both document content and metadata, it is rejected.
- The user can set the Content-Type parameter on the file part to indicate the media type of the document. If the Content-Type parameter is missing or is one of the generic media types (for example, application/octet-stream), then the service attempts to automatically detect the document’s media type.
- The following field names are reserved and will be filtered out if present after normalization: id, score, highlight, and any field with the prefix _, +, or -.
- Fields with empty name values after normalization are filtered out before indexing.
- Fields containing the following characters after normalization are filtered out before indexing: # and , (comma).
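The filtering rules above can be checked ahead of time for metadata you control. This hypothetical helper applies them to field names as written; the service applies them after normalization, so treat the result as an approximation only.

```python
_RESERVED_NAMES = {"id", "score", "highlight"}
_RESERVED_PREFIXES = ("_", "+", "-")
_FORBIDDEN_CHARS = ("#", ",")


def fields_that_would_be_dropped(field_names):
    """Return the field names that the documented rules would filter out.

    Hypothetical helper. Approximation: names are checked as given, not after
    the service's normalization step; blank names count as empty.
    """
    dropped = []
    for name in field_names:
        if (not name.strip()
                or name in _RESERVED_NAMES
                or name.startswith(_RESERVED_PREFIXES)
                or any(ch in name for ch in _FORBIDDEN_CHARS)):
            dropped.append(name)
    return dropped
```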
Parameters: - file (file) – The content of the document to ingest. The maximum supported file size is 50 megabytes. Files larger than 50 megabytes are rejected.
- metadata (str) – If you’re using the Data Crawler to upload your documents, you can test a document against the type of metadata that the Data Crawler might send. The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected. Example: {"Creator": "Johnny Appleseed", "Subject": "Apples"}
- file_content_type (str) – The content type of file.
- filename (str) – The filename for file.
- headers (dict) – A dict containing the request headers.
Returns: A dict containing the DocumentAccepted response.
Return type: dict
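A sketch of an upload with metadata, using the file, metadata, file_content_type and filename parameters above. The helper name is hypothetical, the metadata dict is serialized with json.dumps because the parameter is documented as str, the `document_id` key of the DocumentAccepted response is assumed, and the size limits are read as binary megabytes.

```python
import json
import os

_MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: "50 megabytes" read as MiB
_MAX_METADATA_BYTES = 1024 * 1024   # assumption: "1 MB" read as MiB


def upload_with_metadata(discovery, environment_id, collection_id, path, metadata,
                         content_type=None):
    """Upload one file plus a metadata dict and return the new document ID.

    Hypothetical helper. Assumption: the DocumentAccepted dict has a
    "document_id" key.
    """
    metadata_json = json.dumps(metadata)
    if len(metadata_json.encode("utf-8")) > _MAX_METADATA_BYTES:
        raise ValueError("metadata exceeds 1 MB")
    if os.path.getsize(path) > _MAX_FILE_BYTES:
        raise ValueError("file exceeds 50 MB")
    with open(path, "rb") as fileobj:
        accepted = discovery.add_document(
            environment_id, collection_id, file=fileobj, metadata=metadata_json,
            file_content_type=content_type, filename=os.path.basename(path))
    return accepted["document_id"]
```

Passing `content_type=None` leaves media-type detection to the service, as described above.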
-
delete_document(environment_id, collection_id, document_id, **kwargs)
Delete a document.
If the given document ID is invalid, or if the document is not found, then a success response is returned (HTTP status code 200) with the status set to ‘deleted’.
Returns: A dict containing the DeleteDocumentResponse response.
Return type: dict
-
get_document_status(environment_id, collection_id, document_id, **kwargs)
Get document details.
Fetch status details about a submitted document. Note: this operation does not return the document itself. Instead, it returns only the document’s processing status and any notices (warnings or errors) that were generated when the document was ingested. Use the query API to retrieve the actual document content.
Returns: A dict containing the DocumentStatus response.
Return type: dict
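Because add_document returns as soon as the document is accepted, callers that need the final outcome can poll get_document_status. A sketch with hypothetical names, assuming the DocumentStatus `status` field reads "processing" or "pending" while ingestion is still under way; the sleep and clock hooks only make the loop easy to exercise.

```python
import time

# Assumption: these status values mean ingestion has not finished yet.
_IN_PROGRESS = {"processing", "pending"}


def wait_for_document(discovery, environment_id, collection_id, document_id,
                      timeout=300.0, interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_document_status until the document leaves processing.

    Hypothetical helper. Returns the final DocumentStatus dict, or raises
    TimeoutError if the document is still in progress after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = discovery.get_document_status(environment_id, collection_id, document_id)
        if status.get("status") not in _IN_PROGRESS:
            return status
        if clock() >= deadline:
            raise TimeoutError("document %s still %s" % (document_id, status.get("status")))
        sleep(interval)
```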
-
update_document(environment_id, collection_id, document_id, file=None, metadata=None, file_content_type=None, filename=None, **kwargs)
Update a document.
Replace an existing document. Starts ingesting a document with optional metadata.
Parameters: - file (file) – The content of the document to ingest. The maximum supported file size is 50 megabytes. Files larger than 50 megabytes are rejected.
- metadata (str) – If you’re using the Data Crawler to upload your documents, you can test a document against the type of metadata that the Data Crawler might send. The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected. Example: {"Creator": "Johnny Appleseed", "Subject": "Apples"}
- file_content_type (str) – The content type of file.
- filename (str) – The filename for file.
- headers (dict) – A dict containing the request headers.
Returns: A dict containing the DocumentAccepted response.
Return type: dict
-
federated_query(environment_id, collection_ids, filter=None, query=None, natural_language_query=None, aggregation=None, count=None, return_fields=None, offset=None, sort=None, highlight=None, deduplicate=None, deduplicate_field=None, similar=None, similar_document_ids=None, similar_fields=None, passages=None, passages_fields=None, passages_count=None, passages_characters=None, **kwargs)
Query documents in multiple collections.
See the [Discovery service documentation](https://console.bluemix.net/docs/services/discovery/using.html) for more details.
Parameters: - collection_ids – … queried against.
- filter (str) – A cacheable query that limits the documents returned to exclude any documents that don’t mention the query content. Filter searches are better for metadata-type searches and when you are trying to get a sense of concepts in the data set.
- query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.
- natural_language_query (str) – A natural language query that returns relevant documents by utilizing training data and natural language understanding. You cannot use natural_language_query and query at the same time.
- aggregation (str) – An aggregation search uses combinations of filters and query search to return an exact answer. Aggregations are useful for building applications, because you can use them to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.
- count (int) – Number of documents to return.
- return_fields (list[str]) – A comma-separated list of the portion of the document hierarchy to return.
- offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results.
- sort (list[str]) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.
- highlight (bool) – When true, a highlight field is returned for each result, containing the fields that match the query with <em></em> tags around the matching query terms. Defaults to false.
- deduplicate (bool) – When true and used with a Watson Discovery News collection, duplicate results (based on the contents of the title field) are removed. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.
- deduplicate_field (str) – When specified, duplicate results based on the field specified are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.
- similar (bool) – When true, results are returned based on their similarity to the document IDs specified in the similar_document_ids parameter.
- similar_document_ids (list[str]) – A comma-separated list of document IDs that will be used to find similar documents. Note: If the natural_language_query parameter is also specified, it will be used to expand the scope of the document similarity search to include the natural language query. Other query parameters, such as filter and query, are subsequently applied and reduce the query scope.
- similar_fields (list[str]) – A comma-separated list of field names that will be used as a basis for comparison to identify similar documents. If not specified, the entire document is used for comparison.
- passages (bool) – When true, the query also returns the most relevant passages from the results.
- passages_fields (list[str]) – A comma-separated list of fields that passages are drawn from. If this parameter is not specified, then all top-level fields are included.
- passages_count (int) – The maximum number of passages to return. The search returns fewer passages if the requested total is not found. The default is 10. The maximum is 100.
- passages_characters (int) – The approximate number of characters that any one passage will have. The default is 400. The minimum is 50. The maximum is 2000.
- headers (dict) – A dict containing the request headers.
Returns: A dict containing the QueryResponse response.
Return type: dict
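A minimal sketch of a cross-collection passage search with federated_query, combining natural_language_query with the passage parameters. The helper is hypothetical, collection_ids is assumed to be a list of ID strings, and the `passages` list with `document_id` and `passage_text` keys is an assumed shape of the QueryResponse dict.

```python
def top_passages(discovery, environment_id, collection_ids, question, count=5):
    """Ask a natural language question across several collections.

    Hypothetical helper returning (document_id, passage_text) pairs.
    Assumption: the response dict has a "passages" list with those keys.
    """
    if not 1 <= count <= 100:  # documented maximum for passages_count
        raise ValueError("count must be between 1 and 100")
    response = discovery.federated_query(
        environment_id, collection_ids,
        natural_language_query=question,  # cannot be combined with query=
        passages=True,
        passages_count=count)
    return [(p["document_id"], p["passage_text"]) for p in response.get("passages", [])]
```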
-
federated_query_notices(environment_id, collection_ids, filter=None, query=None, natural_language_query=None, aggregation=None, count=None, return_fields=None, offset=None, sort=None, highlight=None, deduplicate_field=None, similar=None, similar_document_ids=None, similar_fields=None, **kwargs)
Query multiple collection system notices.
Queries for notices (errors or warnings) that might have been generated by the system. Notices are generated when ingesting documents and performing relevance training. See the [Discovery service documentation](https://console.bluemix.net/docs/services/discovery/using.html) for more details on the query language.
Parameters: - collection_ids – … queried against.
- filter (str) – A cacheable query that limits the documents returned to exclude any documents that don’t mention the query content. Filter searches are better for metadata-type searches and when you are trying to get a sense of concepts in the data set.
- query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.
- natural_language_query (str) – A natural language query that returns relevant documents by utilizing training data and natural language understanding. You cannot use natural_language_query and query at the same time.
- aggregation (str) – An aggregation search uses combinations of filters and query search to return an exact answer. Aggregations are useful for building applications, because you can use them to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.
- count (int) – Number of documents to return.
- return_fields (list[str]) – A comma-separated list of the portion of the document hierarchy to return.
- offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results.
- sort (list[str]) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.
- highlight (bool) – When true, a highlight field is returned for each result, containing the fields that match the query with <em></em> tags around the matching query terms. Defaults to false.
- deduplicate_field (str) – When specified, duplicate results based on the field specified are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.
- similar (bool) – When true, results are returned based on their similarity to the document IDs specified in the similar_document_ids parameter.
- similar_document_ids (list[str]) – A comma-separated list of document IDs that will be used to find similar documents. Note: If the natural_language_query parameter is also specified, it will be used to expand the scope of the document similarity search to include the natural language query. Other query parameters, such as filter and query, are subsequently applied and reduce the query scope.
- similar_fields (list[str]) – A comma-separated list of field names that will be used as a basis for comparison to identify similar documents. If not specified, the entire document is used for comparison.
- headers (dict) – A dict containing the request headers.
Returns: A dict containing the QueryNoticesResponse response.
Return type: dict
-
query(environment_id, collection_id, filter=None, query=None, natural_language_query=None, passages=None, aggregation=None, count=None, return_fields=None, offset=None, sort=None, highlight=None, passages_fields=None, passages_count=None, passages_characters=None, deduplicate=None, deduplicate_field=None, similar=None, similar_document_ids=None, similar_fields=None, **kwargs)
Query your collection.
After your content is uploaded and enriched by the Discovery service, you can build queries to search your content. For details, see the [Discovery service documentation](https://console.bluemix.net/docs/services/discovery/using.html).
Parameters: - filter (str) – A cacheable query that limits the documents returned to exclude any documents that don’t mention the query content. Filter searches are better for metadata-type searches and when you are trying to get a sense of concepts in the data set.
- query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.
- natural_language_query (str) – A natural language query that returns relevant documents by utilizing training data and natural language understanding. You cannot use natural_language_query and query at the same time.
- passages (bool) – When true, the query also returns the most relevant passages from the results.
- aggregation (str) – An aggregation search uses combinations of filters and query search to return an exact answer. Aggregations are useful for building applications, because you can use them to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.
- count (int) – Number of documents to return.
- return_fields (list[str]) – A comma-separated list of the portion of the document hierarchy to return.
- offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results.
- sort (list[str]) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.
- highlight (bool) – When true, a highlight field is returned for each result, containing the fields that match the query with <em></em> tags around the matching query terms. Defaults to false.
- passages_fields (list[str]) – A comma-separated list of fields that passages are drawn from. If this parameter is not specified, then all top-level fields are included.
- passages_count (int) – The maximum number of passages to return. The search returns fewer passages if the requested total is not found. The default is 10. The maximum is 100.
- passages_characters (int) – The approximate number of characters that any one passage will have. The default is 400. The minimum is 50. The maximum is 2000.
- deduplicate (bool) – When true and used with a Watson Discovery News collection, duplicate results (based on the contents of the title field) are removed. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.
- deduplicate_field (str) – When specified, duplicate results based on the field specified are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.
- similar (bool) – When true, results are returned based on their similarity to the document IDs specified in the similar_document_ids parameter.
- similar_document_ids (list[str]) – A comma-separated list of document IDs that will be used to find similar documents. Note: If the natural_language_query parameter is also specified, it will be used to expand the scope of the document similarity search to include the natural language query. Other query parameters, such as filter and query, are subsequently applied and reduce the query scope.
- similar_fields (list[str]) – A comma-separated list of field names that will be used as a basis for comparison to identify similar documents. If not specified, the entire document is used for comparison.
- headers (dict) – A dict containing the request headers.
Returns: A dict containing the QueryResponse response.
Return type: dict
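The count and offset parameters support simple paging: each request skips the results already seen. The generator below is a hypothetical sketch around query, assuming the QueryResponse dict reports the total in `matching_results` and the current page in `results`.

```python
def iter_results(discovery, environment_id, collection_id, page_size=50, **query_args):
    """Yield every result of a query, one page at a time, using count and offset.

    Hypothetical helper. Assumption: the response dict has "matching_results"
    (total hits) and "results" (the current page). Other keyword arguments,
    such as filter, query or sort, are forwarded unchanged.
    """
    offset = 0
    while True:
        page = discovery.query(environment_id, collection_id,
                               count=page_size, offset=offset, **query_args)
        results = page.get("results", [])
        for result in results:
            yield result
        offset += len(results)
        # Stop on an empty page or once every matching result has been seen.
        if not results or offset >= page.get("matching_results", 0):
            return
```

With 10 matching results and `page_size=4`, this issues requests at offsets 0, 4 and 8; the last one returns the final two results, matching the offset example above.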
-
query_entities(environment_id, collection_id, feature=None, entity=None, context=None, count=None, evidence_count=None, **kwargs)
Knowledge Graph entity query.
See the [Knowledge Graph documentation](https://console.bluemix.net/docs/services/discovery/building-kg.html) for more details.
Parameters: - feature – … disambiguate and similar_entities.
- entity (QueryEntitiesEntity) – A text string that appears within the entity text field.
- context (QueryEntitiesContext) – Entity text to provide context for the queried entity and rank based on that association. For example, if you wanted to query the city of London in England, your query would look for London with the context of England.
- count (int) – The number of results to return. The default is 10. The maximum is 1000.
- evidence_count (int) – The number of evidence items to return for each result. The default is 0. The maximum number of evidence items per query is 10,000.
- headers (dict) – A dict containing the request headers.
Returns: A dict containing the QueryEntitiesResponse response.
Return type: dict
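The London/England example above translates into a call like the following hypothetical sketch. Representing the entity and context as `{"text": ...}` dicts is an assumption about how QueryEntitiesEntity and QueryEntitiesContext serialize, and `"disambiguate"` is one of the two feature values named in the parameter list.

```python
def disambiguate(discovery, environment_id, collection_id, text, context_text=None, count=10):
    """Look up an entity, optionally ranked against a context string.

    Hypothetical helper. Assumption: entity and context are accepted as
    {"text": ...} dicts.
    """
    if not 1 <= count <= 1000:  # documented maximum for count
        raise ValueError("count must be between 1 and 1000")
    context = {"text": context_text} if context_text else None
    return discovery.query_entities(
        environment_id, collection_id, feature="disambiguate",
        entity={"text": text}, context=context, count=count)
```

For the documented example: `disambiguate(discovery, environment_id, collection_id, "London", "England")`.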
-
query_notices(environment_id, collection_id, filter=None, query=None, natural_language_query=None, passages=None, aggregation=None, count=None, return_fields=None, offset=None, sort=None, highlight=None, passages_fields=None, passages_count=None, passages_characters=None, deduplicate_field=None, similar=None, similar_document_ids=None, similar_fields=None, **kwargs)
Query system notices.
Queries for notices (errors or warnings) that might have been generated by the system. Notices are generated when ingesting documents and performing relevance training. See the [Discovery service documentation](https://console.bluemix.net/docs/services/discovery/using.html) for more details on the query language.
Parameters: - filter (str) – A cacheable query that limits the documents returned to exclude any documents that don’t mention the query content. Filter searches are better for metadata-type searches and when you are trying to get a sense of concepts in the data set.
- query (str) – A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.
- natural_language_query (str) – A natural language query that returns relevant documents by utilizing training data and natural language understanding. You cannot use natural_language_query and query at the same time.
- passages (bool) – When true, the query also returns the most relevant passages from the results.
- aggregation (str) – An aggregation search uses combinations of filters and query search to return an exact answer. Aggregations are useful for building applications, because you can use them to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.
- count (int) – Number of documents to return.
- return_fields (list[str]) – A comma-separated list of the portion of the document hierarchy to return.
- offset (int) – The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10 and the offset is 8, it returns the last two results.
- sort (list[str]) – A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.
- highlight (bool) – When true, a highlight field is returned for each result, containing the fields that match the query with <em></em> tags around the matching query terms. Defaults to false.
- passages_fields (list[str]) – A comma-separated list of fields that passages are drawn from. If this parameter is not specified, then all top-level fields are included.
- passages_count (int) – The maximum number of passages to return. The search returns fewer passages if the requested total is not found. The default is 10. The maximum is 100.
- passages_characters (int) – The approximate number of characters that any one passage will have. The default is 400. The minimum is 50. The maximum is 2000.
- deduplicate_field (str) – When specified, duplicate results based on the field specified are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.
- similar (bool) – When true, results are returned based on their similarity to the document IDs specified in the similar_document_ids parameter.
- similar_document_ids (list[str]) – A comma-separated list of document IDs that will be used to find similar documents. Note: If the natural_language_query parameter is also specified, it will be used to expand the scope of the document similarity search to include the natural language query. Other query parameters, such as filter and query, are subsequently applied and reduce the query scope.
- similar_fields (list[str]) – A comma-separated list of field names that will be used as a basis for comparison to identify similar documents. If not specified, the entire document is used for comparison.
- headers (dict) – A dict containing the request headers.
Returns: A dict containing the QueryNoticesResponse response.
Return type: dict
-
query_relations
(environment_id, collection_id, entities=None, context=None, sort=None, filter=None, count=None, evidence_count=None, **kwargs)[source]¶ Knowledge Graph relationship query.
See the [Knowledge Graph documentation](https://console.bluemix.net/docs/services/discovery/building-kg.html) for more details.
Parameters: - environment_id (str) – The ID of the environment.
- collection_id (str) – The ID of the collection.
- entities (list[QueryRelationsEntity]) – An array of entities to find relationships for.
- context (QueryEntitiesContext) – Entity text that provides context for the queried entity; results are ranked based on that association. For example, to query the city of London in England, search for London with the context of England.
- sort (str) – The sorting method for the relationships: score or frequency, where frequency is the number of unique times each entity is identified. The default is score.
- filter (QueryRelationsFilter) – Filters to apply to the relationship query.
- count (int) – The number of results to return. The default is 10. The maximum is 1000.
- evidence_count (int) – The number of evidence items to return for each result. The default is 0. The maximum number of evidence items per query is 10,000.
- headers (dict) – A dict containing the request headers
Returns: A dict containing the QueryRelationsResponse response.
Return type: dict
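The limits listed for query_relations can be checked client-side before a request is sent. This is a sketch under stated assumptions: `build_relations_query` is a hypothetical helper, the `{"text": ...}` shapes for entities and context are assumed, as is the ability to pass plain dicts to query_relations, and reading the 10,000-item cap as count × evidence_count is an interpretation of the text above.

```python
from typing import Any, Dict, List, Optional

VALID_SORTS = {"score", "frequency"}
MAX_COUNT = 1000
MAX_EVIDENCE_PER_QUERY = 10000


def build_relations_query(entities: List[Dict[str, Any]],
                          context_text: Optional[str] = None,
                          sort: str = "score",
                          count: int = 10,
                          evidence_count: int = 0) -> Dict[str, Any]:
    """Assemble keyword arguments for query_relations, checking documented limits."""
    if sort not in VALID_SORTS:
        raise ValueError("sort must be 'score' or 'frequency'")
    # Assumption: count must be positive; the documented maximum is 1000.
    if count < 1 or count > MAX_COUNT:
        raise ValueError("count must be between 1 and 1000")
    if evidence_count < 0:
        raise ValueError("evidence_count cannot be negative")
    # Interpretation: the per-query evidence cap applies to count * evidence_count.
    if count * evidence_count > MAX_EVIDENCE_PER_QUERY:
        raise ValueError("too many evidence items requested for one query")
    kwargs: Dict[str, Any] = {"entities": entities, "sort": sort,
                              "count": count, "evidence_count": evidence_count}
    if context_text is not None:
        # Assumed shape of a QueryEntitiesContext: {"text": ...}
        kwargs["context"] = {"text": context_text}
    return kwargs

# Hypothetical usage with an existing DiscoveryV1 client named `discovery`:
# discovery.query_relations(environment_id, collection_id,
#                           **build_relations_query([{"text": "London"}],
#                                                   context_text="England"))
```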
-
add_training_data
(environment_id, collection_id, natural_language_query=None, filter=None, examples=None, **kwargs)[source]¶ Add query to training data.
Adds a query to the training data for this collection. The query can contain a filter and natural language query.
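Parameters: - environment_id (str) – The ID of the environment.
- collection_id (str) – The ID of the collection.
- natural_language_query (str) –
- filter (str) –
- examples (list[TrainingExample]) –
- headers (dict) – A dict containing the request headers
Returns: A dict containing the TrainingQuery response.
Return type: dict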
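A training query bundles a natural language query, an optional filter, and example documents with relevance labels. The sketch below only assembles the keyword arguments locally; `make_training_example` and `make_training_query` are hypothetical helpers, the relevance values are illustrative, and passing plain dicts for examples is an assumption about the SDK.

```python
from typing import Any, Dict, List, Optional


def make_training_example(document_id: str,
                          relevance: int,
                          cross_reference: Optional[str] = None) -> Dict[str, Any]:
    """Build one example dict, omitting fields that were not supplied."""
    example = {"document_id": document_id, "relevance": relevance,
               "cross_reference": cross_reference}
    # Compare against None so that a relevance of 0 is kept.
    return {key: value for key, value in example.items() if value is not None}


def make_training_query(natural_language_query: str,
                        examples: List[Dict[str, Any]],
                        filter_expr: Optional[str] = None) -> Dict[str, Any]:
    """Collect keyword arguments for add_training_data."""
    kwargs: Dict[str, Any] = {"natural_language_query": natural_language_query,
                              "examples": examples}
    if filter_expr is not None:
        kwargs["filter"] = filter_expr
    return kwargs

# Hypothetical usage with an existing DiscoveryV1 client named `discovery`:
# discovery.add_training_data(
#     environment_id, collection_id,
#     **make_training_query("how do I reset a password",
#                           [make_training_example("doc-1", 10)]))
```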
-
create_training_example
(environment_id, collection_id, query_id, document_id=None, cross_reference=None, relevance=None, **kwargs)[source]¶ Add example to training data query.
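Adds an example to this training data query.
Parameters: - environment_id (str) – The ID of the environment.
- collection_id (str) – The ID of the collection.
- query_id (str) – The ID of the query used for training.
- document_id (str) –
- cross_reference (str) –
- relevance (int) –
- headers (dict) – A dict containing the request headers
Returns: A dict containing the TrainingExample response.
Return type: dict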
-
delete_all_training_data
(environment_id, collection_id, **kwargs)[source]¶ Delete all training data.
Deletes all training data from a collection.
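Parameters: - environment_id (str) – The ID of the environment.
- collection_id (str) – The ID of the collection.
- headers (dict) – A dict containing the request headers
Return type: None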
-
delete_training_data
(environment_id, collection_id, query_id, **kwargs)[source]¶ Delete a training data query.
Removes the training data query and all associated examples from the training data set.
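Parameters: - environment_id (str) – The ID of the environment.
- collection_id (str) – The ID of the collection.
- query_id (str) – The ID of the query used for training.
- headers (dict) – A dict containing the request headers
Return type: None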
-
delete_training_example
(environment_id, collection_id, query_id, example_id, **kwargs)[source]¶ Delete example for training data query.
Deletes the example document with the given ID from the training data query.
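Parameters: - environment_id (str) – The ID of the environment.
- collection_id (str) – The ID of the collection.
- query_id (str) – The ID of the query used for training.
- example_id (str) – The ID of the document as it is indexed.
- headers (dict) – A dict containing the request headers
Return type: None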
-
get_training_data
(environment_id, collection_id, query_id, **kwargs)[source]¶ Get details about a query.
Gets details for a specific training data query, including the query string and all examples.
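Parameters: - environment_id (str) – The ID of the environment.
- collection_id (str) – The ID of the collection.
- query_id (str) – The ID of the query used for training.
- headers (dict) – A dict containing the request headers
Returns: A dict containing the TrainingQuery response.
Return type: dict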
-
get_training_example
(environment_id, collection_id, query_id, example_id, **kwargs)[source]¶ Get details for training data example.
Gets the details for this training example.
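Parameters: - environment_id (str) – The ID of the environment.
- collection_id (str) – The ID of the collection.
- query_id (str) – The ID of the query used for training.
- example_id (str) – The ID of the document as it is indexed.
- headers (dict) – A dict containing the request headers
Returns: A dict containing the TrainingExample response.
Return type: dict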
-
list_training_data
(environment_id, collection_id, **kwargs)[source]¶ List training data.
Lists the training data for the specified collection.
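Parameters: - environment_id (str) – The ID of the environment.
- collection_id (str) – The ID of the collection.
- headers (dict) – A dict containing the request headers
Returns: A dict containing the TrainingDataSet response.
Return type: dict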
-
list_training_examples
(environment_id, collection_id, query_id, **kwargs)[source]¶ List examples for a training data query.
List all examples for this training data query.
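Parameters: - environment_id (str) – The ID of the environment.
- collection_id (str) – The ID of the collection.
- query_id (str) – The ID of the query used for training.
- headers (dict) – A dict containing the request headers
Returns: A dict containing the TrainingExampleList response.
Return type: dict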
-
update_training_example
(environment_id, collection_id, query_id, example_id, cross_reference=None, relevance=None, **kwargs)[source]¶ Change label or cross reference for example.
Changes the label or cross reference query for this training data example.
Parameters: - environment_id (str) – The ID of the environment.
- collection_id (str) – The ID of the collection.
- query_id (str) – The ID of the query used for training.
- example_id (str) – The ID of the document as it is indexed.
- cross_reference (str) –
- relevance (int) –
- headers (dict) – A dict containing the request headers
Returns: A dict containing the TrainingExample response.
Return type: dict
-
delete_user_data
(customer_id, **kwargs)[source]¶ Delete labeled data.
Deletes all data associated with a specified customer ID. The method has no effect if no data is associated with the customer ID. You associate a customer ID with data by passing the X-Watson-Metadata header with a request that passes data. For more information about personal data and customer IDs, see [Information security](https://console.bluemix.net/docs/services/discovery/information-security.html).
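Parameters: - customer_id (str) – The customer ID for which all data is to be deleted.
- headers (dict) – A dict containing the request headers
Return type: None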
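Because delete_user_data only finds data that was labeled with a customer ID at ingestion time, the X-Watson-Metadata header has to be sent with the original request. A minimal sketch, assuming the header value takes the form customer_id=<id> as described in the Information security documentation; `watson_metadata_header` is a hypothetical helper, and the client calls appear only as comments.

```python
from typing import Dict


def watson_metadata_header(customer_id: str) -> Dict[str, str]:
    """Return request headers that label submitted data with customer_id."""
    if not customer_id:
        raise ValueError("customer_id must be a non-empty string")
    return {"X-Watson-Metadata": "customer_id=" + customer_id}

# Hypothetical usage with an existing DiscoveryV1 client named `discovery`:
# with open("report.pdf", "rb") as f:
#     discovery.add_document(environment_id, collection_id, file=f,
#                            headers=watson_metadata_header("customer-42"))
# Later, to remove everything labeled with that ID:
# discovery.delete_user_data("customer-42")
```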
-
create_credentials
(environment_id, source_type=None, credential_details=None, **kwargs)[source]¶ Create credentials.
Creates a set of credentials to connect to a remote source. Created credentials are used in a configuration to associate a collection with the remote source. Note: All credentials are sent over an encrypted connection and encrypted at rest.
Parameters: - environment_id (str) – The ID of the environment.
- source_type (str) – The source that this credentials object connects to: box indicates the credentials are used to connect an instance of Enterprise Box; salesforce indicates the credentials are used to connect to Salesforce; sharepoint indicates the credentials are used to connect to Microsoft SharePoint Online.
- credential_details (CredentialDetails) – Object containing details of the stored credentials. Obtain credentials for your source from the administrator of the source.
- headers (dict) – A dict containing the request headers
Returns: A dict containing the Credentials response.
Return type: dict
-
delete_credentials
(environment_id, credential_id, **kwargs)[source]¶ Delete credentials.
Deletes a set of stored credentials from your Discovery instance.
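Parameters: - environment_id (str) – The ID of the environment.
- credential_id (str) – The unique identifier for a set of source credentials.
- headers (dict) – A dict containing the request headers
Returns: A dict containing the DeleteCredentials response.
Return type: dict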
-
get_credentials
(environment_id, credential_id, **kwargs)[source]¶ View Credentials.
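Returns details about the specified credentials.
Note: Secure credential information such as a password or SSH key is never returned and must be obtained from the source system.
Parameters: - environment_id (str) – The ID of the environment.
- credential_id (str) – The unique identifier for a set of source credentials.
- headers (dict) – A dict containing the request headers
Returns: A dict containing the Credentials response.
Return type: dict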
-
list_credentials
(environment_id, **kwargs)[source]¶ List credentials.
Lists all the source credentials that have been created for this service instance.
Note: All credentials are sent over an encrypted connection and encrypted at rest.
Parameters: - environment_id (str) – The ID of the environment.
- headers (dict) – A dict containing the request headers
Returns: A dict containing the CredentialsList response.
Return type: dict
-
update_credentials
(environment_id, credential_id, source_type=None, credential_details=None, **kwargs)[source]¶ Update credentials.
Updates an existing set of source credentials. Note: All credentials are sent over an encrypted connection and encrypted at rest.
Parameters: - environment_id (str) – The ID of the environment.
- credential_id (str) – The unique identifier for a set of source credentials.
- source_type (str) – The source that this credentials object connects to: box indicates the credentials are used to connect an instance of Enterprise Box; salesforce indicates the credentials are used to connect to Salesforce; sharepoint indicates the credentials are used to connect to Microsoft SharePoint Online.
- credential_details (CredentialDetails) – Object containing details of the stored credentials. Obtain credentials for your source from the administrator of the source.
- headers (dict) – A dict containing the request headers
Returns: A dict containing the Credentials response.
Return type: dict
-
class
AggregationResult
(key=None, matching_results=None, aggregations=None)[source]¶ Bases:
object
AggregationResult.
Attr str key: (optional) Key that matched the aggregation type.
Attr int matching_results: (optional) Number of matching results.
Attr list[QueryAggregation] aggregations: (optional) Aggregations returned in the case of chained aggregations.
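Chained aggregations nest AggregationResult objects inside QueryAggregation objects, so reading them usually means walking a tree. The sketch below flattens that tree into (key path, matching_results) rows. It assumes, beyond what is documented here, that each QueryAggregation dict stores its buckets under a "results" list; `flatten_aggregation_results` is a hypothetical helper and the sample data is invented.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[Tuple[str, ...], int]


def flatten_aggregation_results(results: List[Dict[str, Any]],
                                prefix: Tuple[str, ...] = ()) -> List[Row]:
    """Depth-first walk over AggregationResult dicts and their chained aggregations."""
    rows: List[Row] = []
    for result in results:
        path = prefix + (str(result.get("key")),)
        rows.append((path, result.get("matching_results", 0)))
        # Assumption: each chained QueryAggregation keeps its buckets in "results".
        for aggregation in result.get("aggregations", []):
            rows.extend(flatten_aggregation_results(aggregation.get("results", []), path))
    return rows
```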
-
class
Collection
(collection_id=None, name=None, description=None, created=None, updated=None, status=None, configuration_id=None, language=None, document_counts=None, disk_usage=None, training_status=None, source_crawl=None)[source]¶ Bases:
object
A collection for storing documents.
Attr str collection_id: (optional) The unique identifier of the collection.
Attr str name: (optional) The name of the collection.
Attr str description: (optional) The description of the collection.
Attr datetime created: (optional) The creation date of the collection in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr datetime updated: (optional) The timestamp of when the collection was last updated in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr str status: (optional) The status of the collection.
Attr str configuration_id: (optional) The unique identifier of the collection’s configuration.
Attr str language: (optional) The language of the documents stored in the collection. Permitted values include en (English), de (German), and es (Spanish).
Attr DocumentCounts document_counts: (optional) The object providing information about the documents in the collection. Present only when retrieving details of a collection.
Attr CollectionDiskUsage disk_usage: (optional) The object providing information about the disk usage of the collection. Present only when retrieving details of a collection.
Attr TrainingStatus training_status: (optional) Provides information about the status of relevance training for the collection.
Attr SourceStatus source_crawl: (optional) Object containing source crawl status information.
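When working with the raw response dicts, timestamps such as created and updated arrive as strings in the yyyy-MM-dd’T’HH:mm:ss.SSS’Z’ format. A minimal sketch for turning them into timezone-aware Python datetimes; `parse_discovery_timestamp` is a hypothetical helper name, and it treats the trailing Z as UTC.

```python
from datetime import datetime, timezone

# Python equivalent of yyyy-MM-dd'T'HH:mm:ss.SSS'Z'; %f accepts the
# three-digit millisecond field and scales it to microseconds.
DISCOVERY_TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_discovery_timestamp(value: str) -> datetime:
    """Parse a Discovery timestamp string into an aware UTC datetime."""
    naive = datetime.strptime(value, DISCOVERY_TIMESTAMP_FORMAT)
    return naive.replace(tzinfo=timezone.utc)
```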
-
class
CollectionDiskUsage
(used_bytes=None)[source]¶ Bases:
object
Summary of the disk usage statistics for this collection.
Attr int used_bytes: (optional) Number of bytes used by the collection.
-
class
CollectionUsage
(available=None, maximum_allowed=None)[source]¶ Bases:
object
Summary of the collection usage in the environment.
Attr int available: (optional) Number of active collections in the environment.
Attr int maximum_allowed: (optional) Total number of collections allowed in the environment.
-
class
Configuration
(name, configuration_id=None, created=None, updated=None, description=None, conversions=None, enrichments=None, normalizations=None, source=None)[source]¶ Bases:
object
A custom configuration for the environment.
Attr str configuration_id: (optional) The unique identifier of the configuration.
Attr str name: The name of the configuration.
Attr datetime created: (optional) The creation date of the configuration in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr datetime updated: (optional) The timestamp of when the configuration was last updated in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr str description: (optional) The description of the configuration, if available.
Attr Conversions conversions: (optional) The document conversion settings for the configuration.
Attr list[Enrichment] enrichments: (optional) An array of document enrichment settings for the configuration.
Attr list[NormalizationOperation] normalizations: (optional) Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
Attr Source source: (optional) Object containing source parameters for the configuration.
-
class
Conversions
(pdf=None, word=None, html=None, segment=None, json_normalizations=None)[source]¶ Bases:
object
Document conversion settings.
Attr PdfSettings pdf: (optional) A list of PDF conversion settings.
Attr WordSettings word: (optional) A list of Word conversion settings.
Attr HtmlSettings html: (optional) A list of HTML conversion settings.
Attr SegmentSettings segment: (optional) A list of Document Segmentation settings.
Attr list[NormalizationOperation] json_normalizations: (optional) Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.
-
class
CredentialDetails
(credential_type=None, client_id=None, enterprise_id=None, url=None, username=None, organization_url=None, site_collection_path=None, client_secret=None, public_key_id=None, private_key=None, passphrase=None, password=None)[source]¶ Bases:
object
Object containing details of the stored credentials. Obtain credentials for your source from the administrator of the source.
Attr str credential_type: (optional) The authentication method for this credentials definition. The credential_type specified must be supported by the source_type. The following combinations are possible:
- “source_type”: “box” - valid credential_type: oauth2
- “source_type”: “salesforce” - valid credential_type: username_password
- “source_type”: “sharepoint” - valid credential_type: saml
Attr str client_id: (optional) The client_id of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2.
Attr str enterprise_id: (optional) The enterprise_id of the Box site that these credentials connect to. Only valid, and required, with a source_type of box.
Attr str url: (optional) The url of the source that these credentials connect to. Only valid, and required, with a credential_type of username_password.
Attr str username: (optional) The username of the source that these credentials connect to. Only valid, and required, with a credential_type of saml or username_password.
Attr str organization_url: (optional) The organization_url of the source that these credentials connect to. Only valid, and required, with a credential_type of saml.
Attr str site_collection_path: (optional) The site_collection.path of the source that these credentials connect to. Only valid, and required, with a source_type of sharepoint.
Attr str client_secret: (optional) The client_secret of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.
Attr str public_key_id: (optional) The public_key_id of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.
Attr str private_key: (optional) The private_key of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.
Attr str passphrase: (optional) The passphrase of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.
Attr str password: (optional) The password of the source that these credentials connect to. Only valid, and required, with a credential_type of saml or username_password. Note: When used with a source_type of salesforce, the password consists of the Salesforce password concatenated with a valid Salesforce security token. This value is never returned and is only used when creating or modifying credentials.
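The source_type/credential_type combinations and the "only valid, and required" rules above amount to a small validation table. The following sketch encodes that table so a credential_details dict can be checked before calling create_credentials or update_credentials; `missing_credential_fields` and the constant names are hypothetical, and the rules are taken only from the attribute descriptions above.

```python
from typing import Any, Dict, List

# source_type -> the only credential_type it accepts, per the list above.
CREDENTIAL_TYPE_FOR_SOURCE = {
    "box": "oauth2",
    "salesforce": "username_password",
    "sharepoint": "saml",
}

# Fields documented as "valid, and required" for each credential_type.
REQUIRED_BY_CREDENTIAL_TYPE = {
    "oauth2": ["client_id", "client_secret", "public_key_id",
               "private_key", "passphrase"],
    "username_password": ["url", "username", "password"],
    "saml": ["username", "organization_url", "password"],
}

# Fields documented as required for a particular source_type.
REQUIRED_BY_SOURCE_TYPE = {
    "box": ["enterprise_id"],
    "salesforce": [],
    "sharepoint": ["site_collection_path"],
}


def missing_credential_fields(source_type: str,
                              details: Dict[str, Any]) -> List[str]:
    """Return required fields absent from a credential_details dict.

    Raises ValueError for an unknown source_type or an unsupported
    source_type/credential_type combination.
    """
    expected = CREDENTIAL_TYPE_FOR_SOURCE.get(source_type)
    if expected is None:
        raise ValueError("unknown source_type: %r" % source_type)
    credential_type = details.get("credential_type")
    if credential_type != expected:
        raise ValueError("%s requires credential_type %r, got %r"
                         % (source_type, expected, credential_type))
    required = (REQUIRED_BY_CREDENTIAL_TYPE[credential_type]
                + REQUIRED_BY_SOURCE_TYPE[source_type])
    return [name for name in required if not details.get(name)]
```

For Salesforce, remember that the password value is the account password with the security token appended.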
-
class
Credentials
(credential_id=None, source_type=None, credential_details=None)[source]¶ Bases:
object
Object containing credential information.
Attr str credential_id: (optional) Unique identifier for this set of credentials.
Attr str source_type: (optional) The source that this credentials object connects to.
- box indicates the credentials are used to connect an instance of Enterprise Box.
- salesforce indicates the credentials are used to connect to Salesforce.
- sharepoint indicates the credentials are used to connect to Microsoft SharePoint Online.
Attr CredentialDetails credential_details: (optional) Object containing details of the stored credentials. Obtain credentials for your source from the administrator of the source.
-
class
CredentialsList
(credentials=None)[source]¶ Bases:
object
CredentialsList.
Attr list[Credentials] credentials: (optional) An array of credential definitions that were created for this instance.
-
class
DeleteCollectionResponse
(collection_id, status)[source]¶ Bases:
object
DeleteCollectionResponse.
Attr str collection_id: The unique identifier of the collection that is being deleted.
Attr str status: The status of the collection. The status of a successful deletion operation is deleted.
-
class
DeleteConfigurationResponse
(configuration_id, status, notices=None)[source]¶ Bases:
object
DeleteConfigurationResponse.
Attr str configuration_id: The unique identifier for the configuration.
Attr str status: Status of the configuration. A deleted configuration has the status deleted.
Attr list[Notice] notices: (optional) An array of notice messages, if any.
-
class
DeleteCredentials
(credential_id=None, status=None)[source]¶ Bases:
object
Object returned after credentials are deleted.
Attr str credential_id: (optional) The unique identifier of the credentials that have been deleted.
Attr str status: (optional) The status of the deletion request.
-
class
DeleteDocumentResponse
(document_id=None, status=None)[source]¶ Bases:
object
DeleteDocumentResponse.
Attr str document_id: (optional) The unique identifier of the document.
Attr str status: (optional) Status of the document. A deleted document has the status deleted.
-
class
DeleteEnvironmentResponse
(environment_id, status)[source]¶ Bases:
object
DeleteEnvironmentResponse.
Attr str environment_id: The unique identifier for the environment.
Attr str status: Status of the environment.
-
class
DiskUsage
(used_bytes=None, maximum_allowed_bytes=None, total_bytes=None, used=None, total=None, percent_used=None)[source]¶ Bases:
object
Summary of the disk usage statistics for the environment.
Attr int used_bytes: (optional) Number of bytes within the environment’s disk capacity that are currently used to store data.
Attr int maximum_allowed_bytes: (optional) Total number of bytes available in the environment’s disk capacity.
Attr int total_bytes: (optional) Deprecated: Total number of bytes available in the environment’s disk capacity.
Attr str used: (optional) Deprecated: Amount of disk capacity used, in KB or GB format.
Attr str total: (optional) Deprecated: Total amount of the environment’s disk capacity, in KB or GB format.
Attr float percent_used: (optional) Deprecated: Percentage of the environment’s disk capacity that is being used.
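Since percent_used, used, and total are deprecated, a client can derive the same figure from the two non-deprecated byte counts. A minimal sketch operating on a DiskUsage dict; `percent_disk_used` is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict


def percent_disk_used(disk_usage: Dict[str, Any]) -> float:
    """Compute percent used from used_bytes and maximum_allowed_bytes."""
    used = disk_usage.get("used_bytes")
    maximum = disk_usage.get("maximum_allowed_bytes")
    if used is None or not maximum:
        raise ValueError("used_bytes and a non-zero maximum_allowed_bytes are required")
    return 100.0 * used / maximum
```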
-
class
DocumentAccepted
(document_id=None, status=None, notices=None)[source]¶ Bases:
object
DocumentAccepted.
Attr str document_id: (optional) The unique identifier of the ingested document.
Attr str status: (optional) Status of the document in the ingestion process.
Attr list[Notice] notices: (optional) Array of notices produced by the document-ingestion process.
-
class
DocumentCounts
(available=None, processing=None, failed=None)[source]¶ Bases:
object
DocumentCounts.
Attr int available: (optional) The total number of available documents in the collection.
Attr int processing: (optional) The number of documents in the collection that are currently being processed.
Attr int failed: (optional) The number of documents in the collection that failed to be ingested.
-
class
DocumentSnapshot
(step=None, snapshot=None)[source]¶ Bases:
object
DocumentSnapshot.
Attr str step: (optional)
Attr object snapshot: (optional)
-
class
DocumentStatus
(document_id, status, status_description, notices, configuration_id=None, created=None, updated=None, filename=None, file_type=None, sha1=None)[source]¶ Bases:
object
Status information about a submitted document.
Attr str document_id: The unique identifier of the document.
Attr str configuration_id: (optional) The unique identifier for the configuration.
Attr datetime created: (optional) The creation date of the document in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr datetime updated: (optional) Date of the most recent document update, in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr str status: Status of the document in the ingestion process.
Attr str status_description: Description of the document status.
Attr str filename: (optional) Name of the original source file (if available).
Attr str file_type: (optional) The type of the original source file.
Attr str sha1: (optional) The SHA-1 hash of the original source file (formatted as a hexadecimal string).
Attr list[Notice] notices: Array of notices produced by the document-ingestion process.
-
class
Enrichment
(destination_field, source_field, enrichment_name, description=None, overwrite=None, ignore_downstream_errors=None, options=None)[source]¶ Bases:
object
Enrichment.
Attr str description: (optional) Describes what the enrichment step does.
Attr str destination_field: Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.
Attr str source_field: Field to be enriched.
Attr bool overwrite: (optional) Indicates that the enrichments will overwrite the destination_field field if it already exists.
Attr str enrichment_name: Name of the enrichment service to call. Current options are natural_language_understanding and elements.
- When using natural_language_understanding, the options object must contain Natural Language Understanding options.
- When using elements, the options object must contain Element Classification options. Additionally, when using the elements enrichment, the configuration specified and files ingested must meet all the criteria specified in [the documentation](https://console.bluemix.net/docs/services/discovery/element-classification.html).
Previous API versions also supported alchemy_language.
Attr bool ignore_downstream_errors: (optional) If true, most errors generated during the enrichment process are treated as warnings and do not cause the document to fail processing.
Attr EnrichmentOptions options: (optional) A list of options specific to the enrichment.
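The destination_field rule (the field must exist or sit at most one level below an existing field) can be checked locally against a set of known field paths. A minimal sketch; `is_valid_destination` is a hypothetical helper, and treating a brand-new top-level field as valid (because the document root always exists) is an assumption not stated above.

```python
from typing import Set


def is_valid_destination(destination: str, existing_fields: Set[str]) -> bool:
    """Check the 'exists or at most one level deeper' destination_field rule."""
    if destination in existing_fields:
        return True
    parent, _, _ = destination.rpartition(".")
    # Assumption: a new top-level field (empty parent) is allowed,
    # since the document root always exists.
    return parent == "" or parent in existing_fields
```

Using the example from the description, with only text present, text.foo passes and text.foo.bar fails.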
-
class
EnrichmentOptions
(features=None, model=None)[source]¶ Bases:
object
Options which are specific to a particular enrichment.
Attr NluEnrichmentFeatures features: (optional) An object representing the enrichment features that will be applied to the specified field.
Attr str model: (optional) For use with elements enrichments only. The element extraction model to use. Available models: contract.
-
class
Environment
(environment_id=None, name=None, description=None, created=None, updated=None, status=None, read_only=None, size=None, index_capacity=None)[source]¶ Bases:
object
Details about an environment.
Attr str environment_id: (optional) Unique identifier for the environment.
Attr str name: (optional) Name that identifies the environment.
Attr str description: (optional) Description of the environment.
Attr datetime created: (optional) Creation date of the environment, in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr datetime updated: (optional) Date of most recent environment update, in the format yyyy-MM-dd’T’HH:mm:ss.SSS’Z’.
Attr str status: (optional) Status of the environment.
Attr bool read_only: (optional) If true, the environment contains read-only collections that are maintained by IBM.
Attr int size: (optional) Deprecated: Size of the environment.
Attr IndexCapacity index_capacity: (optional) Details about the resource usage and capacity of the environment.
-
class
EnvironmentDocuments
(indexed=None, maximum_allowed=None)[source]¶ Bases:
object
Summary of the document usage statistics for the environment.
Attr int indexed: (optional) Number of documents indexed for the environment.
Attr int maximum_allowed: (optional) Total number of documents allowed in the environment’s capacity.
-
class
Expansion
(expanded_terms, input_terms=None)[source]¶ Bases:
object
An expansion definition. Each object represents one set of expandable strings. For example, you could have expansions for the word hot in one object, and expansions for the word cold in another.
Attr list[str] input_terms: (optional) A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.
Attr list[str] expanded_terms: A list of terms that this expansion will be expanded to. If specified without input_terms, it also functions as the input term list.
-
class
Expansions
(expansions)[source]¶ Bases:
object
The query expansion definitions for the specified collection.
Attr list[Expansion] expansions: An array of query expansion definitions. Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional. Bidirectional means that all terms are expanded to all other terms in the object. Unidirectional means that a set list of terms can be expanded into a second list of terms.
To create a bidirectional expansion, specify an expanded_terms array. When found in a query, all items in the expanded_terms array are then expanded to the other items in the same array.
To create a unidirectional expansion, specify both an array of input_terms and an array of expanded_terms. When items in the input_terms array are present in a query, they are expanded using the items listed in the expanded_terms array.
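The difference between bidirectional and unidirectional expansions can be seen by applying both definition styles to a few query terms. The sketch below is a local illustration of the documented semantics only, not how the service performs expansion; `expand_terms` is a hypothetical helper, and the expansion list uses the same shape as the expansions attribute.

```python
from typing import Dict, List


def expand_terms(query_terms: List[str],
                 expansions: List[Dict[str, List[str]]]) -> List[str]:
    """Return the query terms followed by any terms they expand to."""
    result = list(query_terms)
    for term in query_terms:
        for expansion in expansions:
            # Bidirectional: no input_terms, so expanded_terms doubles as the input list.
            inputs = expansion.get("input_terms") or expansion["expanded_terms"]
            if term in inputs:
                for expanded in expansion["expanded_terms"]:
                    if expanded not in result:
                        result.append(expanded)
    return result


EXPANSIONS = [
    {"expanded_terms": ["hot", "warm"]},                                 # bidirectional
    {"input_terms": ["cold"], "expanded_terms": ["chilly", "freezing"]},  # unidirectional
]
```

With these definitions, warm expands to hot and cold expands to chilly and freezing, but chilly does not expand back to cold.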
-
class
Field
(field_name=None, field_type=None)[source]¶ Bases:
object
Field.
Attr str field_name: (optional) The name of the field.
Attr str field_type: (optional) The type of the field.
-
class
FontSetting
(level=None, min_size=None, max_size=None, bold=None, italic=None, name=None)[source]¶ Bases:
object
FontSetting.
Attr int level: (optional)
Attr int min_size: (optional)
Attr int max_size: (optional)
Attr bool bold: (optional)
Attr bool italic: (optional)
Attr str name: (optional)
-
class
HtmlSettings
(exclude_tags_completely=None, exclude_tags_keep_content=None, keep_content=None, exclude_content=None, keep_tag_attributes=None, exclude_tag_attributes=None)[source]¶ Bases:
object
A list of HTML conversion settings.
Attr list[str] exclude_tags_completely: (optional)
Attr list[str] exclude_tags_keep_content: (optional)
Attr XPathPatterns keep_content: (optional)
Attr XPathPatterns exclude_content: (optional)
Attr list[str] keep_tag_attributes: (optional)
Attr list[str] exclude_tag_attributes: (optional)
-
class
IndexCapacity
(documents=None, disk_usage=None, collections=None, memory_usage=None)[source]¶ Bases:
object
Details about the resource usage and capacity of the environment.
Attr EnvironmentDocuments documents: (optional) Summary of the document usage statistics for the environment.
Attr DiskUsage disk_usage: (optional) Summary of the disk usage of the environment.
Attr CollectionUsage collections: (optional) Summary of the collection usage in the environment.
Attr MemoryUsage memory_usage: (optional) Deprecated: Summary of the memory usage of the environment.
-
class
ListCollectionFieldsResponse
(fields=None)[source]¶ Bases:
object
The list of fetched fields. The fields are returned using a fully qualified name format; however, the format differs slightly from that used by the query operations.
- Fields that contain nested JSON objects are assigned a type of “nested”.
- Fields that belong to a nested object are prefixed with .properties (for example, warnings.properties.severity means that the warnings object has a property called severity).
- Fields returned from the News collection are prefixed with v{N}-fullnews-t3-{YEAR}.mappings (for example, v5-fullnews-t3-2016.mappings.text.properties.author).
Attr list[Field] fields: (optional) An array containing information about each field in the collections.
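To reuse these field names in query parameters, the .properties segments and the News mapping prefix usually need to be removed. A sketch under stated assumptions: `to_query_path` is a hypothetical helper, it assumes query operations address warnings.properties.severity as warnings.severity, and it would wrongly drop a field that is literally named properties.

```python
import re

# Matches the News collection prefix, e.g. "v5-fullnews-t3-2016.mappings."
_NEWS_PREFIX = re.compile(r"^v\d+-fullnews-t3-\d{4}\.mappings\.")


def to_query_path(field_name: str) -> str:
    """Convert a list_fields name into an assumed query-style dotted path."""
    path = _NEWS_PREFIX.sub("", field_name)
    # Caveat: this also removes a real field that happens to be named "properties".
    return ".".join(part for part in path.split(".") if part != "properties")
```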
-
class
ListCollectionsResponse
(collections=None)[source]¶ Bases:
object
ListCollectionsResponse.
Attr list[Collection] collections: (optional) An array containing information about each collection in the environment.
-
class
ListConfigurationsResponse
(configurations=None)[source]¶ Bases:
object
ListConfigurationsResponse.
Attr list[Configuration] configurations: (optional) An array of Configurations that are available for the service instance.
-
class
ListEnvironmentsResponse
(environments=None)[source]¶ Bases:
object
ListEnvironmentsResponse.
Attr list[Environment] environments: (optional) An array of environments that are available for the service instance.
-
class
MemoryUsage
(used_bytes=None, total_bytes=None, used=None, total=None, percent_used=None)[source]¶ Bases:
object
Deprecated: Summary of the memory usage statistics for this environment.
Attr int used_bytes: (optional) Deprecated: Number of bytes used in the environment’s memory capacity.
Attr int total_bytes: (optional) Deprecated: Total number of bytes available in the environment’s memory capacity.
Attr str used: (optional) Deprecated: Amount of memory capacity used, in KB or GB format.
Attr str total: (optional) Deprecated: Total amount of the environment’s memory capacity, in KB or GB format.
Attr float percent_used: (optional) Deprecated: Percentage of the environment’s memory capacity that is being used.
-
class
NluEnrichmentCategories
(**kwargs)[source]¶ Bases:
object
An object that indicates the Categories enrichment will be applied to the specified field.
-
class
NluEnrichmentEmotion
(document=None, targets=None)[source]¶ Bases:
object
An object specifying the emotion detection enrichment and related parameters.
Attr bool document: (optional) When true, emotion detection is performed on the entire field.
Attr list[str] targets: (optional) A comma-separated list of target strings that will have any associated emotions detected.
-
class
NluEnrichmentEntities
(sentiment=None, emotion=None, limit=None, mentions=None, mention_types=None, sentence_location=None, model=None)[source]¶ Bases:
object
An object specifying the Entities enrichment and related parameters.
Attr bool sentiment: (optional) When true, sentiment analysis of entities will be performed on the specified field.
Attr bool emotion: (optional) When true, emotion detection of entities will be performed on the specified field.
Attr int limit: (optional) The maximum number of entities to extract for each instance of the specified field.
Attr bool mentions: (optional) When true, the number of mentions of each identified entity is recorded. The default is false.
Attr bool mention_types: (optional) When true, the types of mentions for each identified entity are recorded. The default is false.
Attr bool sentence_location: (optional) When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.
Attr str model: (optional) The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, the public model for use with Knowledge Graph (en-news), or the default public model (alchemy).
-
class
NluEnrichmentFeatures
(keywords=None, entities=None, sentiment=None, emotion=None, categories=None, semantic_roles=None, relations=None)[source]¶ Bases:
object
NluEnrichmentFeatures.
Attr NluEnrichmentKeywords keywords: (optional) An object specifying the Keyword enrichment and related parameters.
Attr NluEnrichmentEntities entities: (optional) An object specifying the Entities enrichment and related parameters.
Attr NluEnrichmentSentiment sentiment: (optional) An object specifying the sentiment extraction enrichment and related parameters.
Attr NluEnrichmentEmotion emotion: (optional) An object specifying the emotion detection enrichment and related parameters.
Attr NluEnrichmentCategories categories: (optional) An object specifying the categories enrichment and related parameters.
Attr NluEnrichmentSemanticRoles semantic_roles: (optional) An object specifying the semantic roles enrichment and related parameters.
Attr NluEnrichmentRelations relations: (optional) An object specifying the relations enrichment and related parameters.
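An NLU enrichment combines the Enrichment fields (source_field, destination_field, enrichment_name) with an options object whose features use the names listed for NluEnrichmentFeatures. The sketch below assembles such an entry as a plain dict and rejects unknown feature names. `nlu_enrichment` is a hypothetical helper, and serializing enrichment_name under the JSON key "enrichment" is an assumption about the configuration format.

```python
from typing import Any, Dict

# Feature names taken from the NluEnrichmentFeatures constructor.
NLU_FEATURES = {"keywords", "entities", "sentiment", "emotion",
                "categories", "semantic_roles", "relations"}


def nlu_enrichment(source_field: str,
                   destination_field: str,
                   features: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Build one enrichment entry for a configuration's enrichments array."""
    unknown = set(features) - NLU_FEATURES
    if unknown:
        raise ValueError("unknown NLU features: " + ", ".join(sorted(unknown)))
    return {
        "source_field": source_field,
        "destination_field": destination_field,
        # Assumption: enrichment_name is serialized under the key "enrichment".
        "enrichment": "natural_language_understanding",
        "options": {"features": features},
    }
```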
-
class
NluEnrichmentKeywords
(sentiment=None, emotion=None, limit=None)[source]¶ Bases:
object
An object specifying the Keyword enrichment and related parameters.
Attr bool sentiment: (optional) When true, sentiment analysis of keywords will be performed on the specified field.
Attr bool emotion: (optional) When true, emotion detection of keywords will be performed on the specified field.
Attr int limit: (optional) The maximum number of keywords to extract for each instance of the specified field.
-
class
NluEnrichmentRelations
(model=None)[source]¶ Bases:
object
An object specifying the relations enrichment and related parameters.
Attr str model: (optional) For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio or the public model for use with Knowledge Graph (en-news). The default is en-news.
-
class
NluEnrichmentSemanticRoles
(entities=None, keywords=None, limit=None)[source]¶ Bases:
object
An object specifying the semantic roles enrichment and related parameters.
Attr bool entities: (optional) When true, entities are extracted from the identified sentence parts.
Attr bool keywords: (optional) When true, keywords are extracted from the identified sentence parts.
Attr int limit: (optional) The maximum number of semantic roles enrichments to extract from each instance of the specified field.
-
class
NluEnrichmentSentiment
(document=None, targets=None)[source]¶ Bases:
object
An object specifying the sentiment extraction enrichment and related parameters.
Attr bool document: (optional) When true, sentiment analysis is performed on the entire field.
Attr list[str] targets: (optional) A list of target strings whose associated sentiment is analyzed.
-
class
NormalizationOperation
(operation=None, source_field=None, destination_field=None)[source]¶ Bases:
object
NormalizationOperation.
Attr str operation: (optional) Identifies what type of operation to perform.
- copy: Copies the value of the source_field to the destination_field field. If the destination_field already exists, the value of the source_field overwrites the original value of the destination_field.
- move: Renames (moves) the source_field to the destination_field. If the destination_field already exists, the value of the source_field overwrites the original value of the destination_field. Move is identical to copy, except that the source_field is removed after its value has been copied to the destination_field (it is the same as a copy followed by a remove).
- merge: Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, the destination_field is still converted into an array (if it is not an array already). This conversion ensures that the type of destination_field is consistent across all documents.
- remove: Deletes the source_field field. The destination_field is ignored for this operation.
- remove_nulls: Removes all nested null (blank) field values from the JSON tree. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire JSON tree. Typically, remove_nulls is invoked as the last normalization operation (if it is invoked at all, since it can be time-expensive).
Attr str source_field: (optional) The source field for the operation.
Attr str destination_field: (optional) The destination field for the operation.
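The semantics of the five operations can be made concrete with a local simulation over a document dict. This is a minimal sketch of the behavior described above, not the service's implementation: it handles top-level field names only, treats only `None` as null, and assumes that `merge` into a missing destination_field starts from an empty array (the description does not say).

```python
import copy


def remove_nulls(value):
    """Recursively drop None values from dicts and lists."""
    if isinstance(value, dict):
        return {k: remove_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [remove_nulls(v) for v in value if v is not None]
    return value


def apply_normalization(document, operation, source_field=None, destination_field=None):
    """Return a new dict with one normalization operation applied (top-level fields only)."""
    doc = copy.deepcopy(document)  # never mutate the caller's document
    if operation == "copy":
        if source_field in doc:
            doc[destination_field] = copy.deepcopy(doc[source_field])
    elif operation == "move":
        if source_field in doc:
            doc[destination_field] = doc.pop(source_field)
    elif operation == "merge":
        if destination_field in doc:
            target = doc[destination_field]
            if not isinstance(target, list):
                target = [target]  # always end up with an array
        else:
            target = []  # assumption: a missing destination starts as an empty array
        if source_field in doc:
            target.append(doc.pop(source_field))
        doc[destination_field] = target
    elif operation == "remove":
        doc.pop(source_field, None)  # destination_field is ignored
    elif operation == "remove_nulls":
        doc = remove_nulls(doc)  # both field arguments are ignored
    else:
        raise ValueError("unknown operation: %r" % operation)
    return doc


doc = {"title": "Q3 report", "author": "Ann", "authors": "Bob", "draft": None}
print(apply_normalization(doc, "merge", "author", "authors"))
# {'title': 'Q3 report', 'authors': ['Bob', 'Ann'], 'draft': None}
```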
-
class
Notice
(notice_id=None, created=None, document_id=None, query_id=None, severity=None, step=None, description=None)[source]¶ Bases:
object
A notice produced for the collection.
Attr str notice_id: (optional) Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action.
Attr datetime created: (optional) The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
Attr str document_id: (optional) Unique identifier of the document.
Attr str query_id: (optional) Unique identifier of the query used for relevance training.
Attr str severity: (optional) Severity level of the notice.
Attr str step: (optional) Ingestion or training step in which the notice occurred.
Attr str description: (optional) The description of the notice.
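The service methods return plain dicts, so a notice's `created` value presumably arrives as a string in the format above. A minimal sketch that parses it with `datetime.strptime` and groups notices by document follows; the severity value `"error"` is an assumption about the possible values, and both helper names are hypothetical.

```python
from datetime import datetime, timezone

# strptime equivalent of the documented yyyy-MM-dd'T'HH:mm:ss.SSS'Z' format.
CREATED_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_created(value):
    """Parse a notice's created string into a timezone-aware UTC datetime."""
    return datetime.strptime(value, CREATED_FORMAT).replace(tzinfo=timezone.utc)


def errors_by_document(notices, severity="error"):
    """Group notices of the given severity by document_id (assumed severity value)."""
    grouped = {}
    for notice in notices:
        if notice.get("severity") == severity:
            grouped.setdefault(notice.get("document_id"), []).append(notice)
    return grouped


print(parse_created("2018-03-01T12:30:45.123Z").isoformat())
```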
-
class
PdfHeadingDetection
(fonts=None)[source]¶ Bases:
object
PdfHeadingDetection.
Attr list[FontSetting] fonts: (optional)
-
class
PdfSettings
(heading=None)[source]¶ Bases:
object
A list of PDF conversion settings.
Attr PdfHeadingDetection heading: (optional)
-
class
QueryAggregation
(type=None, results=None, matching_results=None, aggregations=None)[source]¶ Bases:
object
An aggregation produced by the Discovery service to analyze the input provided.
Attr str type: (optional) The type of aggregation command used. For example: term, filter, max, min, etc.
Attr list[AggregationResult] results: (optional)
Attr int matching_results: (optional) Number of matching results.
Attr list[QueryAggregation] aggregations: (optional) Aggregations returned by the Discovery service.
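Because QueryAggregation is recursive (its `aggregations` attribute holds further QueryAggregation objects), reading a response usually means walking a tree. A minimal sketch over the response dict follows. Descending into each entry of `results` assumes that AggregationResult items can carry their own nested `aggregations` list, which this page does not document; the sample response is made up.

```python
def walk_aggregations(aggregations, depth=0):
    """Yield (depth, type, matching_results) for every aggregation, depth-first."""
    for agg in aggregations or []:
        yield depth, agg.get("type"), agg.get("matching_results")
        # Assumption: individual results may hold their own nested aggregations.
        for result in agg.get("results") or []:
            yield from walk_aggregations(result.get("aggregations"), depth + 1)
        yield from walk_aggregations(agg.get("aggregations"), depth + 1)


# Hypothetical response fragment, for illustration only.
response = {
    "aggregations": [
        {"type": "term", "field": "enriched_text.entities.type",
         "results": [{"key": "Company", "matching_results": 12,
                      "aggregations": [{"type": "max", "matching_results": 12}]}]},
        {"type": "filter", "match": "year>2016", "matching_results": 40,
         "aggregations": [{"type": "term", "matching_results": 40}]},
    ]
}

for depth, agg_type, count in walk_aggregations(response["aggregations"]):
    print("  " * depth + "%s (%s)" % (agg_type, count))
```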
-
class
QueryEntitiesContext
(text=None)[source]¶ Bases:
object
Entity text to provide context for the queried entity and rank based on that association. For example, if you wanted to query the city of London in England, your query would look for London with the context of England.
Attr str text: (optional) Entity text to provide context for the queried entity and rank based on that association. For example, if you wanted to query the city of London in England, your query would look for London with the context of England.
-
class
QueryEntitiesEntity
(text=None, type=None)[source]¶ Bases:
object
A text string that appears within the entity text field.
Attr str text: (optional) Entity text content.
Attr str type: (optional) The type of the specified entity.
-
class
QueryEntitiesResponse
(entities=None)[source]¶ Bases:
object
An array of entities resulting from the query.
Attr list[QueryEntitiesResponseItem] entities: (optional)
-
class
QueryEntitiesResponseItem
(text=None, type=None, evidence=None)[source]¶ Bases:
object
Object containing Entity query response information.
Attr str text: (optional) Entity text content.
Attr str type: (optional) The type of the result entity.
Attr list[QueryEvidence] evidence: (optional) List of different evidentiary items to support the result.
-
class
QueryEvidence
(document_id=None, field=None, start_offset=None, end_offset=None, entities=None)[source]¶ Bases:
object
Description of evidence location supporting Knowledge Graph query result.
Attr str document_id: (optional) The document ID (as indexed in Discovery) of the evidence location.
Attr str field: (optional) The field of the document where the supporting evidence was identified.
Attr int start_offset: (optional) The start location of the evidence in the identified field. This value is inclusive.
Attr int end_offset: (optional) The end location of the evidence in the identified field. This value is inclusive.
Attr list[QueryEvidenceEntity] entities: (optional) An array of entity objects that show evidence of the result.
-
class
QueryEvidenceEntity
(type=None, text=None, start_offset=None, end_offset=None)[source]¶ Bases:
object
Entity description and location within evidence field.
Attr str type: (optional) The entity type for this entity. Possible types vary based on model used.
Attr str text: (optional) The original text of this entity as found in the evidence field.
Attr int start_offset: (optional) The start location of the entity text in the identified field. This value is inclusive.
Attr int end_offset: (optional) The end location of the entity text in the identified field. This value is exclusive.
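Note the asymmetry: QueryEvidence documents its end_offset as inclusive, while QueryEvidenceEntity documents its end_offset as exclusive. A minimal sketch of slicing the field text accordingly, assuming the offsets count Python string characters (not bytes) and using made-up sample data:

```python
def evidence_text(field_text, evidence):
    """Slice a QueryEvidence span; its end_offset is documented as inclusive."""
    return field_text[evidence["start_offset"]:evidence["end_offset"] + 1]


def entity_text(field_text, entity):
    """Slice a QueryEvidenceEntity span; its end_offset is documented as exclusive."""
    return field_text[entity["start_offset"]:entity["end_offset"]]


# Hypothetical field text and offsets, for illustration only.
text = "IBM opened a new office in London last year."
evidence = {"field": "text", "start_offset": 0, "end_offset": 32}
entity = {"type": "Location", "text": "London", "start_offset": 27, "end_offset": 33}

print(evidence_text(text, evidence))  # IBM opened a new office in London
print(entity_text(text, entity))      # London
```

Checking `entity_text(...) == entity["text"]` is a cheap way to confirm the offset convention against real responses.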
-
class
QueryFilterType
(exclude=None, include=None)[source]¶ Bases:
object
QueryFilterType.
Attr list[str] exclude: (optional) A list of types to exclude.
Attr list[str] include: (optional) A list of types to include. All other types are excluded.
-
class
QueryNoticesResponse
(matching_results=None, results=None, aggregations=None, passages=None, duplicates_removed=None)[source]¶ Bases:
object
QueryNoticesResponse.
Attr int matching_results: (optional)
Attr list[QueryNoticesResult] results: (optional)
Attr list[QueryAggregation] aggregations: (optional)
Attr list[QueryPassages] passages: (optional)
Attr int duplicates_removed: (optional)
-
class
QueryNoticesResult
(id=None, score=None, metadata=None, collection_id=None, result_metadata=None, code=None, filename=None, file_type=None, sha1=None, notices=None, **kwargs)[source]¶ Bases:
object
QueryNoticesResult.
Attr str id: (optional) The unique identifier of the document.
Attr float score: (optional) Deprecated: this field is now part of the result_metadata object.
Attr object metadata: (optional) Metadata of the document.
Attr str collection_id: (optional) The collection ID of the collection containing the document for this result.
Attr QueryResultResultMetadata result_metadata: (optional) Metadata of the query result.
Attr int code: (optional) The internal status code returned by the ingestion subsystem indicating the overall result of ingesting the source document.
Attr str filename: (optional) Name of the original source file (if available).
Attr str file_type: (optional) The type of the original source file.
Attr str sha1: (optional) The SHA-1 hash of the original source file (formatted as a hexadecimal string).
Attr list[Notice] notices: (optional) Array of notices for the document.
-
class
QueryPassages
(document_id=None, passage_score=None, passage_text=None, start_offset=None, end_offset=None, field=None)[source]¶ Bases:
object
QueryPassages.
Attr str document_id: (optional) The unique identifier of the document from which the passage has been extracted.
Attr float passage_score: (optional) The confidence score of the passage's analysis. A higher score indicates greater confidence.
Attr str passage_text: (optional) The content of the extracted passage.
Attr int start_offset: (optional) The position of the first character of the extracted passage in the originating field.
Attr int end_offset: (optional) The position of the last character of the extracted passage in the originating field.
Attr str field: (optional) The label of the field from which the passage has been extracted.
-
class
QueryRelationsArgument
(entities=None)[source]¶ Bases:
object
QueryRelationsArgument.
Attr list[QueryEntitiesEntity] entities: (optional)
-
class
QueryRelationsEntity
(text=None, type=None, exact=None)[source]¶ Bases:
object
QueryRelationsEntity.
Attr str text: (optional) Entity text content.
Attr str type: (optional) The type of the specified entity.
Attr bool exact: (optional) If false, implicit querying is performed. The default is false.
-
class
QueryRelationsFilter
(relation_types=None, entity_types=None, document_ids=None)[source]¶ Bases:
object
QueryRelationsFilter.
Attr QueryFilterType relation_types: (optional) A list of relation types to include or exclude from the query.
Attr QueryFilterType entity_types: (optional) A list of entity types to include or exclude from the query.
Attr list[str] document_ids: (optional) A list of document IDs to include in the query.
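A QueryRelationsFilter nests two QueryFilterType objects plus a document ID list. The sketch below builds that shape as a dict and omits empty parts. The helper is hypothetical, and the relation and entity type names in the usage line are illustrative values; the valid names depend on the model in use.

```python
def relations_filter(include_relations=None, exclude_relations=None,
                     include_entity_types=None, exclude_entity_types=None,
                     document_ids=None):
    """Build a dict shaped like QueryRelationsFilter, omitting empty parts."""

    def filter_type(include, exclude):
        # Shape of a QueryFilterType: optional "include" and "exclude" lists.
        part = {}
        if include:
            part["include"] = list(include)
        if exclude:
            part["exclude"] = list(exclude)
        return part or None

    result = {
        "relation_types": filter_type(include_relations, exclude_relations),
        "entity_types": filter_type(include_entity_types, exclude_entity_types),
        "document_ids": list(document_ids) if document_ids else None,
    }
    return {key: value for key, value in result.items() if value is not None}


print(relations_filter(include_relations=["employedBy"], exclude_entity_types=["Date"]))
```

Remember that `include` is restrictive: every type not listed is excluded.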
-
class
QueryRelationsRelationship
(type=None, frequency=None, arguments=None, evidence=None)[source]¶ Bases:
object
QueryRelationsRelationship.
Attr str type: (optional) The identified relationship type.
Attr int frequency: (optional) The number of times the relationship is mentioned.
Attr list[QueryRelationsArgument] arguments: (optional) Information about the relationship.
Attr list[QueryEvidence] evidence: (optional) List of different evidentiary items to support the result.
-
class
QueryRelationsResponse
(relations=None)[source]¶ Bases:
object
QueryRelationsResponse.
Attr list[QueryRelationsRelationship] relations: (optional)
-
class
QueryResponse
(matching_results=None, results=None, aggregations=None, passages=None, duplicates_removed=None, session_token=None)[source]¶ Bases:
object
A response containing the documents and aggregations for the query.
Attr int matching_results: (optional)
Attr list[QueryResult] results: (optional)
Attr list[QueryAggregation] aggregations: (optional)
Attr list[QueryPassages] passages: (optional)
Attr int duplicates_removed: (optional)
Attr str session_token: (optional) The session token for this query. The session token can be used to add events associated with this query to the query and event log.
-
class
QueryResult
(id=None, score=None, metadata=None, collection_id=None, result_metadata=None, **kwargs)[source]¶ Bases:
object
QueryResult.
Attr str id: (optional) The unique identifier of the document.
Attr float score: (optional) Deprecated: this field is now part of the result_metadata object.
Attr object metadata: (optional) Metadata of the document.
Attr str collection_id: (optional) The collection ID of the collection containing the document for this result.
Attr QueryResultResultMetadata result_metadata: (optional) Metadata of the query result.
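Since the top-level `score` is deprecated in favor of `result_metadata.score`, code that reads results may want to prefer the new location and fall back to the old one. A minimal sketch over a query response dict; both helper names and the sample response are hypothetical.

```python
def result_score(result):
    """Prefer result_metadata.score; fall back to the deprecated top-level score."""
    metadata = result.get("result_metadata") or {}
    if metadata.get("score") is not None:
        return metadata["score"]
    return result.get("score")


def top_documents(response, n=3):
    """Return (id, score) pairs for the n highest-scoring results that have a score."""
    scored = [(r.get("id"), result_score(r)) for r in response.get("results", [])]
    scored = [pair for pair in scored if pair[1] is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical response, for illustration only.
response = {"results": [
    {"id": "a", "score": 1.5},
    {"id": "b", "result_metadata": {"score": 3.0, "confidence": 0.8}},
    {"id": "c"},
]}
print(top_documents(response))
```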
-
class
QueryResultResultMetadata
(score=None, confidence=None)[source]¶ Bases:
object
Metadata of a query result.
Attr float score: (optional) The raw score of the result. A higher score indicates a greater match to the query parameters.
Attr float confidence: (optional) The confidence score of the result's analysis. A higher score indicates greater confidence.
-
class
SegmentSettings
(enabled=None, selector_tags=None)[source]¶ Bases:
object
A list of Document Segmentation settings.
Attr bool enabled: (optional) Enables/disables the Document Segmentation feature.
Attr list[str] selector_tags: (optional) Defines the heading levels at which the document is split into segments. Valid values are h1, h2, h3, h4, h5, h6.
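The selector_tags values are limited to h1 through h6, so it can help to validate them before sending a configuration. A minimal sketch that builds a SegmentSettings-shaped dict; the helper and its default tags are hypothetical.

```python
VALID_SELECTOR_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def segment_settings(enabled=True, selector_tags=("h1", "h2")):
    """Build a SegmentSettings-shaped dict, rejecting unsupported heading tags."""
    tags = list(selector_tags)
    invalid = [tag for tag in tags if tag not in VALID_SELECTOR_TAGS]
    if invalid:
        raise ValueError("unsupported selector_tags: %s" % ", ".join(invalid))
    return {"enabled": enabled, "selector_tags": tags}


print(segment_settings(selector_tags=["h2"]))
```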
-
class
Source
(type=None, credential_id=None, schedule=None, options=None)[source]¶ Bases:
object
Object containing source parameters for the configuration.
Attr str type: (optional) The type of source to connect to.
- box indicates the configuration is to connect to an instance of Enterprise Box.
- salesforce indicates the configuration is to connect to Salesforce.
- sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.
Attr str credential_id: (optional) The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.
Attr SourceSchedule schedule: (optional) Object containing the schedule information for the source.
Attr SourceOptions options: (optional) The options object defines which items to crawl from the source system.
-
class
SourceOptions
(folders=None, objects=None, site_collections=None)[source]¶ Bases:
object
The options object defines which items to crawl from the source system.
Attr list[SourceOptionsFolder] folders: (optional) Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.
Attr list[SourceOptionsObject] objects: (optional) Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.
Attr list[SourceOptionsSiteColl] site_collections: (optional) Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid, and required, when the type field of the source object is set to sharepoint.
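Each source type pairs with exactly one options list (box with folders, salesforce with objects, sharepoint with site_collections), so a mismatch can be caught before the configuration is sent. A minimal client-side sketch of that check; the helper, the credential ID, and the Box user and folder IDs are hypothetical placeholders.

```python
# Which options list each source type requires, per the SourceOptions description.
REQUIRED_OPTION = {
    "box": "folders",
    "salesforce": "objects",
    "sharepoint": "site_collections",
}


def validate_source(source):
    """Check that a Source dict carries exactly the options list matching its type."""
    source_type = source.get("type")
    if source_type not in REQUIRED_OPTION:
        raise ValueError("unknown source type: %r" % source_type)
    options = source.get("options") or {}
    required = REQUIRED_OPTION[source_type]
    if not options.get(required):
        raise ValueError("%s sources require options.%s" % (source_type, required))
    extra = [key for key in REQUIRED_OPTION.values() if key != required and key in options]
    if extra:
        raise ValueError("options not valid for %s: %s" % (source_type, ", ".join(extra)))
    return source


source = {
    "type": "box",
    "credential_id": "my-credential-id",  # hypothetical ID
    "schedule": {"enabled": True, "time_zone": "America/New_York", "frequency": "weekly"},
    "options": {"folders": [{"owner_user_id": "1234", "folder_id": "5678", "limit": 500}]},
}
validate_source(source)
```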
-
class
SourceOptionsFolder
(owner_user_id, folder_id, limit=None)[source]¶ Bases:
object
Object that defines a Box folder to crawl with this configuration.
Attr str owner_user_id: The Box user ID of the user who owns the folder to crawl.
Attr str folder_id: The Box folder ID of the folder to crawl.
Attr int limit: (optional) The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.
-
class
SourceOptionsObject
(name, limit=None)[source]¶ Bases:
object
Object that defines a Salesforce document object type to crawl with this configuration.
Attr str name: The name of the Salesforce document object to crawl. For example, case.
Attr int limit: (optional) The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.
-
class
SourceOptionsSiteColl
(site_collection_path, limit=None)[source]¶ Bases:
object
Object that defines a Microsoft SharePoint site collection to crawl with this configuration.
Attr str site_collection_path: The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.
Attr int limit: (optional) The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
-
class
SourceSchedule
(enabled=None, time_zone=None, frequency=None)[source]¶ Bases:
object
Object containing the schedule information for the source.
Attr bool enabled: (optional) When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled; when false and connecting to Salesforce, the source is crawled annually.
Attr str time_zone: (optional) The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.
Attr str frequency: (optional) The crawl schedule in the specified time_zone.
- daily: Runs every day between 00:00 and 06:00.
- weekly: Runs every week on Sunday between 00:00 and 06:00.
- monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
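The three frequencies share one 00:00 to 06:00 window and differ only in which days qualify. A minimal sketch that checks whether a local time (already converted to the schedule's time_zone) falls inside that window. It treats 06:00 itself as outside the window, which is a boundary choice made here. The exact start time within the window is chosen by the service and is not modeled.

```python
from datetime import datetime

SUNDAY = 6  # datetime.weekday(): Monday == 0 ... Sunday == 6


def in_crawl_window(moment, frequency):
    """Return True if a local `moment` falls in the 00:00-06:00 window for `frequency`."""
    if moment.hour >= 6:
        return False
    if frequency == "daily":
        return True
    if frequency == "weekly":
        return moment.weekday() == SUNDAY
    if frequency == "monthly":
        # The first Sunday of any month falls on day 1 to 7.
        return moment.weekday() == SUNDAY and moment.day <= 7
    raise ValueError("unknown frequency: %r" % frequency)


print(in_crawl_window(datetime(2018, 7, 1, 3, 0), "monthly"))  # July 1, 2018 was a Sunday
```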
-
class
SourceStatus
(status=None, last_updated=None)[source]¶ Bases:
object
Object containing source crawl status information.
Attr str status: (optional) The current status of the source crawl for this collection. This field returns not_configured if the default configuration for this source does not have a source object defined.
- running indicates that a crawl to fetch more documents is in progress.
- complete indicates that the crawl has completed with no errors.
- complete_with_notices indicates that some notices were generated during the crawl. Notices can be checked by using the notices query method.
- stopped indicates that the crawl has stopped but is not complete.
Attr datetime last_updated: (optional) Date in UTC format indicating when the last crawl was attempted. If null, no crawl was completed.
-
class
TestDocument
(configuration_id=None, status=None, enriched_field_units=None, original_media_type=None, snapshots=None, notices=None)[source]¶ Bases:
object
TestDocument.
Attr str configuration_id: (optional) The unique identifier for the configuration.
Attr str status: (optional) Status of the preview operation.
Attr int enriched_field_units: (optional) The number of 10-kB chunks of field data that were enriched. This can be used to estimate the cost of running a real ingestion.
Attr str original_media_type: (optional) Format of the test document.
Attr list[DocumentSnapshot] snapshots: (optional) An array of objects that describe each step in the preview process.
Attr list[Notice] notices: (optional) An array of notice messages about the preview operation.
-
class
TopHitsResults
(matching_results=None, hits=None)[source]¶ Bases:
object
TopHitsResults.
Attr int matching_results: (optional) Number of matching results.
Attr list[QueryResult] hits: (optional) Top results returned by the aggregation.
-
class
TrainingDataSet
(environment_id=None, collection_id=None, queries=None)[source]¶ Bases:
object
TrainingDataSet.
Attr str environment_id: (optional)
Attr str collection_id: (optional)
Attr list[TrainingQuery] queries: (optional)
-
class
TrainingExample
(document_id=None, cross_reference=None, relevance=None)[source]¶ Bases:
object
TrainingExample.
Attr str document_id: (optional)
Attr str cross_reference: (optional)
Attr int relevance: (optional)
-
class
TrainingExampleList
(examples=None)[source]¶ Bases:
object
TrainingExampleList.
Attr list[TrainingExample] examples: (optional)
-
class
TrainingQuery
(query_id=None, natural_language_query=None, filter=None, examples=None)[source]¶ Bases:
object
TrainingQuery.
Attr str query_id: (optional)
Attr str natural_language_query: (optional)
Attr str filter: (optional)
Attr list[TrainingExample] examples: (optional)
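A TrainingQuery bundles a natural-language query with TrainingExample entries, each pairing a document_id with an integer relevance. A minimal sketch that builds that shape from (document_id, relevance) pairs; the helper, the document IDs, and the relevance values are hypothetical, and the valid relevance range is not stated on this page.

```python
def training_query(natural_language_query, examples, query_filter=None):
    """Build a TrainingQuery-shaped dict from (document_id, relevance) pairs."""
    query = {
        "natural_language_query": natural_language_query,
        "examples": [
            {"document_id": document_id, "relevance": int(relevance)}
            for document_id, relevance in examples
        ],
    }
    if query_filter is not None:
        query["filter"] = query_filter  # serialized under the "filter" attribute
    return query


# Hypothetical document IDs and relevance labels, for illustration only.
query = training_query("how do I reset my password", [("doc-1", 10), ("doc-2", 0)])
print(query)
```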
-
class
TrainingStatus
(total_examples=None, available=None, processing=None, minimum_queries_added=None, minimum_examples_added=None, sufficient_label_diversity=None, notices=None, successfully_trained=None, data_updated=None)[source]¶ Bases:
object
TrainingStatus.
Attr int total_examples: (optional)
Attr bool available: (optional)
Attr bool processing: (optional)
Attr bool minimum_queries_added: (optional)
Attr bool minimum_examples_added: (optional)
Attr bool sufficient_label_diversity: (optional)
Attr int notices: (optional)
Attr datetime successfully_trained: (optional)
Attr datetime data_updated: (optional)
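The three boolean flags minimum_queries_added, minimum_examples_added, and sufficient_label_diversity read like training prerequisites. Treating them that way is an assumption, since their meaning is not documented here. A minimal sketch that lists which of them are not yet true in a TrainingStatus dict:

```python
# Flags assumed (not documented here) to be prerequisites for training.
PREREQUISITES = ("minimum_queries_added", "minimum_examples_added",
                 "sufficient_label_diversity")


def missing_prerequisites(status):
    """Return the prerequisite flags that are missing or false in a TrainingStatus dict."""
    return [flag for flag in PREREQUISITES if not status.get(flag)]


print(missing_prerequisites({"minimum_queries_added": True, "minimum_examples_added": False}))
```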
-
class
WordHeadingDetection
(fonts=None, styles=None)[source]¶ Bases:
object
WordHeadingDetection.
Attr list[FontSetting] fonts: (optional)
Attr list[WordStyle] styles: (optional)
-
class
WordSettings
(heading=None)[source]¶ Bases:
object
A list of Word conversion settings.
Attr WordHeadingDetection heading: (optional)
-
class
WordStyle
(level=None, names=None)[source]¶ Bases:
object
WordStyle.
Attr int level: (optional)
Attr list[str] names: (optional)
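WordSettings, WordHeadingDetection, and WordStyle nest into one structure in which each WordStyle maps several Word style names to one heading level. A minimal sketch that builds it from a {style name: level} mapping; the helper and the style names are hypothetical.

```python
def word_settings(style_levels):
    """Build a WordSettings-shaped dict from {Word style name: heading level}.

    Style names that share a level are grouped into a single WordStyle entry.
    """
    by_level = {}
    for name, level in style_levels.items():
        by_level.setdefault(level, []).append(name)
    styles = [{"level": level, "names": names} for level, names in sorted(by_level.items())]
    return {"heading": {"styles": styles}}


# Hypothetical style names, for illustration only.
print(word_settings({"pullout heading": 1, "Title": 1, "subtitle": 2}))
```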
-
class
XPathPatterns
(xpaths=None)[source]¶ Bases:
object
XPathPatterns.
Attr list[str] xpaths: (optional)
-
class
Calculation
(type=None, results=None, matching_results=None, aggregations=None, field=None, value=None)[source]¶ Bases:
object
Calculation.
Attr str field: (optional) The field where the aggregation is located in the document.
Attr float value: (optional) Value of the aggregation.
-
class
Filter
(type=None, results=None, matching_results=None, aggregations=None, match=None)[source]¶ Bases:
object
Filter.
Attr str match: (optional) The match the aggregated results queried for.
-
class
Histogram
(type=None, results=None, matching_results=None, aggregations=None, field=None, interval=None)[source]¶ Bases:
object
Histogram.
Attr str field: (optional) The field where the aggregation is located in the document.
Attr int interval: (optional) Interval of the aggregation (for the histogram type).
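A histogram aggregation groups numeric values into buckets of width `interval`. The sketch below uses the common fixed-width rule (bucket key = floor(value / interval) × interval). That rule is an assumption about how Discovery keys its buckets, not something this page confirms. It is mainly useful for reasoning about which bucket a value should land in.

```python
import math
from collections import Counter


def histogram_buckets(values, interval):
    """Count values per bucket, keying each bucket by floor(value / interval) * interval."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    counts = Counter(math.floor(value / interval) * interval for value in values)
    return dict(sorted(counts.items()))


print(histogram_buckets([3, 7, 12, 14, -1], 5))
```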
-
class
Nested
(type=None, results=None, matching_results=None, aggregations=None, path=None)[source]¶ Bases:
object
Nested.
Attr str path: (optional) The area of the results the aggregation was restricted to.
-
class
Term
(type=None, results=None, matching_results=None, aggregations=None, field=None, count=None)[source]¶ Bases:
object
Term.
Attr str field: (optional) The field where the aggregation is located in the document.
Attr int count: (optional)
-
class
Timeslice
(type=None, results=None, matching_results=None, aggregations=None, field=None, interval=None, anomaly=None)[source]¶ Bases:
object
Timeslice.
Attr str field: (optional) The field where the aggregation is located in the document.
Attr str interval: (optional) Interval of the aggregation. Valid date interval values are second/seconds, minute/minutes, hour/hours, day/days, week/weeks, month/months, and year/years.
Attr bool anomaly: (optional) Used to indicate that anomaly detection should be performed. Anomaly detection is used to locate unusual data points within a time series.
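When generating timeslice intervals programmatically, it can help to validate them against the unit list above. The sketch below assumes an interval string of the form `<count><unit>` (for example `1day` or `2weeks`), which is an assumption about the query syntax. It converts fixed-length units to a `timedelta` and returns `None` for month and year, whose length varies.

```python
import re
from datetime import timedelta

# Fixed-length units from the Timeslice description; month/year vary in length.
FIXED_UNITS = {
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}
CALENDAR_UNITS = {"month", "year"}

# Assumed "<count><unit>" shape, accepting singular or plural unit names.
INTERVAL_RE = re.compile(r"^(\d+)(second|minute|hour|day|week|month|year)s?$")


def parse_interval(interval):
    """Split an interval such as '1day' or '2weeks' into (count, unit)."""
    match = INTERVAL_RE.match(interval.strip())
    if not match:
        raise ValueError("unsupported interval: %r" % interval)
    return int(match.group(1)), match.group(2)


def interval_length(interval):
    """Return a timedelta for fixed-length units, or None for month/year."""
    count, unit = parse_interval(interval)
    if unit in CALENDAR_UNITS:
        return None
    return count * FIXED_UNITS[unit]


print(parse_interval("2weeks"), interval_length("2weeks"))
```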