Watson Developer Cloud .NET Standard SDK 4.0.0
The .NET SDK provides access to the Watson Developer Cloud services, a collection of REST APIs that use cognitive computing to solve complex problems.
IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl Class Reference

Object defining which URL to crawl and how to crawl it. More...

Classes

class  CrawlSpeedEnumValue
 The number of concurrent URLs to fetch. gentle means one URL is fetched at a time with a delay between each call. normal means as many as two URLs are fetched concurrently with a short delay between fetch calls. aggressive means that up to ten URLs are fetched concurrently with a short delay between fetch calls. More...
 

Properties

string CrawlSpeed [get, set]
 The number of concurrent URLs to fetch. gentle means one URL is fetched at a time with a delay between each call. normal means as many as two URLs are fetched concurrently with a short delay between fetch calls. aggressive means that up to ten URLs are fetched concurrently with a short delay between fetch calls. Constants for possible values can be found using SourceOptionsWebCrawl.CrawlSpeedEnumValue. More...
 
string Url [get, set]
 The starting URL to crawl. More...
 
bool LimitToStartingHosts [get, set]
 When true, crawls of the specified URL are limited to the host part of the url field. More...
 
bool AllowUntrustedCertificate [get, set]
 When true, allows the crawl to interact with HTTPS sites with SSL certificates with untrusted signers. More...
 
long MaximumHops [get, set]
 The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within maximum_hops of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on. More...
 
long RequestTimeout [get, set]
 The maximum number of milliseconds to wait for a response from the web server. More...
 
bool OverrideRobotsTxt [get, set]
 When true, the crawler ignores any robots.txt files it encounters. This should only be done when crawling a web site the user owns. This must be set to true when a gateway_id is specified in the credentials. More...
 
List< string > Blacklist [get, set]
 Array of URLs to be excluded while crawling. The crawler will not follow links that contain any of these strings. For example, listing https://ibm.com/watson also excludes https://ibm.com/watson/discovery. More...
 

Detailed Description

Object defining which URL to crawl and how to crawl it.
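
For orientation, a minimal sketch of populating this model is shown below. The property names and types are taken from this page; the start URL, the blacklist entry, and the variable name are illustrative, the crawl-speed constant name is assumed from the SDK's usual const-string convention, and attaching the finished object to a collection's source options is left to the wider Discovery v1 SDK.

using System.Collections.Generic;
using IBM.Watson.Discovery.v1.Model;

// Sketch: crawl a documentation site, staying on the starting host and
// following links at most two hops from the start URL.
var webCrawl = new SourceOptionsWebCrawl()
{
    Url = "https://example.com/docs",                 // illustrative starting URL
    CrawlSpeed = SourceOptionsWebCrawl.CrawlSpeedEnumValue.NORMAL, // assumed constant name
    LimitToStartingHosts = true,                      // stay on example.com
    MaximumHops = 2,                                  // start page counts as 0 hops
    RequestTimeout = 30000,                           // 30 seconds, in milliseconds
    OverrideRobotsTxt = false,                        // respect robots.txt
    Blacklist = new List<string>() { "https://example.com/docs/archive" }
};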

Property Documentation

◆ AllowUntrustedCertificate

bool IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl.AllowUntrustedCertificate
[get, set]

When true, allows the crawl to interact with HTTPS sites with SSL certificates with untrusted signers.

◆ Blacklist

List<string> IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl.Blacklist
[get, set]

Array of URLs to be excluded while crawling. The crawler will not follow links that contain any of these strings. For example, listing https://ibm.com/watson also excludes https://ibm.com/watson/discovery.
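
A short sketch of the exclusion rule above; the URLs mirror the example in the description and the variable name is illustrative:

using System.Collections.Generic;
using IBM.Watson.Discovery.v1.Model;

var webCrawl = new SourceOptionsWebCrawl()
{
    Url = "https://ibm.com",
    // Links containing this string are skipped, so the entry excludes both
    // https://ibm.com/watson and deeper paths such as
    // https://ibm.com/watson/discovery.
    Blacklist = new List<string>() { "https://ibm.com/watson" }
};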

◆ CrawlSpeed

string IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl.CrawlSpeed
[get, set]

The number of concurrent URLs to fetch. gentle means one URL is fetched at a time with a delay between each call. normal means as many as two URLs are fetched concurrently with a short delay between fetch calls. aggressive means that up to ten URLs are fetched concurrently with a short delay between fetch calls. Constants for possible values can be found using SourceOptionsWebCrawl.CrawlSpeedEnumValue.
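
A sketch of selecting a speed via the nested constants class, assuming the constant names follow the SDK's usual const-string pattern (GENTLE, NORMAL, AGGRESSIVE):

using IBM.Watson.Discovery.v1.Model;

var webCrawl = new SourceOptionsWebCrawl()
{
    // Assumed constant name; equivalent to assigning the raw string "gentle",
    // i.e. one URL fetched at a time with a delay between calls.
    CrawlSpeed = SourceOptionsWebCrawl.CrawlSpeedEnumValue.GENTLE
};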

◆ LimitToStartingHosts

bool IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl.LimitToStartingHosts
[get, set]

When true, crawls of the specified URL are limited to the host part of the url field.

◆ MaximumHops

long IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl.MaximumHops
[get, set]

The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within maximum_hops of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
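
A concrete reading of the hop count (start URL and variable name illustrative):

using IBM.Watson.Discovery.v1.Model;

// With MaximumHops = 2, the start page is 0 hops, links found on it are
// 1 hop, and links found on those pages are 2 hops; nothing deeper is fetched.
var webCrawl = new SourceOptionsWebCrawl()
{
    Url = "https://example.com",
    MaximumHops = 2
};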

◆ OverrideRobotsTxt

bool IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl.OverrideRobotsTxt
[get, set]

When true, the crawler ignores any robots.txt files it encounters. This should only be done when crawling a web site the user owns. This must be set to true when a gateway_id is specified in the credentials.

◆ RequestTimeout

long IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl.RequestTimeout
[get, set]

The maximum number of milliseconds to wait for a response from the web server.

◆ Url

string IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl.Url
[get, set]

The starting URL to crawl.


The documentation for this class was generated from the following file: