Object defining which URL to crawl and how to crawl it.

Classes

class CrawlSpeedEnumValue
    Constants for the possible values of CrawlSpeed: gentle, normal, and aggressive.

Properties

string CrawlSpeed [get, set]
    The number of concurrent URLs to fetch. gentle means one URL is fetched at a time with a delay between each call. normal means up to two URLs are fetched concurrently with a short delay between fetch calls. aggressive means up to ten URLs are fetched concurrently with a short delay between fetch calls. Constants for the possible values can be found in SourceOptionsWebCrawl.CrawlSpeedEnumValue.

string Url [get, set]
    The starting URL to crawl.

bool LimitToStartingHosts [get, set]
    When true, crawls of the specified URL are limited to the host part of the url field.

bool AllowUntrustedCertificate [get, set]
    When true, allows the crawl to interact with HTTPS sites with SSL certificates signed by untrusted signers.

long MaximumHops [get, set]
    The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within maximum_hops of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.

long RequestTimeout [get, set]
    The maximum number of milliseconds to wait for a response from the web server.

bool OverrideRobotsTxt [get, set]
    When true, the crawler ignores any robots.txt it encounters. This should only ever be done when crawling a web site the user owns. This must be set to true when a gateway_id is specified in the credentials.

List<string> Blacklist [get, set]
    Array of URLs to exclude while crawling. The crawler will not follow links that contain any of these strings. For example, listing https://ibm.com/watson also excludes https://ibm.com/watson/discovery.
Detailed Description

Object defining which URL to crawl and how to crawl it.
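As a quick orientation, here is a minimal sketch of populating this model in C#. The object-initializer style and the CrawlSpeedEnumValue.NORMAL constant name are assumptions inferred from the property descriptions below, not verbatim SDK guarantees; check the generated CrawlSpeedEnumValue class for the exact constant names.

    using System.Collections.Generic;
    using IBM.Watson.Discovery.v1.Model;

    // Minimal sketch: configure a web crawl starting from a hypothetical URL.
    var webCrawl = new SourceOptionsWebCrawl()
    {
        Url = "https://example.com/docs",      // starting URL (hop 0)
        LimitToStartingHosts = true,           // stay on the example.com host
        CrawlSpeed = SourceOptionsWebCrawl.CrawlSpeedEnumValue.NORMAL,  // assumed constant name
        AllowUntrustedCertificate = false,
        MaximumHops = 2,                       // start page, its links, and their links
        RequestTimeout = 30000,                // wait up to 30 seconds per request
        OverrideRobotsTxt = false,
        Blacklist = new List<string>() { "https://example.com/docs/archive" }
    };

An object like this is typically attached to a Discovery source configuration as part of its source options describing a web crawl; see the SourceOptions model for how it is supplied.

Property Documentation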
◆ AllowUntrustedCertificate
bool IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl.AllowUntrustedCertificate [get, set]

When true, allows the crawl to interact with HTTPS sites with SSL certificates signed by untrusted signers.
◆ Blacklist
List<string> IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl.Blacklist [get, set]

Array of URLs to exclude while crawling. The crawler will not follow links that contain any of these strings. For example, listing https://ibm.com/watson also excludes https://ibm.com/watson/discovery.
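Continuing the sketch from the detailed description:

    // Links containing a listed string are skipped, so one entry can
    // exclude an entire subtree of URLs.
    webCrawl.Blacklist = new List<string>()
    {
        "https://ibm.com/watson"  // also excludes https://ibm.com/watson/discovery
    };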
◆ CrawlSpeed
string IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl.CrawlSpeed [get, set]

The number of concurrent URLs to fetch. gentle means one URL is fetched at a time with a delay between each call. normal means up to two URLs are fetched concurrently with a short delay between fetch calls. aggressive means up to ten URLs are fetched concurrently with a short delay between fetch calls. Constants for the possible values can be found in SourceOptionsWebCrawl.CrawlSpeedEnumValue.
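For example, continuing the sketch from the detailed description (the GENTLE constant name is an assumption mirroring the "gentle" string value):

    // Assumed constant; CrawlSpeedEnumValue should expose string constants
    // corresponding to "gentle", "normal", and "aggressive".
    webCrawl.CrawlSpeed = SourceOptionsWebCrawl.CrawlSpeedEnumValue.GENTLE;  // one URL at a time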
◆ LimitToStartingHosts
bool IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl.LimitToStartingHosts [get, set]

When true, crawls of the specified URL are limited to the host part of the url field.
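Continuing the sketch above:

    // Only pages whose host matches the starting URL's host are crawled.
    webCrawl.Url = "https://example.com/docs";
    webCrawl.LimitToStartingHosts = true;  // follows example.com links, skips other hosts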
◆ MaximumHops
long IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl.MaximumHops [get, set]

The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within maximum_hops of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
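Continuing the sketch above, a short illustration of hop counting:

    // Hop 0 = the start page, hop 1 = pages it links to, hop 2 = pages
    // those link to. MaximumHops = 2 stops the crawl after hop 2.
    webCrawl.MaximumHops = 2;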
◆ OverrideRobotsTxt
bool IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl.OverrideRobotsTxt [get, set]

When true, the crawler ignores any robots.txt it encounters. This should only ever be done when crawling a web site the user owns. This must be set to true when a gateway_id is specified in the credentials.
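Continuing the sketch above:

    // Ignore robots.txt; do this only for sites you own. Must also be true
    // when the source credentials specify a gateway_id.
    webCrawl.OverrideRobotsTxt = true;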
◆ RequestTimeout
long IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl.RequestTimeout [get, set]

The maximum number of milliseconds to wait for a response from the web server.
◆ Url
string IBM.Watson.Discovery.v1.Model.SourceOptionsWebCrawl.Url [get, set]

The starting URL to crawl.