public class SourceOptionsWebCrawl
extends com.ibm.cloud.sdk.core.service.model.GenericModel
Modifier and Type | Class and Description
---|---
static class | SourceOptionsWebCrawl.Builder - Builder.
static interface | SourceOptionsWebCrawl.CrawlSpeed - The number of concurrent URLs to fetch.
Modifier and Type | Field and Description
---|---
protected java.lang.Boolean | allowUntrustedCertificate
protected java.util.List<java.lang.String> | blacklist
protected java.lang.String | crawlSpeed
protected java.lang.Boolean | limitToStartingHosts
protected java.lang.Long | maximumHops
protected java.lang.Boolean | overrideRobotsTxt
protected java.lang.Long | requestTimeout
protected java.lang.String | url
Modifier | Constructor and Description
---|---
protected | SourceOptionsWebCrawl(SourceOptionsWebCrawl.Builder builder)
Modifier and Type | Method and Description
---|---
java.lang.Boolean | allowUntrustedCertificate() - Gets the allowUntrustedCertificate.
java.util.List<java.lang.String> | blacklist() - Gets the blacklist.
java.lang.String | crawlSpeed() - Gets the crawlSpeed.
java.lang.Boolean | limitToStartingHosts() - Gets the limitToStartingHosts.
java.lang.Long | maximumHops() - Gets the maximumHops.
SourceOptionsWebCrawl.Builder | newBuilder() - New builder.
java.lang.Boolean | overrideRobotsTxt() - Gets the overrideRobotsTxt.
java.lang.Long | requestTimeout() - Gets the requestTimeout.
java.lang.String | url() - Gets the url.
protected java.lang.String url
@SerializedName(value="limit_to_starting_hosts") protected java.lang.Boolean limitToStartingHosts
@SerializedName(value="crawl_speed") protected java.lang.String crawlSpeed
@SerializedName(value="allow_untrusted_certificate") protected java.lang.Boolean allowUntrustedCertificate
@SerializedName(value="maximum_hops") protected java.lang.Long maximumHops
@SerializedName(value="request_timeout") protected java.lang.Long requestTimeout
@SerializedName(value="override_robots_txt") protected java.lang.Boolean overrideRobotsTxt
protected java.util.List<java.lang.String> blacklist
protected SourceOptionsWebCrawl(SourceOptionsWebCrawl.Builder builder)
public SourceOptionsWebCrawl.Builder newBuilder()
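The class is constructed through its fluent builder. The sketch below is a self-contained, simplified stand-in that illustrates the pattern; the stand-in class and its setter names are illustrative only, not the SDK's actual code.

```java
// Simplified stand-in for SourceOptionsWebCrawl and its Builder
// (illustrative only; the real class lives in the IBM Watson SDK).
class WebCrawlOptions {
    final String url;
    final String crawlSpeed;
    final Long maximumHops;

    private WebCrawlOptions(Builder b) {
        this.url = b.url;
        this.crawlSpeed = b.crawlSpeed;
        this.maximumHops = b.maximumHops;
    }

    static class Builder {
        private String url;
        private String crawlSpeed;
        private Long maximumHops;

        Builder url(String url) { this.url = url; return this; }
        Builder crawlSpeed(String s) { this.crawlSpeed = s; return this; }
        Builder maximumHops(long h) { this.maximumHops = h; return this; }
        WebCrawlOptions build() { return new WebCrawlOptions(this); }
    }
}

public class BuilderDemo {
    public static void main(String[] args) {
        // Each setter returns the builder, so calls chain until build().
        WebCrawlOptions opts = new WebCrawlOptions.Builder()
            .url("https://example.com")
            .crawlSpeed("normal")
            .maximumHops(2)
            .build();
        System.out.println(opts.url + " " + opts.crawlSpeed + " " + opts.maximumHops);
    }
}
```

Once built, the object is immutable; the getter methods documented above return the values set on the builder.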
public java.lang.String url()
The starting URL to crawl.
public java.lang.Boolean limitToStartingHosts()
When `true`, crawls of the specified URL are limited to the host part of the **url** field.
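The host restriction can be illustrated with a short, self-contained check (the helper name is illustrative, not part of the SDK): a link qualifies only when its host matches the host of the starting URL.

```java
import java.net.URI;

public class HostLimitDemo {
    // Illustrates the limit_to_starting_hosts rule: a link is followed
    // only when its host equals the host of the starting URL.
    static boolean sameHost(String startUrl, String candidate) {
        return URI.create(startUrl).getHost()
                  .equals(URI.create(candidate).getHost());
    }

    public static void main(String[] args) {
        System.out.println(sameHost("https://ibm.com/start", "https://ibm.com/page"));
        System.out.println(sameHost("https://ibm.com/start", "https://example.org/x"));
    }
}
```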
public java.lang.String crawlSpeed()
The number of concurrent URLs to fetch. `gentle` means one URL is fetched at a time, with a delay between each call. `normal` means as many as two URLs are fetched concurrently, with a short delay between fetch calls. `aggressive` means that up to ten URLs are fetched concurrently, with a short delay between fetch calls.
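The three speed values map to concurrency limits as described above; a minimal lookup table makes the mapping concrete (the map itself is illustrative, not SDK code):

```java
import java.util.Map;

public class CrawlSpeedDemo {
    // Concurrency implied by each crawl_speed value, per the description:
    // gentle = 1 URL at a time, normal = up to 2, aggressive = up to 10.
    static final Map<String, Integer> MAX_CONCURRENT =
        Map.of("gentle", 1, "normal", 2, "aggressive", 10);

    public static void main(String[] args) {
        System.out.println(MAX_CONCURRENT.get("normal"));
    }
}
```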
public java.lang.Boolean allowUntrustedCertificate()
When `true`, allows the crawl to interact with HTTPS sites whose SSL certificates have untrusted signers.
public java.lang.Long maximumHops()
The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within **maximum_hops** of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
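The hop-counting rule can be sketched as a breadth-first traversal over a tiny in-memory link graph; this is a self-contained illustration of the semantics, not the crawler's actual implementation.

```java
import java.util.*;

public class HopLimitDemo {
    // Hop-limited traversal mirroring the maximum_hops rule:
    // the start page is hop 0, its links are hop 1, and so on.
    static Set<String> crawl(Map<String, List<String>> links,
                             String start, int maxHops) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(start);
        Deque<String> frontier = new ArrayDeque<>(List.of(start));
        for (int hop = 0; hop < maxHops; hop++) {
            Deque<String> next = new ArrayDeque<>();
            for (String page : frontier) {
                for (String link : links.getOrDefault(page, List.of())) {
                    if (visited.add(link)) next.add(link);
                }
            }
            frontier = next;
        }
        return visited;
    }

    public static void main(String[] args) {
        // A links to B, B to C, C to D: with maximum_hops = 2,
        // D (3 hops away) is never fetched.
        Map<String, List<String>> links = Map.of(
            "A", List.of("B"),
            "B", List.of("C"),
            "C", List.of("D"));
        System.out.println(crawl(links, "A", 2));
    }
}
```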
public java.lang.Long requestTimeout()
The maximum number of milliseconds to wait for a response from the web server.
public java.lang.Boolean overrideRobotsTxt()
When `true`, the crawler ignores any `robots.txt` file it encounters. This should only ever be done when crawling a web site the user owns. This must be set to `true` when a **gateway_id** is specified in the **credentials**.
public java.util.List<java.lang.String> blacklist()
Array of URLs to exclude while crawling. The crawler will not follow links that contain any of these strings. For example, listing `https://ibm.com/watson` also excludes `https://ibm.com/watson/discovery`.
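The exclusion is a simple substring match, which is why a blacklist entry also covers every URL beneath it. A self-contained sketch of that check (the helper name is illustrative, not part of the SDK):

```java
import java.util.List;

public class BlacklistDemo {
    // Substring-based exclusion: a URL is skipped when it contains
    // any entry from the blacklist.
    static boolean excluded(String url, List<String> blacklist) {
        return blacklist.stream().anyMatch(url::contains);
    }

    public static void main(String[] args) {
        List<String> blacklist = List.of("https://ibm.com/watson");
        // The sub-path is excluded because it contains the listed string.
        System.out.println(excluded("https://ibm.com/watson/discovery", blacklist));
        System.out.println(excluded("https://ibm.com/cloud", blacklist));
    }
}
```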