Class SourceOptionsWebCrawl

public class SourceOptionsWebCrawl
Object defining which URL to crawl and how to crawl it.
  • Method Details

    • newBuilder

      public SourceOptionsWebCrawl.Builder newBuilder()
      New builder.

      Returns:
      a SourceOptionsWebCrawl builder
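The class follows the standard builder pattern. The sketch below is a minimal, self-contained illustration of that pattern, not SDK source: the field set is trimmed to two properties, `newBuilder()` is modeled as a static factory (in the SDK it may be an instance method that pre-populates a builder from an existing object), and the builder setter names are assumptions mirroring the getters documented above.

```java
// Minimal sketch of the builder pattern this class uses. Not SDK code:
// fields, defaults, and setter names are illustrative assumptions.
public class WebCrawlSketch {
    private final String url;
    private final String crawlSpeed;

    private WebCrawlSketch(Builder b) {
        this.url = b.url;
        this.crawlSpeed = b.crawlSpeed;
    }

    // Modeled as a static factory here for simplicity.
    public static Builder newBuilder() {
        return new Builder();
    }

    public String url() { return url; }
    public String crawlSpeed() { return crawlSpeed; }

    public static class Builder {
        private String url;
        private String crawlSpeed = "normal"; // assumed default

        public Builder url(String url) { this.url = url; return this; }
        public Builder crawlSpeed(String speed) { this.crawlSpeed = speed; return this; }
        public WebCrawlSketch build() { return new WebCrawlSketch(this); }
    }

    public static void main(String[] args) {
        WebCrawlSketch options = WebCrawlSketch.newBuilder()
                .url("https://example.com")
                .crawlSpeed("gentle")
                .build();
        System.out.println(options.url() + " " + options.crawlSpeed());
    }
}
```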
    • url

      public String url()
      Gets the url.

      The starting URL to crawl.

      Returns:
      the url
    • limitToStartingHosts

      public Boolean limitToStartingHosts()
      Gets the limitToStartingHosts.

      When `true`, crawls of the specified URL are limited to the host part of the **url** field.

      Returns:
      the limitToStartingHosts
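What "limited to the host part of the **url** field" means can be sketched with a host comparison. This is an illustrative check using `java.net.URI`, not SDK code; the SDK's internal filtering logic is not documented here.

```java
import java.net.URI;

// Illustrative sketch: a candidate link is followed only when its host
// matches the host of the starting URL. Not SDK code.
public class HostLimit {
    static boolean sameHost(String startUrl, String candidate) {
        String a = URI.create(startUrl).getHost();
        String b = URI.create(candidate).getHost();
        return a != null && a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        System.out.println(sameHost("https://example.com/docs", "https://example.com/blog"));
        System.out.println(sameHost("https://example.com/docs", "https://other.org/page"));
    }
}
```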
    • crawlSpeed

      public String crawlSpeed()
      Gets the crawlSpeed.

      The number of concurrent URLs to fetch. `gentle` means one URL is fetched at a time with a delay between each call. `normal` means as many as two URLs are fetched concurrently with a short delay between fetch calls. `aggressive` means that up to ten URLs are fetched concurrently with a short delay between fetch calls.

      Returns:
      the crawlSpeed
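The concurrency levels implied by the three speed names can be summarized in a small lookup. The fetch counts (1, 2, 10) come directly from the description above; the exact delays the service applies are not documented here, and the fallback for unrecognized values is an assumption.

```java
// Sketch of the concurrency implied by each crawl speed name.
// Falling back to "normal" for unknown values is an assumption.
public class CrawlSpeed {
    static int concurrentFetches(String speed) {
        switch (speed) {
            case "gentle":     return 1;  // one URL at a time
            case "aggressive": return 10; // up to ten concurrent fetches
            case "normal":
            default:           return 2;  // as many as two concurrent fetches
        }
    }

    public static void main(String[] args) {
        System.out.println(concurrentFetches("gentle"));
        System.out.println(concurrentFetches("aggressive"));
    }
}
```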
    • allowUntrustedCertificate

      public Boolean allowUntrustedCertificate()
      Gets the allowUntrustedCertificate.

      When `true`, allows the crawl to interact with HTTPS sites whose SSL certificates are signed by untrusted signers.

      Returns:
      the allowUntrustedCertificate
    • maximumHops

      public Long maximumHops()
      Gets the maximumHops.

      The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within **maximum_hops** of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.

      Returns:
      the maximumHops
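The hop counting described above is breadth-first traversal with a depth cutoff. The sketch below illustrates it on a toy link graph (a plain map, not real fetching) and is not SDK code.

```java
import java.util.*;

// Illustrative sketch of hop counting: the starting page is hop 0, its
// links are hop 1, and links from a page at maximumHops are not followed.
// The "web" here is a toy adjacency map. Not SDK code.
public class HopCount {
    static Set<String> crawl(Map<String, List<String>> links, String start, long maximumHops) {
        Set<String> visited = new LinkedHashSet<>();
        Map<String, Long> hops = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        hops.put(start, 0L);
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.poll();
            if (!visited.add(page)) continue;   // already crawled
            long h = hops.get(page);
            if (h >= maximumHops) continue;     // links from here would exceed the limit
            for (String next : links.getOrDefault(page, List.of())) {
                hops.putIfAbsent(next, h + 1);
                queue.add(next);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "A", List.of("B"),
                "B", List.of("C"),
                "C", List.of("D"));
        // With maximumHops = 2, D (3 hops from A) is never crawled.
        System.out.println(crawl(links, "A", 2));
    }
}
```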
    • requestTimeout

      public Long requestTimeout()
      Gets the requestTimeout.

      The maximum number of milliseconds to wait for a response from the web server.

      Returns:
      the requestTimeout
    • overrideRobotsTxt

      public Boolean overrideRobotsTxt()
      Gets the overrideRobotsTxt.

      When `true`, the crawler ignores any `robots.txt` file that it encounters. This should only ever be done when crawling a website that the user owns. This must be set to `true` when a **gateway_id** is specified in the **credentials**.

      Returns:
      the overrideRobotsTxt
    • blacklist

      public List<String> blacklist()
      Gets the blacklist.

      Array of URLs to be excluded while crawling. The crawler will not follow links that contain any of these strings. For example, listing `` also excludes ``.

      Returns:
      the blacklist
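Because exclusion is by substring containment, blacklisting a URL prefix also excludes every URL beneath it. The sketch below illustrates that matching rule; it is not SDK code, and the sample paths are made up.

```java
import java.util.List;

// Illustrative sketch of substring-based exclusion: a link is skipped
// when it contains any blacklisted string, so a blacklisted prefix also
// covers everything beneath it. Not SDK code.
public class Blacklist {
    static boolean excluded(String link, List<String> blacklist) {
        return blacklist.stream().anyMatch(link::contains);
    }

    public static void main(String[] args) {
        List<String> blacklist = List.of("/private/");
        System.out.println(excluded("https://example.com/private/report", blacklist));
        System.out.println(excluded("https://example.com/public/report", blacklist));
    }
}
```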