SourceOptionsWebCrawl (IBM Watson Java SDK 11.0.1 API)

java.lang.Object
- com.ibm.cloud.sdk.core.service.model.GenericModel
- - com.ibm.watson.discovery.v1.model.SourceOptionsWebCrawl

All Implemented Interfaces:

com.ibm.cloud.sdk.core.service.model.ObjectModel
```
public class SourceOptionsWebCrawl
extends com.ibm.cloud.sdk.core.service.model.GenericModel
```
Object defining which URL to crawl and how to crawl it.

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`SourceOptionsWebCrawl.Builder` Builder.
`static interface`	`SourceOptionsWebCrawl.CrawlSpeed` The number of concurrent URLs to fetch.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`Boolean`	`allowUntrustedCertificate()` Gets the allowUntrustedCertificate.
`List<String>`	`blacklist()` Gets the blacklist.
`String`	`crawlSpeed()` Gets the crawlSpeed.
`Boolean`	`limitToStartingHosts()` Gets the limitToStartingHosts.
`Long`	`maximumHops()` Gets the maximumHops.
`SourceOptionsWebCrawl.Builder`	`newBuilder()` New builder.
`Boolean`	`overrideRobotsTxt()` Gets the overrideRobotsTxt.
`Long`	`requestTimeout()` Gets the requestTimeout.
`String`	`url()` Gets the url.

Methods inherited from class com.ibm.cloud.sdk.core.service.model.GenericModel
equals, hashCode, toString

Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait

- Method Detail
  - newBuilder
```
public SourceOptionsWebCrawl.Builder newBuilder()
```
    New builder.
    
    Returns:
    
    a SourceOptionsWebCrawl builder
  - url
```
public String url()
```
    Gets the url.
    The starting URL to crawl.
    
    Returns:
    
    the url
  - limitToStartingHosts
```
public Boolean limitToStartingHosts()
```
    Gets the limitToStartingHosts.
    When `true`, crawls of the specified URL are limited to the host part of the **url** field.
    
    Returns:
    
    the limitToStartingHosts
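
    As a rough illustration of what limiting a crawl to the starting host can mean, the sketch below compares the host part of two URLs. The class `StartingHostSketch` and the helper `sameHost` are hypothetical and not part of the SDK; how the service actually compares hosts is an assumption here.

```java
import java.net.URI;

// Hypothetical helper, not part of the Watson SDK.
class StartingHostSketch {
    // Returns true when candidateUrl has the same host as startUrl (case-insensitive).
    static boolean sameHost(String startUrl, String candidateUrl) {
        String startHost = URI.create(startUrl).getHost();
        String candidateHost = URI.create(candidateUrl).getHost();
        return startHost != null && startHost.equalsIgnoreCase(candidateHost);
    }

    public static void main(String[] args) {
        System.out.println(sameHost("https://ibm.com/watson", "https://ibm.com/cloud")); // true
        System.out.println(sameHost("https://ibm.com/watson", "https://example.com/"));  // false
    }
}
```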
  - crawlSpeed
```
public String crawlSpeed()
```
    Gets the crawlSpeed.
    The number of concurrent URLs to fetch. `gentle` means one URL is fetched at a time with a delay between each call. `normal` means as many as two URLs are fetched concurrently with a short delay between fetch calls. `aggressive` means that up to ten URLs are fetched concurrently with a short delay between fetch calls.
    
    Returns:
    
    the crawlSpeed
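
    The three speed values described above can be summarized as an upper bound on concurrent fetches. The following is a minimal sketch of that mapping; `CrawlSpeedSketch` and `maxConcurrentFetches` are hypothetical names used only for illustration, and the delays between calls are not modeled.

```java
import java.util.Map;

// Hypothetical lookup table, not part of the Watson SDK.
class CrawlSpeedSketch {
    // Upper bound on concurrent URL fetches for each documented crawl speed.
    static final Map<String, Integer> MAX_CONCURRENT_FETCHES = Map.of(
            "gentle", 1,
            "normal", 2,
            "aggressive", 10);

    // Returns the concurrency limit, rejecting values the documentation does not list.
    static int maxConcurrentFetches(String crawlSpeed) {
        Integer limit = MAX_CONCURRENT_FETCHES.get(crawlSpeed);
        if (limit == null) {
            throw new IllegalArgumentException("Unknown crawl speed: " + crawlSpeed);
        }
        return limit;
    }

    public static void main(String[] args) {
        System.out.println(maxConcurrentFetches("normal")); // 2
    }
}
```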
  - allowUntrustedCertificate
```
public Boolean allowUntrustedCertificate()
```
    Gets the allowUntrustedCertificate.
    When `true`, allows the crawl to interact with HTTPS sites whose SSL certificates are signed by untrusted signers.
    
    Returns:
    
    the allowUntrustedCertificate
  - maximumHops
```
public Long maximumHops()
```
    Gets the maximumHops.
    The maximum number of hops to make from the initial URL. When a page is crawled, each link on that page is also crawled if it is within **maximum_hops** of the initial URL. The first page crawled is 0 hops, each link crawled from the first page is 1 hop, each link crawled from those pages is 2 hops, and so on.
    
    Returns:
    
    the maximumHops
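
    The hop counting described above behaves like a breadth-first traversal with a depth limit. The sketch below runs that traversal over a small in-memory link graph; `MaximumHopsSketch`, `pagesWithinHops`, and the page names are hypothetical, and the real crawler's traversal order is not documented here.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration of hop counting, not part of the Watson SDK.
class MaximumHopsSketch {
    // Maps every page reachable within maximumHops to its hop distance from startUrl.
    static Map<String, Integer> pagesWithinHops(
            Map<String, List<String>> links, String startUrl, long maximumHops) {
        Map<String, Integer> hops = new LinkedHashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        hops.put(startUrl, 0); // the first page crawled is 0 hops
        queue.add(startUrl);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            int distance = hops.get(page);
            if (distance >= maximumHops) {
                continue; // links from this page would exceed the limit
            }
            for (String next : links.getOrDefault(page, List.of())) {
                if (!hops.containsKey(next)) {
                    hops.put(next, distance + 1);
                    queue.add(next);
                }
            }
        }
        return hops;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("d"),
                "d", List.of("e"));
        // With a limit of 2, page "e" (3 hops away) is not reached.
        System.out.println(pagesWithinHops(links, "a", 2)); // {a=0, b=1, c=1, d=2}
    }
}
```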
  - requestTimeout
```
public Long requestTimeout()
```
    Gets the requestTimeout.
    The maximum time, in milliseconds, to wait for a response from the web server.
    
    Returns:
    
    the requestTimeout
  - overrideRobotsTxt
```
public Boolean overrideRobotsTxt()
```
    Gets the overrideRobotsTxt.
    When `true`, the crawler ignores any `robots.txt` it encounters. This should only ever be done when crawling a web site the user owns. This must be set to `true` when a **gateway_id** is specified in the **credentials**.
    
    Returns:
    
    the overrideRobotsTxt
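
    The constraint between **gateway_id** and this flag can be expressed as a simple pre-flight check. This is only a sketch: `RobotsTxtRuleSketch` and `isValid` are hypothetical names, and the SDK is not known to perform this check on the client side.

```java
// Hypothetical validation helper, not part of the Watson SDK.
class RobotsTxtRuleSketch {
    // Valid unless a gateway ID is set while overrideRobotsTxt is not true.
    static boolean isValid(String gatewayId, Boolean overrideRobotsTxt) {
        boolean gatewayConfigured = gatewayId != null && !gatewayId.isEmpty();
        return !gatewayConfigured || Boolean.TRUE.equals(overrideRobotsTxt);
    }

    public static void main(String[] args) {
        System.out.println(isValid("gateway-123", true));  // true
        System.out.println(isValid("gateway-123", null));  // false
        System.out.println(isValid(null, false));          // true
    }
}
```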
  - blacklist
```
public List<String> blacklist()
```
    Gets the blacklist.
    Array of URLs to be excluded while crawling. The crawler will not follow links that contain any of these strings. For example, listing `https://ibm.com/watson` also excludes `https://ibm.com/watson/discovery`.
    
    Returns:
    
    the blacklist
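
    The exclusion rule above reads as a substring match against each blacklist entry. A minimal sketch of that interpretation follows; `BlacklistSketch` and `isExcluded` are hypothetical names, and exact matching details on the service side (such as case sensitivity) are an assumption.

```java
import java.util.List;

// Hypothetical matcher, not part of the Watson SDK.
class BlacklistSketch {
    // A link is excluded when it contains any blacklist entry as a substring.
    static boolean isExcluded(String url, List<String> blacklist) {
        return blacklist.stream().anyMatch(url::contains);
    }

    public static void main(String[] args) {
        List<String> blacklist = List.of("https://ibm.com/watson");
        System.out.println(isExcluded("https://ibm.com/watson/discovery", blacklist)); // true
        System.out.println(isExcluded("https://ibm.com/cloud", blacklist));            // false
    }
}
```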
