
WebCrawlerConfiguration

class WebCrawlerConfiguration

Provides the configuration information required for Amazon Kendra Web Crawler.

Types

Builder

class Builder

Companion

object Companion

Properties

authenticationConfiguration

val authenticationConfiguration: AuthenticationConfiguration?

Configuration information required to connect to websites using authentication.

crawlDepth

val crawlDepth: Int?

The crawl depth, or number of levels from the seed level to crawl. For example, the seed URL page is depth 1, and any hyperlinks on that page that are also crawled are depth 2.
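To make the depth numbering concrete, the following Kotlin sketch walks a small, made-up link graph breadth-first and records the depth at which each page is first reached, treating the seed as depth 1 as described above. The function `pagesWithinDepth` and the in-memory `links` map are hypothetical illustrations; this is not how the Amazon Kendra crawler is implemented.

```kotlin
// Hypothetical illustration of crawl depth; `links` stands in for real web pages.
fun pagesWithinDepth(
    links: Map<String, List<String>>,
    seed: String,
    crawlDepth: Int,
): Map<String, Int> {
    // The seed page is depth 1, matching the numbering in the description.
    val depthOf = linkedMapOf(seed to 1)
    val queue = ArrayDeque(listOf(seed))
    while (queue.isNotEmpty()) {
        val page = queue.removeFirst()
        val depth = depthOf.getValue(page)
        // Links found on a page at the maximum depth would go one level too deep.
        if (depth >= crawlDepth) continue
        for (next in links[page].orEmpty()) {
            if (next !in depthOf) {
                depthOf[next] = depth + 1 // first (shallowest) time this page is reached
                queue.addLast(next)
            }
        }
    }
    return depthOf
}
```

For example, with `crawlDepth = 2` the result contains the seed and the pages it links to directly, but not the pages linked from those.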

maxContentSizePerPageInMegaBytes

val maxContentSizePerPageInMegaBytes: Float?

The maximum size (in MB) of a web page or attachment to crawl.

maxLinksPerPage

val maxLinksPerPage: Int?

The maximum number of URLs on a single web page to include when crawling a website. The limit applies to each web page separately.
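A minimal sketch of a per-page link cap, assuming links are kept in the order they appear on the page. `linksToFollow` is a hypothetical helper; which links the service keeps when a page exceeds the limit is not stated here, and treating `null` as "no cap" is only a simplification for this example, not a statement of what the service does when the value is unset.

```kotlin
// Hypothetical helper: keeps at most maxLinksPerPage of the URLs found on one page,
// in document order. `null` means "no cap" in this sketch only.
fun linksToFollow(linksOnPage: List<String>, maxLinksPerPage: Int?): List<String> =
    if (maxLinksPerPage == null) linksOnPage
    else linksOnPage.take(maxLinksPerPage.coerceAtLeast(0)) // take() rejects negative counts
```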

maxUrlsPerMinuteCrawlRate

val maxUrlsPerMinuteCrawlRate: Int?

The maximum number of URLs crawled per website host per minute.
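The per-host rate limit can be pictured as a separate counter for each host. The sketch below is a hypothetical fixed-window limiter (`PerHostRateLimiter` and `tryAcquire` are illustrative names, not SDK APIs); it only shows what "per website host per minute" means and makes no claim about how Kendra actually schedules requests. The caller passes the current time in milliseconds so the behavior is deterministic.

```kotlin
import java.net.URI

// Hypothetical fixed-window limiter: at most maxUrlsPerMinute fetches per host
// in each one-minute window. Not the Kendra crawler's actual scheduling logic.
class PerHostRateLimiter(private val maxUrlsPerMinute: Int) {
    private val windowStart = mutableMapOf<String, Long>()
    private val count = mutableMapOf<String, Int>()

    /** Returns true if [url] may be crawled at [nowMillis], and records the fetch. */
    fun tryAcquire(url: String, nowMillis: Long): Boolean {
        val host = URI(url).host ?: return false // no host component: nothing to rate-limit
        val start = windowStart[host]
        if (start == null || nowMillis - start >= 60_000L) {
            windowStart[host] = nowMillis // open a new one-minute window for this host
            count[host] = 0
        }
        val used = count.getValue(host)
        if (used >= maxUrlsPerMinute) return false
        count[host] = used + 1
        return true
    }
}
```

A fixed window was chosen for brevity; a sliding window or token bucket would smooth out bursts at window boundaries.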

proxyConfiguration

val proxyConfiguration: ProxyConfiguration?

Configuration information required to connect to your internal websites via a web proxy.

urlExclusionPatterns

val urlExclusionPatterns: List<String>?

A list of regular expression patterns that exclude certain URLs from the crawl. URLs that match the patterns are excluded from the index; URLs that don't match the patterns are included in the index. If a URL matches both an inclusion and an exclusion pattern, the exclusion pattern takes precedence and the URL isn't included in the index.

urlInclusionPatterns

val urlInclusionPatterns: List<String>?

A list of regular expression patterns that include certain URLs in the crawl. URLs that match the patterns are included in the index; URLs that don't match the patterns are excluded from the index. If a URL matches both an inclusion and an exclusion pattern, the exclusion pattern takes precedence and the URL isn't included in the index.
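The precedence rule shared by `urlInclusionPatterns` and `urlExclusionPatterns` can be expressed as a small predicate. This is a hypothetical sketch (`isIndexed` is not an SDK function) that assumes a pattern matches if it occurs anywhere in the URL and that an empty or absent inclusion list admits every URL; neither assumption is confirmed by this page.

```kotlin
// Hypothetical predicate mirroring the documented precedence: exclusion wins.
fun isIndexed(
    url: String,
    inclusionPatterns: List<String>?,
    exclusionPatterns: List<String>?,
): Boolean {
    val include = inclusionPatterns.orEmpty().map { Regex(it) }
    val exclude = exclusionPatterns.orEmpty().map { Regex(it) }
    // A URL matching any exclusion pattern is dropped, even if it also matches an inclusion pattern.
    if (exclude.any { it.containsMatchIn(url) }) return false
    // Assumption: with no inclusion patterns, every non-excluded URL is kept.
    return include.isEmpty() || include.any { it.containsMatchIn(url) }
}
```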

urls

val urls: Urls?

Specifies the seed or starting point URLs of the websites or the sitemap URLs of the websites you want to crawl.

Functions

copy

inline fun copy(block: WebCrawlerConfiguration.Builder.() -> Unit = {}): WebCrawlerConfiguration
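`copy` returns a new instance rather than mutating the receiver: the existing values are loaded into a builder, `block` overrides some of them, and a fresh object is built. The sketch below reproduces that pattern on a hypothetical, simplified `CrawlerSettings` class with only two properties; it is not the SDK's generated code, and the real `WebCrawlerConfiguration.Builder` has more members.

```kotlin
// Hypothetical, simplified stand-in for a generated model class and its Builder.
class CrawlerSettings private constructor(builder: Builder) {
    val crawlDepth: Int? = builder.crawlDepth
    val maxLinksPerPage: Int? = builder.maxLinksPerPage

    class Builder {
        var crawlDepth: Int? = null
        var maxLinksPerPage: Int? = null

        constructor()

        // Seeds a builder with every value from an existing instance.
        internal constructor(existing: CrawlerSettings) {
            crawlDepth = existing.crawlDepth
            maxLinksPerPage = existing.maxLinksPerPage
        }

        fun build(): CrawlerSettings = CrawlerSettings(this)
    }

    companion object {
        // Enables the `CrawlerSettings { ... }` construction syntax.
        operator fun invoke(block: Builder.() -> Unit): CrawlerSettings =
            Builder().apply(block).build()
    }

    // Builds a modified copy; the receiver itself is left unchanged.
    fun copy(block: Builder.() -> Unit = {}): CrawlerSettings =
        Builder(this).apply(block).build()
}
```

Usage: `val deeper = base.copy { crawlDepth = 3 }` keeps every other value from `base`.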

equals

open operator override fun equals(other: Any?): Boolean

hashCode

open override fun hashCode(): Int

toString

open override fun toString(): String