WebCrawlerConfiguration
The configuration of web URLs that you want to crawl. You should be authorized to crawl the URLs.
Types
Properties
The configuration of crawl limits for the web URLs.
A list of one or more regular expression patterns that exclude certain URLs. If both an inclusion pattern and an exclusion pattern match a URL, the exclusion pattern takes precedence and the web content of the URL isn’t crawled.
A list of one or more regular expression patterns that include certain URLs. If both an inclusion pattern and an exclusion pattern match a URL, the exclusion pattern takes precedence and the web content of the URL isn’t crawled.
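The precedence rule for the two filter lists can be illustrated with a small sketch. This is not the service's implementation: the function name should_crawl is hypothetical, the use of Python's re.search for matching is an assumption, and the behavior when no inclusion patterns are given (treat every non-excluded URL as eligible) is also an assumption that the text above does not confirm.

```python
import re


def should_crawl(url, inclusion_filters=(), exclusion_filters=()):
    """Illustrative only: decide whether a URL would be crawled."""
    # Exclusion takes precedence: any matching exclusion pattern rejects
    # the URL, even if an inclusion pattern also matches.
    if any(re.search(pattern, url) for pattern in exclusion_filters):
        return False
    # Assumption: with no inclusion patterns, every remaining URL is eligible.
    if not inclusion_filters:
        return True
    # Otherwise at least one inclusion pattern has to match.
    return any(re.search(pattern, url) for pattern in inclusion_filters)


include = [r".*/docs/.*"]
exclude = [r".*\.pdf$"]

# Matches both lists, so the exclusion pattern wins.
print(should_crawl("https://example.com/docs/guide.pdf", include, exclude))
# Matches only the inclusion list.
print(should_crawl("https://example.com/docs/guide.html", include, exclude))
```

The first call reports that the PDF is skipped even though it sits under /docs/, which is the case the descriptions above single out.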
The scope of what is crawled for your URLs.
A string used for identifying the crawler or bot when it accesses a web server. The user agent header value consists of bedrockbot, a UUID, and a user agent suffix for your crawler (if one is provided). By default, it is set to bedrockbot_UUID. You can optionally append a custom suffix to bedrockbot_UUID to allowlist a specific user agent permitted to access your source URLs.
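As a rough illustration of how the header value is put together, the sketch below joins the three parts named above. The function name build_user_agent is hypothetical, the UUID shown is a placeholder (the real one is not something you choose), and the underscore between bedrockbot_UUID and the custom suffix is an assumption, since the description does not state the separator.

```python
def build_user_agent(crawler_uuid, suffix=None):
    """Illustrative only: compose a bedrockbot-style user agent value."""
    # Default form described in the text: bedrockbot_UUID.
    base = f"bedrockbot_{crawler_uuid}"
    # Assumption: the optional suffix is appended with an underscore.
    return f"{base}_{suffix}" if suffix else base


placeholder_uuid = "00000000-0000-0000-0000-000000000000"  # placeholder value
print(build_user_agent(placeholder_uuid))
print(build_user_agent(placeholder_uuid, suffix="my-crawler"))
```

A server-side allowlist rule would then match on the full value, including the suffix, so only your crawler is admitted.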