Web Tool

The web tool allows agents to fetch web pages and save content locally for further reading and analysis. Each URL is evaluated by the web guard before the request is made.

web_fetch

Fetches a URL, saves the content to .stencila/cache/web/, and converts HTML to Markdown with images extracted to a media directory.

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to fetch |
| raw | boolean | No | If true, save the response body as-is without conversion. Defaults to false |

How it Works

  1. URL validation — the URL must use the http:// or https:// scheme.

  2. HTTP caching — responses are cached locally with full RFC 7234 compliance. Subsequent fetches of the same URL use conditional requests (If-Modified-Since, If-None-Match) and honor 304 Not Modified responses.

  3. Content processing — HTML pages are parsed and converted to Markdown. Images referenced in the page are downloaded in parallel (up to 8 concurrent, with retries) and saved alongside the Markdown file in a media/ subdirectory. Image references in the Markdown are rewritten to point to the local copies.

  4. Output — the tool returns a manifest listing the saved files with sizes and line counts, along with instructions to use read_file, grep, or glob to explore the content.
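Steps 1 and 2 above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the function names `is_fetchable` and `conditional_headers`, and the shape of the `cached` dictionary, are assumptions made for the example.

```python
from urllib.parse import urlsplit


def is_fetchable(url: str) -> bool:
    """Step 1: accept only http:// and https:// URLs."""
    # urlsplit lowercases the scheme, so "HTTPS://..." is accepted too.
    return urlsplit(url).scheme in ("http", "https")


def conditional_headers(cached: dict) -> dict:
    """Step 2: build revalidation headers from a cached response's validators.

    `cached` is a hypothetical record holding the ETag and Last-Modified
    values saved from an earlier response.
    """
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers
```

If the server answers such a request with 304 Not Modified, the cached copy can be reused without downloading the body again.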

Responses are limited to 10 MB with a 30-second request timeout.
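A size cap like this is typically enforced while the body is streamed, rather than after it has been fully downloaded. The sketch below assumes that 10 MB means 10 × 1024 × 1024 bytes and that an oversized response is rejected rather than truncated; the text states neither, and `read_capped` is a hypothetical helper.

```python
MAX_BYTES = 10 * 1024 * 1024  # assumed interpretation of "10 MB"
TIMEOUT_SECONDS = 30          # request timeout stated in the text


def read_capped(chunks, limit=MAX_BYTES):
    """Concatenate byte chunks, raising once the running total exceeds `limit`."""
    total = 0
    parts = []
    for chunk in chunks:
        total += len(chunk)
        if total > limit:
            raise ValueError(f"response exceeds {limit} bytes")
        parts.append(chunk)
    return b"".join(parts)
```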

Guard Rules

The web guard parses each URL, normalizes the host (ASCII case-fold, trailing dot strip) and path (consecutive slash collapse), then evaluates rules in most-specific-first order. Evaluation short-circuits on the first non-Allow verdict.
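The normalization and short-circuit behavior described above might look roughly like this. It is a sketch under assumptions: rules are modeled as plain callables returning the strings "allow", "warn", or "deny", and the names `normalize` and `evaluate` are illustrative rather than taken from the guard itself.

```python
import re
from urllib.parse import urlsplit


def normalize(url: str):
    """Return (host, path) with the host case-folded, one trailing dot
    stripped, and runs of consecutive slashes in the path collapsed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.endswith("."):
        host = host[:-1]
    path = re.sub(r"/{2,}", "/", parts.path)
    return host, path


def evaluate(rules, host, path):
    """Run rules in the given (most-specific-first) order and stop at the
    first verdict that is not "allow"."""
    for rule in rules:
        verdict = rule(host, path)
        if verdict != "allow":
            return verdict
    return "allow"
```

Normalizing before evaluation means that variants such as `Example.COM.` or `//latest//api/token` cannot slip past rules that match on exact hosts or path prefixes.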

| Rule ID | Reason | Suggestion | Low | Medium | High |
|---|---|---|---|---|---|
| web.credential_url | Metadata credential paths return IAM tokens and secrets that can be used for privilege escalation | Use the cloud provider's CLI for credential management | Deny | Deny | Deny |
| web.metadata_endpoint | Cloud metadata endpoints expose instance credentials and configuration | Access cloud credentials through the provider's CLI or SDK instead | Deny | Deny | Deny |
| web.internal_network | Fetching internal network addresses can expose services not meant for external access (SSRF) | Use a public URL, or access internal services through an appropriate API | Deny | Deny | Deny |
| web.non_https | Unencrypted HTTP requests can expose data in transit | Use https:// instead of http:// | Deny | Warn | Allow |
| web.high_risk_port | Port is associated with an infrastructure service not typically accessed via HTTP | Use the service's dedicated CLI or client library instead of HTTP | Deny | Warn | Allow |
| web.domain_allowlist | Domain is not in the agent's allowed domain list | Add the domain to allowedDomains in the agent definition, or use an allowed domain | Deny | Deny | Deny |
| web.domain_denylist | Domain is in the agent's disallowed domain list | Remove the domain from disallowedDomains if access is intended, or use a different source | Deny | Deny | Deny |
| web.parse_failure | URL could not be parsed | Provide a valid URL (e.g., https://example.com/path) | Deny | Deny | Deny |
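Only two rules in the table vary by level; the rest deny at every level. One way to read the table as a lookup is sketched below. The level keys "low", "medium", and "high" mirror the column headings, and `verdict_for` is a hypothetical helper, not part of the guard's interface.

```python
# Verdicts per level for the two rules that vary; every other rule in the
# table denies at all three levels.
LEVEL_DEPENDENT = {
    "web.non_https": {"low": "deny", "medium": "warn", "high": "allow"},
    "web.high_risk_port": {"low": "deny", "medium": "warn", "high": "allow"},
}


def verdict_for(rule_id: str, level: str) -> str:
    """Look up the verdict for a triggered rule at the given level."""
    return LEVEL_DEPENDENT.get(rule_id, {}).get(level, "deny")
```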

Metadata Hosts

Requests to these hosts trigger web.metadata_endpoint (or web.credential_url if the path also matches a credential prefix):

  • 169.254.169.254 — AWS, Azure, most cloud providers

  • fd00:ec2::254 — AWS IMDSv2 IPv6 endpoint

  • metadata.google.internal — GCP

  • 100.100.100.200 — Alibaba Cloud

Credential Path Prefixes

These URL path prefixes (on metadata hosts) trigger web.credential_url:

  • /latest/meta-data/iam/security-credentials (AWS IMDSv1/v2)

  • /latest/api/token (AWS IMDSv2)

  • /computeMetadata/v1/instance/service-accounts (GCP)

  • /metadata/identity/oauth2/token (Azure)

  • /latest/meta-data/ram/security-credentials (Alibaba Cloud)
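Taken together, the two lists above suggest a classification along these lines. The sketch assumes the host and path have already been normalized, and that any listed prefix counts on any metadata host; `classify_metadata` is an illustrative name.

```python
METADATA_HOSTS = {
    "169.254.169.254",
    "fd00:ec2::254",
    "metadata.google.internal",
    "100.100.100.200",
}

CREDENTIAL_PREFIXES = (
    "/latest/meta-data/iam/security-credentials",
    "/latest/api/token",
    "/computeMetadata/v1/instance/service-accounts",
    "/metadata/identity/oauth2/token",
    "/latest/meta-data/ram/security-credentials",
)


def classify_metadata(host: str, path: str):
    """Return the rule ID triggered by a metadata request, or None."""
    if host not in METADATA_HOSTS:
        return None
    # str.startswith accepts a tuple and matches if any prefix matches.
    if path.startswith(CREDENTIAL_PREFIXES):
        return "web.credential_url"
    return "web.metadata_endpoint"
```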

Internal Network Detection

The following are considered internal network addresses and trigger web.internal_network:

  • localhost

  • *.local, *.internal hostname suffixes

  • Loopback IPs: 127.0.0.0/8, ::1

  • Private IPv4: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16

  • Link-local: 169.254.0.0/16, fe80::/10

  • Shared address space: 100.64.0.0/10

  • IPv4-mapped IPv6: ::ffff:0:0/96 (when the mapped address is private)
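The list above maps naturally onto Python's standard `ipaddress` module. The following is a sketch, not the guard's implementation: `is_internal` is a hypothetical name, it expects an already-normalized host, and it does not resolve DNS names, so a public-looking hostname that resolves to a private address is outside its scope.

```python
import ipaddress

INTERNAL_NETS = [
    ipaddress.ip_network(net)
    for net in (
        "127.0.0.0/8", "::1/128",                          # loopback
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # private IPv4
        "169.254.0.0/16", "fe80::/10",                     # link-local
        "100.64.0.0/10",                                   # shared address space
    )
]


def is_internal(host: str) -> bool:
    """Return True if the host matches one of the internal patterns."""
    if host == "localhost" or host.endswith((".local", ".internal")):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; no DNS lookup is attempted here
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so the IPv4 ranges apply.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    # Membership is False when the address and network versions differ.
    return any(ip in net for net in INTERNAL_NETS)
```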

High-Risk Ports

These ports trigger web.high_risk_port:

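| Port | Service |
|---|---|
| 22 | SSH |
| 23 | Telnet |
| 25 | SMTP |
| 135 | MS RPC |
| 139 | NetBIOS |
| 445 | SMB |
| 2375 | Docker daemon (unencrypted) |
| 2376 | Docker daemon (TLS) |
| 3306 | MySQL |
| 5432 | PostgreSQL |
| 5900 | VNC |
| 6379 | Redis |
| 6443 | Kubernetes API |
| 8200 | Vault |
| 8500 | Consul |
| 9200 | Elasticsearch |
| 27017 | MongoDB |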
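A port check against this table could be sketched as below. It assumes the rule looks only at a port written explicitly in the URL; `high_risk_service` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

HIGH_RISK_PORTS = {
    22: "SSH", 23: "Telnet", 25: "SMTP", 135: "MS RPC", 139: "NetBIOS",
    445: "SMB", 2375: "Docker daemon (unencrypted)", 2376: "Docker daemon (TLS)",
    3306: "MySQL", 5432: "PostgreSQL", 5900: "VNC", 6379: "Redis",
    6443: "Kubernetes API", 8200: "Vault", 8500: "Consul",
    9200: "Elasticsearch", 27017: "MongoDB",
}


def high_risk_service(url: str):
    """Return the service name if the URL's explicit port is high-risk, else None."""
    port = urlsplit(url).port  # None when the URL has no explicit port
    return HIGH_RISK_PORTS.get(port)
```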