Cloudflare Crawl API '/crawl' Explained with Examples
#api
#crawler
#cloudflare
Web crawling is an essential part of modern software development. Applications that rely on AI, search systems, research tools and analytics platforms often need structured information gathered from websites. Traditionally, developers built their own crawling systems using headless browsers and distributed infrastructure.
To simplify this process, Cloudflare Browser Rendering introduced the /crawl endpoint. This API allows developers to crawl an entire website starting from a single URL while automatically following links and extracting content. The results can be returned in multiple formats including HTML, Markdown and structured JSON.
Cloudflare Crawl API Endpoint
The Crawl API endpoint is the main URL developers use to start a website crawl using Cloudflare Browser Rendering. It allows you to send a request that begins crawling a website from a starting URL and automatically follows links across the site.
Endpoint URL
https://api.cloudflare.com/client/v4/accounts/<account_id>/browser-rendering/crawl
This endpoint is part of the Cloudflare Browser Rendering REST API and is used to initiate a crawl job that runs asynchronously in Cloudflare's infrastructure.
Required Fields in Cloudflare Crawl API
When using the Cloudflare Crawl API, the request body must include one required field. Without this field, the API cannot start a crawl job.
url (string)
The url parameter is the only required field needed to start a crawl job. It specifies the starting page that the crawler will visit first. From this URL, the crawler automatically discovers and follows links across the website based on your configuration.
How the Cloudflare Crawl API Works (Short Summary)
Using the Cloudflare Crawl API involves two simple steps:
- Initiate the crawl job: You send a POST request with the starting URL to the /crawl endpoint. Cloudflare immediately returns a job ID that identifies your crawl task.
- Request the crawl results: With the job ID, you send a GET request to check the status or retrieve the results of the crawl. The API response includes the pages crawled and their extracted content.
Crawl jobs run asynchronously, meaning they run in the background. You don't wait for the crawl to finish in a single request. Instead, you check back with the job ID until the crawl completes.
Cloudflare applies certain limits:
- A crawl can run for up to 7 days. If it doesn't finish in that time, the job is cancelled due to timeout.
- Once the crawl completes, its data stays available for 14 days before being deleted.
This two‑step process lets you start crawls quickly and retrieve results whenever they're ready, without managing crawling infrastructure yourself.
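The two‑step flow above can be sketched in JavaScript with fetch. This is a minimal sketch, not an official client: `accountId` and `apiToken` are placeholders, and the helper names are illustrative rather than part of the API.

```javascript
// Build the Browser Rendering crawl endpoint for an account,
// optionally pointing at a specific job ID.
function crawlEndpoint(accountId, jobId) {
  const base = `https://api.cloudflare.com/client/v4/accounts/${accountId}/browser-rendering/crawl`;
  return jobId ? `${base}/${jobId}` : base;
}

// Step 1: start a crawl job and return the job ID from the response.
async function startCrawl(accountId, apiToken, url) {
  const response = await fetch(crawlEndpoint(accountId), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url }),
  });
  const data = await response.json();
  return data.result; // the crawl job ID
}

// Step 2: check the status or fetch results using the job ID.
async function getCrawl(accountId, apiToken, jobId) {
  const response = await fetch(crawlEndpoint(accountId, jobId), {
    headers: { Authorization: `Bearer ${apiToken}` },
  });
  return (await response.json()).result;
}
```

You would call `startCrawl` once, store the returned job ID, and then call `getCrawl` later — the two requests can be minutes or days apart.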
How to Initiate a Crawl Job (Cloudflare /crawl API)
To start crawling a website using Cloudflare's Crawl API, you send a POST request to the /crawl endpoint with the URL you want to begin crawling. The API immediately responds with a job ID that you will use later to check the status and retrieve the results of the crawl.
Curl Example : Start a Crawl Job
curl -X POST "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl" \
-H "Authorization: Bearer <apiToken>" \
-H "Content-Type: application/json" \
-d '{
"url": "https://developers.cloudflare.com/workers/"
}'
In this request:
- You call the /crawl endpoint under your Cloudflare account.
- You include your API token in the Authorization header.
- You send a JSON body with the required url field pointing to the page you want to crawl.
Example Response
After successful submission, Cloudflare returns a response like this:
{
"success": true,
"result": "c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e"
}
Here:
- success: true means the crawl job was created successfully.
- The result value is the crawl job ID : a unique identifier for your crawl.
You use this job ID in a later GET request to check the crawl status or retrieve the final crawl results.
Requesting Crawl Job Results (Cloudflare /crawl API)
After you start a crawl job with a POST request, Cloudflare processes the job in the background and returns a job ID. You use that job ID to check the
crawl status or get the actual results by sending a GET request to the same /crawl endpoint with the job ID appended.
Example GET Request to Fetch Crawl Status
curl -X GET "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e" \
-H "Authorization: Bearer YOUR_API_TOKEN"
In this request:
- You replace {account_id} with your Cloudflare account ID.
- You use the job ID you received from the initial POST request.
- You include your API token in the Authorization header.
When you send this request, the API returns a JSON response that includes a status field. This field tells you the current state of the crawl job.
Possible Crawl Job Statuses
The status field can contain any of the following values:
- running : The crawl is still in progress.
- cancelled_due_to_timeout : The job ran longer than the seven‑day limit and was automatically cancelled.
- cancelled_due_to_limits : The job was cancelled because it reached your Cloudflare plan's resource limits.
- cancelled_by_user : The job was manually cancelled by you.
- errored : An error occurred during the crawl.
- completed : The crawl finished successfully and you can retrieve the final data.
Once the job reaches the completed status, you can request the full crawl results (including crawled URLs and extracted content) from the same endpoint. You can also use optional query parameters like cursor, limit, or status to filter or paginate results.
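Since every status other than running means the job has stopped, a small helper makes polling logic explicit. The status strings come from the list above; the helper functions themselves are just a convenience sketch, not part of the API.

```javascript
// Statuses after which a crawl job will make no further progress.
const TERMINAL_STATUSES = new Set([
  "completed",
  "errored",
  "cancelled_by_user",
  "cancelled_due_to_timeout",
  "cancelled_due_to_limits",
]);

// Returns true once a job has stopped, successfully or not.
function isTerminal(status) {
  return TERMINAL_STATUSES.has(status);
}

// Returns true only when the full results are ready to fetch.
function isSuccessful(status) {
  return status === "completed";
}
```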
Polling for Completion (Cloudflare /crawl API)
Cloudflare crawl jobs run asynchronously, so they don't finish immediately after you send the POST request. To check when your crawl job is done, you
can poll the endpoint periodically using the job ID you received. Adding ?limit=1 helps keep the response small because you only need the status,
not all crawled content.
JavaScript Example: Poll Until Crawl Finishes
async function waitForCrawl(accountId, jobId, apiToken) {
const maxAttempts = 60;
const delayMs = 5000;
for (let i = 0; i < maxAttempts; i++) {
const response = await fetch(
`https://api.cloudflare.com/client/v4/accounts/${accountId}/browser-rendering/crawl/${jobId}?limit=1`,
{
headers: {
Authorization: `Bearer ${apiToken}`,
},
},
);
const data = await response.json();
const status = data.result.status;
if (status !== "running") {
return data.result;
}
// Wait before next check
await new Promise((resolve) => setTimeout(resolve, delayMs));
}
throw new Error("Crawl job did not complete within timeout");
}
In this function:
- The loop checks the crawl status every 5 seconds.
- It stops when the crawl job is no longer running.
- You can adjust maxAttempts and delayMs based on your preferences.
Fetch Full Crawl Results
Once the job reaches a terminal status (like completed), you can fetch the full results without the limit parameter. The API supports additional query options so you can control how you retrieve the results:
- cursor : Used for pagination if results exceed 10 MB.
- limit : Number of records to return per request.
- status : Filter results by URL status such as completed, queued, skipped etc.
Example: Fetch Paginated Crawl Results
curl -X GET "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/{job_id}?cursor=10&limit=10&status=completed" \
-H "Authorization: Bearer YOUR_API_TOKEN"
This request retrieves results starting at the cursor position 10, with up to 10 records per page and only includes URLs whose status is completed.
Sample Paginated Response
The API returns a JSON object listing the crawl results and metadata:
{
"result": {
"id": "c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e",
"status": "completed",
"browserSecondsUsed": 134.7,
"total": 50,
"finished": 50,
"records": [
{
"url": "https://developers.cloudflare.com/workers/",
"status": "completed",
"markdown": "# Cloudflare Workers\nBuild and deploy serverless applications...",
"metadata": {
"status": 200,
"title": "Cloudflare Workers · Cloudflare Workers docs",
"url": "https://developers.cloudflare.com/workers/"
}
},
{
"url": "https://developers.cloudflare.com/workers/get-started/quickstarts/",
"status": "completed",
"markdown": "## Quickstarts\nGet up and running with a simple Hello World...",
"metadata": {
"status": 200,
"title": "Quickstarts · Cloudflare Workers docs",
"url": "https://developers.cloudflare.com/workers/get-started/quickstarts/"
}
}
// …more entries
],
"cursor": 10
},
"success": true
}
This contains:
- metadata about the crawl job
- list of records with extracted content
- a cursor value for pagination when there are more results to fetch.
This step ensures your application can detect when a crawl job finishes and then fetch all the crawled data efficiently.
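The pagination flow described above can be sketched as a loop that follows the cursor until no more pages remain. This assumes the response shape shown in the sample (a result.records array and a result.cursor that is absent on the last page); treat it as an illustration under those assumptions rather than a definitive client.

```javascript
// Build the results URL with optional pagination/filter parameters.
function resultsUrl(accountId, jobId, { cursor, limit, status } = {}) {
  const url = new URL(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/browser-rendering/crawl/${jobId}`,
  );
  if (cursor !== undefined) url.searchParams.set("cursor", String(cursor));
  if (limit !== undefined) url.searchParams.set("limit", String(limit));
  if (status !== undefined) url.searchParams.set("status", status);
  return url.toString();
}

// Collect every record by following the cursor until it disappears.
async function fetchAllRecords(accountId, jobId, apiToken) {
  const records = [];
  let cursor;
  do {
    const response = await fetch(
      resultsUrl(accountId, jobId, { cursor, limit: 10, status: "completed" }),
      { headers: { Authorization: `Bearer ${apiToken}` } },
    );
    const { result } = await response.json();
    records.push(...result.records);
    cursor = result.cursor; // undefined once the last page is reached
  } while (cursor !== undefined);
  return records;
}
```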
Cancel a Crawl Job (Cloudflare /crawl API)
If you've started a crawl job with the Cloudflare Crawl API and decide you no longer want it to continue, you can cancel the job while it's in progress. Cancelling stops any future pages that were queued for crawling and updates the job status to indicate it was cancelled by the user.
How to Cancel a Crawl Job
To cancel a crawl job, send a DELETE request to the /crawl endpoint using the job ID you received when you initiated the crawl.
Example Terminal Command
curl -X DELETE "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/{job_id}" \
-H "Authorization: Bearer YOUR_API_TOKEN"
In this example:
- Replace {account_id} with your Cloudflare account ID.
- Replace {job_id} with the crawl job ID you received from your earlier POST request.
- Include your API token in the Authorization header.
What Happens After Cancellation
- A successful cancellation returns a 200 OK response.
- The crawl job's status is updated to cancelled_by_user.
- All URLs that were queued but not yet crawled are dropped.
- You can still check the job status later if needed.
Cancelling a crawl is useful when you've queued a large crawl by mistake, want to save quota or no longer need the data before the crawl completes.
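The DELETE request above can be wrapped in a small JavaScript helper. This is a sketch assuming only what the section describes: a DELETE to the job URL returning 200 OK on success.

```javascript
// Cancel a running crawl job; resolves true on a 200 OK response.
async function cancelCrawl(accountId, jobId, apiToken) {
  const response = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/browser-rendering/crawl/${jobId}`,
    {
      method: "DELETE",
      headers: { Authorization: `Bearer ${apiToken}` },
    },
  );
  return response.ok;
}
```

After cancelling, a later GET with the same job ID should report the status as cancelled_by_user.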
Optional Parameters for Cloudflare Crawl API
In addition to the required url field, Cloudflare's /crawl endpoint supports several optional parameters that let you customize the crawl behavior, limit scope, and control output formats. You include these in the same JSON body when you start a crawl job with a POST request.
| Optional Parameter | Type | Description |
|---|---|---|
| limit | Number | Maximum number of pages to crawl (default 10, max 100,000). |
| depth | Number | Maximum link depth to crawl from the starting URL (default 100,000, max 100,000). |
| source | String | Source for discovering URLs: all, sitemaps, or links. Default: all. |
| formats | Array of strings | Response formats. Default: ["html"]. Other options: markdown, json. JSON uses Workers AI extraction by default. |
| render | Boolean | Controls browser rendering: true executes JavaScript (default), false does a fast HTML fetch without rendering. |
| jsonOptions | Object | Required if formats includes "json". Contains properties like prompt, response_format, and custom_ai (same as /json endpoint). |
| maxAge | Number | Max seconds the crawler can reuse a cached resource before re‑fetching it (default 86,400s, max 604,800s). |
| modifiedSince | Number | Unix timestamp (seconds). Crawl only pages modified since this time. |
| options.includeExternalLinks | Boolean | If true, follows links to external domains (default false). |
| options.includeSubdomains | Boolean | If true, follows links on subdomains of the starting URL (default false). |
| options.includePatterns | Array of strings | Only visits URLs matching any of these wildcard patterns; supports * and **. |
| options.excludePatterns | Array of strings | Does not visit URLs matching these wildcard patterns; supports * and **. |
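As a rough illustration of how such wildcard patterns typically behave, the sketch below converts a pattern to a regular expression. The semantics here are an assumption based on common glob conventions — "*" matching within a single path segment and "**" matching across segments — and only approximate the API's own matcher.

```javascript
// Convert a wildcard pattern to a RegExp, assuming the common glob
// convention: "*" matches within one path segment, "**" matches
// across segments. An approximation, not Cloudflare's matcher.
function wildcardToRegExp(pattern) {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*\*/g, "\u0000")            // placeholder for "**"
    .replace(/\*/g, "[^/]*")               // "*": within one segment
    .replace(/\u0000/g, ".*");             // "**": across segments
  return new RegExp(`^${escaped}$`);
}
```

Under these assumptions, a pattern like `https://example.com/docs/**` would match any URL under the docs tree, while `https://example.com/docs/*` would match only pages one level deep.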
Example with All Optional Parameters
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://www.exampledocs.com/docs/",
"limit": 50,
"depth": 2,
"formats": ["markdown"],
"render": false,
"maxAge": 7200,
"modifiedSince": 1704067200,
"source": "all",
"options": {
"includeExternalLinks": true,
"includeSubdomains": true,
"includePatterns": [
"**/api/v1/*"
],
"excludePatterns": [
"*/learning-paths/*"
]
}
}'
Real‑World Use Cases
1. Documentation Site Crawl With Filters
When you need to crawl only the documentation section of a website and deliberately skip unnecessary sections, like a changelog or archive, you can use includePatterns and excludePatterns in your crawl request. These pattern filters let you precisely control which parts of a site are indexed, making the crawl more efficient and targeted.
This is particularly useful when building:
- Technical knowledge bases
- AI training datasets (e.g., for RAG systems)
- Documentation monitoring tools
The example below shows how to start a crawl that scans only the site's docs and omits irrelevant sections.
Example: Crawl Only Documentation Pages
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://example.com/docs",
"limit": 200,
"depth": 5,
"formats": ["markdown"],
"options": {
"includePatterns": [
"https://example.com/docs/**"
],
"excludePatterns": [
"https://example.com/docs/changelog/**",
"https://example.com/docs/archive/**"
]
}
}'
What This Script Does
- Starting URL: Begins crawling at the documentation home (https://example.com/docs).
- limit: Sets a maximum of 200 pages to crawl so you don't exceed quota or get unnecessary content.
- depth: Limits how many levels deep the crawler will follow links (here up to 5).
- formats: Requests results in Markdown, which is ideal for docs and AI training.
- includePatterns: Ensures only URLs under the /docs/ path get crawled.
- excludePatterns: Prevents crawling of specific sub‑sections that you don't need (like “changelog” and “archive”).
Why Use Pattern Rules
Pattern matching helps you avoid crawling irrelevant or sensitive parts of a site. Even if a page is linked from the docs, exclude patterns ensure it's skipped if it matches one of the defined rules. This keeps your crawl results focused and reduces unnecessary processing.
2. Product Catalog Extraction with AI
When building e‑commerce tools, price trackers, catalogs or AI‑driven product analytics systems, developers often need structured product data (such as product names, prices, descriptions, availability and currency). Cloudflare's Crawl API makes this easier by combining crawling, rendering and AI‑powered JSON extraction into one call.
Instead of crawling the site and then running a separate extraction step, you can tell the Crawl API exactly what fields you want to extract and it will return structured JSON using Workers AI.
Crawl Request: Extract Product Data
Below is an example curl command that starts a crawl on a hypothetical shop's products page and extracts key product information using a JSON schema:
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://shop.example.com/products",
"limit": 50,
"formats": ["json"],
"jsonOptions": {
"prompt": "Extract product name, price, description, and availability",
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "product",
"properties": {
"name": "string",
"price": "number",
"currency": "string",
"description": "string",
"inStock": "boolean"
}
}
}
},
"options": {
"includePatterns": [
"https://shop.example.com/products/*"
]
}
}'
What This Does
- Starting URL: Begins crawling at the products listing page.
- limit: Limits the crawl to 50 pages, enough to cover most basic product catalogs.
- formats: Specifies ["json"], so results are returned as structured JSON rather than plain HTML or Markdown.
- jsonOptions: Provides a prompt and a JSON schema to guide Workers AI in extracting exactly the fields you care about: product name, price, description, currency and availability.
- includePatterns: Ensures only URLs that match the “products” path pattern are included in the crawl.
Why This Matters
In traditional web scraping setups, you would need to:
- Crawl pages separately.
- Parse HTML with custom extraction logic.
- Normalize the data into a structured format.
With Cloudflare's Crawl API, you combine crawling, JavaScript rendering, link discovery and AI‑powered extraction in one API call, drastically simplifying your workflow. It's ideal for:
- E‑commerce analytics dashboards
- Price comparison tools
- AI‑powered product search & recommendations
- Automated catalog generation pipelines
By providing a prompt and schema, the Crawl API returns clean, typed JSON without custom scraping logic, which saves time and reduces engineering complexity.
3. Fast Static Content Fetch : Crawl Static Sites without Rendering
When you know that the content you need is already present in the initial HTML, such as on blogs, brochure sites or static documentation, you can tell the Crawl API to skip JavaScript rendering. This makes the crawl much faster and more efficient because Cloudflare doesn't launch a headless browser for each page.
Here's how you perform a fast static content fetch using the /crawl endpoint:
Example : Crawl Static HTML Pages
curl -X POST "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl" \
-H "Authorization: Bearer <apiToken>" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"limit": 100,
"render": false,
"formats": ["html", "markdown"]
}'
What This Request Does
- url: The starting page for the crawl.
- limit: Tells the API to crawl up to 100 pages.
- render: false: Disables JavaScript execution and performs a simple HTML fetch instead of spinning up a browser.
- formats: Requests results in both raw HTML and Markdown for easier content consumption or analysis.
By disabling rendering (render: false), the crawl becomes faster and more cost‑efficient, especially for static sites where the HTML doesn't depend on
client‑side JavaScript to load content.
When to Use render: false
- Your pages are fully rendered on the server
- You don't need the crawler to execute JavaScript
- You want quicker results and lower compute usage
- You're crawling simple static content like blogs or documentation
Cloudflare's /crawl endpoint will still follow links and extract content; it simply skips the headless browser step.
4. Crawl with Authentication (Cloudflare /crawl API)
When you need to crawl content that is behind HTTP authentication or requires custom headers (for example, API‑key access), Cloudflare's Crawl endpoint lets you include authentication credentials directly in your crawl request. This makes it possible to crawl protected pages (like internal docs or API endpoints) that would otherwise be inaccessible with a simple anonymous crawl.
I. Crawl with Basic HTTP Authentication
If the site you're crawling requires a username and password (standard HTTP basic auth), you can include an authenticate object in your request body. This tells the crawler to send the credentials as part of the request to the target site.
Example : Basic Auth Crawl
curl -X POST "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl" \
-H "Authorization: Bearer <apiToken>" \
-H "Content-Type: application/json" \
-d '{
"url": "https://secure.example.com",
"limit": 50,
"authenticate": {
"username": "user",
"password": "pass"
}
}'
In this example:
- url is the protected page you want to crawl.
- limit determines how many pages are crawled.
- authenticate contains the basic authentication credentials that Cloudflare will send when requesting the page.
This allows the crawler to access content that is normally blocked by a login prompt or HTTP basic authentication.
II. Crawl with Token‑Based or Custom Header Authentication
Some APIs or services require a token or custom HTTP header instead of basic auth. You can include these headers in the crawl request with the setExtraHTTPHeaders option.
Example : Token‑Based Auth Crawl
curl -X POST "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl" \
-H "Authorization: Bearer <apiToken>" \
-H "Content-Type: application/json" \
-d '{
"url": "https://api.example.com/docs",
"limit": 100,
"setExtraHTTPHeaders": {
"X-API-Key": "your-api-key"
}
}'
How this works:
- The crawler includes the X-API-Key header in every request it makes during the crawl.
- This allows it to authenticate with APIs or endpoints that expect token‑based authentication.
This technique is especially useful for crawling internal documentation portals, private APIs or content that returns data only when an API token or session token is present.
Why Authentication Options Matter
Many real‑world websites and APIs are protected behind authentication mechanisms. Without supporting credentials or headers, crawlers only return the public content. By providing credentials or custom headers:
- You can crawl login‑protected documentation
- You can index internal knowledge bases for AI systems
- You can extract data from API‑protected endpoints
These features make the Crawl API much more flexible and suitable for enterprise workflows where most content isn't publicly accessible.
5. Wait for Dynamic Content
Some modern websites, especially single‑page applications (SPAs), load their content after the initial HTML response through JavaScript. To ensure the Cloudflare crawler captures the fully rendered page instead of just an empty shell, you can use the gotoOptions and waitForSelector options in your crawl request.
Example : Crawl with Dynamic Content Wait
curl -X POST "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl" \
-H "Authorization: Bearer <apiToken>" \
-H "Content-Type: application/json" \
-d '{
"url": "https://app.example.com",
"limit": 50,
"gotoOptions": {
"waitUntil": "networkidle2",
"timeout": 60000
},
"waitForSelector": {
"selector": "[data-content-loaded]",
"timeout": 30000,
"visible": true
}
}'
What This Does
- gotoOptions.waitUntil: "networkidle2" - Tells the crawler to wait until network activity is mostly finished (no more than two pending requests). This helps ensure content loaded via JavaScript is fully available before extraction.
- gotoOptions.timeout: 60000 - Allows up to 60 seconds for the page to finish loading dynamic content.
- waitForSelector - Instructs the crawler to wait until a specific DOM element appears (in this case, one marked by data-content-loaded) before proceeding. This helps avoid capturing incomplete pages.
This approach is especially useful when crawling JavaScript heavy applications where important content is loaded asynchronously and not available in the initial HTML.
6. Block Unnecessary Resources (Speed Up Crawl)
When crawling a site where you only need text content (not images, videos, fonts or CSS), you can instruct Cloudflare's Crawl API to block specific resource types. This reduces network requests, speeds up crawling and lowers resource usage, which is especially helpful for large crawls focused on text extraction or structured data.
Example : Crawl While Blocking Images & Media
curl -X POST "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl" \
-H "Authorization: Bearer <apiToken>" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"limit": 100,
"rejectResourceTypes": [
"image",
"media",
"font",
"stylesheet"
]
}'
What This Does
rejectResourceTypes tells the crawler to block requests for specific kinds of resources like:
- image : prevents loading JPG, PNG, GIF, SVG, etc.
- media : blocks audio and video files.
- font : stops web fonts from downloading.
- stylesheet : skips CSS files, which can speed up the crawl.
By blocking these, the crawler only requests text and essential document content, speeding up crawling and reducing processing costs. Blocking unnecessary resources is particularly useful when you only care about HTML structure, text or JSON output and not how the page looks visually. That way, the crawl focuses on what matters: the content, without wasting time fetching images or style files.
