Cloudflare Crawl API (2026): Complete Developer Guide to the /crawl Endpoint
#api
#crawler
#cloudflare
Modern web applications increasingly depend on structured data gathered from websites. Whether you are building an AI assistant, an analytics tool or a research platform, the first step often involves collecting content from across a website. Traditionally, this process required building a complex crawling infrastructure involving headless browsers, scraping scripts, proxy management and distributed processing systems.
In March 2026, Cloudflare introduced a new feature that significantly simplifies this process: the /crawl endpoint in the Browser Rendering REST API.
This endpoint allows developers to crawl an entire website using a single API call, automatically discovering pages, rendering them in a browser environment if needed, and returning the extracted content in structured formats.
This article provides a detailed, step-by-step explanation of the Cloudflare Crawl API. Instead of short summaries or bullet lists, we will explore how it works conceptually, how to use it in real projects and how developers can integrate it into modern data pipelines.
Understanding the Problem Cloudflare Is Solving
Before exploring the Crawl API itself, it is important to understand why such a feature matters.
Web crawling has traditionally been one of the most complex parts of building data-driven systems. If you wanted to extract content from websites at scale, you typically had to manage several layers of infrastructure. Developers often used headless browser automation tools such as Puppeteer or Playwright to load pages and execute JavaScript. On top of that, they needed systems to discover links across a site, manage crawling depth, handle retries, parse the HTML content and store the extracted data.
All of this required servers capable of running browser instances continuously. When crawling large websites, developers had to orchestrate hundreds or thousands of browser sessions simultaneously. Infrastructure costs, maintenance overhead and reliability challenges quickly became major concerns.
Cloudflare's solution is to move the browser infrastructure into the cloud and expose it through a simple API. Instead of building your own crawler, you simply send a request to Cloudflare's endpoint and let their infrastructure perform the crawl.
The /crawl endpoint takes a starting URL, explores the site automatically, renders pages if necessary and returns the results in formats that are ready to use for further processing.
What the Cloudflare /crawl Endpoint Actually Does
At its core, the Crawl API is part of the Cloudflare Browser Rendering platform, which allows developers to perform browser tasks such as fetching HTML, generating PDFs, capturing screenshots and extracting structured data through REST API calls.
The /crawl endpoint extends this functionality by enabling developers to crawl entire websites rather than individual pages. When you submit a request, Cloudflare performs several operations automatically:
- First, it loads the starting URL. From there, it analyzes the page to discover additional URLs. The crawler checks sitemap files if they exist, scans links embedded within the page and builds a list of additional pages to visit.
- Each discovered page can optionally be rendered in a headless browser environment. This allows the crawler to execute JavaScript, which is necessary for many modern websites built with frameworks like React or Vue.
- As pages are processed, the crawler extracts content and prepares it in one or more output formats. The job continues until the crawler reaches the configured limits, such as maximum pages or crawl depth.
This entire process is executed asynchronously, meaning your application does not have to wait for the crawl to finish before continuing other tasks. Instead, the API returns a job identifier that you can use to check the crawl results later.
The Architecture Behind the Crawl API
To understand how developers use the API, it helps to look at the architecture behind it. The Cloudflare Crawl API follows an asynchronous job model consisting of two primary stages.
- In the first stage, you initiate a crawl job by sending a request that includes a starting URL and optional configuration parameters. Cloudflare immediately returns a response containing a unique job identifier.
- In the second stage, your application periodically requests the results associated with that job identifier. As pages are processed, the API returns the crawled data.
This design has several advantages. It allows Cloudflare to process large crawls without requiring long-running HTTP connections. It also enables developers to integrate the crawl process into larger pipelines that trigger additional processing once the job completes.
Preparing Your Environment
Before using the Crawl API, you need a Cloudflare account and an API token with the appropriate permissions.
The Browser Rendering API requires a token with Browser Rendering edit permissions, which allows your application to initiate rendering jobs through the REST interface. Once the token is created, you can use it to authenticate your API requests.
Developers typically interact with the API using HTTP clients such as cURL, Postman or programmatic libraries in languages like Node.js or Python.
Initiating a Crawl Job
The first step in using the Crawl API is sending a request to start a crawl job.
The endpoint URL follows this structure:
https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl
Your request must include the URL you want to crawl.
Example request:
curl -X POST "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl" \
  -H "Authorization: Bearer <apiToken>" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
Once the request is processed, Cloudflare immediately returns a response similar to the following:
{
  "success": true,
  "result": "c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e"
}
The value in the result field represents the crawl job ID. This identifier is used to retrieve the results later.
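The same request can be issued programmatically. The sketch below uses only Python's standard library; the account ID and token are placeholders, and the helper names are my own, not part of any SDK:

```python
import json
import urllib.request

# Placeholders -- substitute your own Cloudflare values.
ACCOUNT_ID = "your_account_id"
API_TOKEN = "your_api_token"

CRAWL_URL = (
    "https://api.cloudflare.com/client/v4/accounts/"
    f"{ACCOUNT_ID}/browser-rendering/crawl"
)

def build_crawl_request(start_url: str) -> urllib.request.Request:
    """Build the POST request that starts a crawl job."""
    body = json.dumps({"url": start_url}).encode()
    return urllib.request.Request(
        CRAWL_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def extract_job_id(response_json: dict) -> str:
    """Pull the crawl job ID out of a response shaped like the one above."""
    if not response_json.get("success"):
        raise RuntimeError(f"crawl request failed: {response_json}")
    return response_json["result"]

# Example (requires valid credentials):
# with urllib.request.urlopen(build_crawl_request("https://example.com")) as resp:
#     job_id = extract_job_id(json.load(resp))
```

Keeping request construction separate from the network call makes the code easy to test and to adapt to other HTTP clients.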
Retrieving Crawl Results
Because the crawl process runs asynchronously, your application must check the job status periodically. To retrieve results, send a GET request using the job ID:
GET /accounts/{account_id}/browser-rendering/crawl/{job_id}
Example:
curl -X GET "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/{job_id}" \
-H "Authorization: Bearer <apiToken>"
The response contains information about the pages that were crawled, including metadata and extracted content. Cloudflare stores crawl results for 14 days, allowing developers to retrieve them even after the job has completed.
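A minimal polling loop might look like the following Python sketch. The `status` field name and its values are assumptions, since the exact result schema is not shown above; the `fetch` parameter is injected so the loop can be exercised without network access:

```python
import json
import time
import urllib.request

# Placeholders -- substitute your own Cloudflare values.
ACCOUNT_ID = "your_account_id"
API_TOKEN = "your_api_token"

def fetch_job(job_id: str) -> dict:
    """GET the current state of a crawl job (a single poll)."""
    url = (
        "https://api.cloudflare.com/client/v4/accounts/"
        f"{ACCOUNT_ID}/browser-rendering/crawl/{job_id}"
    )
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {API_TOKEN}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def poll_until_done(job_id: str, interval_s: float = 10.0,
                    max_polls: int = 60, fetch=fetch_job) -> dict:
    """Poll the job until it reports completion or the poll budget runs out."""
    for _ in range(max_polls):
        data = fetch(job_id)
        # "status" is an assumed field name; adapt to the real payload.
        if data.get("result", {}).get("status") in ("completed", "failed"):
            return data
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

In production you would likely add exponential backoff and persist the job ID, since results remain retrievable for 14 days.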
How the Cloudflare Crawler Discovers Pages
One of the most interesting aspects of the Crawl API is how it finds new pages within a website. The crawler follows a structured discovery process.
- It begins with the initial URL provided in your request. This page acts as the entry point for the crawl.
- Next, the crawler checks the website's sitemap files. Many websites include XML sitemaps that list the most important pages on the site. If a sitemap is available, the crawler extracts URLs from it and adds them to the crawl queue.
- Finally, the crawler analyzes links embedded within each page. As it processes pages, it identifies additional URLs and adds them to the queue if they belong to the same site.

By combining sitemap discovery and link analysis, the crawler can map out large portions of a website automatically.
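The discovery process described above can be sketched as a breadth-first traversal. This is purely illustrative (Cloudflare's internal crawler logic is not public): the queue is seeded with the start URL plus any sitemap entries, and same-site links found on each page are appended as the crawl proceeds:

```python
from collections import deque
from urllib.parse import urlparse

def discover(start_url: str, sitemap_urls: list[str],
             links_of: dict[str, list[str]], max_pages: int = 100) -> list[str]:
    """Conceptual sketch of same-site page discovery.

    links_of maps a URL to the links found on that page (here a plain dict;
    a real crawler would fetch and parse each page instead).
    """
    site = urlparse(start_url).netloc
    queue = deque([start_url, *sitemap_urls])
    seen, order = set(), []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        if url in seen or urlparse(url).netloc != site:
            continue  # skip duplicates and off-site links
        seen.add(url)
        order.append(url)
        queue.extend(links_of.get(url, []))  # links found on the page
    return order
```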
Output Formats Provided by the API
One of the major advantages of the Crawl API is the variety of output formats it supports.
- The first format is raw HTML, which provides the complete HTML markup for each crawled page. Developers who want full control over parsing and extraction often prefer this format.
- The second format is Markdown, which converts the page content into clean, readable text that removes many of the structural elements of HTML. Markdown is particularly useful for documentation systems or AI pipelines that process textual content.
- The third format is structured JSON, in which Cloudflare organizes the extracted content into structured fields that are easier to process programmatically. Structured JSON is especially useful when building data pipelines or training machine learning models.
Handling JavaScript-Rendered Websites
Many modern websites rely heavily on JavaScript to render content dynamically. Traditional crawlers often struggle with such sites because they only retrieve the initial HTML response. The Crawl API solves this problem through the render parameter.
When rendering is enabled, Cloudflare launches a headless browser environment and executes the page's JavaScript. This allows the crawler to capture the fully rendered page, including content that appears only after JavaScript execution.
If the site is mostly static and the content is already present in the HTML response, developers can disable rendering. This makes the crawl process faster and reduces resource usage.
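The `render` flag would then be set per crawl. A tiny sketch, assuming the flag sits at the top level of the request body:

```python
import json

def render_payload(start_url: str, render: bool) -> str:
    """Request body toggling browser rendering. The "render" parameter name
    comes from the article; its placement in the body is an assumption."""
    return json.dumps({"url": start_url, "render": render})

# A JS-heavy React app needs rendering; a static docs site may not:
# render_payload("https://app.example.com", render=True)
# render_payload("https://docs.example.com", render=False)
```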
Incremental Crawling for Efficient Data Collection
When crawling large websites regularly, it is inefficient to reprocess every page on every crawl.
The Crawl API supports incremental crawling using parameters such as modifiedSince and maxAge.
These parameters allow the crawler to skip pages that have not changed since the last crawl. By avoiding unnecessary processing, developers can reduce both time and cost.
Incremental crawling is particularly valuable for monitoring websites that update frequently, such as documentation portals, blogs and product catalogs.
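A hedged sketch of an incremental crawl body follows. The `modifiedSince` and `maxAge` parameter names come from the text above; the value formats used here (an ISO 8601 timestamp and seconds) are assumptions to confirm against the API reference:

```python
import json
from datetime import datetime, timedelta, timezone

def incremental_payload(start_url: str, since: datetime, max_age_s: int) -> str:
    """Crawl body that asks the crawler to skip unchanged pages.

    Value formats (ISO 8601 UTC timestamp, seconds) are assumptions.
    """
    return json.dumps({
        "url": start_url,
        "modifiedSince": since.astimezone(timezone.utc)
                              .strftime("%Y-%m-%dT%H:%M:%SZ"),
        "maxAge": max_age_s,
    })

# e.g. only re-fetch pages changed in the last 7 days:
# incremental_payload("https://example.com",
#                     datetime.now(timezone.utc) - timedelta(days=7), 86400)
```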
Respecting Website Rules and Policies
Cloudflare designed the Crawl API to behave like a responsible crawler rather than a scraping tool that bypasses protections.
The crawler respects the rules defined in a website's robots.txt file, including directives that prevent specific pages from being crawled. It also follows crawl-delay instructions if they are specified.
If a site blocks bots entirely or uses security systems such as CAPTCHA challenges, Web Application Firewalls or bot detection services, the Crawl API does not attempt to bypass those protections. Instead, blocked pages are marked accordingly in the response.
Limitations and Job Duration
Although the Crawl API can process large websites, there are certain limits to keep in mind.
- Crawl jobs are allowed to run for up to seven days. If a crawl takes longer than that, it is automatically cancelled.
- The amount of browser time available depends on your Cloudflare plan. Free-plan users may encounter stricter limits on browser usage.
These limits encourage developers to design efficient crawling strategies rather than attempting to crawl extremely large websites without constraints.
Real-World Use Cases for the Crawl API
The release of the Crawl API opens up several interesting possibilities for developers.
- One common use case involves building knowledge bases for artificial intelligence systems. Many AI assistants rely on retrieval-augmented generation pipelines, where a system retrieves relevant documents before generating an answer. The Crawl API can gather content from documentation websites and convert it into structured formats that can be indexed by vector databases.
- Another important use case involves monitoring website changes. Companies often track competitors, regulatory updates or content changes across multiple websites. By running scheduled crawl jobs, developers can detect when pages are updated and trigger alerts or automated analyses.
- Content aggregation platforms also benefit from this technology. Instead of manually maintaining scraping scripts for dozens of websites, developers can initiate crawl jobs and process the resulting data programmatically.
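As a sketch of the first use case, crawled Markdown pages could be split into chunks before indexing in a vector database. This is a minimal illustration only; real pipelines typically add chunk overlap, token-based sizing and richer metadata:

```python
def chunk_markdown(pages: dict[str, str], max_chars: int = 800) -> list[dict]:
    """Split crawled Markdown pages (URL -> text) into paragraph-based
    chunks, each tagged with its source URL for retrieval."""
    chunks = []
    for url, markdown in pages.items():
        buf = ""
        for para in markdown.split("\n\n"):
            # Start a new chunk when adding this paragraph would overflow.
            if buf and len(buf) + len(para) + 2 > max_chars:
                chunks.append({"url": url, "text": buf})
                buf = para
            else:
                buf = f"{buf}\n\n{para}" if buf else para
        if buf:
            chunks.append({"url": url, "text": buf})
    return chunks
```

Each chunk keeps its source URL, so a retrieval-augmented system can cite the page it drew an answer from.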
Why This Release Matters for Developers
The introduction of the Crawl API represents an important shift in how developers interact with web content.
Historically, building large-scale crawlers required specialized infrastructure and significant engineering effort. By abstracting the browser environment and crawl logic behind a simple API, Cloudflare reduces the barrier to entry for developers who want to analyze web content.
Instead of managing headless browsers and distributed crawlers, developers can focus on the higher-level tasks that actually deliver value, such as data analysis, machine learning and application development.
Final Thoughts
The Cloudflare Crawl API demonstrates how modern cloud platforms are evolving beyond traditional infrastructure services. Rather than simply providing servers or networking capabilities, platforms like Cloudflare are increasingly offering higher-level developer tools that simplify complex tasks.
With the /crawl endpoint, developers can crawl entire websites using a single API request, automatically discover pages, render JavaScript content and retrieve structured data that can be used in a wide range of applications.
As web applications continue to rely more heavily on data collection and AI-driven analysis, tools like this are likely to become an essential part of the developer ecosystem.
