Web scraping for AI

The scraping tool built for RAG pipelines.

Turn any website into clean, structured text ready for retrieval, indexing, and evaluation pipelines. Monitor every crawl live and export in the format your stack needs.


1,000 free crawl units. No credit card required.

Export formats

< 1 min to first crawl

1,000 free crawl units

Screenshot: FireScraper dashboard showing active crawl projects with real-time status.

Inside the workspace

More than a scraper. A workspace for AI data ops.

AI-ready output

Export website content as clean text, JSONL documents, and chunked formats that slot straight into embedding, retrieval, and evaluation pipelines.

Screenshot: FireScraper session page showing export downloads and crawl results.
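Chunked exports split each page into smaller passages sized for embedding. The sketch below is illustrative only. It assumes a simple fixed-size character window with overlap, and the chunkText name, the Chunk shape, and the default sizes (800 characters, 100 of overlap) are hypothetical, not FireScraper's actual chunking settings.

```typescript
// Hypothetical chunker: fixed-size character windows with overlap.
// Sizes are illustrative defaults, not FireScraper's settings.
interface Chunk {
  index: number; // position of the chunk in the output
  start: number; // offset of the chunk's first character in the source text
  text: string;
}

function chunkText(text: string, size = 800, overlap = 100): Chunk[] {
  if (size <= 0) throw new Error("size must be positive");
  if (overlap < 0 || overlap >= size) throw new Error("overlap must be in [0, size)");
  const chunks: Chunk[] = [];
  const step = size - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ index: chunks.length, start, text: text.slice(start, start + size) });
    if (start + size >= text.length) break; // this window already reaches the end
  }
  return chunks;
}
```

Overlap keeps a sentence that straddles a boundary retrievable from both adjacent chunks, and the start offset lets a retrieved passage be traced back to its place on the page.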

REST API, SDKs, and an MCP server

Start crawls from scripts, CI pipelines, and AI agents. Use the TypeScript and Python SDKs, or the MCP server that drops FireScraper's crawling tools straight into Claude, Cursor, and Windsurf, right in the conversation.

Screenshot: FireScraper developers page showing API keys and REST documentation.

Scheduled recurring crawls

Set daily, weekly, or monthly schedules from any project configuration. FireScraper queues fresh runs automatically so your datasets stay current.

Screenshot: FireScraper schedules page showing recurring crawl configurations.

Capabilities

Everything AI teams need from a scraper.

Ask your corpus

Turn a finished crawl into a queryable knowledge base. Ask questions in plain language and get answers grounded in the scraped pages, with cited sources.
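In miniature, grounded answers with citations work like this: retrieve the pages most relevant to the question, number them, and have the answer cite those numbers. This sketch swaps in naive keyword overlap for ranking (production retrieval usually uses embeddings), and the Page shape and the retrieveWithCitations name are hypothetical, not FireScraper's implementation.

```typescript
// Toy retrieval: rank crawled pages by how many distinct query terms they contain,
// then number the top hits so an answer can cite them as [1], [2], ...
// The Page shape is hypothetical.
interface Page {
  url: string;
  text: string;
}

function tokenize(s: string): Set<string> {
  // Lowercase alphanumeric runs; punctuation is ignored.
  return new Set<string>(s.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

function retrieveWithCitations(query: string, pages: Page[], k = 3) {
  const terms = tokenize(query);
  return pages
    .map((page) => {
      const words = tokenize(page.text);
      let score = 0;
      for (const t of terms) if (words.has(t)) score++;
      return { page, score };
    })
    .filter((hit) => hit.score > 0) // drop pages sharing no terms with the query
    .sort((a, b) => b.score - a.score) // highest overlap first
    .slice(0, k)
    .map((hit, i) => ({ citation: `[${i + 1}]`, url: hit.page.url, score: hit.score }));
}
```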

Parallel workers

Pages are processed by parallel workers, and you can track each one live as it moves through the crawl pipeline, with real-time queue visibility.

CSV, JSON, and JSONL exports

Download clean files ready for spreadsheets, ETL jobs, vector databases, and downstream tools.
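A JSONL export holds one JSON document per line, so it can be consumed record by record without parsing one large array. A minimal TypeScript sketch, assuming each record carries url and markdown fields (hypothetical; check a real export for its actual shape):

```typescript
// Parse a JSONL export: one JSON object per line.
// The url/markdown fields are assumed for illustration only.
interface ExportRecord {
  url: string;
  markdown: string;
}

function parseJsonl(contents: string): ExportRecord[] {
  return contents
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "") // tolerate blank lines and a trailing newline
    .map((line, i) => {
      try {
        return JSON.parse(line) as ExportRecord;
      } catch {
        // i counts non-empty lines only, because blanks were filtered out above
        throw new Error(`Invalid JSON on non-empty line ${i + 1}`);
      }
    });
}
```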

Webhook delivery

Receive HMAC-signed callbacks when crawls finish so downstream pipelines start immediately.
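A receiver should check the HMAC before trusting a callback. FireScraper's exact signing scheme is not described here, so this sketch assumes a hex-encoded HMAC-SHA256 of the raw request body, keyed with your webhook secret; the header name in the comment is hypothetical. It uses only Node's built-in node:crypto module.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme (not confirmed by FireScraper docs): hex HMAC-SHA256 of the raw body,
// delivered in a header such as the hypothetical "X-FireScraper-Signature".
function verifyWebhook(rawBody: string | Buffer, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  // Constant-time comparison avoids leaking how many leading bytes matched.
  return timingSafeEqual(received, expected);
}
```

Verify against the raw bytes as received, before JSON parsing, because re-serializing the payload can change it and break the signature.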

Structured extraction

Define a JSON schema and pull typed fields from every page alongside the full text.
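As an illustration, a schema for product pages might look like the object below, paired with a TypeScript type and a runtime guard so extracted fields can be used safely downstream. The field names are hypothetical, and how the schema is passed to a crawl is not shown because it is not specified here.

```typescript
// Hypothetical JSON Schema for product pages.
const productSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    price: { type: "number" },
    currency: { type: "string" },
    inStock: { type: "boolean" },
  },
  required: ["name", "price"],
} as const;

// TypeScript type mirroring the schema: required fields are non-optional.
interface Product {
  name: string;
  price: number;
  currency?: string;
  inStock?: boolean;
}

// Narrow an extracted record to Product before using it downstream.
function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    typeof v.price === "number" &&
    (v.currency === undefined || typeof v.currency === "string") &&
    (v.inStock === undefined || typeof v.inStock === "boolean")
  );
}
```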

Markdown output

LLM-optimized text that uses fewer tokens than raw HTML for leaner RAG retrieval.

robots.txt support

Honor site rules automatically. Blocked URLs are logged so nothing is silently skipped.
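For reference, robots.txt rules are matched against the URL path. This simplified sketch reads only the User-agent: * group(s), uses plain prefix matching without the * and $ wildcards, and applies the RFC 9309 precedence (the longest matching rule wins, and Allow wins a tie). It illustrates the protocol and is not FireScraper's parser; the function names are hypothetical.

```typescript
// Simplified robots.txt check for the "*" user agent only, prefix matching only.
interface Rule {
  allow: boolean;
  path: string;
}

function wildcardRules(robotsTxt: string): Rule[] {
  const rules: Rule[] = [];
  let inStarGroup = false;
  let lastWasAgent = false;
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*/, "").trim(); // strip comments
    const colon = line.indexOf(":");
    if (colon === -1) continue; // blank or malformed line
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group; a new run starts a new group.
      if (!lastWasAgent) inStarGroup = false;
      if (value === "*") inStarGroup = true;
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    // An empty Disallow value means "allow everything", so it adds no rule.
    if (inStarGroup && (field === "allow" || field === "disallow") && value !== "") {
      rules.push({ allow: field === "allow", path: value });
    }
  }
  return rules;
}

function isAllowed(robotsTxt: string, path: string): boolean {
  let best: Rule | undefined;
  for (const rule of wildcardRules(robotsTxt)) {
    if (!path.startsWith(rule.path)) continue;
    if (
      !best ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best ? best.allow : true; // no matching rule means the path is allowed
}
```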

MCP server for AI agents

Connect FireScraper to Claude, Cursor, or any MCP client with npx @firescraper/mcp, giving your agent crawling tools in one line.

Model Context Protocol

Put FireScraper inside your AI agent

Connect once to Codex, Claude, Cursor, or any MCP client, then just ask. Your agent crawls the live web, gets clean Markdown, and queries crawled sites, right in the conversation.

Crawl a site and reason over the real, current page
Ask a finished crawl a question — with cited sources
Pull clean Markdown or structured JSON into your agent

How to connect the MCP server

Setup guides for Codex and Claude.

Codex — FireScraper MCP

Crawl https://docs.stripe.com/webhooks and tell me how to verify a webhook signature — cite the page.

⚡firescraper_scrape_and_wait — crawling docs.stripe.com/webhooks…

→ 7 pages crawled · clean Markdown ready

⚡firescraper_ask_corpus — "how do I verify a webhook signature?"

→ answered from 1 source

✦

Verify it with stripe.webhooks.constructEvent(rawBody, signature, secret). It throws if the signature or timestamp is invalid, so you know the event really came from Stripe [1].

[1] docs.stripe.com/webhooks

Connected · @firescraper/mcp

crawl.ts (@firescraper/sdk)

```typescript
// Install: npm i @firescraper/sdk
// Uses top-level await, so run this file as an ES module.
import { FireScraper } from '@firescraper/sdk';

const client = new FireScraper('fsk_your_key');

// Start a crawl
const session = await client.scrape({
  urls: ['https://docs.example.com/'],
  maxDepth: 2,
});

// Wait until the crawl finishes
await client.waitForCompletion(session.id);

// Fetch AI-ready output for your pipeline
const md = await client.getResults(session.id, 'markdown');
const chunks = await client.getResults(session.id, 'chunks');
```

TypeScript & Python SDKs

Crawl from code in a few lines

Install the SDK, start a crawl, and pull AI-ready output. Typed methods, clear errors, no glue code.

client.scrape(): Start a crawl

client.waitForCompletion(): Poll until it finishes

client.getResults(): Download markdown, chunks, vectors, JSON…

client.listResults(): See available exports

client.getPartialResults(): Stream results mid-crawl
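waitForCompletion() is described as polling until the crawl finishes. A generic version of that loop might look like the following; the getStatus callback, the status strings, and the default interval and timeout are assumptions, not the SDK's actual implementation.

```typescript
// Generic polling loop; Status values and timings are hypothetical.
type Status = "queued" | "running" | "completed" | "failed";

async function pollUntilDone(
  getStatus: () => Promise<Status>,
  { intervalMs = 2000, timeoutMs = 10 * 60_000 } = {},
): Promise<Status> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await getStatus();
    // Stop on either terminal state and let the caller decide what "failed" means.
    if (status === "completed" || status === "failed") return status;
    if (Date.now() >= deadline) throw new Error("Timed out waiting for crawl");
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A fixed interval keeps the sketch simple; exponential backoff would cut request volume on long crawls.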

Read the SDK docs

npm i @firescraper/sdk

Built for AI teams

What teams build with FireScraper

Build RAG datasets from documentation, blogs, and public knowledge bases

Feed structured website text into LLM fine-tuning and evaluation workflows

Automate recurring crawls with scheduled scrapes and webhook callbacks

Trigger scrapes from CI pipelines, n8n, or custom agents via the REST API

Extract structured product data, pricing, and metadata with JSON schemas

Ask a crawled site questions and get cited answers, with RAG built in

Start with 1,000 free crawl units.

No credit card required. Upgrade when you need more.
