
FireScraper API Documentation

Use the TypeScript SDK or the REST API to start crawls, monitor progress, and download results. Everything you need to build web scraping into your pipeline.

npm: @firescraper/sdk

Quick start

Zero dependencies. Works in Node.js 18+, Bun, Deno, and Cloudflare Workers.

Terminal

npm install @firescraper/sdk

TypeScript

import { FireScraper } from '@firescraper/sdk';

const client = new FireScraper('fsk_your_api_key');

// Start a crawl
const session = await client.scrape({
  name: 'Docs crawl',
  urls: ['https://docs.example.com/'],
  maxDepth: 2,
  scraper: 'article',
});

// Wait for it to finish
const result = await client.waitForCompletion(session.id, {
  onProgress: (s) => console.log(`${s.counts.success} pages scraped`),
});

// Download clean Markdown for your RAG pipeline
const download = await client.getResults(session.id, 'markdown');

Authentication

Every request requires an API key. Create keys on the API Keys page of the dashboard. Keys start with fsk_ and are shown only once. Send the key as a Bearer token in the Authorization header:

Authorization: Bearer fsk_your_api_key
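Any HTTP client works as long as it sends this header. The sketch below uses the built-in fetch of Node.js 18+. The names firescraperGet, FetchLike, and BASE_URL are illustrative and not part of the SDK. The fetchImpl parameter exists only so a stub can be injected in tests.

```typescript
const BASE_URL = "https://firescraper.com";

// Minimal fetch signature, so a stub can be passed in tests.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: GET a FireScraper endpoint with the API key as a Bearer token.
async function firescraperGet(
  path: string,
  apiKey: string,
  fetchImpl: FetchLike = fetch,
): Promise<unknown> {
  const res = await fetchImpl(`${BASE_URL}${path}`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) {
    // A 401 here means the key is missing or invalid.
    throw new Error(`FireScraper request failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

For example, `await firescraperGet("/api/v1/sessions/SESSION_ID", "fsk_your_api_key")` returns the session status described under REST API.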

Rate limits

POST /api/v1/scrape — 30 req/min per key

GET /api/v1/sessions/:id — 120 req/min per key

GET /api/v1/sessions/:id/results — 60 req/min per key
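A single process that polls or downloads in a loop can stay under these per-key limits by spacing its calls. The following is a minimal client-side sketch. createThrottle is a hypothetical helper, not part of the SDK. It spaces calls evenly at 60000 / perMinute ms, which is stricter than the limit requires but simple to reason about. It does not coordinate multiple processes that share one key.

```typescript
// Hypothetical helper: returns a function that resolves when the next request may start.
function createThrottle(perMinute: number): () => Promise<void> {
  const intervalMs = 60_000 / perMinute; // e.g. 30 req/min gives one request every 2000 ms
  let nextSlot = 0; // earliest time (ms since epoch) the next request may start

  return async function throttle(): Promise<void> {
    const now = Date.now();
    const start = Math.max(now, nextSlot);
    // Reserve the slot before waiting, so concurrent callers queue up behind each other.
    nextSlot = start + intervalMs;
    const wait = start - now;
    if (wait > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, wait));
    }
  };
}
```

For example, create `const scrapeThrottle = createThrottle(30)` once and `await scrapeThrottle()` before each POST /api/v1/scrape call.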

SDK methods

client.scrape(options)

Start a new crawl. Returns immediately with the new session, including its id.

client.getSession(id)

Get session status, page counts, and queue depth.

client.waitForCompletion(id, opts?)

Poll until the crawl finishes. Supports an onProgress callback.

client.listResults(id)

List the available export files once the crawl completes.

client.getResults(id, format)

Download results in a specific format.

client.getPartialResults(id, fmt?)

Download the pages scraped so far while the crawl is still running.
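waitForCompletion handles polling for you. If you call the REST API directly, a comparable loop might look like the following minimal sketch. pollUntilDone and FetchLike are hypothetical names. The sketch treats a crawl as finished once session.status is no longer "in-progress". That is an assumption, since the only status values shown on this page are "in-progress" and "done".

```typescript
// Minimal fetch signature, so a stub can be injected in tests.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Subset of the GET /api/v1/sessions/:id response used by this sketch.
interface SessionResponse {
  session: { id: string; status: string; downloadFilesReady: boolean };
  counts: { success: number; warning: number; error: number; total: number };
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical REST counterpart to client.waitForCompletion(); not part of the SDK.
async function pollUntilDone(
  sessionId: string,
  apiKey: string,
  opts: {
    intervalMs?: number;
    onProgress?: (s: SessionResponse) => void;
    fetchImpl?: FetchLike;
  } = {},
): Promise<SessionResponse> {
  // A 5 s default interval stays far below the 120 req/min status-endpoint limit.
  const { intervalMs = 5_000, onProgress, fetchImpl = fetch } = opts;
  const url = `https://firescraper.com/api/v1/sessions/${encodeURIComponent(sessionId)}`;

  for (;;) {
    const res = await fetchImpl(url, { headers: { Authorization: `Bearer ${apiKey}` } });
    if (!res.ok) throw new Error(`Status check failed with HTTP ${res.status}`);
    const body = (await res.json()) as SessionResponse;
    onProgress?.(body);
    // Assumption: any status other than "in-progress" means the crawl has stopped.
    if (body.session.status !== "in-progress") return body;
    await sleep(intervalMs);
  }
}
```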

Feed a RAG pipeline

TypeScript

const session = await client.scrape({
  name: 'Knowledge base',
  urls: ['https://docs.example.com/'],
  maxDepth: 4,
  scraper: 'article',
  respectRobotsTxt: true,
});

await client.waitForCompletion(session.id);
const docs = await client.getResults(session.id, 'documents');
const text = new TextDecoder().decode(docs.data);

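// One JSON document per line; skip blank lines
for (const line of text.split('\n').filter(Boolean)) {
  const doc = JSON.parse(line);
  // vectorStore is your own vector database client (not part of the SDK)
  await vectorStore.upsert(doc.document_id, doc.text);
}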

Structured extraction

TypeScript

const session = await client.scrape({
  name: 'Product catalog',
  urls: ['https://shop.example.com/products'],
  maxDepth: 2,
  extractionSchema: {
    type: 'object',
    properties: {
      product_name: { type: 'string' },
      price: { type: 'number' },
      in_stock: { type: 'boolean' },
    },
  },
});

await client.waitForCompletion(session.id);
const extracted = await client.getResults(session.id, 'extracted');
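This page doesn't spell out the layout of the extracted artifact. Assuming it decodes to an array of records shaped like the schema above, a runtime type guard keeps malformed records out of typed code. Product, isProduct, and partitionProducts are hypothetical names for this sketch. The schema lists no required fields, so records missing a field are treated as rejects here.

```typescript
// TypeScript mirror of the extractionSchema above.
interface Product {
  product_name: string;
  price: number;
  in_stock: boolean;
}

// Runtime check: extraction can miss or mistype fields, so verify before trusting the type.
function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.product_name === "string" &&
    typeof v.price === "number" &&
    Number.isFinite(v.price) &&
    typeof v.in_stock === "boolean"
  );
}

// Split records into valid products and rejects (for logging or a follow-up crawl).
function partitionProducts(records: unknown[]): { products: Product[]; rejected: unknown[] } {
  const products: Product[] = [];
  const rejected: unknown[] = [];
  for (const record of records) {
    if (isProduct(record)) products.push(record);
    else rejected.push(record);
  }
  return { products, rejected };
}
```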

REST API

Use the REST API directly with curl, Python, Go, or any HTTP client.

POST /api/v1/scrape

Field	Type	Required	Description
name	string	Yes	Project name.
urls	string[]	Yes	One or more seed URLs.
ignoreUrls	string[]	No	URLs to exclude from the crawl.
maxDepth	number	No	Link-hop depth (0 = seed only).
minTextLength	number	No	Minimum word count per page.
scraper	"article" | "full"	No	Extraction mode. Default: article.
uniqueTextDownloads	boolean	No	Deduplicate text content.
respectRobotsTxt	boolean	No	Honour robots.txt rules.
contentSelector	string	No	CSS selector to restrict extraction. Max 500 chars.
webhookUrl	string	No	URL that receives a POST when the crawl finishes.
extractionSchema	string (JSON)	No	JSON Schema for structured extraction.
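Typing the request body catches misspelled field names at compile time. The interface below is transcribed from the table. Like the table, it types extractionSchema as a JSON string, whereas the earlier SDK example passes a plain object. validateScrapeRequest is a hypothetical pre-check, and the server still has the final say by returning 400 on invalid bodies. Treating maxDepth as a non-negative integer is an assumption.

```typescript
// Request body for POST /api/v1/scrape, transcribed from the table above.
interface ScrapeRequest {
  name: string;
  urls: string[];
  ignoreUrls?: string[];
  maxDepth?: number;
  minTextLength?: number;
  scraper?: "article" | "full";
  uniqueTextDownloads?: boolean;
  respectRobotsTxt?: boolean;
  contentSelector?: string;
  webhookUrl?: string;
  extractionSchema?: string; // JSON Schema serialized as a string
}

function isAbsoluteUrl(value: string): boolean {
  try {
    new URL(value); // throws on relative or malformed URLs
    return true;
  } catch {
    return false;
  }
}

// Hypothetical client-side pre-check; returns a list of problems (empty means "looks valid").
function validateScrapeRequest(req: ScrapeRequest): string[] {
  const problems: string[] = [];
  if (req.name.trim() === "") problems.push("name must not be empty");
  if (req.urls.length === 0) problems.push("urls needs at least one seed URL");
  for (const u of [...req.urls, ...(req.ignoreUrls ?? [])]) {
    if (!isAbsoluteUrl(u)) problems.push(`not an absolute URL: ${u}`);
  }
  // Assumption: depth is a whole number of link hops.
  if (req.maxDepth !== undefined && (!Number.isInteger(req.maxDepth) || req.maxDepth < 0)) {
    problems.push("maxDepth must be a non-negative integer (0 = seed only)");
  }
  if (req.contentSelector !== undefined && req.contentSelector.length > 500) {
    problems.push("contentSelector is limited to 500 characters");
  }
  if (req.extractionSchema !== undefined) {
    try {
      JSON.parse(req.extractionSchema);
    } catch {
      problems.push("extractionSchema must be valid JSON");
    }
  }
  return problems;
}
```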

curl

curl -X POST https://firescraper.com/api/v1/scrape \
  -H "Authorization: Bearer fsk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Vendor docs crawl",
    "urls": ["https://docs.vendor.com/"],
    "maxDepth": 2,
    "scraper": "article",
    "respectRobotsTxt": true,
    "contentSelector": "main article",
    "webhookUrl": "https://example.com/webhook"
  }'

Response (201)

{
  "id": "SESSION_ID",
  "status": "in-progress",
  "message": "Scrape session created successfully.",
  "webhookSecret": "whsec_abc123..."
}
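The same request from TypeScript, as a minimal sketch. startScrape and FetchLike are hypothetical names, not part of the SDK. Because webhookSecret is shown only once, the caller should persist it as soon as the call returns. Marking it optional in the type is a precaution, since this page doesn't say whether it appears when no webhookUrl is sent.

```typescript
// Minimal fetch signature, so a stub can be injected in tests.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Shape of the 201 response above.
interface CreateSessionResponse {
  id: string;
  status: string;
  message: string;
  webhookSecret?: string; // shown only once: store it before doing anything else
}

// Hypothetical REST helper mirroring the curl request above.
async function startScrape(
  apiKey: string,
  body: { name: string; urls: string[]; [field: string]: unknown },
  fetchImpl: FetchLike = fetch,
): Promise<CreateSessionResponse> {
  const res = await fetchImpl("https://firescraper.com/api/v1/scrape", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Scrape request failed with HTTP ${res.status}`);
  return (await res.json()) as CreateSessionResponse;
}
```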

GET /api/v1/sessions/:id

curl

curl https://firescraper.com/api/v1/sessions/SESSION_ID \
  -H "Authorization: Bearer fsk_your_api_key"

Response (200)

{
  "session": {
    "id": "SESSION_ID",
    "name": "Vendor docs crawl",
    "status": "in-progress",
    "downloadFilesReady": false
  },
  "counts": {
    "success": 124, "warning": 3,
    "error": 1, "total": 128
  },
  "processing": {
    "serverInstancesCount": 3,
    "queueLength": 41
  }
}
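In this example, counts.total equals success + warning + error (124 + 3 + 1 = 128), and queueLength appears to be the number of URLs still waiting. Under that reading, a rough progress figure is processed / (processed + queued), about 76% for the response above. estimateProgress is a hypothetical helper. Its result can move backwards as the crawler discovers new links.

```typescript
// GET /api/v1/sessions/:id response, as shown above.
interface SessionStatus {
  session: { id: string; name: string; status: string; downloadFilesReady: boolean };
  counts: { success: number; warning: number; error: number; total: number };
  processing: { serverInstancesCount: number; queueLength: number };
}

// Rough completion ratio in [0, 1]: processed pages vs. processed + still queued.
// The queue can grow as new links are found, so treat this as an estimate only.
function estimateProgress(s: SessionStatus): number {
  if (s.session.status === "done") return 1;
  const processed = s.counts.total;
  const queued = s.processing.queueLength;
  return processed + queued === 0 ? 0 : processed / (processed + queued);
}
```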

GET /api/v1/sessions/:id/results

curl

# List available files
curl https://firescraper.com/api/v1/sessions/SESSION_ID/results \
  -H "Authorization: Bearer fsk_your_api_key"

# Download a specific format
curl -L "https://firescraper.com/api/v1/sessions/SESSION_ID/results?format=markdown" \
  -H "Authorization: Bearer fsk_your_api_key" \
  -o corpus.md

# Partial export (mid-crawl)
curl -L "https://firescraper.com/api/v1/sessions/SESSION_ID/results?partial=true&format=csv" \
  -H "Authorization: Bearer fsk_your_api_key" \
  -o partial.csv
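The same requests from TypeScript, as a minimal sketch. resultsUrl and downloadResults are hypothetical helpers. fetch follows redirects by default, which matches curl's -L flag. The result is returned as raw bytes, so any format (including zip) can be written to disk unchanged.

```typescript
// Build the results URL used by the curl calls above (no query = list available files).
function resultsUrl(sessionId: string, opts: { format?: string; partial?: boolean } = {}): string {
  const url = new URL(
    `https://firescraper.com/api/v1/sessions/${encodeURIComponent(sessionId)}/results`,
  );
  if (opts.partial) url.searchParams.set("partial", "true");
  if (opts.format) url.searchParams.set("format", opts.format);
  return url.toString();
}

// Download one format as bytes; redirects are followed automatically.
async function downloadResults(
  sessionId: string,
  apiKey: string,
  opts: { format: string; partial?: boolean },
): Promise<Uint8Array> {
  const res = await fetch(resultsUrl(sessionId, opts), {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`Download failed with HTTP ${res.status}`);
  return new Uint8Array(await res.arrayBuffer());
}
```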

PATCH /api/v1/sessions/:id

Rotate the webhook signing secret for a session.

curl

curl -X PATCH https://firescraper.com/api/v1/sessions/SESSION_ID \
  -H "Authorization: Bearer fsk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{ "action": "rotate_webhook_secret" }'
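A TypeScript equivalent of this call, as a sketch. This page doesn't document the PATCH response body, so the hypothetical rotateWebhookSecret returns the parsed JSON as unknown instead of guessing a field name.

```typescript
// Minimal fetch signature, so a stub can be injected in tests.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper mirroring the PATCH request above; not part of the SDK.
async function rotateWebhookSecret(
  sessionId: string,
  apiKey: string,
  fetchImpl: FetchLike = fetch,
): Promise<unknown> {
  const res = await fetchImpl(
    `https://firescraper.com/api/v1/sessions/${encodeURIComponent(sessionId)}`,
    {
      method: "PATCH",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify({ action: "rotate_webhook_secret" }),
    },
  );
  if (!res.ok) throw new Error(`Secret rotation failed with HTTP ${res.status}`);
  // Response shape is not documented here; inspect it for the new secret.
  return res.json();
}
```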

Result formats

csv

Tabular export — URL, title, text, word count, status.

json

Full JSON array of all scraped pages.

jsonl

Newline-delimited JSON. One document per line.

markdown

Clean Markdown. Uses fewer LLM tokens than HTML.

zip

All formats bundled in a single ZIP archive.

documents

JSONL documents artifact with metadata.

chunks

JSONL chunks artifact for vector stores.

extracted

Structured extraction output (requires extractionSchema).

manifest

Crawl manifest with summary and file index.
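When the format comes from user input, a union type rejects typos before they reach the API as a 400. The following is a minimal sketch with hypothetical names. Only the nine format strings come from the list above. The JSONL grouping follows the descriptions of jsonl, documents, and chunks.

```typescript
// The nine format names listed above, as a union type.
const RESULT_FORMATS = [
  "csv", "json", "jsonl", "markdown", "zip",
  "documents", "chunks", "extracted", "manifest",
] as const;
type ResultFormat = (typeof RESULT_FORMATS)[number];

// Narrow an untrusted string (CLI flag, config value) before requesting ?format=...
function isResultFormat(value: string): value is ResultFormat {
  return (RESULT_FORMATS as readonly string[]).includes(value);
}

// Per the descriptions above, these formats are newline-delimited JSON.
const JSONL_FORMATS: ReadonlySet<ResultFormat> = new Set<ResultFormat>(["jsonl", "documents", "chunks"]);

// Decode a JSONL payload into one parsed value per non-blank line.
function parseJsonl(bytes: Uint8Array): unknown[] {
  return new TextDecoder()
    .decode(bytes)
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as unknown);
}
```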

Webhooks

Include a webhookUrl when starting a crawl, and FireScraper sends a POST to that URL when the exports are ready. Failed deliveries are retried up to 3 times with exponential backoff.

Webhook payload

{
  "event": "session.completed",
  "occurredAt": "2026-05-18T14:30:00.000Z",
  "sessionId": "SESSION_ID",
  "session": {
    "id": "SESSION_ID",
    "name": "Vendor docs crawl",
    "status": "done"
  },
  "files": [
    { "format": "csv", "fileName": "corpus-csv.csv" },
    { "format": "zip", "fileName": "corpus-zip.zip" }
  ]
}
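Here is a typed view of this payload and a hypothetical handler. Because failed deliveries are retried, the same event may arrive more than once, so the sketch ignores repeats per session ID. The in-memory Set stands in for durable storage. Verify the x-firescraper-signature header before passing the body to it.

```typescript
// Payload shape, transcribed from the example above.
interface WebhookPayload {
  event: string; // "session.completed" in the example
  occurredAt: string; // ISO 8601 timestamp
  sessionId: string;
  session: { id: string; name: string; status: string };
  files: { format: string; fileName: string }[];
}

// Stand-in for durable storage (a database table, Redis set, etc.).
const handledSessions = new Set<string>();

// Hypothetical handler: returns what to download, or null for ignored/duplicate deliveries.
function handleWebhook(rawBody: string): { sessionId: string; formats: string[] } | null {
  const payload = JSON.parse(rawBody) as WebhookPayload;
  if (payload.event !== "session.completed") return null; // unknown event types are ignored
  if (handledSessions.has(payload.sessionId)) return null; // duplicate from a retried delivery
  handledSessions.add(payload.sessionId);
  return { sessionId: payload.sessionId, formats: payload.files.map((f) => f.format) };
}
```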

Verifying signatures

The POST /api/v1/scrape response includes a webhookSecret (shown once). Each delivery includes an x-firescraper-signature header:

t=<unix_timestamp>,v1=<hmac_sha256_hex>

Compute HMAC-SHA256(secret, "<timestamp>.<raw_body>"), hex-encode it, and compare the result with the v1 value using a constant-time comparison.
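The following is a minimal Node.js sketch of these steps using node:crypto. The function name and the 300-second timestamp tolerance are assumptions, since this page doesn't state a tolerance. rawBody must be the exact request body as received, because re-serializing parsed JSON can change the bytes and break the HMAC.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical verifier for the header "t=<unix_timestamp>,v1=<hmac_sha256_hex>".
function verifyFirescraperSignature(
  header: string,
  rawBody: string,
  secret: string,
  toleranceSeconds = 300, // assumption: reject deliveries older than 5 minutes (replay protection)
  nowSeconds = Math.floor(Date.now() / 1000),
): boolean {
  // Parse comma-separated key=value pairs.
  const parts = new Map<string, string>();
  for (const piece of header.split(",")) {
    const idx = piece.indexOf("=");
    if (idx > 0) parts.set(piece.slice(0, idx).trim(), piece.slice(idx + 1).trim());
  }
  const timestamp = parts.get("t");
  const received = parts.get("v1");
  if (!timestamp || !received || !/^\d+$/.test(timestamp)) return false;
  if (Math.abs(nowSeconds - Number(timestamp)) > toleranceSeconds) return false;

  // HMAC-SHA256(secret, "<timestamp>.<raw_body>"), hex-encoded.
  const expected = createHmac("sha256", secret).update(`${timestamp}.${rawBody}`).digest("hex");
  const a = Buffer.from(expected, "hex");
  const b = Buffer.from(received, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```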

Lost your secret? Send a PATCH request with the rotate_webhook_secret action to generate a new one.

Error codes

400	Invalid request body or unsupported format.
401	Missing or invalid API key.
404	Session or artifact not found.
429	Rate limit exceeded. Check the Retry-After header.
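A 429 can be handled generically by waiting as long as Retry-After asks and then retrying. The following is a minimal sketch. fetchWithRetry and retryAfterMs are hypothetical helpers. Falling back to exponential backoff (1 s, 2 s, 4 s, ...) when the header is absent is an assumption. Other error codes are returned to the caller unchanged, because retrying a 400, 401, or 404 will not help.

```typescript
// Minimal response/fetch shapes, so a stub can be injected in tests.
interface FetchResponse {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<FetchResponse>;

// Retry-After is either delta-seconds ("30") or an HTTP date; fall back when absent or unparseable.
function retryAfterMs(header: string | null, fallbackMs: number): number {
  if (header === null || header.trim() === "") return fallbackMs;
  const seconds = Number(header);
  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  const at = Date.parse(header);
  return Number.isNaN(at) ? fallbackMs : Math.max(0, at - Date.now());
}

// Hypothetical wrapper: retries only on 429, up to maxRetries times.
async function fetchWithRetry(
  url: string,
  init: Parameters<FetchLike>[1],
  fetchImpl: FetchLike = fetch,
  maxRetries = 3,
): Promise<FetchResponse> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url, init);
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const waitMs = retryAfterMs(res.headers.get("Retry-After"), 1000 * 2 ** attempt);
    await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
  }
}
```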

Need help?

Open a support ticket. Include your session ID for faster resolution.
