Use the TypeScript SDK or the REST API to start crawls, monitor progress, and download results. Everything you need to build web scraping into your pipeline.
Zero dependencies. Works in Node.js 18+, Bun, Deno, and Cloudflare Workers.
npm install @firescraper/sdk

import { FireScraper } from '@firescraper/sdk';

const client = new FireScraper('fsk_your_api_key');

// Start a crawl
const session = await client.scrape({
  name: 'Docs crawl',
  urls: ['https://docs.example.com/'],
  maxDepth: 2,
  scraper: 'article',
});

// Wait for it to finish
const result = await client.waitForCompletion(session.id, {
  onProgress: (s) => console.log(`${s.counts.success} pages scraped`),
});

// Download clean Markdown for your RAG pipeline
const download = await client.getResults(session.id, 'markdown');

Every request requires an API key. Create keys from API Keys in the dashboard. Keys start with fsk_ and are shown only once.
Authorization: Bearer fsk_your_api_key

POST /api/v1/scrape - 30 req/min per key
GET /api/v1/sessions/:id - 120 req/min per key
GET /api/v1/sessions/:id/results - 60 req/min per key
client.scrape(options): Start a new crawl. Returns the session ID immediately.
client.getSession(id): Get session status, page counts, and queue depth.
client.waitForCompletion(id, opts?): Poll until the crawl finishes. Supports onProgress callbacks.
client.listResults(id): List available export files after the crawl completes.
client.getResults(id, format): Download results in a specific format.
client.getPartialResults(id, fmt?): Download the pages scraped so far while the crawl is still running.
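
For callers that use the REST API rather than the SDK, the polling behind waitForCompletion can be approximated by hand. The following is a minimal sketch, not the SDK's actual implementation: `pollSession`, `FetchLike`, and `PollOptions` are hypothetical names, the response shape is copied from the sample GET /api/v1/sessions/:id response in this guide, and treating every status other than "in-progress" as terminal is an assumption.

```typescript
// Shape of GET /api/v1/sessions/:id, copied from the sample response in this guide.
interface SessionStatus {
  session: { id: string; name: string; status: string; downloadFilesReady: boolean };
  counts: { success: number; warning: number; error: number; total: number };
  processing: { serverInstancesCount: number; queueLength: number };
}

// The small part of fetch() this sketch relies on, so any compatible client can be injected.
interface FetchResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<FetchResponse>;

interface PollOptions {
  baseUrl?: string;
  intervalMs?: number; // default 2000 ms = 30 req/min, below the 120 req/min limit
  maxAttempts?: number;
  onProgress?: (status: SessionStatus) => void;
}

// Hypothetical helper: polls the session endpoint until the crawl leaves "in-progress".
async function pollSession(
  fetchImpl: FetchLike,
  apiKey: string,
  sessionId: string,
  options: PollOptions = {},
): Promise<SessionStatus> {
  const {
    baseUrl = 'https://firescraper.com',
    intervalMs = 2000,
    maxAttempts = 300,
    onProgress,
  } = options;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchImpl(
      `${baseUrl}/api/v1/sessions/${encodeURIComponent(sessionId)}`,
      { headers: { Authorization: `Bearer ${apiKey}` } },
    );
    if (!res.ok) {
      throw new Error(`Session lookup failed with HTTP ${res.status}`);
    }

    const status = (await res.json()) as SessionStatus;
    if (onProgress) onProgress(status);

    // Assumption: any status other than "in-progress" means the crawl has stopped.
    if (status.session.status !== 'in-progress') return status;

    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Session ${sessionId} was still in progress after ${maxAttempts} polls`);
}
```

In Node.js 18+ the global `fetch` can be passed as `fetchImpl`. Injecting it keeps the helper easy to test and leaves the choice of HTTP client to the caller.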
const session = await client.scrape({
  name: 'Knowledge base',
  urls: ['https://docs.example.com/'],
  maxDepth: 4,
  scraper: 'article',
  respectRobotsTxt: true,
});
await client.waitForCompletion(session.id);

// The documents export is JSONL: one JSON document per line
const docs = await client.getResults(session.id, 'documents');
const text = new TextDecoder().decode(docs.data);
for (const line of text.split('\n').filter(Boolean)) {
  const doc = JSON.parse(line);
  // vectorStore stands for your own vector database client
  await vectorStore.upsert(doc.document_id, doc.text);
}

const session = await client.scrape({
  name: 'Product catalog',
  urls: ['https://shop.example.com/products'],
  maxDepth: 2,
  extractionSchema: {
    type: 'object',
    properties: {
      product_name: { type: 'string' },
      price: { type: 'number' },
      in_stock: { type: 'boolean' },
    },
  },
});
await client.waitForCompletion(session.id);
const extracted = await client.getResults(session.id, 'extracted');

Use the REST API directly with curl, Python, Go, or any HTTP client.
POST /api/v1/scrape

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Project name. |
| urls | string[] | Yes | One or more seed URLs. |
| ignoreUrls | string[] | No | URLs to exclude. |
| maxDepth | number | No | Link-hop depth (0 = seed only). |
| minTextLength | number | No | Minimum word count per page. |
| scraper | "article" \| "full" | No | Extraction mode. Default: article. |
| uniqueTextDownloads | boolean | No | Deduplicate text content. |
| respectRobotsTxt | boolean | No | Honour robots.txt rules. |
| contentSelector | string | No | CSS selector to restrict extraction. Max 500 chars. |
| webhookUrl | string | No | POST callback when crawl finishes. |
| extractionSchema | string (JSON) | No | JSON Schema for structured extraction. |
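
The field table can be mirrored as a TypeScript type with a client-side check that runs before the request is sent. This is a sketch under stated assumptions: `ScrapeRequest` and `validateScrapeRequest` are hypothetical names, the rules encode only what the table states (required fields, depth 0 as the minimum, the 500-character selector limit, a JSON-encoded schema), and requiring maxDepth to be a whole number is an added assumption. The server remains the authority on what is valid.

```typescript
// Request body for POST /api/v1/scrape, mirroring the field table.
interface ScrapeRequest {
  name: string;
  urls: string[];
  ignoreUrls?: string[];
  maxDepth?: number;
  minTextLength?: number;
  scraper?: 'article' | 'full';
  uniqueTextDownloads?: boolean;
  respectRobotsTxt?: boolean;
  contentSelector?: string;
  webhookUrl?: string;
  extractionSchema?: string; // JSON Schema serialized as a string, per the table
}

// Hypothetical helper: returns a list of problems, empty when the body looks valid.
function validateScrapeRequest(req: ScrapeRequest): string[] {
  const problems: string[] = [];

  if (req.name.trim() === '') {
    problems.push('name is required');
  }
  if (req.urls.length === 0) {
    problems.push('urls must contain at least one seed URL');
  }
  if (req.maxDepth !== undefined) {
    // 0 means "seed only", so negative depths cannot be meaningful.
    // Assumption: depth is a whole number of link hops.
    if (req.maxDepth < 0 || Math.floor(req.maxDepth) !== req.maxDepth) {
      problems.push('maxDepth must be a non-negative integer');
    }
  }
  if (req.contentSelector !== undefined && req.contentSelector.length > 500) {
    problems.push('contentSelector must be at most 500 characters');
  }
  if (req.extractionSchema !== undefined) {
    try {
      JSON.parse(req.extractionSchema);
    } catch {
      problems.push('extractionSchema must be valid JSON');
    }
  }
  return problems;
}
```

Returning every problem at once, instead of throwing on the first, lets a caller report all issues before it spends one of the 30 requests per minute allowed on this endpoint.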
curl -X POST https://firescraper.com/api/v1/scrape \
  -H "Authorization: Bearer fsk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Vendor docs crawl",
    "urls": ["https://docs.vendor.com/"],
    "maxDepth": 2,
    "scraper": "article",
    "respectRobotsTxt": true,
    "contentSelector": "main article",
    "webhookUrl": "https://example.com/webhook"
  }'

{
  "id": "SESSION_ID",
  "status": "in-progress",
  "message": "Scrape session created successfully.",
  "webhookSecret": "whsec_abc123..."
}

GET /api/v1/sessions/:id

curl https://firescraper.com/api/v1/sessions/SESSION_ID \
  -H "Authorization: Bearer fsk_your_api_key"

{
  "session": {
    "id": "SESSION_ID",
    "name": "Vendor docs crawl",
    "status": "in-progress",
    "downloadFilesReady": false
  },
  "counts": {
    "success": 124, "warning": 3,
    "error": 1, "total": 128
  },
  "processing": {
    "serverInstancesCount": 3,
    "queueLength": 41
  }
}

GET /api/v1/sessions/:id/results

# List available files
curl https://firescraper.com/api/v1/sessions/SESSION_ID/results \
  -H "Authorization: Bearer fsk_your_api_key"

# Download a specific format
curl -L "https://firescraper.com/api/v1/sessions/SESSION_ID/results?format=markdown" \
  -H "Authorization: Bearer fsk_your_api_key" \
  -o corpus.md

# Partial export (mid-crawl)
curl -L "https://firescraper.com/api/v1/sessions/SESSION_ID/results?partial=true&format=csv" \
  -H "Authorization: Bearer fsk_your_api_key" \
  -o partial.csv

PATCH /api/v1/sessions/:id

Rotate the webhook signing secret for a session.

curl -X PATCH https://firescraper.com/api/v1/sessions/SESSION_ID \
  -H "Authorization: Bearer fsk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{ "action": "rotate_webhook_secret" }'

Tabular export: URL, title, text, word count, status.
Full JSON array of all scraped pages.
Newline-delimited JSON. One document per line.
Clean Markdown. Uses fewer LLM tokens than HTML.
All formats bundled in a single ZIP archive.
JSONL documents artifact with metadata.
JSONL chunks artifact for vector stores.
Structured extraction output (requires schema).
Crawl manifest with summary and file index.
Include a webhookUrl when starting a crawl. FireScraper sends a POST when exports are ready. Deliveries retry up to 3 times with exponential backoff.
{
  "event": "session.completed",
  "occurredAt": "2026-05-18T14:30:00.000Z",
  "sessionId": "SESSION_ID",
  "session": {
    "id": "SESSION_ID",
    "name": "Vendor docs crawl",
    "status": "done"
  },
  "files": [
    { "format": "csv", "fileName": "corpus-csv.csv" },
    { "format": "zip", "fileName": "corpus-zip.zip" }
  ]
}

The POST /api/v1/scrape response includes a webhookSecret (shown once). Each delivery includes an x-firescraper-signature header:
t=<unix_timestamp>,v1=<hmac_sha256_hex>

Compute HMAC-SHA256(secret, "<timestamp>.<raw_body>") and compare with the v1 value using constant-time comparison.
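
The signature check can be sketched in Node.js with the built-in crypto module. `verifyWebhookSignature` is a hypothetical helper, not part of the SDK. It assumes the full webhookSecret string (including any prefix) is used as the UTF-8 HMAC key and that the header contains exactly the t and v1 fields shown; neither detail is confirmed by this guide. Pass the raw request body exactly as received, before any JSON parsing.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical helper: checks an x-firescraper-signature header against the raw body.
function verifyWebhookSignature(
  secret: string,
  signatureHeader: string,
  rawBody: string,
): boolean {
  // Expected header format: t=<unix_timestamp>,v1=<hmac_sha256_hex>
  const match = /^t=(\d+),v1=([0-9a-f]{64})$/i.exec(signatureHeader.trim());
  if (match === null) return false;
  const timestamp = match[1];
  const receivedHex = match[2];

  // HMAC-SHA256(secret, "<timestamp>.<raw_body>")
  // Assumption: the secret string is used as-is as the key.
  const expected = createHmac('sha256', secret)
    .update(`${timestamp}.${rawBody}`)
    .digest();
  const received = Buffer.from(receivedHex, 'hex');

  // timingSafeEqual throws on length mismatch, so check lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

A handler would typically respond with a 4xx status when this returns false and process the event only when it returns true. Comparing the decoded bytes with `timingSafeEqual` avoids leaking how many leading characters of the signature matched.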
Lost your secret? Use the rotate_webhook_secret action via PATCH to generate a new one.
| Status | Meaning |
|---|---|
| 400 | Invalid request body or unsupported format. |
| 401 | Missing or invalid API key. |
| 404 | Session or artifact not found. |
| 429 | Rate limit exceeded. Check the Retry-After header. |
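
A client can honour the Retry-After header on 429 responses with a small wrapper. The sketch below uses hypothetical names (`retryAfterMs`, `withRateLimitRetry`, `MinimalResponse`). The HTTP specification allows Retry-After to be either a number of seconds or an HTTP date, so both are handled; which form FireScraper sends is not stated here, and the 1-second fallback and 3-retry limit are arbitrary choices.

```typescript
// Converts a Retry-After header value into a wait time in milliseconds.
function retryAfterMs(
  headerValue: string | null,
  nowMs: number = Date.now(),
  fallbackMs: number = 1000,
): number {
  if (headerValue === null || headerValue.trim() === '') return fallbackMs;
  const value = headerValue.trim();

  // Form 1: delay in whole seconds, e.g. "30"
  if (/^\d+$/.test(value)) return Number(value) * 1000;

  // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const dateMs = Date.parse(value);
  if (isNaN(dateMs)) return fallbackMs;
  return Math.max(0, dateMs - nowMs);
}

// The part of a fetch Response this sketch needs.
interface MinimalResponse {
  status: number;
  headers: { get(name: string): string | null };
}

// Re-sends a request while the server answers 429, waiting as instructed between tries.
async function withRateLimitRetry<T extends MinimalResponse>(
  send: () => Promise<T>,
  maxRetries: number = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    // Return anything that is not a rate-limit response, or the last 429 once retries run out.
    if (res.status !== 429 || attempt >= maxRetries) return res;
    await sleep(retryAfterMs(res.headers.get('Retry-After')));
  }
}
```

The `send` callback would wrap a single `fetch` call to one of the endpoints. Other error statuses such as 400, 401, and 404 are returned to the caller unchanged, because retrying them would not help.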
Open a support ticket. Include your session ID for faster resolution.