FireScraper Python SDK: Sync, Async, and LangChain Integration
The FireScraper Python SDK is live on PyPI. Install it, pass your API key, and start scraping websites from any Python environment — scripts, notebooks, async pipelines, or LangChain RAG workflows.
pip install firescraper
Why a Python SDK?
The REST API works with any language, and the TypeScript SDK has been available since launch. But most AI and ML teams work in Python. If you are building a RAG pipeline with LangChain, training a model with PyTorch, or running data processing in Jupyter notebooks, you should not have to write HTTP requests by hand.
The Python SDK gives you typed methods, automatic polling, progress callbacks, and proper error handling — everything you need to integrate FireScraper into a Python workflow.
Quick Start
from firescraper import FireScraper

client = FireScraper("fsk_your_api_key")

# Start a crawl
session = client.scrape(
    name="Documentation crawl",
    urls=["https://docs.example.com/"],
    max_depth=2,
    scraper="article",
)
print(f"Session started: {session.id}")

# Wait for completion with progress updates
result = client.wait_for_completion(
    session.id,
    on_progress=lambda s: print(
        f" {s.counts.success}/{s.counts.total} pages scraped"
    ),
)
print(f"Done! {result.counts.success} pages scraped")

# Download results as JSON
download = client.get_results(session.id, format="json")
with open("results.json", "wb") as f:
    f.write(download.data)
That is the entire workflow: start a crawl, wait for it to finish, download the results. The SDK handles authentication, polling, and error mapping.
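To make the "polling" part concrete, here is a minimal sketch of a poll-until-done loop. It is not the SDK's implementation: `poll_until_done`, its parameters, and the simulated status dictionaries are all hypothetical, chosen only to show the general shape of what a call like wait_for_completion with an on_progress callback has to do.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    fetch_status: Callable[[], dict],
    is_done: Callable[[dict], bool],
    on_progress: Optional[Callable[[dict], None]] = None,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status repeatedly until is_done(status) is true or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if on_progress is not None:
            on_progress(status)  # report every poll, including the final one
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still running after {timeout} seconds")
        sleep(interval)


# Simulated crawl (hypothetical data): each poll reports one more page out of three.
_fake_statuses = iter(
    {"success": n, "total": 3, "state": "done" if n == 3 else "running"}
    for n in range(1, 4)
)
seen = []
final = poll_until_done(
    fetch_status=lambda: next(_fake_statuses),
    is_done=lambda s: s["state"] == "done",
    on_progress=lambda s: seen.append(s["success"]),
    interval=0.0,
)
print(final["success"], "pages scraped")
```

Passing the sleep function in as a parameter keeps the loop easy to test without real delays.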
Async Support
For async pipelines, use AsyncFireScraper. It has the same API but every method is a coroutine:
import asyncio
from firescraper import AsyncFireScraper

async def main():
    async with AsyncFireScraper("fsk_your_api_key") as client:
        session = await client.scrape(
            name="Async crawl",
            urls=["https://docs.example.com/"],
            max_depth=2,
        )
        result = await client.wait_for_completion(session.id)
        print(f"Scraped {result.counts.success} pages")
        download = await client.get_results(session.id, format="json")
        print(f"Downloaded {len(download.data)} bytes")

asyncio.run(main())
Use the async client when you are running inside an async framework like FastAPI, or when you want to run multiple crawls concurrently.
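Running multiple crawls concurrently could look roughly like the sketch below, which uses asyncio.gather. To keep it runnable offline, `StubAsyncClient` is a hypothetical stand-in that only mimics the scrape and wait_for_completion calls shown above; the second URL and the helper names `crawl_one` and `crawl_many` are also assumptions. In real code you would pass an AsyncFireScraper instance instead of the stub.

```python
import asyncio
from types import SimpleNamespace


class StubAsyncClient:
    """Hypothetical stand-in for AsyncFireScraper so the sketch runs offline."""

    async def scrape(self, name, urls, max_depth=1):
        await asyncio.sleep(0)  # pretend to call the API
        return SimpleNamespace(id=f"session-{name}", urls=urls)

    async def wait_for_completion(self, session_id):
        await asyncio.sleep(0)  # pretend to poll until the crawl finishes
        return SimpleNamespace(id=session_id, counts=SimpleNamespace(success=1))


async def crawl_one(client, name, url):
    """Start one crawl and wait for it to finish."""
    session = await client.scrape(name=name, urls=[url], max_depth=2)
    return await client.wait_for_completion(session.id)


async def crawl_many(client, sites):
    """Run one crawl per site concurrently; results keep the input order."""
    tasks = [crawl_one(client, name, url) for name, url in sites.items()]
    return await asyncio.gather(*tasks)


sites = {
    "docs": "https://docs.example.com/",
    "blog": "https://blog.example.com/",  # hypothetical second site
}
results = asyncio.run(crawl_many(StubAsyncClient(), sites))
for result in results:
    print(result.id, result.counts.success)
```

By default asyncio.gather raises the first exception it encounters, so a production version would probably pass return_exceptions=True or wrap each crawl in its own try/except.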
LangChain Integration
If you are building a RAG pipeline with LangChain, the FireScraperLoader turns any website into LangChain Document objects in one call:
pip install firescraper langchain-firescraper langchain-core
from langchain_firescraper import FireScraperLoader

loader = FireScraperLoader(
    api_key="fsk_your_api_key",
    urls=["https://docs.example.com/"],
    max_depth=3,
    scraper="article",
)

# Load all pages as LangChain Documents
docs = loader.load()
print(f"Loaded {len(docs)} documents")
for doc in docs[:3]:
    print(f" {doc.metadata['url']} - {doc.metadata['word_count']} words")
Each Document has:
- page_content: the extracted text
- metadata: url, title, word_count, session_id, scraper, source
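One use for this metadata is filtering out near-empty pages before chunking. The sketch below assumes word_count is a number; the `Doc` dataclass is a minimal stand-in for a LangChain Document so the example runs without extra packages, and `keep_substantial`, the 50-word threshold, and the sample pages are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def keep_substantial(docs, min_words=50):
    """Drop documents whose word_count metadata is below min_words."""
    return [d for d in docs if d.metadata.get("word_count", 0) >= min_words]


# Hypothetical sample data: one real page and one near-empty error page.
docs = [
    Doc("Full guide text ...", {"url": "https://docs.example.com/guide", "word_count": 820}),
    Doc("Not found", {"url": "https://docs.example.com/404", "word_count": 2}),
]
kept = keep_substantial(docs)
print([d.metadata["url"] for d in kept])
```

The same function should work on real Document objects, since it only reads the metadata dictionary.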
Plugging into a RAG Pipeline
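Here is how you would use FireScraperLoader with a vector store and a retrieval chain. Besides the packages above, this example needs langchain, langchain-openai, langchain-community, and faiss-cpu installed, and OpenAIEmbeddings and ChatOpenAI read your OpenAI key from the OPENAI_API_KEY environment variable: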
from langchain_firescraper import FireScraperLoader
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

# 1. Scrape the docs
loader = FireScraperLoader(
    api_key="fsk_your_api_key",
    urls=["https://docs.example.com/"],
    max_depth=3,
)
docs = loader.load()

# 2. Chunk the text
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
chunks = splitter.split_documents(docs)

# 3. Embed and store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(chunks, embeddings)

# 4. Query
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4"),
    retriever=vectorstore.as_retriever(),
)
answer = qa.invoke("How do I authenticate API requests?")
print(answer["result"])
That is a complete RAG pipeline — from raw website to answering questions — in about 20 lines of Python.
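To see what chunk_size=1000 and chunk_overlap=200 mean in step 2, the sketch below splits text into fixed-size character windows where each window repeats the last 200 characters of the previous one. This is a deliberately simplified technique, not the algorithm RecursiveCharacterTextSplitter uses (which also tries to break on separators such as paragraphs and sentences); `window_chunks` and the sample text are hypothetical.

```python
def window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


# 2,600 characters of repeating digits, so overlapping regions are easy to compare.
text = "".join(str(i % 10) for i in range(2600))
chunks = window_chunks(text)
print([len(c) for c in chunks])
print(chunks[0][-200:] == chunks[1][:200])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.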
Lazy Loading for Large Crawls
For crawls with thousands of pages, use lazy_load() to process documents one at a time without loading everything into memory:
for doc in loader.lazy_load():
    # Process each document as it arrives
    chunks = splitter.split_documents([doc])
    vectorstore.add_documents(chunks)
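Adding documents one at a time keeps memory flat but may trigger one embedding request per page. A middle ground is to consume the lazy iterator in small batches. The sketch below shows a generic batching helper; `batched` and `fake_lazy_load` are hypothetical names, and the fake generator stands in for loader.lazy_load() so the example runs offline.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to batch_size items from any iterable, lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # iterator exhausted
        yield batch


def fake_lazy_load(n):
    """Hypothetical stand-in for loader.lazy_load(): yields n page texts."""
    for i in range(n):
        yield f"page {i}"


batch_sizes = [len(batch) for batch in batched(fake_lazy_load(7), batch_size=3)]
print(batch_sizes)
```

With the real loader, each batch could be passed to splitter.split_documents and then vectorstore.add_documents, so only batch_size documents are held in memory at once.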
All SDK Methods
| Method | Description |
|---|---|
| client.scrape(name, urls, ...) | Start a new crawl session |
| client.get_session(session_id) | Get status, page counts, queue depth |
| client.wait_for_completion(session_id) | Poll until the crawl finishes |
| client.list_results(session_id) | List available export files |
| client.get_results(session_id, format) | Download results (json, csv, markdown, zip, ...) |
| client.get_partial_results(session_id) | Download mid-crawl results |
Error Handling
The SDK maps HTTP errors to typed exceptions:
from firescraper.exceptions import (
    AuthenticationError,  # 401: bad or missing API key
    BadRequestError,      # 400: invalid parameters
    NotFoundError,        # 404: session not found
    RateLimitError,       # 429: too many requests
    ServerError,          # 5xx: server-side issue
    TimeoutError,         # request or poll timeout (this import shadows the builtin TimeoutError)
)

try:
    session = client.scrape(name="Test", urls=["https://example.com"])
except AuthenticationError:
    print("Check your API key")
except RateLimitError:
    print("Slow down and retry after a moment")
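"Retry after a moment" can be automated with exponential backoff. The sketch below is one possible approach, not something the SDK is documented to do for you: `with_retries` and `flaky_scrape` are hypothetical, and the local `RateLimitError` class is a stand-in for firescraper.exceptions.RateLimitError so the example runs without the package installed.

```python
import time


class RateLimitError(Exception):
    """Stand-in for firescraper.exceptions.RateLimitError so the sketch runs offline."""


def with_retries(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(); on RateLimitError wait base_delay * 2**attempt seconds, then retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


attempts = {"n": 0}
delays = []


def flaky_scrape():
    """Hypothetical call that is rate limited twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429")
    return "session-123"


# Record the delays instead of actually sleeping, to keep the demo instant.
session_id = with_retries(flaky_scrape, sleep=delays.append)
print(session_id, delays)
```

In real code the wrapped call would be something like lambda: client.scrape(name="Test", urls=["https://example.com"]), and adding random jitter to the delay helps when many workers retry at once.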
Install and Get Started
pip install firescraper
Create an API key from the dashboard, and you are ready to go. The SDK requires Python 3.9+ and has a single dependency (httpx).