Spider

Website | Guides | API Docs | Examples | Discord

A high-performance web crawler and scraper for Rust. 200-1000x faster than popular alternatives, with HTTP, headless Chrome, and WebDriver rendering in a single library.

Quick Start

Command Line

cargo install spider_cli
spider --url https://example.com

Rust

[dependencies]
spider = "2"

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    website.crawl().await;
    println!("Pages found: {}", website.get_links().len());
}

Streaming

Process each page the moment it's crawled, not after:

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    let mut rx = website.subscribe(0).unwrap();

    tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            println!("- {}", page.get_url());
        }
    });

    website.crawl().await;
    website.unsubscribe();
}

Headless Chrome

Add one feature flag to render JavaScript-heavy pages:

[dependencies]
spider = { version = "2", features = ["chrome"] }

use spider::tokio;
use spider::features::chrome_common::RequestInterceptConfiguration;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com")
        .with_chrome_intercept(RequestInterceptConfiguration::new(true))
        .with_stealth(true)
        .build()
        .unwrap();

    website.crawl().await;
}

Spider also supports WebDriver (Selenium Grid, remote browsers) and AI-driven automation; see the examples for more.

Benchmarks

Crawling 185 pages on rsseau.fr (source, 10 samples averaged):

Apple M1 Max (10-core, 64 GB RAM):

Crawler        Language     Time    vs Spider
spider         Rust         73 ms   baseline
node-crawler   JavaScript   15 s    205x slower
colly          Go           32 s    438x slower
wget           C            70 s    959x slower

Linux (2-core, 7 GB RAM):

Crawler        Language     Time    vs Spider
spider         Rust         50 ms   baseline
node-crawler   JavaScript   3.4 s   68x slower
colly          Go           30 s    600x slower
wget           C            60 s    1200x slower

The gap grows with site size: Spider handles 100k+ pages in minutes where other crawlers take hours. The speed comes from Rust's async runtime (tokio), lock-free data structures, and optional io_uring support on Linux. See the benchmark source above for full details.

Why Spider?

Most crawlers force a choice between fast HTTP-only or slow-but-flexible browser automation. Spider supports both, and you can mix them in the same crawl.

Supports HTTP, Chrome, and WebDriver. Switch rendering modes with a feature flag. Use HTTP for speed, Chrome CDP for JavaScript-heavy pages, and WebDriver for Selenium Grid or cross-browser testing.

Only compile what you use. Every optional capability (Chrome, caching, proxies, AI) lives behind a Cargo feature flag. A minimal spider = "2" stays lean.
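For instance, a build that needs headless rendering plus caching and scheduling can opt into exactly those subsystems. A sketch of the idea: the `chrome` feature appears in the examples above, while names like `cache` and `cron` are assumptions to verify against the crate's current feature list:

```toml
[dependencies]
# Baseline: HTTP-only crawling, no optional subsystems compiled in.
spider = "2"

# Or opt into specific capabilities per build (feature names assumed;
# check the crate's Cargo.toml for the authoritative list):
# spider = { version = "2", features = ["chrome", "cache", "cron"] }
```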

Built for production. Caching (memory, disk, hybrid), proxy rotation, anti-bot fingerprinting, ad blocking, depth budgets, cron scheduling, and distributed workers. All of this has been hardened through Spider Cloud.

AI automation included. spider_agent adds multimodal LLM-driven automation: navigate pages, fill forms, solve challenges, and extract structured data with OpenAI or any compatible API.

Features

Crawling
  • Concurrent and streaming crawls with backpressure
  • Decentralized crawling for horizontal scaling
  • Caching: memory, disk (SQLite), or hybrid Chrome cache
  • Proxy support with rotation
  • Cron job scheduling
  • Depth budgeting, blacklisting, whitelisting
  • Smart mode that auto-detects JS-rendered content and upgrades to Chrome
Browser Automation
Data Processing
AI Agent
  • spider_agent: concurrent-safe multimodal web automation agent
  • Multiple LLM providers (OpenAI, any OpenAI-compatible API, Chrome built-in AI)
  • Web research with search providers (Serper, Brave, Bing, Tavily)
  • 110 built-in automation skills for web challenges
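Several of the crawl controls above compose through the same builder used in the earlier examples. A minimal sketch of a budgeted, polite crawl, assuming builder methods named `with_depth`, `with_blacklist_url`, `with_respect_robots_txt`, and `with_user_agent`; check the API docs for the exact signatures in your version:

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    // Budgeted crawl: stop three levels deep, skip the blog subtree,
    // honor robots.txt, and identify the crawler by name.
    let mut website = Website::new("https://example.com")
        .with_depth(3)
        .with_blacklist_url(Some(vec!["https://example.com/blog".into()]))
        .with_respect_robots_txt(true)
        .with_user_agent(Some("my-crawler/0.1"))
        .build()
        .unwrap();

    website.crawl().await;
    println!("Pages found: {}", website.get_links().len());
}
```

The blacklist entry and user-agent string here are placeholders; swap in your own patterns and identifier.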

Spider Cloud

For managed proxy rotation, anti-bot bypass, and CAPTCHA handling, Spider Cloud plugs in with one line:

let mut website = Website::new("https://protected-site.com")
    .with_spider_cloud("your-api-key")  // enable with features = ["spider_cloud"]
    .build()
    .unwrap();

Mode                 Strategy                                Best For
Proxy (default)      All traffic through Spider Cloud proxy  General crawling with IP rotation
Smart (recommended)  Proxy + auto-fallback on bot detection  Production (speed + reliability)
Fallback             Direct first, API on failure            Cost-efficient; most sites work without help
Unblocker            All requests through unblocker          Aggressive bot protection

Free credits are included on signup. Get started at spider.cloud.

Get Spider

Package        Language  Install
spider         Rust      cargo add spider
spider_cli     CLI       cargo install spider_cli
spider-nodejs  Node.js    npm i @spider-rs/spider-rs
spider-py      Python    pip install spider_rs
spider_agent   Rust      cargo add spider --features agent

Cloud and Remote

Package         Description
Spider Cloud    Managed crawling infrastructure, no setup needed
spider-clients  SDKs for Spider Cloud in multiple languages
spider-browser  Remote access to Spider's Rust browser

Resources

Contributing

Contributions welcome. See CONTRIBUTING.md for setup and guidelines.

Spider has been actively developed for the past 4 years. Join the Discord for questions and discussion.

License

MIT