Spider

Website | Guides | API Docs | Examples | Discord

A high-performance web crawler and scraper for Rust. 200-1000x faster than popular alternatives, with HTTP, headless Chrome, and WebDriver rendering in a single library.

Quick Start

Command Line

cargo install spider_cli
spider --url https://example.com

Rust

[dependencies]
spider = "2"

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    website.crawl().await;
    println!("Pages found: {}", website.get_links().len());
}

Streaming

Process each page the moment it's crawled, not after:

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    let mut rx = website.subscribe(0).unwrap();

    tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            println!("- {}", page.get_url());
        }
    });

    website.crawl().await;
    website.unsubscribe();
}

Headless Chrome

Add one feature flag to render JavaScript-heavy pages:

[dependencies]
spider = { version = "2", features = ["chrome"] }

use spider::tokio;
use spider::features::chrome_common::RequestInterceptConfiguration;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com")
        .with_chrome_intercept(RequestInterceptConfiguration::new(true))
        .with_stealth(true)
        .build()
        .unwrap();

    website.crawl().await;
}

Spider also supports WebDriver (Selenium Grid, remote browsers) and AI-driven automation; see the examples for more.

Benchmarks

Crawling 185 pages on rsseau.fr (source, 10 samples averaged):

Apple M1 Max (10-core, 64 GB RAM):

Crawler        Language     Time    vs Spider
spider         Rust         73 ms   baseline
node-crawler   JavaScript   15 s    205x slower
colly          Go           32 s    438x slower
wget           C            70 s    959x slower

Linux (2-core, 7 GB RAM):

Crawler        Language     Time    vs Spider
spider         Rust         50 ms   baseline
node-crawler   JavaScript   3.4 s   68x slower
colly          Go           30 s    600x slower
wget           C            60 s    1200x slower

The gap grows with site size: Spider handles 100k+ pages in minutes where other crawlers take hours. The speed comes from Rust's async runtime (tokio), lock-free data structures, and optional io_uring support on Linux. See the benchmark source above for full details.

Why Spider?

Most crawlers force a choice between fast HTTP-only or slow-but-flexible browser automation. Spider supports both, and you can mix them in the same crawl.

Supports HTTP, Chrome, and WebDriver. Switch rendering modes with a feature flag. Use HTTP for speed, Chrome CDP for JavaScript-heavy pages, and WebDriver for Selenium Grid or cross-browser testing.

Only compile what you use. Every optional capability (Chrome, caching, proxies, AI) lives behind a Cargo feature flag. A minimal spider = "2" stays lean.
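For instance, a build that needs headless rendering plus caching and scheduling can opt into exactly those subsystems. A sketch of the idea: the `chrome` feature appears in the examples above, while names like `cache` and `cron` are assumptions to verify against the crate's current feature list:

```toml
[dependencies]
# Baseline: HTTP-only crawling, no optional subsystems compiled in.
spider = "2"

# Or opt into specific capabilities per build (feature names assumed;
# check the crate's Cargo.toml for the authoritative list):
# spider = { version = "2", features = ["chrome", "cache", "cron"] }
```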

Built for production. Caching (memory, disk, hybrid), proxy rotation, anti-bot fingerprinting, ad blocking, depth budgets, cron scheduling, and distributed workers. All of this has been hardened through Spider Cloud.

AI automation included. spider_agent adds multimodal LLM-driven automation: navigate pages, fill forms, solve challenges, and extract structured data with OpenAI or any compatible API.

Features

Crawling
  • Concurrent and streaming crawls with backpressure
  • Decentralized crawling for horizontal scaling
  • Caching: memory, disk (SQLite), or hybrid Chrome cache
  • Proxy support with rotation
  • Cron job scheduling
  • Depth budgeting, blacklisting, whitelisting
  • Smart mode that auto-detects JS-rendered content and upgrades to Chrome
Browser Automation
Data Processing
AI Agent
  • spider_agent: concurrent-safe multimodal web automation agent
  • Multiple LLM providers (OpenAI, any OpenAI-compatible API, Chrome built-in AI)
  • Web research with search providers (Serper, Brave, Bing, Tavily)
  • 110 built-in automation skills for web challenges
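Several of the crawl controls above compose through the same builder used in the earlier examples. A minimal sketch of a budgeted, polite crawl, assuming builder methods named `with_depth`, `with_blacklist_url`, `with_respect_robots_txt`, and `with_user_agent`; check the API docs for the exact signatures in your version:

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    // Budgeted crawl: stop three levels deep, skip the blog subtree,
    // honor robots.txt, and identify the crawler by name.
    let mut website = Website::new("https://example.com")
        .with_depth(3)
        .with_blacklist_url(Some(vec!["https://example.com/blog".into()]))
        .with_respect_robots_txt(true)
        .with_user_agent(Some("my-crawler/0.1"))
        .build()
        .unwrap();

    website.crawl().await;
    println!("Pages found: {}", website.get_links().len());
}
```

The blacklist entry and user-agent string here are placeholders; swap in your own patterns and identifier.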

Spider Cloud

For managed proxy rotation, anti-bot bypass, and CAPTCHA handling, Spider Cloud plugs in with one line:

let mut website = Website::new("https://protected-site.com")
    .with_spider_cloud("your-api-key")  // enable with features = ["spider_cloud"]
    .build()
    .unwrap();

Mode                 Strategy                                Best For
Proxy (default)      All traffic through Spider Cloud proxy  General crawling with IP rotation
Smart (recommended)  Proxy + auto-fallback on bot detection  Production (speed + reliability)
Fallback             Direct first, API on failure            Cost-efficient; most sites work without help
Unblocker            All requests through unblocker          Aggressive bot protection

Free credits are included on signup. Get started at spider.cloud.

Get Spider

Package        Language  Install
spider         Rust      cargo add spider
spider_cli     CLI       cargo install spider_cli
spider-nodejs  Node.js    npm i @spider-rs/spider-rs
spider-py      Python    pip install spider_rs
spider_agent   Rust      cargo add spider --features agent

Cloud and Remote

Package         Description
Spider Cloud    Managed crawling infrastructure, no setup needed
spider-clients  SDKs for Spider Cloud in multiple languages
spider-browser  Remote access to Spider's Rust browser

Resources

Contributing

Contributions welcome. See CONTRIBUTING.md for setup and guidelines.

Spider has been actively developed for the past 4 years. Join the Discord for questions and discussion.

License

MIT