Website | Guides | API Docs | Examples | Discord
A high-performance web crawler and scraper for Rust. 200-1000x faster than popular alternatives, with HTTP, headless Chrome, and WebDriver rendering in a single library.
- Crawl 100k+ pages in minutes on a single machine. See benchmarks.
- HTTP, Chrome CDP, WebDriver, and AI automation in one dependency.
- Production-ready with caching, proxy rotation, anti-bot bypass, and distributed crawling. Feature-gated so you only compile what you use.
Install the CLI and crawl a site:

```shell
cargo install spider_cli
spider --url https://example.com
```

Or use the library:

```toml
[dependencies]
spider = "2"
```

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    website.crawl().await;
    println!("Pages found: {}", website.get_links().len());
}
```

Process each page the moment it's crawled, not after:
```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    let mut rx = website.subscribe(0).unwrap();

    tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            println!("- {}", page.get_url());
        }
    });

    website.crawl().await;
    website.unsubscribe();
}
```

Add one feature flag to render JavaScript-heavy pages:
```toml
[dependencies]
spider = { version = "2", features = ["chrome"] }
```

```rust
use spider::features::chrome_common::RequestInterceptConfiguration;
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com")
        .with_chrome_intercept(RequestInterceptConfiguration::new(true))
        .with_stealth(true)
        .build()
        .unwrap();
    website.crawl().await;
}
```

Spider also supports WebDriver (Selenium Grid, remote browsers) and AI-driven automation. See the examples for more.
Crawling 185 pages on rsseau.fr (source, 10 samples averaged):
Apple M1 Max (10-core, 64 GB RAM):
| Crawler | Language | Time | vs Spider |
|---|---|---|---|
| spider | Rust | 73 ms | baseline |
| node-crawler | JavaScript | 15 s | 205x slower |
| colly | Go | 32 s | 438x slower |
| wget | C | 70 s | 959x slower |
Linux (2-core, 7 GB RAM):
| Crawler | Language | Time | vs Spider |
|---|---|---|---|
| spider | Rust | 50 ms | baseline |
| node-crawler | JavaScript | 3.4 s | 68x slower |
| colly | Go | 30 s | 600x slower |
| wget | C | 60 s | 1200x slower |
The gap grows with site size: Spider handles 100k+ pages in minutes where other crawlers take hours. The speed comes from Rust's async runtime (tokio), lock-free data structures, and optional io_uring on Linux. See the benchmarks for full details.
Most crawlers force a choice between fast HTTP-only or slow-but-flexible browser automation. Spider supports both, and you can mix them in the same crawl.
Supports HTTP, Chrome, and WebDriver. Switch rendering modes with a feature flag. Use HTTP for speed, Chrome CDP for JavaScript-heavy pages, and WebDriver for Selenium Grid or cross-browser testing.
Only compile what you use. Every optional capability (Chrome, caching, proxies, AI) lives behind a Cargo feature flag. A minimal `spider = "2"` dependency stays lean.
Built for production. Caching (memory, disk, hybrid), proxy rotation, anti-bot fingerprinting, ad blocking, depth budgets, cron scheduling, and distributed workers. All of this has been hardened through Spider Cloud.
AI automation included. spider_agent adds multimodal LLM-driven automation: navigate pages, fill forms, solve challenges, and extract structured data with OpenAI or any compatible API.
Crawling
- Concurrent and streaming crawls with backpressure
- Decentralized crawling for horizontal scaling
- Caching: memory, disk (SQLite), or hybrid Chrome cache
- Proxy support with rotation
- Cron job scheduling
- Depth budgeting, blacklisting, whitelisting
- Smart mode that auto-detects JS-rendered content and upgrades to Chrome
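As a rough sketch of how a few of these knobs combine, the example below limits crawl depth and blacklists a URL. The builder method names (`with_depth`, `with_blacklist_url`) are taken from the crate's documentation as the author understands it; verify exact signatures against the current API reference before relying on them.

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    // Configuration sketch: cap crawl depth and skip matching URLs.
    // Method names assumed from the spider docs; check the API reference.
    let mut website = Website::new("https://example.com")
        .with_depth(2) // stop two link-hops from the start URL
        .with_blacklist_url(Some(vec!["https://example.com/login".into()]))
        .build()
        .unwrap();

    website.crawl().await;
    println!("Pages within budget: {}", website.get_links().len());
}
```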
Browser Automation
- Chrome DevTools Protocol: headless or headed, stealth mode, screenshots, request interception
- WebDriver: Selenium Grid, remote browsers, cross-browser testing
- AI-powered challenge solving (deterministic + Chrome built-in AI)
- Anti-bot fingerprinting, ad blocking, firewall
Data Processing
- HTML transformations (Markdown, text, structured extraction)
- CSS/XPath scraping with spider_utils
- OpenAI and Gemini integration for content analysis
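For instance, extraction can be layered on the subscription stream from the quick start. The sketch below pulls each page's `<title>` with naive string slicing purely for illustration; for real scraping, the CSS/XPath helpers in spider_utils are the better fit.

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    let mut rx = website.subscribe(0).unwrap();

    tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            // Naive <title> extraction for illustration only; prefer the
            // CSS/XPath utilities in spider_utils for production scraping.
            let html = page.get_html();
            if let Some(start) = html.find("<title>") {
                if let Some(len) = html[start + 7..].find("</title>") {
                    let title = &html[start + 7..start + 7 + len];
                    println!("{} => {}", page.get_url(), title.trim());
                }
            }
        }
    });

    website.crawl().await;
    website.unsubscribe();
}
```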
AI Agent
- spider_agent: concurrent-safe multimodal web automation agent
- Multiple LLM providers (OpenAI, any OpenAI-compatible API, Chrome built-in AI)
- Web research with search providers (Serper, Brave, Bing, Tavily)
- 110 built-in automation skills for web challenges
For managed proxy rotation, anti-bot bypass, and CAPTCHA handling, Spider Cloud plugs in with one line:
```rust
let mut website = Website::new("https://protected-site.com")
    .with_spider_cloud("your-api-key") // enable with features = ["spider_cloud"]
    .build()
    .unwrap();
```

| Mode | Strategy | Best For |
|---|---|---|
| Proxy (default) | All traffic through Spider Cloud proxy | General crawling with IP rotation |
| Smart (recommended) | Proxy + auto-fallback on bot detection | Production (speed + reliability) |
| Fallback | Direct first, API on failure | Cost-efficient, most sites work without help |
| Unblocker | All requests through unblocker | Aggressive bot protection |
Free credits on signup. Get started at spider.cloud
| Package | Language | Install |
|---|---|---|
| spider | Rust | cargo add spider |
| spider_cli | CLI | cargo install spider_cli |
| spider-nodejs | Node.js | npm i @spider-rs/spider-rs |
| spider-py | Python | pip install spider_rs |
| spider_agent | Rust | cargo add spider --features agent |
| Package | Description |
|---|---|
| Spider Cloud | Managed crawling infrastructure, no setup needed |
| spider-clients | SDKs for Spider Cloud in multiple languages |
| spider-browser | Remote access to Spider's Rust browser |
- 64 examples covering crawling, Chrome, WebDriver, AI, caching, and more
- API documentation
- Benchmarks
- Changelog
Contributions welcome. See CONTRIBUTING.md for setup and guidelines.
Spider has been actively developed for the past 4 years. Join the Discord for questions and discussion.