# NeuralScraper v2.0

Web scraping, analysis & content extraction for AI agents.

Scrape pages, crawl sites, extract UI/brand/SEO data. MCP server + CLI + HTTP API. Local-first, self-hosted.



Part of the Neural* ecosystem. NeuralScraper handles web scraping & analysis — but it doesn't work alone. It pairs with NeuralVaultCore (persistent memory), NeuralVaultSkill (session automation), and NeuralVaultFlow (dev workflow orchestration). Each component has its own repository and documentation. See the Neural* Ecosystem section at the bottom.


## What It Does

NeuralScraper gives AI agents (and humans) a clean, structured way to extract data from the web — no fluff, no cloud dependency.

| Capability | Description |
| --- | --- |
| Scrape | Single-page scrape (web + PDF) |
| Screenshot | Full-page PNG capture |
| Crawl | Multi-page scraping with depth and limit control |
| Map | Fast internal URL discovery |
| UI Analysis | Layout structure, components, spacing, typography |
| Brand Extraction | Dominant colors, fonts, logos |
| SEO Audit | Meta tags, headings, OG, schema markup, scoring |
| Analyze | Scrape + screenshot + UI + brand + SEO in one command |
| Search | Web search via SearXNG + scraping of results |
| Extract | Structured data extraction with a local LLM (Ollama) and a custom schema |
| Interact | Browser actions (click, type, wait) + scrape |
| Batch | Process a list of URLs from a file |

## Installation

### Option 1 — Local (recommended)

```shell
git clone https://github.com/getobyte/NeuralScraper.git
cd NeuralScraper
npm install
npx playwright install chromium
npm run build
```

Make the CLI globally available:

```shell
npm link
# Now you can run: ns scrape https://example.com
```

Start the MCP server:

```shell
node dist/mcp-server.js
```
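To sanity-check that the server speaks MCP over stdio, you can pipe a JSON-RPC `initialize` request into it. The request shape follows the MCP specification; the exact response fields depend on the SDK version, so treat this as a probe, not a contract:

```shell
# MCP handshake probe: send an `initialize` request over stdio.
# The request shape comes from the MCP spec; run this from the repo root
# after `npm run build`. Prints a fallback message if the build is missing.
req='{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"probe","version":"0.0.0"}}}'
printf '%s\n' "$req" | node dist/mcp-server.js || echo "dist/mcp-server.js not found; build the project first"
```

A valid server replies with a single JSON-RPC response line containing its name and declared capabilities.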

### Option 2 — Docker (homelab)

```shell
git clone https://github.com/getobyte/NeuralScraper.git
cd NeuralScraper
cp .env.example .env
docker compose up -d
```

The MCP server starts on port 9996 inside the `NeuralScraper` container.

Verify:

```shell
docker ps | grep NeuralScraper
docker logs NeuralScraper
```
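If the HTTP API is published on the same port (9996 is an assumption based on the compose setup above), a quick liveness check from the host looks like this:

```shell
# Liveness probe against the published port; prints a fallback message
# when the container is not up. Port 9996 is assumed from the compose setup.
curl -s http://localhost:9996/health || echo "NeuralScraper container is not reachable"
```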

## Connecting to Claude Code

Add to `~/.claude.json` or `.claude/settings.json` in your project:

```json
{
  "mcpServers": {
    "neuralscraper": {
      "command": "node",
      "args": ["D:/path/to/NeuralScraper/dist/mcp-server.js"]
    }
  }
}
```

Restart Claude Code. The following 12 tools will be available:

`ns_scrape` · `ns_screenshot` · `ns_crawl` · `ns_map` · `ns_ui` · `ns_brand` · `ns_seo` · `ns_analyze` · `ns_search` · `ns_extract` · `ns_interact` · `ns_batch`


## HTTP API

NeuralScraper exposes a REST API when running as a server.

| Method | Endpoint |
| --- | --- |
| GET | `/health` |
| POST | `/scrape` |
| POST | `/screenshot` |
| POST | `/crawl` |
| POST | `/map` |
| POST | `/ui` |
| POST | `/brand` |
| POST | `/seo` |
| POST | `/analyze` |
| POST | `/search` |
| POST | `/extract` |
| POST | `/interact` |
| POST | `/batch` |
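A hedged example of calling the scrape endpoint with curl. The JSON body field (`url`) and the port are assumptions inferred from the CLI and Docker sections, not a documented contract; check the server code for the exact request schema:

```shell
# POST a scrape job. The `url` body field and port 9996 are assumptions;
# the command prints a fallback message when no server is listening.
curl -s -X POST http://localhost:9996/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' \
  || echo "server not running"
```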

## Using with Ollama (Local LLM)

NeuralScraper's `ns extract` command uses Ollama to run a local LLM for structured data extraction — no cloud, no API keys.

### Step 1 — Install Ollama

Windows / macOS: Download the installer from ollama.com/download and run it.

Linux:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```

Verify:

```shell
ollama --version
```

### Step 2 — Pull the recommended model

```shell
ollama pull qwen3:14b
```

`qwen3:14b` — 9.3 GB, 40K context, native tool-use support. Recommended for `ns extract` flows.

### Step 3 — Run

```shell
ollama run qwen3:14b
```

Ollama runs as a local API server on `http://localhost:11434`. No internet required after the initial pull.
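Since Ollama exposes a standard REST API, you can verify the model responds outside of NeuralScraper. `/api/generate` is part of Ollama's documented API; the model name matches the pull step above:

```shell
# One-shot generation via Ollama's REST API. Requires `ollama serve`
# (or the desktop app) to be running; prints a fallback message otherwise.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwen3:14b", "prompt": "Reply with OK", "stream": false}' \
  || echo "Ollama is not running"
```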


## CLI Usage

```shell
# Scrape a page (web or PDF)
ns scrape https://example.com

# Full-page screenshot
ns screenshot https://example.com

# Crawl a site
ns crawl https://example.com --depth 2 --limit 20

# Discover URLs
ns map https://example.com

# UI analysis
ns ui https://example.com

# Brand extraction
ns brand https://example.com

# SEO audit
ns seo https://example.com

# Full analysis (scrape + screenshot + UI + brand + SEO)
ns analyze https://example.com

# Web search via SearXNG + scrape results
ns search "best react libs" --limit 5

# Structured extraction with LLM (Ollama)
ns extract https://example.com --schema '{"price":"string"}'

# Browser automation (click, type, wait) + scrape
ns interact https://example.com --actions '[{"click":".btn"}]'

# Batch processing from a file
ns batch urls.txt
```
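For `ns batch`, the input is a plain text file of URLs. The one-URL-per-line format is an assumption worth verifying against the batch tool; a minimal setup:

```shell
# Create a batch input file: one URL per line (assumed format).
cat > urls.txt <<'EOF'
https://example.com
https://example.org
EOF

# Then run the batch job (requires `ns` on PATH via `npm link`);
# prints a fallback message when the CLI is not installed.
ns batch urls.txt || echo "ns is not installed"
```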

## CLI Options

| Option | Commands | Default |
| --- | --- | --- |
| `-o, --output <dir>` | all | `./ns-output` |
| `-d, --depth <n>` | crawl | `2` |
| `-l, --limit <n>` | crawl, search | `20` / `5` |
| `--no-screenshot` | scrape, crawl, batch | |
| `-s, --schema <json>` | extract | |
| `-p, --prompt <text>` | extract | |
| `-a, --actions <json>` | interact | `[]` |
| `--no-scrape` | search | |
| `--no-scrape-after` | interact | |
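Options combine on a single command. For example, a deeper crawl with a custom output directory and screenshots disabled (flag behavior as listed above; short and long forms are interchangeable):

```shell
# Crawl three levels deep, cap at 50 pages, skip screenshots,
# and write results under ./site-dump instead of ./ns-output.
# Prints a fallback message when the CLI is not installed.
ns crawl https://example.com -d 3 -l 50 -o ./site-dump --no-screenshot \
  || echo "ns is not installed"
```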

## Output Structure

Single-page scrape:

```
ns-output/
  example.com/
    2026-03-28T14-30-00/
      page.md
      page.html
      metadata.json
      links.json
      screenshot.png
      ui-analysis.json
      brand.json
      seo-audit.json
      manifest.json
```

Crawl job:

```
ns-output/
  example.com/
    crawl-2026-03-28T14-30-00/
      manifest.json
      pages.json
      pages/
        001-home/
        002-about/
        ...
```

## Architecture

```
src/
  browser/
    playwright.ts        # Browser pool management
    screenshot.ts        # Full-page screenshot
  extractors/
    markdown.ts          # HTML → Markdown (readability + turndown)
    metadata.ts          # Meta tags, OG, Twitter cards
    links.ts             # Link extraction & classification
    ui-analyzer.ts       # Layout, components, spacing, fonts
    brand.ts             # Colors, fonts, logos
    seo.ts               # SEO audit with scoring
  storage/
    writer.ts            # File output & manifest generation
  tools/
    scrape.ts
    screenshot.ts
    crawl.ts
    map.ts
    ui.ts
    brand.ts
    seo.ts
    analyze.ts
    search.ts
    extract.ts
    interact.ts
    batch.ts
  cli.ts                 # CLI entry point (commander)
  mcp-server.ts          # MCP server entry point (stdio)
  index.ts               # Library exports
```

## Stack

| Layer | Technology |
| --- | --- |
| Runtime | Node.js 20+ |
| Language | TypeScript 5.8 |
| Browser | Playwright (Chromium) |
| HTML → MD | @mozilla/readability + turndown |
| HTML parsing | cheerio |
| MCP | @modelcontextprotocol/sdk |
| CLI | commander |
| Build | tsup |

## Neural* Ecosystem

NeuralScraper is a standalone tool — but it's designed to work alongside the rest of the Neural* family. Each component lives in its own repo with its own docs.

| Component | Role | Repo |
| --- | --- | --- |
| NeuralScraper (you are here) | Web scraping & analysis | |
| NeuralVaultCore | Persistent memory for AI agents | → GitHub |
| NeuralVaultSkill | Session memory automation | → GitHub |
| NeuralVaultFlow | Dev workflow orchestration | → GitHub |

NeuralScraper v2.0 — Cyber-Draco Legacy. Built by getobyte.
