Unstract uses LLMs to extract structured JSON from documents — PDFs, images, scans, you name it. Define what you want to extract using natural language prompts, and deploy as an API or ETL pipeline.
Built for teams in finance, insurance, healthcare, KYC/compliance, and much more.
| Task | Without Unstract | With Unstract |
|---|---|---|
| Schema definition | Write regex, build templates per vendor | Write a prompt once, handles variations |
| New document type | Days of development | Minutes in Prompt Studio |
| LLM integration | Build your own pipeline | Plug in any provider (OpenAI, Anthropic, Bedrock, Ollama) |
| Deployment | Custom infrastructure | ./run-platform.sh or managed cloud |
| Output | Unstructured text blobs | Clean JSON, ready for your database |
⭐ If Unstract helps you, star this repo!
Prompt Studio — Define document extraction schemas with natural language. Docs →
API Deployment — Send a document over REST API, get JSON back. Docs →
ETL Pipeline — Pull documents from a folder, process them, load to your warehouse. Docs →
MCP Server — Connect to AI agents (Claude, etc.) via Model Context Protocol. Docs →
n8n Node — Drop into existing automation workflows. Docs →
- Linux or macOS (Intel or M-series)
- Docker & Docker Compose
- 8 GB RAM minimum
- Git
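A quick way to confirm the command-line prerequisites above is sketched below. It only checks that `docker` and `git` are on the PATH. The helper name `missing_tools` is made up for this example and is not part of Unstract.

```shell
# Print the names of any given tools that are not found on PATH.
missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Strip the leading space before printing.
  echo "${missing# }"
}

not_found="$(missing_tools docker git)"
if [ -n "$not_found" ]; then
  echo "Missing prerequisites: $not_found"
else
  echo "docker and git found on PATH"
fi
```

This sketch does not verify Docker Compose or available memory. Running `docker compose version` by hand is one way to check the former.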
# Clone and start
git clone https://github.com/Zipstack/unstract.git
cd unstract
./run-platform.sh
That's it!
- Visit http://frontend.unstract.localhost in your browser
- Login with username: unstract, password: unstract
- Start extracting data!
# Pull and run entire Unstract platform with default env config.
./run-platform.sh
# Pull and run docker containers with a specific version tag.
./run-platform.sh -v v0.1.0
# Upgrade existing Unstract platform setup by pulling the latest available version.
./run-platform.sh -u
# Upgrade existing Unstract platform setup by pulling a specific version.
./run-platform.sh -u -v v0.2.0
# Build docker images locally as a specific version tag.
./run-platform.sh -b -v v0.1.0
# Build docker images locally from working branch as `current` version tag.
./run-platform.sh -b -v current
# Display the help information.
./run-platform.sh -h
# Only do setup of environment files.
./run-platform.sh -e
# Only do docker images pull with a specific version tag.
./run-platform.sh -p -v v0.1.0
# Only build docker images locally (instead of pulling) with a specific version tag.
./run-platform.sh -p -b -v v0.1.0
# Upgrade existing Unstract platform setup with docker images built locally from working branch as `current` version tag.
./run-platform.sh -u -b -v current
# Pull and run docker containers in detached mode.
./run-platform.sh -d -v v0.1.0
Warning
Copy the value of ENCRYPTION_KEY from backend/.env or platform-service/.env to a secure location.
This key encrypts adapter credentials; losing it makes existing adapters inaccessible!
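One possible way to take that backup from the repository root is sketched here. The helper `read_env_value` and the backup path are hypothetical choices for illustration. The sketch assumes the key is stored as a plain `ENCRYPTION_KEY=value` line.

```shell
# Print the value assigned to KEY in a dotenv-style FILE (first match only).
read_env_value() {
  key="$1"
  file="$2"
  sed -n "s/^${key}=//p" "$file" | head -n 1
}

# Hypothetical backup location; choose somewhere secure for real use.
backup_file="${HOME}/unstract-encryption-key.backup"

if [ -f backend/.env ]; then
  # The subshell keeps the restrictive umask from leaking into the session,
  # so the backup file is created readable by the current user only.
  (umask 077 && read_env_value ENCRYPTION_KEY backend/.env > "$backup_file")
  echo "Saved ENCRYPTION_KEY to $backup_file"
fi
```

If the value is quoted in the `.env` file, the quotes are copied as-is. A password manager or secrets vault is likely a better home for the key than a file on the same machine.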
┌────────────────────────────────────────────────────────────┐
│ Unstract │
├─────────────┬─────────────┬─────────────┬──────────────────┤
│ Frontend │ Backend │ Worker │ Platform Service │
│ (React) │ (Django) │ (Celery) │ (FastAPI) │
├─────────────┴─────────────┴─────────────┴──────────────────┤
│ Cache (Redis) │
├────────────────────────────────────────────────────────────┤
│ Message Queue (RabbitMQ) │
├────────────────────────────────────────────────────────────┤
│ Database (PostgreSQL) │
├────────────────────────────────────────────────────────────┤
│ LLM Adapters │ Vector DBs │ Text Extractors │
│ (OpenAI, etc.) │ (Qdrant, etc.) │ (LLMWhisperer) │
└────────────────────────────────────────────────────────────┘
Also see architecture.
| Category | Formats |
|---|---|
| Documents | PDF, DOCX, DOC, ODT, TXT, CSV, JSON |
| Spreadsheets | XLSX, XLS, ODS |
| Presentations | PPTX, PPT, ODP |
| Images | PNG, JPG, JPEG, TIFF, BMP, GIF, WEBP |
| Provider | Status | Provider | Status |
|---|---|---|---|
| OpenAI | ✅ | Azure OpenAI | ✅ |
| Anthropic Claude | ✅ | Google Gemini | ✅ |
| AWS Bedrock | ✅ | Mistral AI | ✅ |
| Ollama (local) | ✅ | Anyscale | ✅ |
| Provider | Status | Provider | Status |
|---|---|---|---|
| Qdrant | ✅ | Pinecone | ✅ |
| Weaviate | ✅ | PostgreSQL | ✅ |
| Milvus | ✅ | | |
| Provider | Status |
|---|---|
| LLMWhisperer | ✅ |
| Unstructured.io | ✅ |
| LlamaIndex Parse | ✅ |
Sources: AWS S3, MinIO, Google Cloud Storage, Azure Blob, Google Drive, Dropbox, SFTP
Destinations: Snowflake, Amazon Redshift, Google BigQuery, PostgreSQL, MySQL, MariaDB, SQL Server, Oracle
Follow these steps to change the default username and password.
# Install pre-commit hooks
./dev-env-cli.sh -p
# Run pre-commit checks
./dev-env-cli.sh -r
Finance & Banking → | Insurance → | Healthcare → | Income Tax →
For teams that need managed infrastructure, advanced accuracy features, or compliance certifications.
- ✅ LLMChallenge — dual-LLM verification
- ✅ SinglePass & Summarized Extraction — reduce LLM token costs
- ✅ Human-in-the-Loop — review interface with document highlighting
- ✅ SSO & Enterprise RBAC — SAML/OIDC integration with granular role-based access control
- ✅ SOC 2, HIPAA, ISO 27001, GDPR Compliant — third-party audited security certifications
- ✅ Priority Support with SLA — dedicated support team with response time guarantees
- Unstract + PostgreSQL + DeepSeek
- Unstract + n8n
- Unstract + Snowflake
- Unstract + BigQuery
- Unstract + Crew.AI
- Unstract + PydanticAI
- Unstract MCP Server
We welcome contributions! The easiest way to start:
- Pick an issue tagged good first issue
- Submit a PR
Report Bug → | Request Feature →
Join the LLM-powered document automation community:
Unstract integrates PostHog to track minimal usage analytics. Disable it by setting REACT_APP_ENABLE_POSTHOG=false in the frontend's .env file.
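The setting can be changed by hand in an editor. As a minimal sketch, the following replaces or appends the variable in a dotenv-style file. The helper `set_env_value` is hypothetical, and `frontend/.env` is an assumed path for the frontend's .env file.

```shell
# Set KEY=VALUE in a dotenv-style FILE, replacing any existing assignment.
set_env_value() {
  key="$1"
  value="$2"
  file="$3"
  tmp="$(mktemp)"
  # Keep every line except existing assignments of KEY.
  # grep exits non-zero when nothing is kept, hence the "|| true".
  grep -v "^${key}=" "$file" > "$tmp" || true
  echo "${key}=${value}" >> "$tmp"
  # Write back in place so the original file permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

env_file="frontend/.env"  # assumed location, adjust to your checkout
if [ -f "$env_file" ]; then
  set_env_value REACT_APP_ENABLE_POSTHOG false "$env_file"
fi
```

The new assignment is appended at the end of the file, so the order of lines may change.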
Unstract is released under the AGPL-3.0 License.
Built with ❤️ by Zipstack



