Skip to content

uncase-ai/UNCASE

Repository files navigation

UNCASE

Unbiased Neutral Convention for Agnostic Seed Engineering
Privacy-first synthetic data pipeline for fine-tuning LLMs in regulated industries.
Blockchain-anchored quality verification on Polygon PoS.

This project is proudly

SPONSORED BY E2B FOR STARTUPS


Why UNCASE?

Organizations in healthcare, finance, legal, automotive, manufacturing, and education need specialized LLMs — but fine-tuning requires real conversations full of PII, PHI, and legally privileged data.

UNCASE solves this. It generates high-quality synthetic training data from structured seeds (conversation blueprints that contain zero real data), evaluates output against 9 hard-gated quality metrics, and anchors every evaluation on Polygon PoS so auditors can independently verify results without trusting anyone.

Real Conversations → PII Scan → Parse & Validate → 9-Metric Quality Gate
  → Synthetic Generation → Re-evaluate → Blockchain Anchor → LoRA/QLoRA Fine-tune

One pipeline. Zero PII. On-chain proof.


Quick Start

Docker (recommended)

git clone https://github.com/uncase-ai/uncase.git && cd uncase
cp .env.example .env
# Edit .env → set at least one LLM key (ANTHROPIC_API_KEY, GEMINI_API_KEY, etc.)

docker-compose up -d          # API + PostgreSQL + Redis + Dashboard
docker-compose ps             # Verify all containers are running

Optional profiles:

docker-compose --profile ml up -d              # + MLflow (port 5000)
docker-compose --profile gpu up -d             # + NVIDIA GPU support (port 8001)
docker-compose --profile observability up -d   # + Prometheus (9090) + Grafana (3001)

Local development (uv)

Requires Python 3.11+ and PostgreSQL 16+.

git clone https://github.com/uncase-ai/uncase.git && cd uncase
uv sync --extra dev
cp .env.example .env          # Set DATABASE_URL + at least one LLM key
make migrate                  # Run database migrations
make api                      # Start API on port 8000

# Optional: start the dashboard
cd frontend && npm install && npm run dev

pip (library only)

pip install uncase                       # Core
pip install "uncase[ml]"                 # + transformers, peft, trl, torch
pip install "uncase[privacy]"            # + SpaCy, Presidio NER
pip install "uncase[blockchain]"         # + web3, Polygon PoS anchoring
pip install "uncase[all]"               # Everything

Key Features

Feature What it does
Privacy Privacy Interceptor + PromptShield Zero PII tolerance — Presidio NER, SpaCy, 9 regex patterns, adversarial scan
Quality 9 hard-gated metrics ROUGE-L, Factual Fidelity, TTR, Coherence, Semantic Fidelity, Embedding Drift, Tool Validity, Privacy, Memorization
Verification Blockchain anchoring Every evaluation SHA-256 hashed → Merkle tree → Polygon PoS. Verifiable via Polygonscan
Generation LLM Gateway Route to Claude, GPT-4, Gemini, Groq, Ollama, vLLM — privacy-intercepted
Import Connector Hub WhatsApp exports, webhooks, HuggingFace, CRM, custom connectors
Export 10+ formats ChatML, Llama, Mistral, Qwen, Nemotron, Alpaca, ShareGPT — all with tool-use
Tools 30 domain tools 5 per industry across 6 verticals, with simulation and training data
Sandbox E2B cloud Parallel generation in isolated MicroVMs, instant demo containers
SDK Python SDK Sync + async programmatic access via httpx
MCP MCP server Expose tools to Claude Code and any MCP-compatible agent
Plugins Marketplace 6 official plugins (one per industry), extensible architecture
Enterprise Auth + observability JWT, audit logging, cost tracking, rate limiting, Prometheus + Grafana

Supported Industries

Domain Namespace Tools Plugin
Automotive Sales automotive.sales 5 automotive
Medical Consultation medical.consultation 5 medical
Legal Advisory legal.advisory 5 legal
Financial Advisory finance.advisory 5 finance
Industrial Support industrial.support 5 industrial
Education Tutoring education.tutoring 5 education

Each includes specialized seed templates, quality thresholds, compliance rules, and built-in tools.


Architecture (SCSF)

Layer Purpose
0 — Privacy PII detection + PromptShield adversarial scan (audit/warn/block)
1 — Parser Multi-format parsing (CSV, JSONL), auto-detect OpenAI/ShareGPT/UNCASE formats
2 — Evaluator 9 quality metrics, composite scoring, blockchain anchoring
3 — Generator LLM-powered parallel generation with tool-augmented conversations
4 — LoRA Pipeline LoRA/QLoRA fine-tuning with DP-SGD (epsilon ≤ 8.0)

106+ API endpoints across 25 routers. 1,590+ tests at 73% coverage.

See docs/architecture.md for the full system diagram and database schema.


Documentation

Document Description
Architecture System diagram, SCSF layers, database schema
API Reference Complete endpoint map with examples
Features Detailed feature docs with usage examples
Configuration Environment variables, Docker services, pip extras
Development Setup, testing, CLI, contributing guide

Contributing

git checkout -b feat/my-feature
make check                            # lint + typecheck + tests
uv run pytest tests/privacy/          # mandatory before PR

See docs/development.md for the full guide.


License

BSD 3-Clause — Free for commercial and non-commercial use.

UNCASE — Because the best training data is data that never existed.