Skip to content

Skelf-Research/checkstream

Repository files navigation

CheckStream

High-Performance, Real-time Safety and Compliance Layer for Streaming LLMs

CheckStream is a production-ready Rust guardrail platform that enforces safety, security, and regulatory compliance on LLM outputs as tokens stream—with sub-10ms latency. Works with any LLM provider.

Build Status Tests License

Current Status

Version: 0.1.0 Status: Core Complete - Production Ready for Testing

Component Status Details
Three-Phase Proxy Complete Ingress, Midstream, Egress pipelines
ML Classifiers Working DistilBERT sentiment from HuggingFace
Pattern Classifiers Complete PII, prompt injection, custom patterns
Policy Engine Complete Triggers, actions, composite rules
Action Executor Complete Stop, Redact, Log, Audit actions
Audit Trail Complete Hash-chained, tamper-proof logging
Telemetry Complete Prometheus metrics, structured logging
Security Hardening Complete SSRF protection, timing-safe auth, security headers
Tests 122 passing Unit, integration, ML classifier tests

Quick Start

Build and Run

# Clone and build
git clone https://github.com/Skelf-Research/checkstream.git
cd checkstream
cargo build --release --features ml-models

# Run the proxy
./target/release/checkstream-proxy \
  --backend https://api.openai.com/v1 \
  --policy ./policies/default.yaml \
  --port 8080

Test ML Model (Live Demo)

# Run the sentiment classifier example
cargo run --example test_hf_model --features ml-models

# Output:
# Model loaded successfully!
# "I love this movie!" → positive (1.000)
# "This is terrible."  → negative (1.000)

Run Tests

cargo test --workspace  # 122 tests pass

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Your Application                          │
│            (OpenAI SDK, Anthropic SDK, etc.)                │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                   CheckStream Proxy                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │   Phase 1    │  │   Phase 2    │  │   Phase 3    │      │
│  │   INGRESS    │→ │  MIDSTREAM   │→ │   EGRESS     │      │
│  │  Validate    │  │  Stream      │  │  Compliance  │      │
│  │  Prompt      │  │  Checks      │  │  & Audit     │      │
│  │  (~3ms)      │  │ (~2ms/chunk) │  │  (async)     │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │              Classifier Pipeline                      │   │
│  │  Pattern (Tier A) → ML Models (Tier B) → Policy      │   │
│  └──────────────────────────────────────────────────────┘   │
└──────────────────────────┬──────────────────────────────────┘
                           │
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
      ┌────────┐     ┌─────────┐     ┌─────────┐
      │ OpenAI │     │ Claude  │     │  vLLM   │
      └────────┘     └─────────┘     └─────────┘

Features

ML Classifiers (Working)

Load models from HuggingFace with zero code:

# models/registry.yaml
models:
  sentiment:
    source:
      type: huggingface
      repo: "distilbert-base-uncased-finetuned-sst-2-english"
    architecture:
      type: distil-bert-sequence-classification
      num_labels: 2
      labels: ["negative", "positive"]
    inference:
      device: "cpu"  # or "cuda" for GPU
      max_length: 512

Performance (CPU):

  • DistilBERT: ~30-50ms per inference
  • GPU (estimated): 2-10ms per inference

Policy Engine

Define rules with triggers and actions:

# policies/default.yaml
name: safety-policy
rules:
  - name: block-injection
    trigger:
      type: pattern
      pattern: "ignore previous instructions"
      case_insensitive: true
    actions:
      - type: stop
        message: "Request blocked"
        status_code: 403

  - name: toxicity-check
    trigger:
      type: classifier
      classifier: toxicity
      threshold: 0.8
    actions:
      - type: audit
        category: safety
        severity: high

Three-Phase Pipeline

Phase Purpose Latency
Ingress Validate prompts before LLM ~3ms
Midstream Check streaming chunks ~2ms/chunk
Egress Final compliance check async

Health Endpoints

GET /health        # Basic health check
GET /health/live   # Kubernetes liveness probe
GET /health/ready  # Kubernetes readiness probe
GET /metrics       # Prometheus metrics (requires admin key)
GET /audit         # Query audit trail (requires admin key)

Security

CheckStream is built with security as a core principle:

Built-in Protections

Feature Description
SSRF Protection Backend URLs validated; internal IPs and cloud metadata endpoints blocked
Timing-Safe Auth Constant-time comparison prevents API key extraction via timing attacks
Request Limits 10MB body size limit prevents memory exhaustion
Security Headers X-Content-Type-Options, X-Frame-Options, CSP on all responses
Secure IDs Cryptographic UUID v4 for request/event IDs (unpredictable)
Config Limits 1MB YAML file limit prevents billion-laughs attacks
Memory Safety Written in Rust - no buffer overflows or use-after-free

Admin Authentication

Protect admin endpoints with an API key:

export CHECKSTREAM_ADMIN_API_KEY="$(openssl rand -hex 32)"

# Access protected endpoints
curl -H "X-Checkstream-Admin-Key: $CHECKSTREAM_ADMIN_API_KEY" \
  http://localhost:8080/metrics

Development Mode

For local development with localhost backends:

export CHECKSTREAM_DEV_MODE=1  # Never use in production!

See Security & Privacy for complete security documentation.

Project Structure

checkstream/
├── crates/
│   ├── checkstream-core/        # Types, errors, token buffer
│   ├── checkstream-classifiers/ # ML models, patterns, pipeline
│   ├── checkstream-policy/      # Policy engine, triggers, actions
│   ├── checkstream-proxy/       # HTTP proxy server
│   └── checkstream-telemetry/   # Audit trail, metrics
├── examples/
│   ├── test_hf_model.rs         # Live ML model demo
│   └── full_dynamic_pipeline.rs # Complete pipeline example
├── policies/                    # Policy YAML files
├── models/                      # Model registry configs
└── docs/                        # Documentation

Documentation

Document Description
Architecture Technical design
Getting Started Setup guide
Model Loading ML model configuration
Pipeline Configuration Classifier pipelines
Policy Engine Policy-as-code reference
API Reference REST API docs
FCA Example Financial compliance example
Deployment Modes Proxy vs Sidecar
Security & Privacy Data handling
Regulatory Compliance FCA, FINRA, GDPR

Use Cases

  • Financial Services: FCA Consumer Duty compliance, advice boundary detection
  • Healthcare: HIPAA compliance, medical disclaimer injection
  • Security: Prompt injection defense, PII protection, data exfiltration prevention
  • Content Moderation: Real-time toxicity filtering

Performance

Component Target Actual
Pattern classifier <2ms ~0.5ms
ML classifier (CPU) <50ms ~30-50ms
ML classifier (GPU) <10ms ~2-10ms (est.)
Policy evaluation <1ms ~0.2ms
Total overhead <10ms ~5-8ms (patterns only)

Contributing

See CONTRIBUTING.md for guidelines.

License

Apache 2.0 - See LICENSE

Support


Built for trust at the speed of generation.

About

High-Performance, Real-time Safety and Compliance Layer for Streaming LLMs

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors