General-purpose extract & transform platform - transforming EDGAR into example-driven data extraction for any source.
Last updated: November 30, 2025
- Defined formal IDataExtractor interface for AI-generated extractors (135 LOC)
- IDataExtractor ABC - Abstract interface with a single extract() method
- Exported from the extract_transform_platform.core module
- Updated PM mode prompt to use platform import
- Comprehensive documentation with usage examples
- Package: extract_transform_platform.core
- Status: Interface validation verified, zero breaking changes
- Ticket: 1M-381
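As a rough illustration of what a single-method extractor interface can look like, here is a minimal sketch. The `extract()` signature (keyword arguments in, optional dictionary out) and the `StaticExtractor` class are assumptions made for this example; the actual definition lives in extract_transform_platform.core and may differ.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class IDataExtractor(ABC):
    """Assumed shape of the interface: one abstract extract() method."""

    @abstractmethod
    def extract(self, **kwargs: Any) -> Optional[Dict[str, Any]]:
        """Return the extracted record, or None if nothing was found."""


class StaticExtractor(IDataExtractor):
    """Hypothetical toy extractor showing how a subclass satisfies the interface."""

    def __init__(self, record: Dict[str, Any]) -> None:
        self._record = record

    def extract(self, **kwargs: Any) -> Optional[Dict[str, Any]]:
        # Return only the requested fields when the caller names them.
        fields = kwargs.get("fields")
        if fields is None:
            return dict(self._record)
        return {k: v for k, v in self._record.items() if k in fields}
```

Because `extract()` is abstract, instantiating `IDataExtractor` directly raises `TypeError`, which is one simple way to check that a generated extractor actually implements the method.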
- Migrated AI-powered code generation to platform (1,224 LOC)
  - Sonnet45Agent (753 LOC) - Dual-mode agent (PM + Coder) for code generation
  - OpenRouterClient (471 LOC) - OpenRouter API client with retry logic
- Package: extract_transform_platform.ai
- Status: 100% code reuse, zero breaking changes
- Ticket: 1M-380
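The retry logic mentioned for the API client is not described in detail here. One common approach is exponential backoff, sketched below under that assumption. The function name `call_with_retry`, its parameters, and the choice of retryable exceptions are all hypothetical and not taken from OpenRouterClient.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on transient errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # The delay doubles each time: base, 2 * base, 4 * base, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.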
- Migrated 3 code generation services to platform (1,266 LOC)
  - PromptGenerator (436 LOC) - Generate Sonnet 4.5 prompts from patterns
  - CodeGeneratorService (590 LOC) - End-to-end code generation pipeline
  - ConstraintEnforcer (240 LOC) - AST-based code validation
- Package: extract_transform_platform.services.codegen
- Status: 100% code reuse, zero breaking changes
- Ticket: 1M-379
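To make "AST-based code validation" concrete, the sketch below parses generated source with Python's standard `ast` module and reports imports and calls that a policy forbids. The rule sets and the `find_violations` name are assumptions for illustration; the constraints that ConstraintEnforcer actually applies are not listed in this document.

```python
import ast
from typing import List

# Assumed policy for illustration only; the real rule set is not documented here.
FORBIDDEN_CALLS = {"eval", "exec", "compile", "__import__"}
FORBIDDEN_MODULES = {"os", "subprocess", "socket"}


def find_violations(source: str) -> List[str]:
    """Return human-readable policy violations found in Python source text."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]

    violations: List[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    violations.append(f"line {node.lineno}: import of {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in FORBIDDEN_MODULES:
                violations.append(f"line {node.lineno}: import from {node.module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                violations.append(f"line {node.lineno}: call to {node.func.id}()")
    return violations
```

Static checks like this only catch direct references by name, so they work best alongside process isolation and are not a substitute for it.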
- Migrated 3 schema analysis services to platform (1,645 LOC)
  - PatternModels (530 LOC) - 14 transformation pattern types
  - SchemaAnalyzer (436 LOC) - Schema inference and comparison
  - ExampleParser (679 LOC) - Pattern extraction from examples
- Package: extract_transform_platform.models.patterns & extract_transform_platform.services.analysis
- Status: 100% code reuse, 60/60 tests passing
- Ticket: 1M-378
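Schema inference and comparison can be pictured as follows: observe field types in example inputs and outputs, then diff the two schemas to see which fields were added, removed, or retyped. This is a simplified sketch with hypothetical function names (`infer_schema`, `compare_schemas`); it does not reproduce SchemaAnalyzer or the 14 pattern types.

```python
from typing import Any, Dict, List, Set


def infer_schema(records: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each field name to the type name(s) observed across the records."""
    seen: Dict[str, Set[str]] = {}
    for record in records:
        for key, value in record.items():
            seen.setdefault(key, set()).add(type(value).__name__)
    # Join multiple observed types so mixed-type fields stay visible.
    return {key: "|".join(sorted(types)) for key, types in seen.items()}


def compare_schemas(source: Dict[str, str], target: Dict[str, str]) -> Dict[str, List[str]]:
    """Report fields added, removed, or changed in type between two schemas."""
    shared = source.keys() & target.keys()
    return {
        "added": sorted(target.keys() - source.keys()),
        "removed": sorted(source.keys() - target.keys()),
        "retyped": sorted(k for k in shared if source[k] != target[k]),
    }
```

A "retyped" field (for example a string salary that becomes a float) hints at a type-conversion pattern, while an "added" field hints at a derived or constant value.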
- Migrated 4 data sources to platform (2,180 LOC)
  - ExcelDataSource - Excel file parsing with pandas
  - PDFDataSource - PDF table extraction with pdfplumber
  - CSVDataSource - CSV/JSON/YAML file parsing
  - APIDataSource - REST API integration
- Package: extract_transform_platform.data_sources
- Status: 120/120 tests passing, zero breaking changes
- Ticket: 1M-377
- Created extract_transform_platform package structure
- Established core abstractions (IDataSource, BaseDataSource)
- Set up generic data models (no EDGAR dependencies)
- Epic: EDGAR → General-Purpose Extract & Transform Platform
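The split between an interface and a base class could look roughly like the sketch below. The method names `fetch()` and `_load()`, the caching behaviour, and the `CsvTextSource` class are assumptions chosen for illustration; only the names IDataSource and BaseDataSource come from the platform.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


class IDataSource(ABC):
    """Assumed contract: every source returns rows as plain dictionaries."""

    @abstractmethod
    def fetch(self) -> List[Row]:
        """Return all rows from the source."""


class BaseDataSource(IDataSource):
    """Shared behaviour for concrete sources; here, a simple row cache."""

    def __init__(self) -> None:
        self._cache: Optional[List[Row]] = None

    def fetch(self) -> List[Row]:
        # Load once, then serve repeated calls from memory.
        if self._cache is None:
            self._cache = self._load()
        return self._cache

    @abstractmethod
    def _load(self) -> List[Row]:
        """Read rows from the underlying source."""


class CsvTextSource(BaseDataSource):
    """Hypothetical source that parses CSV text with the standard library."""

    def __init__(self, text: str) -> None:
        super().__init__()
        self._text = text

    def _load(self) -> List[Row]:
        return [dict(row) for row in csv.DictReader(io.StringIO(self._text))]
```

With this layout, a new source only has to implement `_load()`, and nothing in the base classes refers to EDGAR-specific models.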
- Natural language queries: "Analyze Apple's executive compensation"
- Context-aware responses with real-time codebase analysis
- Automatic LLM detection with graceful fallback to traditional CLI
- LLM Supervisor: Professional quality assurance with Grok 4.1 Fast
- LLM Engineer: Real code improvements with Claude 3.5 Sonnet
- Git-Safe Enhancement: Automatic checkpoints and branch management
- Iterative Improvement: Multi-iteration enhancement process
- Real-time Subprocess Monitoring: Line-by-line output streaming
- Process Control: Timeout handling and termination capabilities
- Automatic Fallback: Graceful degradation when subprocess unavailable
- Enhanced Security: Process isolation and comprehensive validation
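The monitoring features above (line-by-line streaming, timeout handling, termination) can be sketched with the standard `subprocess` and `threading` modules. The function name `run_monitored` and its return shape are hypothetical, and this is not the project's Subprocess Monitor implementation.

```python
import subprocess
import sys
import threading
from typing import Callable, List, Tuple


def run_monitored(
    cmd: List[str],
    timeout: float,
    on_line: Callable[[str], None] = print,
) -> Tuple[int, bool]:
    """Run cmd, forward each output line as it arrives, and kill it on timeout.

    Returns (exit_code, timed_out).
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so output order is preserved
        text=True,
        bufsize=1,  # line-buffered on the reading side
    )

    def pump() -> None:
        assert proc.stdout is not None
        for line in proc.stdout:
            on_line(line.rstrip("\n"))

    # Read output on a separate thread so the main thread can enforce the timeout.
    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    timed_out = False
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        timed_out = True
        proc.kill()
        proc.wait()
    reader.join()
    if proc.stdout is not None:
        proc.stdout.close()
    return proc.returncode, timed_out


if __name__ == "__main__":
    code, timed_out = run_monitored([sys.executable, "-c", "print('hello')"], timeout=10)
    print("exit code:", code, "timed out:", timed_out)
```

Reading on a background thread is one design choice among several; an asyncio-based reader would serve the same purpose.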
- OpenRouter Web Search: Real-time information access using OpenRouter standard
- Validation Enhancement: Supervisor validation with current standards
- Best Practices Research: Engineer improvements using latest practices
- Contextual Search: Domain-specific query generation and analysis
git clone https://github.com/bobmatnyc/zach-edgar.git
cd zach-edgar
python3 setup_edgar_cli.py
# Copy the secure template
cp .env.template .env.local
# Get your API key from https://openrouter.ai/keys
# Edit .env.local and replace 'your_openrouter_api_key_here' with your actual key
SECURITY NOTE: .env.local is gitignored to protect your API keys from accidental exposure.
source venv/bin/activate
# Interactive mode (default)
python -m edgar_analyzer
# Bypass interactive, show CLI help
python -m edgar_analyzer --cli
# With web search capabilities (requires OpenRouter API key)
python -m edgar_analyzer --enable-web-search
# Specific modes
python -m edgar_analyzer --mode chatbot # Force conversational mode
python -m edgar_analyzer --mode traditional # Force traditional CLI
You: What is this application about?
AI: This is an intelligent EDGAR analysis system that extracts executive
compensation data from SEC filings using self-improving code patterns...
You: Analyze Apple's CEO compensation for 2023
AI: I'll extract Apple's executive compensation data. Let me fetch their
latest proxy filing and run the analysis...
# Default: Interactive conversational mode
python -m edgar_analyzer
# Bypass interactive, show CLI help
python -m edgar_analyzer --cli
# Extract specific company
python -m edgar_analyzer extract --cik 0000320193 --year 2023
# Extract with web search validation
python -m edgar_analyzer --enable-web-search extract --cik 0000320193 --year 2023
# Run system test
python -m edgar_analyzer test --companies 10
# Show application info
python -m edgar_analyzer trad-info
# Analyze codebase with web search
python -m edgar_analyzer --enable-web-search trad-analyze --query "compensation extraction"
- CLI Chatbot Controller: Conversational interface with dynamic context
- Self-Improving Engine: LLM-powered code enhancement and QA
- Subprocess Monitor: Real-time process control and output streaming
- Context Injector: Dynamic codebase analysis and injection
- Safety Validator: AST-based script validation and sandboxing
- Primary Model: Grok 4.1 Fast (OpenRouter)
- Fallback Model: Claude 3.5 Sonnet (Anthropic)
- Supervisor: Quality assurance and improvement detection
- Engineer: Code modifications and enhancements
- LLM QA Accuracy: 100% (correctly identified data quality issues)
- Self-Improvement Active: Multiple iterations per company
- Processing Rate: ~30 seconds per company
- Success Rate: 100% completion with comprehensive analysis
- LLM Service: ✅ Grok 4.1 Fast + Claude 3.5 Sonnet
- Context Injection: ✅ Real-time codebase analysis
- Subprocess Monitoring: ✅ Process control and streaming
- Safety Validation: ✅ AST parsing and sandboxing
- Git Management: ✅ Automatic checkpoints and branches
- AST-based Script Validation: Prevents dangerous code execution
- Sandboxed Environments: Isolated execution contexts
- Process Monitoring: Real-time control and termination
- Git Checkpoints: Automatic backup and recovery
- Professional Error Handling: Comprehensive error recovery
- Primary: Conversational interface with LLM
- Secondary: Traditional CLI with full functionality
- Tertiary: Subprocess execution with monitoring
- Fallback: exec() mode with safety validation
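The tiered ordering above amounts to trying each mode in priority order and moving on when one is unavailable or fails. A minimal sketch follows, assuming each mode exposes an availability check and a run callable; the `run_first_available` helper and the tuple layout are hypothetical.

```python
from typing import Any, Callable, List, Tuple

# Each mode is (name, is_available, run); this layout is an assumption for the sketch.
Mode = Tuple[str, Callable[[], bool], Callable[[], Any]]


def run_first_available(modes: List[Mode]) -> Tuple[str, Any]:
    """Try modes in priority order and return (mode_name, result) for the first success."""
    errors: List[str] = []
    for name, is_available, run in modes:
        if not is_available():
            errors.append(f"{name}: unavailable")
            continue
        try:
            return name, run()
        except Exception as exc:  # degrade to the next mode instead of crashing
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no mode succeeded: " + "; ".join(errors))
```

Collecting the reason each mode was skipped makes the final error message useful when every tier fails.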
edgar-cli/
├── src/
│   ├── cli_chatbot/            # Conversational interface
│   ├── edgar_analyzer/         # Core analysis engine
│   └── self_improving_code/    # Self-improvement patterns
├── tests/                      # Comprehensive test suite
├── setup_edgar_cli.py          # One-command setup script
└── SYSTEM_READY_SUMMARY.md     # Complete system documentation
- Dynamic codebase analysis and injection
- Real-time help and guidance
- Context-aware responses and suggestions
- Professional conversation flow management
- Git-safe iterative enhancement
- Automatic code quality assessment
- Real-time improvement suggestions
- Professional validation and testing
- Subprocess monitoring with timeout protection
- Automatic service detection and fallback
- Cross-platform compatibility
- Enterprise-grade error handling
Complete Documentation - Comprehensive documentation hub
- System Overview - Complete system capabilities
- Quick Start Guide - Get started in 5 minutes
- CLI Usage Guide - Master the conversational interface
- Web Search Guide - Real-time information access
- Security Guidelines - Enterprise security practices
- API Reference - Technical documentation
- Architecture - System design and patterns
This is a production-ready system demonstrating revolutionary CLI interface concepts. The codebase showcases:
- Self-improving code patterns with LLM integration
- Conversational AI interfaces for command-line tools
- Enterprise-grade process monitoring and control
- Professional safety and validation systems
- Follow Security Guidelines for API key management
- Use Code Governance standards for all contributions
- Never commit API keys or sensitive configuration
- Use .env.local for local development (gitignored)
MIT License - see LICENSE file for details.
Experience the world's first self-improving conversational CLI:
source venv/bin/activate
# Start interactive mode (default)
python -m edgar_analyzer
# Bypass interactive, show CLI help
python -m edgar_analyzer --cli
# With web search capabilities
python -m edgar_analyzer --enable-web-search
Project Overview - Complete project structure and organization
The project is now cleanly organized with:
- Documentation - Comprehensive guides and references
- Tests - Complete test suite and validation
- Source Code - Clean, modular implementation
- Configuration - Setup scripts and environment files
Revolutionary. Intelligent. Production-Ready.