bobmatnyc/edgar


🚀 EDGAR → Extract & Transform Platform

Python 3.8+ License: MIT Status: Active Development

A general-purpose extract & transform platform: evolving EDGAR into example-driven data extraction for any source.

Last updated: November 30, 2025

🆕 What's New

November 30, 2025 - T6: IDataExtractor Interface Definition ✅

  • Defined formal IDataExtractor interface for AI-generated extractors (135 LOC)
    • IDataExtractor ABC - Abstract interface with a single extract() method
    • Exported from extract_transform_platform.core module
    • Updated PM mode prompt to use platform import
    • Comprehensive documentation with usage examples
  • Package: extract_transform_platform.core
  • Status: Interface validation verified, zero breaking changes
  • Ticket: 1M-381
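A minimal sketch of what implementing an interface like this could look like. Only the facts that `IDataExtractor` is an ABC with a single `extract()` method come from this README; the keyword-argument signature, the list-of-dicts return type, and the `UppercaseNameExtractor` class are hypothetical, and the ABC is redefined locally so the snippet runs without the platform installed.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class IDataExtractor(ABC):
    """Local stand-in for extract_transform_platform.core.IDataExtractor.

    Assumption: extract() takes keyword arguments and returns a list of records.
    """

    @abstractmethod
    def extract(self, **kwargs: Any) -> List[Dict[str, Any]]:
        """Return extracted records as dictionaries."""


class UppercaseNameExtractor(IDataExtractor):
    """Hypothetical extractor, standing in for an AI-generated one."""

    def extract(self, **kwargs: Any) -> List[Dict[str, Any]]:
        rows = kwargs.get("rows", [])
        return [{"name": str(row["name"]).upper()} for row in rows]


extractor: IDataExtractor = UppercaseNameExtractor()
print(extractor.extract(rows=[{"name": "apple"}]))  # [{'name': 'APPLE'}]
```

Because the ABC declares an abstract method, instantiating `IDataExtractor` directly (or a subclass that forgets `extract()`) raises `TypeError`, which is a cheap first check on generated classes.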

November 30, 2025 - T5: Sonnet 4.5 AI Integration Migration ✅

  • Migrated AI-powered code generation to platform (1,224 LOC)
    • Sonnet45Agent (753 LOC) - Dual-mode agent (PM + Coder) for code generation
    • OpenRouterClient (471 LOC) - OpenRouter API client with retry logic
  • Package: extract_transform_platform.ai
  • Status: 100% code reuse, zero breaking changes
  • Ticket: 1M-380
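The README mentions that `OpenRouterClient` has retry logic but does not describe the policy. As an illustration only, here is a common pattern for retrying transient failures: exponential backoff with a little random jitter. `call_with_retry` and the choice of retryable exception types are assumptions, not the client's actual API.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call func, retrying transient errors with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: re-raise the last error
            # Delay doubles per attempt (0.5 s, 1 s, 2 s, ...) plus up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            time.sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")  # the loop always returns or raises


# Usage: a hypothetical call that fails twice before succeeding.
calls = {"n": 0}


def flaky_request() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "ok"


print(call_with_retry(flaky_request, base_delay=0.01))  # ok
```

Exceptions outside `retry_on` (for example a `ValueError` from a malformed request) propagate immediately, which is usually the right behavior for non-transient errors.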

November 30, 2025 - T4: Code Generation Pipeline Migration ✅

  • Migrated 3 code generation services to platform (1,266 LOC)
    • PromptGenerator (436 LOC) - Generate Sonnet 4.5 prompts from patterns
    • CodeGeneratorService (590 LOC) - End-to-end code generation pipeline
    • ConstraintEnforcer (240 LOC) - AST-based code validation
  • Package: extract_transform_platform.services.codegen
  • Status: 100% code reuse, zero breaking changes
  • Ticket: 1M-379
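AST-based validation works by parsing generated code with Python's `ast` module, which never executes it, and walking the syntax tree for disallowed constructs. The sketch below uses a hypothetical policy (the forbidden module and call lists are illustrative); `ConstraintEnforcer`'s real rules are not listed in this README.

```python
import ast
from typing import List

# Hypothetical policy: illustrative names, not ConstraintEnforcer's real rules.
FORBIDDEN_MODULES = {"os", "subprocess", "socket"}
FORBIDDEN_CALLS = {"eval", "exec", "compile", "__import__"}


def find_violations(source: str) -> List[str]:
    """Parse source without executing it and report forbidden imports and calls."""
    violations: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    violations.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""  # None for "from . import x"
            if module.split(".")[0] in FORBIDDEN_MODULES:
                violations.append(f"line {node.lineno}: from {module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only direct calls like eval(...) are caught here.
            if node.func.id in FORBIDDEN_CALLS:
                violations.append(f"line {node.lineno}: call to {node.func.id}()")
    return violations


generated = "import json\nimport subprocess\nresult = eval('1 + 1')\n"
print(find_violations(generated))
# ['line 2: import subprocess', 'line 3: call to eval()']
```

Matching on `ast.Name` misses aliasing (`e = eval`) and attribute access (`builtins.eval`), so a static pass like this is best treated as one layer of defense rather than a complete sandbox.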

November 29, 2025 - T3: Schema Services Migration ✅

  • Migrated 3 schema analysis services to platform (1,645 LOC)
    • PatternModels (530 LOC) - 14 transformation pattern types
    • SchemaAnalyzer (436 LOC) - Schema inference and comparison
    • ExampleParser (679 LOC) - Pattern extraction from examples
  • Packages: extract_transform_platform.models.patterns & extract_transform_platform.services.analysis
  • Status: 100% code reuse, 60/60 tests passing
  • Ticket: 1M-378
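To make "schema inference and comparison" concrete, here is a small sketch that infers a field-to-type map from example records and diffs an input example against an output example. The function names and the type-name representation are hypothetical; `SchemaAnalyzer`'s actual data model is not documented in this README, and the sample records are made up.

```python
from typing import Any, Dict, List

Schema = Dict[str, str]


def infer_schema(records: List[Dict[str, Any]]) -> Schema:
    """Map each field name to the sorted, '|'-joined Python type names seen for it."""
    seen: Dict[str, set] = {}
    for record in records:
        for field, value in record.items():
            seen.setdefault(field, set()).add(type(value).__name__)
    return {field: "|".join(sorted(types)) for field, types in seen.items()}


def compare_schemas(source: Schema, target: Schema) -> Dict[str, List[str]]:
    """Report fields added, removed, or retyped between input and output examples."""
    return {
        "added": sorted(set(target) - set(source)),
        "removed": sorted(set(source) - set(target)),
        "retyped": sorted(f for f in set(source) & set(target) if source[f] != target[f]),
    }


inputs = [{"id": "1", "amount": "1000"}]
outputs = [{"id": "1", "amount": 1000, "currency": "USD"}]
print(compare_schemas(infer_schema(inputs), infer_schema(outputs)))
# {'added': ['currency'], 'removed': [], 'retyped': ['amount']}
```

A diff like this is the kind of signal a pattern extractor can use: a retyped field hints at a type conversion, and an added constant field hints at a default value.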

November 28, 2025 - T2: Data Source Abstractions Migration ✅

  • Migrated 4 data sources to platform (2,180 LOC)
    • ExcelDataSource - Excel file parsing with pandas
    • PDFDataSource - PDF table extraction with pdfplumber
    • CSVDataSource - CSV/JSON/YAML file parsing
    • APIDataSource - REST API integration
  • Package: extract_transform_platform.data_sources
  • Status: 120/120 tests passing, zero breaking changes
  • Ticket: 1M-377
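The value of these abstractions is that downstream code can consume every source through one interface. A minimal sketch of that idea, assuming a hypothetical `DataSource` base class with a `fetch()` method (the real class and method names may differ). This CSV source reads an in-memory string with the standard `csv` module rather than files via pandas, and the sample row is made up.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, List


class DataSource(ABC):
    """Hypothetical common interface; the platform's real base class may differ."""

    @abstractmethod
    def fetch(self) -> List[Dict[str, str]]:
        """Return rows as dictionaries keyed by column name."""


class CSVTextSource(DataSource):
    """Reads CSV from a string (the README's CSVDataSource reads files)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def fetch(self) -> List[Dict[str, str]]:
        # DictReader treats the first row as the header.
        return list(csv.DictReader(io.StringIO(self.text)))


source: DataSource = CSVTextSource("name,title\nJane Doe,CEO\n")
print(source.fetch())  # [{'name': 'Jane Doe', 'title': 'CEO'}]
```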

November 27, 2025 - Phase 2: Platform Architecture Launch 🚀


🎯 What Makes This Revolutionary

🤖 Conversational AI Interface

  • Natural language queries: "Analyze Apple's executive compensation"
  • Context-aware responses with real-time codebase analysis
  • Automatic LLM detection with graceful fallback to traditional CLI
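One way "automatic LLM detection with graceful fallback" could work is to check for a usable API key and otherwise drop to the traditional CLI, while letting an explicit `--mode` win. This is a sketch under assumptions: the environment variable name `OPENROUTER_API_KEY` is hypothetical (only the placeholder value `your_openrouter_api_key_here` appears in this README), and the real detection logic may also probe the service.

```python
from typing import Mapping, Optional

PLACEHOLDER = "your_openrouter_api_key_here"


def resolve_mode(requested: Optional[str], env: Mapping[str, str]) -> str:
    """Return 'chatbot' or 'traditional'.

    An explicit --mode value wins; otherwise use the chatbot only when a real key is set.
    The variable name OPENROUTER_API_KEY is an assumption.
    """
    if requested in ("chatbot", "traditional"):
        return str(requested)
    key = env.get("OPENROUTER_API_KEY", "").strip()
    llm_available = bool(key) and key != PLACEHOLDER
    return "chatbot" if llm_available else "traditional"


print(resolve_mode(None, {}))  # traditional
print(resolve_mode(None, {"OPENROUTER_API_KEY": "sk-example"}))  # chatbot
```

In a real entry point you would pass `os.environ` as `env`; accepting any mapping keeps the function easy to test.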

🔄 Self-Improving Code

  • LLM Supervisor: Professional quality assurance with Grok 4.1 Fast
  • LLM Engineer: Real code improvements with Claude 3.5 Sonnet
  • Git-Safe Enhancement: Automatic checkpoints and branch management
  • Iterative Improvement: Multi-iteration enhancement process
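The supervisor/engineer loop can be sketched as a bounded iteration: the supervisor reviews, and if it does not approve, the engineer revises using the feedback. In this hypothetical sketch both roles are plain callables (toy stand-ins for the Grok and Claude calls), and Git checkpointing is only noted in a comment.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    approved: bool
    feedback: str


def improve(
    code: str,
    supervise: Callable[[str], Review],
    engineer: Callable[[str, str], str],
    max_iterations: int = 3,
) -> str:
    """Review and revise until the supervisor approves or max_iterations is reached."""
    for _ in range(max_iterations):
        review = supervise(code)
        if review.approved:
            break
        # In the real system a Git checkpoint would presumably be taken around this step.
        code = engineer(code, review.feedback)
    return code


def toy_supervisor(code: str) -> Review:
    ok = "validate(" in code
    return Review(approved=ok, feedback="" if ok else "add input validation")


def toy_engineer(code: str, feedback: str) -> str:
    return code + "\nvalidate(rows)  # " + feedback


result = improve("rows = load()", toy_supervisor, toy_engineer)
print(result)
```

Capping iterations bounds cost when the supervisor never approves. Note that the final revision is returned without another review, so a caller may want one last `supervise()` check.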

⚡ Enterprise-Grade Process Control

  • Real-time Subprocess Monitoring: Line-by-line output streaming
  • Process Control: Timeout handling and termination capabilities
  • Automatic Fallback: Graceful degradation when subprocess execution is unavailable
  • Enhanced Security: Process isolation and comprehensive validation
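Line-by-line streaming with a timeout can be built from the standard library alone. This sketch (the function name `run_streaming` is hypothetical) reads the child's output as it is produced and uses a `threading.Timer` to kill the process at the deadline, so a child that hangs without printing is still terminated. The `-u` flag in the usage line disables Python's output buffering so lines arrive as they are printed.

```python
import subprocess
import sys
import threading
from typing import Callable, List, Tuple


def run_streaming(
    cmd: List[str], timeout: float, on_line: Callable[[str], None]
) -> Tuple[int, bool]:
    """Run cmd, pass each output line to on_line as it arrives, kill it after timeout.

    Returns (exit_code, timed_out).
    """
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        timed_out = threading.Event()

        def _kill() -> None:
            timed_out.set()
            proc.kill()

        # The timer enforces the deadline even if the child stops producing output.
        timer = threading.Timer(timeout, _kill)
        timer.start()
        try:
            assert proc.stdout is not None
            for line in proc.stdout:  # yields each line as the child flushes it
                on_line(line.rstrip("\n"))
            exit_code = proc.wait()
        finally:
            timer.cancel()
    return exit_code, timed_out.is_set()


lines: List[str] = []
exit_code, timed_out = run_streaming(
    [sys.executable, "-u", "-c", "print('step 1'); print('step 2')"],
    timeout=10,
    on_line=lines.append,
)
print(lines, exit_code, timed_out)  # ['step 1', 'step 2'] 0 False
```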

🔍 Web Search Integration

  • OpenRouter Web Search: Real-time information access through the standard OpenRouter API
  • Validation Enhancement: Supervisor validation with current standards
  • Best Practices Research: Engineer improvements using latest practices
  • Contextual Search: Domain-specific query generation and analysis

🚀 Quick Start

1. Clone and Set Up

git clone https://github.com/bobmatnyc/zach-edgar.git
cd zach-edgar
python3 setup_edgar_cli.py

2. Configure API Keys 🔒

# Copy the secure template
cp .env.template .env.local

# Get your API key from https://openrouter.ai/keys
# Edit .env.local and replace 'your_openrouter_api_key_here' with your actual key

🔒 SECURITY NOTE: .env.local is gitignored to protect your API keys from accidental exposure.
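For reference, a minimal loader for a file like `.env.local` that also refuses the template placeholder. It handles only simple `KEY=VALUE` lines; the variable name `OPENROUTER_API_KEY` is an assumption (check `.env.template` for the real name), and the `python-dotenv` package's `load_dotenv()` is a more complete alternative.

```python
import tempfile
from pathlib import Path
from typing import Dict

PLACEHOLDER = "your_openrouter_api_key_here"  # template value named in this README


def load_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blanks and # comments, no escapes or export."""
    values: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")  # drop surrounding quotes
    return values


def require_api_key(values: Dict[str, str], name: str = "OPENROUTER_API_KEY") -> str:
    """Return the key, rejecting a missing value or the template placeholder.

    The default variable name is an assumption.
    """
    key = values.get(name, "")
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"{name} is missing or still set to the template placeholder")
    return key


with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env.local"
    env_path.write_text("# local secrets\nOPENROUTER_API_KEY='sk-example'\n", encoding="utf-8")
    print(require_api_key(load_env_file(env_path)))  # sk-example
```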

3. Start the Revolutionary CLI

source venv/bin/activate

# Interactive mode (default)
python -m edgar_analyzer

# Bypass interactive mode and show CLI help
python -m edgar_analyzer --cli

# With web search capabilities (requires OpenRouter API key)
python -m edgar_analyzer --enable-web-search

# Specific modes
python -m edgar_analyzer --mode chatbot    # Force conversational mode
python -m edgar_analyzer --mode traditional # Force traditional CLI

πŸ’¬ Usage Examples

Conversational Interface

πŸ’¬ You: What is this application about?
πŸ€– AI: This is an intelligent EDGAR analysis system that extracts executive 
       compensation data from SEC filings using self-improving code patterns...

πŸ’¬ You: Analyze Apple's CEO compensation for 2023
πŸ€– AI: I'll extract Apple's executive compensation data. Let me fetch their 
       latest proxy filing and run the analysis...

CLI Usage Examples

# Default: Interactive conversational mode
python -m edgar_analyzer

# Bypass interactive mode and show CLI help
python -m edgar_analyzer --cli

# Extract specific company
python -m edgar_analyzer extract --cik 0000320193 --year 2023

# Extract with web search validation
python -m edgar_analyzer --enable-web-search extract --cik 0000320193 --year 2023

# Run system test
python -m edgar_analyzer test --companies 10

# Show application info
python -m edgar_analyzer trad-info

# Analyze codebase with web search
python -m edgar_analyzer --enable-web-search trad-analyze --query "compensation extraction"
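The commands above can be approximated with `argparse` to show how the global flags, the `--mode` choice, and the subcommands fit together. This is a hypothetical mirror: the real `edgar_analyzer` CLI may be built with a different library and has more commands (`trad-info`, `trad-analyze`). The `normalize_cik` helper reflects the fact that EDGAR CIKs are at most 10 digits and are commonly written zero-padded, as in `0000320193`.

```python
import argparse


def normalize_cik(value: str) -> str:
    """Zero-pad a CIK to 10 digits (e.g. 320193 -> 0000320193)."""
    digits = value.strip()
    if not digits.isdigit() or len(digits) > 10:
        raise argparse.ArgumentTypeError(f"invalid CIK: {value!r}")
    return digits.zfill(10)


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented flags; not the project's actual parser.
    parser = argparse.ArgumentParser(prog="edgar_analyzer")
    parser.add_argument("--cli", action="store_true", help="skip interactive mode, show help")
    parser.add_argument("--enable-web-search", action="store_true")
    parser.add_argument("--mode", choices=["chatbot", "traditional"])
    sub = parser.add_subparsers(dest="command")
    extract_cmd = sub.add_parser("extract")
    extract_cmd.add_argument("--cik", type=normalize_cik, required=True)
    extract_cmd.add_argument("--year", type=int, required=True)
    test_cmd = sub.add_parser("test")
    test_cmd.add_argument("--companies", type=int, default=10)
    return parser


args = build_parser().parse_args(
    ["--enable-web-search", "extract", "--cik", "320193", "--year", "2023"]
)
print(args.command, args.cik, args.year, args.enable_web_search)
# extract 0000320193 2023 True
```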

🏗️ Architecture

Core Components

  • CLI Chatbot Controller: Conversational interface with dynamic context
  • Self-Improving Engine: LLM-powered code enhancement and QA
  • Subprocess Monitor: Real-time process control and output streaming
  • Context Injector: Dynamic codebase analysis and injection
  • Safety Validator: AST-based script validation and sandboxing
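As an illustration of what a context injector might gather, this sketch walks a source tree with `ast`, pulls each module's first docstring line plus its top-level function and class names, and joins them into a compact context string for a prompt. It is a hypothetical approach (`summarize_codebase` is not a project function); how the Context Injector actually selects and formats code is not described here.

```python
import ast
import tempfile
from pathlib import Path
from typing import List


def summarize_codebase(root: Path, max_chars: int = 2000) -> str:
    """One line per module: path, first docstring line, top-level defs and classes."""
    parts: List[str] = []
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        doc_lines = (ast.get_docstring(tree) or "").splitlines()
        names = [
            node.name
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        ]
        summary = doc_lines[0] if doc_lines else "(no docstring)"
        parts.append(f"{path.relative_to(root)}: {summary} [{', '.join(names)}]")
    return "\n".join(parts)[:max_chars]  # truncate to keep the prompt small


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "extract.py").write_text(
        '"""Pull tables from proxy filings."""\ndef parse(): ...\nclass Row: ...\n',
        encoding="utf-8",
    )
    print(summarize_codebase(root))
# extract.py: Pull tables from proxy filings. [parse, Row]
```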

LLM Integration

  • Primary Model: Grok 4.1 Fast (OpenRouter)
  • Fallback Model: Claude 3.5 Sonnet (Anthropic)
  • Supervisor: Quality assurance and improvement detection
  • Engineer: Code modifications and enhancements

📊 System Validation

✅ 50 Companies Test - PASSED

  • LLM QA Accuracy: 100% (correctly identified data quality issues)
  • Self-Improvement Active: Multiple iterations per company
  • Processing Rate: ~30 seconds per company
  • Success Rate: 100% completion with comprehensive analysis

✅ Component Status

  • LLM Service: ✅ Grok 4.1 Fast + Claude 3.5 Sonnet
  • Context Injection: ✅ Real-time codebase analysis
  • Subprocess Monitoring: ✅ Process control and streaming
  • Safety Validation: ✅ AST parsing and sandboxing
  • Git Management: ✅ Automatic checkpoints and branches

🛡️ Safety & Security

Enterprise-Grade Safety

  • AST-based Script Validation: Prevents dangerous code execution
  • Sandboxed Environments: Isolated execution contexts
  • Process Monitoring: Real-time control and termination
  • Git Checkpoints: Automatic backup and recovery
  • Professional Error Handling: Comprehensive error recovery

Automatic Fallback Layers

  1. Primary: Conversational interface with LLM
  2. Secondary: Traditional CLI with full functionality
  3. Tertiary: Subprocess execution with monitoring
  4. Fallback: exec() mode with safety validation
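The layered fallback above amounts to trying handlers in order and returning the first success. A minimal, hypothetical sketch (the layer names and handlers are stand-ins, not the project's functions):

```python
from typing import Callable, List, Tuple


def run_with_fallbacks(layers: List[Tuple[str, Callable[[], str]]]) -> Tuple[str, str]:
    """Try each (name, handler) in order; return (name, result) of the first success."""
    errors: List[str] = []
    for name, handler in layers:
        try:
            return name, handler()
        except Exception as exc:  # deliberately broad: any failure moves to the next layer
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all layers failed; " + "; ".join(errors))


def conversational() -> str:
    raise ConnectionError("LLM unavailable")


def traditional() -> str:
    return "traditional CLI ready"


print(run_with_fallbacks([("conversational", conversational), ("traditional", traditional)]))
# ('traditional', 'traditional CLI ready')
```

Collecting each layer's error means that if every layer fails, the final exception explains why each one was skipped.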

📁 Project Structure

edgar-cli/
├── src/
│   ├── cli_chatbot/           # Conversational interface
│   ├── edgar_analyzer/        # Core analysis engine
│   └── self_improving_code/   # Self-improvement patterns
├── tests/                     # Comprehensive test suite
├── setup_edgar_cli.py         # One-command setup script
└── SYSTEM_READY_SUMMARY.md    # Complete system documentation

🎯 Key Features

🧠 Intelligent Context Awareness

  • Dynamic codebase analysis and injection
  • Real-time help and guidance
  • Context-aware responses and suggestions
  • Professional conversation flow management

🔧 Professional Development Tools

  • Git-safe iterative enhancement
  • Automatic code quality assessment
  • Real-time improvement suggestions
  • Professional validation and testing

⚡ Performance & Reliability

  • Subprocess monitoring with timeout protection
  • Automatic service detection and fallback
  • Cross-platform compatibility
  • Enterprise-grade error handling

📚 Documentation

📖 Complete Documentation - Comprehensive documentation hub


🤝 Contributing

This is a production-ready system demonstrating revolutionary command-line interface concepts. The codebase showcases:

  • Self-improving code patterns with LLM integration
  • Conversational AI interfaces for command-line tools
  • Enterprise-grade process monitoring and control
  • Professional safety and validation systems

🔒 Security Requirements

  • Follow Security Guidelines for API key management
  • Use Code Governance standards for all contributions
  • Never commit API keys or sensitive configuration
  • Use .env.local for local development (gitignored)

📄 License

MIT License - see LICENSE file for details.


🎉 Welcome to the Future of CLI Interfaces!

To try the self-improving conversational CLI, activate the virtual environment and run the commands from Quick Start step 3.

🏗️ Project Organization

📋 Project Overview - Complete project structure and organization

The project is now cleanly organized with:

  • 📚 Documentation - Comprehensive guides and references
  • 🧪 Tests - Complete test suite and validation
  • 🔧 Source Code - Clean, modular implementation
  • ⚙️ Configuration - Setup scripts and environment files

Revolutionary. Intelligent. Production-Ready. 🚀

About

SEC EDGAR Executive Compensation Analysis Platform - Extract, transform, and analyze Fortune 100 executive compensation data from DEF 14A proxy filings using LLM-powered extraction
