Python Code Flow Analysis Tool - Static analysis for control flow graphs (CFG), data flow graphs (DFG), and call graph extraction.
For large projects (>1000 functions), use Fast Mode:
```bash
# Ultra-fast analysis (5-10x faster)
code2flow /path/to/project --fast

# Custom performance settings
code2flow /path/to/project \
  --parallel-workers 8 \
  --max-depth 3 \
  --skip-data-flow \
  --cache-dir ./.cache
```

| Technique | Speedup | Use Case |
|---|---|---|
| `--fast` mode | 5-10x | Initial exploration |
| Parallel workers | 2-4x | Multi-core machines |
| Caching | 3-5x | Repeated analysis |
| Depth limiting | 2-3x | Large codebases |
| Skip private methods | 1.5-2x | Public API analysis |
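The caching and parallel-worker rows can be illustrated with a small, self-contained sketch. It is not code2flow's implementation: the helper names `list_functions` and `analyze`, and a cache keyed on a SHA-256 hash of each file's contents, are assumptions chosen for illustration. Unchanged files are served from the cache, and only cache misses are parsed, optionally across several processes.

```python
import ast
import hashlib
import json
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def list_functions(path: Path) -> list:
    """Parse one file and return the sorted names of all functions it defines."""
    tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
    return sorted(node.name for node in ast.walk(tree)
                  if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)))


def analyze(paths: list, cache_dir: Path, workers: int = 4) -> dict:
    """Map each path to its function names, reusing cached results for unchanged sources."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    results, todo = {}, []
    for path in paths:
        # The key depends only on file content, so an edited file misses the cache.
        key = hashlib.sha256(path.read_bytes()).hexdigest()
        entry = cache_dir / f"{key}.json"
        if entry.exists():
            results[path] = json.loads(entry.read_text())  # cache hit: no parsing
        else:
            todo.append((path, entry))

    misses = [path for path, _ in todo]
    if workers > 1 and misses:
        # Parse uncached files in separate processes to use multiple cores.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            parsed = list(pool.map(list_functions, misses))
    else:
        parsed = [list_functions(path) for path in misses]

    for (path, entry), names in zip(todo, parsed):
        entry.write_text(json.dumps(names))  # store for the next run
        results[path] = names
    return results
```

Because the key depends only on file contents, editing a file automatically invalidates its entry. When calling `analyze` with `workers > 1` on platforms that start worker processes by spawning, run it under an `if __name__ == "__main__":` guard.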
| Project Size | Functions | Time (fast) | Time (full) |
|---|---|---|---|
| Small (<100) | ~50 | 0.5s | 2s |
| Medium (1K) | ~500 | 3s | 15s |
| Large (10K) | ~2000 | 15s | 120s |
- Control Flow Graph (CFG): Extract execution paths from Python AST
- Data Flow Graph (DFG): Track variable definitions and dependencies
- Call Graph Analysis: Map function calls and dependencies
- Pattern Detection: Identify design patterns (state machines, factories, recursion)
- Compact Output: Deduplicated flow diagrams with pattern recognition
- Multiple Output Formats: YAML, JSON, Mermaid diagrams, PNG visualizations
- LLM-Ready Output: Generate prompts for reverse engineering
```bash
# Install from source
pip install -e .

# Or with development dependencies
pip install -e ".[dev]"
```

```bash
# Analyze a Python project
code2flow /path/to/project

# With verbose output
code2flow /path/to/project -v

# Specify output directory and formats
code2flow /path/to/project -o ./analysis --format yaml,json,mermaid,png

# Use different analysis modes
code2flow /path/to/project -m static   # Fast static analysis only
code2flow /path/to/project -m hybrid   # Combined analysis (default)
```

```bash
# Static analysis only (fastest)
code2flow /path/to/project -m static

# Dynamic analysis with tracing
code2flow /path/to/project -m dynamic

# Hybrid analysis (recommended)
code2flow /path/to/project -m hybrid

# Behavioral pattern focus
code2flow /path/to/project -m behavioral

# Reverse engineering ready
code2flow /path/to/project -m reverse
```

```bash
code2flow /path/to/project -o my_analysis
```

| File | Description |
|---|---|
| `analysis.yaml` | Complete structured analysis data |
| `analysis.json` | JSON format for programmatic use |
| `flow.mmd` | Full Mermaid flowchart (all nodes) |
| `compact_flow.mmd` | Compact flowchart: deduplicated nodes, grouped by function |
| `calls.mmd` | Function call graph |
| `cfg.png` | Control flow visualization |
| `call_graph.png` | Call graph visualization |
| `llm_prompt.md` | LLM-ready analysis summary |
The `compact_flow.mmd` file provides optimized output:
- Deduplication: Identical node patterns are merged (e.g., `x = 1`, `x = 2` → `x = N`)
- Function Subgraphs: Nodes grouped by function in subgraphs
- Pattern Preservation: Control flow structure maintained while reducing file size
- Import Reuse: Common patterns linked rather than duplicated
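The deduplication idea can be sketched with the standard `ast` module, assuming that "identical pattern" means statements that become equal once numeric literals are replaced by a placeholder `N`, as in the `x = N` example. The helper names are hypothetical and code2flow's actual merging rules may differ. `ast.unparse` requires Python 3.9 or later.

```python
import ast


class _NumericLiterals(ast.NodeTransformer):
    """Replace numeric constants (but not booleans) with the placeholder name N."""

    def visit_Constant(self, node):
        if isinstance(node.value, (int, float, complex)) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Name(id="N", ctx=ast.Load()), node)
        return node


def normalize(statement: str) -> str:
    """Return the statement's source text with numeric literals abstracted to N."""
    return ast.unparse(_NumericLiterals().visit(ast.parse(statement)))


def deduplicate(statements):
    """Give each distinct normalized label one node ID; return (label->ID, ID per statement)."""
    nodes = {}
    ids = []
    for stmt in statements:
        label = normalize(stmt)
        # setdefault keeps the existing ID when the label was already seen.
        ids.append(nodes.setdefault(label, f"N{len(nodes) + 1}"))
    return nodes, ids
```

For example, `x = 1` and `x = 2` both normalize to `x = N` and share one node, while a string assignment such as `x = 'a'` keeps its own node.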
Example compact output:
```mermaid
flowchart TD
    %% Function subgraphs
    subgraph F12345["process_data"]
        N1["x = N"]
        N2{"if x > 0"}
        N3[/"return x"/]
    end
    %% Edges reference deduplicated nodes
    N1 --> N2
    N2 -->|"true"| N3
```
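A compact flowchart like the one above could be emitted by a renderer along these lines. This is a hypothetical sketch, not code2flow's writer: the input layout (a dict mapping each function name to `(node_id, shape, label)` tuples, plus an edge list) and the shape names `process`, `decision`, and `io` are assumptions, and subgraph IDs are simple counters rather than the hash-like `F12345` shown above. Labels containing double quotes would need escaping, which this sketch does not do.

```python
def render_compact(functions: dict, edges: list) -> str:
    """Render per-function subgraphs and labelled edges as Mermaid flowchart text."""
    # Mermaid node shapes: rectangle, rhombus (decision), parallelogram (input/output).
    shapes = {"process": '["{}"]', "decision": '{{"{}"}}', "io": '[/"{}"/]'}
    lines = ["flowchart TD", "    %% Function subgraphs"]
    for index, (name, nodes) in enumerate(functions.items(), start=1):
        lines.append(f'    subgraph F{index}["{name}"]')
        for node_id, shape, label in nodes:
            lines.append(f"        {node_id}{shapes[shape].format(label)}")
        lines.append("    end")
    lines.append("    %% Edges reference deduplicated nodes")
    for src, dst, label in edges:
        arrow = f'-->|"{label}"|' if label else "-->"
        lines.append(f"    {src} {arrow} {dst}")
    return "\n".join(lines)


print(render_compact(
    {"process_data": [("N1", "process", "x = N"),
                      ("N2", "decision", "if x > 0"),
                      ("N3", "io", "return x")]},
    [("N1", "N2", None), ("N2", "N3", "true")],
))
```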
The generated LLM prompt includes:
- System overview with metrics
- Call graph structure
- Behavioral patterns with confidence scores
- Data flow insights
- State machine definitions
- Reverse engineering guidelines
Each detected behavioral pattern includes:
- Name: Descriptive identifier
- Type: sequential, conditional, iterative, recursive, state_machine
- Entry/Exit points: Key functions
- Decision points: Conditional logic locations
- Data transformations: Variable dependencies
- Confidence: Pattern detection certainty
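The fields above can be pictured as a small record type. The sketch below is hypothetical and not code2flow's schema: `Pattern`, `reaches`, and `detect_recursion` are illustrative names, and the example fills the record for one pattern type, recursion, by looking for cycles in a call graph given as `{caller: set(callees)}`. The confidence scores are arbitrary placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Pattern:
    """Hypothetical record mirroring the pattern fields listed above."""
    name: str
    pattern_type: str  # sequential, conditional, iterative, recursive, state_machine
    entry_points: list = field(default_factory=list)
    exit_points: list = field(default_factory=list)
    decision_points: list = field(default_factory=list)
    data_transformations: list = field(default_factory=list)
    confidence: float = 0.0


def reaches(graph: dict, start: str, target: str) -> bool:
    """Iterative depth-first search: is `target` reachable by following calls from `start`?"""
    stack, seen = list(graph.get(start, ())), set()
    while stack:
        fn = stack.pop()
        if fn == target:
            return True
        if fn not in seen:
            seen.add(fn)
            stack.extend(graph.get(fn, ()))
    return False


def detect_recursion(graph: dict) -> list:
    patterns = []
    for fn in sorted(graph):
        if fn in graph[fn]:
            # Direct self-call (arbitrary illustrative score).
            patterns.append(Pattern(f"{fn}_recursion", "recursive",
                                    entry_points=[fn], confidence=1.0))
        elif reaches(graph, fn, fn):
            # Mutual recursion through other functions (arbitrary illustrative score).
            patterns.append(Pattern(f"{fn}_mutual_recursion", "recursive",
                                    entry_points=[fn], confidence=0.8))
    return patterns
```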
The analysis provides specific guidance for:
- Preserving call graph structure
- Implementing identified patterns
- Maintaining data dependencies
- Recreating state machines
- Preserving decision logic
State machine detection automatically identifies:
- State variables
- Transition methods
- Source and destination states
- State machine hierarchy
Data flow analysis maps:
- Variable dependencies
- Data transformations
- Information flow paths
- Side effects
When using dynamic mode (`-m dynamic`), the analysis also captures:
- Function entry/exit timing
- Call stack reconstruction
- Exception tracking
- Performance profiling
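Runtime tracing of this kind can be sketched with the standard library's `sys.settrace` hook. The `CallTracer` class below is a hypothetical illustration, not code2flow's tracer: it records caller/callee edges, cumulative wall time per function name, and exceptions where they are raised. `sys.settrace` slows execution considerably, and this version only traces the thread that enters the context manager.

```python
import sys
import time
from collections import defaultdict


class CallTracer:
    """Collect call edges, cumulative wall time, and raised exceptions via sys.settrace."""

    def __init__(self):
        self.stack = []                       # (function name, start time), innermost last
        self.edges = set()                    # (caller, callee) pairs observed at runtime
        self.total_time = defaultdict(float)  # inclusive seconds; recursive calls overlap
        self.exceptions = []                  # (function name, exception type name)

    def _trace(self, frame, event, arg):
        name = frame.f_code.co_name
        if event == "call":
            if self.stack:
                self.edges.add((self.stack[-1][0], name))
            self.stack.append((name, time.perf_counter()))
        elif event == "return" and self.stack:
            # "return" also fires when a frame exits because of an exception.
            fn, start = self.stack.pop()
            self.total_time[fn] += time.perf_counter() - start
        elif event == "exception":
            self.exceptions.append((name, arg[0].__name__))
        return self._trace  # keep receiving events for this frame

    def __enter__(self):
        sys.settrace(self._trace)
        return self

    def __exit__(self, *exc_info):
        sys.settrace(None)
        self.stack.clear()  # drop the entry pushed for __exit__ itself
        return False
```

Any code run inside `with CallTracer() as tracer:` is traced; afterwards `tracer.edges` approximates the runtime call graph and `tracer.exceptions` lists where exceptions were raised.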
The generated system_analysis_prompt.md is designed to be:
- Comprehensive: Contains all necessary system information
- Structured: Organized for easy parsing
- Actionable: Includes specific implementation guidance
- Language-agnostic: Describes behavior, not implementation
Example usage with an LLM:
"Based on the system analysis provided, implement this system in Go,
preserving all behavioral patterns and data flow characteristics."
- Dynamic analysis requires test files
- Complex inheritance hierarchies may need manual review
- External library calls are treated as black boxes
- Runtime reflection and metaprogramming are not fully captured
The analyzer is designed to be extensible. Key areas for enhancement:
- Additional pattern types
- Language-specific optimizations
- Improved visualization
- Real-time analysis mode
Apache License 2.0 - see LICENSE for details.
Created by Tom Sapletta - tom@sapletta.com