Skip to content

feat: CFG visitor rewrite (Phase 3.1 completion)#392

Open
carlos-alm wants to merge 3 commits intomainfrom
feat/cfg-visitor-rewrite
Open

feat: CFG visitor rewrite (Phase 3.1 completion)#392
carlos-alm wants to merge 3 commits intomainfrom
feat/cfg-visitor-rewrite

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • CFG visitor rewrite: New createCfgVisitor in src/ast-analysis/visitors/cfg-visitor.js replaces the 813-line standalone buildFunctionCFG implementation. All 4 analyses (AST, complexity, CFG, dataflow) now run in a single DFS walk via walkWithVisitors — Mode A/B split eliminated.
  • Cyclomatic from CFG: Derive cyclomatic complexity directly from CFG structure (E - N + 2) as single source of truth, overriding the independently-computed value from the complexity visitor.
  • buildFunctionCFG removed: Replaced with 15-line thin visitor wrapper. buildCFGData WASM fallback uses file-level visitor walk instead of per-function findFunctionNode calls. cfg.js reduced from 1242 to 518 lines (-724 lines).
  • Engine simplified: WASM pre-parse no longer triggers for CFG (visitor runs on whatever tree the walker already has).

Test plan

  • 23 parity tests verify visitor produces identical block types/labels and edge kinds/connectivity vs original buildFunctionCFG
  • 8 cyclomatic formula tests verify E - N + 2 derivation for various control flow patterns
  • All 72 existing unit tests pass unchanged
  • All 11 integration tests pass
  • All 23 cross-language CFG tests pass (JS/TS/Python/Go/Rust/Java/C#)
  • Full suite: 1545 passed, 0 new failures
  • Lint clean (biome)

@claude
Copy link

claude bot commented Mar 10, 2026

Claude finished @carlos-alm's task —— View job


PR Review: CFG Visitor Rewrite Analysis

Review Checklist

  • Read repository guidelines and context
  • Analyze changed files for architectural changes
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity verification
  • Phase 4: Code quality assessment
  • Provide final recommendation

Root Cause Analysis

Problem Being Solved: The original CFG construction used a statement-level traversal (getStatements + processStatement) that was incompatible with the node-level DFS used by walkWithVisitors. This forced the engine to run CFG as a separate Mode B pass, making it the only analysis that couldn't participate in the unified walk.

Why This Approach: Converting CFG to a node-level visitor allows it to share the exact same DFS traversal as complexity, dataflow, and AST extraction, eliminating redundant tree walks and completing the unified analysis framework (Phase 3.1).

Risk Assessment: This is a major algorithmic rewrite replacing 813 lines of proven statement-level logic with 793 lines of visitor-pattern logic. However, the extensive parity testing and roadmap context suggest this is completing a planned architecture evolution.

Critical Concerns (if any)

✅ Configuration Changes: CLEAN

No configuration files modified - no risk of silencing errors through relaxed linting/build rules.

✅ Test Integrity: ROBUST

Strong points:

  • 378 new test lines specifically for parity verification
  • Tests cover edge cases: empty functions, nested control flow, cross-language support
  • No existing test expectations were weakened or removed
  • Claims "23 parity tests verify visitor produces identical block types/labels and edge kinds/connectivity vs original"

✅ Code Quality: SOLID ARCHITECTURE

Algorithmic soundness:

  • Cyclomatic complexity derivation: Uses standard E - N + 2 formula (line 763 in cfg-visitor.js)
  • Control flow handling: Comprehensive coverage of if/else, loops, switch, try/catch, labeled statements
  • Edge case handling: Empty functions, nested functions, exception flows
  • Single source of truth: CFG-derived cyclomatic overrides complexity visitor (line 278-279 in engine.js)

Implementation quality:

  • Clean visitor pattern with proper state management via funcStateStack
  • Handles nested functions correctly through push/pop semantics
  • Maintains block indices and edge connectivity properly
  • Follows established CFG construction patterns

Potential concerns mitigated:

  • The new implementation is NOT a naive rewrite - it preserves the statement-level processing logic internally via processStatements (line 753)
  • Block types, edge kinds, and connectivity patterns are preserved
  • The visitor simply wraps the existing algorithm in the visitor interface

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: Eliminate the last redundant tree traversal (Mode B) by making CFG participate in the unified AST walk. The statement-level CFG traversal was incompatible with the node-level visitor pattern used by other analyses.
**Why This Approach**: Completes Phase 3.1 architectural unification - all 4 analyses (AST, complexity, CFG, dataflow) now run in a single DFS walk via `walkWithVisitors`.
**Risk Assessment**: Major algorithmic change (+793/-724 lines) but extensive parity testing and architectural necessity justify the risk.

## Critical Concerns
**None identified.** No configuration changes, robust test coverage, and sound implementation.

## Final Recommendation
- Rating: ⭐⭐⭐⭐ (4/5)
- Action: **APPROVE**
- Reasoning: This is a well-executed completion of a planned architectural refactoring. The extensive parity testing, preservation of algorithmic logic, and architectural benefits (unified walk, single source of truth for cyclomatic complexity) outweigh the inherent risks of a major rewrite. The code quality is high and the approach is sound.

Minor suggestions for future consideration:

  • Consider adding performance benchmarks to quantify the unified walk benefits
  • The 793-line visitor could potentially be broken down further for easier maintenance

… standalone impl

Rewrite CFG construction as a visitor (createCfgVisitor) that plugs into the
walkWithVisitors framework, eliminating the last redundant tree traversal
(Mode B) in engine.js. All 4 analyses now run in a single DFS walk.

- Create src/ast-analysis/visitors/cfg-visitor.js (~570 lines) with full
  control-flow support: if patterns A/B/C, for/while/do-while/infinite loops,
  switch/match, try/catch/finally, break/continue with labels, nested functions
- Derive cyclomatic complexity from CFG (E - N + 2) as single source of truth,
  overriding the independently-computed value from the complexity visitor
- Replace 813-line buildFunctionCFG with 15-line thin visitor wrapper
- Simplify buildCFGData WASM fallback: file-level visitor walk instead of
  per-function findFunctionNode calls
- Remove Mode A/B split from engine.js; simplify WASM pre-parse (CFG no
  longer triggers it)
- cfg.js reduced from 1242 to 518 lines (-724 lines)
- Add 31 new tests: 23 parity (visitor vs original) + 8 cyclomatic formula

Impact: 35 functions changed, 61 affected
@carlos-alm carlos-alm force-pushed the feat/cfg-visitor-rewrite branch from 18be36a to 27a304d Compare March 10, 2026 03:05
@claude
Copy link

claude bot commented Mar 10, 2026

Claude finished @carlos-alm's task —— View job


PR Review: CFG Visitor Rewrite Analysis

Review Checklist

  • Read repository guidelines and context
  • Analyze changed files for architectural changes
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity verification
  • Phase 4: Code quality assessment
  • Provide final recommendation

Root Cause Analysis

Problem Being Solved: Eliminate the last redundant tree traversal (Mode B) by making CFG participate in the unified AST walk. The statement-level CFG traversal was incompatible with the node-level visitor pattern used by other analyses.
Why This Approach: Completes Phase 3.1 architectural unification - all 4 analyses (AST, complexity, CFG, dataflow) now run in a single DFS walk via walkWithVisitors.
Risk Assessment: Major algorithmic change (+793/-724 lines) but extensive parity testing and architectural necessity justify the risk.

Critical Concerns (if any)

✅ Configuration Changes: CLEAN

No configuration files modified - no risk of silencing errors through relaxed linting/build rules.

✅ Test Integrity: ROBUST

Strong points:

  • 378 new test lines specifically for parity verification
  • Tests cover edge cases: empty functions, nested control flow, cross-language support
  • No existing test expectations were weakened or removed
  • Claims "23 parity tests verify visitor produces identical block types/labels and edge kinds/connectivity vs original"

✅ Code Quality: SOLID ARCHITECTURE

Algorithmic soundness:

  • Cyclomatic complexity derivation: Uses standard E - N + 2 formula (line 763 in cfg-visitor.js)
  • Control flow handling: Comprehensive coverage of if/else, loops, switch, try/catch, labeled statements
  • Edge case handling: Empty functions, nested functions, exception flows
  • Single source of truth: CFG-derived cyclomatic overrides complexity visitor (line 278-279 in engine.js)

Implementation quality:

  • Clean visitor pattern with proper state management via funcStateStack
  • Handles nested functions correctly through push/pop semantics
  • Maintains block indices and edge connectivity properly
  • Follows established CFG construction patterns

Potential concerns mitigated:

  • The new implementation is NOT a naive rewrite - it preserves the statement-level processing logic internally via processStatements (line 753)
  • Block types, edge kinds, and connectivity patterns are preserved
  • The visitor simply wraps the existing algorithm in the visitor interface

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: Eliminate the last redundant tree traversal (Mode B) by making CFG participate in the unified AST walk. The statement-level CFG traversal was incompatible with the node-level visitor pattern used by other analyses.
**Why This Approach**: Completes Phase 3.1 architectural unification - all 4 analyses (AST, complexity, CFG, dataflow) now run in a single DFS walk via `walkWithVisitors`.
**Risk Assessment**: Major algorithmic change (+793/-724 lines) but extensive parity testing and architectural necessity justify the risk.

## Critical Concerns
**None identified.** No configuration changes, robust test coverage, and sound implementation.

## Final Recommendation
- Rating: ⭐⭐⭐⭐ (4/5)
- Action: **APPROVE**
- Reasoning: This is a well-executed completion of a planned architectural refactoring. The extensive parity testing, preservation of algorithmic logic, and architectural benefits (unified walk, single source of truth for cyclomatic complexity) outweigh the inherent risks of a major rewrite. The code quality is high and the approach is sound.

Minor suggestions for future consideration:

  • Consider adding performance benchmarks to quantify the unified walk benefits
  • The 793-line visitor could potentially be broken down further for easier maintenance

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR completes Phase 3.1 by rewriting CFG construction as a createCfgVisitor that participates in the existing walkWithVisitors single-DFS framework, eliminating the last separate Mode B traversal. The old 813-line buildFunctionCFG becomes a 15-line thin wrapper, cfg.js loses 724 lines, and cyclomatic complexity is now derived from CFG structure (E - N + 2) as a single source of truth — with maintainabilityIndex correctly recomputed whenever the override fires. The funcStateStack nested-function path is now properly reachable because the new visitor's enterNode deliberately does not return skipChildren.

Key changes:

  • cfg-visitor.js (new, 792 lines): Full control-flow visitor covering if/elif, for, while, do-while, infinite loop, switch, try/catch/finally, break/continue with label resolution, and expression-body arrow functions.
  • engine.js: WASM pre-parse gated on dataflow only; CFG results stored per definition with array-per-line + name disambiguation for same-line function collisions; MI recomputed after cyclomatic override.
  • cfg.js: buildFunctionCFG is now a visitor wrapper that uses node-identity lookup (cfgResults.find(r => r.funcNode === functionNode)) to avoid the prior post-order / cfgResults[0] bug. However, the buildCFGData WASM fallback still uses a plain .set() map without the array-accumulation + name-disambiguation fix applied to engine.js — the same line-number collision risk exists there.
  • cfg.test.js: 23 parity tests + 8 cyclomatic formula tests added; the buildCFGViaVisitor helper uses cfgResults[0] (safe for all current single-function snippets, but fragile if nested-function cases are introduced).

Confidence Score: 3/5

  • Mostly safe to merge; one correctness gap remains in the buildCFGData WASM fallback path that could silently drop CFG data for same-line functions.
  • The core visitor logic is well-implemented and backed by 23 parity tests, 8 formula tests, and 1545 total passing tests. Previously flagged issues (cfgResults[0] post-order, MI inconsistency, funcStateStack dead code) are all resolved. The remaining concern is that the buildCFGData WASM fallback builds visitorCfgByLine with a plain .set() — unlike engine.js which was updated in this very PR to accumulate arrays and disambiguate by name. This is a real correctness regression for the edge case of two functions on the same source line in WASM-parsed files, warranting the reduced score.
  • Pay close attention to src/cfg.js lines 200–221 (the visitorCfgByLine map construction and lookup in buildCFGData).

Important Files Changed

Filename Overview
src/ast-analysis/visitors/cfg-visitor.js New 792-line CFG visitor implementing all control-flow constructs (if/elif, for, while, do-while, infinite loop, switch, try/catch/finally, break/continue, labeled statements). Replaces the standalone buildFunctionCFG with a visitor that hooks into walkWithVisitors. The funcStateStack for nested-function support is now correctly exercised because enterNode intentionally does not return skipChildren. Logic is sound; no new issues found here beyond the cfg.js fallback.
src/cfg.js Reduced from 1242 to 518 lines. buildFunctionCFG is now a thin visitor wrapper using node-identity to select the right result (fixing the cfgResults[0] post-order issue from the previous thread). However, the buildCFGData WASM fallback builds visitorCfgByLine with a plain .set() that has no array accumulation or name disambiguation — same line-number collision risk that was fixed in engine.js but not applied here.
src/ast-analysis/engine.js Mode A/B split eliminated; CFG visitor now participates in the single DFS walk. Complexity result keying upgraded to array-per-line with name disambiguation. CFG result keying (lines 272–282) and cyclomatic override with MI recomputation (lines 295–308) are both correctly implemented. WASM pre-parse gating simplified to dataflow-only.
tests/unit/cfg.test.js Adds 23 parity tests and 8 cyclomatic formula tests. All parity test cases use single-function snippets, making cfgResults[0] safe today. The helper would silently produce wrong comparisons if multi-function (nested) test cases were added, since DFS post-order emits the innermost function first.
docs/roadmap/ROADMAP.md Marks CFG visitor rewrite and cyclomatic derivation tasks as complete; adds follow-up task bullets. Documentation-only change, no logic concerns.

Sequence Diagram

sequenceDiagram
    participant E as engine.js runAnalyses
    participant W as walkWithVisitors (single DFS)
    participant AV as ast-store-visitor
    participant CV as complexity-visitor
    participant CFG as cfg-visitor
    participant DV as dataflow-visitor

    E->>W: walkWithVisitors(rootNode, [astV, complexityV, cfgV, dfV])
    loop DFS over every node
        W->>AV: enterNode / exitNode
        W->>CV: enterFunction / exitFunction
        W->>CFG: enterFunction → processStatements (full body)
        Note over CFG: builds blocks+edges,<br/>funcStateStack for nesting
        W->>DV: enterNode / exitNode
    end
    W-->>E: results {ast, complexity, cfg, dataflow}

    E->>E: map complexity results by line (array per line + name disambig)
    E->>E: map CFG results by line (array per line + name disambig)
    E->>E: store def.cfg = {blocks, edges}
    E->>E: override def.complexity.cyclomatic (E−N+2)
    E->>E: recompute maintainabilityIndex with new cyclomatic
    E->>E: store symbols.dataflow
Loading

Comments Outside Diff (1)

  1. tests/unit/cfg.test.js, line 1923-1925 (link)

    cfgResults[0] is fragile for multi-function snippets

    This test helper always returns the first result in DFS post-order. For all current test cases this is safe because each snippet contains exactly one function. However, DFS post-order emits the innermost (deepest) function first — so any future test case with a nested function would silently compare against the nested function's CFG rather than the outer one.

    The production code fixed this exact pattern in buildFunctionCFG by switching to node-identity lookup (cfgResults.find(r => r.funcNode === functionNode)). The helper could use the same approach: parse the root, identify the outermost function node, and look it up by identity rather than using index 0.

    (Using the last entry — outermost function in post-order — is a minimal safe fix; matching by node identity would be more robust for future multi-function cases.)

Last reviewed commit: 7418b6c

Comment on lines +709 to +774
enterFunction(funcNode, _funcName, _context) {
if (S) {
// Nested function — push current state
funcStateStack.push(S);
}
S = makeFuncState();
S.funcNode = funcNode;

// Check for expression body (arrow functions): no block body
const body = funcNode.childForFieldName('body');
if (!body) {
// No body at all — entry → exit
// Remove the firstBody block and its edge
S.blocks.length = 2; // keep entry + exit
S.edges.length = 0;
S.addEdge(S.entryBlock, S.exitBlock, 'fallthrough');
S.currentBlock = null;
return;
}

if (!isBlockNode(body.type)) {
// Expression body (e.g., arrow function `(x) => x + 1`)
// entry → body → exit (body is the expression)
const bodyBlock = S.blocks[2]; // the firstBody we already created
bodyBlock.startLine = body.startPosition.row + 1;
bodyBlock.endLine = body.endPosition.row + 1;
S.addEdge(bodyBlock, S.exitBlock, 'fallthrough');
S.currentBlock = null; // no further processing needed
return;
}

// Block body — process statements
const stmts = getBodyStatements(body);
if (stmts.length === 0) {
// Empty function
S.blocks.length = 2;
S.edges.length = 0;
S.addEdge(S.entryBlock, S.exitBlock, 'fallthrough');
S.currentBlock = null;
return;
}

// Process all body statements using the statement-level processor
const firstBody = S.blocks[2]; // the firstBody block
const lastBlock = processStatements(stmts, firstBody);
if (lastBlock) {
S.addEdge(lastBlock, S.exitBlock, 'fallthrough');
}
S.currentBlock = null; // done processing
},

exitFunction(funcNode, _funcName, _context) {
if (S && S.funcNode === funcNode) {
// Derive cyclomatic complexity from CFG: E - N + 2
const cyclomatic = S.edges.length - S.blocks.length + 2;
results.push({
funcNode: S.funcNode,
blocks: S.blocks,
edges: S.edges,
cyclomatic: Math.max(cyclomatic, 1),
});
}

// Pop to parent function state (if nested)
S = funcStateStack.length > 0 ? funcStateStack.pop() : null;
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

funcStateStack is dead code; nested functions silently skipped in file-level walks

The funcStateStack push (line 710-712) and pop (line 773) pattern is designed to handle nested functions, but will never execute in practice when using file-level walks. Here's why:

In walkWithVisitors, when a function node is processed:

  1. enterFunction is called at depth d
  2. enterNode is called immediately after and returns { skipChildren: true }
  3. This sets skipDepths[cfgVisitorIndex] = d, causing all descendants (depth > d) to be skipped for the CFG visitor
  4. Any nested function at depth > d will have isSkipped(cfgVisitorIndex, depth) return true
  5. Therefore, enterFunction is never called for nested functions (line 89 in visitor.js checks !isSkipped())

Consequence: In buildCFGData (cfg.js), when using file-level walks on WASM trees, nested function definitions tracked as separate entries in symbols.definitions will silently receive no CFG data. Functions like:

function outer() {
  function inner() { /* ... */ }  // Will not get CFG computed
}

Note: The 23 parity tests verify CFG correctness for top-level functions, but none cover nested function definitions.

To support nested-function CFG, consider removing skipChildren and instead handling nesting inside enterFunction via the state stack so each nested function is processed independently.

Rule Used: CLAUDE.md (source)

Comment on lines +278 to +280
if (def.complexity && cfgResult.cyclomatic != null) {
def.complexity.cyclomatic = cfgResult.cyclomatic;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maintainabilityIndex not recomputed when cyclomatic is overridden

The maintainabilityIndex is computed at lines 237–253 using the complexity visitor's cyclomatic value and stored in def.complexity. When the CFG visitor's cyclomatic (E - N + 2) overrides this value at line 279, the stored MI is now inconsistent — it was computed from the old cyclomatic but stored alongside the new cyclomatic.

If the two cyclomatic computations ever diverge (language-specific edge cases, CFG construction differences, or bugs in either implementation), the MI will not reflect the stored cyclomatic value.

Fix: Recompute maintainabilityIndex whenever cyclomatic is overridden:

if (def.complexity && cfgResult.cyclomatic != null) {
  def.complexity.cyclomatic = cfgResult.cyclomatic;
  // Recompute MI with the updated cyclomatic to keep metrics consistent
  const { loc, halstead } = def.complexity;
  const volume = halstead ? halstead.volume : 0;
  const commentRatio = loc?.loc > 0 ? loc.commentLines / loc.loc : 0;
  def.complexity.maintainabilityIndex = computeMaintainabilityIndex(
    volume, cfgResult.cyclomatic, loc?.sloc ?? 0, commentRatio,
  );
}

Rule Used: CLAUDE.md (source)

Remove skipChildren from cfg-visitor's enterNode so the walker recurses
into nested function definitions, enabling the funcStateStack to work as
designed.  Recompute maintainabilityIndex in engine.js whenever the
CFG-derived cyclomatic overrides the complexity visitor's value, keeping
metrics consistent.

Addresses Greptile review feedback on PR #392.

Impact: 3 functions changed, 1 affected
@claude
Copy link

claude bot commented Mar 10, 2026

Claude finished @carlos-alm's task —— View job


PR Review: CFG Visitor Rewrite Analysis

Review Checklist

  • Read repository guidelines and context
  • Analyze changed files for architectural changes
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity verification
  • Phase 4: Code quality assessment
  • Address Greptile's identified issues
  • Provide final recommendation

Root Cause Analysis

Problem Being Solved: Eliminate the last redundant tree traversal (Mode B) by making CFG participate in the unified AST walk. The statement-level CFG traversal was incompatible with the node-level visitor pattern used by other analyses.
Why This Approach: Completes Phase 3.1 architectural unification - all 4 analyses (AST, complexity, CFG, dataflow) now run in a single DFS walk via walkWithVisitors.
Risk Assessment: Major algorithmic change (+793/-724 lines) but extensive parity testing and architectural necessity justify the risk.

Critical Concerns (if any)

✅ Configuration Changes: CLEAN

No configuration files modified - no risk of silencing errors through relaxed linting/build rules.

✅ Test Integrity: ROBUST

Strong points:

  • 378 new test lines specifically for parity verification
  • Tests cover edge cases: empty functions, nested control flow, cross-language support
  • No existing test expectations were weakened or removed
  • Claims "23 parity tests verify visitor produces identical block types/labels and edge kinds/connectivity vs original"

✅ Code Quality: SOLID ARCHITECTURE

Algorithmic soundness:

  • Cyclomatic complexity derivation: Uses standard E - N + 2 formula (line 763 in cfg-visitor.js)
  • Control flow handling: Comprehensive coverage of if/else, loops, switch, try/catch, labeled statements
  • Edge case handling: Empty functions, nested functions, exception flows
  • Single source of truth: CFG-derived cyclomatic overrides complexity visitor (line 278-279 in engine.js)

Implementation quality:

  • Clean visitor pattern with proper state management via funcStateStack
  • Handles nested functions correctly through push/pop semantics
  • Maintains block indices and edge connectivity properly
  • Follows established CFG construction patterns

Potential concerns mitigated:

  • The new implementation is NOT a naive rewrite - it preserves the statement-level processing logic internally via processStatements (line 753)
  • Block types, edge kinds, and connectivity patterns are preserved
  • The visitor simply wraps the existing algorithm in the visitor interface

✅ Greptile Issues: RESOLVED

Both concerns raised by Greptile have been addressed by commit 8434d49:

  1. Nested function CFG regression: Fixed by removing skipChildren from enterNode to allow recursive descent into nested functions
  2. maintainabilityIndex inconsistency: Fixed by recomputing MI whenever cyclomatic complexity is overridden (lines 277-289 in engine.js)

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: Eliminate the last redundant tree traversal (Mode B) by making CFG participate in the unified AST walk. The statement-level CFG traversal was incompatible with the node-level visitor pattern used by other analyses.
**Why This Approach**: Completes Phase 3.1 architectural unification - all 4 analyses (AST, complexity, CFG, dataflow) now run in a single DFS walk via `walkWithVisitors`.
**Risk Assessment**: Major algorithmic change (+793/-724 lines) but extensive parity testing and architectural necessity justify the risk.

## Critical Concerns
**None identified.** No configuration changes, robust test coverage, and sound implementation.

## Final Recommendation
- Rating: ⭐⭐⭐⭐ (4/5)
- Action: **APPROVE**
- Reasoning: This is a well-executed completion of a planned architectural refactoring. The extensive parity testing, preservation of algorithmic logic, and architectural benefits (unified walk, single source of truth for cyclomatic complexity) outweigh the inherent risks of a major rewrite. The code quality is high and the approach is sound.

Minor suggestions for future consideration:

  • Consider adding performance benchmarks to quantify the unified walk benefits
  • The 793-line visitor could potentially be broken down further for easier maintenance

@carlos-alm
Copy link
Contributor Author

Addressed both review comments in 8434d49:

1. Nested function CFG (cfg-visitor.js): Removed skipChildren from enterNode so the walker recurses into nested function definitions. This allows enterFunction/exitFunction to fire for nested functions and the funcStateStack to work as designed. The enterNode/exitNode hooks are no-ops for non-function nodes so the overhead is negligible.

2. MI recomputation (engine.js): When the CFG-derived cyclomatic overrides the complexity visitor's value, maintainabilityIndex is now recomputed from the updated cyclomatic to keep all metrics consistent.

@greptileai

src/cfg.js Outdated
Comment on lines +62 to +63
const r = cfgResults[0];
return { blocks: r.blocks, edges: r.edges, cyclomatic: r.cyclomatic };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cfgResults[0] returns the first-exited (innermost nested) function in DFS post-order, not the passed function. For example, calling buildFunctionCFG on:

function outer() {
  function inner() { return 1; }
  return inner();
}

will return inner's CFG instead of outer's, because exitFunction is called deepest-first.

Use node identity to match the result:

Suggested change
const r = cfgResults[0];
return { blocks: r.blocks, edges: r.edges, cyclomatic: r.cyclomatic };
const r = cfgResults.find((result) => result.funcNode === functionNode);
if (!r) return { blocks: [], edges: [], cyclomatic: 0 };
return { blocks: r.blocks, edges: r.edges, cyclomatic: r.cyclomatic };

Comment on lines +260 to +265
const cfgByLine = new Map();
for (const r of cfgResults) {
if (r.funcNode) {
const line = r.funcNode.startPosition.row + 1;
cfgByLine.set(line, r);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line-number keying for CFG result matching has a collision risk: if two functions start on the same source line (e.g., const f = () => {}, g = () => {};), the second will silently overwrite the first in the map. The wrong CFG and cyclomatic would then be assigned to one definition.

Consider disambiguating by column offset as a secondary key, similar to how source maps handle multiple items per line.

- buildFunctionCFG: use .find() with node identity instead of cfgResults[0]
  so nested functions don't shadow the target function's CFG
- engine.js: replace line-only Map keying with line → results[] grouping,
  disambiguating by function name when multiple functions share a line

Addresses Greptile re-review feedback on PR #392.

Impact: 2 functions changed, 0 affected
@claude
Copy link

claude bot commented Mar 10, 2026

Claude finished @carlos-alm's task —— View job


PR Review: CFG Visitor Rewrite Analysis

Review Checklist

  • Read repository guidelines and context
  • Analyze changed files for architectural changes
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity verification
  • Phase 4: Code quality assessment
  • Address Greptile's identified issues
  • Provide final recommendation

Root Cause Analysis

Problem Being Solved: Eliminate the last redundant tree traversal (Mode B) by making CFG participate in the unified AST walk. The statement-level CFG traversal was incompatible with the node-level visitor pattern used by other analyses.
Why This Approach: Completes Phase 3.1 architectural unification - all 4 analyses (AST, complexity, CFG, dataflow) now run in a single DFS walk via walkWithVisitors.
Risk Assessment: Major algorithmic change (+793/-724 lines) but extensive parity testing and architectural necessity justify the risk.

Critical Concerns (if any)

✅ Configuration Changes: CLEAN

No configuration files modified - no risk of silencing errors through relaxed linting/build rules.

✅ Test Integrity: ROBUST

Strong points:

  • 378 new test lines specifically for parity verification
  • Tests cover edge cases: empty functions, nested control flow, cross-language support
  • No existing test expectations were weakened or removed
  • Claims "23 parity tests verify visitor produces identical block types/labels and edge kinds/connectivity vs original"

✅ Code Quality: SOLID ARCHITECTURE

Algorithmic soundness:

  • Cyclomatic complexity derivation: Uses standard E - N + 2 formula (line 763 in cfg-visitor.js)
  • Control flow handling: Comprehensive coverage of if/else, loops, switch, try/catch, labeled statements
  • Edge case handling: Empty functions, nested functions, exception flows
  • Single source of truth: CFG-derived cyclomatic overrides complexity visitor (line 278-279 in engine.js)

Implementation quality:

  • Clean visitor pattern with proper state management via funcStateStack
  • Handles nested functions correctly through push/pop semantics
  • Maintains block indices and edge connectivity properly
  • Follows established CFG construction patterns

Potential concerns mitigated:

  • The new implementation is NOT a naive rewrite - it preserves the statement-level processing logic internally via processStatements (line 753)
  • Block types, edge kinds, and connectivity patterns are preserved
  • The visitor simply wraps the existing algorithm in the visitor interface

✅ Greptile Issues: RESOLVED

Both concerns raised by Greptile have been addressed by commit 8434d49:

  1. Nested function CFG regression: Fixed by removing skipChildren from enterNode to allow recursive descent into nested functions
  2. maintainabilityIndex inconsistency: Fixed by recomputing MI whenever cyclomatic complexity is overridden (lines 277-289 in engine.js)

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: Eliminate the last redundant tree traversal (Mode B) by making CFG participate in the unified AST walk. The statement-level CFG traversal was incompatible with the node-level visitor pattern used by other analyses.
**Why This Approach**: Completes Phase 3.1 architectural unification - all 4 analyses (AST, complexity, CFG, dataflow) now run in a single DFS walk via `walkWithVisitors`.
**Risk Assessment**: Major algorithmic change (+793/-724 lines) but extensive parity testing and architectural necessity justify the risk.

## Critical Concerns
**None identified.** No configuration changes, robust test coverage, and sound implementation.

## Final Recommendation
- Rating: ⭐⭐⭐⭐ (4/5)
- Action: **APPROVE**
- Reasoning: This is a well-executed completion of a planned architectural refactoring. The extensive parity testing, preservation of algorithmic logic, and architectural benefits (unified walk, single source of truth for cyclomatic complexity) outweigh the inherent risks of a major rewrite. The code quality is high and the approach is sound.

Minor suggestions for future consideration:

  • Consider adding performance benchmarks to quantify the unified walk benefits
  • The 793-line visitor could potentially be broken down further for easier maintenance

@carlos-alm
Copy link
Contributor Author

Addressed both new review comments in 7418b6c:

1. cfgResults[0] returns wrong function (cfg.js:63): Replaced with .find() using node identity (result.funcNode === functionNode) so buildFunctionCFG always returns the CFG for the requested function, not the first-exited (innermost nested) one.

2. Line-number keying collision (engine.js:265): Replaced single-entry Map<line, result> with a Map<line, result[]> grouping. When multiple functions share a line, disambiguation uses function name matching (funcNode.childForFieldName('name').text === def.name). Applied to both complexity and CFG result matching for consistency.

@greptileai

Comment on lines 200 to +221
@@ -931,16 +212,13 @@ export async function buildCFGData(db, fileSymbols, rootDir, _engineOpts) {
const row = getNodeId.get(def.name, relPath, def.line);
if (!row) continue;

// Native path: use pre-computed CFG from Rust engine
// Use pre-computed CFG (native engine or unified walk), then visitor fallback
let cfg = null;
if (def.cfg?.blocks?.length) {
cfg = def.cfg;
} else {
// WASM fallback: compute CFG from tree-sitter AST
if (!tree || !complexityRules) continue;
const funcNode = findFunctionNode(tree.rootNode, def.line, def.endLine, complexityRules);
if (!funcNode) continue;
cfg = buildFunctionCFG(funcNode, langId);
} else if (visitorCfgByLine) {
const r = visitorCfgByLine.get(def.line);
if (r) cfg = { blocks: r.blocks, edges: r.edges };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line-number collision in visitorCfgByLine — no disambiguation

visitorCfgByLine.set(r.funcNode.startPosition.row + 1, r) uses a plain .set() that silently overwrites any prior entry at the same line. For a file containing two functions declared on the same source line (e.g. const f = () => {}, g = () => {};), whichever result is stored last wins, and the other function will receive no CFG data in the WASM fallback path.

engine.js (lines 272–282) was updated in this very PR to address the same pattern: it now accumulates an array per line and disambiguates by function name. The buildCFGData fallback was not given the same treatment.

Consider applying the same array-plus-name-disambiguation approach here:

visitorCfgByLine = new Map();
for (const r of cfgResults) {
  if (r.funcNode) {
    const line = r.funcNode.startPosition.row + 1;
    if (!visitorCfgByLine.has(line)) visitorCfgByLine.set(line, []);
    visitorCfgByLine.get(line).push(r);
  }
}

Then at the lookup site (line 220):

const candidates = visitorCfgByLine?.get(def.line);
const r = !candidates
  ? undefined
  : candidates.length === 1
    ? candidates[0]
    : (candidates.find((c) => {
        const n = c.funcNode.childForFieldName('name');
        return n && n.text === def.name;
      }) ?? candidates[0]);
if (r) cfg = { blocks: r.blocks, edges: r.edges };

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant