feat: CFG visitor rewrite (Phase 3.1 completion) #392
carlos-alm wants to merge 3 commits into main
Claude finished @carlos-alm's task

PR Review: CFG Visitor Rewrite Analysis

Review Checklist

Root Cause Analysis

Problem Being Solved: The original CFG construction used a statement-level traversal (…).

Why This Approach: Converting CFG to a node-level visitor allows it to share the exact same DFS traversal as complexity, dataflow, and AST extraction, eliminating redundant tree walks and completing the unified analysis framework (Phase 3.1).

Risk Assessment: This is a major algorithmic rewrite replacing 813 lines of proven statement-level logic with 793 lines of visitor-pattern logic. However, the extensive parity testing and roadmap context suggest this is completing a planned architecture evolution.

Critical Concerns (if any)

✅ Configuration Changes: CLEAN
No configuration files modified - no risk of silencing errors through relaxed linting/build rules.

✅ Test Integrity: ROBUST
Strong points:

✅ Code Quality: SOLID ARCHITECTURE
Algorithmic soundness:
Implementation quality:
Potential concerns mitigated:

Final Assessment
Minor suggestions for future consideration:
… standalone impl

Rewrite CFG construction as a visitor (createCfgVisitor) that plugs into the walkWithVisitors framework, eliminating the last redundant tree traversal (Mode B) in engine.js. All 4 analyses now run in a single DFS walk.

- Create src/ast-analysis/visitors/cfg-visitor.js (~570 lines) with full control-flow support: if patterns A/B/C, for/while/do-while/infinite loops, switch/match, try/catch/finally, break/continue with labels, nested functions
- Derive cyclomatic complexity from CFG (E - N + 2) as single source of truth, overriding the independently-computed value from the complexity visitor
- Replace 813-line buildFunctionCFG with 15-line thin visitor wrapper
- Simplify buildCFGData WASM fallback: file-level visitor walk instead of per-function findFunctionNode calls
- Remove Mode A/B split from engine.js; simplify WASM pre-parse (CFG no longer triggers it)
- cfg.js reduced from 1242 to 518 lines (-724 lines)
- Add 31 new tests: 23 parity (visitor vs original) + 8 cyclomatic formula

Impact: 35 functions changed, 61 affected
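The `E - N + 2` derivation mentioned in the commit message can be checked by hand on a small graph. The sketch below is illustrative only: the block names, the edge tuples, and the helper `cyclomaticFromCfg` are hypothetical and do not reflect the project's actual CFG schema. It only mirrors the formula and the `Math.max(..., 1)` clamp visible in `exitFunction`.

```javascript
// Hypothetical, simplified CFG for:
//   function f(x) { if (x) { a(); } else { b(); } return x; }
// Block and edge shapes here are assumptions made for illustration.
function cyclomaticFromCfg(blocks, edges) {
  // McCabe's formula for a single connected component: M = E - N + 2.
  // Clamp to 1, as the visitor does with Math.max.
  return Math.max(edges.length - blocks.length + 2, 1);
}

const blocks = ['entry', 'cond', 'then', 'else', 'join', 'exit'];
const edges = [
  ['entry', 'cond'],
  ['cond', 'then'],
  ['cond', 'else'],
  ['then', 'join'],
  ['else', 'join'],
  ['join', 'exit'],
];

// One decision point: 6 edges - 6 blocks + 2 = 2.
const ifElse = cyclomaticFromCfg(blocks, edges);

// Empty function (entry -> exit only): 1 edge - 2 blocks + 2 = 1.
const straightLine = cyclomaticFromCfg(['entry', 'exit'], [['entry', 'exit']]);

console.log({ ifElse, straightLine });
```

Because the value depends on how many blocks and edges the builder emits, any extra pass-through block must come with exactly one extra edge for the result to stay unchanged.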
18be36a to 27a304d
Claude finished @carlos-alm's task

PR Review: CFG Visitor Rewrite Analysis

Review Checklist

Root Cause Analysis

Problem Being Solved: Eliminate the last redundant tree traversal (Mode B) by making CFG participate in the unified AST walk. The statement-level CFG traversal was incompatible with the node-level visitor pattern used by other analyses.

Critical Concerns (if any)

✅ Configuration Changes: CLEAN
No configuration files modified - no risk of silencing errors through relaxed linting/build rules.

✅ Test Integrity: ROBUST
Strong points:

✅ Code Quality: SOLID ARCHITECTURE
Algorithmic soundness:
Implementation quality:
Potential concerns mitigated:

Final Assessment
Minor suggestions for future consideration:
Greptile Summary

This PR completes Phase 3.1 by rewriting CFG construction as a …

Key changes:

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

sequenceDiagram
participant E as engine.js runAnalyses
participant W as walkWithVisitors (single DFS)
participant AV as ast-store-visitor
participant CV as complexity-visitor
participant CFG as cfg-visitor
participant DV as dataflow-visitor
E->>W: walkWithVisitors(rootNode, [astV, complexityV, cfgV, dfV])
loop DFS over every node
W->>AV: enterNode / exitNode
W->>CV: enterFunction / exitFunction
W->>CFG: enterFunction → processStatements (full body)
Note over CFG: builds blocks+edges,<br/>funcStateStack for nesting
W->>DV: enterNode / exitNode
end
W-->>E: results {ast, complexity, cfg, dataflow}
E->>E: map complexity results by line (array per line + name disambig)
E->>E: map CFG results by line (array per line + name disambig)
E->>E: store def.cfg = {blocks, edges}
E->>E: override def.complexity.cyclomatic (E−N+2)
E->>E: recompute maintainabilityIndex with new cyclomatic
E->>E: store symbols.dataflow
enterFunction(funcNode, _funcName, _context) {
  if (S) {
    // Nested function: push current state
    funcStateStack.push(S);
  }
  S = makeFuncState();
  S.funcNode = funcNode;

  // Check for expression body (arrow functions): no block body
  const body = funcNode.childForFieldName('body');
  if (!body) {
    // No body at all: entry → exit
    // Remove the firstBody block and its edge
    S.blocks.length = 2; // keep entry + exit
    S.edges.length = 0;
    S.addEdge(S.entryBlock, S.exitBlock, 'fallthrough');
    S.currentBlock = null;
    return;
  }

  if (!isBlockNode(body.type)) {
    // Expression body (e.g., arrow function `(x) => x + 1`)
    // entry → body → exit (body is the expression)
    const bodyBlock = S.blocks[2]; // the firstBody we already created
    bodyBlock.startLine = body.startPosition.row + 1;
    bodyBlock.endLine = body.endPosition.row + 1;
    S.addEdge(bodyBlock, S.exitBlock, 'fallthrough');
    S.currentBlock = null; // no further processing needed
    return;
  }

  // Block body: process statements
  const stmts = getBodyStatements(body);
  if (stmts.length === 0) {
    // Empty function
    S.blocks.length = 2;
    S.edges.length = 0;
    S.addEdge(S.entryBlock, S.exitBlock, 'fallthrough');
    S.currentBlock = null;
    return;
  }

  // Process all body statements using the statement-level processor
  const firstBody = S.blocks[2]; // the firstBody block
  const lastBlock = processStatements(stmts, firstBody);
  if (lastBlock) {
    S.addEdge(lastBlock, S.exitBlock, 'fallthrough');
  }
  S.currentBlock = null; // done processing
},

exitFunction(funcNode, _funcName, _context) {
  if (S && S.funcNode === funcNode) {
    // Derive cyclomatic complexity from CFG: E - N + 2
    const cyclomatic = S.edges.length - S.blocks.length + 2;
    results.push({
      funcNode: S.funcNode,
      blocks: S.blocks,
      edges: S.edges,
      cyclomatic: Math.max(cyclomatic, 1),
    });
  }

  // Pop to parent function state (if nested)
  S = funcStateStack.length > 0 ? funcStateStack.pop() : null;
},
funcStateStack is dead code; nested functions silently skipped in file-level walks
The funcStateStack push (line 710-712) and pop (line 773) pattern is designed to handle nested functions, but will never execute in practice when using file-level walks. Here's why:
In walkWithVisitors, when a function node is processed:
1. enterFunction is called at depth d
2. enterNode is called immediately after and returns { skipChildren: true }
3. This sets skipDepths[cfgVisitorIndex] = d, causing all descendants (depth > d) to be skipped for the CFG visitor
4. Any nested function at depth > d will have isSkipped(cfgVisitorIndex, depth) return true
5. Therefore, enterFunction is never called for nested functions (line 89 in visitor.js checks !isSkipped())
Consequence: In buildCFGData (cfg.js), when using file-level walks on WASM trees, nested function definitions tracked as separate entries in symbols.definitions will silently receive no CFG data. Functions like:
function outer() {
  function inner() { /* ... */ } // Will not get CFG computed
}

Note: The 23 parity tests verify CFG correctness for top-level functions, but none cover nested function definitions.
To support nested-function CFG, consider removing skipChildren and instead handling nesting inside enterFunction via the state stack so each nested function is processed independently.
Rule Used: CLAUDE.md (source)
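The skip mechanism described in this comment can be reproduced with a toy walker. This is a minimal sketch and not the project's `walkWithVisitors`: the function `walk`, the single shared `skipDepth` cell, and the plain-object tree are all assumptions made for illustration. It only mimics the behaviour where returning `{ skipChildren: true }` from `enterNode` suppresses every hook for deeper nodes.

```javascript
// Toy walker (hypothetical): mimics skipChildren for a single visitor.
function walk(node, visitor, depth = 0, skipDepth = { value: -1 }) {
  // A node is skipped when an ancestor asked to skip its children.
  const skipped = skipDepth.value !== -1 && depth > skipDepth.value;
  if (!skipped) {
    if (node.type === 'function') visitor.enterFunction(node);
    const r = visitor.enterNode(node);
    if (r && r.skipChildren) skipDepth.value = depth;
  }
  for (const child of node.children ?? []) {
    walk(child, visitor, depth + 1, skipDepth);
  }
  // Leaving the node that requested the skip re-enables the visitor.
  if (skipDepth.value === depth) skipDepth.value = -1;
}

function collectEntered(tree, skipChildren) {
  const entered = [];
  walk(tree, {
    enterFunction: (n) => entered.push(n.name),
    enterNode: (n) =>
      skipChildren && n.type === 'function' ? { skipChildren: true } : undefined,
  });
  return entered;
}

const tree = {
  type: 'program',
  children: [
    {
      type: 'function',
      name: 'outer',
      children: [{ type: 'function', name: 'inner', children: [] }],
    },
  ],
};

const withSkip = collectEntered(tree, true); // only 'outer' is entered
const withoutSkip = collectEntered(tree, false); // 'outer', then 'inner'
console.log({ withSkip, withoutSkip });
```

With the skip in place a state stack never sees a second push, which is why the comment calls it dead code. Without the skip, each nested function triggers its own `enterFunction` and the stack becomes necessary.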
if (def.complexity && cfgResult.cyclomatic != null) {
  def.complexity.cyclomatic = cfgResult.cyclomatic;
}
maintainabilityIndex not recomputed when cyclomatic is overridden
The maintainabilityIndex is computed at lines 237–253 using the complexity visitor's cyclomatic value and stored in def.complexity. When the CFG visitor's cyclomatic (E - N + 2) overrides this value at line 279, the stored MI is now inconsistent — it was computed from the old cyclomatic but stored alongside the new cyclomatic.
If the two cyclomatic computations ever diverge (language-specific edge cases, CFG construction differences, or bugs in either implementation), the MI will not reflect the stored cyclomatic value.
Fix: Recompute maintainabilityIndex whenever cyclomatic is overridden:
if (def.complexity && cfgResult.cyclomatic != null) {
  def.complexity.cyclomatic = cfgResult.cyclomatic;
  // Recompute MI with the updated cyclomatic to keep metrics consistent
  const { loc, halstead } = def.complexity;
  const volume = halstead ? halstead.volume : 0;
  const commentRatio = loc?.loc > 0 ? loc.commentLines / loc.loc : 0;
  def.complexity.maintainabilityIndex = computeMaintainabilityIndex(
    volume, cfgResult.cyclomatic, loc?.sloc ?? 0, commentRatio,
  );
}

Rule Used: CLAUDE.md (source)
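To see why a stale MI matters, here is a minimal sketch that assumes the classic SEI-style maintainability formula. The function `sketchMaintainabilityIndex` is hypothetical and is not the project's `computeMaintainabilityIndex`; only the argument order is borrowed from the snippet above. The coefficients, the comment term, and the 0 to 100 normalization are assumptions.

```javascript
// Hypothetical stand-in for the project's MI function (assumed formula):
//   MI = 171 - 5.2 ln(V) - 0.23 G - 16.2 ln(SLOC) + 50 sin(sqrt(2.4 * CM))
// normalized to the 0..100 range.
function sketchMaintainabilityIndex(volume, cyclomatic, sloc, commentRatio) {
  const lnVolume = Math.log(Math.max(volume, 1)); // guard against ln(0)
  const lnSloc = Math.log(Math.max(sloc, 1));
  const raw =
    171 -
    5.2 * lnVolume -
    0.23 * cyclomatic -
    16.2 * lnSloc +
    50 * Math.sin(Math.sqrt(2.4 * commentRatio));
  return Math.max(0, Math.min(100, (raw * 100) / 171));
}

// Illustrative metrics; the numbers are made up.
const complexity = {
  cyclomatic: 3,
  halstead: { volume: 250 },
  loc: { loc: 25, sloc: 20, commentLines: 5 },
};
const commentRatio = complexity.loc.commentLines / complexity.loc.loc;

// MI computed from the complexity visitor's cyclomatic value.
const staleMi = sketchMaintainabilityIndex(
  complexity.halstead.volume, complexity.cyclomatic, complexity.loc.sloc, commentRatio,
);

// The CFG-derived value overrides cyclomatic, so MI must be recomputed.
complexity.cyclomatic = 5;
const freshMi = sketchMaintainabilityIndex(
  complexity.halstead.volume, complexity.cyclomatic, complexity.loc.sloc, commentRatio,
);

console.log({ staleMi, freshMi });
```

Under this assumed formula the two values differ whenever the two cyclomatic computations disagree, which is the inconsistency the comment describes.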
Remove skipChildren from cfg-visitor's enterNode so the walker recurses into nested function definitions, enabling the funcStateStack to work as designed.

Recompute maintainabilityIndex in engine.js whenever the CFG-derived cyclomatic overrides the complexity visitor's value, keeping metrics consistent.

Addresses Greptile review feedback on PR #392.

Impact: 3 functions changed, 1 affected
Claude finished @carlos-alm's task

PR Review: CFG Visitor Rewrite Analysis

Review Checklist

Root Cause Analysis

Problem Being Solved: Eliminate the last redundant tree traversal (Mode B) by making CFG participate in the unified AST walk. The statement-level CFG traversal was incompatible with the node-level visitor pattern used by other analyses.

Critical Concerns (if any)

✅ Configuration Changes: CLEAN
No configuration files modified - no risk of silencing errors through relaxed linting/build rules.

✅ Test Integrity: ROBUST
Strong points:

✅ Code Quality: SOLID ARCHITECTURE
Algorithmic soundness:
Implementation quality:
Potential concerns mitigated:

✅ Greptile Issues: RESOLVED
Both concerns raised by Greptile have been addressed by commit 8434d49:

Final Assessment
Minor suggestions for future consideration:
Addressed both review comments in 8434d49:
1. Nested function CFG (…)
2. MI recomputation (…)
src/cfg.js (Outdated)

const r = cfgResults[0];
return { blocks: r.blocks, edges: r.edges, cyclomatic: r.cyclomatic };
cfgResults[0] returns the first-exited (innermost nested) function in DFS post-order, not the passed function. For example, calling buildFunctionCFG on:
function outer() {
  function inner() { return 1; }
  return inner();
}

will return inner's CFG instead of outer's, because exitFunction is called deepest-first.

Use node identity to match the result:

- const r = cfgResults[0];
- return { blocks: r.blocks, edges: r.edges, cyclomatic: r.cyclomatic };
+ const r = cfgResults.find((result) => result.funcNode === functionNode);
+ if (!r) return { blocks: [], edges: [], cyclomatic: 0 };
+ return { blocks: r.blocks, edges: r.edges, cyclomatic: r.cyclomatic };
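The post-order effect behind this suggestion can be shown in isolation. In the sketch below, `collectOnExit` and the plain-object nodes are hypothetical stand-ins for the visitor and the tree-sitter nodes. Only the "push on exit, look up by identity" idea comes from the review comment.

```javascript
// Hypothetical stand-in for the visitor: results are pushed when a
// function node is exited, so nested functions land in the array first.
function collectOnExit(node, results = []) {
  for (const child of node.children) collectOnExit(child, results);
  results.push({ funcNode: node, blocks: [], edges: [], cyclomatic: 1 });
  return results;
}

const innerNode = { name: 'inner', children: [] };
const outerNode = { name: 'outer', children: [innerNode] };

const cfgResults = collectOnExit(outerNode);

// Index 0 is the innermost function, not the one that was passed in.
const firstName = cfgResults[0].funcNode.name;

// Matching by node identity selects the requested function regardless
// of where it sits in the results array.
function pickResult(results, functionNode) {
  const r = results.find((result) => result.funcNode === functionNode);
  if (!r) return { blocks: [], edges: [], cyclomatic: 0 };
  return { blocks: r.blocks, edges: r.edges, cyclomatic: r.cyclomatic };
}

const picked = cfgResults.find((r) => r.funcNode === outerNode);
const missing = pickResult(cfgResults, { name: 'unrelated', children: [] });
console.log(firstName, picked.funcNode.name, missing.cyclomatic);
```

Identity comparison works here because the same node object is passed to the walker and stored in the result. It would not work across separately parsed trees.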
const cfgByLine = new Map();
for (const r of cfgResults) {
  if (r.funcNode) {
    const line = r.funcNode.startPosition.row + 1;
    cfgByLine.set(line, r);
  }
Line-number keying for CFG result matching has a collision risk: if two functions start on the same source line (e.g., const f = () => {}, g = () => {};), the second will silently overwrite the first in the map. The wrong CFG and cyclomatic would then be assigned to one definition.
Consider disambiguating by column offset as a secondary key, similar to how source maps handle multiple items per line.
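A minimal sketch of the column-offset idea follows. It assumes each result's `funcNode` exposes `startPosition.row` and `startPosition.column`, as tree-sitter nodes do. The helper names and the `line:column` string key are hypothetical, and whether the definition records carry a column to look up with is not shown in this thread.

```javascript
// Line-only keying: later entries overwrite earlier ones on the same line.
function indexByLine(cfgResults) {
  const byLine = new Map();
  for (const r of cfgResults) {
    if (r.funcNode) byLine.set(r.funcNode.startPosition.row + 1, r);
  }
  return byLine;
}

// Hypothetical alternative: composite "line:column" key.
function indexByPosition(cfgResults) {
  const byPos = new Map();
  for (const r of cfgResults) {
    if (!r.funcNode) continue;
    const { row, column } = r.funcNode.startPosition;
    byPos.set(`${row + 1}:${column}`, r);
  }
  return byPos;
}

// Fake results for: const f = () => {}, g = () => {};
// The two arrow functions start at columns 10 and 24 of the same row.
const sameLineResults = [
  { funcNode: { startPosition: { row: 0, column: 10 } }, cyclomatic: 1 },
  { funcNode: { startPosition: { row: 0, column: 24 } }, cyclomatic: 1 },
];

const lineCount = indexByLine(sameLineResults).size; // one entry survives
const positionCount = indexByPosition(sameLineResults).size; // both kept
console.log({ lineCount, positionCount });
```

If a column is not available on the definition side, grouping results into an array per line and disambiguating by name is an alternative that avoids the overwrite.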
- buildFunctionCFG: use .find() with node identity instead of cfgResults[0] so nested functions don't shadow the target function's CFG
- engine.js: replace line-only Map keying with line → results[] grouping, disambiguating by function name when multiple functions share a line

Addresses Greptile re-review feedback on PR #392.

Impact: 2 functions changed, 0 affected
Claude finished @carlos-alm's task

PR Review: CFG Visitor Rewrite Analysis
Addressed both new review comments in 7418b6c:
1. …
2. Line-number keying collision (…)
@@ -931,16 +212,13 @@ export async function buildCFGData(db, fileSymbols, rootDir, _engineOpts) {
       const row = getNodeId.get(def.name, relPath, def.line);
       if (!row) continue;

-      // Native path: use pre-computed CFG from Rust engine
+      // Use pre-computed CFG (native engine or unified walk), then visitor fallback
       let cfg = null;
       if (def.cfg?.blocks?.length) {
         cfg = def.cfg;
-      } else {
-        // WASM fallback: compute CFG from tree-sitter AST
-        if (!tree || !complexityRules) continue;
-        const funcNode = findFunctionNode(tree.rootNode, def.line, def.endLine, complexityRules);
-        if (!funcNode) continue;
-        cfg = buildFunctionCFG(funcNode, langId);
+      } else if (visitorCfgByLine) {
+        const r = visitorCfgByLine.get(def.line);
+        if (r) cfg = { blocks: r.blocks, edges: r.edges };
Line-number collision in visitorCfgByLine — no disambiguation
visitorCfgByLine.set(r.funcNode.startPosition.row + 1, r) uses a plain .set() that silently overwrites any prior entry at the same line. For a file containing two functions declared on the same source line (e.g. const f = () => {}, g = () => {};), whichever result is stored last wins, and the other function will receive no CFG data in the WASM fallback path.
engine.js (lines 272–282) was updated in this very PR to address the same pattern: it now accumulates an array per line and disambiguates by function name. The buildCFGData fallback was not given the same treatment.
Consider applying the same array-plus-name-disambiguation approach here:
visitorCfgByLine = new Map();
for (const r of cfgResults) {
  if (r.funcNode) {
    const line = r.funcNode.startPosition.row + 1;
    if (!visitorCfgByLine.has(line)) visitorCfgByLine.set(line, []);
    visitorCfgByLine.get(line).push(r);
  }
}

Then at the lookup site (line 220):

const candidates = visitorCfgByLine?.get(def.line);
const r = !candidates
  ? undefined
  : candidates.length === 1
    ? candidates[0]
    : (candidates.find((c) => {
        const n = c.funcNode.childForFieldName('name');
        return n && n.text === def.name;
      }) ?? candidates[0]);
if (r) cfg = { blocks: r.blocks, edges: r.edges };
Summary
- createCfgVisitor in src/ast-analysis/visitors/cfg-visitor.js replaces the 813-line standalone buildFunctionCFG implementation. All 4 analyses (AST, complexity, CFG, dataflow) now run in a single DFS walk via walkWithVisitors; the Mode A/B split is eliminated.
- Cyclomatic complexity is derived from the CFG (E - N + 2) as single source of truth, overriding the independently-computed value from the complexity visitor.
- buildCFGData WASM fallback uses a file-level visitor walk instead of per-function findFunctionNode calls.
- cfg.js reduced from 1242 to 518 lines (-724 lines).

Test plan
- … buildFunctionCFG
- … E - N + 2 derivation for various control flow patterns