fix(core): deterministic normalization for parallel batch analyzer output #60
Cursor Bugbot has reviewed your changes and found 1 potential issue.
const oldSource = String(raw.source ?? "");
const oldTarget = String(raw.target ?? "");
const newSource = idMap.get(oldSource) ?? oldSource;
const newTarget = idMap.get(oldTarget) ?? oldTarget;
Edge source/target IDs not normalized before lookup
High Severity
Edge source/target IDs are only resolved via idMap.get(oldSource) ?? oldSource, but idMap only contains original node IDs as keys. When an edge references a node using a different malformed format than the node's original ID (e.g., edge uses "proj:file:src/a.ts" but the node was "file:src/a.ts"), the lookup misses, and the raw malformed ID falls through — failing the validNodeIds check and being silently dropped as dangling. The edge source/target needs to be independently normalized (via stripToValidPrefix) before the idMap fallback. The test "deduplicates edges after ID rewriting" passes for the wrong reason — the second edge is dropped as dangling, not deduplicated — masking this bug.
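The fix suggested above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `canonicalizeEndpoint` is a hypothetical, simplified stand-in for stripToValidPrefix (whose real signature is not shown in this thread), and the prefix set is taken from the file:/func:/class: prefixes named in the PR overview.

```typescript
// Hypothetical sketch; names and behavior are assumptions, not the PR's code.
const ENDPOINT_PREFIXES = new Set<string>(["file", "func", "class"]);

// Simplified stand-in for stripToValidPrefix: drop leading segments (such as
// a project name) until the ID starts with a known prefix. If no known prefix
// is found, the ID is returned unchanged.
function canonicalizeEndpoint(id: string): string {
  const parts = id.split(":");
  for (let i = 0; i < parts.length - 1; i++) {
    if (ENDPOINT_PREFIXES.has(parts[i])) {
      return parts.slice(i).join(":");
    }
  }
  return id;
}

// Resolve an edge endpoint: try an exact idMap hit first, then look up the
// canonicalized form, and finally fall back to the canonicalized form itself.
function resolveEndpoint(rawId: string, idMap: Map<string, string>): string {
  const exact = idMap.get(rawId);
  if (exact !== undefined) return exact;
  const canonical = canonicalizeEndpoint(rawId);
  return idMap.get(canonical) ?? canonical;
}

const idMap = new Map<string, string>([["file:src/a.ts", "file:src/a.ts"]]);
const validNodeIds = new Set<string>(idMap.values());

// The project-prefixed endpoint now resolves to an existing node ID.
const resolved = resolveEndpoint("proj:file:src/a.ts", idMap);
console.log(resolved, validNodeIds.has(resolved));
```

With this ordering, an endpoint that differs from its node only in formatting reaches the validNodeIds check in canonical form instead of being dropped as dangling.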
f579b43 to aa8a194
@codex review this
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aa8a194652
const newSource = idMap.get(oldSource) ?? oldSource;
const newTarget = idMap.get(oldTarget) ?? oldTarget;
Rewrite non-canonical edge IDs before dangling-edge pruning
normalizeBatchOutput only rewrites edge endpoints through exact idMap lookups, so endpoints that are malformed-but-normalizable (for example proj:file:src/a.ts when the node exists as file:src/a.ts) are left unchanged and then dropped as dangling. In parallel batch merges this can silently remove valid relationships whenever edge IDs differ in formatting from node IDs, which defeats the normalization pass's purpose.
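To make silent drops like this visible, the pruning step could count duplicates and dangling edges separately, so a test can assert which path an edge took. The sketch below is hypothetical: the `Edge` shape (including its `type` field), `PruneStats`, and `pruneEdges` are illustrative names, and it assumes endpoints have already been rewritten before pruning.

```typescript
// Hypothetical sketch; shapes and names are assumptions, not the PR's API.
interface Edge {
  source: string;
  target: string;
  type: string;
}

interface PruneStats {
  kept: number;
  deduplicated: number;
  dangling: number;
}

// Prune edges after endpoint rewriting, tracking why each edge was removed.
function pruneEdges(
  edges: Edge[],
  validNodeIds: Set<string>,
): { edges: Edge[]; stats: PruneStats } {
  const seen = new Set<string>();
  const kept: Edge[] = [];
  const stats: PruneStats = { kept: 0, deduplicated: 0, dangling: 0 };
  for (const edge of edges) {
    // An edge is dangling when either endpoint is not a known node.
    if (!validNodeIds.has(edge.source) || !validNodeIds.has(edge.target)) {
      stats.dangling++;
      continue;
    }
    // JSON.stringify of a tuple gives an unambiguous composite key.
    const key = JSON.stringify([edge.source, edge.target, edge.type]);
    if (seen.has(key)) {
      stats.deduplicated++;
      continue;
    }
    seen.add(key);
    kept.push(edge);
    stats.kept++;
  }
  return { edges: kept, stats };
}

const nodeIds = new Set<string>(["file:src/a.ts", "file:src/b.ts"]);
const pruned = pruneEdges(
  [
    { source: "file:src/a.ts", target: "file:src/b.ts", type: "imports" },
    { source: "file:src/a.ts", target: "file:src/b.ts", type: "imports" },
    { source: "proj:file:src/a.ts", target: "file:src/b.ts", type: "imports" },
  ],
  nodeIds,
);
console.log(pruned.stats);
```

In this input the third edge is counted as dangling rather than deduplicated, which is the distinction a test would need to assert on to catch the un-normalized endpoint.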
if (prefix) {
  return `${prefix}:${path}`;
Enforce node-type prefix when canonicalizing node IDs
When a node ID already has any valid prefix, normalizeNodeId returns it unchanged even if that prefix contradicts node.type (for example a function node with a file: ID). This leaves non-canonical IDs in place and can cause collisions during later node deduplication and edge rewriting, turning recoverable model output errors into dropped or miswired graph entities.
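One possible shape for the check described above is sketched here. Everything in it is an assumption for illustration: the type-to-prefix table, the `canonicalNodeId` name, and the choice to rewrite a contradicting prefix rather than reject the node. It also assumes project-name and double prefixes have already been peeled.

```typescript
// Hypothetical sketch; the mapping and the repair policy are assumptions.
const PREFIX_BY_TYPE = new Map<string, string>([
  ["file", "file"],
  ["function", "func"],
  ["class", "class"],
]);
const KNOWN_ID_PREFIXES = new Set<string>(["file", "func", "class"]);

// Return an ID whose prefix agrees with the node's type.
function canonicalNodeId(id: string, nodeType: string): string {
  const expected = PREFIX_BY_TYPE.get(nodeType);
  // Unknown node types: leave the ID untouched rather than guess.
  if (expected === undefined) return id;
  const colon = id.indexOf(":");
  const head = colon === -1 ? "" : id.slice(0, colon);
  // Strip a leading known prefix (right or wrong), then apply the expected one.
  const rest = KNOWN_ID_PREFIXES.has(head) ? id.slice(colon + 1) : id;
  return `${expected}:${rest}`;
}

console.log(canonicalNodeId("file:src/a.ts:doThing", "function"));
console.log(canonicalNodeId("src/a.ts", "file"));
```

Whether to rewrite the prefix or surface the mismatch as an error is a design decision this thread does not settle; rewriting keeps the graph loadable, while reporting makes the model error easier to notice.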
…r output

Parallel file-analyzer subagents can produce inconsistent node IDs (project-name prefixed, double-prefixed, bare paths) and invalid complexity values. Phase 3 ASSEMBLE now normalizes these deterministically before merging, preventing cascading edge drops and dashboard load failures.
- Add normalize-graph.ts with normalizeNodeId, normalizeComplexity, and normalizeBatchOutput utilities
- Rewrite SKILL.md Phase 3 with 6-step normalization sequence
- Strengthen file-analyzer prompt with ID format warnings
- Add 32 normalization tests and 2 schema boundary tests
9da0166 to 60c2ead
@Lum1104 there were some redundant changes in this PR, so I'll close this and create a new PR with just the new changes.


Summary
- normalize-graph.ts with normalizeNodeId, normalizeComplexity, and normalizeBatchOutput utilities

Reviewer focus
Start with packages/core/src/analyzer/normalize-graph.ts; that's the core logic. In particular:
- stripToValidPrefix() (line 36) handles the recursive prefix-peeling: verify the loop correctly handles project-name, double-prefix, and bare-path cases
- normalizeBatchOutput() (line 133) orchestrates everything: check the node-first-then-edges ordering and the ID map rewriting

The SKILL.md Phase 3 rewrite and prompt change are straightforward and follow the existing Phase 4/5 normalization patterns.
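For reviewers checking the prefix-peeling cases, a minimal loop that covers project-name, double-prefix, and bare-path inputs might look like the sketch below. It is not the code at line 36: `peelPrefixes`, the `Peeled` return shape, and the prefix set are assumptions made for illustration.

```typescript
// Hypothetical sketch of a prefix-peeling loop; the real stripToValidPrefix
// signature and return shape are not shown in this thread.
const PEEL_PREFIXES = new Set<string>(["file", "func", "class"]);

interface Peeled {
  prefix: string | null;
  path: string;
}

function peelPrefixes(id: string): Peeled {
  let prefix: string | null = null;
  let rest = id;
  for (;;) {
    const colon = rest.indexOf(":");
    if (colon === -1) break; // no more segments to peel
    const head = rest.slice(0, colon);
    if (PEEL_PREFIXES.has(head)) {
      prefix = head; // remember the innermost valid prefix seen so far
      rest = rest.slice(colon + 1);
    } else if (prefix === null) {
      rest = rest.slice(colon + 1); // leading unknown segment: treat as project name and drop it
    } else {
      break; // the colon belongs to the path itself
    }
  }
  return { prefix, path: rest };
}

console.log(peelPrefixes("proj:file:src/a.ts"));
console.log(peelPrefixes("file:file:src/a.ts"));
console.log(peelPrefixes("src/a.ts"));
```

One caveat of this particular sketch: an ID with an unknown leading segment and no valid prefix at all loses that segment, so a real implementation may want to be more conservative there.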
Test plan
- pnpm --filter @understand-anything/core build passes
- pnpm --filter @understand-anything/core test: all 191 tests pass
- /understand --full on a multi-file project and verify no dropped nodes or edges in the dashboard

Note
Medium Risk
Touches graph assembly semantics (ID canonicalization and edge rewrites), which can change node/edge identity and deduplication behavior; covered by new unit tests but could still affect downstream consumers relying on prior IDs.
Overview
Introduces analyzer/normalize-graph.ts with normalizeNodeId, normalizeComplexity, and normalizeBatchOutput to canonicalize node IDs (strip project/double prefixes, add missing file:/func:/class: prefixes), coerce numeric/aliased complexity to simple|moderate|complex, and rewrite + dedupe edges while dropping dangling references and reporting stats.

Exports these utilities from packages/core/src/index.ts, adds comprehensive normalization tests plus a schema test asserting node IDs may be bare strings, and updates /understand docs/prompts to explicitly warn about invalid ID formats and define the new Phase 3 normalization sequence before dedupe/dangling-edge cleanup.

Written by Cursor Bugbot for commit 60c2ead. This will update automatically on new commits.
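The complexity coercion mentioned in the overview could be sketched as follows. The alias table, the numeric cut-offs, and the "moderate" fallback are illustrative assumptions; the PR text does not state which aliases or thresholds normalizeComplexity actually uses.

```typescript
type Complexity = "simple" | "moderate" | "complex";

// Hypothetical alias table; the real set of accepted aliases is not shown.
const COMPLEXITY_ALIASES = new Map<string, Complexity>([
  ["simple", "simple"],
  ["low", "simple"],
  ["easy", "simple"],
  ["moderate", "moderate"],
  ["medium", "moderate"],
  ["complex", "complex"],
  ["high", "complex"],
  ["hard", "complex"],
]);

// Coerce a raw model-produced value to one of the three allowed levels.
function coerceComplexity(
  value: unknown,
  fallback: Complexity = "moderate",
): Complexity {
  if (typeof value === "number" && isFinite(value)) {
    // Assumed cut-offs, chosen only for illustration.
    if (value <= 3) return "simple";
    if (value <= 7) return "moderate";
    return "complex";
  }
  if (typeof value === "string") {
    const hit = COMPLEXITY_ALIASES.get(value.trim().toLowerCase());
    if (hit !== undefined) return hit;
  }
  // Anything unrecognized falls back to a fixed level (an assumption).
  return fallback;
}

console.log(coerceComplexity("HIGH"), coerceComplexity(2), coerceComplexity(null));
```

Using a Map rather than a plain object for the alias table avoids accidental hits on inherited keys such as "constructor" when the input comes from untrusted model output.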