fix(core): deterministic normalization for parallel batch analyzer output by fishinakleinbottle · Pull Request #60 · Lum1104/Understand-Anything

fishinakleinbottle · 2026-03-31T05:38:36Z

Summary

Adds normalize-graph.ts with normalizeNodeId, normalizeComplexity, and normalizeBatchOutput utilities
Rewrites Phase 3 (ASSEMBLE) in SKILL.md with a 6-step normalization sequence that runs before deduplication and dangling-edge cleanup
Strengthens the file-analyzer prompt with explicit ID format warnings
32 new normalization tests + 2 schema boundary tests

Reviewer focus

Start with packages/core/src/analyzer/normalize-graph.ts — that's the core logic. In particular:

stripToValidPrefix() (line 36) handles the recursive prefix-peeling — verify the loop correctly handles project-name, double-prefix, and bare-path cases
normalizeBatchOutput() (line 133) orchestrates everything — check the node-first-then-edges ordering and the ID map rewriting

The SKILL.md Phase 3 rewrite and prompt change are straightforward and follow the existing Phase 4/5 normalization patterns.

Test plan

pnpm --filter @understand-anything/core build passes
pnpm --filter @understand-anything/core test — all 191 tests pass
Run /understand --full on a multi-file project and verify no dropped nodes or edges in the dashboard

Note

Medium Risk
Touches graph assembly semantics (ID canonicalization and edge rewrites), which can change node/edge identity and deduplication behavior; covered by new unit tests but could still affect downstream consumers relying on prior IDs.

Overview
Introduces analyzer/normalize-graph.ts with normalizeNodeId, normalizeComplexity, and normalizeBatchOutput to canonicalize node IDs (strip project/double prefixes, add missing file:/func:/class: prefixes), coerce numeric/aliased complexity to simple|moderate|complex, and rewrite + dedupe edges while dropping dangling references and reporting stats.

Exports these utilities from packages/core/src/index.ts, adds comprehensive normalization tests plus a schema test asserting node IDs may be bare strings, and updates /understand docs/prompts to explicitly warn about invalid ID formats and define the new Phase 3 normalization sequence before dedupe/dangling-edge cleanup.

^{Written by Cursor Bugbot for commit 60c2ead. This will update automatically on new commits. Configure here.}

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

cursor · 2026-03-31T05:46:11Z

understand-anything-plugin/packages/core/src/analyzer/normalize-graph.ts

+    const oldSource = String(raw.source ?? "");
+    const oldTarget = String(raw.target ?? "");
+    const newSource = idMap.get(oldSource) ?? oldSource;
+    const newTarget = idMap.get(oldTarget) ?? oldTarget;


Edge source/target IDs not normalized before lookup

High Severity

Edge source/target IDs are only resolved via idMap.get(oldSource) ?? oldSource, but idMap only contains original node IDs as keys. When an edge references a node using a different malformed format than the node's original ID (e.g., edge uses "proj:file:src/a.ts" but the node was "file:src/a.ts"), the lookup misses, and the raw malformed ID falls through — failing the validNodeIds check and being silently dropped as dangling. The edge source/target needs to be independently normalized (via stripToValidPrefix) before the idMap fallback. The test "deduplicates edges after ID rewriting" passes for the wrong reason — the second edge is dropped as dangling, not deduplicated — masking this bug.

Additional Locations (1)

understand-anything-plugin/packages/core/src/__tests__/normalize-graph.test.ts#L290-L331

Lum1104 · 2026-03-31T07:32:17Z

@codex review this

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa8a194652

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-31T07:35:56Z

understand-anything-plugin/packages/core/src/analyzer/normalize-graph.ts

+    const newSource = idMap.get(oldSource) ?? oldSource;
+    const newTarget = idMap.get(oldTarget) ?? oldTarget;


Rewrite non-canonical edge IDs before dangling-edge pruning

normalizeBatchOutput only rewrites edge endpoints through exact idMap lookups, so endpoints that are malformed-but-normalizable (for example proj:file:src/a.ts when the node exists as file:src/a.ts) are left unchanged and then dropped as dangling. In parallel batch merges this can silently remove valid relationships whenever edge IDs differ in formatting from node IDs, which defeats the normalization pass's purpose.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-31T07:35:56Z

understand-anything-plugin/packages/core/src/analyzer/normalize-graph.ts

+  if (prefix) {
+    return `${prefix}:${path}`;


Enforce node-type prefix when canonicalizing node IDs

When a node ID already has any valid prefix, normalizeNodeId returns it unchanged even if that prefix contradicts node.type (for example a function node with a file: ID). This leaves non-canonical IDs in place and can cause collisions during later node deduplication and edge rewriting, turning recoverable model output errors into dropped or miswired graph entities.

Useful? React with 👍 / 👎.

…r output Parallel file-analyzer subagents can produce inconsistent node IDs (project-name prefixed, double-prefixed, bare paths) and invalid complexity values. Phase 3 ASSEMBLE now normalizes these deterministically before merging, preventing cascading edge drops and dashboard load failures. - Add normalize-graph.ts with normalizeNodeId, normalizeComplexity, and normalizeBatchOutput utilities - Rewrite SKILL.md Phase 3 with 6-step normalization sequence - Strengthen file-analyzer prompt with ID format warnings - Add 32 normalization tests and 2 schema boundary tests

fishinakleinbottle · 2026-03-31T12:07:49Z

@Lum1104 there were some redundant changes in this PR - like VALID_NODE_TYPES, VALID_EDGE_TYPES which were already being checked for, and normalizing batch output which is also taken care of.

I'll close this and create a new PR with just new changes.

fishinakleinbottle self-assigned this Mar 31, 2026

fishinakleinbottle linked an issue Mar 31, 2026 that may be closed by this pull request

Parallel file-analyzer subagents can produce inconsistent node IDs and invalid complexity values #59

Closed

cursor bot reviewed Mar 31, 2026

View reviewed changes

fishinakleinbottle force-pushed the fix/normalization-for-generated-graph branch from f579b43 to aa8a194 Compare March 31, 2026 05:55

chatgpt-codex-connector bot reviewed Mar 31, 2026

View reviewed changes

fishinakleinbottle force-pushed the fix/normalization-for-generated-graph branch from 9da0166 to 60c2ead Compare March 31, 2026 11:05

fishinakleinbottle closed this Mar 31, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(core): deterministic normalization for parallel batch analyzer output#60

fix(core): deterministic normalization for parallel batch analyzer output#60
fishinakleinbottle wants to merge 1 commit intoLum1104:mainfrom
fishinakleinbottle:fix/normalization-for-generated-graph

fishinakleinbottle commented Mar 31, 2026 •

edited by cursor bot

Loading

Uh oh!

cursor bot left a comment

Uh oh!

cursor bot Mar 31, 2026

Uh oh!

Lum1104 commented Mar 31, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 31, 2026

Uh oh!

chatgpt-codex-connector bot Mar 31, 2026

Uh oh!

fishinakleinbottle commented Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		const newSource = idMap.get(oldSource) ?? oldSource;
		const newTarget = idMap.get(oldTarget) ?? oldTarget;

Conversation

fishinakleinbottle commented Mar 31, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Reviewer focus

Test plan

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Mar 31, 2026

Choose a reason for hiding this comment

Edge source/target IDs not normalized before lookup

Uh oh!

Lum1104 commented Mar 31, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

fishinakleinbottle commented Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fishinakleinbottle commented Mar 31, 2026 •

edited by cursor bot

Loading