
feat: inline cross-reference hyperlinks in law reader #20

Open
daffaromero wants to merge 19 commits into ilhamfp:main from daffaromero:feature/crossref-links

daffaromero commented Feb 27, 2026

Summary

  • Adds Wikipedia-style inline cross-reference links to pasal body text in the law reader
  • Intra-document "Pasal N" / "Pasal N ayat (M)" references become anchor links that scroll to the target pasal and trigger the existing `HashHighlighter` green flash
  • Cross-document UU/PP/Perpres/Perppu/Perda citations are resolved server-side against a `worksLookup` map and rendered as link components that navigate to the correct law page; unresolvable citations fall back to plain text (no broken links)
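
The tokenizing step behind these bullets can be illustrated with a minimal sketch. The token shapes, the `REF_SOURCE` pattern, and the `tokenize` name are assumptions made for illustration; the PR's actual `Token` union and `CROSSREF_RE` in `src/lib/crossref.ts` may differ and cover more citation types.

```typescript
// Hypothetical token shapes; the real `Token` union may carry other fields.
type Token =
  | { kind: "text"; value: string }
  | { kind: "pasal"; value: string; pasal: string; ayat?: string }
  | { kind: "citation"; value: string; type: string; number: string; year: string };

// Assumed pattern: "Pasal N [ayat (M)]" or "<Type> [Nomor] N Tahun YYYY".
const REF_SOURCE =
  "Pasal\\s+(\\d+)(?:\\s+ayat\\s+\\((\\d+)\\))?" +
  "|(Undang-Undang|Peraturan Pemerintah|Peraturan Presiden)" +
  "\\s+(?:Nomor\\s+)?(\\d+)\\s+Tahun\\s+(\\d{4})";

function tokenize(text: string): Token[] {
  // Fresh RegExp per call so `lastIndex` state is never shared between calls.
  const re = new RegExp(REF_SOURCE, "gi");
  const tokens: Token[] = [];
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    // Emit any plain text between the previous match and this one.
    if (m.index > last) {
      tokens.push({ kind: "text", value: text.slice(last, m.index) });
    }
    const [whole, pasal, ayat, type = "", num = "", year = ""] = m;
    if (pasal !== undefined) {
      tokens.push({ kind: "pasal", value: whole, pasal, ayat });
    } else {
      tokens.push({ kind: "citation", value: whole, type, number: num, year });
    }
    last = m.index + whole.length;
  }
  // Trailing text after the last reference.
  if (last < text.length) {
    tokens.push({ kind: "text", value: text.slice(last) });
  }
  return tokens;
}
```

A renderer can then switch on `kind`: `pasal` tokens become in-page anchor links, `citation` tokens become links only when they resolve, and `text` tokens are emitted unchanged.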

Note: This branch is rebased onto `fix/structured-law-pasal-loading` (PR #21) to avoid a merge conflict and regression. Either order of merging is now safe — both PRs agree on the `!hasStructure` pagination guard.

Changes

| File | Change |
| --- | --- |
| `src/lib/crossref.ts` | Pure tokenizer that parses legal citation patterns into a `Token` discriminated union |
| `src/lib/__tests__/crossref.test.ts` | 20 Vitest tests covering all token types, edge cases, Perpu/Perppu aliases, Perpres, Perda, optional Nomor keyword, mixed references, and unknown types |
| `src/components/reader/RichPasalContent.tsx` | `"use client"` component that renders tokenized text as links or plain fragments |
| `src/app/[locale]/peraturan/[type]/[slug]/page.tsx` | Adds `worksLookup` map (built server-side from all works, passed to reader); preserves `!hasStructure` pagination guard from PR #21 |
| `src/components/reader/PasalList.tsx` | Threads `worksLookup` prop down to `PasalBlock` |
| `src/components/reader/PasalBlock.tsx` | Replaces plain-text rendering with `RichPasalContent` |

Bugs fixed

| Bug | Fix |
| --- | --- |
| `citationToSlugKey()` emitted `uu-no-13-tahun-2003` but DB trigger `generate_work_slug()` produces `uu-13-2003`, so cross-document links were silently falling back to plain text 100% of the time | Fixed `citationToSlugKey()` to emit `{type}-{number}-{year}` format; updated test fixtures to match |
| Scanned PDFs use all-caps `UNDANG-UNDANG` / `PASAL`, which the original regex did not match | Added `i` flag to `CROSSREF_RE` (`gi`), making the entire outer regex case-insensitive |
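
For reference, the first fix can be sketched as follows. This is an illustration under stated assumptions, not the code in `src/lib/crossref.ts`: the `TYPE_PREFIX` table and the `resolveCitation` helper are hypothetical, and only the `{type}-{number}-{year}` key format and the plain-text fallback come from the description above.

```typescript
// Hypothetical prefix table; the real mapping may include more types and aliases.
const TYPE_PREFIX: Record<string, string> = {
  "undang-undang": "uu",
  "peraturan pemerintah": "pp",
};

// Derive a lookup key in `{type}-{number}-{year}` form, e.g. "uu-13-2003".
function citationToSlugKey(citation: string): string | null {
  const m = /^(.+?)\s+(?:Nomor\s+)?(\d+)\s+Tahun\s+(\d{4})$/i.exec(citation.trim());
  if (m === null) return null;
  const [, rawType = "", num = "", year = ""] = m;
  const prefix = TYPE_PREFIX[rawType.toLowerCase()];
  if (prefix === undefined) return null; // unknown regulation type
  return `${prefix}-${num}-${year}`;
}

// Hypothetical helper: returns a path, or null so the caller renders plain text.
function resolveCitation(
  citation: string,
  worksLookup: Record<string, string>,
): string | null {
  const key = citationToSlugKey(citation);
  if (key === null) return null;
  return worksLookup[key] ?? null;
}
```

Returning `null` rather than a guessed path is what keeps unresolvable citations from turning into broken links.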

Test Plan

  • Open a law page with cross-references (e.g. `/peraturan/uu/uu-1-2026`)
  • Verify "Pasal N" text renders as a dotted-underlined link
  • Click a pasal link — should scroll to anchor and flash green via `HashHighlighter`
  • Verify a cross-UU citation (e.g. `UNDANG-UNDANG Nomor 6 Tahun 2023`) renders as a link to `/peraturan/uu/uu-6-2023`
  • Verify a citation for a law not in DB renders as plain text (no broken link)
  • Verify a structured law with 100+ pasals (e.g. UU Cipta Kerja) still renders all BABs correctly (pagination guard intact)
  • 44/44 Vitest tests passing (20 crossref + 24 across other test files)
  • No new lint errors introduced

@daffaromero daffaromero force-pushed the feature/crossref-links branch 2 times, most recently from 630f402 to b826e33 Compare February 27, 2026 17:00
daffaromero and others added 2 commits February 28, 2026 21:33
The usePagination flag was based solely on pasal count (>= 100), but
client-side infinite scroll doesn't work when pasals are rendered
per-BAB server-side — only the initial 30 SSR pasals were ever shown
under their BABs, leaving all subsequent BABs empty.

Fix: skip client pagination when the law has BABs/aturan/lampiran
structure nodes, and always fetch the full pasal set SSR instead.
Flat laws (no BABs) with 100+ pasals still use infinite scroll.

Co-authored-by: Claude <noreply@anthropic.com>
…types

The original hasBABs check only tested for bab/aturan/lampiran, but the
BAB rendering path fires on any structural node (babNodes.length > 0
includes bagian and paragraf nodes too). A law with bagian-only structure
and 100+ pasals would still regress under the previous check.

Replace hasBABs with hasStructure = structure.length > 0 — aligns the
pagination guard directly with the rendering condition.

Co-authored-by: Claude <noreply@anthropic.com>
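
The guard described in the two commit messages above can be sketched as below. The `StructureNode` shape, the `shouldUsePagination` name, and the constant are assumptions for illustration; only the `>= 100` threshold and the `structure.length > 0` condition come from the commit text.

```typescript
// Hypothetical shape of a structural node (BAB, bagian, paragraf, ...).
interface StructureNode {
  id: number;
  node_type: string;
  parent_id: number | null;
}

// Threshold taken from the commit message ("pasal count (>= 100)").
const PAGINATION_THRESHOLD = 100;

// Client-side infinite scroll is only safe for flat laws: when any structural
// node exists, pasals are rendered per node server-side and need the full set.
function shouldUsePagination(
  pasalCount: number,
  structure: StructureNode[],
): boolean {
  const hasStructure = structure.length > 0;
  return !hasStructure && pasalCount >= PAGINATION_THRESHOLD;
}
```

Keying the guard on `structure.length` rather than on specific node types means a bagian-only law is treated the same way as one with BABs.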
@daffaromero
Author

PR Description Audit

Went through the description against the actual code. A few things to correct:


1. Test count is wrong — 20, not 31

The description says "31/31 Vitest tests passing". The test file (src/lib/__tests__/crossref.test.ts) has 20 it() blocks, not 31. Easy to miscount if you're going by describe blocks rather than individual cases.


2. Cross-document UU links are actually still broken (slug format mismatch)

This is the most significant issue. The "Bugs fixed" table says:

Cross-UU slug was uu-13-2003 but DB slugs are uu-no-13-tahun-2003 | Fixed citationToSlugKey() to emit uu-no-${number}-tahun-${year}

This is inverted. The DB trigger (generate_work_slug() in migration 053) produces slugs in the format:

{type_prefix}-{number}-{year}

e.g. "uu-13-2003", "pp-74-2008". There is no no- or tahun- in DB slugs.

The worksLookup map is built from w.slug directly (line 251–258 of page.tsx), with the comment even confirming: "uu-13-2003" → "/peraturan/uu/uu-13-2003".

But citationToSlugKey("Undang-Undang Nomor 13 Tahun 2003") returns "uu-no-13-tahun-2003". That key will never exist in worksLookup, so worksLookup[slugKey] is always undefined, and all cross-document citations silently fall back to plain text.

The tests pass because they construct the worksLookup fixture with the uu-no-13-tahun-2003 key directly — they test that the tokenizer resolves a key it's given, not that the key matches what the DB actually produces.

The actual correct fix would be to change citationToSlugKey to emit "uu-13-2003" format (matching the DB trigger), or to build the worksLookup map using a derived key in uu-no-N-tahun-YYYY format alongside the raw slug. Either way, the formats need to agree.
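
A sketch of how a fixture derived from the DB slug format would have exposed the mismatch. `dbStyleSlug` is a hypothetical TypeScript mirror of what `generate_work_slug()` is described as producing, and `auditedKey` reproduces the key format observed at the time of this audit; neither is code from the branch.

```typescript
type Work = { typePrefix: string; number: number; year: number };

// Assumed mirror of the DB trigger's output format: {type_prefix}-{number}-{year}.
function dbStyleSlug(w: Work): string {
  return `${w.typePrefix}-${w.number}-${w.year}`;
}

// Key format that citationToSlugKey produced when this audit was written.
function auditedKey(w: Work): string {
  return `${w.typePrefix}-no-${w.number}-tahun-${w.year}`;
}

// Build the lookup the way the page is described as doing: keyed by raw slug.
function buildWorksLookup(works: Work[]): Record<string, string> {
  const lookup: Record<string, string> = {};
  works.forEach((w) => {
    const slug = dbStyleSlug(w);
    lookup[slug] = `/peraturan/${w.typePrefix}/${slug}`;
  });
  return lookup;
}
```

With a fixture built by `buildWorksLookup`, a lookup on `auditedKey(...)` misses while a lookup on `dbStyleSlug(...)` hits, so a test written this way fails until the two formats agree.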


3. Bug fix description for the i flag is inaccurate

The table says:

Scanned PDFs use capital-A Ayat — not matched by original regex | Added i flag to all inner regexes

Looking at the code, the i flag was added to CROSSREF_RE itself (the outer regex, making it "gi"). The pasal match group (?:Pasal) is now case-insensitive via the outer flag. There are no "inner regexes" that had i flags added — the numMatch extractor inside the tokenizer already matched case-insensitively via the outer i. The description over-complicates what actually happened.
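
To make the effect of the outer flag concrete, here is a small sketch. `PASAL_SOURCE` is a simplified stand-in for the pasal branch of `CROSSREF_RE`, and `countPasalRefs` is a hypothetical helper; neither is taken from the branch.

```typescript
// Simplified stand-in for the pasal branch of CROSSREF_RE.
const PASAL_SOURCE = "Pasal\\s+(\\d+)(?:\\s+ayat\\s+\\((\\d+)\\))?";

// Count matches with or without the `i` flag. The `g` flag is always set so
// that repeated exec() calls advance through the string.
function countPasalRefs(text: string, caseInsensitive: boolean): number {
  const re = new RegExp(PASAL_SOURCE, caseInsensitive ? "gi" : "g");
  let count = 0;
  while (re.exec(text) !== null) {
    count += 1;
  }
  return count;
}
```

Because the flag applies to the whole pattern, both the `Pasal` keyword and the `ayat` keyword become case-insensitive at once, with no per-group changes needed.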


4. Minor: worksLookup ISR framing is slightly off

The description says "Fetches worksLookup map server-side (ISR 24h)". The ISR 24h comes from export const revalidate = 86400 at the page level — it applies to the whole page, not specifically to the worksLookup query. The query itself is just one of the parallel fetches inside LawReaderSection, with no dedicated cache. Not wrong per se, but the phrasing implies the lookup has its own separate caching strategy.


Summary

| Claim | Status |
| --- | --- |
| 31 Vitest tests | Incorrect: 20 tests |
| Cross-UU slug bug "fixed" | Incorrect: still broken (format mismatch between citationToSlugKey and DB slugs) |
| i flag added to "all inner regexes" | Inaccurate: added to outer CROSSREF_RE only |
| worksLookup ISR 24h | Slightly misleading framing |

@daffaromero daffaromero force-pushed the feature/crossref-links branch 4 times, most recently from 4c7bf91 to 97842ee Compare February 28, 2026 17:21
daffaromero and others added 17 commits March 1, 2026 00:25
Laws like UU 6/2023 (Ciptaker) wrap a full law as LAMPIRAN. The parser
picks up the LAMPIRAN's table of contents as real BAB nodes, producing
heading-only sections with no Pasal content in the reader.

Add a lightweight parallel query fetching all parent_id values for pasals
of the current work. Build structuralIdsWithPasals Set. Filter babNodes
so only top-level structural nodes (BAB/aturan/lampiran) with at least
one pasal directly or via a direct child section are rendered. Sub-sections
(Bagian/Paragraf, parent_id != null) are kept unconditionally.

Co-authored-by: Claude <noreply@anthropic.com>
The previous filter only checked direct children of each top-level BAB
node when determining whether it had pasal content. This missed the
BAB → Bagian → Paragraf → Pasal nesting depth documented in the schema,
causing those BABs to be silently filtered out.

Replace with a parent→children map + recursive hasDescendantPasal()
that walks the full subtree at any depth, so a BAB is only filtered if
no structural node in its entire subtree is a direct parent of a pasal.

Co-authored-by: Claude <noreply@anthropic.com>
…l to all structural nodes

Remove the unconditional short-circuit that passed any structural node with
a non-null parent_id through the babNodes filter. Phantom TOC-BABs inside a
LAMPIRAN have parent_id = lampiran_db_id (non-null), so the guard was letting
them through despite having zero pasal descendants.

Applying hasDescendantPasal() to every structural node regardless of depth
fixes UU 6/2023 (Cipta Kerja): the duplicated TOC BABs parsed from the
LAMPIRAN TOC pages are now correctly filtered out while real BABs and their
Bagian/Paragraf sub-sections remain (they ARE in structuralIdsWithPasals).

Co-authored-by: Claude <noreply@anthropic.com>
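
The filtering described across the three commit messages above can be sketched roughly as follows. The `StructureNode` shape and the `buildChildrenMap` and `filterRenderableNodes` helpers are assumptions for illustration; `hasDescendantPasal` and `structuralIdsWithPasals` are the names used in the commit text, though their real signatures may differ.

```typescript
// Hypothetical minimal node shape; real rows carry more columns.
interface StructureNode {
  id: number;
  parent_id: number | null;
}

// Map each structural node id to the ids of its direct structural children.
function buildChildrenMap(nodes: StructureNode[]): Map<number, number[]> {
  const children = new Map<number, number[]>();
  nodes.forEach((node) => {
    if (node.parent_id === null) return;
    const siblings = children.get(node.parent_id) ?? [];
    siblings.push(node.id);
    children.set(node.parent_id, siblings);
  });
  return children;
}

// True if this node, or any node in its subtree, is a direct parent of a pasal.
function hasDescendantPasal(
  id: number,
  children: Map<number, number[]>,
  structuralIdsWithPasals: Set<number>,
): boolean {
  if (structuralIdsWithPasals.has(id)) return true;
  return (children.get(id) ?? []).some((childId) =>
    hasDescendantPasal(childId, children, structuralIdsWithPasals),
  );
}

// Apply the check to every structural node, regardless of depth.
function filterRenderableNodes(
  nodes: StructureNode[],
  structuralIdsWithPasals: Set<number>,
): StructureNode[] {
  const children = buildChildrenMap(nodes);
  return nodes.filter((n) =>
    hasDescendantPasal(n.id, children, structuralIdsWithPasals),
  );
}
```

In this sketch a phantom TOC BAB nested under a LAMPIRAN is dropped because nothing in its subtree parents a pasal, while a real BAB survives through its Bagian and Paragraf descendants.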
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
…ichPasalContent

Co-authored-by: Claude <noreply@anthropic.com>
…eference links

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Improves discoverability of cross-reference hyperlinks in pasal body
text — always-dotted underline at rest makes links visible without
the heavy weight of a permanent solid underline on dense legal prose.

Co-authored-by: Claude <noreply@anthropic.com>
Add tests for Perpres/Perda citations, optional Nomor keyword,
mixed Pasal+UU in one string, trailing text after last reference,
and unknown regulation types falling back to plain text.

Co-authored-by: Claude <noreply@anthropic.com>
- Match both 'ayat' and 'Ayat' in the Pasal regex — scanned PDFs vary in
  capitalization (e.g. 'Pasal 90 Ayat (3)' was not linked before)
- citationToSlugKey now produces 'uu-no-13-tahun-2003' format to match the
  slug column generated by the DB trigger (was producing 'uu-13-2003')
- Update all test fixtures and add new Ayat capital-case test

Co-authored-by: Claude <noreply@anthropic.com>
Scanned PDFs frequently use all-caps UNDANG-UNDANG, PASAL, PERATURAN etc.
Add 'i' flag to CROSSREF_RE so all capitalisation variants are matched.
Also adds two new tests covering UNDANG-UNDANG and PASAL all-caps forms.

Co-authored-by: Claude <noreply@anthropic.com>
citationToSlugKey() was returning 'uu-no-13-tahun-2003' but the DB
trigger generate_work_slug() (migration 053) produces 'uu-13-2003'.
The worksLookup map is keyed by raw DB slugs, so every cross-document
citation was silently falling back to plain text — never resolving to
a link.

Fix: remove '-no-' and '-tahun-' from the returned key format so it
matches {type}-{number}-{year}.

Update tests to use the correct slug fixtures.

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
@daffaromero daffaromero force-pushed the feature/crossref-links branch from 97842ee to 3fa3aaa Compare February 28, 2026 17:25