test: prove scanSync JSON bug with multi-line tokens (expected to FAIL)#148

Open
pyramation wants to merge 2 commits into main from devin/1775610355-prove-scan-json-bug

Collaborator

pyramation commented Apr 8, 2026

Summary

Adds three test cases that are expected to fail on the current main branch, proving that scanSync / scan cannot handle SQL containing multi-line tokens.

The root cause: build_scan_json() in full/src/wasm_wrapper.c escapes " and \ in token text but not \n, \r, or \t. When a token spans multiple lines (dollar-quoted function bodies, multi-line /* */ comments), the raw JSON contains literal control characters, and JSON.parse throws "Bad control character in string literal".
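
For illustration, the failure mode can be reproduced without the WASM build. The sketch below is a hypothetical TypeScript mirror of the escaping described above, not the actual C code in build_scan_json(); the function names and the `{"text": ...}` JSON shape are assumptions made for this example, and the second function shows one possible way to escape the three characters, not necessarily what PR #147 does.

```typescript
// Hypothetical mirror of the current behavior: only \ and " are escaped.
function escapeQuotesAndBackslashesOnly(text: string): string {
  return text.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

// Assumed shape of a complete escape: additionally handles \n, \r, and \t.
function escapeWithControlChars(text: string): string {
  return escapeQuotesAndBackslashesOnly(text)
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r")
    .replace(/\t/g, "\\t");
}

// Reports whether JSON.parse accepts the given string.
function parses(json: string): boolean {
  try {
    JSON.parse(json);
    return true;
  } catch (_err) {
    return false;
  }
}

// Token text for a multi-line block comment (contains a real newline).
const tokenText = "/* first line\nsecond line */";

// The raw newline ends up inside the JSON string literal, which JSON forbids.
const brokenJson = `{"text":"${escapeQuotesAndBackslashesOnly(tokenText)}"}`;

// With the newline written as the two characters \ and n, the JSON is valid.
const fixedJson = `{"text":"${escapeWithControlChars(tokenText)}"}`;

console.log(parses(brokenJson)); // false
console.log(parses(fixedJson)); // true
```

The exact wording of the SyntaxError depends on the JavaScript engine version, so the sketch only checks whether parsing succeeds.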

The three failing tests cover:

  1. Dollar-quoted function body — a CREATE FUNCTION with a multi-line PL/pgSQL body
  2. Dollar-quoted string with tabs — a $$...$$ literal containing actual tab characters
  3. Multi-line block comment — a /* ... */ comment spanning multiple lines

All test SQL uses template literals with actual newlines and tabs so the failing input is visually obvious in the source code (no escape sequences).
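
As a rough sketch of why the two spellings are interchangeable at runtime, the snippet below compares a template literal containing a real newline with the same text written using an escape sequence. The SQL here is hypothetical and is not the input used in the PR's tests; `hasRawControlChar` is a helper invented for this example.

```typescript
// Hypothetical SQL with an actual line break inside the dollar-quoted string.
const withRealNewline = `SELECT $$first line
second line$$`;

// The same text spelled with an escape sequence instead.
const withEscapeSequence = "SELECT $$first line\nsecond line$$";

// Both spellings produce the same string; only the source readability differs.
console.log(withRealNewline === withEscapeSequence); // true

// Returns true when the text contains a character in U+0000..U+001F,
// which JSON does not allow unescaped inside a string.
function hasRawControlChar(text: string): boolean {
  return /[\u0000-\u001f]/.test(text);
}

console.log(hasRawControlChar(withRealNewline)); // true
console.log(hasRawControlChar("SELECT 1")); // false
```

Writing the newline literally keeps the failing input visible in the diff, at the cost of making whitespace such as tabs easy to lose if an editor converts them to spaces.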

Do not merge this PR. It exists to demonstrate the bug. See PR #147 for the fix that makes these tests pass.

Review & Testing Checklist for Human

  • Verify CI failures are the expected ones: The Test full on * jobs should fail with SyntaxError: Bad control character in string literal in JSON on the three new tests. If they fail for a different reason, something else is wrong.
  • Cross-reference with PR #147 (fix: escape control characters in build_scan_json() for multi-line tokens): Confirm the fix PR's CI passes these same three tests (it does: 48/48 green).
  • Inspect the tab character in the "dollar-quoted tokens with tabs" test: The template literal on line ~252 contains a real \t — verify it renders as a tab in the diff view, not a run of spaces.

Notes

  • No production code is changed — this is test-only.
  • The diff also fixes a missing trailing newline at end of file.

Link to Devin session: https://app.devin.ai/sessions/67facbcfe0ae424bad3eafb4e6ca9059
Requested by: @pyramation

These tests demonstrate that scanSync throws 'Bad control character in
string literal' when scanning SQL with multi-line tokens (dollar-quoted
function bodies, tabs, multi-line C-style comments).

The root cause is that build_scan_json() in wasm_wrapper.c only escapes
'"' and '\\' in token text, but not '\n', '\r', '\t'. When token text
contains literal newlines, the JSON output has unescaped control chars
that break JSON.parse.

These tests are expected to FAIL on this branch (no fix applied).
See PR #147 for the fix.
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

