fix(postgresql): only replace ? param placeholders outside quoted strings#218
fix(postgresql): only replace ? param placeholders outside quoted strings#218guoyangzhen wants to merge 3 commits intounjs:mainfrom
? param placeholders outside quoted strings#218Conversation
…ings and comments Fixes unjs#216 Fixes nuxt/content#3682 The previous normalizeParams used a naive sql.replace(/\?/g, ...) that replaced ALL question marks, including literal ones inside: - Single-quoted strings (e.g., 'What?') - Dollar-quoted strings (e.g., 397449text397449) - Comments This caused data corruption when messages contained question marks. The fix uses proper string-aware parsing to only replace ? characters that appear outside of quoted regions.
📝 WalkthroughWalkthroughThe Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/connectors/postgresql.ts`:
- Around line 66-83: Detect whether the single-quoted literal is an escape
string by checking the prefix (i > 0 and sql[i-1] === 'E' or 'e') when you enter
the sql[i] === "'" branch, and set a flag (e.g., isEscapeString). In the inner
loop that advances j, if isEscapeString and sql[j] === "\\" then advance j by 2
to skip backslash-escaped characters (ensuring j+1 < n) so sequences like \' and
\\ are not treated as terminators; retain the existing doubled-quote handling
(if sql[j] === "'" and sql[j+1] === "'") so both mechanisms work correctly, and
ensure you still push sql.slice(i, j) and set i = j after the loop.
- Around line 150-156: The current loop that treats every unquoted sql character
`?` (the block using variables sql, i, paramIdx, and result) incorrectly
rewrites PostgreSQL `?` operators; update the code to stop inferring
placeholders from raw SQL and instead accept explicit placeholder positions from
the template layer (e.g., add a parameter like placeholderIndexes or
placeholders to the function that produces this replacement). Use the provided
list of placeholder indices to increment paramIdx and replace only those
specific `?` locations with `$n` (leave other `?` operators untouched), and
update callers to pass the template-derived placeholder positions so the
function no longer scans and rewrites arbitrary `?` characters.
- Around line 85-109: The dollar-quote detection in the sql parsing logic
incorrectly treats `$tag$` inside identifiers as a string start; update the
guard around the branch that checks sql[i] === "$" (using variables sql, i, n,
tag, result) to first verify a token boundary on the left: if there is a
preceding character and it matches /[a-zA-Z0-9_$]/ then do not attempt
dollar-quote parsing (push sql[i], increment i, continue); otherwise proceed
with the existing tag-scanning logic so only true token-boundary dollar-quoted
strings are consumed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 763f3b8e-e977-4c7c-892a-314471c29041
📒 Files selected for processing (1)
src/connectors/postgresql.ts
| // Single-quoted string | ||
| if (sql[i] === "'") { | ||
| let j = i + 1; | ||
| while (j < n) { | ||
| if (sql[j] === "'") { | ||
| if (j + 1 < n && sql[j + 1] === "'") { | ||
| j += 2; // escaped quote '' | ||
| continue; | ||
| } | ||
| j++; | ||
| break; | ||
| } | ||
| j++; | ||
| } | ||
| result.push(sql.slice(i, j)); | ||
| i = j; | ||
| continue; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat src/connectors/postgresql.tsRepository: unjs/db0
Length of output: 4445
🌐 Web query:
PostgreSQL escape string constants E'...' backslash escapes syntax
💡 Result:
In PostgreSQL, escape string constants are written with an E (or e) prefix:
- Syntax:
E'...'(theEmust be immediately before the opening quote) [1]
Inside E'...', backslash introduces C-style escapes, e.g. [1][2]:
\\= backslash\'= single quote\n,\r,\t,\b,\f= newline, carriage return, tab, backspace, form feed\ooo= octal byte value (1–3 octal digits)\xhh= hex byte value (1–2 hex digits)\uXXXX/\UXXXXXXXX= Unicode code point (4 or 8 hex digits)- Any other character after
\is taken literally [2]
Why E matters: with the default setting standard_conforming_strings = on (default since PostgreSQL 9.1), backslashes are treated as ordinary characters in regular string literals ('...'), so backslash escapes are only recognized in E'...' strings [3].
Portable quote escaping (works in both forms): write a single quote as '' (two single quotes), e.g. 'Dianne''s horse' [1].
Sources:
[1] PostgreSQL 16 docs — Lexical Structure (string constants / escape string constants): https://www.postgresql.org/docs/16/sql-syntax-lexical.html
[2] PostgreSQL 12 docs — Lexical Structure (table of backslash escape sequences): https://www.postgresql.org/docs/12/sql-syntax-lexical.html
[3] PostgreSQL 14 docs — standard_conforming_strings behavior note: https://www.postgresql.org/docs/14/sql-syntax-lexical.html
Handle escape string constants E'...' with backslash escapes.
Lines 66–83 only recognize doubled '' as the escape mechanism. PostgreSQL escape strings prefixed with E/e also support \' (backslash-escaped quotes). Input like E'foo\'?bar' will cause the scanner to terminate at \' instead of the final unescaped ', leaving the ? inside the string to be incorrectly replaced. Update this branch to detect the E/e prefix and handle backslash escapes accordingly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/connectors/postgresql.ts` around lines 66 - 83, Detect whether the
single-quoted literal is an escape string by checking the prefix (i > 0 and
sql[i-1] === 'E' or 'e') when you enter the sql[i] === "'" branch, and set a
flag (e.g., isEscapeString). In the inner loop that advances j, if
isEscapeString and sql[j] === "\\" then advance j by 2 to skip backslash-escaped
characters (ensuring j+1 < n) so sequences like \' and \\ are not treated as
terminators; retain the existing doubled-quote handling (if sql[j] === "'" and
sql[j+1] === "'") so both mechanisms work correctly, and ensure you still push
sql.slice(i, j) and set i = j after the loop.
| // Dollar-quoted string (PostgreSQL): $tag$...$tag$ or $$...$$ | ||
| if ( | ||
| sql[i] === "$" && | ||
| i + 1 < n && | ||
| (sql[i + 1] === "$" || /[a-zA-Z_]/.test(sql[i + 1])) | ||
| ) { | ||
| let j = i + 1; | ||
| if (sql[j] !== "$") { | ||
| while (j < n && /[a-zA-Z0-9_]/.test(sql[j])) j++; | ||
| if (j >= n || sql[j] !== "$") { | ||
| result.push(sql[i]); | ||
| i++; | ||
| continue; | ||
| } | ||
| } | ||
| const tag = sql.slice(i, j + 1); | ||
| const end = sql.indexOf(tag, j + 1); | ||
| if (end === -1) { | ||
| result.push(sql.slice(i)); | ||
| break; | ||
| } | ||
| result.push(sql.slice(i, end + tag.length)); | ||
| i = end + tag.length; | ||
| continue; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
head -120 src/connectors/postgresql.ts | tail -50Repository: unjs/db0
Length of output: 1186
🏁 Script executed:
fd -t f test -e ts -e js | grep -i postgres | head -20Repository: unjs/db0
Length of output: 154
🏁 Script executed:
rg -A 5 -B 5 "dollar|\\$tag|\\$\\$" --type ts --type jsRepository: unjs/db0
Length of output: 34
🏁 Script executed:
cat test/connectors/postgresql.test.tsRepository: unjs/db0
Length of output: 419
🏁 Script executed:
head -150 src/connectors/postgresql.ts | tail -80Repository: unjs/db0
Length of output: 1861
🌐 Web query:
PostgreSQL dollar-quoted strings token boundary rules $tag$ identifiers
💡 Result:
In PostgreSQL, a dollar-quoted string literal has the form:
$$ ... $$or$tag$ ... $tag$(postgresql.org)
What characters are allowed in $tag$
- The tag (if present) “follows the same rules as an unquoted identifier, except that it cannot contain a dollar sign.” (postgresql.org)
- Unquoted identifiers/keywords must start with a letter (including many non-ASCII letters) or
_; subsequent characters can include letters,_, digits, and$. (postgresql.org)
(So for dollar-quote tags: same, but no$anywhere inside the tag.) (postgresql.org) - Tags are case-sensitive:
$tag$...$tag$matches, but$TAG$...$tag$does not. (postgresql.org)
Token boundary / whitespace rule (common “gotcha”)
- If a dollar-quoted string follows a keyword or identifier, you must put whitespace before the
$...$, otherwise the opening delimiter is treated as part of the preceding identifier. (postgresql.org)
Example (needs a space):
SELECT $$x$$;is fineSELECT$$x$$;is not (becauseSELECT$$x$$is tokenized as one identifier-like token). (postgresql.org)
Citations:
- 1: https://www.postgresql.org/docs/current/sql-syntax-lexical.html
- 2: https://www.postgresql.org/docs/current/sql-syntax-lexical.html
- 3: https://www.postgresql.org/docs/current/sql-syntax-lexical.html
- 4: https://www.postgresql.org/docs/current/sql-syntax-lexical.html
- 5: https://www.postgresql.org/docs/current/sql-syntax-lexical.html
- 6: https://www.postgresql.org/docs/current/sql-syntax-lexical.html
- 7: https://www.postgresql.org/docs/current/sql-syntax-lexical.html
🏁 Script executed:
head -85 src/connectors/postgresql.ts | tail -40Repository: unjs/db0
Length of output: 1014
🏁 Script executed:
rg -B 20 "Dollar-quoted string" src/connectors/postgresql.ts | head -40Repository: unjs/db0
Length of output: 1013
🏁 Script executed:
rg -A 20 "normalizeParams" src/connectors/postgresql.ts | grep -A 20 "Parameter placeholder"Repository: unjs/db0
Length of output: 34
🏁 Script executed:
rg "normalizeParams" src/connectors/postgresql.ts -B 2 -A 2Repository: unjs/db0
Length of output: 327
Dollar-quote detection needs a token-boundary check.
Lines 86–100 will also match $tag$ inside an unquoted identifier, even though PostgreSQL only starts a dollar-quoted string at a token boundary. When a $ appears after an identifier character (letter, digit, underscore, or another $), it is part of the preceding identifier, not a string delimiter. In SELECT foo$tag$bar, ?, this code falls into lines 102–104 and the later placeholder is never rewritten.
Possible local guard
+ const prev = i === 0 ? "" : sql[i - 1];
if (
sql[i] === "$" &&
+ (i === 0 || !/[A-Za-z0-9_$]/.test(prev)) &&
i + 1 < n &&
(sql[i + 1] === "$" || /[a-zA-Z_]/.test(sql[i + 1]))
) {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Dollar-quoted string (PostgreSQL): $tag$...$tag$ or $$...$$ | |
| if ( | |
| sql[i] === "$" && | |
| i + 1 < n && | |
| (sql[i + 1] === "$" || /[a-zA-Z_]/.test(sql[i + 1])) | |
| ) { | |
| let j = i + 1; | |
| if (sql[j] !== "$") { | |
| while (j < n && /[a-zA-Z0-9_]/.test(sql[j])) j++; | |
| if (j >= n || sql[j] !== "$") { | |
| result.push(sql[i]); | |
| i++; | |
| continue; | |
| } | |
| } | |
| const tag = sql.slice(i, j + 1); | |
| const end = sql.indexOf(tag, j + 1); | |
| if (end === -1) { | |
| result.push(sql.slice(i)); | |
| break; | |
| } | |
| result.push(sql.slice(i, end + tag.length)); | |
| i = end + tag.length; | |
| continue; | |
| } | |
| // Dollar-quoted string (PostgreSQL): $tag$...$tag$ or $$...$$ | |
| const prev = i === 0 ? "" : sql[i - 1]; | |
| if ( | |
| sql[i] === "$" && | |
| (i === 0 || !/[A-Za-z0-9_$]/.test(prev)) && | |
| i + 1 < n && | |
| (sql[i + 1] === "$" || /[a-zA-Z_]/.test(sql[i + 1])) | |
| ) { | |
| let j = i + 1; | |
| if (sql[j] !== "$") { | |
| while (j < n && /[a-zA-Z0-9_]/.test(sql[j])) j++; | |
| if (j >= n || sql[j] !== "$") { | |
| result.push(sql[i]); | |
| i++; | |
| continue; | |
| } | |
| } | |
| const tag = sql.slice(i, j + 1); | |
| const end = sql.indexOf(tag, j + 1); | |
| if (end === -1) { | |
| result.push(sql.slice(i)); | |
| break; | |
| } | |
| result.push(sql.slice(i, end + tag.length)); | |
| i = end + tag.length; | |
| continue; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/connectors/postgresql.ts` around lines 85 - 109, The dollar-quote
detection in the sql parsing logic incorrectly treats `$tag$` inside identifiers
as a string start; update the guard around the branch that checks sql[i] === "$"
(using variables sql, i, n, tag, result) to first verify a token boundary on the
left: if there is a preceding character and it matches /[a-zA-Z0-9_$]/ then do
not attempt dollar-quote parsing (push sql[i], increment i, continue); otherwise
proceed with the existing tag-scanning logic so only true token-boundary
dollar-quoted strings are consumed.
| // Parameter placeholder ? | ||
| if (sql[i] === "?") { | ||
| paramIdx++; | ||
| result.push(`$${paramIdx}`); | ||
| i++; | ||
| continue; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
fd src/connectors/postgresql.tsRepository: unjs/db0
Length of output: 34
🏁 Script executed:
cat -n src/connectors/postgresql.ts | head -200Repository: unjs/db0
Length of output: 5789
🏁 Script executed:
cat -n src/template.ts | head -50Repository: unjs/db0
Length of output: 1238
🏁 Script executed:
fd -e test -e spec | head -20Repository: unjs/db0
Length of output: 34
🏁 Script executed:
find . -name "*.test.*" -o -name "*.spec.*" | head -20Repository: unjs/db0
Length of output: 1264
🏁 Script executed:
rg "JSONB|json.*\?" -i --type ts | head -30Repository: unjs/db0
Length of output: 34
🏁 Script executed:
rg "doc\s*\?" --type tsRepository: unjs/db0
Length of output: 34
🏁 Script executed:
git log --oneline -n 20 -- src/connectors/postgresql.tsRepository: unjs/db0
Length of output: 133
🏁 Script executed:
cat -n test/connectors/postgresql.test.tsRepository: unjs/db0
Length of output: 524
🏁 Script executed:
git show 0243cbe --statRepository: unjs/db0
Length of output: 6136
🏁 Script executed:
git show 0243cbeRepository: unjs/db0
Length of output: 50367
🏁 Script executed:
rg "test.*normalizeParams|normalizeParams.*test" -A 5Repository: unjs/db0
Length of output: 34
🏁 Script executed:
rg "\?" test/ -t ts | head -30Repository: unjs/db0
Length of output: 510
🏁 Script executed:
cat -n test/connectors/_tests.ts | head -100Repository: unjs/db0
Length of output: 3119
🏁 Script executed:
rg "JSONB|json.*\?|doc.*\?" -i test/ --type tsRepository: unjs/db0
Length of output: 34
🏁 Script executed:
cat -n test/template.test.tsRepository: unjs/db0
Length of output: 966
🏁 Script executed:
rg "SELECT.*\?" test/ --type ts -B 2 -A 2Repository: unjs/db0
Length of output: 356
🏁 Script executed:
cat > /tmp/test_scenario.js << 'EOF'
// Simulate what happens with JSONB operator
// sqlTemplate function behavior
function sqlTemplate(strings, ...values) {
let result = strings[0] || "";
for (let i = 1; i < strings.length; i++) {
result += `?${strings[i] ?? ""}`;
}
return [result.trim(), values];
}
// Test case: SELECT doc ? ${key}
const [query, values] = sqlTemplate`SELECT doc ? ${key}`;
console.log("After template:", query);
console.log("Values:", values);
// Now normalizeParams would process this
// The query would be: "SELECT doc ? ?"
// But both ? marks would be rewritten to $1, $2
EOF
node /tmp/test_scenario.jsRepository: unjs/db0
Length of output: 814
🏁 Script executed:
cat > /tmp/test_normalize.js << 'EOF'
// Test the normalizeParams logic
function normalizeParams(sql) {
const result = [];
let i = 0;
let paramIdx = 0;
const n = sql.length;
while (i < n) {
// Single-quoted string
if (sql[i] === "'") {
let j = i + 1;
while (j < n) {
if (sql[j] === "'") {
if (j + 1 < n && sql[j + 1] === "'") {
j += 2;
continue;
}
j++;
break;
}
j++;
}
result.push(sql.slice(i, j));
i = j;
continue;
}
// Dollar-quoted string
if (
sql[i] === "$" &&
i + 1 < n &&
(sql[i + 1] === "$" || /[a-zA-Z_]/.test(sql[i + 1]))
) {
let j = i + 1;
if (sql[j] !== "$") {
while (j < n && /[a-zA-Z0-9_]/.test(sql[j])) j++;
if (j >= n || sql[j] !== "$") {
result.push(sql[i]);
i++;
continue;
}
}
const tag = sql.slice(i, j + 1);
const end = sql.indexOf(tag, j + 1);
if (end === -1) {
result.push(sql.slice(i));
break;
}
result.push(sql.slice(i, end + tag.length));
i = end + tag.length;
continue;
}
// Double-quoted identifier
if (sql[i] === '"') {
let j = i + 1;
while (j < n) {
if (sql[j] === '"') {
j++;
break;
}
j++;
}
result.push(sql.slice(i, j));
i = j;
continue;
}
// Line comment
if (i + 1 < n && sql[i] === "-" && sql[i + 1] === "-") {
const j = sql.indexOf("\n", i);
if (j === -1) {
result.push(sql.slice(i));
break;
}
result.push(sql.slice(i, j));
i = j;
continue;
}
// Block comment
if (i + 1 < n && sql[i] === "/" && sql[i + 1] === "*") {
const end = sql.indexOf("*/", i + 2);
if (end === -1) {
result.push(sql.slice(i));
break;
}
result.push(sql.slice(i, end + 2));
i = end + 2;
continue;
}
// Parameter placeholder ?
if (sql[i] === "?") {
paramIdx++;
result.push(`$${paramIdx}`);
i++;
continue;
}
result.push(sql[i]);
i++;
}
return result.join("");
}
// Test case from review: "SELECT doc ? ?"
const testInput = "SELECT doc ? ?";
const output = normalizeParams(testInput);
console.log("Input: ", testInput);
console.log("Output: ", output);
console.log("Result: Both ? converted to $1, $2 - INCORRECT!");
console.log("");
// Another test: What if user tries to escape the JSONB operator?
const escaped = `SELECT doc '?' = ${something}`;
console.log("If escaped in quotes: 'SELECT doc \\'?\\' = ?'");
console.log("Output: ", normalizeParams(`SELECT doc '?' = ?`));
EOF
node /tmp/test_normalize.jsRepository: unjs/db0
Length of output: 916
🏁 Script executed:
cat > /tmp/test_jsonb.js << 'EOF'
function normalizeParams(sql) {
const result = [];
let i = 0;
let paramIdx = 0;
const n = sql.length;
while (i < n) {
if (sql[i] === "'") {
let j = i + 1;
while (j < n) {
if (sql[j] === "'") {
if (j + 1 < n && sql[j + 1] === "'") {
j += 2;
continue;
}
j++;
break;
}
j++;
}
result.push(sql.slice(i, j));
i = j;
continue;
}
if (sql[i] === "?") {
paramIdx++;
result.push(`$${paramIdx}`);
i++;
continue;
}
result.push(sql[i]);
i++;
}
return result.join("");
}
// Scenario from review: JSONB operator in static SQL with placeholder
console.log("=== JSONB Operator Bug ===");
const jsonbQuery = "SELECT doc ? ?";
const result = normalizeParams(jsonbQuery);
console.log("Input: " + jsonbQuery);
console.log("Output: " + result);
console.log("Expected: SELECT doc ? $1");
console.log("Issue: Both ? marks converted, JSONB operator corrupted");
console.log("");
// Test: quoted operator doesn't have this problem
console.log("=== Workaround: Quote the operator ===");
const quotedQuery = "SELECT doc '?' = ?";
const quoted = normalizeParams(quotedQuery);
console.log("Input: " + quotedQuery);
console.log("Output: " + quoted);
console.log("");
// Test: array contains operator
const arrayOp = "SELECT arr ? ?";
const arrayResult = normalizeParams(arrayOp);
console.log("=== Array Operator ===");
console.log("Input: " + arrayOp);
console.log("Output: " + arrayResult);
EOF
node /tmp/test_jsonb.jsRepository: unjs/db0
Length of output: 381
🏁 Script executed:
rg "216|3682" --type md --type txtRepository: unjs/db0
Length of output: 34
🏁 Script executed:
git log --all --oneline | grep -i "jsonb\|operator\|question"Repository: unjs/db0
Length of output: 34
🏁 Script executed:
rg "E'|\\\\x" src/connectors/postgresql.tsRepository: unjs/db0
Length of output: 34
🏁 Script executed:
rg "Handles:|supports:|limitation" src/connectors/postgresql.ts -A 2Repository: unjs/db0
Length of output: 174
This still rewrites PostgreSQL ? operators as bind markers.
Lines 150–156 convert every unquoted ? to a positional parameter. When static SQL contains a PostgreSQL operator that uses ? (e.g., JSONB ?, array containment), it will be incorrectly rewritten. For example, sql\SELECT doc ? ${key}`producesSELECT doc ? ?beforenormalizeParams, and both question marks are converted to $1 $2`, corrupting the JSONB operator.
The quoted-string handling in this function cannot protect unquoted operators. Resolving this requires either (1) passing placeholder positions from the template layer instead of inferring them from raw SQL, (2) using a sentinel token that cannot collide with PostgreSQL syntax, or (3) requiring users to quote operators manually (e.g., '?').
🧰 Tools
🪛 Biome (2.4.6)
[error] 153-153: Expected a semicolon or an implicit semicolon after a statement, but found none
(parse)
[error] 153-153: Expected a semicolon or an implicit semicolon after a statement, but found none
(parse)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/connectors/postgresql.ts` around lines 150 - 156, The current loop that
treats every unquoted sql character `?` (the block using variables sql, i,
paramIdx, and result) incorrectly rewrites PostgreSQL `?` operators; update the
code to stop inferring placeholders from raw SQL and instead accept explicit
placeholder positions from the template layer (e.g., add a parameter like
placeholderIndexes or placeholders to the function that produces this
replacement). Use the provided list of placeholder indices to increment paramIdx
and replace only those specific `?` locations with `$n` (leave other `?`
operators untouched), and update callers to pass the template-derived
placeholder positions so the function no longer scans and rewrites arbitrary `?`
characters.
Fixes #216 | Fixes nuxt/content#3682
Problem
normalizeParamsusessql.replace(/\?/g, ...)that replaces ALL question marks — including literal ones inside quoted strings, comments, and identifiers. This causes data corruption when content containing?is stored via db0 PostgreSQL connector.Fix
Replace the regex with a proper character-by-character parser that tracks quoted/comment context and only replaces standalone
?outside these regions.Handles: single-quoted strings (with
''escapes), dollar-quoted strings, double-quoted identifiers, line/block comments.Summary by CodeRabbit