feat(format): add type-aware JSONL output using spanvalue extension points#581

Merged
apstndb merged 6 commits into main from
issue-554-jsonl-format
Mar 25, 2026

@apstndb
Owner

@apstndb apstndb commented Mar 25, 2026

Summary

Add type-aware JSONL output format (--format=jsonl) that produces proper JSON types for Spanner values. ARRAY and STRUCT are represented as JSON arrays/objects respectively, INT64 as numbers, BOOL as booleans, and NULL as null.

Built on top of #580 (basic JSONL with all-string values), this PR adds the type-aware value formatting layer.

Key Changes

  • decoder/jsonvalue.go: JSONFormatConfig() creates a spanvalue.FormatConfig using existing extension points (FormatComplexPlugins, FormatArray, FormatStruct) to produce valid JSON value strings. Uses structpb.Value.MarshalJSON() for most types; only INT64 and JSON columns need special handling.
  • format/cell_json.go: RawJSONCell lightweight marker type signals that cell text is valid JSON (no data carried, unlike the earlier JSONValueCell approach).
  • format/streaming_jsonl.go: writeValue() checks IsRawJSON(cell) to decide between WriteValue (raw JSON) and WriteToken(String(...)) (quoted string fallback for client-side statements).
  • format/mode.go: Add JSONValues ValueFormatMode for JSONL pipeline dispatch.
  • execute_sql.go: prepareFormatConfig returns decoder.JSONFormatConfig() for JSONValues mode. withRawJSONMarker applied when ValueFmtMode == JSONValues.
  • row_iter.go: withRawJSONMarker wraps cells with RawJSONCell (no GCV re-extraction, just type wrapping).

Development Insights

Discoveries

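  • structpb.Value.MarshalJSON() produces correct JSON for all Spanner types except INT64 (carried as a StringValue, so it would be emitted as a quoted string instead of a number) and JSON columns (also a StringValue, so the JSON text would be encoded a second time as a string). This eliminates the need for per-type handling.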
  • spanvalue.FormatComplexPlugins can intercept ALL non-ARRAY/STRUCT types, not just PROTO/ENUM. This enables full JSON formatting via the existing extension point system.
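A rough illustration of why only those two cases need special handling, using `encoding/json` in place of `structpb` so the sketch stays dependency-free. The function `jsonText`, its string type codes, and the `any` payload are assumptions made for this example; they are not the signatures in decoder/jsonvalue.go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// jsonText converts one wire value to JSON text. typeCode is a simplified,
// hypothetical stand-in for the Spanner type code, and wire stands in for
// the structpb.Value payload (string, float64, bool, or nil).
func jsonText(typeCode string, wire any) (string, error) {
	if wire == nil {
		return "null", nil
	}
	switch typeCode {
	case "INT64":
		// INT64 travels as a decimal string; emit it as a bare JSON number.
		s, ok := wire.(string)
		if !ok {
			return "", fmt.Errorf("INT64: want string payload, got %T", wire)
		}
		n, err := strconv.ParseInt(s, 10, 64)
		if err != nil {
			return "", fmt.Errorf("INT64: %w", err)
		}
		return strconv.FormatInt(n, 10), nil
	case "JSON":
		// A JSON column's payload is already JSON text; pass it through
		// instead of encoding it again as a string.
		s, ok := wire.(string)
		if !ok {
			return "", fmt.Errorf("JSON: want string payload, got %T", wire)
		}
		if !json.Valid([]byte(s)) {
			return "", fmt.Errorf("JSON: payload is not valid JSON: %q", s)
		}
		return s, nil
	default:
		// Everything else is handled by generic marshaling.
		b, err := json.Marshal(wire)
		return string(b), err
	}
}

func main() {
	samples := []struct {
		typeCode string
		wire     any
	}{
		{"INT64", "9007199254740993"},
		{"JSON", `{"a":[1,2]}`},
		{"STRING", "42"},
		{"BOOL", true},
		{"FLOAT64", 1.5},
		{"INT64", nil},
	}
	for _, s := range samples {
		text, err := jsonText(s.typeCode, s.wire)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-7s %s\n", s.typeCode, text)
	}
}
```

Note that the STRING value "42" stays quoted while the INT64 value with the same digits does not, which is the distinction the generic marshaler cannot make on its own.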

CLAUDE.md Integration Candidates

  • None; the spanvalue extension pattern is documented in the code and test coverage.

Test Plan

  • make check passes
  • TestJSONFormatConfig: 21 test cases covering all Spanner types (NULL, BOOL, INT64, FLOAT64, STRING, ARRAY, STRUCT, JSON column, nested ARRAY, unnamed fields, NULL ARRAY, NaN/Infinity)
  • TestFormatJSONL: RawJSONCell with typed values and null, plus plain string fallback
  • TestValueFormatModeFor: JSONL returns JSONValues
  • TestJSONLFormatterLifecycle: write before init, double init idempotency
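For readers unfamiliar with the lifecycle cases, here is a minimal sketch of what "write before init" and "double init idempotency" could look like. The `jsonlFormatter` type, its `Init`/`WriteRow` methods, and the choice that a second `Init` is a silent no-op are all assumptions for illustration; the real formatter and test may behave differently.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

var errNotInitialized = errors.New("jsonl: WriteRow called before Init")

// jsonlFormatter is a hypothetical, simplified streaming formatter.
type jsonlFormatter struct {
	out     io.Writer
	columns []string
	ready   bool
}

// Init records the column names. In this sketch a second call is a no-op.
func (f *jsonlFormatter) Init(columns []string) error {
	if f.ready {
		return nil
	}
	f.columns = append([]string(nil), columns...)
	f.ready = true
	return nil
}

// WriteRow writes one line; values must already be valid JSON text.
func (f *jsonlFormatter) WriteRow(values []string) error {
	if !f.ready {
		return errNotInitialized
	}
	if len(values) != len(f.columns) {
		return fmt.Errorf("got %d values for %d columns", len(values), len(f.columns))
	}
	var line bytes.Buffer
	line.WriteByte('{')
	for i, v := range values {
		if i > 0 {
			line.WriteByte(',')
		}
		key, err := json.Marshal(f.columns[i])
		if err != nil {
			return err
		}
		line.Write(key)
		line.WriteByte(':')
		line.WriteString(v)
	}
	line.WriteString("}\n")
	_, err := f.out.Write(line.Bytes())
	return err
}

// checkLifecycle exercises the two lifecycle cases and reports the first failure.
func checkLifecycle() error {
	var buf bytes.Buffer
	f := &jsonlFormatter{out: &buf}
	if err := f.WriteRow([]string{"1"}); !errors.Is(err, errNotInitialized) {
		return fmt.Errorf("write before init: got %v", err)
	}
	if err := f.Init([]string{"id"}); err != nil {
		return err
	}
	if err := f.Init([]string{"other"}); err != nil { // must not replace columns
		return err
	}
	if err := f.WriteRow([]string{"1"}); err != nil {
		return err
	}
	if got, want := buf.String(), "{\"id\":1}\n"; got != want {
		return fmt.Errorf("got %q, want %q", got, want)
	}
	return nil
}

func main() {
	if err := checkLifecycle(); err != nil {
		panic(err)
	}
	fmt.Println("lifecycle checks passed")
}
```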

Fixes #554

apstndb and others added 5 commits March 25, 2026 18:37
Add --format=jsonl for structured, machine-readable output where each row
is a JSON object with column names as keys. JSONL is naturally safe for
complex Spanner types like ARRAY<STRUCT<...>> that contain commas, making
it ideal for downstream processing with jq, Go's json.Decoder, etc.
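As an illustration of the downstream side, a consumer could read the output with `json.Decoder` roughly as follows. The sample rows are made up for this sketch and `readRows`, `parseJSONL`, and `numberText` are hypothetical helper names; `UseNumber` is used so that large INT64 values are not rounded through float64.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// readRows decodes a JSONL stream into one map per line.
func readRows(r io.Reader) ([]map[string]any, error) {
	dec := json.NewDecoder(r)
	dec.UseNumber() // keep numbers as json.Number instead of float64
	var rows []map[string]any
	for {
		var row map[string]any
		err := dec.Decode(&row)
		if errors.Is(err, io.EOF) {
			return rows, nil
		}
		if err != nil {
			return nil, err
		}
		rows = append(rows, row)
	}
}

// parseJSONL is a convenience wrapper for in-memory input.
func parseJSONL(s string) ([]map[string]any, error) {
	return readRows(strings.NewReader(s))
}

// numberText returns the exact literal of a decoded JSON number.
func numberText(v any) (string, bool) {
	n, ok := v.(json.Number)
	return n.String(), ok
}

func main() {
	// Made-up output for a query with an INT64 and an ARRAY<STRUCT<...>> column.
	input := `{"id":9007199254740993,"items":[{"sku":"a,b","qty":2}]}
{"id":2,"items":null}
`
	rows, err := parseJSONL(input)
	if err != nil {
		panic(err)
	}
	for _, row := range rows {
		id, _ := numberText(row["id"])
		fmt.Println(id, row["items"])
	}
}
```

The comma inside `"a,b"` needs no special treatment here, which is the point the paragraph above makes about JSONL versus delimiter-based formats.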

The implementation follows the existing StreamingFormatter pattern with
streaming enabled by default in AUTO mode.

Fixes #554

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move the jsontext.Encoder to a struct field instead of creating a new
one per WriteRow call. The encoder is initialized once in the constructor
and reused for all rows, reducing allocations.
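The same pattern, shown with the standard library's `json.Encoder` as an analogy (the commit itself concerns `jsontext.Encoder`). `rowWriter`, `newRowWriter`, and `renderRows` are hypothetical names for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// rowWriter is a hypothetical formatter that owns a single encoder.
type rowWriter struct {
	enc *json.Encoder
}

// newRowWriter creates the encoder once; every WriteRow call reuses it.
func newRowWriter(out io.Writer) *rowWriter {
	return &rowWriter{enc: json.NewEncoder(out)}
}

// WriteRow encodes one row. Encode appends the trailing newline, so each
// call produces exactly one JSONL line.
func (w *rowWriter) WriteRow(row map[string]any) error {
	return w.enc.Encode(row)
}

// renderRows writes all rows through one rowWriter and returns the output.
func renderRows(rows []map[string]any) (string, error) {
	var sb strings.Builder
	w := newRowWriter(&sb)
	for _, row := range rows {
		if err := w.WriteRow(row); err != nil {
			return "", err
		}
	}
	return sb.String(), nil
}

func main() {
	out, err := renderRows([]map[string]any{
		{"id": 1, "ok": true},
		{"id": 2, "ok": false},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

One caveat of this analogy: encoding a Go map sorts its keys, so it would not preserve column order. A token-level encoder avoids that, which is presumably one reason the formatter writes tokens directly.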

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Use spanvalue's extension points (FormatComplexPlugins, FormatArray,
FormatStruct) to produce valid JSON value strings directly, instead of
converting to intermediate Go types. This approach:

- Produces proper JSON types: INT64→number, BOOL→boolean, NULL→null,
  ARRAY→JSON array, STRUCT→JSON object with field names
- Uses RawJSONCell marker (not JSONValueCell with Go values) to signal
  that cell text is valid JSON
- Adds JSONValues ValueFormatMode for the JSONL format pipeline
- Is structured for easy feedback to spanvalue package later

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resolve conflicts: keep type-aware JSONL (RawJSONCell, writeValue)
over string-only version from squash-merged PR #580.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Leverage structpb.Value.MarshalJSON() for most types instead of
handling each type individually. Only INT64 and JSON columns need
special handling (StringValue used as number/raw JSON respectively).

Also use ValueFmtMode instead of DisplayMode for withRawJSONMarker
decision to be consistent with the prepareFormatConfig dispatch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the JSONL output format to be type-aware, ensuring that Spanner values are represented with their correct JSON data types rather than as generic strings. This change significantly improves the utility of JSONL output for downstream processing by providing properly structured and typed data. It integrates seamlessly with existing formatting infrastructure by utilizing spanvalue extension points and introduces a mechanism to mark pre-formatted JSON values.

Highlights

  • Type-aware JSONL Output: Implemented a new --format=jsonl option that produces proper JSON types for Spanner values, including arrays, structs, numbers, booleans, and nulls, instead of all-string representations.
  • Spanvalue Extension Points: Leveraged existing spanvalue extension points (FormatComplexPlugins, FormatArray, FormatStruct) to create a JSONFormatConfig that generates valid JSON value strings for Spanner types.
  • RawJSONCell Marker: Introduced a RawJSONCell lightweight marker type to signal that a cell's text is already valid JSON, allowing JSON-aware formatters to write it directly without additional quoting.
  • Efficient JSON Handling: Discovered that structpb.Value.MarshalJSON() correctly handles most Spanner types for JSON output, simplifying the implementation. Special handling was only required for INT64 (to avoid quoting) and JSON columns (to pass through raw JSON).
  • Comprehensive Testing: Added 21 test cases for TestJSONFormatConfig covering all Spanner types, including complex structures and edge cases like NaN/Infinity, and tests for RawJSONCell and the JSONL formatter lifecycle.

@github-actions

github-actions bot commented Mar 25, 2026

Code Metrics Report

📊 View detailed coverage report (available for 7 days)

| | main (c217109) | #581 (3bfccb4) | +/- |
|---|---|---|---|
| Coverage | 70.9% | 70.9% | -0.1% |
| Code to Test Ratio | 1:1.3 | 1:1.3 | +0.0 |
| Test Execution Time | 1m18s | 1m9s | -9s |
Details
  |                     | main (c217109) | #581 (3bfccb4) |  +/-  |
  |---------------------|----------------|----------------|-------|
- | Coverage            |          70.9% |          70.9% | -0.1% |
  |   Files             |             76 |             78 |    +2 |
  |   Lines             |           7053 |           7094 |   +41 |
+ |   Covered           |           5003 |           5032 |   +29 |
+ | Code to Test Ratio  |          1:1.3 |          1:1.3 |  +0.0 |
  |   Code              |          16372 |          16470 |   +98 |
+ |   Test              |          21314 |          21619 |  +305 |
+ | Test Execution Time |          1m18s |           1m9s |   -9s |

Code coverage of files in pull request scope (82.6% → 80.8%)

| Files | Coverage | +/- | Status |
|---|---|---|---|
| internal/mycli/decoder/jsonvalue.go | 95.0% | +95.0% | added |
| internal/mycli/execute_sql.go | 80.0% | -1.3% | modified |
| internal/mycli/format/cell_json.go | 66.6% | +66.6% | added |
| internal/mycli/format/mode.go | 100.0% | 0.0% | modified |
| internal/mycli/format/streaming_jsonl.go | 80.7% | +2.5% | modified |
| internal/mycli/row_iter.go | 77.0% | -10.0% | modified |

Reported by octocov

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new JSONValues formatting mode for the CLI, enabling Spanner data to be output as valid JSON. This includes a dedicated JSONFormatConfig for correct JSON serialization of various Spanner types, and a RawJSONCell type to signal pre-formatted JSON content. The JSONLFormatter is updated to handle these raw JSON cells appropriately. A review comment suggests an improvement in executeStreamingSQL to use qe.ValueFmtMode for consistency and maintainability when applying the RawJSONCell marker, aligning with the abstraction provided by ValueFormatMode.
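The consistency point in the review can be illustrated with a small sketch: derive the value mode from the display mode in one place, then have both the buffered and streaming paths test the value mode. The type definitions, `valueFormatModeFor`, `needsRawJSONMarker`, and the `DisplayValues` constant below are hypothetical; only the names `DisplayModeJSONL`, `JSONValues`, and `ValueFmtMode` come from the PR.

```go
package main

import "fmt"

// DisplayMode is a simplified stand-in for the CLI's output format enum.
type DisplayMode int

const (
	DisplayModeTable DisplayMode = iota
	DisplayModeCSV
	DisplayModeJSONL
)

// ValueFormatMode selects how individual values are rendered.
type ValueFormatMode int

const (
	DisplayValues ValueFormatMode = iota // hypothetical default mode
	JSONValues
)

// valueFormatModeFor is the single place that maps an output format to a
// value format mode.
func valueFormatModeFor(d DisplayMode) ValueFormatMode {
	if d == DisplayModeJSONL {
		return JSONValues
	}
	return DisplayValues
}

// queryEnv is a hypothetical stand-in for the per-query settings.
type queryEnv struct {
	ValueFmtMode ValueFormatMode
}

// needsRawJSONMarker is the shared predicate for both execution paths.
// Neither path inspects the display mode directly.
func needsRawJSONMarker(qe queryEnv) bool {
	return qe.ValueFmtMode == JSONValues
}

func main() {
	for _, d := range []DisplayMode{DisplayModeTable, DisplayModeCSV, DisplayModeJSONL} {
		qe := queryEnv{ValueFmtMode: valueFormatModeFor(d)}
		fmt.Printf("display mode %d: raw JSON marker = %v\n", d, needsRawJSONMarker(qe))
	}
}
```

With this shape, a future output format that also wants typed JSON values only needs a change in the mapping function, not in each execution path.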

The streaming path still used enums.DisplayModeJSONL directly instead
of qe.ValueFmtMode == format.JSONValues, inconsistent with the
buffered path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@apstndb
Owner Author

apstndb commented Mar 25, 2026

@gemini-code-assist /request-summary

@gemini-code-assist
Contributor

It looks like you tried to invoke a command, but /request-summary is not a recognized command. If you intended to request a pull request summary, please use the command /gemini summary.

@apstndb apstndb merged commit 6f62e81 into main Mar 25, 2026
2 checks passed
@apstndb apstndb deleted the issue-554-jsonl-format branch March 25, 2026 19:27
apstndb added a commit that referenced this pull request Mar 28, 2026
Document the JSONL output format added in #580/#581.

- system_variables.md: Add JSONL to CLI_FORMAT valid values, usage
  example, --format=jsonl flag note, and type-aware JSON description
- README.md: Add --format=jsonl as separate bullet (distinct from
  HTML/XML/CSV escaping story), update --format help text

Development

Successfully merging this pull request may close these issues.

feat: Add JSONL (JSON Lines) output format