Skip to content

[improve][broker] PIP-464: Strict Avro schema validation for SchemaType.JSON #25362

Open
codelipenghui wants to merge 6 commits into apache:master from codelipenghui:pip-464/strict-json-schema-avro-validation

codelipenghui commented Mar 19, 2026

Motivation

The broker-side fallback logic for SchemaType.JSON schema validation is too lenient: it accepts any valid JSON as a schema definition, not just the legacy Jackson format from the Pulsar 2.0 era. This has caused real issues for non-Java clients (e.g., Rust), where users accidentally register JSON Schema Draft 2020-12 definitions:

  1. StructSchemaDataValidator accepts it (Avro parse fails → Jackson fallback succeeds)
  2. JsonSchemaCompatibilityCheck allows it (permissive mixed-format handling)
  3. But Java consumers then fail with "SchemaParseException: Type not supported: object", because AvroBaseStructSchema requires Avro format and has no fallback

The result is an asymmetry: broker accepts any JSON, consumer requires Avro. Schemas get stored that no Java consumer can read.

Changes

New broker configuration

  • schemaJsonAllowLegacyJacksonFormat (boolean, default false)

Modified components (6 source files)

  • ServiceConfiguration: new config field
  • StructSchemaDataValidator: gates the Jackson JsonSchema fallback on the config flag; when false, the Avro SchemaParseException propagates directly
  • SchemaDataValidator: new validateSchemaData(data, allowLegacy) overload
  • SchemaRegistryServiceWithSchemaDataValidator: carries the config flag and passes it on
  • JsonSchemaCompatibilityCheck: gates mixed-format compatibility on the config flag; adds a defense-in-depth rejection when the existing schema is not valid Avro
  • SchemaRegistryService: wires the config from PulsarService into the validator and the compatibility checker
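For illustration, the StructSchemaDataValidator gating can be sketched as below. This is a minimal, self-contained sketch, not the PR's code: the two parser hooks, the toy parsers, and InvalidJsonSchemaDefinitionException are hypothetical stand-ins for Avro's Schema.Parser, the legacy Jackson JsonSchema parse, and the broker's validation exception. It follows the later commit in this PR that moves the legacy fallback into a general catch(Exception) block.

```java
import java.util.function.Consumer;

final class StrictJsonSchemaValidationSketch {

    /** Hypothetical stand-in for the broker's schema validation exception. */
    static final class InvalidJsonSchemaDefinitionException extends Exception {
        InvalidJsonSchemaDefinitionException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static void validateJson(String definition,
                             boolean allowLegacyJacksonFormat,
                             Consumer<String> avroParser,
                             Consumer<String> legacyJacksonParser)
            throws InvalidJsonSchemaDefinitionException {
        try {
            // Strict path: the definition must parse as an Avro schema,
            // which is what Java consumers (AvroBaseStructSchema) require.
            avroParser.accept(definition);
        } catch (Exception avroError) {
            // Any parser failure lands here, not only one Avro exception type.
            if (!allowLegacyJacksonFormat) {
                throw new InvalidJsonSchemaDefinitionException(
                        "JSON schema definition is not valid Avro", avroError);
            }
            try {
                // Legacy pre-2.1 path, reachable only when the flag is enabled.
                legacyJacksonParser.accept(definition);
            } catch (Exception legacyError) {
                legacyError.addSuppressed(avroError);
                throw new InvalidJsonSchemaDefinitionException(
                        "JSON schema definition is neither Avro nor legacy Jackson", legacyError);
            }
        }
    }

    // Toy parsers for illustration only; they do not parse real Avro or Jackson schemas.
    static final Consumer<String> TOY_AVRO = def -> {
        if (!def.contains("\"type\":\"record\"")) {
            throw new IllegalArgumentException("not an Avro record schema");
        }
    };
    static final Consumer<String> TOY_JACKSON = def -> {
        if (!def.trim().startsWith("{")) {
            throw new IllegalArgumentException("not a JSON object");
        }
    };

    static boolean accepts(String definition, boolean allowLegacy) {
        try {
            validateJson(definition, allowLegacy, TOY_AVRO, TOY_JACKSON);
            return true;
        } catch (InvalidJsonSchemaDefinitionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String avro = "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}";
        String draft = "{\"$schema\":\"https://json-schema.org/draft/2020-12/schema\",\"type\":\"object\"}";
        System.out.println(accepts(avro, false));  // true
        System.out.println(accepts(draft, false)); // false
        System.out.println(accepts(draft, true));  // true
    }
}
```

With the flag off, the Draft 2020-12 definition is rejected at registration; with it on, the same definition passes through the lenient fallback, which is the path this PR closes by default.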

Client-side (1 file)

  • ProducerImpl: deprecation comment on the backward-compatibility code path (no behavioral change)
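According to the first commit message, this backward-compatibility path is the code that sends the old Jackson format to brokers below protocol v13. A hypothetical sketch of that version check follows; the class, enum, and constant names are illustrative and are not ProducerImpl's actual fields or helpers.

```java
final class ProducerJsonFormatSketch {

    /** Hypothetical constant: per the commit message, brokers below v13 receive the Jackson format. */
    static final int AVRO_JSON_MIN_PROTOCOL_VERSION = 13;

    enum JsonSchemaWireFormat { AVRO, LEGACY_JACKSON }

    /**
     * Hypothetical helper: which JSON schema format a producer would send.
     * The LEGACY_JACKSON branch models the deprecated backward-compatibility path.
     */
    static JsonSchemaWireFormat formatFor(int brokerProtocolVersion) {
        return brokerProtocolVersion < AVRO_JSON_MIN_PROTOCOL_VERSION
                ? JsonSchemaWireFormat.LEGACY_JACKSON
                : JsonSchemaWireFormat.AVRO;
    }

    public static void main(String[] args) {
        System.out.println(formatFor(12)); // LEGACY_JACKSON
        System.out.println(formatFor(13)); // AVRO
    }
}
```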

Tests (3 test files, +171 lines)

  • SchemaDataValidatorTest: 8 new tests (Avro accepted in both modes; Jackson rejected by default, accepted when enabled; JSON Schema Draft rejected by default, accepted when enabled; arbitrary JSON always rejected; AVRO type unaffected)
  • JsonSchemaCompatibilityCheckTest: 4 new tests (legacy mode allows mixed formats; default rejects mixed formats; Avro↔Avro unaffected; JSON Schema Draft rejected)
  • SchemaRegistryServiceWithSchemaDataValidatorTest: 3 new tests (Jackson rejected by default, accepted when enabled; JSON Schema Draft rejected)

Compatibility

This is a breaking change in default behavior. Users with legacy pre-2.1 Jackson-format schemas can restore the old behavior by setting schemaJsonAllowLegacyJacksonFormat=true in broker.conf.

Java producers are unaffected (JSONSchema.of() generates Avro format since 2.1). Non-Java clients that were incorrectly registering JSON Schema Draft definitions will get a clear error at registration time instead of a confusing consumer-side failure.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

🤖 Generated with Claude Code

…pe.JSON

Add `schemaJsonAllowLegacyJacksonFormat` broker config (default false) to
control whether the legacy Jackson JsonSchema format is accepted for
SchemaType.JSON schema definitions.

When disabled (default), StructSchemaDataValidator and
JsonSchemaCompatibilityCheck strictly require valid Avro schema format,
consistent with what the consumer side (AvroBaseStructSchema) already
requires. This fixes the asymmetry where the broker accepted any valid
JSON as a schema definition, but consumers failed with
SchemaParseException at read time.

When enabled, the pre-2.1 backward-compatible behavior is preserved.

Also deprecates (but does not remove) the ProducerImpl client-side code
that sends the old Jackson format to brokers below protocol v13.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
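The JsonSchemaCompatibilityCheck gating described in this commit reduces to a small decision table. The sketch below models it with booleans ("is this definition valid Avro?") instead of real schema parsing. The enum and method names are hypothetical, and USE_LEGACY_HANDLING stands for whatever the pre-2.1 mixed-format logic decides, which this sketch does not reproduce.

```java
final class JsonCompatibilityGateSketch {

    /** Hypothetical outcomes of the gate that runs before any real compatibility strategy. */
    enum Decision { RUN_AVRO_COMPATIBILITY_CHECK, USE_LEGACY_HANDLING, REJECT }

    static Decision decide(boolean existingIsAvro, boolean newIsAvro, boolean allowLegacyJacksonFormat) {
        if (existingIsAvro && newIsAvro) {
            // Avro <-> Avro is unaffected by the flag: the normal Avro strategy applies.
            return Decision.RUN_AVRO_COMPATIBILITY_CHECK;
        }
        if (allowLegacyJacksonFormat) {
            // Legacy mode: defer to the permissive pre-2.1 mixed-format handling (not modeled here).
            return Decision.USE_LEGACY_HANDLING;
        }
        // Default: any non-Avro side is rejected, including an existing schema that is
        // not valid Avro (the defense-in-depth case).
        return Decision.REJECT;
    }

    public static void main(String[] args) {
        boolean[] values = {true, false};
        for (boolean legacy : values) {
            for (boolean existingAvro : values) {
                for (boolean newAvro : values) {
                    System.out.printf("legacy=%b existingAvro=%b newAvro=%b -> %s%n",
                            legacy, existingAvro, newAvro, decide(existingAvro, newAvro, legacy));
                }
            }
        }
    }
}
```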
@github-actions

@codelipenghui Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

…ation

Add AdminApiSchemaJsonValidationTest that tests the full server-side
flow using a real broker instance (MockedPulsarServiceBaseTest):

- Avro format JSON schema accepted (via Admin API and Producer API)
- JSON Schema Draft 2020-12 rejected by default
- Jackson JsonSchema format rejected by default
- Jackson and JSON Schema Draft accepted when legacy flag enabled
- SchemaType.AVRO unaffected by the JSON legacy config
- Schema compatibility rejects non-Avro after valid Avro schema exists

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codelipenghui codelipenghui self-assigned this Mar 19, 2026
@codelipenghui codelipenghui modified the milestones: 5.0.0, 4.2.0 Mar 19, 2026
@github-actions github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Mar 19, 2026
codelipenghui and others added 4 commits March 19, 2026 10:33
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Avro's Schema.Parser throws AvroTypeException (not SchemaParseException)
for unresolvable type references like "type":"object". These two exception
types are siblings under AvroRuntimeException, so the catch block must
handle both to reach the Jackson fallback path when legacy mode is enabled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
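Why a handler for SchemaParseException alone misses AvroTypeException can be shown with local stand-in classes that mirror the hierarchy this commit describes (both extend AvroRuntimeException). These are not the real org.apache.avro classes, and the two handler methods are hypothetical.

```java
final class AvroExceptionSiblingsSketch {

    // Local stand-ins mirroring the hierarchy described in the commit message;
    // these are NOT the real org.apache.avro classes.
    static class AvroRuntimeException extends RuntimeException {
        AvroRuntimeException(String message) { super(message); }
    }

    static class SchemaParseException extends AvroRuntimeException {
        SchemaParseException(String message) { super(message); }
    }

    static class AvroTypeException extends AvroRuntimeException {
        AvroTypeException(String message) { super(message); }
    }

    // A handler that only treats SchemaParseException as "try the legacy fallback".
    static String narrowHandler(RuntimeException fromParser) {
        try {
            throw fromParser;
        } catch (SchemaParseException e) {
            return "fallback";
        } catch (RuntimeException e) {
            return "escaped: " + e.getClass().getSimpleName();
        }
    }

    // A multi-catch of both siblings (legal because neither subclasses the other).
    static String siblingHandler(RuntimeException fromParser) {
        try {
            throw fromParser;
        } catch (SchemaParseException | AvroTypeException e) {
            return "fallback";
        } catch (RuntimeException e) {
            return "escaped: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        RuntimeException typeError = new AvroTypeException("stand-in for an unresolvable \"object\" type");
        System.out.println(narrowHandler(typeError));  // escaped: AvroTypeException
        System.out.println(siblingHandler(typeError)); // fallback
    }
}
```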
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…changes

Avro 1.12.0 throws NullPointerException (not SchemaParseException)
when parsing non-Avro schemas like Jackson JsonSchema format. The
previous catch block only handled SchemaParseException and
AvroTypeException, so the legacy fallback was never reached.

Move the legacy Jackson fallback into the general catch(Exception)
block so it handles all exception types from the Avro parser.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
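The NullPointerException case shows why the fallback had to move into catch(Exception): NullPointerException extends RuntimeException directly, so even catching the common parent AvroRuntimeException would not reach the legacy path. Below is a minimal sketch with a hypothetical toy parser that throws NullPointerException for non-record input, standing in for the Avro 1.12.0 behavior this commit reports. AvroRuntimeException here is a local stand-in.

```java
final class CatchAllFallbackSketch {

    // Local stand-in for org.apache.avro.AvroRuntimeException (not the real class).
    static class AvroRuntimeException extends RuntimeException {
        AvroRuntimeException(String message) {
            super(message);
        }
    }

    // Hypothetical toy parser: accepts Avro-looking records and otherwise fails with a
    // NullPointerException, mimicking the Avro 1.12.0 behavior reported in the commit.
    static void toyAvroParse(String definition) {
        if (!definition.contains("\"type\":\"record\"")) {
            throw new NullPointerException("stand-in for the parser failure on non-Avro input");
        }
    }

    // Catching the Avro exception family: the NullPointerException escapes this method.
    static String parseCatchingAvroFamily(String definition) {
        try {
            toyAvroParse(definition);
            return "parsed as Avro";
        } catch (AvroRuntimeException e) {
            return "legacy fallback";
        }
    }

    // Catching Exception: every parser failure reaches the legacy fallback.
    static String parseCatchingAll(String definition) {
        try {
            toyAvroParse(definition);
            return "parsed as Avro";
        } catch (Exception e) {
            return "legacy fallback";
        }
    }

    public static void main(String[] args) {
        String jacksonStyle = "{\"type\":\"object\",\"properties\":{}}";
        System.out.println(parseCatchingAll(jacksonStyle)); // legacy fallback
        try {
            parseCatchingAvroFamily(jacksonStyle);
        } catch (NullPointerException e) {
            System.out.println("escaped the Avro-family handler: " + e.getMessage());
        }
    }
}
```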

Labels

doc-not-needed (Your PR changes do not impact docs), ready-to-test
