[improve][broker] PIP-464: Strict Avro schema validation for SchemaType.JSON #25362
Open
codelipenghui wants to merge 6 commits into apache:master
…pe.JSON

Add `schemaJsonAllowLegacyJacksonFormat` broker config (default false) to control whether the legacy Jackson JsonSchema format is accepted for SchemaType.JSON schema definitions.

When disabled (default), StructSchemaDataValidator and JsonSchemaCompatibilityCheck strictly require valid Avro schema format, consistent with what the consumer side (AvroBaseStructSchema) already requires. This fixes the asymmetry where the broker accepted any valid JSON as a schema definition, but consumers failed with SchemaParseException at read time.

When enabled, the pre-2.1 backward-compatible behavior is preserved.

Also deprecates (but does not remove) the ProducerImpl client-side code that sends old Jackson format to brokers below protocol v13.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codelipenghui Please add the following content to your PR description and select a checkbox:
…ation

Add AdminApiSchemaJsonValidationTest that tests the full server-side flow using a real broker instance (MockedPulsarServiceBaseTest):
- Avro format JSON schema accepted (via Admin API and Producer API)
- JSON Schema Draft 2020-12 rejected by default
- Jackson JsonSchema format rejected by default
- Jackson and JSON Schema Draft accepted when legacy flag enabled
- SchemaType.AVRO unaffected by the JSON legacy config
- Schema compatibility rejects non-Avro after valid Avro schema exists

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Avro's Schema.Parser throws AvroTypeException (not SchemaParseException) for unresolvable type references like "type":"object". These two exception types are siblings under AvroRuntimeException, so the catch block must handle both to reach the Jackson fallback path when legacy mode is enabled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…changes

Avro 1.12.0 throws NullPointerException (not SchemaParseException) when parsing non-Avro schemas like Jackson JsonSchema format. The previous catch block only handled SchemaParseException and AvroTypeException, so the legacy fallback was never reached. Move the legacy Jackson fallback into the general catch(Exception) block so it handles all exception types from the Avro parser.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
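The control flow these commit messages describe (strict Avro parse first, legacy fallback only inside a broad catch and only when the flag is on) can be sketched roughly as below. This is a minimal illustration, not the actual `StructSchemaDataValidator` code: the `Parser` interface, `InvalidSchemaException`, and the string-matching stand-in parsers are all hypothetical, and the real validator presumably calls Avro and Jackson instead.

```java
/**
 * Sketch of a flag-gated legacy fallback. All names here are hypothetical
 * stand-ins; this is not the StructSchemaDataValidator source.
 */
final class LegacyFallbackSketch {

    /** Stand-in for a schema parser that throws an unchecked exception on bad input. */
    interface Parser {
        void parse(String definition);
    }

    /** Hypothetical rejection type used only by this sketch. */
    static final class InvalidSchemaException extends RuntimeException {
        InvalidSchemaException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static void validate(String definition, boolean allowLegacy,
                         Parser strictAvro, Parser legacyJackson) {
        try {
            strictAvro.parse(definition);
        } catch (Exception avroFailure) {
            // Catch broadly: per the commit messages, the Avro parser may fail with
            // SchemaParseException, AvroTypeException, or NullPointerException.
            if (!allowLegacy) {
                throw new InvalidSchemaException("Not a valid Avro schema", avroFailure);
            }
            try {
                legacyJackson.parse(definition);
            } catch (Exception legacyFailure) {
                throw new InvalidSchemaException(
                        "Neither Avro nor legacy Jackson format", legacyFailure);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated parsers: crude string checks standing in for the real libraries.
        Parser strictAvro = d -> {
            if (!d.contains("\"record\"")) {
                throw new NullPointerException("simulated Avro parser failure");
            }
        };
        Parser legacyJackson = d -> {
            if (!d.contains("\"object\"")) {
                throw new IllegalArgumentException("simulated Jackson failure");
            }
        };

        String jacksonStyle = "{\"type\":\"object\"}";
        validate(jacksonStyle, true, strictAvro, legacyJackson); // accepted via fallback
        try {
            validate(jacksonStyle, false, strictAvro, legacyJackson);
        } catch (InvalidSchemaException e) {
            System.out.println("rejected in strict mode: " + e.getMessage());
        }
    }
}
```

Placing the fallback in the broad catch keeps the legacy path reachable regardless of which exception type a given Avro version throws, which is the problem the last two commits address.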
Motivation

The broker-side fallback logic for `SchemaType.JSON` schema validation is too lenient: it accepts any valid JSON as a schema definition, not just the legacy Jackson format from the Pulsar 2.0 era. This has caused real issues for non-Java clients (e.g., Rust) where users accidentally register JSON Schema Draft 2020-12 definitions:
- `StructSchemaDataValidator` accepts it (Avro parse fails → Jackson fallback succeeds)
- `JsonSchemaCompatibilityCheck` allows it (permissive mixed-format handling)
- The Java consumer then fails with `SchemaParseException: Type not supported: object` because `AvroBaseStructSchema` requires Avro format with no fallback

The result is an asymmetry: the broker accepts any JSON, while the consumer requires Avro. Schemas get stored that no Java consumer can read.
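To make the two formats concrete, the snippet below holds one definition of each kind as Java string constants. It is only an illustration: the `User` record and its `id` and `email` fields are hypothetical and do not come from the PR.

```java
/** Illustrative schema definitions only; the record name and fields are hypothetical. */
final class SchemaDefinitionExamples {

    // Avro record schema: the format the description says SchemaType.JSON must use.
    static final String AVRO_FORMAT =
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"email\",\"type\":\"string\"}]}";

    // JSON Schema Draft 2020-12 document: valid JSON, but not an Avro schema.
    // "object" is not an Avro type name, which matches the
    // "Type not supported: object" error quoted in the motivation.
    static final String JSON_SCHEMA_DRAFT =
            "{\"$schema\":\"https://json-schema.org/draft/2020-12/schema\","
            + "\"type\":\"object\",\"properties\":{"
            + "\"id\":{\"type\":\"integer\"},"
            + "\"email\":{\"type\":\"string\"}}}";

    public static void main(String[] args) {
        System.out.println("Avro format:       " + AVRO_FORMAT);
        System.out.println("JSON Schema Draft: " + JSON_SCHEMA_DRAFT);
    }
}
```

Both strings are well-formed JSON, which is why a validator that falls back to "any valid JSON" cannot tell them apart.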
Changes

New broker configuration
- `schemaJsonAllowLegacyJacksonFormat` (boolean, default `false`)

Modified components (6 source files)
- `ServiceConfiguration`: new config field
- `StructSchemaDataValidator`: gates Jackson JsonSchema fallback on config flag; when `false`, Avro `SchemaParseException` propagates directly
- `SchemaDataValidator`: new `validateSchemaData(data, allowLegacy)` overload
- `SchemaRegistryServiceWithSchemaDataValidator`: carries and passes config flag
- `JsonSchemaCompatibilityCheck`: gates mixed-format compatibility on config flag; defense-in-depth rejection when existing schema is not valid Avro
- `SchemaRegistryService`: wires config from `PulsarService` to validator and compatibility checker

Client-side (1 file)
- `ProducerImpl`: deprecation comment on backward-compat code path (no behavioral change)

Tests (3 test files, +171 lines)
- `SchemaDataValidatorTest`: 8 new tests covering Avro accepted in both modes, Jackson rejected by default / accepted when enabled, JSON Schema Draft rejected / accepted, arbitrary JSON always rejected, AVRO type unaffected
- `JsonSchemaCompatibilityCheckTest`: 4 new tests covering legacy enabled allows mixed formats, default rejects mixed, Avro↔Avro unaffected, JSON Schema Draft rejected
- `SchemaRegistryServiceWithSchemaDataValidatorTest`: 3 new tests covering Jackson rejected by default, accepted when enabled, JSON Schema Draft rejected

Compatibility
This is a breaking change in default behavior. Users with legacy pre-2.1 Jackson-format schemas can restore the old behavior by setting `schemaJsonAllowLegacyJacksonFormat=true` in `broker.conf`.

Java producers are unaffected (`JSONSchema.of()` generates Avro format since 2.1). Non-Java clients that were incorrectly registering JSON Schema Draft definitions will get a clear error at registration time instead of a confusing consumer-side failure.

Documentation
- `doc`
- `doc-required`
- `doc-not-needed`
- `doc-complete`

🤖 Generated with Claude Code
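For operators who still depend on pre-2.1 Jackson-format schemas, the opt-out described under Compatibility would amount to a single line in `broker.conf`. The property name and default come from the PR description; the comments are illustrative.

```properties
# Accept the legacy (pre-2.1) Jackson JsonSchema format for SchemaType.JSON.
# The default is false, which enforces strict Avro schema validation.
schemaJsonAllowLegacyJacksonFormat=true
```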