fix: improve toml-test invalid compliance #2
Merged
dereuromark merged 4 commits into master on Mar 25, 2026
- Add UTF-8 encoding validation upfront to reject invalid byte sequences
- Add control character validation in basic and literal strings
- Reject bare CR (without LF) in multiline strings
- Enforce lowercase-only prefixes for non-decimal integers (0x, 0o, 0b)
- Reject signed non-decimal integers (+0x, -0o, etc.)
- Update tests to verify new validation rules

These changes address several toml-test invalid test failures by enforcing stricter TOML spec compliance:

- Invalid UTF-8 is now rejected early
- Control characters (< 0x20 except tab, or 0x7F) are rejected in strings
- Bare CR is rejected (TOML requires CRLF or LF only)
- Integer prefixes must be lowercase per the TOML spec
- Only decimal integers can have a sign prefix
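The string-level rules above can be sketched roughly as follows. This is an illustrative Python sketch, not the code from this PR; the function names `decode_utf8` and `first_invalid_index` are hypothetical, and the sketch assumes the string body has already been separated from its delimiters.

```python
from typing import Optional


def decode_utf8(raw: bytes) -> str:
    """Decode the whole document strictly; invalid byte sequences raise ValueError."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"invalid UTF-8 at byte {exc.start}") from exc


def first_invalid_index(body: str, multiline: bool = False) -> Optional[int]:
    """Return the index of the first forbidden character in a string body, or None.

    Forbidden: control characters below 0x20 other than tab, and 0x7F.
    In multiline strings, LF and CRLF are allowed, but a bare CR is not.
    """
    i = 0
    while i < len(body):
        ch = body[i]
        if multiline and ch == "\n":
            i += 1
            continue
        if multiline and ch == "\r":
            if body[i + 1:i + 2] == "\n":
                i += 2  # CRLF is a valid newline
                continue
            return i  # bare CR
        code = ord(ch)
        if (code < 0x20 and ch != "\t") or code == 0x7F:
            return i
        i += 1
    return None
```

Running the UTF-8 check once over the raw input, before any tokenizing, means later stages can assume well-formed text and only need the per-string character check.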
Update compliance numbers after fixing:

- UTF-8 encoding validation
- Control character validation in strings
- Bare CR rejection in multiline strings
- Integer prefix validation (lowercase only)
- Signed non-decimal integer rejection

TOML 1.1: 96.8% invalid compliance (up from 90.3%)
TOML 1.0: 95.3% invalid compliance (up from 89.0%)
Add stricter validation for table semantics edge cases:

1. Cannot extend explicitly defined tables via dotted keys
   - [a.b.c] followed by [a] + b.c.t = ... is now rejected
2. Cannot extend array tables via dotted keys
   - [[a.b]] followed by [a] + b.y = ... is now rejected
3. Cannot explicitly define tables created by dotted keys
   - [a] + b.c = 1 followed by [a.b] is now rejected

The fix introduces a new 'dotted' kind for implicit tables created via dotted key notation, distinguishing them from 'implicit' tables created by super-table headers, which CAN be explicitly defined later.

Add semantic test fixtures for all three patterns.
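One way to picture the 'dotted' versus 'implicit' distinction is a registry that records how each table path came into existence. The following is a minimal Python sketch under that assumption, not the normalizer from this PR: `TableRegistry`, `TableConflict`, and `accepts` are hypothetical names, paths are plain tuples of key segments, and ordinary values are not tracked at all.

```python
class TableConflict(ValueError):
    """Raised when a table is defined or extended in a way the rules forbid."""


class TableRegistry:
    def __init__(self):
        # path tuple -> "explicit" | "implicit" | "dotted" | "array"
        self.kinds = {}

    def _open_parents(self, path):
        # Super-tables named only in a header are "implicit".
        for i in range(1, len(path)):
            self.kinds.setdefault(path[:i], "implicit")

    def header(self, path):
        """Handle a [a.b.c] header."""
        self._open_parents(path)
        if self.kinds.get(path) not in (None, "implicit"):
            raise TableConflict(f"cannot define table {'.'.join(path)}")
        self.kinds[path] = "explicit"

    def array_header(self, path):
        """Handle a [[a.b]] header."""
        self._open_parents(path)
        if self.kinds.get(path) not in (None, "array"):
            raise TableConflict(f"cannot define array table {'.'.join(path)}")
        # A new array element starts with a clean set of sub-tables.
        stale = [p for p in self.kinds
                 if len(p) > len(path) and p[:len(path)] == path]
        for p in stale:
            del self.kinds[p]
        self.kinds[path] = "array"

    def dotted_key(self, header, key):
        """Handle `b.c.t = ...` inside the table opened by `header`."""
        for i in range(1, len(key)):
            path = header + key[:i]
            kind = self.kinds.get(path)
            if kind in ("explicit", "array"):
                raise TableConflict(f"cannot extend {'.'.join(path)} via dotted key")
            if kind is None:
                self.kinds[path] = "dotted"


def accepts(steps):
    """Replay (method, *args) steps; return False if any step conflicts."""
    registry = TableRegistry()
    try:
        for method, *args in steps:
            getattr(registry, method)(*args)
    except TableConflict:
        return False
    return True
```

Replaying the three patterns from the commit message against this sketch:

```python
# 1. [a.b.c] followed by [a] + b.c.t = ...
accepts([("header", ("a", "b", "c")), ("header", ("a",)),
         ("dotted_key", ("a",), ("b", "c", "t"))])   # False
# 2. [[a.b]] followed by [a] + b.y = ...
accepts([("array_header", ("a", "b")), ("header", ("a",)),
         ("dotted_key", ("a",), ("b", "y"))])        # False
# 3. [a] + b.c = 1 followed by [a.b]
accepts([("header", ("a",)), ("dotted_key", ("a",), ("b", "c")),
         ("header", ("a", "b"))])                    # False
# Super-table created by a header can still be defined later
accepts([("header", ("a", "b", "c")), ("header", ("a", "b"))])  # True
```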
With the dotted key vs explicit table conflicts now properly rejected, invalid test compliance improves significantly:

TOML 1.1: 98.5% invalid (up from 96.8%)
TOML 1.0: 97.0% invalid (up from 95.3%)
Summary
Improves toml-test invalid compliance from 90.3% to 98.5% (TOML 1.1).
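Lexer Fixes
- UTF-8 encoding validation upfront to reject invalid byte sequences
- Control character validation in basic and literal strings
- Bare CR (without LF) rejected in multiline strings
- Lowercase-only prefixes enforced for non-decimal integers (0x, 0o, 0b)
- Signed non-decimal integers (+0x, -0o, etc.) rejected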
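The integer rules among the lexer fixes (lowercase `0x`/`0o`/`0b` prefixes only, and no sign on non-decimal integers) can be illustrated with a pair of patterns. This is a hedged Python sketch based on the TOML integer grammar, not the lexer from this PR; `is_valid_integer` and the pattern names are hypothetical.

```python
import re

# Decimal integers may carry a sign; leading zeros are not allowed.
_DECIMAL = re.compile(r"[+-]?(?:0|[1-9](?:_?[0-9])*)")

# Non-decimal integers: lowercase prefix only, no sign.
# Hex digits themselves may be upper or lower case.
_NON_DECIMAL = re.compile(
    r"0x[0-9A-Fa-f](?:_?[0-9A-Fa-f])*"
    r"|0o[0-7](?:_?[0-7])*"
    r"|0b[01](?:_?[01])*"
)


def is_valid_integer(text: str) -> bool:
    """Return True if `text` is a well-formed TOML integer literal."""
    return bool(_DECIMAL.fullmatch(text) or _NON_DECIMAL.fullmatch(text))
```

With these patterns, `0xDEADBEEF`, `0b1010`, and `+42` are accepted, while `0XFF`, `0O17`, `+0x1`, and `-0o7` are rejected.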
Normalizer Fixes (Table Semantics)
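- `[a.b.c]` followed by `[a]` + `b.c.t = ...` is now rejected
- `[[a.b]]` followed by `[a]` + `b.y = ...` is now rejected
- `[a]` + `b.c = 1` followed by `[a.b]` is now rejected

Compliance Summary
- TOML 1.1: 98.5% invalid compliance (up from 90.3%)
- TOML 1.0: 97.0% invalid compliance (up from 89.0%)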
Follows up on PR #1 which established the baseline compliance testing.