feat(runtime-expressions): improve ABNF grammar clarity#454

Open
frankkilcommins wants to merge 4 commits into OAI:v1.1-dev from
frankkilcommins:abnf-grammer-improvements

@frankkilcommins
Collaborator

frankkilcommins commented Mar 23, 2026

$components now requires an explicit component type (parameters/successActions/failureActions). The generic components.name pattern has been removed. Note: that pattern was already semantically invalid per the spec.

fixes: #424
fixes: #425
fixes: #426
fixes: #428
fixes: #437

resolves: #427

"firstName": "{$inputs.customer#/firstName}",
"lastName": "{$inputs.customer#/lastName}",
"dateOfBirth": "{$inputs.customer#/dateOfBirth}",
"postalCode": "{$inputs.customer#/postalCode}"
Contributor

It would be great to include more complex and diverse examples demonstrating the application of ABNF syntax.

component-name = identifier

; Identifier rule
identifier = 1*( ALPHA / DIGIT / "." / "-" / "_" )
Contributor

The PR's identifier = 1*( ALPHA / DIGIT / "." / "-" / "_" ) is used for all IDs (stepId, workflowId, sourceDescriptionName, component keys, input/output names). But the spec defines two different patterns:

  • stepId, workflowId, sourceDescriptionName: SHOULD [A-Za-z0-9_\-]+ (no dot)
  • Components keys: MUST ^[a-zA-Z0-9\.\-_]+$ (with dot)

A single shared identifier rule conflates these: it allows dots in step/workflow IDs where the spec says they shouldn't be, even though that constraint is only SHOULD-level. Separate rules would be more faithful to the spec's intent.

field-name = identifier

; Source descriptions expressions
source-reference = source-name "." reference-id
Contributor

source-reference is too restrictive with identifier

The proposed grammar uses:

  source-reference = source-name "." reference-id                                             
  reference-id = identifier                      

The <reference> part can be an operationId from an OpenAPI description or a workflowId from
an Arazzo document. OpenAPI does not constrain operationId to any specific character set —
it's just a string. This means operationIds like get/pets, get pets, or create-user@v2 are
technically valid in OpenAPI but would be rejected by the identifier rule.

I'd suggest using a less restrictive rule for reference-id — something like 1*CHAR (any
character except { and }) — to avoid rejecting valid OpenAPI operationIds.
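
To make the concern concrete, here is a minimal Python sketch. The two regexes are only approximations of the candidate rules (the names IDENTIFIER and PERMISSIVE are hypothetical, and PERMISSIVE ignores JSON escape handling), so treat it as an illustration rather than the normative grammar:

```python
import re

# Approximation of: identifier = 1*( ALPHA / DIGIT / "." / "-" / "_" )
IDENTIFIER = re.compile(r"[A-Za-z0-9._-]+")
# Approximation of "1*CHAR, any character except { and }" (escape handling omitted)
PERMISSIVE = re.compile(r"[^{}]+")

# operationId values mentioned in the comment above, plus one plain identifier
operation_ids = ["getPets", "get/pets", "get pets", "create-user@v2"]

for op_id in operation_ids:
    ok_identifier = IDENTIFIER.fullmatch(op_id) is not None
    ok_permissive = PERMISSIVE.fullmatch(op_id) is not None
    print(f"{op_id!r}: identifier={ok_identifier}, permissive={ok_permissive}")
```

Only getPets passes the identifier-style check, while all four values pass the permissive one.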

Collaborator Author

updated to:

  ; Source descriptions expressions
  source-reference = source-name "." reference-id
  source-name = identifier-strict
  reference-id = 1*CHAR
      ; operationIds have no character restrictions in OpenAPI/AsyncAPI
      ; Resolution priority defined in spec text: (1) operationId/workflowId, (2) field names

char0n added a commit to swaggerexpert/arazzo-runtime-expression that referenced this pull request Apr 2, 2026
Restructure the ABNF grammar to use explicit, typed reference rules in the
primary grammar instead of relying on secondary grammars with two-pass parsing.
This improves grammar clarity and aligns with the proposed spec changes in
OAI/Arazzo-Specification#454.

Key changes:
- Add $self expression support
- Add $inputs/$outputs JSON Pointer support (e.g., $inputs.customer#/firstName)
- Inline all secondary grammars into the primary grammar
- Extract shared identifier and identifier-strict rules
- Adapt json-pointer to exclude { and } from unescaped for unambiguous
  embedded expression parsing, fixing the body expression extract limitation
- Require explicit component types (parameters/successActions/failureActions)
- Update README with current grammar and examples

Resolves: OAI/Arazzo-Specification#424, OAI/Arazzo-Specification#425,
OAI/Arazzo-Specification#426, OAI/Arazzo-Specification#428

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@char0n
Contributor

char0n commented Apr 2, 2026

Implementation Verification

I implemented the proposed grammar changes in my ABNF parser at swaggerexpert/arazzo-runtime-expression#116 to verify the grammar is correct and parseable. All 152 tests pass. Below are the findings from the implementation.

Issue: unescaped in json-pointer still includes { and }

The CHAR rule correctly excludes { (%x7B) and } (%x7D) for unambiguous embedded expression parsing, but the unescaped rule in json-pointer still uses %x30-7D, which includes both characters.

This means embedded expressions containing JSON pointers — like {$request.body#/status}, {$inputs.customer#/firstName}, or {$steps.foo.outputs.bar#/0/id} — cannot be reliably parsed. The json-pointer's unescaped will consume the closing }, making it impossible to determine where the expression ends.

Suggested fix — change unescaped from:

unescaped = %x00-2E / %x30-7D / %x7F-10FFFF

to:

unescaped = %x00-2E / %x30-7A / %x7C / %x7F-10FFFF
    ; %x2F ('/'), %x7E ('~'), %x7B ('{'), %x7D ('}') are excluded

This is a minor deviation from RFC 6901, but { and } in JSON Pointer reference tokens are extremely rare in practice, and without this fix the expression-string grammar cannot work correctly for any expression containing a json-pointer.
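
A small Python sketch of the problem, assuming a greedy match of the json-pointer rule. The regex translation and the names RFC6901 and ADAPTED are hypothetical stand-ins for the two unescaped definitions, not part of the PR:

```python
import re

def json_pointer_regex(unescaped_class: str) -> re.Pattern:
    # json-pointer    = *( "/" reference-token )
    # reference-token = *( unescaped / escaped ), escaped = "~" ( "0" / "1" )
    return re.compile(rf"(?:/(?:{unescaped_class}|~[01])*)*")

# RFC 6901 style: everything except '/' (%x2F) and '~' (%x7E)
RFC6901 = json_pointer_regex(r"[^/~]")
# Adapted: additionally excludes '{' (%x7B) and '}' (%x7D)
ADAPTED = json_pointer_regex(r"[^/~{}]")

# Text that follows '#' in the template "{$request.body#/status} is the status"
rest = "/status} is the status"

print(repr(RFC6901.match(rest).group(0)))  # '/status} is the status'
print(repr(ADAPTED.match(rest).group(0)))  # '/status'
```

With the RFC 6901 ranges the pointer swallows the closing brace and the trailing literal text; with the adapted ranges it stops at the brace, so the end of the embedded expression is unambiguous.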

Issue: Single identifier rule conflates two different spec constraints

The proposed grammar uses a single identifier = 1*( ALPHA / DIGIT / "." / "-" / "_" ) rule for everything — step IDs, workflow IDs, source description names, component keys, input/output names, and field names. However, the spec defines two different patterns:

  • stepId, workflowId, sourceDescriptionName: SHOULD conform to [A-Za-z0-9_\-]+ (no dot)
  • Components keys: MUST match ^[a-zA-Z0-9\.\-_]+$ (with dot)

Using a single shared rule allows dots in step/workflow IDs where the spec says they shouldn't be. In my implementation, I split this into two rules:

identifier        = 1*(ALPHA / DIGIT / "." / "-" / "_")   ; for field names, component keys
identifier-strict = 1*(ALPHA / DIGIT / "_" / "-")          ; for step/workflow/source-description IDs
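
As a rough illustration of the split, the two rules can be approximated with regexes. This is a sketch only; the helper names is_step_id and is_component_key are hypothetical and do not come from the spec or from my parser:

```python
import re

# identifier        = 1*(ALPHA / DIGIT / "." / "-" / "_")
IDENTIFIER = re.compile(r"[A-Za-z0-9._-]+")
# identifier-strict = 1*(ALPHA / DIGIT / "_" / "-")
IDENTIFIER_STRICT = re.compile(r"[A-Za-z0-9_-]+")

def is_step_id(value: str) -> bool:
    """Check a stepId/workflowId/sourceDescriptionName against the strict rule."""
    return IDENTIFIER_STRICT.fullmatch(value) is not None

def is_component_key(value: str) -> bool:
    """Check a Components key against the dot-permitting rule."""
    return IDENTIFIER.fullmatch(value) is not None

for candidate in ["loginStep", "login-step_1", "login.step"]:
    print(candidate, is_step_id(candidate), is_component_key(candidate))
```

Here login.step is accepted as a component key but rejected as a step ID, which is the distinction a single shared rule loses.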

Issue: source-descriptions-reference (reference-id) is too restrictive

The proposed grammar constrains reference-id to identifier, but this value can be an operationId from an OpenAPI description. OpenAPI does not constrain operationId to any specific character set — it's just a string. OperationIds like get/pets, get pets, or create-user@v2 are technically valid in OpenAPI but would be rejected by the identifier rule.

In my implementation, I use 1*CHAR (any character except { and }) for this rule.

Issue: Simplified CHAR rule diverges from OpenAPI

The PR replaces the JSON string-based CHAR definition (from RFC 7159, with escape sequences) with a simpler character range: CHAR = %x00-7A / %x7C / %x7E-10FFFF. This changes the semantics — a bare \ becomes a valid character, and JSON escape sequences like \n, \uXXXX are no longer recognized.

OpenAPI's runtime expression ABNF uses the RFC 7159-based CHAR definition. Since Arazzo builds on top of OpenAPI and shares the runtime expression concept, simplifying CHAR introduces a subtle divergence. An expression valid in one spec could behave differently in the other. I'd recommend keeping the RFC 7159-based definition for interoperability.
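
The difference can be sketched in Python by applying both CHAR definitions to name = *( CHAR ). The regexes are my own translation of the two ABNF definitions and the names SIMPLE_NAME and RFC7159_NAME are hypothetical:

```python
import re

# Simplified CHAR from the PR: any code point except '{' (%x7B) and '}' (%x7D).
SIMPLE_NAME = re.compile(r"[^{}]*")

# RFC 7159-style CHAR with '{' and '}' removed from the unescaped range:
# either an unescaped code point or a backslash followed by a valid escape.
RFC7159_NAME = re.compile(
    r'(?:[\x20-\x21\x23-\x5B\x5D-\x7A\x7C\x7E-\U0010FFFF]'
    r'|\\(?:["\\/bfnrt]|u[0-9A-Fa-f]{4}))*'
)

# Each sample is 'a' followed by: a bare backslash, \n, \q, and \u00e9 (as literal text)
samples = ["a\\", "a\\n", "a\\q", "a\\u00e9"]

for s in samples:
    simple_ok = SIMPLE_NAME.fullmatch(s) is not None
    rfc_ok = RFC7159_NAME.fullmatch(s) is not None
    print(f"{s!r}: simplified={simple_ok}, rfc7159={rfc_ok}")
```

The simplified definition accepts all four samples, whereas the RFC 7159-based one rejects the bare backslash and the unknown escape \q.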

Suggestion: name rule is not "legacy"

The PR labels the name rule as ; Legacy 'name' rule (retained for query/path references). This rule isn't legacy — it's the correct rule for query and path parameter names, which are user-defined and can contain any valid character. The comment could be misleading and suggest future removal. A more accurate comment would be something like ; Unconstrained name rule for query/path references.

Note: Example file version mismatch

The example fixes in examples/1.0.0/bnpl-arazzo.yaml (changing $inputs.customer.firstName to $inputs.customer#/firstName) apply 1.1.0 grammar semantics to a 1.0.0 example file. This could cause confusion about backward compatibility. Consider applying these fixes only to a 1.1.0 example, or noting that the 1.0.0 example has been updated to reflect the corrected grammar.

Note: Missing comma in example payload

In bnpl-arazzo.yaml, there's a pre-existing missing comma after the postalCode line in the JSON payload template, making it invalid JSON:

"postalCode": "{$inputs.customer#/postalCode}"
  "termsAndConditionsAccepted": true

Our ABNF grammar for reference

For reference, here is the complete ABNF grammar from my implementation that addresses the issues above:

; Arazzo runtime expression ABNF syntax
expression = (
    "$url" /
    "$method" /
    "$statusCode" /
    "$request." source /
    "$response." source /
    "$inputs." inputs-reference /
    "$outputs." outputs-reference /
    "$steps." steps-reference /
    "$workflows." workflows-reference /
    "$sourceDescriptions." source-reference /
    "$components." components-reference /
    "$self"
  )
; Request/Response sources
source                  = ( header-reference / query-reference / path-reference / body-reference )
header-reference        = "header." token
query-reference         = "query." name
path-reference          = "path." name
body-reference          = "body" ["#" json-pointer ]

; Input/Output references
inputs-reference        = inputs-name ["#" json-pointer]
inputs-name             = identifier
outputs-reference       = outputs-name ["#" json-pointer]
outputs-name            = identifier

; Steps expressions
steps-reference         = steps-id ".outputs." outputs-name ["#" json-pointer]
steps-id                = identifier-strict

; Workflows expressions
workflows-reference     = workflows-id "." workflows-field "." workflows-field-name ["#" json-pointer]
workflows-id            = identifier-strict
workflows-field         = "inputs" / "outputs"
workflows-field-name    = identifier

; Source descriptions expressions
source-reference                = source-descriptions-name "." source-descriptions-reference
source-descriptions-name        = identifier-strict
source-descriptions-reference   = 1*CHAR

; Components expressions
components-reference    = components-type "." components-name
components-type         = "parameters" / "successActions" / "failureActions"
components-name         = identifier

; Unconstrained name rule for query/path references and source description references
name                    = *( CHAR )

; Grammar for parsing template strings with embedded expressions
expression-string    = *( literal-char / embedded-expression )
embedded-expression  = "{" expression "}"
literal-char         = %x00-7A / %x7C / %x7E-10FFFF  ; anything except { (%x7B) and } (%x7D)

; JSON Pointer (RFC 6901, adapted)
; { (%x7B) and } (%x7D) are excluded from 'unescaped' for unambiguous embedded expression parsing
json-pointer     = *( "/" reference-token )
reference-token  = *( unescaped / escaped )
unescaped        = %x00-2E / %x30-7A / %x7C / %x7F-10FFFF
                 ; %x2F ('/'), %x7E ('~'), %x7B ('{'), %x7D ('}') are excluded
escaped          = "~" ( "0" / "1" )
                 ; representing '~' and '/', respectively

; https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.6
token          = 1*tchar
tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
               / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
               / DIGIT / ALPHA
               ; any VCHAR, except delimiters

; https://www.rfc-editor.org/rfc/rfc7159#section-7
CHAR = unescape /
    escape (
        %x22 /          ; "    quotation mark  U+0022
        %x5C /          ; \    reverse solidus U+005C
        %x2F /          ; /    solidus         U+002F
        %x62 /          ; b    backspace       U+0008
        %x66 /          ; f    form feed       U+000C
        %x6E /          ; n    line feed       U+000A
        %x72 /          ; r    carriage return U+000D
        %x74 /          ; t    tab             U+0009
        %x75 4HEXDIG )  ; uXXXX                U+XXXX
escape         = %x5C   ; \
unescape       = %x20-21 / %x23-5B / %x5D-7A / %x7C / %x7E-10FFFF
               ; %x7B ('{') and %x7D ('}') are excluded from 'unescape'

; Identifier rules
identifier        = 1*(ALPHA / DIGIT / "." / "-" / "_")
                  ; Alphanumeric with dots, hyphens, underscores
identifier-strict = 1*(ALPHA / DIGIT / "_" / "-")
                  ; Alphanumeric with hyphens, underscores (no dots)

; https://datatracker.ietf.org/doc/html/rfc5234#appendix-B.1
HEXDIG         =  DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
DIGIT          =  %x30-39   ; 0-9
ALPHA          =  %x41-5A / %x61-7A   ; A-Z / a-z
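
Because neither literal-char nor any part of expression can contain { or }, the expression-string rule can be scanned without lookahead. The following Python sketch shows that idea under stated assumptions: split_template is a hypothetical helper, and it only separates literals from embedded expressions without validating the expression itself against the grammar.

```python
import re

# embedded-expression = "{" expression "}"
# literal-char        = anything except '{' and '}'
TOKEN = re.compile(r"\{([^{}]*)\}|([^{}]+)")

def split_template(template: str) -> list[tuple[str, str]]:
    """Split a template into ('literal', text) and ('expression', text) parts."""
    parts = []
    pos = 0
    while pos < len(template):
        m = TOKEN.match(template, pos)
        if m is None:
            # A '{' without a matching '}' (or a stray '}') cannot be tokenized.
            raise ValueError(f"unbalanced brace at offset {pos}")
        if m.group(1) is not None:
            parts.append(("expression", m.group(1)))
        else:
            parts.append(("literal", m.group(2)))
        pos = m.end()
    return parts

print(split_template("Hello {$inputs.customer#/firstName}!"))
```

For the sample template this yields a literal, the expression $inputs.customer#/firstName, and a trailing literal.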

@frankkilcommins frankkilcommins requested a review from char0n April 6, 2026 15:46
unescaped = %x00-2E / %x30-7D / %x7F-10FFFF
; %x2F ('/') and %x7E ('~') excluded from 'unescaped'
unescaped = %x00-2E / %x30-7A / %x7C / %x7F-10FFFF
; Excludes / (%x2F), { (%x7B), } (%x7D), and ~ (%x7E)
Contributor

Will it make sense to explicitly define some other literals, like boolean, undefined, null?
They can be used in runtime expressions.

// Boolean literals for true and false
Boolean
  = "true" / "false"

// Null literal
Null
  = "null"

Undefined
  = "undefined"

Collaborator Author

The literals (boolean, null, undefined) are used in Criterion Object conditions alongside runtime expressions, not within runtime expressions themselves. These are already defined in the Criterion Object section as part of the condition syntax. The ABNF grammar here specifically defines the structure of runtime expressions ($inputs.foo, $statusCode, etc.), which reference values but don't contain literals. The condition evaluation syntax is separate from the runtime expression parsing syntax.

Do you have scenarios that you're thinking of? I'd tend to scope that under a different enhancement issue if warranted.

@char0n
Contributor

char0n commented Apr 8, 2026

Hi @frankkilcommins,

Great progress on the grammar updates — the identifier-strict split, reference-id = 1*CHAR, and json-pointer unescaped fix all look good.

I noticed a couple of issues with the latest changes.

input-name = name breaks json-pointer parsing

Since name = *( CHAR ) and CHAR includes # (%x23, within the %x00-7A range), the greedy *( CHAR ) will consume the # as part of the name. This means the optional ["#" json-pointer] in rules like:

inputs-reference = input-name [ "#" json-pointer ]

would never match, because the parser consumes customer#/firstName entirely as the input-name, leaving nothing for the json-pointer.

For example, $inputs.customer#/firstName would parse with input-name = "customer#/firstName" and no json-pointer — which defeats the purpose of adding json-pointer support to inputs/outputs.

The previous version using identifier worked because identifier doesn't include #, so the parser correctly stops at # and matches the json-pointer. The same issue applies to output-name = name and field-name = name in workflows-reference.

The fix is either:

  1. Keep identifier for these rules (what I use in my implementation)
  2. Define a new rule like name-no-hash that is CHAR minus # — more permissive than identifier but still allows the json-pointer delimiter to work

Option 1 is simpler. Option 2 is more permissive but requires defining a new character class.
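
Here is a minimal Python sketch of the behaviour, assuming a parser that matches the name greedily and does not backtrack into it. split_reference and the three character classes are hypothetical illustrations of name, identifier, and a possible name-no-hash rule, and any input left over after the name that does not start with # is simply ignored:

```python
import re

def split_reference(reference: str, name_class: str):
    """Greedily match the name, then an optional '#' json-pointer."""
    name = re.match(rf"(?:{name_class})*", reference).group(0)
    rest = reference[len(name):]
    pointer = rest[1:] if rest.startswith("#") else None
    return name, pointer

CHAR_CLASS = r"[^{}]"                 # simplified CHAR, includes '#'
IDENTIFIER_CLASS = r"[A-Za-z0-9._-]"  # identifier
NO_HASH_CLASS = r"[^{}#]"             # CHAR minus '#', i.e. option 2

reference = "customer#/firstName"
print(split_reference(reference, CHAR_CLASS))        # ('customer#/firstName', None)
print(split_reference(reference, IDENTIFIER_CLASS))  # ('customer', '/firstName')
print(split_reference(reference, NO_HASH_CLASS))     # ('customer', '/firstName')
```

With the CHAR-based name the whole reference is consumed and no json-pointer is found; both alternatives stop at # and leave /firstName for the pointer.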

CHAR redefinition diverges from OpenAPI

The PR redefines CHAR as a simple character range (%x00-7A / %x7C / %x7E-10FFFF), dropping the RFC 7159 JSON string definition with escape sequences. OpenAPI's runtime expression ABNF uses the RFC 7159-based CHAR with unescape / escape rules. Since Arazzo builds on top of OpenAPI and shares the runtime expression concept, redefining CHAR introduces a divergence — a bare \ becomes valid in Arazzo but not in OpenAPI, and escape sequences like \n or \uXXXX are no longer recognized. I'd recommend keeping the RFC 7159-based definition for interoperability.


3 participants