Skip to content

Add constructor-form discriminated unions#3911

Closed
CaliLuke wants to merge 20 commits intogoadesign:v3from
CaliLuke:feat/oneof-payload-result-unions
Closed

Add constructor-form discriminated unions#3911
CaliLuke wants to merge 20 commits intogoadesign:v3from
CaliLuke:feat/oneof-payload-result-unions

Conversation

@CaliLuke
Copy link
Contributor

Summary

This PR adds constructor-form discriminated unions via OneOf(...) so payloads and results can be defined directly from named branch types, for example:

Payload(OneOf(TextPayload, JSONPayload))
Result(OneOf(TextResult, JSONResult))

The implementation keeps the existing declaration-form OneOf("name", func() { ... }) behavior and extends Goa to support constructor-form unions across DSL evaluation, expression finalization, service codegen, HTTP transport generation, gRPC/protobuf generation, and OpenAPI generation.

What Changed

DSL evaluation

  • Added constructor-form OneOf(...) support while preserving declaration-form OneOf(...).
  • Added deferred named-type resolution for constructor-form string references so forward declarations and recursion work.
  • Propagated union discriminator key overrides from enclosing payload/result attributes.
  • Rejected malformed constructor unions and unsupported defaults at the DSL boundary.

Variant naming and wire contract

  • Added deterministic naming for constructor-form union branches and derived union types.
  • Centralized the stable naming/suffixing logic in expr so DSL finalization and transport/codegen paths share one implementation.
  • Kept discriminator values stable across branch reordering and duplicate-name cases covered by tests.
  • Added collision-safe name generation for Go and protobuf field names after normalization.

Service and transform codegen

  • Generated union types, kinds, constructors, and transforms for constructor-form unions.
  • Preserved nil-safety in transform paths.
  • Reused shared generated union types for repeated anonymous constructor unions in covered paths.
  • Added focused tests for order stability and collision handling.

Validation and defaults

  • Ensured validation code only validates the active union branch.
  • Rejected default values for union attributes intentionally rather than letting them leak into codegen.

HTTP transport codegen

  • Added HTTP encode/decode support for top-level and nested constructor unions.
  • Preserved required-body semantics with null / pointer-body handling.
  • Rejected constructor unions in unsupported HTTP positions such as multipart payloads.
  • Covered WebSocket streaming constructor unions in the test suite.

OpenAPI v3 / v2

  • Generated canonical discriminated-union schemas and wrapper schemas for OpenAPI v3.
  • Added canonical example generation for constructor unions, including nested unions and custom discriminator/value keys.
  • Chose to fail closed for ambiguous raw union examples instead of guessing a branch.
  • Kept OpenAPI v2 generation on a graceful fallback path rather than panicking.

gRPC / protobuf

  • Generated oneof-based protobuf message shapes for constructor unions.
  • Added validation for explicit rpc:tag collisions between union branches and sibling fields.
  • Added collision-safe protobuf oneof field naming after normalization.
  • Added unary and streaming proto compilation coverage for constructor unions.

Design Choices And Tradeoffs

  • Constructor-form unions require named branch types for stable discriminators. I chose not to infer stable public branch names from anonymous object/map/array shapes because that would make the wire contract brittle and order-sensitive.
  • Ambiguous raw OpenAPI examples are omitted rather than canonicalized to an arbitrary branch. That is a deliberate fail-closed choice: if the generator cannot determine the intended branch confidently, it should not invent one.
  • Union defaults are rejected intentionally. A default on a discriminated union is not just a value default, it is effectively a branch-selection default, so rejecting it at the DSL boundary is safer than allowing partially-specified semantics to leak into transports and generated code.
  • I centralized deterministic union naming logic in expr but kept transport-specific normalization local. That keeps the naming contract shared while still allowing Go and protobuf to apply their own identifier rules.

Test Coverage

This PR adds targeted coverage for:

  • constructor-form DSL evaluation, forward references, recursion, and invalid forms
  • stable discriminator naming and duplicate-name collision cases
  • union validation generation and nested branch validation
  • HTTP encode/decode and request/response body generation
  • OpenAPI schema/example generation, including nested/custom-key/ambiguous-example cases
  • gRPC unary and streaming generation, including .proto compilation and rpc:tag collision handling
  • order-stability tests for generated Go/HTTP/protobuf names

Verification

Ran:

  • go test ./...

Notes

This is a large PR and it will clearly go through review-driven changes and follow-up refinements. I’m open to the necessary modifications and any follow-up cleanup the maintainers think is the right shape for Goa.

@raphael
Copy link
Member

raphael commented Mar 15, 2026

Thanks for the PR! I have to admit I am a little hesitant to merge this as is :) It's a lot of changes touching a lot of pretty delicate places. I did a review and I can see a number of smells in the suggested changes, things like:

		"isPointerTypeRef": func(ref string) bool {
			return strings.HasPrefix(ref, "*")
		},

Checks shouldn't rely on generated syntax, they should use properties defined on the codegen data types. In this specific case slices and maps would return false and there are probably more subtle cases this is missing. Another one that caught my eye are the changes to the Attribute DSL function, looksLikeOneOfAttributeDeclarationSignature does not inspire confidence. The Attribute function is obviously quite central to everything in Goa.

In general I really appreciate contributions and want to accept them as much as possible, LLMs make submissions like this quite interesting to review :)

It might be helpful to step back and break down the problem into smaller chunks and create a proper proposal that we humans can actually understand and gauge? open to suggestions!

@raphael
Copy link
Member

raphael commented Mar 15, 2026

FWIW, here is what my LLM review returned:

Review of PR #3911 against v3 found five high-confidence issues.

Critical: constructor-union accessor de-duplication is not propagated into validation or transforms. codegen/service/service_data.go now generates stable de-duplicated branch names via uniqueUnionFieldNames(), and codegen/service/service_test.go explicitly expects a suffixed accessor like FooBar2. But codegen/validation.go still passes Goify(v.Name, true) into the union validation template, and codegen/go_transform.go does the same in transformUnion(). For collisions such as foo_bar vs foo-bar, the generated union exposes AsFooBar() and AsFooBar2(), while validation and transforms still call AsFooBar() for both branches. The practical result is that a valid second branch can fail validation as “missing value”, and union-to-union transforms can read or write the wrong branch.

Critical: grpc/codegen/protobuf.go does not reserve the full enclosing message namespace before allocating constructor-union branches. In the object emitter, usedTags is pre-seeded only from non-union siblings, and protoBufOneOfDef() auto-allocates constructor-union tags before it has seen later unions with explicit rpc:tags. The same function also derives union field names with uniqueProtoUnionFieldNames(), but that helper only de-duplicates within the union itself, not against sibling message fields. Those two gaps create two real .proto compile failures: an earlier constructor union can auto-claim field numbers that a later union explicitly reserves, and a normal field like foo_bar can collide with a union branch that also normalizes to foo_bar.

Important: union transforms are still index-based even though union identity and derived naming are now order-insensitive. expr/hasher.go sorts union members before hashing, and expr/union_naming.go sorts derived branch/type names to keep constructor unions stable across reordering. But codegen/go_transform.go transformUnion() still pairs srcUnion.Values[i] with tgtUnion.Values[i] and only checks IsCompatible() at that index. For object branches, that compatibility check is too weak to catch semantic swaps, so OneOf(Text, JSON) and OneOf(JSON, Text) can be treated as the same union for naming while still being cross-wired during transformation.

Important: the branch makes Swagger 2 cookie-auth output invalid. http/codegen/openapi/v2/builder.go copies s.In straight into the Swagger 2 securityDefinitions entry, and http/codegen/cookie_security_test.go now asserts def.In == "cookie" for the v2 document. That conflicts with both the Swagger 2 spec and the Swagger 2 API key docs, which restrict apiKey security schemes to in: query or in: header. Cookie auth is documented separately as an OpenAPI 3 feature in Swagger’s cookie auth docs. So this PR bakes an invalid Swagger 2 contract into both codegen and tests.

Important: the claimed Swagger 2 “graceful fallback” for constructor unions is not actually spec-valid, and the new test would not catch that. http/codegen/openapi/json_schema.go still appends constructor-union branches to valueSchema.AnyOf, but the Swagger 2 spec only carries allOf forward from JSON Schema in its Schema Object subset. Meanwhile http/codegen/openapi/v2/files_test.go uses validateSwagger(), which only unmarshals JSON and checks that swagger is non-empty, and the new constructor-union assertion only checks that the emitted body/response schema is an "object". That means the test can pass even when the generated Swagger 2 document still contains unsupported anyOf.

@CaliLuke
Copy link
Contributor Author

Thanks for the detailed review! I've gone ahead and addressed the specific code smells and structural issues raised in both your manual review and the LLM findings.

Summary of changes:

  • Removed Code Smells: Replaced the isPointerTypeRef string hack with a robust Ptr property on TypeData, and refactored the OneOf DSL to use clean type assertions instead of hacky signature checks.
  • Improved Transform/Validation Consistency: Shared the UniqueUnionFieldNames logic across service, validation, and transport layers. Union transforms now match branches by name rather than index, ensuring semantic correctness regardless of branch order.
  • Protobuf Collision Prevention: Updated .proto generation to pre-seed used tags and names from all sibling fields, preventing collisions between union branches and regular fields.
  • Swagger 2.0 Compliance: Fallback cookie auth to header and omitted anyOf from schemas to ensure strictly valid Swagger 2.0 documents.

I've updated the CONSTRUCTOR_UNION_FAILURE_INVENTORY.md and verified all tests pass (make test). I'd appreciate another look when you have a chance!

@raphael
Copy link
Member

raphael commented Mar 16, 2026

Thanks for addressing these. The changes are still massive and it's really difficult to review and make sure they don't introduce issues. There are a few places that seem to introduce new concepts that I can't easily tell whether they really are needed or just the LLM not understanding how the current code works / leveraging existing constructs properly. The whole union field name uniqueness set of changes and the changes to go_transform.go come to mind - adding a new parameter to that function is a big deal and the parameter should not be required - this code was carefully crafted :)
Again I want to support contributions but I also owe it to the community to maintain the quality of the package and this change is just too broad to be purely vibe-coded.

@CaliLuke
Copy link
Contributor Author

CaliLuke commented Mar 16, 2026

This contribution exists because I need it for my own app. It currently generates a few key api endpoints that are supporting real users and while clearly not perfect it's doing the job. I can keep it in my own fork if it's not up to the standard of the framework but I would rather be "in the mainstream" and benefit from the collective work of this community.

How do you suggest we get to the point where this can be merged? I'm open to all kinda of suggestions.

My ultimate goal is to benefit from support of discriminated unions in goa without maintaining that feature by myself, how that comes about is not important to me.

CaliLuke added 20 commits March 16, 2026 11:06
- Remove isPointerTypeRef string-hack in favor of TypeData.Ptr property.
- Refactor OneOf DSL to remove hacky signature signature-counting.
- Refactor transformUnion to match by branch name instead of index.
- Share UniqueUnionFieldNames across service, validation and transport.
- Pre-seed used tags/names in Protobuf generation to prevent collisions.
- Fallback cookie auth to header in Swagger 2.0 for spec compliance.
- Omit anyOf from Swagger 2.0 schemas for spec compliance.
- Update tests and inventory.
@CaliLuke CaliLuke force-pushed the feat/oneof-payload-result-unions branch from 46c39cf to 60d53eb Compare March 16, 2026 18:08
@raphael
Copy link
Member

raphael commented Mar 17, 2026

Makes sense, could we break this down into smaller chunks that can be reviewed and understood by humans (you and me)? There are some valuable fixes in there I saw (nil checks that are currently missing). We could start by merging these in. Then proceed until we get to the "meat" of the PR. The go transform changes need special attention in particular.

@CaliLuke
Copy link
Contributor Author

ok I'm going to workout a way to split the PR in more manageable pieces and then resubmit incrementally. Stay tuned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants