Migrate to Axum #4409

Draft
jcjones wants to merge 17 commits into main from jcj/4283-migrate-to-axum

jcjones commented Mar 2, 2026

Resolves #4283

Authored by Claude with a bunch of hand-holding by me.

Review Roadmap

Migrates from Trillium to Axum across the entire server stack (~4700 lines changed, 55 files, 13 commits). The first 9 commits perform the migration; the last 4 fix regressions and address review feedback.

Commits

| Short ID | Description |
| --- | --- |
| e902c8ae | Core framework replacement: deps, core/src/http.rs, problem_details |
| 3521a7ed | Server infra: Stopper, CloneCounter, setup_server, zpages, Prometheus |
| 764859ce | Aggregator updates: metrics, problem_details |
| d2df7d12 | Interop binaries and integration test proxy |
| daa2d6b0 | Migrate tests and remove trillium deps |
| 24a6e17d | draft-ietf-ppm-dap-18 HTTP media types (cherry-picked from main) |
| 10bbd1be | Fix some tests |
| ccdf423a | Lint fixes |
| ccc833cf | Merge origin/main |
| 68a1ef4e | Fix race in CloneCounterObserver, stream upload body, wire LIFO queue, remove duplicate CORS |
| 703c013c | Review fixes: Error::Db downcast, proxy headers, visibility, error messages |
| 23fc6634 | Fix interop proxy duplicate content-type; add CancellationToken to LIFO queue |
| 54809101 | Proxy: append not insert for multi-valued headers; upload-resp TODO |

Group 1: Dependencies (skim)

Trillium crates out, axum 0.8 / tower 0.5 / tower-http 0.6 in.

  • Cargo.toml (workspace + 6 crate tomls) — dependency swap

Group 2: Core utilities — read first

Small, self-contained changes the rest depends on.

  • core/src/http.rs: extract_bearer_token() takes &HeaderMap; check_content_type() generic over MediaType
  • core/src/auth_tokens.rs: WithAuthenticationToken implemented for HeaderMap
  • aggregator_core/src/lib.rs: instrumented() becomes Axum middleware; adds BYTES_HISTOGRAM_BOUNDARIES
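
As a rough illustration of the header-based token extraction mentioned for core/src/http.rs, here is a minimal std-only sketch. It is not the Janus implementation: `bearer_token_from` and the slice-of-pairs header type are hypothetical stand-ins for `extract_bearer_token()` and `http::HeaderMap`, and the case-insensitive scheme match is an assumption based on the HTTP auth-scheme rules.

```rust
/// Returns the token from the first `Authorization: Bearer <token>` header, if any.
///
/// `headers` is a hypothetical stand-in for `http::HeaderMap`: a list of
/// (name, value) pairs. Header names and the `Bearer` scheme are matched
/// ASCII case-insensitively; the token itself is returned unchanged.
fn bearer_token_from<'a>(headers: &[(&'a str, &'a str)]) -> Option<&'a str> {
    for &(name, value) in headers {
        if !name.eq_ignore_ascii_case("authorization") {
            continue;
        }
        // Split "<scheme> <credentials>" at the first space.
        if let Some((scheme, token)) = value.trim().split_once(' ') {
            let token = token.trim();
            if scheme.eq_ignore_ascii_case("bearer") && !token.is_empty() {
                return Some(token);
            }
        }
    }
    None
}
```

Taking the headers by reference rather than a framework connection type is what lets the same helper be called from handlers, middleware, and tests alike.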

Group 3: Error handling and response types

  • aggregator/src/aggregator/problem_details.rs: ProblemDocument implements IntoResponse; removed Trillium conn-ext traits
  • aggregator/src/aggregator/error.rs: Error::Db(User(...)) fixed to use downcast() so response body is preserved

Group 4: Main handler file — review in sections

aggregator/src/aggregator/http_handlers.rs (1440 lines changed) — read in order:

  1. Imports: axum/tower/opentelemetry
  2. Response wrappers: EncodedBody<T>, EmptyBody implementing IntoResponse
  3. State & metrics: simplified AggregatorState; new HttpMetrics struct
  4. Builder & build(): Router with routes, CORS layer, metrics; comment explains layer ordering
  5. HTTP metrics middleware: replaces trillium-opentelemetry::Metrics + StatusCounter
  6. Handlers: mechanical (conn → extractors, return Result<Response, Error>). Exception: upload() uses Body stream (not Bytes) for disconnect detection
  7. Helpers: validate_content_type delegates to generic check_content_type::<M>()

Group 5: Server lifecycle

  • aggregator/src/binary_utils.rs: Stopper (wraps CancellationToken), CloneCounter/CloneCounterObserver (race-condition fix in 68a1ef4e), setup_server() uses axum::serve + graceful shutdown
  • aggregator/src/binaries/aggregator.rs: Router::nest() composition

Group 6: Aggregator API

  • aggregator_api/src/lib.rs — returns Router; metrics layer; removed ReplaceMimeTypes handler and unused State generic
  • aggregator_api/src/routes.rs — handlers converted to axum extractors

Group 7: Metrics

  • aggregator/src/metrics.rs — rewritten with axum; returns error response instead of panicking on encode failure
  • aggregator/src/metrics/tests/prometheus.rs — uses tower::ServiceExt::oneshot()

Group 8: LIFO queue middleware

  • aggregator/src/aggregator/queue.rs — rewritten as axum middleware; adds CancellationToken extension support to race acquisition against client disconnect

Group 9: Interop binaries and integration proxy

  • interop_binaries/src/commands/janus_interop_aggregator.rs — axum server; proxy uses Response::builder() + append (not default into_response())
  • interop_binaries/src/commands/janus_interop_client.rs — axum handlers
  • interop_binaries/src/commands/janus_interop_collector.rs — axum handlers
  • integration_tests/tests/integration/simulation/proxy.rs — proxy rewritten with axum + reqwest

Group 10: Test migrations (largest by line count, most mechanical)

Pattern: trillium_testing::TestConn → Request::builder() + tower::ServiceExt::oneshot(). Spot-check a few, then skim.

  • aggregator/src/aggregator/http_handlers/tests/report.rs — includes upload_client_early_disconnect and upload_client_http11_bulk tests
  • aggregator/src/aggregator/taskprov_tests.rs — large but mechanical
  • aggregator_api/src/tests.rs — largest test file change, same pattern
  • Remaining test files in http_handlers/tests/ — all follow the same pattern

What to look for

  • Middleware layer ordering: axum applies layers in reverse (last .layer() = outermost). See comment in http_handlers.rs build().
  • Upload body streaming: upload() uses Body stream + into_data_stream().into_async_read(), not Bytes. This restores streaming and enables disconnect detection. Fixed in 68a1ef4e.
  • CloneCounterObserver race fix: into_future() now registers Notified before checking count (68a1ef4e). Subtle ordering dependency.
  • LIFO queue wiring: was silently discarded pre-68a1ef4e; now applied via sub-router with from_fn_with_state.
  • CORS: CorsLayer::permissive() returns *; manual origin-echoing removed. Intentional.
  • Proxy response construction: Response::builder() (no default headers) + append. Fixed across 703c013c, 23fc6634, and 54809101.
  • TODO(#4402, Implement draft-ietf-ppm-dap-17+): upload-resp media type should be upload-errors per DAP-18 (deferred).
  • TODO: aggregator API metrics wiring noted in 703c013c.
  • cached_resource.rs: switched from raw string comparison to MIME-aware check_content_type (23fc6634).
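
For reviewers less familiar with the layer-ordering point, here is a toy model of it. This is a std-only sketch, not axum itself: `Handler`, `layer`, and `build` are hypothetical names, and the "metrics" and "cors" labels are only illustrative. Each call to `layer` wraps whatever was built so far, so the wrapper applied last sees the request first and the response last.

```rust
/// A handler in this toy model maps a request line to a trace of what ran.
type Handler = Box<dyn Fn(&str) -> Vec<String>>;

/// Wraps `inner` so that `name` is recorded on the way in and on the way out.
fn layer(name: &'static str, inner: Handler) -> Handler {
    Box::new(move |req: &str| {
        let mut trace = vec![format!("{name}: before")];
        trace.extend(inner(req));
        trace.push(format!("{name}: after"));
        trace
    })
}

/// Builds a handler wrapped first by "metrics" and then by "cors".
/// The wrapper applied last ("cors") ends up outermost.
fn build() -> Handler {
    let handler: Handler = Box::new(|req: &str| vec![format!("handler: {req}")]);
    let app = layer("metrics", handler);
    layer("cors", app)
}
```

In this model, a request passes through "cors", then "metrics", then the handler, and unwinds in the opposite order, which is the behavior the comment in build() is meant to document.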

This commit covers:
  ✔ Chunk 1: Add axum deps to workspace, migrate core layer
  ✔ Chunk 2: Migrate aggregator_core instrumented() to axum middleware
and part of
  ◼ Chunk 3: Migrate problem_details and error handling to axum

Resolves #4283

Authored by Claude

jcjones commented Mar 2, 2026

Claude's plan file

Migrate Janus from Trillium to Axum (Issue #4283)

Context

Trillium is a less widely adopted web framework with a smaller ecosystem. Axum (built on tower and hyper) is the dominant async Rust web framework with broader ecosystem support, more middleware, and better long-term maintenance prospects. This migration replaces all Trillium usage across the Janus codebase.

Scope Assessment

42 Rust files reference trillium. 6 Cargo.toml files have trillium dependencies. The key areas are:

| Area | Files | Lines | Complexity |
| --- | --- | --- | --- |
| DAP HTTP handlers | aggregator/src/aggregator/http_handlers.rs | 1,237 | High - custom Handler impls, error handling, CORS |
| DAP handler tests | aggregator/src/aggregator/http_handlers/tests/*.rs | ~7,500 | Medium - mechanical but large |
| Aggregator API | aggregator_api/src/{lib,routes,tests}.rs | ~2,970 | Medium - uses api() macro heavily |
| Request queue | aggregator/src/aggregator/queue.rs | 1,103 | High - custom middleware |
| Server setup | aggregator/src/binary_utils.rs | 765 | Medium - server config, zpages |
| Problem details | aggregator/src/aggregator/problem_details.rs | 339 | Medium - Conn extension traits |
| Core HTTP utils | core/src/http.rs, core/src/auth_tokens.rs | ~360 | Low |
| Instrumentation | aggregator_core/src/lib.rs | 203 | Medium - #[handler] macro |
| Interop binaries | interop_binaries/src/commands/*.rs | ~400 | Medium |
| Integration tests | integration_tests/tests/integration/simulation/*.rs | ~600 | Medium - proxy/fault injection |
| Metrics (Prometheus) | aggregator/src/metrics.rs + tests | ~200 | Low |

Migration Strategy: Chunk-by-Chunk (Bottom-Up)

Each chunk is a standalone PR that compiles and passes tests.

Chunk 1: Dependencies & Core Layer

Goal: Add axum deps, migrate lowest-level shared utilities that don't break anything.

Files to modify:

  • Cargo.toml (workspace) — add axum, axum-extra, tower, tower-http, hyper; keep trillium deps (removed last)
  • core/Cargo.toml — add axum dep
  • core/src/http.rs — add axum-compatible extract_bearer_token (can support both temporarily)
  • core/src/auth_tokens.rs — add axum test helper alongside trillium one

Chunk 2: Aggregator Core Instrumentation

Goal: Migrate the instrumented() handler wrapper in aggregator_core.

Files to modify:

  • aggregator_core/Cargo.toml — swap trillium for axum
  • aggregator_core/src/lib.rs — rewrite instrumented() as axum middleware (tower Layer/Service or axum::middleware::from_fn)

Chunk 3: Problem Details & Error Handling

Goal: Migrate error types from impl Handler to impl IntoResponse.

Files to modify:

  • aggregator/src/aggregator/problem_details.rs — replace ProblemDetailsConnExt / RetryAfterConnExt traits on Conn with axum response types
  • aggregator/src/aggregator/error.rs — implement IntoResponse for Error
  • aggregator/src/aggregator/http_handlers.rs — migrate Error, EncodedBody, EmptyBody, StatusCounter from impl Handler to impl IntoResponse

Key mapping:

  • impl Handler for Error → impl IntoResponse for Error
  • conn.with_problem_document(...) → return a ProblemDocument response type
  • conn.with_status(...) → return (StatusCode, body)
  • conn.insert_state(ErrorCode(...)) → response extensions or metric labels

Chunk 4: Request Queue Middleware

Goal: Migrate LIFORequestQueue from trillium Handler to tower middleware.

Files to modify:

  • aggregator/src/aggregator/queue.rs — rewrite LIFOQueueHandler<H> as a tower Layer/Service, or use axum::middleware::from_fn_with_state

Chunk 5: DAP HTTP Handlers (Main Aggregator)

Goal: Migrate all DAP endpoint handlers and the AggregatorHandlerBuilder.

Files to modify:

  • aggregator/src/aggregator/http_handlers.rs — the big one:
    • AggregatorHandlerBuilder::build() returns axum::Router instead of trillium tuple
    • Handler functions: hpke_config, upload, aggregation_jobs_put/post/get/delete, collection_jobs_*, aggregate_shares_* — change signatures from (conn: &mut Conn, State(x): State<T>) -> Result<(), Error> to axum extractors returning impl IntoResponse
    • CORS preflight handlers → use tower-http::cors::CorsLayer or keep manual handlers
    • StatusCounter → axum middleware (tower layer)
    • Metrics → replace trillium-opentelemetry with equivalent axum/tower OTEL middleware

Key mappings:

  • trillium_api::State<T> → axum::extract::State<T> (axum uses a single state type, typically wrapped in an Arc<AppState>)
  • trillium_api::api(handler) → just use axum handler functions directly (axum has native JSON/extractor support)
  • trillium_router::Router → axum::Router
  • trillium_router::RouterConnExt::route() → axum's MatchedPath extractor
  • conn.cancel_on_disconnect(future) → use axum::extract::ConnectInfo or tokio CancellationToken
  • conn.request_headers() → axum::extract::HeaderMap
  • conn.set_body(bytes) → return (StatusCode, headers, bytes); axum expects the status code first in a response tuple
  • Path params like :task_id → axum::extract::Path(task_id)

Chunk 6: Aggregator API

Goal: Migrate the admin API.

Files to modify:

  • aggregator_api/Cargo.toml — swap trillium for axum
  • aggregator_api/src/lib.rs: aggregator_api_handler() returns axum::Router; ReplaceMimeTypes becomes axum middleware; auth_check becomes an extractor or middleware
  • aggregator_api/src/routes.rs — migrate handler signatures

Chunk 7: Server Infrastructure

Goal: Migrate server startup, signal handling, zpages.

Files to modify:

  • aggregator/Cargo.toml — swap trillium deps for axum
  • aggregator/src/binary_utils.rs:
    • setup_server() — use axum::serve() with tokio::net::TcpListener instead of trillium_tokio::config()
    • zpages_server() / zpages_handler() — rewrite as axum router
    • Stopper → use tokio_util::sync::CancellationToken or axum's graceful shutdown
  • aggregator/src/binaries/aggregator.rs — update handler composition
  • aggregator/src/metrics.rs — replace trillium-prometheus with metrics-exporter-prometheus or equivalent

Chunk 8: Interop Binaries

Goal: Migrate test/interop servers.

Files to modify:

  • interop_binaries/Cargo.toml
  • interop_binaries/src/lib.rs
  • interop_binaries/src/commands/janus_interop_aggregator.rs
  • interop_binaries/src/commands/janus_interop_client.rs
  • interop_binaries/src/commands/janus_interop_collector.rs

Chunk 9: Test Infrastructure

Goal: Migrate all test utilities and test files.

Files to modify:

  • All http_handlers/tests/*.rs files (~7,500 lines)
  • aggregator_api/src/tests.rs (~2,200 lines)
  • aggregator/src/binary_utils.rs tests
  • integration_tests/tests/integration/simulation/proxy.rs: FaultInjectorHandler, InspectHandler
  • Other integration test files

Key mapping:

  • trillium_testing::{get, post, put, delete} → axum::test::TestClient or direct tower::ServiceExt::oneshot calls
  • test_conn.run_async(&handler).await → app.oneshot(request).await
  • test_conn.status() → response.status()
  • take_response_body(&mut test_conn) → response.into_body() with axum::body::to_bytes

Chunk 10: Cleanup

Goal: Remove all trillium dependencies.

  • Remove all trillium* entries from workspace Cargo.toml
  • Remove from all crate Cargo.toml files
  • Verify no remaining trillium references: grep -r trillium

Verification

After each chunk:

  1. cargo check --all-targets — must compile
  2. cargo test -p <affected-crate> — unit tests pass
  3. cargo clippy --all-targets — no warnings

After full migration:

  4. cargo test --all: all tests pass
  5. Integration tests pass
  6. Manual smoke test of aggregator startup and basic DAP flow

Notes

  • Axum state model: Axum uses a single State<T> per router. We'll likely need an AppState struct bundling the datastore, config, aggregator, etc. — or use Extension for less-common state.
  • Streaming bodies: Trillium uses AsyncRead; axum uses axum::body::Body (which wraps http_body::Body). The upload handler reads streaming request bodies — this needs careful mapping.
  • cancel_on_disconnect: Trillium has conn.cancel_on_disconnect(); in axum, we can check request.extensions().get::<hyper::body::Incoming>() or use CancellationToken patterns.
  • OpenTelemetry metrics: trillium-opentelemetry provides route-aware metrics. The axum equivalent is typically done via tower-http or a custom middleware layer. Consider axum-otel-metrics or a bespoke tower layer.

jcjones added 12 commits March 2, 2026 17:12
… axum

- Add `Stopper` type wrapping `CancellationToken` to replace
  `trillium_tokio::Stopper` throughout the codebase
- Add `CloneCounterObserver`/`CloneCounter` to replace
  `trillium_tokio::CloneCounterObserver` for tracking spawned tasks
- Rewrite `setup_server` to use `axum::serve` with graceful shutdown
- Rewrite `zpages_server`/`zpages_handler` as axum Router
- Rewrite `prometheus_metrics_server` to use axum and
  `prometheus::TextEncoder` directly
- Migrate `binaries/aggregator.rs` to use axum Router and `Router::nest`
  for aggregator API path prefix
- Extract `Error::to_response(&self)` method so both `Error` and
  `ArcError` can implement `IntoResponse` without requiring `Clone`
- Migrate `#[cfg(feature = "test-util")]` modules in
  `aggregation_job_init.rs` and `aggregation_job_continue.rs` to use
  axum Router + `tower::ServiceExt::oneshot`
- Update all `Stopper` imports across job_driver, garbage_collector,
  aggregation_job_creator, and integration_tests
Migrate all test code from trillium_testing patterns to axum/tower:
- Replace `get/post/put/delete().run_async(&handler)` with
  `handler.oneshot(Request::builder()...)` using tower::ServiceExt
- Replace trillium assertion macros (assert_status!, assert_headers!,
  assert_response!, assert_body_contains!) with direct assertions on
  http::Response
- Convert handler types from `Box<dyn Handler>` to `Router`
- Update auth token test helpers for http::HeaderMap
- Migrate trillium_tokio server tests to axum::serve with graceful
  shutdown
- Update aggregator_api middleware and route patterns for axum

Remove all trillium dependencies except trillium-rustls and
trillium-tokio which are still needed transitively by divviup-client
in integration tests.
- Fix CloneCounterObserver::into_future race condition: register the
  Notified future before checking the count so a concurrent notify_one()
  cannot be lost between the check and the await.
- Stream upload body instead of buffering: replace to_bytes(body,
  usize::MAX) with into_data_stream().into_async_read(), restoring the
  streaming behavior from the old trillium handler and eliminating the
  unbounded memory DoS vector.
- Remove duplicate CORS handling: the CorsLayer already sets
  Access-Control-Allow-Origin: *, so the manual origin-echoing code in
  hpke_config() and upload() was redundant and contradictory.
- Wire LIFO queue into aggregation job routes: the queue middleware was
  constructed but silently discarded. Now applied via a sub-router with
  from_fn_with_state, restoring back-pressure and load-shedding for
  helper aggregators.
- Remove dead _aggregation_job_post variable.
- Fix Error::Db(User(...)) losing response body by using downcast() instead
  of downcast_ref() + status_code(), and remove now-unused status_code()
- Reuse state.http_client in ready_endpoint instead of allocating per request
- Use append instead of insert for proxy response headers to preserve
  multi-valued headers
- Clean up put_aggregation_job test helper's dead .header() builder call
- Add comment explaining axum layer ordering (last = outermost)
- Remove unused State parameter and generic from replace_mime_types middleware
- Return error response instead of panicking on metrics encode failure
- Log zpages server errors instead of silently discarding them
- Restrict AggregatorState and check_content_type_value visibility
- Improve content-type error messages to include expected type
- Update stale draft-07 doc reference to draft-18
- Add TODO for aggregator API metrics wiring
…to LIFO queue

The interop aggregator's proxy_handler was constructing responses via
`(status, body).into_response()`, which sets a default content-type of
application/octet-stream. Backend headers were then appended, resulting
in duplicate content-type headers. The client saw the default one first,
causing all Docker integration tests to fail with "unexpected content
type in server response". Fix by building an empty Response and using
insert (not append) for backend headers.

Also update cached_resource.rs to use MIME-aware content-type validation
via check_content_type instead of raw string comparison, and add
CancellationToken extension support to lifo_queue_middleware for future
client-disconnect detection.
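
To illustrate why a MIME-aware comparison is more forgiving than raw string equality, here is a minimal std-only sketch. It is not the real `check_content_type`, which is generic over a media type; `media_type_matches` is a hypothetical helper that assumes only case and `;` parameters need to be normalized.

```rust
/// Returns true if `header_value` names the media type `expected`,
/// ignoring ASCII case, surrounding whitespace, and any parameters after `;`.
///
/// Hypothetical helper for illustration only.
fn media_type_matches(header_value: &str, expected: &str) -> bool {
    // Everything before the first ';' is the type/subtype pair.
    let essence = header_value.split(';').next().unwrap_or("").trim();
    essence.eq_ignore_ascii_case(expected)
}
```

A raw `==` would reject a value such as `Application/DAP-Report; charset=utf-8` even though it names the same media type; the sketch accepts it while still rejecting a different subtype.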
Use `append` instead of `insert` when copying backend headers in the
interop proxy so multi-valued headers (e.g. Set-Cookie) are preserved.
This is safe now that the response is constructed via Response::builder()
which starts with no default headers.
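
The insert/append distinction in the proxy fixes can be sketched with a toy header map. This is a std-only model, not `http::HeaderMap`: `Headers` and `copy_backend_headers` are hypothetical names, and only the two operations discussed here are modeled.

```rust
/// Toy multi-map of response headers; a stand-in for `http::HeaderMap`.
#[derive(Default, Debug)]
struct Headers(Vec<(String, String)>);

impl Headers {
    /// Adds a value, keeping any existing values for the same name.
    fn append(&mut self, name: &str, value: &str) {
        self.0.push((name.to_ascii_lowercase(), value.to_string()));
    }

    /// Sets a value, discarding any existing values for the same name.
    fn insert(&mut self, name: &str, value: &str) {
        let name = name.to_ascii_lowercase();
        self.0.retain(|(n, _)| *n != name);
        self.0.push((name, value.to_string()));
    }

    /// All values stored for `name`, in the order they were added.
    fn get_all(&self, name: &str) -> Vec<&str> {
        let name = name.to_ascii_lowercase();
        self.0
            .iter()
            .filter(|(n, _)| *n == name)
            .map(|(_, v)| v.as_str())
            .collect()
    }
}

/// Copies backend headers onto `response` with `append`, so repeated
/// headers such as Set-Cookie all survive.
fn copy_backend_headers(response: &mut Headers, backend: &[(&str, &str)]) {
    for (name, value) in backend {
        response.append(name, value);
    }
}
```

If `response` already carries a default content-type, appending the backend's value yields two content-type headers, which matches the earlier failure; starting from an empty header set makes `append` safe and keeps multi-valued headers intact.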

Add inline TODO(#4402) noting upload-resp should be upload-errors per
DAP-18.
jcjones added 4 commits March 4, 2026 14:31
…aggregator API

Move `HttpMetrics`, `ErrorCode`, and `http_metrics_middleware` from the
aggregator's http_handlers into `janus_aggregator_core` so both the DAP
aggregator and the aggregator API can share the same metrics plumbing.

`HttpMetrics::new(meter, counter_name)` takes a configurable counter name
so each crate gets its own metric (`janus_aggregator_responses` vs
`janus_aggregator_api_responses`).

Wire up the middleware in `aggregator_api` by adding Extension and
middleware layers to the router, replacing the previously unused `_meter`
parameter.
CloneCounter::drop used notify_one(), which could miss a concurrent drop
if two counters dropped between the observer registering its Notified
future and awaiting it. Switch to notify_waiters() so the observer is
always woken regardless of how many drops occur in that window.
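
The general shape of this fix (wake every waiter, and have the waiter re-check the count) can be sketched with std primitives. This is only an analogue under stated assumptions: the real code uses tokio's `Notify`, whereas this sketch uses `Mutex` and `Condvar`, and `Shared`, `Counter`, and `wait_for_zero` are hypothetical names.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Shared state: the number of live counters plus a condition variable.
#[derive(Default)]
struct Shared {
    count: Mutex<usize>,
    changed: Condvar,
}

/// Hypothetical std-only analogue of `CloneCounter`.
struct Counter(Arc<Shared>);

impl Counter {
    /// Registers one more live counter.
    fn new(shared: &Arc<Shared>) -> Self {
        *shared.count.lock().unwrap() += 1;
        Counter(Arc::clone(shared))
    }
}

impl Drop for Counter {
    fn drop(&mut self) {
        *self.0.count.lock().unwrap() -= 1;
        // Wake every waiter; each one re-checks the count itself.
        self.0.changed.notify_all();
    }
}

/// Blocks until no `Counter` is alive. The check and the wait both happen
/// under the mutex, so a drop cannot slip in between them unnoticed.
fn wait_for_zero(shared: &Shared) {
    let guard = shared.count.lock().unwrap();
    let _guard = shared.changed.wait_while(guard, |count| *count > 0).unwrap();
}
```

The design point carried over from the commit is that correctness should not depend on how many drops happen between the observer's check and its wait.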
The Trillium-to-axum migration inadvertently changed CORS from echoing
the request Origin header to returning wildcard `*`. While both allow
cross-origin access, wildcard disallows credentialed requests. Use
`AllowOrigin::mirror_request()` to restore the original behavior.
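
The difference between the two CORS behaviors can be shown with a small model. This is a std-only sketch, not tower-http: `OriginPolicy` and `allow_origin_header` are hypothetical names that only model how the `Access-Control-Allow-Origin` value is chosen.

```rust
/// How the `Access-Control-Allow-Origin` value is chosen in this toy model.
enum OriginPolicy {
    /// Always answer with the wildcard `*`.
    Any,
    /// Echo the request's `Origin` header back, if one was sent.
    MirrorRequest,
}

/// Computes the `Access-Control-Allow-Origin` value for a request, or `None`
/// when the header would be omitted.
fn allow_origin_header(policy: &OriginPolicy, request_origin: Option<&str>) -> Option<String> {
    match policy {
        OriginPolicy::Any => Some("*".to_string()),
        OriginPolicy::MirrorRequest => request_origin.map(str::to_string),
    }
}
```

Browsers refuse credentialed cross-origin responses whose allow-origin value is `*`, so only the mirrored form keeps such requests working.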
@jcjones jcjones marked this pull request as ready for review March 4, 2026 23:56
@jcjones jcjones requested a review from a team as a code owner March 4, 2026 23:56
@jcjones jcjones marked this pull request as draft March 5, 2026 16:16
jcjones added a commit that referenced this pull request Mar 5, 2026
- This adds an `axum_hpke_config` handler function, wiring it into Axum.
- It leaves behind the existing `hpke_config` and `hpke_config_cors_preflight` methods to support other unit tests that want them for now, but we'll clean that up in part 6.
- Strips out part 1's demo test, and instead routes `"/hpke_config"` via `axum::routing::get(axum_hpke_config::<C>).layer(hpke_cors)`, roughly like I do in Relay.
- Pulled `impl Error` and `into_response_with_retry_after` out of #4409.

Partly authored by Claude
jcjones added a commit that referenced this pull request Mar 13, 2026
* Migrate to Axum [part 2]: Migrate hpke_config to Axum

- This adds an `axum_hpke_config` handler function, wiring it into Axum.
- It leaves behind the existing `hpke_config` and `hpke_config_cors_preflight` methods to support other unit tests that want them for now, but we'll clean that up in part 6.
- Strips out part 1's demo test, and instead routes `"/hpke_config"` via `axum::routing::get(axum_hpke_config::<C>).layer(hpke_cors)`, roughly like I do in Relay.
- Pulled `impl Error` and `into_response_with_retry_after` out of #4409.

Partly authored by Claude