A114: WRR Support for Custom Backend Metrics #536
Open: sauravzg wants to merge 5 commits into grpc:master from sauravzg:wrr-custom-metrics
19efcc0 A114: WRR Support for Custom Backend Metrics (sauravzg)
2ef0f64 Fixup: Address comments around explicit metric list and incorrect use… (sauravzg)
86e403a Fixup: Change fallback order and document edge case about handling <=… (sauravzg)
7d728c4 Fixup: Rephrase some comments/descriptions (sauravzg)
7aecc0e Fixup: Support utilization.* (sauravzg)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,163 @@ | ||
| # A114: WRR Support for Custom Backend Metrics | ||
|
|
||
| - Author(s): sauravzg | ||
| - Approver: markdroth | ||
| - Status: In review | ||
| - Implemented in: | ||
| - Last updated: 2026-01-30 | ||
| - Discussion at: | ||
|
|
||
| ## Abstract | ||
|
|
||
| This proposal updates the client-side `weighted_round_robin` (WRR) load balancing policy to support customizable utilization metrics. It adds a new configuration field `metric_names_for_computing_utilization` to the WRR LB policy config. This allows users to specify which backend metrics should be used to compute endpoint weights, enabling the use of custom metrics (via ORCA named metrics) instead of relying solely on the default `application_utilization` or `cpu_utilization`. | ||
|
|
||
| ## Background | ||
|
|
||
| The existing `weighted_round_robin` policy (defined in [gRFC A58][A58]) calculates endpoint weights based on standard metrics provided by the backend via ORCA (Open Request Cost Aggregation) load reports. Specifically, it uses `application_utilization` if available, and falls back to `cpu_utilization`. | ||
|
|
||
| However, services may want to drive load balancing decisions based on other resources, such as memory utilization, queue depth, or custom application-defined metrics. The [Custom Backend Metrics][A51] specification (ORCA) supports reporting arbitrary named metrics, and xDS has updated its WRR implementation to allow selecting these metrics for utilization calculation. | ||
|
|
||
| To support advanced load balancing scenarios, gRPC's WRR policy needs to support this flexibility. | ||
|
|
||
| ### Related Proposals | ||
|
|
||
| - [A58: Client-Side Weighted Round Robin LB Policy][A58] | ||
| - [A51: Custom Backend Metrics][A51] | ||
|
|
||
| ## Proposal | ||
|
|
||
| ### Service Config Update | ||
|
|
||
| We will add a new field `metric_names_for_computing_utilization` to the [`WeightedRoundRobinLbConfig`][WeightedRoundRobinLbConfigProto] message in the Service Config. | ||
|
|
||
| ```protobuf | ||
| message WeightedRoundRobinLbConfig { | ||
|   // ... existing fields ... | ||
|
|
||
|   // A list of metrics to be considered for overriding the default utilization | ||
|   // computation behavior. | ||
|   // | ||
|   // For map fields in the ORCA proto, the string will be of the form | ||
|   // "<map_field_name>.<map_key>". For example, the string "named_metrics.foo" | ||
|   // will mean to look for the key "foo" in the ORCA "named_metrics" field. | ||
|   // | ||
|   // Utilization is computed by taking the max of the values of the | ||
|   // metrics specified here. In the absence of this field or absence of | ||
|   // valid values for the specified metrics, the policy will fall back to the | ||
|   // existing default behavior. | ||
|   repeated string metric_names_for_computing_utilization = 7; | ||
| } | ||
| ``` | ||
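As a rough illustration of how the new field might appear in a JSON service config, the Python sketch below builds one and serializes it. The camelCase key `metricNamesForComputingUtilization` is an assumption based on the standard proto3 JSON mapping of the field name, and the metric name `named_metrics.queue_depth` is hypothetical.

```python
import json

# Hypothetical service config. The camelCase key assumes the proto3 JSON
# mapping of `metric_names_for_computing_utilization`.
service_config = {
    "loadBalancingConfig": [
        {
            "weighted_round_robin": {
                "metricNamesForComputingUtilization": [
                    "named_metrics.queue_depth",  # hypothetical custom metric
                    "mem_utilization",
                ]
            }
        }
    ]
}

service_config_json = json.dumps(service_config, indent=2)
print(service_config_json)
```

With this config, utilization would be the max of the two listed metrics whenever at least one of them resolves to a valid value.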
|
|
||
| ### Weight Calculation Logic | ||
|
|
||
| The weight calculation logic in the WRR policy will be updated to determine the `utilization` value from the [`OrcaLoadReport`][OrcaLoadReportProto] as follows: | ||
|
|
||
| 1. **Check Custom Metrics**: If `metric_names_for_computing_utilization` is configured: | ||
| - Iterate through the specified metric names. | ||
| - **Resolve Metric Value**: | ||
| - If the name is of the format `field.key` (e.g., `named_metrics.foo`), look up the map field `field` and retrieve the value for `key`. | ||
| - If the name is a simple field name (e.g., `cpu_utilization`, `mem_utilization`), look up the top-level field with that name. | ||
| - **Only the following fields are supported:** | ||
| - `application_utilization` | ||
| - `cpu_utilization` | ||
| - `mem_utilization` | ||
| - `named_metrics.*` | ||
| - `utilization.*` | ||
| - **Compute Max**: Track the maximum value among all successfully resolved, positive (> 0), finite metric values. | ||
| - If a max value is found, use it as the `utilization`. | ||
| 2. **Fallback**: If checking custom metrics did not determine a valid utilization value (or if `metric_names_for_computing_utilization` is not configured), fall back to the existing WRR utilization behavior defined in [gRFC A58][A58]. | ||
|
|
||
| #### Pseudocode | ||
|
|
||
| ``` | ||
| function GetUtilization(report, configured_metrics): | ||
|     # 1. Check Custom Metrics | ||
|     max_util = null | ||
|
|
||
|     for metric_name in configured_metrics: | ||
|         value = null | ||
|
|
||
|         if metric_name contains ".": | ||
|             # Map lookup (e.g. "named_metrics.foo" -> map="named_metrics", key="foo") | ||
|             map_name, key = split_on_first_dot(metric_name) | ||
|             if report has map field map_name: | ||
|                 value = report[map_name][key]  # null if the key is absent | ||
|         else: | ||
|             # Root field lookup (e.g. "mem_utilization"), supported fields only | ||
|             if report has field metric_name: | ||
|                 value = report[metric_name] | ||
|
|
||
|         # Only consider valid, non-NaN, positive values | ||
|         if value is not null and not is_nan(value) and value > 0: | ||
|             if max_util is null or value > max_util: | ||
|                 max_util = value | ||
|
|
||
|     if max_util is not null: | ||
|         return max_util | ||
|
|
||
|     # 2. Fallback to existing WRR behavior (A58) | ||
|     if report.application_utilization > 0: | ||
|         return report.application_utilization | ||
|     return report.cpu_utilization | ||
| ``` | ||
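The pseudocode above can be made concrete with a small runnable sketch. The Python below is illustrative only: `OrcaLoadReport` is a simplified stand-in dataclass rather than the real generated proto class, `resolve_metric` and `get_utilization` are hypothetical helper names, and lookups are restricted to the supported fields listed in this section. Like the pseudocode, it filters NaN and non-positive values.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class OrcaLoadReport:
    """Simplified stand-in for the ORCA load report (not the real proto class)."""
    application_utilization: float = 0.0
    cpu_utilization: float = 0.0
    mem_utilization: float = 0.0
    named_metrics: Dict[str, float] = field(default_factory=dict)
    utilization: Dict[str, float] = field(default_factory=dict)


SCALAR_FIELDS = ("application_utilization", "cpu_utilization", "mem_utilization")
MAP_FIELDS = ("named_metrics", "utilization")


def resolve_metric(report: OrcaLoadReport, name: str) -> Optional[float]:
    """Return the value for a configured metric name, or None if unresolved."""
    if "." in name:
        map_name, key = name.split(".", 1)  # split on the first dot only
        if map_name in MAP_FIELDS:
            return getattr(report, map_name).get(key)
        return None
    if name in SCALAR_FIELDS:
        return getattr(report, name)
    return None


def get_utilization(report: OrcaLoadReport, configured_metrics: Iterable[str]) -> float:
    # 1. Max over the valid (non-NaN, positive) configured metrics.
    max_util = None
    for name in configured_metrics:
        value = resolve_metric(report, name)
        if value is None or math.isnan(value) or value <= 0:
            continue
        if max_util is None or value > max_util:
            max_util = value
    if max_util is not None:
        return max_util
    # 2. Fallback to the existing A58 behavior.
    if report.application_utilization > 0:
        return report.application_utilization
    return report.cpu_utilization


report = OrcaLoadReport(cpu_utilization=0.4, mem_utilization=0.7,
                        named_metrics={"queue": 0.9})
print(get_utilization(report, ["mem_utilization", "named_metrics.queue"]))  # 0.9
print(get_utilization(report, ["named_metrics.missing"]))  # 0.4 (fallback)
```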
|
|
||
| #### Implementation Notes | ||
|
|
||
| Since `OrcaLoadReport` is often exposed as a language-specific proxy object rather than a raw Protobuf message (e.g., in Java and C++), implementations are **not** expected to use Protobuf reflection to look up arbitrary fields. Instead, implementations should manually handle the supported metric list specified above. | ||
|
|
||
| As a consequence, support for any new standard fields added to `OrcaLoadReport` in the future will require explicit code changes in the implementation. This behavior is consistent with the [current Envoy implementation](https://github.com/envoyproxy/envoy/blob/35749578db375f5fe8ac5dd293cb7c4efb689611/source/common/orca/orca_load_metrics.cc#L47-L73). | ||
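One possible way to handle the supported list manually, without reflection, is to translate each configured name into an explicit accessor once, when the config is parsed. The Python sketch below shows that idea under stated assumptions: the report is any object exposing the supported attributes, `compile_accessor` and `compile_accessors` are hypothetical names, and silently skipping unsupported names is an assumption, not behavior specified by this proposal.

```python
from types import SimpleNamespace
from typing import Callable, List, Optional

# Explicit accessors for the supported fields; no reflection involved.
SCALAR_ACCESSORS = {
    "application_utilization": lambda r: r.application_utilization,
    "cpu_utilization": lambda r: r.cpu_utilization,
    "mem_utilization": lambda r: r.mem_utilization,
}
MAP_ACCESSORS = {
    "named_metrics": lambda r: r.named_metrics,
    "utilization": lambda r: r.utilization,
}

Accessor = Callable[[object], Optional[float]]


def compile_accessor(name: str) -> Optional[Accessor]:
    """Turn a configured metric name into an accessor, or None if unsupported."""
    if "." in name:
        map_name, key = name.split(".", 1)
        get_map = MAP_ACCESSORS.get(map_name)
        if get_map is None:
            return None
        # The accessor yields None when the key is absent from the map.
        return lambda report: get_map(report).get(key)
    return SCALAR_ACCESSORS.get(name)


def compile_accessors(names: List[str]) -> List[Accessor]:
    # Assumption: unsupported names are skipped rather than rejected.
    return [a for a in map(compile_accessor, names) if a is not None]


report = SimpleNamespace(application_utilization=0.0, cpu_utilization=0.3,
                         mem_utilization=0.6, named_metrics={"foo": 0.8},
                         utilization={})
accessors = compile_accessors(["cpu_utilization", "named_metrics.foo", "rps_fractional"])
print([a(report) for a in accessors])  # [0.3, 0.8]
```

A new standard field would need a new entry in one of the two tables, which matches the explicit code change described above.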
|
|
||
| #### Validity and Edge Cases | ||
|
|
||
| - **NaN Values**: As shown above, `NaN` values in reports are explicitly ignored to prevent undefined behavior in weight calculations. This matters because the result of `max` in C++ when one operand is `NaN` depends on the order of the arguments. | ||
| - **Zero Values**: Any value equal to `0.0` is treated as missing and ignored. This is consistent with [gRFC A58][A58] and the Envoy implementation, due to the lack of support for optional fields in the load report (where `0` cannot be distinguished from an unset field). | ||
| - **Negative Values**: Any value `< 0.0` is also treated as missing and **ignored silently**. This is consistent with the current [Envoy implementation](https://github.com/envoyproxy/envoy/blob/113f2c2d1015913256a2a8c6f9f97d0622623f45/source/extensions/load_balancing_policies/client_side_weighted_round_robin/client_side_weighted_round_robin_lb.cc#L214-L218). | ||
| - **Bound Checks**: The final selected `utilization` is subject to the standard validation logic from [gRFC A58][A58] (e.g., ensuring the value is positive) before it is used to compute the weight, to avoid undefined behavior in weight calculations. | ||
| - **Normalization**: The WRR policy does not normalize reported metrics; the application is responsible for this. | ||
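The filtering rules in this list can be illustrated with a short Python sketch. The helper name `is_usable` is hypothetical. Python's built-in `max` happens to show the same kind of argument-order dependence with `NaN` that motivates the explicit check.

```python
import math


def is_usable(value):
    """True only for values the max computation should consider."""
    return value is not None and not math.isnan(value) and value > 0


nan = float("nan")

# Comparisons involving NaN are always false, so the result depends on order.
print(max(nan, 1.0))  # nan
print(max(1.0, nan))  # 1.0

# Filtering first makes the result independent of ordering.
candidates = [nan, 0.0, -0.5, 0.25, 0.75]
usable = [v for v in candidates if is_usable(v)]
print(usable)       # [0.25, 0.75]
print(max(usable))  # 0.75
```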
|
|
||
| The rest of the weight calculation formula (using QPS, EPS, and penalty) from [gRFC A58][A58] remains unchanged. | ||
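For context, the sketch below shows where the selected `utilization` would plug into the A58 weight formula, `qps / (utilization + eps / qps * error_utilization_penalty)`. The function name `compute_weight` is hypothetical, and returning `0.0` to signal "no usable weight" is an assumption made for this example.

```python
def compute_weight(qps, eps, utilization, error_utilization_penalty=1.0):
    """Weight per the A58 formula; 0.0 here means no usable weight (assumption)."""
    if qps <= 0 or utilization <= 0:
        return 0.0
    penalty = eps / qps * error_utilization_penalty
    return qps / (utilization + penalty)


# utilization would come from the custom-metric selection described above.
print(compute_weight(qps=100.0, eps=0.0, utilization=0.5))  # 200.0
```

Only the source of `utilization` changes under this proposal; the formula itself is untouched.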
|
|
||
| ### xDS Integration | ||
|
|
||
| We will support the `metric_names_for_computing_utilization` field in the xDS [`ClientSideWeightedRoundRobin`][ClientSideWeightedRoundRobinProto] policy. | ||
|
|
||
| When converting the xDS configuration to the gRPC Service Config `WeightedRoundRobinLbConfig`, the `metric_names_for_computing_utilization` field should be copied over directly. | ||
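The direct copy can be pictured with a minimal Python sketch. Both configs are modeled as plain dicts purely for illustration. The function name `convert_xds_wrr_config`, the dict shapes, and the camelCase output key (assumed from the proto3 JSON mapping) are all hypothetical, and the existing fields are omitted.

```python
def convert_xds_wrr_config(xds_policy: dict) -> dict:
    """Convert a dict-shaped xDS WRR policy into a service-config LB entry."""
    wrr = {}
    # ... conversion of the existing fields is omitted in this sketch ...
    names = xds_policy.get("metric_names_for_computing_utilization")
    if names:
        # Copied over unchanged, preserving order.
        wrr["metricNamesForComputingUtilization"] = list(names)
    return {"weighted_round_robin": wrr}


xds_policy = {"metric_names_for_computing_utilization": ["named_metrics.foo"]}
print(convert_xds_wrr_config(xds_policy))
```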
|
|
||
| ### Temporary environment variable protection | ||
|
|
||
| The features described in this proposal will be guarded by the environment variable `GRPC_EXPERIMENTAL_WRR_CUSTOM_METRICS`, which defaults to `false`. | ||
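A guard of this kind might look like the Python sketch below. The helper names are hypothetical. Treating only a case-insensitive `true` as enabled, and ignoring the configured metric names when the guard is off, are both assumptions made for illustration.

```python
import os

ENV_VAR = "GRPC_EXPERIMENTAL_WRR_CUSTOM_METRICS"


def custom_metrics_enabled(environ=os.environ) -> bool:
    # Assumption: only a case-insensitive "true" enables the feature.
    return environ.get(ENV_VAR, "false").strip().lower() == "true"


def effective_metric_names(configured, environ=os.environ):
    """Return the configured names only when the feature is enabled."""
    return list(configured) if custom_metrics_enabled(environ) else []


print(effective_metric_names(["mem_utilization"], {ENV_VAR: "true"}))  # ['mem_utilization']
print(effective_metric_names(["mem_utilization"], {}))                 # []
```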
|
|
||
| ## Rationale | ||
|
|
||
| - **Consistency with Envoy**: This design mirrors the corresponding feature in Envoy, ensuring consistent behavior for xDS-controlled clients. | ||
| - **Flexibility**: Allows users to define load balancing weights based on the actual bottleneck resource of their application (e.g., memory-bound services). | ||
| - **Backward Compatibility**: The default behavior (using `application_utilization` or `cpu_utilization`) remains unchanged if the new field is not configured. | ||
|
|
||
| ## Implementation | ||
|
|
||
| This will be implemented in C++, Java, and Go. | ||
|
|
||
| ### C++ | ||
|
|
||
| - **xDS Integration**: Update [`ClientSideWeightedRoundRobinLbPolicyConfigFactory`](https://github.com/grpc/grpc/blob/f7f13023412c1a589af7558eb0b9f8f664a76431/src/core/xds/grpc/xds_lb_policy_registry.cc#L68) to copy over the new field from the xDS configuration. | ||
| - **Config**: Update [`WeightedRoundRobinLbConfig`](https://github.com/grpc/grpc/blob/f7f13023412c1a589af7558eb0b9f8f664a76431/src/core/load_balancing/weighted_round_robin/weighted_round_robin.cc#L133) to include `metric_names_for_computing_utilization`. | ||
| - **Weight Calculation Logic**: Update callers of [`EndpointWeight::MaybeUpdateWeight`](https://github.com/grpc/grpc/blob/f7f13023412c1a589af7558eb0b9f8f664a76431/src/core/load_balancing/weighted_round_robin/weighted_round_robin.cc#L212) to implement the new utilization selection logic. | ||
|
|
||
| ### Java | ||
|
|
||
| - **xDS Integration**: Update [`convertWeightedRoundRobinConfig`](https://github.com/grpc/grpc-java/blob/a9f73f4c0aa5617aa2b6ae6ac805693915899b6a/xds/src/main/java/io/grpc/xds/LoadBalancerConfigFactory.java#L286) to copy over the field from the xDS configuration. | ||
| - **Config**: Update [`WeightedRoundRobinLoadBalancerConfig`](https://github.com/grpc/grpc-java/blob/a9f73f4c0aa5617aa2b6ae6ac805693915899b6a/xds/src/main/java/io/grpc/xds/WeightedRoundRobinLoadBalancer.java#L716) to include `metric_names_for_computing_utilization`. | ||
| - **Weight Calculation Logic**: Update [`OrcaPerRequestUtil`](https://github.com/grpc/grpc-java/blob/a9f73f4c0aa5617aa2b6ae6ac805693915899b6a/xds/src/main/java/io/grpc/xds/WeightedRoundRobinLoadBalancer.java#L364) to implement the new utilization selection logic. | ||
|
|
||
| ### Go | ||
|
|
||
| - **xDS Integration**: Update the configuration struct and conversion logic in [`converter.go`](https://github.com/grpc/grpc-go/blob/c05cfb3693bf18086810e671a6f9c05f296e0183/internal/xds/xdsclient/xdslbregistry/converter/converter.go#L241) to copy over the new field from the xDS configuration. | ||
| - **Config**: Update the configuration struct in [`config.go`](https://github.com/grpc/grpc-go/blob/7985bb44d26ecbeb8950996d028e38a0de08070b/balancer/weightedroundrobin/config.go#L26) to include `metric_names_for_computing_utilization`. | ||
| - **Weight Calculation Logic**: Update the weight update function in [`balancer.go`](https://github.com/grpc/grpc-go/blob/7985bb44d26ecbeb8950996d028e38a0de08070b/balancer/weightedroundrobin/balancer.go#L546) to implement the new utilization selection logic. | ||
|
|
||
| [A58]: A58-client-side-weighted-round-robin-lb-policy.md | ||
| [A51]: A51-custom-backend-metrics.md | ||
| [ClientSideWeightedRoundRobinProto]: https://github.com/envoyproxy/envoy/blob/7242d5ad170523d7936849e596d261e3502c3886/api/envoy/extensions/load_balancing_policies/client_side_weighted_round_robin/v3/client_side_weighted_round_robin.proto#L86 | ||
| [WeightedRoundRobinLbConfigProto]: https://github.com/grpc/grpc-proto/blob/ec99424f3b7dae9db194f848b4cea52ecfae07af/grpc/service_config/service_config.proto#L205 | ||
| [OrcaLoadReportProto]: https://github.com/cncf/xds/blob/0feb69152e9f7e8a45c8a3cfe8c7dd93bca3512f/xds/data/orca/v3/orca_load_report.proto#L15 | ||