feat(loadbalancer): Add LoadBalancerType Client Side Weighted Round Robin #7407
altaiezior wants to merge 32 commits into envoyproxy:main
Force-pushed from 9a814bd to 573a483.
@jukie I have added the implementation and tested it on my local setup. PS: the repo is very easy to contribute to; everything just works with the docs given on the site :)

Also, should I include slow start in client-side WRR? I have submitted the proposal to gRPC xDS (grpc/proposal#498), and the Envoy proto has been updated as well. It is not implemented yet, but I will try to pick it up this month if time allows.

Also, I am unsure how to test this e2e, so for now I have included an AI-generated e2e test suite. The challenge is that we need multiple replicas, each responding with a specific header containing rps and cpu_utilization, and the traffic is then distributed by calculating the weight (rps / cpu). I don't know what the current e2e tests allow or whether this type of test case is feasible to write.
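
For anyone following the thread, here is a minimal sketch of the weight calculation described above (weight = rps / cpu), normalized so the weights sum to 1. The function name `endpoint_weights` and the sample numbers are hypothetical, and giving endpoints without a usable report the mean weight is an assumption made for illustration. Envoy's actual policy takes more inputs, so treat this as a way to reason about expected traffic splits in a test, not as the real algorithm.

```python
def endpoint_weights(reports):
    """Return normalized weights from {endpoint: (rps, cpu_utilization)}.

    Hypothetical helper: weight = rps / cpu, as described in the comment above.
    """
    # Only endpoints with positive rps and cpu produce a raw weight.
    raw = {
        name: rps / cpu
        for name, (rps, cpu) in reports.items()
        if rps > 0 and cpu > 0
    }
    if not raw:
        # No usable reports at all: fall back to equal weights.
        return {name: 1.0 / len(reports) for name in reports}
    # Assumption: endpoints without a usable report get the mean raw weight.
    fallback = sum(raw.values()) / len(raw)
    filled = {name: raw.get(name, fallback) for name in reports}
    total = sum(filled.values())
    return {name: weight / total for name, weight in filled.items()}


# Same request rate, but backend-b reports half the CPU utilization,
# so it should receive about twice the share of traffic.
reports = {"backend-a": (100.0, 0.5), "backend-b": (100.0, 0.25)}
weights = endpoint_weights(reports)
print(weights)
```

An e2e test could compare the observed request distribution across replicas against shares like these, within some tolerance.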
Force-pushed from 5f3d64b to ff2a353.
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@           Coverage Diff           @@
##             main    #7407   +/-   ##
=======================================
  Coverage   73.81%   73.81%
=======================================
  Files         241      241
  Lines       36608    36688   +80
=======================================
+ Hits        27021    27082   +61
- Misses       7681     7698   +17
- Partials     1906     1908    +2

☔ View full report in Codecov by Sentry.
I have started the implementation of slow_start_config and locality LB config with WRR as well; if possible, I would also like to include them in the gateway implementation.

We wait to add features here until they've made it into a full Envoy release. The flow would be to get this LB support added for 1.7, and if your Envoy changes get merged, we can add that support to Gateway in 1.8. Let's keep the scope of this PR to what's currently available; we can always include additional features in a follow-up. Are you able to make the suggested changes, or can you join the contributors call next week to discuss?

Sure, I just paused because the other changes were also approved, but I understand. I will make the changes as suggested and will try to complete them by today or tomorrow, @jukie.
Force-pushed from f926f90 to c7ecca3.
@jukie I have made the requested changes.
…Gateway CRDs, ensuring configurable parameters and validation rules are integrated. Includes e2e test for validation. Signed-off-by: anurag.ag <anuragagarwal561994@users.noreply.github.com>
Signed-off-by: anurag.ag <anuragagarwal561994@users.noreply.github.com>
Signed-off-by: anurag.ag <anuragagarwal561994@users.noreply.github.com>
…eway CRDs and related configurations. Update associated test data and documentation. Signed-off-by: anurag.ag <anuragagarwal561994@users.noreply.github.com>
…entSideWeightedRoundRobin configuration, update affected tests and CRDs. Signed-off-by: anurag.ag <anuragagarwal561994@users.noreply.github.com>
…cross Gateway CRDs, configuration files, and related tests. Adjust documentation to reflect percentage-based representation. Signed-off-by: anurag.ag <anuragagarwal561994@users.noreply.github.com>
Force-pushed from a0471e0 to dd5d285.
|
Overall looks good, just a few more comments @anuragagarwal561994! Thanks for adding this and make sure to run Sorry for the delayed review on this. I'll prioritize helping you with this next week. |
…kend Utilization (ORCA) load balancing in Gateway CRDs and related docs. Refine header handling and metric formats. Signed-off-by: anurag.ag <anuragagarwal561994@users.noreply.github.com>
Force-pushed from 848b92f to 9a8f647.
…ercent` across API, tests, and internal logic for clarity and precision. Adjust related documentation and validations. Signed-off-by: anurag.ag <6075379+altaiezior@users.noreply.github.com>
…n`. Adjust logic, tests, and documentation to highlight default value and ORCA header removal. Signed-off-by: anurag.ag <6075379+altaiezior@users.noreply.github.com>
Force-pushed from 621d26d to 9c9dc20.
gwAddr := kubernetes.GatewayAndRoutesMustBeAccepted(t, suite.Client, suite.TimeoutConfig, suite.ControllerName, kubernetes.NewGatewayRef(gwNN), &gwapiv1.HTTPRoute{}, false, routeNN)

t.Run("traffic should be split roughly evenly (defaults to equal weights without ORCA)", func(t *testing.T) {
Can we test the feature in the e2e? I.e., have the backend craft an endpoint-load-metrics response header and use that in LB decision making.

@arkodg we discussed this in an earlier comment: it will take some time to build and can be done separately, because it also requires changes to the current echo application.

Thanks. IMO, let's track this with a GH issue. Some inspiration:
cat >"${WORKDIR}/backend.py" <<'PY'
#!/usr/bin/env python3
import os
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

BACKEND_ID = os.environ.get("ORCA_ID", "backend")
DEFAULT_MEM = os.environ.get("ORCA_MEM_UTIL", "0.5")
DEFAULT_CPU = os.environ.get("ORCA_CPU_UTIL", "0.1")
DEFAULT_EPS = os.environ.get("ORCA_EPS", "0.0")
DEFAULT_RPS_FRACTIONAL = os.environ.get("ORCA_RPS_FRACTIONAL", "1.0")


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        qs = parse_qs(urlparse(self.path).query)
        mem = qs.get("mem", [DEFAULT_MEM])[0]
        cpu = qs.get("cpu", [DEFAULT_CPU])[0]
        eps = qs.get("eps", [DEFAULT_EPS])[0]
        rps_fractional = qs.get("rps_fractional", [DEFAULT_RPS_FRACTIONAL])[0]

        # ORCA native HTTP text encoding.
        orca_header = (
            "TEXT "
            f"cpu_utilization={cpu}, "
            f"mem_utilization={mem}, "
            f"eps={eps}, "
            f"rps_fractional={rps_fractional}"
        )
        body = (
            f"{BACKEND_ID} mem={mem} cpu={cpu} eps={eps} "
            f"rps_fractional={rps_fractional}\n"
        )

        self.send_response(200)
        self.send_header("content-type", "text/plain")
        # This is the ORCA header Envoy consumes.
        self.send_header("endpoint-load-metrics", orca_header)
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

    def log_message(self, fmt, *args):
        # Keep logs readable in the demo.
        now = time.strftime("%H:%M:%S")
        print(f"[{now}] {BACKEND_ID} " + fmt % args)


if __name__ == "__main__":
    port = int(os.environ.get("PORT", "18080"))
    print(f"starting {BACKEND_ID} on {port}")
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
PY
chmod +x "${WORKDIR}/backend.py"
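
On the test side, a client would need to read back what such a backend reports. Below is a rough sketch of parsing the `endpoint-load-metrics` value in the `TEXT` form produced by the script above. The helper name `parse_orca_text` is hypothetical, and it assumes only the `TEXT` encoding with simple `key=value` pairs; other ORCA encodings are not handled.

```python
def parse_orca_text(header_value):
    """Parse 'TEXT k=v, k=v' into a dict of floats (hypothetical helper)."""
    prefix, _, rest = header_value.partition(" ")
    if prefix != "TEXT":
        # Assumption: only the TEXT encoding from the demo backend is supported.
        raise ValueError(f"unsupported ORCA encoding: {prefix!r}")
    metrics = {}
    for pair in rest.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed metric: {pair!r}")
        metrics[key.strip()] = float(value)
    return metrics


# Header value as the demo backend would emit it with its default settings.
sample = "TEXT cpu_utilization=0.1, mem_utilization=0.5, eps=0.0, rps_fractional=1.0"
metrics = parse_orca_text(sample)
print(metrics)
```

A test could call this on responses fetched directly from each backend to confirm the reported values before checking how traffic is split.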
…ross API, internal logic, and templates. Adjust defaults, documentation, and validations to reflect behavior change. Signed-off-by: anurag.ag <6075379+altaiezior@users.noreply.github.com>
Force-pushed from 7d50f1b to ac1d840.
Signed-off-by: Arko Dasgupta <arkodg@users.noreply.github.com>
Hey @altaiezior, thanks for patiently addressing all the comments. The PR looks good!
Signed-off-by: anurag.ag <6075379+altaiezior@users.noreply.github.com>
Force-pushed from f9d0cbd to 53f1473.
@zirain I am also not able to run the e2e test case locally because of the above issues. I have made the changes to include the e2e test case for WRR as well, but these seem to be issues with the main branch itself. Let me know once this is fixed so I can pull the latest version and fix things locally too.

Hey @altaiezior, can you rebase again? Looks like there's another conflict :(

@altaiezior could you add a release note?
backendUtilization := policy.LoadBalancer.BackendUtilization
if backendUtilization != nil {
	if backendUtilization.BlackoutPeriod != nil {
		if d, err := time.ParseDuration(string(*backendUtilization.BlackoutPeriod)); err == nil {
Can you add full error handling for this and the other options?
	}
case args.loadBalancer.BackendUtilization != nil:
	cswrr := &cswrrv3.ClientSideWeightedRoundRobin{}
	if v := args.loadBalancer.BackendUtilization; v != nil {
v is already guaranteed to be non-nil due to the case check
// Note: In the internal IR/XDS configuration this value is converted back to a
// floating point multiplier (value / 100.0).
nit: can probably remove this note
// Defaults to false.
// +optional
// +kubebuilder:default=false
KeepResponseHeaders *bool `json:"keepResponseHeaders,omitempty"`
Is this implemented? I don't see handling for it. Not opposed to adding this but there's already fields for header management so it could be good enough to mention or include in the docs follow-up.
Hey, I had requested this, to avoid having the user do this manually.
@jukie I will only be able to pick up these changes next week, as I am travelling and have been catching up with my work lately.
What type of PR is this?
What this PR does / why we need it:
This PR adds a new load balancer type, client-side weighted round robin. This is a load balancing extension available since Envoy 1.32.
https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/load_balancing_policies/client_side_weighted_round_robin/v3/client_side_weighted_round_robin.proto
Which issue(s) this PR fixes:
Fixes #7305
Release Notes: Yes/No