Distribution of Multi-Architecture Requests in Serverledge by FilippoMuschera · Pull Request #18 · serverledge-faas/serverledge

FilippoMuschera · 2026-03-25T16:21:09Z

This PR introduces native support for heterogeneous (x86 and ARM) clusters in Serverledge, together with a new architecture-aware load-balancing system. The load-balancing algorithms have been extended with Multi-Armed Bandit (MAB) algorithms that dynamically optimize execution times based on the architectural affinity of each function.

Serverledge runtimes have been adapted to support ARM and extended to provide execution environments for Python functions with ML libraries, as well as for the Java and Go languages.

This commit refactors the node registration process to include the node's architecture, in preparation for scheduling functions based on architecture compatibility. It includes:
- Add Arch to NodeID and NodeRegistration
- Include the architecture in the etcd registration payload
- Update runtime info with compatible architectures

…tedArchs field of Function. Adds X86 and ARM constants for supported architectures. Normally this field is set by the node that first registers the function and is then pushed to etcd so that other nodes can retrieve it. In this test, function registration is skipped and done manually by writing directly to etcd, so the field and its values had to be set manually as well.

Introduce a memory penalty factor to the LinUCB reward calculation. The new `MAB_LINUCB_LAMBDA` configuration key controls the weight of this penalty. The reward is now `reward = -log(durationMs) - (lambda * memPenalty(memUsage))`, where `memPenalty` is a function that grows non-linearly from 0 at 70% memory utilization to 1 at 100% utilization. This change aims to improve resource awareness of the LinUCB policy by penalizing architectures experiencing high memory pressure, promoting better load distribution and preventing resource exhaustion.
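The reward formula above can be sketched in Go as follows. The commit only says the penalty grows non-linearly from 0 at 70% to 1 at 100% utilization, so the quadratic curve below is an assumption, as is the use of the natural logarithm; `lambda` stands in for the value configured through `MAB_LINUCB_LAMBDA`, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// Memory utilization below this threshold is not penalized (70%, per the commit text).
const memPenaltyThreshold = 0.70

// memPenalty maps memory utilization in [0,1] to a penalty in [0,1].
// ASSUMPTION: a quadratic curve; the real shape is only described as non-linear.
func memPenalty(memUsage float64) float64 {
	if memUsage <= memPenaltyThreshold {
		return 0
	}
	if memUsage >= 1 {
		return 1
	}
	x := (memUsage - memPenaltyThreshold) / (1 - memPenaltyThreshold)
	return x * x
}

// linUCBReward computes reward = -log(durationMs) - lambda*memPenalty(memUsage).
// ASSUMPTION: natural logarithm; lambda plays the role of MAB_LINUCB_LAMBDA.
func linUCBReward(durationMs, memUsage, lambda float64) float64 {
	return -math.Log(durationMs) - lambda*memPenalty(memUsage)
}

func main() {
	// Same duration, different memory pressure: the second reward is lower.
	fmt.Println(linUCBReward(120, 0.50, 0.5))
	fmt.Println(linUCBReward(120, 0.95, 0.5))
}
```

Because the penalty is zero below the threshold, the reward reduces to `-log(durationMs)` on lightly loaded nodes, and memory only influences the choice once utilization exceeds 70%.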

Introduce a new `Random` load balancing mode to the ArchitectureAwareBalancer. This change adds a `selectArchitectureRandom` function that randomly chooses between ARM and x86 architectures. This mode can be enabled via the `LB_MODE` configuration and serves as a testing and baseline option for experiments, complementing the existing MABs and Round Robin modes. It provides a simple, unbiased selection mechanism for architectures.

Temporarily disable the read lock on node.LocalResources in GetServerStatus to prevent a deadlock. The deadlock occurs when AcquireWarmContainer is called concurrently with GetServerStatus, as AcquireWarmContainer attempts to acquire a write lock while GetServerStatus holds a read lock, blocking new readers as per Go's RLock documentation. This is a temporary solution to unblock experiments. A more robust refactor of the resource access in GetServerStatus is needed to ensure proper synchronization without deadlocks.

This commit extends the node metric tracking to include CPU utilization. Previously, the load balancer only considered free memory when making scheduling decisions. This change introduces `FreeCPU` to the `NodeMetric` struct and updates the `NodeMetricCache.Update` method to accept and store this new metric.
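A minimal sketch of a metric cache that stores `FreeCPU` alongside free memory. `NodeMetric`, `NodeMetricCache`, `Update`, and `FreeCPU` are named in the commit; the other fields, units, the exact `Update` signature, and the rule of ignoring out-of-order updates are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// NodeMetric holds the latest resource snapshot of a node.
// FreeCPU is the field added by the commit; the other fields are assumptions.
type NodeMetric struct {
	FreeMemory int64   // free memory in MB (unit assumed)
	FreeCPU    float64 // free CPU, e.g. in cores (unit assumed)
	Timestamp  int64   // Unix seconds of the measurement
}

// NodeMetricCache is a concurrency-safe map from node name to its latest metric.
type NodeMetricCache struct {
	mu      sync.RWMutex
	metrics map[string]NodeMetric
}

func NewNodeMetricCache() *NodeMetricCache {
	return &NodeMetricCache{metrics: make(map[string]NodeMetric)}
}

// Update stores a new metric unless a more recent one is already cached.
// Hypothetical signature: the commit only says Update now also accepts FreeCPU.
func (c *NodeMetricCache) Update(node string, freeMem int64, freeCPU float64, ts int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if old, ok := c.metrics[node]; ok && old.Timestamp > ts {
		return // ignore a stale, out-of-order update
	}
	c.metrics[node] = NodeMetric{FreeMemory: freeMem, FreeCPU: freeCPU, Timestamp: ts}
}

// Get returns the cached metric for a node, if any.
func (c *NodeMetricCache) Get(node string) (NodeMetric, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	m, ok := c.metrics[node]
	return m, ok
}

func main() {
	c := NewNodeMetricCache()
	c.Update("arm-1", 2048, 1.5, 100)
	c.Update("arm-1", 1024, 0.5, 90) // older timestamp: ignored
	m, _ := c.Get("arm-1")
	fmt.Println(m.FreeMemory, m.FreeCPU) // prints "2048 1.5"
}
```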

Introduce a configurable validity period for the offloading cache in the scheduler. This makes it possible to tune how long offloading decisions are cached for the `EdgeOnlypolicy`. The `OFFLOADING_CACHE_VALIDITY` constant is added to `internal/config/keys.go` to define the configuration key. The `internal/scheduling/offloading.go` file is updated to read this configuration value and apply it to the `CacheValidity` duration, replacing the previous hardcoded value of 15 seconds. If not explicitly set, the value defaults to 60 seconds, matching the default janitor interval of Serverledge nodes.

Move the acquisition of LocalResources RLock in `GetServerStatus` after fetching `WarmStatus` and Vivaldi coordinates to prevent a deadlock with `AcquireWarmContainer`. Update the `api.go` handler to include a `Serverledge-Timestamp` header in responses, allowing the load balancer (`lb.go`) to use the actual node timestamp for metric updates instead of `time.Now().Unix()`. This ensures more accurate latency measurements and better distributed system consistency.

FilippoMuschera and others added 30 commits October 6, 2025 17:50

just a test with a multi-arch image

35d57bc

PATCH: etcd image set to bitnamilegacy due to bitnami changes to dockerhub

ac74f01

Updated Go version to 1.24

6ad54e3

Set dockerhub to "fmuschera" for multi-arch images

876b23e

Set dockerhub to "fmuschera" for multi-arch images

1e7638a

Revert "Set dockerhub to "fmuschera" for multi-arch images"

1dd53dd

This reverts commit 876b23e.

Changed etcd image (test for ARM test run)

2277c78

changed dockerhub source for multi-arch tests

d38d8d1

additional logging for test

3e61d58

Revert "additional logging for test"

9b943d1

This reverts commit 3e61d58.

typo fix

0a0cac8

Added 'make clean'

3b10925

Merge remote-tracking branch 'origin/main'

6edaa84

Temporarily reduced CPUDemand of tests to test it on small VM over Cloud (tests might run slower)

d90241e

Enhance custom runtime support

0e167e1

Add image architecture discovery and caching in etcd to allow function to run on ARM or x86 nodes. Also add support for architectures to custom runtimes. Update go.mod and go.sum to add new dependencies.
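A minimal sketch of the "discover once, then cache" idea. The Docker inspection step is abstracted behind an injected function rather than a specific Docker SDK call, and a local map stands in for the etcd-backed cache; every name here is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// archInspector abstracts the image inspection step (in the real code, a Docker query).
type archInspector func(image string) ([]string, error)

// imageArchCache memoizes the supported architectures of each image.
// A local map stands in for the etcd-backed cache described in the commit.
type imageArchCache struct {
	mu      sync.Mutex
	entries map[string][]string
	inspect archInspector
}

func newImageArchCache(inspect archInspector) *imageArchCache {
	return &imageArchCache{entries: make(map[string][]string), inspect: inspect}
}

// Archs returns cached architectures, or inspects the image once and caches the
// result. Errors are not cached, so a failed lookup is retried on the next call.
// The lock is held during inspection for simplicity, which serializes lookups.
func (c *imageArchCache) Archs(image string) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if archs, ok := c.entries[image]; ok {
		return archs, nil
	}
	archs, err := c.inspect(image)
	if err != nil {
		return nil, err
	}
	c.entries[image] = archs
	return archs, nil
}

func main() {
	calls := 0
	cache := newImageArchCache(func(image string) ([]string, error) {
		calls++
		return []string{"amd64", "arm64"}, nil
	})
	for i := 0; i < 3; i++ {
		archs, _ := cache.Archs("example/runtime:latest")
		fmt.Println(archs)
	}
	fmt.Println("inspections:", calls) // prints "inspections: 1"
}
```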

Update go.mod and go.sum dependencies

6c9b3ce

Refactor dependencies in go.mod and go.sum. Indirect dependency for docker was a newer version where a function was now deprecated.

Minor fixes

2b531e0

Add image architecture detection tests

556ccf0

Adds tests for detecting image architectures via Docker. Also updates Makefile with unit test target.

Improve docker image architecture test

091b5ed

Improved image architecture caching

66063c7

Update etcd and test cache logic.

Improve test setup/teardown for better stability

5819880

Improve test initialization and teardown in main_test.go. Also update Makefile and script to reliably manage etcd instance during test execution.

New workflow to run all tests upon push

14d9d27

For the moment this will run only on the ArchitectureAware branch

Improve architecture support

c87048e

Saved info about architectures supported by the runtime of each function in the struct Function as well, to exploit the saving of this data over etcd and to use it also for offloaded requests.

Make policies architecture-aware

5cbf85a

Right now it will not choose the best architecture, but it will take architecture into account for compatibility, i.e., it won't try to run an amd64-only container on an arm64 node.
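The compatibility rule above amounts to a membership check. In this sketch only the `SupportedArchs` field name comes from the PR; the trimmed-down `Function` struct and the `canRunOn` helper are hypothetical. `slices.Contains` is in the standard library since Go 1.21.

```go
package main

import (
	"fmt"
	"runtime"
	"slices"
)

// Function is reduced to the fields relevant here; SupportedArchs is named in the PR.
type Function struct {
	Name           string
	SupportedArchs []string
}

// canRunOn reports whether f's runtime image supports the given node architecture.
func canRunOn(f Function, nodeArch string) bool {
	return slices.Contains(f.SupportedArchs, nodeArch)
}

func main() {
	f := Function{Name: "resize", SupportedArchs: []string{"amd64"}}
	fmt.Printf("%s on %s: %v\n", f.Name, runtime.GOARCH, canRunOn(f, runtime.GOARCH))
	fmt.Println(canRunOn(f, "arm64")) // false: amd64-only image on an arm64 node
}
```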

fix: f.SupportedArchs assignment was getting potentially skipped due to a misplacement inside an inner if-block.

bf7fe05

Add arch mismatch test

69e98b2

Increase sleep duration in API test

2c74e47

The API test was failing intermittently due to insufficient sleep time on less powerful hardware. This tries to mitigate this issue.

Consider node architecture when offloading

a2d358a

The scheduler now takes into account the node architecture during offloading to ensure function compatibility. Requests are dropped if the current node's architecture is not supported by the function's runtime.
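A minimal sketch of the filtering step applied to offloading candidates: only nodes whose architecture appears in the function's supported list are kept, and an empty result means no compatible target exists. The `node` type and `offloadTargets` helper are hypothetical, not the scheduler's actual API.

```go
package main

import (
	"fmt"
	"slices"
)

// node is a hypothetical offloading candidate.
type node struct {
	Name string
	Arch string
}

// offloadTargets keeps only candidates whose architecture the function supports.
func offloadTargets(supported []string, candidates []node) []node {
	var out []node
	for _, n := range candidates {
		if slices.Contains(supported, n.Arch) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	candidates := []node{{"edge-arm", "arm64"}, {"edge-x86", "amd64"}, {"cloud-arm", "arm64"}}
	for _, n := range offloadTargets([]string{"arm64"}, candidates) {
		fmt.Println(n.Name) // edge-arm, then cloud-arm
	}
}
```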

FilippoMuschera and others added 30 commits February 6, 2026 16:18

mab: Refactor ctx handling in UCB1 bandit

7ad41d8

Explicitly setting ctx to nil. This helps with garbage collection in MAB mode where ctx might still be set but is not used.

examples/experiments: Adjust script to support 3-way experiments

0ddf70d

mab: Adjust UCB1 min sample count

a0a3240

Merge pull request #4 from FilippoMuschera/LinUCB_MAB

1684894

MABs: UCB-1 and LinUCB
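UCB1 scores each arm by its mean reward plus an exploration bonus of sqrt(2 ln n / n_i), where n is the total number of plays and n_i the plays of arm i. The sketch below applies it to choosing between architectures, with a minimum sample count per arm in the spirit of the "Adjust UCB1 min sample count" commit. The type names, the fixed reward values, and the choice of rewards in [0,1] are assumptions; this is the textbook algorithm, not the PR's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// ucb1 is a minimal UCB1 bandit over a fixed set of arms (here: CPU architectures).
type ucb1 struct {
	arms       []string
	counts     []int     // times each arm was played
	sums       []float64 // cumulative reward per arm
	total      int       // total plays across all arms
	minSamples int       // plays required per arm before UCB scoring starts
}

func newUCB1(arms []string, minSamples int) *ucb1 {
	if minSamples < 1 {
		minSamples = 1 // every arm needs at least one sample to compute its mean
	}
	return &ucb1{
		arms:       arms,
		counts:     make([]int, len(arms)),
		sums:       make([]float64, len(arms)),
		minSamples: minSamples,
	}
}

// Select returns the index of the arm to play next.
func (b *ucb1) Select() int {
	// Warm-up: play any arm that has fewer than minSamples observations.
	for i, c := range b.counts {
		if c < b.minSamples {
			return i
		}
	}
	best, bestScore := 0, math.Inf(-1)
	for i := range b.arms {
		n := float64(b.counts[i])
		mean := b.sums[i] / n
		bonus := math.Sqrt(2 * math.Log(float64(b.total)) / n)
		if score := mean + bonus; score > bestScore {
			best, bestScore = i, score
		}
	}
	return best
}

// Update records the reward observed after playing arm i.
func (b *ucb1) Update(i int, reward float64) {
	b.counts[i]++
	b.sums[i] += reward
	b.total++
}

func main() {
	b := newUCB1([]string{"arm64", "amd64"}, 3)
	rewards := []float64{0.8, 0.5} // hypothetical fixed rewards in [0,1]
	for step := 0; step < 1000; step++ {
		i := b.Select()
		b.Update(i, rewards[i])
	}
	for i, a := range b.arms {
		fmt.Printf("%s played %d times\n", a, b.counts[i])
	}
}
```

UCB1 keeps a single mean per arm, whereas LinUCB conditions its estimate on a context vector, which is what allows the latter to fold signals such as memory pressure into the decision.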

experiments: update files to work with new deployment system

9ca1e1a

experiments: update files to work with new deployment system

baaad21

experiments: update files to work with new deployment system

experiments: update files to work with new deployment system

cd86292

experiments: update files to work with new deployment system

4120c09

Merge remote-tracking branch 'origin/main'

3c442c5

experiments: fix deadlock introduced in previous commit

f3a07e5

fix: mab: Use config constant for LinUCB alpha parameter

2e5f677

experiments: adjust locust and comparison script parameters

3281666

experiments: adjust function for experiments

f043cee

experiments: adjust function for experiments

64fe4e3

experiments: adjust experiments values

fdd1e0b

experiments: adjust experiments values

c530dcf

experiments: adjust experiments values

7f2c65d

experiments: adjust experiments values

d187933

experiments: refinements

f1ebe5c

experiments: prepare experiment with noisy neighbor

9e952d2

experiments: add janitor interval to worker config

f2d396f

lb: Increase default replica count for architecture-aware load balancer

ab53b05

mab: Adjust LinUCB memory penalty threshold

56b7485

experiments: Adjust load testing parameters

0f7372d

FilippoMuschera wants to merge 179 commits into serverledge-faas:main from FilippoMuschera:main
