A benchmark comparing floating-point performance and resource efficiency across different container runtimes (standard Linux, static binary, WebAssembly).
Source code:

- `mmb.c`: Benchmark source. Performs matrix multiplication with an integrated validity check; outputs iterations and throughput in MFLOPS.
- `single_env_bench.py`: A Python orchestrator that handles pod deployment, log parsing, and cgroup resource monitoring for a single execution environment.
- `metaorchestrator.py`: The main benchmarking script. Coordinates a benchmark suite across multiple execution environments. It is configured exclusively by `bench_config.yaml`.
- `Dockerfile`: Multi-stage build file defining three targets:
  - `debian`: standard GCC build (Debian Slim)
  - `static`: statically linked binary (Scratch)
  - `wasm`: WebAssembly build using the WASI SDK (Scratch)
- `Makefile`: Automates building the Docker images and importing them directly into the K3s containerd registry.
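The measurement performed by mmb.c (repeat matrix multiplications for a fixed duration, check the result, and report throughput as 2·n³ floating-point operations per multiplication) can be sketched in Python. This is an illustration of the measurement logic only, not the actual C source:

```python
import time

def matmul(a, b, n):
    """Naive n x n matrix multiplication on row-major lists of lists."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]
    return c

def bench(n=64, duration=1.0):
    """Repeat matmul for `duration` seconds; report iterations, MFLOPS, validity."""
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    iterations = 0
    start = time.monotonic()
    while time.monotonic() - start < duration:
        c = matmul(a, b, n)
        iterations += 1
    elapsed = time.monotonic() - start
    # With a[i][k] = 1 and b[k][j] = 2, every entry of c must equal 2n.
    valid = all(abs(x - 2.0 * n) < 1e-9 for row in c for x in row)
    # Standard operation count for a dense n x n matmul: 2 * n^3 flops.
    mflops = (2.0 * n**3 * iterations) / elapsed / 1e6
    return iterations, mflops, valid
```

The validity check doubles as a guard against the compiler (or runtime) optimizing the workload away.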
- OS: Linux (root required for `/sys/fs/cgroup` access).
- Cluster: K3s (the Makefile invokes `k3s ctr`).
- Tools: Docker, kubectl, Python 3 (`pyyaml` required).
- Wasm: KWasm Operator.
- Build and Import Images: Use the Makefile to build all Docker container images and import them into the K3s registry.

  ```sh
  make image-build
  ```

  ```mermaid
  flowchart TD
      Root[make image-build]
      subgraph DebianFlow [Debian]
          direction TB
          DebBuild[build standard image]
      end
      subgraph StaticFlow [Static]
          direction TB
          StatBuild[build static binary image]
      end
      subgraph WasmFlow [WASM]
          direction TB
          WasmBuild[build WASM image]
      end
      Import[import to k3s containerd]
      %% Connections
      Root --> DebianFlow
      Root --> StaticFlow
      Root --> WasmFlow
      %% Converge to single import step
      DebBuild --> Import
      StatBuild --> Import
      WasmBuild --> Import
  ```

  Creates `mmb-debian:latest`, `mmb-static:latest`, and `mmb-wasm:latest`.
- Clean Up:
  - To remove images from Docker/K3s: `make reset`
  - To also delete build tools autogenerated by Docker: `make hard-reset`
```sh
sudo python3 metaorchestrator.py
```

Note: Root privileges (sudo) are required to read from `/sys/fs/cgroup`.
The `metaorchestrator.py` script automates the benchmarking process across all environments by executing `single_env_bench.py` for each image listed in `bench_config.yaml`, applying the shared arguments defined below the environment list.

These are the pre-defined parameters of the benchmark suite:
```yaml
environments:
  - image: mmb-debian:latest
    runtime_class: default
  - image: mmb-static:latest
    runtime_class: default
  - image: mmb-wasm:latest
    runtime_class: wasmtime
  - image: mmb-wasm:latest
    runtime_class: wasmedge
  - image: mmb-wasm:latest
    runtime_class: wasmer
  - image: mmb-wasm:latest
    runtime_class: wamr

shared_args:
  sizes: [256, 512]
  trials: 3
  duration: 30
  warmup: true
  interval: 1.0
  namespace: default
  results_subfolder_name: initial_bm
```
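A minimal sketch of how a meta-orchestrator can consume this file, expanding each environment entry into one `single_env_bench.py` run. The flag names passed to the child script are illustrative assumptions, not its documented CLI:

```python
import subprocess

def build_commands(cfg):
    """Expand environments x shared_args into single_env_bench.py invocations.

    Flag names below are hypothetical; the script's real CLI may differ."""
    shared = cfg["shared_args"]
    cmds = []
    for env in cfg["environments"]:
        cmds.append([
            "python3", "single_env_bench.py",
            "--image", env["image"],
            "--runtime-class", env["runtime_class"],
            "--sizes", ",".join(str(s) for s in shared["sizes"]),
            "--trials", str(shared["trials"]),
            "--duration", str(shared["duration"]),
        ])
    return cmds

def run_suite(config_path="bench_config.yaml"):
    """Run the full suite, one sequential child process per environment."""
    import yaml  # pyyaml, listed under Tools
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    for cmd in build_commands(cfg):
        subprocess.run(cmd, check=True)
```

Sequential execution keeps the environments from contending for CPU, which matters when the point is comparing their throughput.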
```mermaid
---
config:
  look: classic
  theme: redux-dark
---
flowchart TB
    subgraph DataSources["Data Sources"]
        PodLogs["mmb.c output
        (via kubectl logs)"]
        Cgroups["cgroup filesystem
        (/sys/fs/cgroup)"]
    end
    subgraph Host[" "]
        direction BT
        Orchestrator[("Orchestrator")]
        DataSources
    end
    PodLogs -- Iterations, MFLOPS, Validity --> Orchestrator
    Cgroups -- CPU Usage, Memory, Throttling --> Orchestrator
```
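On the log side, the orchestrator has to pick the benchmark markers and result fields out of the `kubectl logs` stream. A sketch of such a parser follows; only the `BENCH_START`/`BENCH_END` markers and the iterations/MFLOPS/validity fields are documented here, so the exact result-line format is an assumption:

```python
import re

def parse_pod_log(log_text):
    """Extract benchmark markers and result fields from pod log text.

    The 'iterations=... mflops=... valid=...' line format is hypothetical;
    mmb.c's real output may differ."""
    out = {"bench_start_seen": False, "bench_end_seen": False}
    for line in log_text.splitlines():
        if "BENCH_START" in line:
            out["bench_start_seen"] = True
        elif "BENCH_END" in line:
            out["bench_end_seen"] = True
        m = re.search(r"iterations=(\d+)\s+mflops=([\d.]+)\s+valid=(\d)", line)
        if m:
            out["iterations"] = int(m.group(1))
            out["throughput_mflops"] = float(m.group(2))
            out["valid"] = m.group(3) == "1"
    return out
```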
The results are saved in the `/results` directory, with one `.json` file for each execution environment tested. Each entry (`<size>_<mode>`) contains:
Lifecycle events, recorded as Unix timestamps:

- `start`: deployment trigger time.
- `running_time`: pod reached the `Running` state.
- `bench_start`: the moment the C application prints `BENCH_START` (after optional warmup).
- `end`: the moment the C application prints `BENCH_END`.
```mermaid
---
config:
  theme: mc
---
gantt
    title Trial execution flow
    dateFormat X
    axisFormat %
    section Orchestrator
    apply & wait :orch1, 0, 2
    logs streaming :orch3, after orch1, 7s
    cgroup monitor thread active :crit, orch4, after ben2, 4s
    save / cleanup :orch5, after ben4, 1s
    section Pod Status
    ContainerCreating :pod1, 0, 2
    Running :pod2, after pod1, 7s
    Terminating :pod3, after pod2, 1s
    section App (mmb.c)
    init :ben1, after pod1, 1s
    warmup (5s) :ben2, after ben1, 1s
    workload loop :crit, ben3, after ben2, 4s
    validation & exit :ben4, after ben3, 1s
    section Timestamps
    start :milestone, m0, 0, 0s
    running_time :milestone, m1, after pod1, 0s
    bench_start :milestone, m2, after ben2, 0s
    end :milestone, m3, after ben3, 0s
```
Metrics extracted directly from the C application's standard output:

- `iterations`: total matrix multiplications completed.
- `throughput_mflops`: speed in MFLOPS.
- `valid`: `true` if the calculation check passed.
A list of raw snapshots captured from `/sys/fs/cgroup` at the defined `--interval`. Each sample object contains:

- `timestamp`: time of the snapshot.
- `usage_usec`: total CPU time consumed, in microseconds.
- `user_usec`: CPU time spent in user space.
- `system_usec`: CPU time spent in kernel space.
- `nr_throttled`: cumulative count of throttle events.
- `mem_bytes`: total memory usage (RSS + cache).
- `rss_bytes`: resident set size.
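A sketch of taking one such snapshot, assuming cgroup v2 (where `usage_usec`, `user_usec`, `system_usec`, and `nr_throttled` live in `cpu.stat`, total usage in `memory.current`, and the RSS breakdown in `memory.stat`); the cgroup path is a placeholder:

```python
import time
from pathlib import Path

def parse_kv(text):
    """Parse 'key value' lines as found in cgroup v2 cpu.stat / memory.stat."""
    return {k: int(v) for k, v in (line.split() for line in text.splitlines() if line)}

def sample_cgroup(cg_path="/sys/fs/cgroup/mybench"):  # hypothetical cgroup path
    """Take one raw snapshot of a cgroup v2 directory (requires root)."""
    cg = Path(cg_path)
    cpu = parse_kv(cg.joinpath("cpu.stat").read_text())
    mem_stat = parse_kv(cg.joinpath("memory.stat").read_text())
    return {
        "timestamp": time.time(),
        "usage_usec": cpu["usage_usec"],
        "user_usec": cpu["user_usec"],
        "system_usec": cpu["system_usec"],
        "nr_throttled": cpu.get("nr_throttled", 0),
        "mem_bytes": int(cg.joinpath("memory.current").read_text()),
        "rss_bytes": mem_stat.get("anon", 0),  # anonymous memory ~ RSS
    }
```

The CPU counters are cumulative, which is why per-interval rates must be derived by differencing consecutive samples.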
Derived from processing the raw cgroup samples:

- `cold_start_time`: startup latency (`running_time - start`).
- `avg_cpu_cores`: average CPU usage normalized to cores (e.g., 1.0 = one full core in use).
- `peak_mem_bytes`: the highest memory usage observed during the trial.
- `avg_mem_bytes`: the average memory usage throughout the execution.
- `throttled_events`: the total number of times the CPU was throttled by the cgroup controller.
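Given samples shaped like the raw snapshots above, these derivations reduce to a few lines. A sketch, assuming the samples are in chronological order:

```python
def derive_metrics(samples, start, running_time):
    """Compute derived metrics from raw cgroup samples and lifecycle timestamps."""
    first, last = samples[0], samples[-1]
    wall_s = last["timestamp"] - first["timestamp"]
    # CPU counters are cumulative: difference over wall time gives average cores.
    cpu_s = (last["usage_usec"] - first["usage_usec"]) / 1e6
    mems = [s["mem_bytes"] for s in samples]
    return {
        "cold_start_time": running_time - start,
        "avg_cpu_cores": cpu_s / wall_s if wall_s > 0 else 0.0,
        "peak_mem_bytes": max(mems),
        "avg_mem_bytes": sum(mems) / len(mems),
        # nr_throttled is cumulative too, so subtract the initial reading.
        "throttled_events": last["nr_throttled"] - first["nr_throttled"],
    }
```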