DEMO VIDEO: link
This framework automates the complete benchmarking lifecycle for containerized AI services:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Recipe │───▶│ Deploy │───▶│ Load Gen │───▶│ Analyze │───▶│ Report │
│ (YAML) │ │ (Slurm) │ │ (Clients) │ │ (Metrics) │ │ (MD/JSON) │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
- One-command execution from declarative YAML recipes
- Automated analysis including saturation detection and bottleneck attribution
- Full reproducibility with embedded metadata and rerun support
- Dual interface with CLI and web UI feature parity
- Real-time monitoring via Prometheus/Grafana integration
Prerequisites:
- Python 3.10+
- SSH access to MeluXina (or compatible HPC cluster)
- SSH key authentication configured
# Clone the repository
git clone https://github.com/EUMASTER4HPC/Team1_EUMASTER4HPC2526.git
cd Team1_EUMASTER4HPC2526
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # Linux/Mac
# or: venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Verify installation
python src/frontend.py --help
Create ~/.melussh or configure SSH with host alias meluxina:
# ~/.ssh/config
Host meluxina
    HostName login.lxp.lu
    User your_username
    IdentityFile ~/.ssh/id_meluxina
# 1. Run a Redis benchmark
python src/frontend.py examples/recipe_redis.yaml
# 2. Monitor execution
python src/frontend.py --watch BM-20260112-001
# 3. Generate analysis report
python src/frontend.py --report BM-20260112-001
# 4. View results in web interface
python src/frontend.py --web
# Open http://localhost:5000
python src/frontend.py --ui
Provides a guided menu for benchmark operations.
| Category | Service | Port | Use Case |
|---|---|---|---|
| Inference | vLLM | 8000 | High-performance LLM serving with PagedAttention |
| Inference | Ollama | 11434 | Lightweight local LLM deployment |
| Database | PostgreSQL | 5432 | OLTP transactional workloads |
| Database | Redis | 6379 | In-memory caching and pub/sub |
| Database | MinIO | 9000 | S3-compatible object storage |
| Vector DB | ChromaDB | 8000 | Embedding storage for RAG |
| Vector DB | Qdrant | 6333 | High-performance vector search |
Recipes define complete experiments in declarative YAML:
configuration:
  target: meluxina
service:
  type: vllm                  # Service type
  partition: gpu              # Slurm partition
  num_gpus: 1                 # GPU allocation
  time_limit: "01:00:00"      # Job time limit
  settings:
    model: facebook/opt-125m  # Model to serve
client:
  type: vllm_smoke            # Client type
  partition: cpu
  settings:
    num_requests: 100
    max_tokens: 50
benchmarks:
  num_clients: 4              # Concurrent clients
See docs/RECIPE_REFERENCE.md for complete reference.
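A recipe can be sanity-checked locally before it is submitted to the cluster. The sketch below is a minimal validator, assuming the recipe has already been parsed into a dictionary with the four top-level sections used in the example recipe. The function `validate_recipe` and its rules are hypothetical and are not taken from the framework's own parser; in particular, Slurm accepts more time formats than the HH:MM:SS pattern checked here.

```python
import re

# Assumption: these four sections are required; the real parser may differ.
REQUIRED_SECTIONS = ("configuration", "service", "client", "benchmarks")
TIME_LIMIT = re.compile(r"^\d{1,2}:\d{2}:\d{2}$")  # HH:MM:SS only


def validate_recipe(recipe: dict) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    errors = []
    for section in REQUIRED_SECTIONS:
        if not isinstance(recipe.get(section), dict):
            errors.append(f"missing section: {section}")

    service = recipe.get("service")
    if isinstance(service, dict):
        if "type" not in service:
            errors.append("service.type is required")
        limit = service.get("time_limit")
        if limit is not None and not TIME_LIMIT.match(str(limit)):
            errors.append("service.time_limit must look like HH:MM:SS")

    benchmarks = recipe.get("benchmarks")
    if isinstance(benchmarks, dict):
        clients = benchmarks.get("num_clients", 1)
        if not isinstance(clients, int) or clients < 1:
            errors.append("benchmarks.num_clients must be a positive integer")
    return errors


# The example recipe, written out as the dictionary a YAML loader would produce.
recipe = {
    "configuration": {"target": "meluxina"},
    "service": {
        "type": "vllm",
        "partition": "gpu",
        "num_gpus": 1,
        "time_limit": "01:00:00",
        "settings": {"model": "facebook/opt-125m"},
    },
    "client": {
        "type": "vllm_smoke",
        "partition": "cpu",
        "settings": {"num_requests": 100, "max_tokens": 50},
    },
    "benchmarks": {"num_clients": 4},
}

print(validate_recipe(recipe))
```

Returning a list of messages rather than raising on the first problem lets a CLI or web form report every issue in one pass.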
Saturation detection identifies the optimal operating point using maximum curvature analysis:
# Run concurrency sweep
python src/frontend.py examples/recipe_redis.yaml --clients 1,2,4,8,16,32
# Generate sweep report
python src/frontend.py --sweep-report BM-001,BM-002,BM-003,BM-004,BM-005
Outputs:
- Latency knee point (where P99 grows superlinearly)
- Throughput saturation (max sustainable RPS)
- SLO compliance limit (max concurrency under latency target)
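To make the knee-point idea concrete, here is a minimal sketch of one way to locate a point of maximum curvature on a latency curve. It uses the discrete Menger curvature of three consecutive points on axes normalized to [0, 1]. This is an illustrative choice and may differ from what the framework's saturation module does; the function names and the sample data are hypothetical.

```python
import math


def _normalize(values):
    """Scale values to [0, 1] so both axes contribute equally to curvature."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def find_knee(concurrency, p99_ms):
    """Return the concurrency level where the curve bends upward most sharply.

    Returns None if there are fewer than three points or no upward bend.
    """
    points = list(zip(_normalize(concurrency), _normalize(p99_ms)))
    best_index, best_curvature = None, 0.0
    for i in range(1, len(points) - 1):
        p, q, r = points[i - 1], points[i], points[i + 1]
        # Signed cross product: positive when the curve turns upward (convex).
        cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
        sides = math.dist(p, q) * math.dist(q, r) * math.dist(p, r)
        if sides == 0:
            continue
        # Menger curvature = 4 * area / (a * b * c), with area = |cross| / 2.
        curvature = 2 * cross / sides
        if curvature > best_curvature:
            best_index, best_curvature = i, curvature
    return None if best_index is None else concurrency[best_index]


def slo_limit(concurrency, p99_ms, target_ms):
    """Highest tested concurrency whose P99 stays at or below the target."""
    compliant = [c for c, p in zip(concurrency, p99_ms) if p <= target_ms]
    return max(compliant) if compliant else None


# Hypothetical sweep: latency is flat up to 4 clients, then climbs.
levels = [1, 2, 3, 4, 5, 6]
p99 = [10, 10, 10, 10, 20, 30]
print(find_knee(levels, p99), slo_limit(levels, p99, target_ms=15))
```

Normalizing both axes first matters: without it the result depends on the units chosen for latency and concurrency. Real sweeps are noisy, so smoothing the series before computing curvature is usually worthwhile.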
Bottleneck attribution classifies limiting factors from resource utilization:
| Bottleneck | Indicators |
|---|---|
| GPU-bound | GPU util >80%, stable CPU, rising TTFT |
| CPU-bound | High CPU time, low GPU, stable memory |
| Memory-bound | High RSS, OOM errors, latency spikes |
| Queueing | Throughput plateau, exploding P99 |
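The indicators in the table can be expressed as a small rule-based classifier. The sketch below is illustrative only: the 80% GPU threshold comes from the table, while every other threshold, the metric names, and the rule ordering are assumptions made for the example and are not the framework's actual logic.

```python
def classify_bottleneck(metrics: dict) -> str:
    """Map a snapshot of resource metrics to a bottleneck label.

    Expected keys (all hypothetical): gpu_util and cpu_util in percent,
    rss_fraction as a 0-1 share of the memory limit, oom_errors as a count,
    throughput_growth and p99_growth as relative change versus the previous
    concurrency level.
    """
    # Memory problems are checked first because OOM kills distort other signals.
    if metrics.get("oom_errors", 0) > 0 or metrics.get("rss_fraction", 0.0) > 0.90:
        return "memory-bound"
    # Threshold taken from the table: GPU utilization above 80%.
    if metrics.get("gpu_util", 0.0) > 80.0:
        return "gpu-bound"
    # Assumed threshold: the table only says "high CPU time".
    if metrics.get("cpu_util", 0.0) > 80.0:
        return "cpu-bound"
    # Assumed thresholds: throughput flat (<5% gain) while P99 grows by >50%.
    if (metrics.get("throughput_growth", 1.0) < 0.05
            and metrics.get("p99_growth", 0.0) > 0.50):
        return "queueing"
    return "undetermined"


print(classify_bottleneck({"gpu_util": 92.0, "cpu_util": 35.0}))
```

A first-match rule list like this is easy to audit, but it reports only one label even when several resources are saturated at once.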
python src/frontend.py --compare BM-001 BM-002
The comparison flags a regression when:
- P99 latency increases >10%
- Throughput decreases >10%
- Success rate drops >1%
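The three rules above translate directly into a comparison of two summary records. The following is a minimal sketch, assuming each summary is a dictionary with the hypothetical keys `p99_ms`, `throughput_rps`, and `success_rate` (in percent), and assuming the 1% success-rate rule means one percentage point. The real summary.json layout may differ.

```python
def find_regressions(baseline: dict, candidate: dict,
                     threshold: float = 0.10,
                     success_drop_points: float = 1.0) -> list[str]:
    """Return the names of the metrics that regressed in the candidate run."""
    flags = []
    # P99 latency more than 10% higher than the baseline.
    if candidate["p99_ms"] > baseline["p99_ms"] * (1 + threshold):
        flags.append("p99_latency")
    # Throughput more than 10% lower than the baseline.
    if candidate["throughput_rps"] < baseline["throughput_rps"] * (1 - threshold):
        flags.append("throughput")
    # Success rate down by more than one percentage point (assumed reading).
    if baseline["success_rate"] - candidate["success_rate"] > success_drop_points:
        flags.append("success_rate")
    return flags


# Hypothetical numbers for two runs.
baseline = {"p99_ms": 100.0, "throughput_rps": 1000.0, "success_rate": 99.9}
candidate = {"p99_ms": 115.0, "throughput_rps": 950.0, "success_rate": 99.5}
print(find_regressions(baseline, candidate))
```

Here only the latency rule fires: throughput fell by 5% and the success rate by 0.4 points, both inside their tolerances.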
# Benchmark Operations
python src/frontend.py <recipe.yaml> # Run benchmark
python src/frontend.py --ui # Interactive mode
# Monitoring
python src/frontend.py --list # List all benchmarks
python src/frontend.py --watch <id> # Live status
python src/frontend.py --logs <id> # View logs
python src/frontend.py --stop <id> # Cancel jobs
# Results
python src/frontend.py --collect <id> # Download artifacts
python src/frontend.py --metrics <id> # View metrics
python src/frontend.py --report <id> # Generate report
# Analysis
python src/frontend.py --compare <a> <b> # Regression detection
python src/frontend.py --sweep-report <ids> # Saturation analysis
# Web Interface
python src/frontend.py --web # Launch at :5000
Launch with python src/frontend.py --web and open http://localhost:5000.
Pages:
- Dashboard - Overview of all benchmarks with status
- Run Recipe - Deploy benchmarks from UI
- Benchmarks - Detailed benchmark views
- Monitoring - Prometheus/Grafana integration
- Metrics - Charts and statistics
- Reports - Generated analysis with plots
Team1_EUMASTER4HPC2526/
├── src/
│   ├── frontend.py # CLI entry point, recipe parsing
│   ├── core/
│   │   ├── manager.py # Service/client orchestration
│   │   ├── aggregator.py # Metrics aggregation
│   │   ├── saturation.py
│   │   ├── bottleneck.py
│   │   ├── lifecycle.py # Job lifecycle management
│   │   └── collector.py # Artifact collection
│   ├── infra/
│   │   ├── communicator.py # SSH/Slurm abstraction
│   │   └── storage.py # Benchmark state persistence
│   ├── models/
│   │   ├── service.py
│   │   └── client.py
│   ├── builders/
│   │   └── command_builders.py # Sbatch script generation
│   ├── monitoring/
│   │   ├── manager.py # Prometheus/Grafana stack
│   │   └── monitor.py # Metrics collection
│   ├── reporting/
│   │   ├── reporter.py # Report generation
│   │   ├── plotting.py # Chart generation
│   │   └── artifacts.py
│   └── web/
│       └── flask_app.py # Web interface
├── examples/ # Recipe templates
├── measurements/ # Benchmark campaigns
├── results/ # Benchmark artifacts
├── reports/ # Generated reports
├── docs/ # Documentation
│   ├── RECIPE_REFERENCE.md # Recipe format reference
│   └── methodology.md # Benchmarking methodology
├── scripts/ # Automation scripts
├── requirements.txt # Python dependencies
└── README.md
Each benchmark produces:
results/<benchmark_id>/
├── run.json # Complete metadata + embedded recipe
├── requests.jsonl # Per-request timing (microsecond precision)
├── summary.json # Aggregated metrics
└── logs/ # Service and client logs
reports/<benchmark_id>/
├── report.md
├── report.json
└── plots/ # Visualization PNGs
    ├── latency_percentiles.png
    ├── throughput_timeline.png
    └── success_rate.png
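The per-request file lends itself to quick ad hoc analysis outside the framework. The sketch below computes nearest-rank percentiles from JSONL lines, assuming each record carries a `latency_us` field. That field name is hypothetical, so check an actual requests.jsonl before relying on it.

```python
import json
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(lines):
    """Summarize latencies from an iterable of JSONL strings."""
    # "latency_us" is an assumed field name, not confirmed by the project docs.
    latencies = sorted(
        json.loads(line)["latency_us"] for line in lines if line.strip()
    )
    if not latencies:
        return {}
    return {
        "count": len(latencies),
        "p50_us": percentile(latencies, 50),
        "p99_us": percentile(latencies, 99),
    }


# Synthetic records standing in for the lines of a requests.jsonl file.
sample_lines = [json.dumps({"latency_us": value}) for value in range(1, 101)]
print(summarize(sample_lines))
```

Because `summarize` takes any iterable of strings, an open file handle can be passed in directly, for example `summarize(open(path))` for a given results path.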
Every benchmark is fully reproducible:
# Rerun with identical configuration
python src/frontend.py --rerun BM-20260112-001
Captured metadata includes:
- Complete YAML recipe (embedded)
- Container image digests
- Slurm job IDs and node allocations
- Timestamps for all lifecycle events
| Component | Technology |
|---|---|
| Language | Python 3.10+ |
| Cluster Communication | Fabric, Paramiko (SSH) |
| Job Scheduling | Slurm |
| Containerization | Apptainer |
| Web Framework | Flask |
| Visualization | Matplotlib, Chart.js |
| Monitoring | Prometheus, Grafana |