Physics-based adaptive fan control for the Supermicro ARS-210M-NR server
Your Supermicro ARS-210M-NR came with fan curves designed for worst-case datacenter scenarios. quiet-lab learns YOUR specific setup and runs as quiet as physics allows.
- Datacenter fan curves assume 40°C ambient and maximum load 24/7
- Your homelab is probably 20°C ambient with <10% average load
- Result: Unnecessary noise, wasted power, and unhappy family members
quiet-lab uses Model Predictive Control (MPC) and Extended Kalman Filtering to:
- Learn your server's actual thermal dynamics
- Predict temperature changes before they happen
- Optimize fan speeds to minimize noise while maintaining safety
- Adapt to seasonal changes and hardware aging
From real-world deployment on a 64-core Ampere Altra server:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Idle Fan Speed | 80-100% | 25-35% | -65% |
| Noise Level | 65 dB | 42 dB | -23 dB |
| Power (Fans) | 120W | 15W | -87% |
| Temperature Prediction | N/A | ±0.96°C | Learns your system |
- 🧮 Model Predictive Control - Anticipates thermal changes before they happen
- 📈 Self-Learning - Adapts to your specific hardware via Kalman filtering
- 🔇 Multi-Objective Optimization - Balances temperature, power, and acoustics
- 🛡️ Multiple Safety Layers - Hardware limits, software limits, and failsafe modes
- 📊 Production Mode - Stops learning once converged for minimal overhead
- ⚡ Efficient - <1% of one core usage after convergence
- Supermicro ARS-210M-NR server (currently the only supported model)
- Linux OS with systemd
- Python 3.8+
ipmitool,python3-numpy,python3-scipyinstalled- Root access (for IPMI commands)
# Clone the repository
git clone https://github.com/0xalcibiades/quiet-lab.git
cd quiet-lab
# Run the installation script (installs dependencies automatically)
sudo ./install-fan-control.sh
# Start the service
sudo systemctl start fan-control
# Check status
sudo systemctl status fan-control
sudo journalctl -u fan-control -fThe installation script will:
- Install required packages (ipmitool, python3-numpy, python3-scipy)
- Copy the fan control script to
/usr/local/bin/ - Install and enable the systemd service
- Create necessary directories for logs and saved models
Note: The learned thermal model is saved at /var/lib/thermal_model.pkl
quiet-lab implements a sophisticated control system based on first-principles thermodynamics, online parameter learning, and predictive optimization. Here's the detailed mathematical foundation:
The server is modeled as a network of thermal nodes, each representing a component (CPU, DIMM, VRM, etc.). The heat transfer dynamics follow Newton's law of cooling with forced convection:
Where:
-
$C = \text{diag}(C_1, ..., C_n) \in \mathbb{R}^{n \times n}$ — Thermal capacitance matrix [J/K] -
$T = [T_1, ..., T_n]^{\top} \in \mathbb{R}^n$ — Temperature state vector [K] -
$Q(t) = [Q_1(t), ..., Q_n(t)]^{\top} \in \mathbb{R}^n$ — Heat generation vector [W] -
$H(u) \in \mathbb{R}^{n \times n}$ — Heat transfer matrix (diagonal) [W/K] -
$u = [u_1, ..., u_m]^{\top} \in [0,1]^m$ — Fan PWM control vector -
$T_{\text{inlet}} \in \mathbb{R}$ — Ambient inlet temperature [K]
The heat transfer matrix has diagonal structure:
where
The EKF jointly estimates both system states (temperatures) and parameters (heat transfer coefficients). The augmented state vector is:
Prediction Step:
where the Jacobian of the state transition is:
Update Step:
The innovation
MPC solves a finite-horizon optimal control problem every 15 seconds:
where
Subject to:
- System dynamics:
$x_{k+1} = f_d(x_k, u_k, w_k)$ - Box constraints:
$u_{\min} \preceq u_k \preceq u_{\max}$ - Rate constraints:
$\lVert u_k - u_{k-1} \rVert_{\infty} \leq \Delta u_{\max}$
The stage cost
where the penalty function is:
with urgency scaling:
The psychoacoustic model captures human perception of fan noise:
┌────────────────────────────────────────────────────────────┐
│ IPMI Hardware Interface │
│ ┌─────────────┬─────────────┐ │
│ │ Sensors │ Actuators │ │
│ └──────┬──────┴──────┬──────┘ │
└────────────────────────┼─────────────┼─────────────────────┘
│ ΔT = 15s │
┌────────────────────────▼─────────────▼─────────────────────┐
│ Extended Kalman Filter │
│ ┌────────────────────────────────────────────────────┐ │
│ │ State Prediction: x̂ₖ₊₁ = f(x̂ₖ, uₖ, wₖ) │ │
│ │ Covariance Update: Pₖ₊₁ = FₖPₖFₖᵀ + GₖWGₖᵀ │ │
│ │ Innovation: νₖ = zₖ - Cx̂ₖ │ │
│ │ Kalman Gain: Kₖ = PₖCᵀ(CPₖCᵀ + R)⁻¹ │ │
│ └────────────────────────────────────────────────────┘ │
└────────────────────────┬───────────────────────────────────┘
│ (T̂, Ĥ)
┌────────────────────────▼───────────────────────────────────┐
│ Model Predictive Controller │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Solve: min Σ ℓ(xₖ, uₖ) s.t. dynamics & constraints │ │
│ │ U │ │
│ │ Horizon: N = 3 steps (45 seconds lookahead) │ │
│ │ Solver: Sequential Quadratic Programming (SLSQP) │ │
│ └────────────────────────────────────────────────────┘ │
└────────────────────────┬───────────────────────────────────┘
│ u*₀
┌────────────────────────▼───────────────────────────────────┐
│ Safety Governor │
│ • Emergency override: T > Tᶜʳⁱᵗ - ε → u = 1 │
│ • Rate limiting: clip(Δu, -Δuₘₐₓ, Δuₘₐₓ) │
│ • Saturation: project(u, [0.25, 1.0]) │
└────────────────────────────────────────────────────────────┘
The system exhibits distinct learning phases characterized by the Frobenius norm of the heat transfer matrix updates:
Phase 1: Exploration (t ∈ [0, 30 min])
-
$\lVert \Delta H \rVert_F > 1.0$ — Rapid initial learning - Conservative control:
$u \approx 0.5 \cdot 1$
Phase 2: Refinement (t ∈ [30 min, 2 hr])
-
$0.1 < \lVert \Delta H \rVert_F < 1.0$ — Convergence to true parameters - Aggressive optimization begins
Phase 3: Adaptation (t > 2 hr)
-
$\lVert \Delta H \rVert_F < 0.1$ — Fine-tuning for load variations - Optimal steady-state operation
The learned heat transfer matrix
- Primary cooling paths (large
$H_{ij}$ ) - Cross-component thermal coupling
- Non-intuitive airflow patterns unique to the chassis
- Pre-start: Fans set to 100% before service starts
- Graceful shutdown: Returns to BIOS control on stop
- Crash protection: Automatic failover to safe mode
- Temperature limits: Hard-coded critical thresholds
- Watchdog: Detects hung processes
- Minimum fan speeds: Prevents stalling (25% minimum)
- Supermicro ARS-210M-NR (64-core Ampere Altra)
- All thermal nodes and fan zones are hardcoded for this specific model
- Sensor names match the ARS-210M-NR IPMI interface
- Thermal capacitance values tuned for this hardware
While the current implementation is specific to the ARS-210M-NR, the underlying physics-based approach and control algorithms can be adapted to any server with:
- IPMI 2.0+ for sensor reading and fan control
- Temperature sensors for critical components
- PWM-controllable fans with zones
- Power consumption measurements (optional but helpful)
To adapt for a different server, you would need to:
- Map your server's IPMI sensor names in the
thermal_nodesconfiguration - Identify your fan zones and PWM control addresses
- Adjust thermal capacitance values for your components
- Update the initial H matrix estimates based on your server's airflow patterns
The core algorithms (Extended Kalman Filter, Model Predictive Control, thermal physics model) remain the same.
Contributions are welcome
Q: How long until it learns my system?
A: 50% learning in ~30 minutes, 95% in 2-3 hours, fully converged in 4-6 hours.
Q: What if it fails?
A: Fans automatically return to BIOS control. Your hardware's built-in thermal protection remains active.
Q: Does it work with other servers?
A: Currently hardcoded for the Supermicro ARS-210M-NR, but the physics-based approach generalizes. See the Hardware Support section for adaptation requirements.
Q: Power savings?
A: Typical 10W per fan reduction from fan power alone, plus better overall cooling efficiency.
MIT License - See LICENSE for details.
- Inspired by higher WAF
- Built with NumPy, SciPy, and 🔇
- Special thanks to everyone who said "that's sounds like a giant mosquito in your garage"
The best fan controller is the one you forget exists