The Problem
When control-plane connectivity to a worker node is lost, Kubernetes assumes the node has failed and, once its eviction timeouts expire, begins evicting its workloads and rescheduling them onto other nodes.
For location-bound real-time workloads (e.g. EtherCAT or PROFINET connections wired to specific hardware), this is catastrophic. The workload is still running fine on its hardware, but K8s reshuffles it anyway, breaking the control loop.
K8s can't distinguish between:
- Transient network loss (worker is healthy, just disconnected)
- Actual node failure (worker is dead)
So it treats both the same: reshuffle everything.
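For context, Kubernetes already exposes a partial knob here: the node lifecycle controller taints an unreachable node with `node.kubernetes.io/unreachable:NoExecute`, and pods are evicted after the default toleration window (300 seconds). A pod can extend or remove that window with an explicit toleration. A minimal sketch of a pod-spec fragment, though note this only delays eviction and still cannot distinguish a disconnected node from a dead one:

```yaml
# Pod spec fragment: tolerate control-plane unreachability indefinitely.
# Omitting tolerationSeconds means the pod is never evicted for these taints.
# This is only a mitigation: a truly dead node keeps its pod "pinned" too,
# because the control plane has no health signal either way.
tolerations:
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
```

This is why tolerations alone don't solve the problem: they trade "evict too eagerly" for "never evict", with no way to evict only when the node itself is actually unhealthy.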
Motivation
Industrial edge deployments operate in environments with unreliable network connectivity (4G/5G dropouts, WiFi interference, cellular gaps). Control-plane disconnects are temporary and expected.
But current K8s behavior treats every control-plane disconnect as permanent node failure and immediately reshuffles workloads. This breaks location-bound real-time control loops that are physically wired to specific hardware.
A network hiccup (or a longer outage) shouldn't destroy production. Margo needs semantics that distinguish transient control-plane loss from actual node failure, so location-bound workloads can survive network interruptions without being evicted, even after the connection is restored.
What We Need
Orchestration semantics that:
- Don't assume node failure just because the control plane lost heartbeats
- Keep location-bound workloads pinned during control-plane disconnects
- Only evict if the worker node itself is actually unhealthy
This requires the ability to mark workloads as "location-bound" so the orchestrator knows: transient network loss ≠ node failure.
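One possible shape for such a marker, sketched purely for illustration (Margo has not defined this; the annotation names below are assumptions, not part of any spec), is a workload-level annotation the WFM and orchestrator could honor:

```yaml
# Hypothetical sketch; "margo.org/location-bound" is an assumed name.
# An orchestrator honoring it would suppress eviction on control-plane
# loss and evict only on a confirmed node-local health failure.
metadata:
  annotations:
    margo.org/location-bound: "true"          # never reschedule on heartbeat loss alone
    margo.org/evict-on-local-failure: "true"  # eviction allowed only if a node-local health probe fails
```

The key design point is that eviction is gated on node-local health rather than control-plane reachability, which is exactly the distinction the bullets above call for.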
How This Relates to Margo
Device capabilities (#96, #136) could enable this by allowing WFM to understand which workloads are location-bound and shouldn't be evicted on control-plane loss.
Posted as an individual contributor.