Description
The Problem
When a worker node reboots, the designated Virtual Control workload should always start automatically, just as a hardware PLC does.
Standard Kubernetes requires control-plane connectivity for the kubelet to sync pod specs and restart workloads. If the network is down during boot, workloads may not start until connectivity is restored.
Why This Matters
Industrial deployments can't depend on network availability to restore service after a reboot. A PLC restarts itself. A containerized Virtual PLC should too.
When a node reboots without network access:
- Container images must be available locally (pre-cached or persisted)
- Workload state/configuration must survive the reboot (local storage, volumes)
- Deployment specs must be available offline (not fetched from WFM at boot time)
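
The three conditions above can be read as a boot-time readiness check. The following is a minimal sketch under stated assumptions, not a Margo or Kubernetes API: the `Workload` type, its fields, and the function `offline_blockers` are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional, Set


@dataclass
class Workload:
    """Hypothetical description of one workload (not a Margo or Kubernetes type)."""
    name: str
    images: List[str]                                      # image references the workload needs
    state_dirs: List[Path] = field(default_factory=list)   # local paths holding persisted state
    manifest_path: Optional[Path] = None                   # locally cached deployment spec


def offline_blockers(workload: Workload, local_images: Set[str]) -> List[str]:
    """Return the reasons this workload could not start without network access."""
    blockers = []
    # 1. Every image must already be present in the local image store.
    for image in workload.images:
        if image not in local_images:
            blockers.append(f"image not cached locally: {image}")
    # 2. Persisted state must have survived the reboot.
    for state_dir in workload.state_dirs:
        if not state_dir.is_dir():
            blockers.append(f"state directory missing: {state_dir}")
    # 3. The deployment spec must be readable from local disk.
    if workload.manifest_path is None or not workload.manifest_path.is_file():
        blockers.append("deployment spec not available offline")
    return blockers
```

An empty result means nothing in this sketch prevents an offline start. How `local_images` is obtained depends on the container runtime and is deliberately left out.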
This requires Margo to define:
- Container image caching/pinning strategy for edge devices
- Local persistent storage semantics for stateful workloads
- Offline deployment manifest availability (git-ops-like, content-addressable)
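
To illustrate what "content-addressable" could mean for offline manifests, here is a minimal sketch that stores each manifest under its SHA-256 digest and verifies it on load. The store layout and the function names `put_manifest` and `get_manifest` are assumptions made for this example; Margo does not currently define them.

```python
import hashlib
from pathlib import Path


def put_manifest(store: Path, manifest: bytes) -> str:
    """Persist a manifest under its SHA-256 digest and return that digest."""
    digest = hashlib.sha256(manifest).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    tmp = store / f"{digest}.tmp"
    tmp.write_bytes(manifest)
    # Rename into place so readers never see a half-written file.
    # (A real implementation would also fsync to survive power loss.)
    tmp.replace(store / digest)
    return digest


def get_manifest(store: Path, digest: str) -> bytes:
    """Load a manifest and check that its content still matches the digest."""
    data = (store / digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError(f"manifest {digest} failed integrity check")
    return data
```

Because the file name is derived from the content, a device can confirm at boot that the spec it is about to run is exactly the one it was last given, without contacting any remote service.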
Without this, containerized real-time workloads can't match the reliability of hardware PLCs.
What We Need
- Workloads marked as "location-bound" or "critical" should auto-start on node reboot
- This should work regardless of control-plane connectivity at boot time
- Pod specs, settings, and related configuration should be cached or persisted locally to enable offline recovery
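
As a rough sketch of the behavior listed above, a boot-time agent could pick the workloads to auto-start purely from locally cached specs. The label key `example.com/recovery-class`, its values, the JSON cache layout, and the function `specs_to_start` are all hypothetical; neither Margo nor Kubernetes defines them.

```python
import json
from pathlib import Path
from typing import List

# Hypothetical label key and values, used here only for illustration.
RECOVERY_LABEL = "example.com/recovery-class"
AUTO_START_CLASSES = {"critical", "location-bound"}


def specs_to_start(cache_dir: Path) -> List[dict]:
    """Return cached specs whose recovery label requests auto-start.

    Reads only from local disk, so the result does not depend on
    control-plane connectivity at boot time.
    """
    selected = []
    for path in sorted(cache_dir.glob("*.json")):
        spec = json.loads(path.read_text())
        labels = spec.get("metadata", {}).get("labels", {})
        if labels.get(RECOVERY_LABEL) in AUTO_START_CLASSES:
            selected.append(spec)
    return selected
```

For comparison, Kubernetes already offers a related mechanism in static pods, which the kubelet starts from manifests in a local directory without contacting the API server. Whether that mechanism fits Margo's needs is an open question for this issue.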
How This Relates to Margo
Device capabilities, combined with deployment targeting, could flag which workloads are critical for auto-recovery.
Posted as an individual contributor.