
overlay: Put() returns EBUSY when CSI drivers use Bidirectional mount propagation #737

@trilamsr

Bug

Put() in the overlay driver fails with "replacing mount point: device or resource busy" when the overlay merged/ directory is shared with other mount namespaces via Bidirectional mount propagation.

How to reproduce

On a Kubernetes cluster using CRI-O with overlay storage:

  1. Run a CSI driver DaemonSet with mountPropagation: Bidirectional on /var/lib/kubelet
  2. Create and stop containers on the same node

Put() calls unix.Unmount(mountpoint, MNT_DETACH) (succeeds), then unix.Rename(merged.1, merged) (fails with EBUSY). The kernel can't complete the deferred unmount because the CSI driver's namespace still holds a mount reference via shared propagation.
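To confirm on an affected node that the merged/ mounts really are in a shared peer group, one option is to look for the `shared:N` optional field in /proc/self/mountinfo. The following is a minimal diagnostic sketch, not part of the overlay driver; `sharedPeerGroup` is a hypothetical helper name, and the filter on paths ending in `/merged` is an assumption about the storage layout described above.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// sharedPeerGroup is a hypothetical helper. It parses one line of
// /proc/self/mountinfo and reports the mount point plus the peer group ID
// if the mount carries a "shared:N" optional field.
// Note: mount points containing spaces are octal-escaped in mountinfo
// (e.g. \040); this sketch does not unescape them.
func sharedPeerGroup(line string) (mountPoint, group string, ok bool) {
	fields := strings.Fields(line)
	// Fields: ID, parent ID, major:minor, root, mount point, options,
	// zero or more optional fields, then the "-" separator.
	if len(fields) < 7 {
		return "", "", false
	}
	mountPoint = fields[4]
	for _, f := range fields[6:] {
		if f == "-" {
			break // end of optional fields
		}
		if strings.HasPrefix(f, "shared:") {
			return mountPoint, strings.TrimPrefix(f, "shared:"), true
		}
	}
	return mountPoint, "", false
}

func main() {
	f, err := os.Open("/proc/self/mountinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read mountinfo:", err)
		return
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		mp, grp, ok := sharedPeerGroup(sc.Text())
		// Assumption: overlay container mounts end in "/merged".
		if ok && strings.HasSuffix(mp, "/merged") {
			fmt.Printf("%s is shared (peer group %s)\n", mp, grp)
		}
	}
}
```

Running this in the host mount namespace and again inside the CSI driver container would show whether both namespaces see the same merged/ mounts.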

Impact

CRI-O reports FailedKillPod, and pods stay stuck in Terminating indefinitely. Kubelet retries every few seconds. I hit this with the JuiceFS CSI driver on CRI-O 1.34.0 / kernel 5.15.0: pods were stuck for 42 days, with 423k+ repeated events.

error killing pod: failed to "KillPodSandbox" ... 
  replacing mount point "/var/lib/containers/storage/overlay/.../merged": device or resource busy

Proposed fix

Handle EBUSY on the rename: clean up the temp dir and return success. MNT_DETACH has already detached the mount from the calling namespace, and DeferredRemove() handles directory cleanup later.

PR: #738

Related: #100
