feat: support VOLUME_MOUNT_GROUP for non-root pod volume access #9

Open
sheyabernstein wants to merge 2 commits into main from feat/support-volume-mount-group

sheyabernstein commented Mar 20, 2026

Overview

mkfs.ext4 creates the filesystem root as root:root 0755, blocking writes from non-root pods. Instead of a blanket chmod 0777, implement proper CSI VOLUME_MOUNT_GROUP support so kubelet passes the pod's fsGroup to the driver.

  • Advertise VOLUME_MOUNT_GROUP node capability
  • Set fsGroupPolicy to File so kubelet delegates to the driver
  • Apply chown and chmod 0775 plus the setgid bit in both NodeStageVolume and NodePublishVolume, using the volume_mount_group from the request
  • Handle this in NodePublishVolume as well, so volumes staged before the upgrade get correct permissions without PVC recreation
  • Set the setgid bit with os.ModeSetgid rather than octal 02775: Go's os.Chmod ignores the octal 02000 bit, and setgid is what makes new subdirectories created by kubelet for subPath mounts inherit the fsGroup automatically

Testing

Setup: Deployed Mimir via Helm on a test cluster with PVCs backed by the csi-hyperstack StorageClass. Mimir pods run as non-root (runAsUser: 10001, fsGroup: 10001).

Reproducing the issue:

  • With the upstream image, pods with CSI-backed PVCs (mimir-compactor, mimir-kafka, mimir-ingester, mimir-store-gateway) all entered CrashLoopBackOff
  • Kafka logs confirmed: java.nio.file.AccessDeniedException: /var/lib/kafka/data
  • Root cause: mkfs.ext4 creates the filesystem root as root:root 0755, blocking writes from non-root pods

Testing the fix:

  1. Upgraded the Helm release with the new CSI driver image
  2. Verified CSIDriver object updated to fsGroupPolicy: File via kubectl get csidriver hyperstack.csi.nexgencloud.com -o yaml
  3. Verified VOLUME_MOUNT_GROUP capability (type 6) advertised via NodeGetCapabilities in driver logs
  4. Deleted and recreated mimir-compactor-0 - pod started successfully (previously CrashLoopBackOff)
  5. Confirmed in CSI node logs that NodePublishVolume received volume_mount_group: "10001" and applied chown/chmod 02775 (setgid) on the staging path
  6. Verified this works on already-staged volumes - no PVC deletion or volume re-creation required, making this a non-breaking upgrade
  7. Verified the fix for subPath mounts (e.g. Prometheus with fsGroup: 2000 and subPath: prometheus-db): the pod started successfully after previously failing with permission denied


Labels

enhancement New feature or request
