Skip to content

storage: fix UID/GID map handling for shifting layers#734

Open
giuseppe wants to merge 2 commits intocontainers:mainfrom
giuseppe:fix-map-handling
Open

storage: fix UID/GID map handling for shifting layers#734
giuseppe wants to merge 2 commits intocontainers:mainfrom
giuseppe:fix-map-handling

Conversation

@giuseppe
Copy link
Copy Markdown
Member

@giuseppe giuseppe commented Apr 1, 2026

when the driver supports shifting, set the rechown target to identity instead of the container's maps.

Closes: https://redhat.atlassian.net/browse/RHEL-160859
Assisted-by: Claude Opus 4.6
Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com

giuseppe added 2 commits April 1, 2026 15:34
Remove the canUseShifting branch from CreateContainer and always store
the caller's IDMappingOptions.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when the driver supports shifting, set the rechown target to identity
instead of the container's maps.

Closes: https://redhat.atlassian.net/browse/RHEL-160859
Assisted-by: Claude Opus 4.6
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@mtrmac
Copy link
Copy Markdown
Contributor

mtrmac commented Apr 2, 2026

Note to self: Also #735.

@TomSweeneyRedHat
Copy link
Copy Markdown
Member

LGTM
but definitely want some other head nods

@giuseppe
Copy link
Copy Markdown
Member Author

giuseppe commented Apr 8, 2026

this is needed for a customer reported issue, can we move it forward?

Copy link
Copy Markdown
Contributor

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can’t find any explanation of what the problem is, clear enough to ignorant me.

Per the test, is this correct? “store.Diff relies on the ID mappings stored in layers.json to interpret layer contents, but for container layers created when shifting is possible, we store an empty map there”.


Fine, given that, yes, this looks like it would fix that test case.

But how do I review / audit that? It’s a bit surprising that we change what is stored in layers.json and ~nothing else in the codebase changes. (Well, it could be adequately explained by “the rules were always clear and this is just a local bug”, but…)

  • Is there a “layer” in the call stack responsible for shifting support? IOW, which parts of the call stack are supposed to see different data based on supportsShifting being true/false, and which parts of the call stack should see the same data in both cases? So far, layers.go was mostly out of that business, and it looked like this is mostly a concern of store.go; now layers.go needs to know (even more).
  • What are the exact rules? “from the hosts point of view, content of each layer must be mapped using the ID maps stored in the Layer object?” Fine…
    • check.go has a special case for canUseShifting disabling ID maps. Was that always incorrect?
    • populateLayerOptions continues to use the ~previous logic of CreateContainer (I guess that’s ~blocked on higher layers like #717 , but what does c/storage think is the data model?)
    • so does imageTopLayerForMapping
    • (I have no idea what the DisableShifting code in Store.Mount is doing — how can the caller know whether the layer will be ID-mapped or not on return?!)
    • ⚠️ layerMatchesMappingOptions entirely disables the canUseShifting case if the layer has ID maps set.

AFAICT this PR fixes one case at the cost of, at best, maybe, making container layers reliable but inconsistent with image layers. I don‘t know how to tell for sure.

What would help me is:

  • a clear documentation of what fields in data structures mean (or a re-confirmation that the existing documentation is correct and that any deviation in code is a bug).
    • If “the same values” meant different things in different layers of the call stack, have that eliminated by using separate types, or at least written down
    • if the semantics is conditional, have that documented
  • … ideally bounded to some specific fields / features / conditions to minimize scope.
  • #735 suggests that the ID mapping semantics of overlay differs between kernel’s overlay and fuse-overlayfs?! Is that the case?

Then it would be possible to do a repo-wide symbol search / audit for whether the rules are consistent, even for an ignoramus like me.

@chaserhkj
Copy link
Copy Markdown

Just to add to the discussion since my PR is related:

#735 suggests that the ID mapping semantics of overlay differs between kernel’s overlay and fuse-overlayfs?! Is that the case?

Yes, but the difference is mostly in a slightly different but related scope. There are two scopes of question here, one is the layer creation scope where a chown on the layer file tree could happen to fix the layer permissions; the other one is the layer mount scope where a mount-time id map is performed to mount a view of the fs with a different set of ownership IDs then the underlying fs.

The difference between overlay and fuse-overlayfs is mainly in the mount time scope:

  • When using overlayfs, the driver performs id-mapped mounts first before entering overlay mount procedure. This effectively adds a shim layer on top of all layers before the overlay is mounted. Thus right before performing the overlay mount the driver has a view of the fs that is already mapped onto the target ID space.
  • When using fuse-overlayfs, the driver use -o uidmapping= and -o gidmapping= provided by fuse-overlayfs to do the map and the overlay mount in one go. Thus right before performing the overlay mount the driver has a view of the fs that is unmapped yet.

#735 mainly concerns a chown operation on the upper layer (diff) that happens right before the overlay mount is committed that assumes that the layer is already mapped to the target space. Thus with the second case it just breaks the permission on the diff layer.

Note that this PR, although working in the creation scope instead of mounting scope, seems to tamper with the layerOptions of the diff layer as well (changes in CreateContainer function). This seems to be related to the behavior of how UpdateLayerIDMap is called.

On the other hand I dug into the code for a while on how the ID maps are being passed around, and I feel that a lot of the confusion around the handling of this just comes from the notion of "layer id map" itself. A layer is a storage object, it is not yet instantiated into a mount, but an ID map is associated with a mount and a userns, which is a runtime object. It feels very strange that a layer could have an attribute of IDMap. And it seems that at mount time the driver does not use layer id map to do the idmapped mount at all, the maps used there are passed down from the Container object's id map attributes, and there is no data path between the maps on the layer object and the maps on the Container object.

As far as I can tell, the layer ID maps seems to be only used to determine if UpdateLayerIDMap should be called to perform chown on the layers. But the purpose of this function seems very unclear to me, as well. I traced back the data path and it seems that all layer ID maps comes from library consumers. I can't really think of any legit practical case that this attribute can be used reasonably, as why is a chown needed in the first place? The linked jira ticket of this PR shows a podman failure related to this, but I feel that is just because podman called both container creation and layer creation methods with the id maps populated, it might just be fine if it does not fill the id maps for the layer creation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

storage Related to "storage" package

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants