storage: fix UID/GID map handling for shifting layers#734
storage: fix UID/GID map handling for shifting layers#734giuseppe wants to merge 2 commits intocontainers:mainfrom
Conversation
Remove the canUseShifting branch from CreateContainer and always store the caller's IDMappingOptions. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
when the driver supports shifting, set the rechown target to identity instead of the container's maps. Closes: https://redhat.atlassian.net/browse/RHEL-160859 Assisted-by: Claude Opus 4.6 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
Note to self: Also #735. |
|
LGTM |
|
this is needed for a customer reported issue, can we move it forward? |
mtrmac
left a comment
There was a problem hiding this comment.
I can’t find any explanation of what the problem is, clear enough to ignorant me.
Per the test, is this correct? “store.Diff relies on the ID mappings stored in layers.json to interpret layer contents, but for container layers created when shifting is possible, we store an empty map there”.
Fine, given that, yes, this looks like it would fix that test case.
But how do I review / audit that? It’s a bit surprising that we change what is stored in layers.json and ~nothing else in the codebase changes. (Well, it could be adequately explained by “the rules were always clear and this is just a local bug”, but…)
- Is there a “layer” in the call stack responsible for shifting support? IOW, which parts of the call stack are supposed to see different data based on
supportsShiftingbeing true/false, and which parts of the call stack should see the same data in both cases? So far,layers.gowas mostly out of that business, and it looked like this is mostly a concern ofstore.go; nowlayers.goneeds to know (even more). - What are the exact rules? “from the hosts point of view, content of each layer must be mapped using the ID maps stored in the
Layerobject?” Fine…check.gohas a special case forcanUseShiftingdisabling ID maps. Was that always incorrect?populateLayerOptionscontinues to use the ~previous logic ofCreateContainer(I guess that’s ~blocked on higher layers like #717 , but what does c/storage think is the data model?)- so does
imageTopLayerForMapping - (I have no idea what the
DisableShiftingcode inStore.Mountis doing — how can the caller know whether the layer will be ID-mapped or not on return?!) ⚠️ layerMatchesMappingOptionsentirely disables thecanUseShiftingcase if the layer has ID maps set.
AFAICT this PR fixes one case at the cost of, at best, maybe, making container layers reliable but inconsistent with image layers. I don‘t know how to tell for sure.
What would help me is:
- a clear documentation of what fields in data structures mean (or a re-confirmation that the existing documentation is correct and that any deviation in code is a bug).
- If “the same values” meant different things in different layers of the call stack, have that eliminated by using separate types, or at least written down
- if the semantics is conditional, have that documented
- … ideally bounded to some specific fields / features / conditions to minimize scope.
- #735 suggests that the ID mapping semantics of overlay differs between kernel’s overlay and
fuse-overlayfs?! Is that the case?
Then it would be possible to do a repo-wide symbol search / audit for whether the rules are consistent, even for an ignoramus like me.
|
Just to add to the discussion since my PR is related:
Yes, but the difference is mostly in a slightly different but related scope. There are two scopes of question here, one is the layer creation scope where a chown on the layer file tree could happen to fix the layer permissions; the other one is the layer mount scope where a mount-time id map is performed to mount a view of the fs with a different set of ownership IDs then the underlying fs. The difference between overlay and fuse-overlayfs is mainly in the mount time scope:
#735 mainly concerns a chown operation on the upper layer ( Note that this PR, although working in the creation scope instead of mounting scope, seems to tamper with the layerOptions of the diff layer as well (changes in On the other hand I dug into the code for a while on how the ID maps are being passed around, and I feel that a lot of the confusion around the handling of this just comes from the notion of "layer id map" itself. A layer is a storage object, it is not yet instantiated into a mount, but an ID map is associated with a mount and a userns, which is a runtime object. It feels very strange that a layer could have an attribute of IDMap. And it seems that at mount time the driver does not use layer id map to do the idmapped mount at all, the maps used there are passed down from the Container object's id map attributes, and there is no data path between the maps on the layer object and the maps on the Container object. As far as I can tell, the layer ID maps seems to be only used to determine if UpdateLayerIDMap should be called to perform chown on the layers. But the purpose of this function seems very unclear to me, as well. I traced back the data path and it seems that all layer ID maps comes from library consumers. I can't really think of any legit practical case that this attribute can be used reasonably, as why is a chown needed in the first place? The linked jira ticket of this PR shows a podman failure related to this, but I feel that is just because podman called both container creation and layer creation methods with the id maps populated, it might just be fine if it does not fill the id maps for the layer creation. |
when the driver supports shifting, set the rechown target to identity instead of the container's maps.
Closes: https://redhat.atlassian.net/browse/RHEL-160859
Assisted-by: Claude Opus 4.6
Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com