Skip to content

Add ErrNotPermitted error for permission denied in feature probes #1949

Open
javiercardona-work wants to merge 3 commits intocilium:mainfrom
javiercardona-work:pr2
Open

Add ErrNotPermitted error for permission denied in feature probes #1949
javiercardona-work wants to merge 3 commits intocilium:mainfrom
javiercardona-work:pr2

Conversation

@javiercardona-work
Copy link
Copy Markdown

@javiercardona-work javiercardona-work commented Feb 3, 2026

Add ErrNotPermitted to distinguish permission denied (EPERM) from feature not supported (EINVAL) in feature probes.

This informs the caller that the feature may work with proper permissions (CAP_BPF or a BPF token) instead of having to upgrade to a newer kernel.

Problem

Previously, feature probes treated EPERM and EINVAL identically, returning ErrNotSupported for both. This was imprecise:

  • EINVAL: kernel doesn't recognize the request (feature not supported)
  • EPERM: kernel recognized the request but denied permission (feature exists)

Solution

Feature probes now return:

  • nil: feature is supported and accessible
  • ErrNotSupported: kernel doesn't have the feature
  • ErrNotPermitted: feature exists but permission denied

ErrNotPermitted wraps the original EPERM error so callers can unwrap if needed.

Usage

err := features.HaveProgramType(ebpf.XDP)                                                                                                                                                                                                  
switch {                                                                                                                                                                                                                                   
case err == nil:                                                                                                                                                                                                                           
    // feature works                                                                                                                                                                                                                       
case errors.Is(err, ebpf.ErrNotSupported):                                                                                                                                                                                                 
    // kernel doesn't have it                                                                                                                                                                                                              
case errors.Is(err, ebpf.ErrNotPermitted):                                                                                                                                                                                                 
    // permission denied (but feature exists)                                                                                                                                                                                              
}                                                                                                                                                                                                                                          

Breaking Change

Code checking only for ErrNotSupported will no longer catch permission errors. To maintain previous behavior:

  // Before:                                                                                                                                                                                                                                 
  if errors.Is(err, ebpf.ErrNotSupported) { ... }                                                                                                                                                                                            
                                                                                                                                                                                                                                             
  // After:                                                                                                                                                                                                                                  
  if errors.Is(err, ebpf.ErrNotSupported) || errors.Is(err, ebpf.ErrNotPermitted) { ... }                                                                                                                                                    
                                                                                                                                                                                                                                             
  // Or simply:                                                                                                                                                                                                                              
  if err != nil { ... }                                                           

Add ErrNotPermitted error to distinguish permission denied (EPERM)
from feature not supported (EINVAL) in feature probes.

Previously, EPERM and EINVAL were both mapped to ErrNotSupported.
This was imprecise: EPERM means the kernel recognized the request
but denied permission, indicating the feature exists. EINVAL means
the kernel doesn't recognize the request.

Now feature probes return:
- nil: feature is supported and accessible
- ErrNotSupported: kernel doesn't have the feature
- ErrNotPermitted: feature exists but permission denied

This gives callers richer information for diagnostics and error
messages while maintaining backward compatibility (callers can
still check err != nil for "can't use this feature").

ErrNotPermitted wraps the original EPERM error so callers can
unwrap if needed.

BREAKING CHANGE: Code that checks for ErrNotSupported to handle
"feature unavailable" cases will no longer catch permission errors.

To maintain previous behavior, update code as follows:

Before:
    if errors.Is(err, ebpf.ErrNotSupported) {
        // handle unavailable feature
    }

After:
    if errors.Is(err, ebpf.ErrNotSupported) || errors.Is(err, ebpf.ErrNotPermitted) {
        // handle unavailable feature
    }

Or simply check for any error:
    if err != nil {
        // handle unavailable feature
    }

Signed-off-by: Javier Cardona <jcardona@meta.com>
Add TestErrNotPermitted to verify:
- ErrNotPermitted is distinct from ErrNotSupported
- Wrapped errors can be matched with errors.Is()

Update DocDetectXDP example to show how to handle
ErrNotPermitted in addition to ErrNotSupported.

Signed-off-by: Javier Cardona <jcardona@meta.com>
florianl
florianl previously approved these changes Feb 11, 2026
Copy link
Copy Markdown
Contributor

@florianl florianl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment on the style, that a switch statement could be used in favor of multiple if statements.
Otherwise, I'm in favor for this differentiation.

Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
Signed-off-by: javiercardona-work <jcardona@meta.com>
@javiercardona-work
Copy link
Copy Markdown
Author

javiercardona-work commented Feb 13, 2026

ci / Run tests (Linux) (stable) (pull_request)Failing after 1m

This looks like a pre-existing flaky test related to GC finalization timing, not something caused by the changes in this PR. Can we re-test?

if errors.Is(err, unix.EINVAL) || errors.Is(err, unix.EPERM) {
// Treat both EINVAL and EPERM as not supported: creating the map may still
// succeed without Btf* attrs.
if errors.Is(err, unix.EINVAL) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, the style as mentioned in #1949 (comment) should also be applied here.


err := probeBTF(fn)
if errors.Is(err, unix.EINVAL) || errors.Is(err, unix.EPERM) {
if errors.Is(err, unix.EINVAL) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, the style as mentioned in #1949 (comment) should also be applied here.

if errors.Is(err, unix.EBADF) {
return nil
}
// EPERM means the kernel recognized the syscall but denied permission.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, the style as mentioned in #1949 (comment) should also be applied here.

Copy link
Copy Markdown
Contributor

@rgo3 rgo3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @javiercardona-work. First things first: I agree that adding a sentinel for this makes sense to distinguish between missing permissions and a feature being not supported. However, I do have some high-level concerns that I'd like to discuss.

The PR description, commit messages, docstrings, and code comments all claim that EPERM means "the kernel recognized the request" and "the feature exists." I'm not sure that this assumption holds for all kernels that this library currently supports. In some BPF syscall paths the permission check runs before feature validation, see for example https://elixir.bootlin.com/linux/v6.1.164/source/kernel/bpf/syscall.c#L4994. An unprivileged caller gets EPERM regardless of whether the requested map or prog type feature is present.

So in my opinion the correct framing, from our POV should be: EPERM means "permission denied and feature availability is unknown." Callers (especially those that know that on newer kernels this might mean that the feature is actually available) can then interpret and deal with this in a way of their choosing.
This should be reflected in ErrNotPermitted's godoc, the features/doc.go package doc, and potentially the FeatureTestFn doc if we choose to cache the result for ErrNotPermitted.

This brings me to my second concern. ErrNotPermitted currently falls through the uncached path in internal/feature.go:execute(). Unlike kernel feature support, permissions can change during a process lifetime (capabilities added/dropped, namespace transitions, BPF tokens), so not caching may be the right choice. Currently, features/doc.go says "Probe results are cached by the library and persist throughout any changes to the process' environment, like capability changes." If EPERM is intentionally not cached, this needs updating (cc @ti-mo).

It looks like EPERM handling was added to some internal probes but not others? Is there a reason why we shouldn't include this handling on every probe in the library?

As Florian already pointed out, let's be consistent in using the switch statement style where applicable.

}
}

func TestErrNotPermitted(t *testing.T) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test feels like we are testing Go standard library behavior instead of our own code, so I don't see a lot of value in this test.

@ti-mo
Copy link
Copy Markdown
Contributor

ti-mo commented Mar 18, 2026

@javiercardona-work I think it may be better to drop this change altogether. I assume this PR came to be after a frustrating trial-and-error session for figuring out the minimum set of capabilities the Cilium agent needed in order to run correctly. Was there anything else on your side that motivated this change?

#1953 introduces accurate reporting on exactly which permission is missing when tokens are enabled, which would eliminate this guessing game. Probes will also start failing with an exact error instead of a simple EPERM. As Robin indicated, given the inconsistent nature of the timing/location of permission checks in the kernel, I don't think we can draw probe conclusions based on EPERM alone.

If a probe fails with EPERM, in the majority of cases this should be due to running in a userns/unpriv environment where tokens are used to enforce the principle of least privilege. Returning a sentinel in this case doesn't feel super useful, since the user will need to modify the sandbox configuration to provide the necessary caps if they want a conclusive probe result. There's not much the program itself can do to remediate this situation. If it wants to fall back to a different behaviour, there's already the EPERM signal.

Have you found specific cases in the Cilium agent where this type of fallback behaviour is necessary for running in userns?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants