Description
Problem Statement
The current policy schema only supports allow rules. Every permitted method+path combination must be explicitly enumerated. This works well for simple use cases (e.g., read-only access to one API), but becomes unmanageable when the goal is to block a small set of dangerous operations on services with large API surfaces.
Real-world example: We run AI agents that need full developer-level access to GitHub and internal Teleport servers, but must be prevented from performing admin and strictly-human operations (approving PRs, changing rulesets, etc.).
For GitHub's REST API, blocking ~10 admin endpoints (branch protection, rulesets, org settings, PR approvals, workflow approvals) requires enumerating 60+ allow rules to cover every legitimate developer operation. Any new GitHub API endpoint is blocked by default until we update the policy, which means agent workflows break silently whenever GitHub adds an endpoint.
For services like Teleport, the problem is worse. Teleport's web API has hundreds of routes that change with every release, served alongside gRPC-web on the same port. Maintaining an exhaustive allow-list is not feasible: we currently have to choose between verbose allow-lists that break on Teleport upgrades and falling back to TCP passthrough with no L7 protection at all.
With deny rules, both policies collapse to something like:
rules:
  # Allow everything by default
  - allow:
      method: "*"
      path: "/**"
  # Block specific dangerous paths
  - deny:
      method: POST
      path: "/repos/*/pulls/*/reviews"
  - deny:
      method: PUT
      path: "/repos/*/branches/*/protection"
  - deny:
      method: PUT
      path: "/webapi/sites/*/accessrequest/*"
Proposed Design
- Add a deny rule type alongside the existing allow rule type.
- deny rules take precedence over allow rules (deny wins on conflict).
- Evaluation order: if a request matches any deny rule, it is blocked regardless of allow rules. If no deny rule matches, existing allow logic applies.
- The access: full shorthand combined with deny rules would cover the "allow-all-except" pattern without needing to enumerate individual allows.
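The evaluation order above could be sketched roughly as follows. This is a minimal illustration, not OpenShell's implementation: the names `decide`, `exact`, and `Rule` are hypothetical, and rule matching is reduced to an exact method and path comparison so that only the precedence logic is shown.

```python
from typing import Callable, Iterable

# A rule is modelled as a predicate over (method, path). How a rule matches
# (globs, method wildcards) is deliberately left out of this sketch.
Rule = Callable[[str, str], bool]


def exact(method: str, path: str) -> Rule:
    """Hypothetical helper: a rule matching one exact method and path."""
    return lambda m, p: m == method and p == path


def decide(method: str, path: str, deny: Iterable[Rule],
           allow: Iterable[Rule] = (), access_full: bool = False) -> str:
    # 1. Any matching deny rule blocks the request, whatever allow says.
    if any(rule(method, path) for rule in deny):
        return "deny"
    # 2. Otherwise the existing allow logic applies; access: full allows all.
    if access_full or any(rule(method, path) for rule in allow):
        return "allow"
    # 3. No allow rule matched: blocked by default, as in the current schema.
    return "deny"


deny_rules = [exact("POST", "/graphql")]
print(decide("POST", "/graphql", deny_rules, access_full=True))  # deny
print(decide("GET", "/graphql", deny_rules, access_full=True))   # allow
print(decide("GET", "/graphql", deny_rules))                     # deny
```

Checking deny rules first keeps the result independent of the order in which rules appear in the policy file.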
For example:
endpoints:
  - host: api.github.com
    port: 443
    protocol: rest
    tls: terminate
    enforcement: enforce
    access: full # allow all methods and paths by default
    deny_rules:
      - deny:
          method: POST
          path: "/repos/*/pulls/*/reviews"
      - deny:
          method: "*"
          path: "/repos/*/branches/*/protection"
      - deny:
          method: "*"
          path: "/repos/*/branches/*/protection/**"
      - deny:
          method: "*"
          path: "/repos/*/rulesets"
      - deny:
          method: "*"
          path: "/repos/*/rulesets/*"
      - deny:
          method: POST
          path: "/repos/*/actions/runs/*/approve"
      - deny:
          method: POST
          path: "/repos/*/actions/runs/*/pending-deployments"
      - deny:
          method: POST
          path: "/graphql"
      - deny:
          method: "PUT"
          path: "/repos/*/actions/permissions"
      - deny:
          method: "PUT"
          path: "/repos/*/actions/permissions/**"
      - deny:
          method: "*"
          path: "/orgs/**"
          except_methods: [GET, HEAD, OPTIONS]
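The except_methods field in the last rule is the least obvious part of the example, so here is one possible reading of it as code. This is a sketch under assumptions: the dict form of a rule and the function name `deny_rule_matches` are hypothetical, and only a trailing "/**" is handled (as a prefix match), because this proposal does not define whether a single "*" may cross a "/".

```python
def deny_rule_matches(rule: dict, method: str, path: str) -> bool:
    """Return True if a deny rule (hypothetical dict form) blocks the request."""
    method = method.upper()
    # Methods listed in except_methods are carved out of the deny rule.
    if method in {m.upper() for m in rule.get("except_methods", [])}:
        return False
    # "*" matches any method; otherwise the method must be equal.
    if rule["method"] != "*" and rule["method"].upper() != method:
        return False
    pattern = rule["path"]
    # Assumption: a trailing "/**" means "everything under this prefix".
    if pattern.endswith("/**"):
        return path.startswith(pattern[:-2])
    # Anything else is compared literally in this sketch.
    return path == pattern


orgs_rule = {"method": "*", "path": "/orgs/**",
             "except_methods": ["GET", "HEAD", "OPTIONS"]}
print(deny_rule_matches(orgs_rule, "DELETE", "/orgs/acme/members/bob"))  # True
print(deny_rule_matches(orgs_rule, "GET", "/orgs/acme/members"))         # False
```

Under this reading, the rule makes everything below /orgs/ read-only: GET, HEAD, and OPTIONS fall through to the allow logic, and every other method is denied.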
Alternatives Considered
Exhaustive allow-list (current approach)
This is what we're doing today. For GitHub, it requires ~60 allow rules to cover normal developer operations while omitting ~10 admin paths. For Teleport, it requires enumerating every safe POST endpoint across a web API with hundreds of routes.
Why it's insufficient: The policy is fragile. When GitHub or Teleport adds a new non-admin API endpoint, it's blocked by default until we update the allow-list. This creates silent failures in agent workflows that are hard to diagnose. For Teleport specifically, the API surface changes with every release — maintaining an exhaustive allow-list means coupling our policy update cycle to every vendor upgrade. The policy file also becomes extremely long (300+ lines for just two services), making review and auditing harder.
Rely solely on service-side RBAC
Trust that GitHub's permissions and Teleport's role-based access control will block unauthorized operations, and don't attempt L7 enforcement in OpenShell at all.
Why it's insufficient: Our threat model specifically covers the case where the user has admin permissions (granted via Terraform elevation) and forgets to de-escalate before starting an agent session. Service-side RBAC says "yes, this token is authorized" — the whole point of the OpenShell policy layer is to enforce a narrower scope than the token allows. Defense-in-depth requires that the sandbox restrict what the agent can do even when the credential is over-privileged.
Agent Investigation
No response
Checklist
- I've reviewed existing issues and the architecture docs
- This is a design proposal, not a "please build this" request