deny rules in network policy schema

### Problem Statement

The current policy schema only supports allow rules. Every permitted method+path combination must be explicitly enumerated. This works well for simple use cases (e.g., read-only access to one API), but becomes unmanageable when the goal is to block a small set of dangerous operations on services with large API surfaces.

Real-world example: We run AI agents that need full developer-level access to GitHub and internal Teleport servers, but must be prevented from performing admin and strictly-human operations (approving PRs, changing rulesets, etc.).

For GitHub's REST API, blocking ~10 admin endpoints (branch protection, rulesets, org settings, PR approvals, workflow approvals) requires enumerating 60+ allow rules to cover every legitimate developer operation. Any new GitHub API endpoint is blocked by default until we update the policy, which means agent workflows break silently whenever GitHub adds an endpoint.

For services like Teleport, the problem is worse. Teleport's web API has hundreds of routes that change with every release, served alongside gRPC-web on the same port. Maintaining an exhaustive allow-list is not feasible: we currently have to choose between verbose allow-lists that break on Teleport upgrades and falling back to TCP passthrough with no L7 protection at all.

With deny rules, both policies collapse to something like:

```yaml
rules:
  # Allow everything by default
  - allow:
      method: "*"
      path: "/**"
  # Block specific dangerous paths
  - deny:
      method: POST
      path: "/repos/*/pulls/*/reviews"
  - deny:
      method: PUT
      path: "/repos/*/branches/*/protection"
  - deny:
      method: PUT
      path: "/webapi/sites/*/accessrequest/*"
```

### Proposed Design

- Add a `deny` rule type alongside the existing `allow` rule type.
- `deny` rules take precedence over `allow` rules (deny wins on conflict).
- Evaluation order: if a request matches any `deny` rule, it is blocked regardless of `allow` rules. If no `deny` rule matches, the existing allow logic applies.
- The `access: full` shorthand, combined with `deny` rules, would cover the "allow-all-except" pattern without needing to enumerate individual allows.
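The evaluation order described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the OpenShell implementation: the `Rule` type and the `matches` and `is_allowed` helpers are hypothetical names, and glob matching is delegated to Python's `fnmatch`, where `*` also matches across `/`. The real matcher's glob semantics may differ.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    """Hypothetical in-memory form of one allow or deny rule."""
    method: str  # HTTP method, or "*" for any method
    path: str    # glob pattern for the request path


def matches(rule: Rule, method: str, path: str) -> bool:
    # Assumption: fnmatch-style globbing, so "*" may span "/" characters.
    method_ok = rule.method == "*" or rule.method.upper() == method.upper()
    return method_ok and fnmatchcase(path, rule.path)


def is_allowed(method: str, path: str, allow: list, deny: list) -> bool:
    # Step 1: any matching deny rule blocks the request outright.
    if any(matches(r, method, path) for r in deny):
        return False
    # Step 2: otherwise fall through to the existing allow logic.
    return any(matches(r, method, path) for r in allow)


allow_rules = [Rule("*", "/**")]
deny_rules = [
    Rule("POST", "/repos/*/pulls/*/reviews"),
    Rule("PUT", "/repos/*/branches/*/protection"),
]

# Reading a pull request is allowed; submitting a review is denied.
print(is_allowed("GET", "/repos/octo/hello/pulls/7", allow_rules, deny_rules))
print(is_allowed("POST", "/repos/octo/hello/pulls/7/reviews", allow_rules, deny_rules))
```

Because deny rules are checked first, the order in which rules appear in the policy file does not affect the outcome in this sketch.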

For example:

```yaml
endpoints:
  - host: api.github.com
    port: 443
    protocol: rest
    tls: terminate
    enforcement: enforce
    access: full  # allow all methods and paths by default
    deny_rules:
      - deny:
          method: POST
          path: "/repos/*/pulls/*/reviews"
      - deny:
          method: "*"
          path: "/repos/*/branches/*/protection"
      - deny:
          method: "*"
          path: "/repos/*/branches/*/protection/**"
      - deny:
          method: "*"
          path: "/repos/*/rulesets"
      - deny:
          method: "*"
          path: "/repos/*/rulesets/*"
      - deny:
          method: POST
          path: "/repos/*/actions/runs/*/approve"
      - deny:
          method: POST
          path: "/repos/*/actions/runs/*/pending-deployments"
      - deny:
          method: POST
          path: "/graphql"
      - deny:
          method: "PUT"
          path: "/repos/*/actions/permissions"
      - deny:
          method: "PUT"
          path: "/repos/*/actions/permissions/**"
      - deny:
          method: "*"
          path: "/orgs/**"
          except_methods: [GET, HEAD, OPTIONS]
```
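The proposal does not spell out how `except_methods` is evaluated. One plausible reading, sketched below, is that a deny rule is skipped when the request method appears in its `except_methods` list, which would make `/orgs/**` read-only. The `deny_matches` and `decide` helpers are hypothetical, only a subset of the rules is reproduced, and `fnmatch` globbing (where `*` spans `/`) stands in for the real matcher.

```python
from fnmatch import fnmatchcase

# A subset of the deny rules from the example, written as plain dicts.
DENY_RULES = [
    {"method": "POST", "path": "/repos/*/pulls/*/reviews"},
    {"method": "*", "path": "/repos/*/rulesets"},
    {"method": "*", "path": "/orgs/**",
     "except_methods": ["GET", "HEAD", "OPTIONS"]},
]


def deny_matches(rule: dict, method: str, path: str) -> bool:
    method = method.upper()
    # Assumed semantics: methods listed in except_methods are exempt.
    if method in rule.get("except_methods", []):
        return False
    if rule["method"] not in ("*", method):
        return False
    return fnmatchcase(path, rule["path"])


def decide(method: str, path: str, deny_rules=DENY_RULES) -> str:
    # Models "access: full": anything not denied is allowed.
    if any(deny_matches(r, method, path) for r in deny_rules):
        return "deny"
    return "allow"


for method, path in [
    ("GET", "/orgs/acme/members"),
    ("DELETE", "/orgs/acme/members/bob"),
    ("GET", "/repos/octo/hello/rulesets"),
]:
    print(method, path, decide(method, path))
```

Under this reading, reads of organization data pass through while any mutating method under `/orgs/` is blocked.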

### Alternatives Considered

#### Exhaustive allow-list (current approach)

This is what we're doing today. For GitHub, it requires ~60 allow rules to cover normal developer operations while omitting ~10 admin paths. For Teleport, it requires enumerating every safe POST endpoint across a web API with hundreds of routes.

Why it's insufficient: The policy is fragile. When GitHub or Teleport adds a new non-admin API endpoint, it's blocked by default until we update the allow-list. This creates silent failures in agent workflows that are hard to diagnose. For Teleport specifically, the API surface changes with every release — maintaining an exhaustive allow-list means coupling our policy update cycle to every vendor upgrade. The policy file also becomes extremely long (300+ lines for just two services), making review and auditing harder.
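The fragility can be shown with a toy comparison. The sketch below is illustrative only: the rule lists are drastically shortened, the `/some-new-feature` endpoint is made up, and `fnmatch` globbing stands in for the real matcher.

```python
from fnmatch import fnmatchcase

# Shortened stand-ins for the two policy styles.
ALLOW_LIST = [("GET", "/repos/*/pulls"), ("POST", "/repos/*/pulls")]
DENY_LIST = [("POST", "/repos/*/pulls/*/reviews")]


def hit(rules, method, path):
    """True if any (method, glob) pair in rules matches the request."""
    return any(m in ("*", method) and fnmatchcase(path, p) for m, p in rules)


def allow_list_policy(method, path):
    # Default-deny: only enumerated operations pass.
    return hit(ALLOW_LIST, method, path)


def deny_list_policy(method, path):
    # Default-allow: only enumerated operations are blocked.
    return not hit(DENY_LIST, method, path)


# Hypothetical endpoint added by the vendor after the policy was written.
new_endpoint = ("GET", "/repos/octo/hello/some-new-feature")

print("allow-list permits it:", allow_list_policy(*new_endpoint))
print("deny-list permits it:", deny_list_policy(*new_endpoint))
```

The allow-list rejects the new endpoint until someone edits the policy, while the deny-list keeps working and still blocks the review endpoint.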

#### Rely solely on service-side RBAC

Trust that GitHub's permissions and Teleport's role-based access control will block unauthorized operations, and don't attempt L7 enforcement in OpenShell at all.

Why it's insufficient: Our threat model specifically covers the case where the user has admin permissions (granted via Terraform elevation) and forgets to de-escalate before starting an agent session. Service-side RBAC says "yes, this token is authorized" — the whole point of the OpenShell policy layer is to enforce a narrower scope than the token allows. Defense-in-depth requires that the sandbox restrict what the agent can do even when the credential is over-privileged.

### Agent Investigation

_No response_

### Checklist

- [x] I've reviewed existing issues and the architecture docs
- [x] This is a design proposal, not a "please build this" request

deny rules in network policy schema #565