Add CGROUP_SOCK_ADDR Set Support #493

Open
yaakov-stein wants to merge 8 commits into facebook:main from yaakov-stein:add_csa_set_support

Conversation

@yaakov-stein
Contributor

@yaakov-stein yaakov-stein commented Mar 31, 2026

Summary

Before this change, sets (hash maps and LPM tries used for bulk matching) were only supported on packet-based hooks. This meant users who wanted to filter against a list of IPs or ports on a CONNECT/SENDMSG hook had to create one rule per value, which scales poorly.

This change adds set support for all four CGROUP_SOCK_ADDR hooks, reusing the same BPF map infrastructure (hash maps for exact match, LPM tries for prefix match) that packet hooks already use.

Refactoring: make set codegen composable/generic

Set codegen previously lived entirely in set.c and was tightly coupled to packet-based header loading. To reuse it for CGROUP_SOCK_ADDR hooks, we extracted the generic pieces (copy field to scratch, map/trie lookup) into shared helpers and moved the packet-specific logic into packet.c. Protocol checks and layer conflict detection were also hoisted out of set codegen into program.c's existing dedup loop, so they now cover set components the same way they cover individual matchers.

As a drive-by, set components are now validated individually against the hook's supported matchers.

CGROUP_SOCK_ADDR set codegen

With the shared helpers in place, CGROUP_SOCK_ADDR set codegen maps each key component to its bpf_sock_addr context field offset, copies the field to the scratch buffer, and calls the shared map lookup. No header parsing needed — r6 already points to the context.

Why can't we use the current set codegen?

Packet-based hooks receive a pointer to raw packet memory. The BPF program parses headers layer by layer (L2 -> L3 -> L4), and r6 points to the current header being inspected.

CGROUP_SOCK_ADDR hooks receive a bpf_sock_addr context struct instead. The kernel pre-extracts socket metadata into typed fields:

struct bpf_sock_addr {
    __u32 user_family;     // address family
    __u32 user_ip4;        // destination IPv4
    __u32 user_ip6[4];     // destination IPv6
    __u32 user_port;       // destination port (network order)
    ...                    // family, type, protocol
    __u32 msg_src_ip4;     // source IPv4  (sendmsg only)
    __u32 msg_src_ip6[4];  // source IPv6  (sendmsg only)
    ...
};

This difference has two consequences for set codegen:

  1. No header parsing needed. r6 already points to the context. Instead of loading a header pointer and extracting fields from parsed packet data, we read directly from context field offsets (r6 + offsetof(bpf_sock_addr, field)).

  2. BPF verifier constraints on access width. Packet memory can be read at any width. Context struct fields must be read at widths that respect field alignment. The verifier rejects misaligned reads from context pointers. The new bf_stub_load function handles this by picking the largest safe access width based on both source and destination alignment.
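
To make consequence 1 concrete, here is a minimal sketch of mapping a set key component to a context field offset. It is an illustration under stated assumptions, not the PR's code: struct sock_addr_ctx is a local stand-in meant to mirror the kernel's bpf_sock_addr layout so the example builds without kernel headers, and enum key_component, ctx_offset and CTX_OFFSET_INVALID are hypothetical names. The (size_t)-1 sentinel for unsupported components follows the return convention described in the review comments on this PR.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's struct bpf_sock_addr (assumption: same
 * field order as the UAPI struct). Only the field offsets matter here. */
struct sock_addr_ctx {
    uint32_t user_family;
    uint32_t user_ip4;
    uint32_t user_ip6[4];
    uint32_t user_port;
    uint32_t family;
    uint32_t type;
    uint32_t protocol;
    uint32_t msg_src_ip4;
    uint32_t msg_src_ip6[4];
};

/* Hypothetical key component identifiers (not bpfilter's bf_matcher_type). */
enum key_component {
    KEY_IP4_DADDR,
    KEY_IP6_DADDR,
    KEY_DPORT,
    KEY_IP4_SADDR,
    KEY_IP6_SADDR,
    KEY_UNSUPPORTED,
};

/* Sentinel returned for components with no matching context field. */
#define CTX_OFFSET_INVALID ((size_t)-1)

/* Map a key component to the offset of the context field it reads from. */
size_t ctx_offset(enum key_component component)
{
    switch (component) {
    case KEY_IP4_DADDR:
        return offsetof(struct sock_addr_ctx, user_ip4);
    case KEY_IP6_DADDR:
        return offsetof(struct sock_addr_ctx, user_ip6);
    case KEY_DPORT:
        return offsetof(struct sock_addr_ctx, user_port);
    case KEY_IP4_SADDR:
        return offsetof(struct sock_addr_ctx, msg_src_ip4);
    case KEY_IP6_SADDR:
        return offsetof(struct sock_addr_ctx, msg_src_ip6);
    default:
        return CTX_OFFSET_INVALID;
    }
}
```

A caller would add the returned offset to the context pointer held in r6, and reject the set when the sentinel comes back.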

The diagram below shows the two paths:

Packet hooks (XDP/TC/NF/cgroup_skb):
  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
  │ Parse L2/L3  │────>│ Load header  │────>│ Copy field   │───> map lookup
  │ (dynptr)     │     │ ptr into r6  │     │ to scratch   │
  └──────────────┘     └──────────────┘     └──────────────┘

CGROUP_SOCK_ADDR:
  ┌──────────────────┐     ┌──────────────┐
  │ r6 = ctx (set    │────>│ Copy field   │───> map lookup
  │ once in prologue)│     │ to scratch   │
  └──────────────────┘     └──────────────┘

Test plan

  • Unit tests: validation of set component hook compatibility and cross-layer conflict detection
  • E2e tests: each CGROUP_SOCK_ADDR hook has dry-run parsing tests and behavioral tests covering hash sets, trie sets, and multi-component sets

@github-actions

github-actions bot commented Mar 31, 2026

Claude review of PR #493 (683d9a7)

Previous must-fix issues (NULL dereferences in chain.c, program.c, cgroup_sock_addr.c, packet.c) have all been addressed in this revision.

Suggestions

  • Missing asserts in _bf_program_check_proto (src/libbpfilter/cgen/program.c:289): Function lacks assert(program) and assert(checked_layers), unlike every other static function in this file
  • ip4.proto set validation gap (src/libbpfilter/cgen/cgroup_sock_addr.c:228): ip4.proto has BF_MATCHER_IN and is not excluded by unsupported_hooks for cgroup_sock_addr hooks, but _bf_cgroup_sock_addr_ctx_offset doesn't handle it, so chain validation passes but codegen fails late
  • Unit test should verify specific errno (tests/unit/libbpfilter/chain.c:292): set_component_unsupported_hook uses assert_err but should check for -ENOTSUP specifically
  • Missing port-only behavioral E2E test (tests/e2e/hooks/cgroup_sock_addr_connect4.sh:138): Dry-run covers (tcp.dport) in { ... } but no behavioral test exercises the standalone port narrow-read path

Nits

  • Inconsistent errno for missing meta (src/libbpfilter/cgen/matcher/packet.c:320): Uses -ENOENT while cgroup_sock_addr.c:253 uses -EINVAL for the same condition
  • BF_MATCHER_META_DPORT unreachable in ctx_offset (src/libbpfilter/cgen/cgroup_sock_addr.c:226): Cannot appear as a set key because it lacks BF_MATCHER_IN op
  • _bf_cgroup_sock_addr_ctx_offset lacks Doxygen (src/libbpfilter/cgen/cgroup_sock_addr.c:209): Non-trivial function mapping matcher types to ctx offsets has no documentation
  • addr_size * 8 overflow assumption implicit (src/libbpfilter/cgen/matcher/set.c:40): Cast to uint32_t is safe only if addr_size <= 16, but this invariant is undocumented

Workflow run

@yaakov-stein yaakov-stein marked this pull request as draft March 31, 2026 03:49
@yaakov-stein yaakov-stein removed the request for review from qdeslandes March 31, 2026 03:49
@yaakov-stein yaakov-stein force-pushed the add_csa_set_support branch 2 times, most recently from 8a5b67b to 1def2a9 Compare March 31, 2026 17:27
Extract the copy-to-scratch logic from bf_stub_stx_payload into a new
function with explicit parameters. bf_stub_load copies size bytes from
R6 + src_offset to scratch[dst_scratch_offset], using min(src, dst)
alignment for access size selection.

For each chunk, the largest access width is picked where both src_off
and dst_off are aligned and remaining >= width. This ensures optimal
width for packet access (R6 = memory pointer) and verifier-safe width
for context access (R6 = ctx pointer).

bf_stub_stx_payload is rewritten as a thin wrapper around bf_stub_load.
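
The width selection described in this commit message can be sketched as follows. This is a standalone illustration, not bf_stub_load itself: pick_access_width and plan_copy are hypothetical names, and the candidate widths 8, 4, 2 and 1 are an assumption based on the access sizes BPF load/store instructions offer.

```c
#include <stddef.h>

/* Largest width (8, 4, 2 or 1 bytes) such that both offsets are aligned to
 * it and at least that many bytes remain to be copied. */
size_t pick_access_width(size_t src_off, size_t dst_off, size_t remaining)
{
    for (size_t width = 8; width > 1; width /= 2) {
        if (src_off % width == 0 && dst_off % width == 0 &&
            remaining >= width)
            return width;
    }

    return 1; /* a single byte is always aligned */
}

/* Split a copy of `size` bytes into chunks. Stores each chunk width in
 * `widths` (up to `max_chunks` entries) and returns the chunk count. */
size_t plan_copy(size_t src_off, size_t dst_off, size_t size,
                 size_t *widths, size_t max_chunks)
{
    size_t count = 0;

    while (size > 0 && count < max_chunks) {
        size_t width = pick_access_width(src_off, dst_off, size);

        widths[count++] = width;
        src_off += width;
        dst_off += width;
        size -= width;
    }

    return count;
}
```

Under these assumptions, a 16-byte copy from offset 8 to offset 8 becomes two 8-byte accesses, while a 6-byte copy from offset 2 to offset 2 becomes a 2-byte access followed by a 4-byte access.
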

Extract reusable set codegen helpers from the existing packet-specific
set code:

- bf_set_generate_map_lookup: the 5-instruction map-lookup tail shared
  by both trie and hash paths (load set FD, compute key pointer, call
  bpf_map_lookup_elem, jump to next rule on NULL).

- bf_set_generate_trie_lookup: complete LPM trie key assembly and
  lookup. Writes prefixlen at scratch[4], copies the address to
  scratch[8] via bf_stub_load, then calls bf_set_generate_map_lookup.
  Replaces ~30 lines of IPv4/IPv6-branching trie bytecode that would
  otherwise be duplicated between packet and cgroup_sock_addr flavors.

The trie path in bf_matcher_generate_set is inlined since it reduces to
three calls with the new helpers. The hash path calls
bf_set_generate_map_lookup instead of inlining the sequence.
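
As a rough userspace picture of the trie key layout described above (prefixlen at scratch[4], address at scratch[8]), the sketch below assembles the same bytes in a plain buffer. It is not the generated bytecode: build_trie_key, SCRATCH_SIZE and the offset macros are hypothetical names, and the 4-byte prefix length followed by the address bytes is assumed to match the kernel's LPM trie key format. The assert documents the addr_size <= 16 invariant raised in the review.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCRATCH_SIZE 32       /* hypothetical scratch buffer size */
#define TRIE_PREFIXLEN_OFF 4  /* prefixlen lives at scratch[4] */
#define TRIE_ADDR_OFF 8       /* address bytes start at scratch[8] */

/* Assemble an LPM trie lookup key in `scratch` and return a pointer to the
 * start of the key (the prefixlen field). */
const uint8_t *build_trie_key(uint8_t scratch[SCRATCH_SIZE],
                              const uint8_t *addr, size_t addr_size)
{
    assert(scratch);
    assert(addr);
    assert(addr_size <= 16); /* IPv6 is the largest supported address */

    /* Look up the full address: prefix length is the address size in bits. */
    uint32_t prefixlen = (uint32_t)(addr_size * 8);

    memcpy(scratch + TRIE_PREFIXLEN_OFF, &prefixlen, sizeof(prefixlen));
    memcpy(scratch + TRIE_ADDR_OFF, addr, addr_size);

    return scratch + TRIE_PREFIXLEN_OFF;
}
```

Placing the prefix length at offset 4 keeps the address at offset 8, which presumably lets the address copy use the widest aligned accesses.
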
- Notes
* - :rspan:`1` Source address
- :rspan:`1` ``ip6.saddr``
* - :rspan:`2` Source address
Contributor Author

@yaakov-stein yaakov-stein Mar 31, 2026

Drive-by fix: document that ip6.daddr and ip6.saddr work in sets.

For BF_MATCHER_SET, iterate set key components through the same
checked_layers dedup logic as regular matchers. Extract
_bf_program_check_proto to share the check between both paths.

Check each set key component against the chain's hook and track set
components in the per-layer compatibility check. Extract
_bf_rule_check_layer to share the logic between regular matchers and
set components.
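
The commit messages do not show how checked_layers is encoded. One plausible reading, given the uint32_t pointer in the _bf_program_check_proto signature, is a bitmask with one bit per layer, so a protocol check is emitted only the first time a layer is seen, whether it comes from a regular matcher or a set key component. The sketch below illustrates that idea only; enum layer, mark_layer_checked and count_proto_checks are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layer identifiers, one bit each in the mask. */
enum layer {
    LAYER_L2 = 0,
    LAYER_L3 = 1,
    LAYER_L4 = 2,
};

/* Return true the first time `layer` is seen (the caller should then emit
 * the protocol check) and false on every later call for the same layer. */
bool mark_layer_checked(enum layer layer, uint32_t *checked_layers)
{
    uint32_t bit = UINT32_C(1) << (unsigned)layer;

    if (*checked_layers & bit)
        return false; /* protocol check already emitted for this layer */

    *checked_layers |= bit;
    return true;
}

/* Count how many protocol checks a rule needs when its matchers and set key
 * components touch the given layers, in order. */
size_t count_proto_checks(const enum layer *layers, size_t n_layers)
{
    uint32_t checked_layers = 0;
    size_t emitted = 0;

    for (size_t i = 0; i < n_layers; ++i) {
        if (mark_layer_checked(layers[i], &checked_layers))
            ++emitted;
    }

    return emitted;
}
```

With this scheme, a rule combining an L3 matcher with an (L3, L4) set key would emit two protocol checks rather than three.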
@yaakov-stein yaakov-stein marked this pull request as ready for review March 31, 2026 18:55
@yaakov-stein yaakov-stein requested a review from qdeslandes March 31, 2026 18:55
static int _bf_program_check_proto(struct bf_program *program,
                                   enum bf_matcher_type type,
                                   uint32_t *checked_layers)
{

Claude: suggestion: Missing assert(program) and assert(checked_layers) in _bf_program_check_proto. Every other static function in this file that takes a program pointer asserts on it (14 instances: _bf_program_fixup, _bf_program_generate_rule, _bf_program_generate_elfstubs, etc.). Since checked_layers is also a pointer parameter, both should be asserted per the style guide's parameter validation convention.

Suggested change
static int _bf_program_check_proto(struct bf_program *program,
                                   enum bf_matcher_type type,
                                   uint32_t *checked_layers)
{
    assert(program);
    assert(checked_layers);

    const struct bf_matcher_meta *meta = bf_matcher_get_meta(type);

        return offsetof(struct bpf_sock_addr, user_ip6);
    case BF_MATCHER_TCP_DPORT:
    case BF_MATCHER_UDP_DPORT:
    case BF_MATCHER_META_DPORT:

Claude: suggestion: _bf_cgroup_sock_addr_ctx_offset does not handle BF_MATCHER_IP4_PROTO, which supports BF_MATCHER_IN (so it can be a set key component) and whose unsupported_hooks does not exclude IPv4 cgroup_sock_addr hooks. A set like (ip4.proto) in { 6; 17 } would pass the _bf_chain_check_rule component validation but hit the default branch here, returning (size_t)-1 and failing late at codegen with -ENOTSUP.

The error message is clear so this is not silent corruption, but it is a validation/codegen mismatch. Consider either:

  • Adding BF_MATCHER_IP4_PROTO to the switch (reading from bpf_sock_addr.protocol), or
  • Adding cgroup_sock_addr hooks to BF_MATCHER_IP4_PROTO's unsupported_hooks in matcher.c so the mismatch is caught at chain creation time.


# (ip4.daddr, udp.dport) multi-component hash set
${FROM_NS} ${BFCLI} chain set --from-str "chain c BF_HOOK_CGROUP_SOCK_ADDR_CONNECT4{cgpath=${CGROUP_PATH}} ACCEPT rule (ip4.daddr, udp.dport) in { ${HOST_IP_ADDR}, 9990 } DROP"
(! udp4_connect ${HOST_IP_ADDR} 9990)

Claude: suggestion: The dry-run section tests a standalone (tcp.dport) in { 80; 443 } set, but the behavioral tests only exercise ports as part of a multi-component key ((ip4.daddr, udp.dport)). Since the port code path uses a 2-byte narrow context read from the 4-byte user_port field (as documented in the code comment at cgroup_sock_addr.c:288), a standalone port-only behavioral test would exercise this narrow-read path in isolation and increase confidence that the verifier rewriting produces the expected NBO value.
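
For readers unfamiliar with the narrow-read concern, the following is a userspace model of it, not kernel or bpfilter code. It assumes the 4-byte user_port field carries the 16-bit port in network byte order, zero-extended, and it models the 2-byte read at the value level as keeping the low 16 bits. store_user_port and narrow_read_port are hypothetical names.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Model of the 4-byte user_port field: the port in network byte order,
 * zero-extended to 32 bits (assumption). */
uint32_t store_user_port(uint16_t host_port)
{
    return (uint32_t)htons(host_port);
}

/* Model of a 2-byte narrow read of that field: keep the low 16 bits, which
 * should still be the port in network byte order. */
uint16_t narrow_read_port(uint32_t user_port)
{
    return (uint16_t)(user_port & 0xffff);
}
```

A standalone port-only behavioral test would confirm on a real kernel what this model only assumes: that the 2 bytes copied to the scratch buffer equal the network-order port stored in the set.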


if (set->use_trie) {
    const struct bf_matcher_meta *meta = bf_matcher_get_meta(set->key[0]);


Claude: nit: Error code inconsistency: _bf_matcher_pkt_generate_set returns -ENOENT for the !meta case here, while _bf_cgroup_sock_addr_generate_set at cgroup_sock_addr.c:253 returns -EINVAL for the same condition. The _bf_rule_check_layer helper in chain.c also uses -ENOENT. Consider aligning on one errno and message pattern across all three paths.

    case BF_MATCHER_IP6_DADDR:
    case BF_MATCHER_IP6_DNET:
        return offsetof(struct bpf_sock_addr, user_ip6);
    case BF_MATCHER_TCP_DPORT:

Claude: nit: BF_MATCHER_META_DPORT cannot appear as a set key component: bf_set_new only accepts matcher types that have BF_MATCHER_IN in their ops, and BF_MATCHER_META_DPORT only defines EQ, NE, and RANGE. This case is dead code. Consider removing it or adding a comment explaining it is kept for future-proofing.

@@ -204,6 +208,98 @@ static int _bf_cgroup_sock_addr_generate_port(struct bf_program *program,
bf_matcher_payload(matcher), 2, BPF_REG_1);
}

Claude: nit: _bf_cgroup_sock_addr_ctx_offset is a non-trivial function mapping matcher types to bpf_sock_addr field offsets, with a sentinel return convention ((size_t)-1 for unsupported types). A brief Doxygen comment documenting the purpose, the return convention, and the handled matcher types would help maintainability. Other comparable helpers in this file (e.g., _bf_cgroup_sock_addr_generate_port) have documentation.

}
assert(program);
assert(matcher);


Claude: nit: (uint32_t)(addr_size * 8) is safe only if addr_size is at most ~536 million, but the real constraint is addr_size <= 16 (IPv6). This invariant is implicit. A brief inline comment or assert(addr_size <= 16) would document the assumption and prevent future misuse if bf_set_generate_trie_lookup is called with unexpected sizes.
