[multicast] DDM multicast exchange: V4 protocol, MRIB sync #696
zeeshanlakhani wants to merge 4 commits into zl/mrib
Conversation
Adds multicast group subscription distribution to the DDM exchange protocol with a V4 version bump (frozen V3 types for wire compat).

Key changes:
- V4 exchange protocol with multicast support (V3 peers are unaffected)
- UnderlayMulticastIpv6 validated newtype (ff04::/64) moved from rdb types to mg-common
- MRIB->DDM sync in mg-lower/mrib.rs
- OPTE M2P hooks for learned multicast routes (requires OPTE #924)
- Atomic update_imported_mcast on Db (a single lock covers import/delete/diff, which differs slightly from the tunnel work)
- Collapsed send_update dispatch
- Shared pull-handler helpers (collect_underlay_tunnel, collect_multicast)
- MulticastPathHop constructor
- Serde round-trip and validation tests, including for version handling

Stacked on zl/mrib (MRIB: Multicast RIB implementation #675).
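The validated-newtype change mentioned above can be sketched as follows. This is an illustrative example, not the PR's exact API: the type and error names mirror the description, but the constructor shape and field access are assumptions.

```rust
use std::net::Ipv6Addr;

/// Hypothetical sketch of a validated newtype like the PR's
/// `UnderlayMulticastIpv6`: only addresses within ff04::/64 are accepted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UnderlayMulticastIpv6(Ipv6Addr);

/// Typed error in the spirit of the PR's `UnderlayMulticastError`
/// (the variant name here is an assumption).
#[derive(Debug, PartialEq, Eq)]
pub enum UnderlayMulticastError {
    NotInUnderlayPrefix(Ipv6Addr),
}

impl UnderlayMulticastIpv6 {
    pub fn new(addr: Ipv6Addr) -> Result<Self, UnderlayMulticastError> {
        // ff04::/64 means the first four 16-bit segments must be
        // ff04:0000:0000:0000.
        let seg = addr.segments();
        if seg[0] == 0xff04 && seg[1] == 0 && seg[2] == 0 && seg[3] == 0 {
            Ok(Self(addr))
        } else {
            Err(UnderlayMulticastError::NotInUnderlayPrefix(addr))
        }
    }

    pub fn addr(&self) -> Ipv6Addr {
        self.0
    }
}
```

A validating constructor like this makes the "only underlay multicast addresses" invariant unrepresentable to violate, so downstream code (exchange protocol, MRIB sync) never re-checks the prefix.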
```rust
    return;
}
let resp =
    rt.block_on(async { client.advertise_multicast_groups(&routes).await });
```
Is the dendrite side of the synchronization coming later?
```rust
impl PartialEq for MulticastOrigin {
    fn eq(&self, other: &Self) -> bool {
        self.overlay_group == other.overlay_group
            && self.underlay_group == other.underlay_group
            && self.vni == other.vni
            && self.source == other.source
    }
}
```
I may be overlooking it, but where is this used?
I'm just trying to think about the implications this impl will have, depending on where/how it's used. (I'm a little paranoid after #650)
Used in HashSet comparisons, but that use is now commented out.
My read on this is that the PartialEq impl is effectively defining what the "identity" of a MulticastOrigin is, based on which members we compare. That way we could do a replace() or something if the same MulticastOrigin (one for which PartialEq returns true) is updated. Assuming I'm tracking properly, I just want to double-check which members we're comparing. Is the VNI an attribute of the path or part of the identity? Put another way, should (Group X, VNI X) and (Group X, VNI Y) be considered the same MulticastOrigin or two unique ones?
I think for some future proofing, we want them to be considered different, as different VNIs mean different network contexts, even if the overlay group address is the same. Even though we're going with the arbitrary VNI setup right now, this keeps us from having logic bugs if we carry multiple VNIs in the future.
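The "different VNIs are different origins" decision above can be demonstrated with a small sketch. The struct is an illustrative stand-in for MulticastOrigin (the field set and types are assumptions); because vni participates in equality and hashing, the same overlay group under two VNIs yields two distinct set entries.

```rust
use std::collections::HashSet;
use std::net::Ipv6Addr;

/// Illustrative stand-in for `MulticastOrigin`, reduced to the two
/// identity members under discussion.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct MulticastOrigin {
    overlay_group: Ipv6Addr,
    vni: u32,
}

/// Count unique origins under the identity defined by Eq/Hash.
fn distinct_origins(origins: &[MulticastOrigin]) -> usize {
    origins.iter().cloned().collect::<HashSet<_>>().len()
}
```

With this identity, (Group X, VNI X) and (Group X, VNI Y) occupy separate entries, so carrying multiple VNIs later cannot silently collapse two network contexts into one.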
I left my initial round of review comments, which are mostly nits. The one place where I noticed what looks to be a real issue is that we don't clean up all the OPTE multicast state.

At this point, the main questions I have are around the intended operation of DDM as a component of the overall multicast architecture. Until I have a better understanding of the architecture we plan to move forward with, I'm not sure how to give good feedback as to how well these updates to DDM align with it. I started having deeper questions when I started seeing
This addresses review feedback on the DDM multicast exchange PR (#696).

Additionally, toward the goal of Omicron owning all OPTE M2P (multicast-to-physical) mappings via the sled-agent, we remove direct OPTE M2P writes from DDM. This was an oversight on my part from following similar patterns to tunnel routing. The M2P table is global to xde, so having both DDM and Omicron's reconciler write to it creates a conflict risk where Nexus could reap DDM-written entries as stale. Our original intention has always been for Omicron to drive most things (-> Dpd/Dendrite, -> OPTE) due to the implicit and dynamic lifecycle of multicast groups. The work here is for DDM to distribute multicast membership and expose learned state via its admin API for Omicron to consume (and rely on for port knowledge); this is currently a TODO in Omicron that will be removed once this work is plumbed up.

Other changes:
- Replace String errors with UnderlayMulticastError in mg-common
- Use the route key's VNI instead of the hard-coded DEFAULT_MULTICAST_VNI
- Sort ddmadm multicast output by overlay group for deterministic display
- Consolidate oxide_vpc imports in sys.rs
- Derive Eq on MulticastOrigin; document the PartialEq/Hash exclusion of metric with a #649 reference
- Downgrade the multicast withdrawal success log to debug
- Fix MRIB diagram indentation
- Remove the "old" zl/mrib rdb-types UnderlayMulticastIpv6 in favor of the mg-common one
- Remove the unnecessary DEFAULT_MULTICAST_VNI constant
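The "PartialEq/Hash exclusion of metric" item can be sketched as below. The struct shape is an assumption; the point is the invariant that must hold for HashSet/HashMap correctness: a == b must imply hash(a) == hash(b), so Hash must skip exactly the fields Eq skips.

```rust
use std::hash::{Hash, Hasher};
use std::net::Ipv6Addr;

/// Illustrative sketch, not DDM's exact type: `metric` is a path
/// attribute, not identity, so it is excluded from both Eq and Hash.
#[derive(Debug, Clone, Eq)]
struct MulticastOrigin {
    overlay_group: Ipv6Addr,
    vni: u32,
    metric: u64, // may change without the origin becoming a new entry
}

impl PartialEq for MulticastOrigin {
    fn eq(&self, other: &Self) -> bool {
        // `metric` intentionally excluded from identity.
        self.overlay_group == other.overlay_group && self.vni == other.vni
    }
}

impl Hash for MulticastOrigin {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Must hash exactly the fields compared by Eq.
        self.overlay_group.hash(state);
        self.vni.hash(state);
    }
}
```

With this pairing, re-announcing the same origin with a new metric hits the same HashSet slot instead of leaking a duplicate entry.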
Add an optional `if_name` field to PeerInfo (v2) so Omicron can learn which switch port a DDM peer was discovered on. This enables Omicron to use DDM as the primary source of truth for sled-to-port mapping, replacing or cross-validating the current inventory-based approach. The get_peers endpoint is versioned: v2+ returns PeerInfo with if_name, v1 returns the original PeerInfo without it.
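A minimal sketch of the version gating described above, with assumed names (the real PeerInfo carries more fields and the handler lives behind the versioned get_peers endpoint): v1 callers get PeerInfo without if_name populated, v2+ callers get it filled in.

```rust
use std::net::Ipv6Addr;

/// Illustrative PeerInfo; the real type has additional fields.
#[derive(Debug, Clone, PartialEq)]
struct PeerInfo {
    addr: Ipv6Addr,
    /// Switch port the peer was discovered on; absent for v1 clients.
    if_name: Option<String>,
}

/// Build a response appropriate to the requested API version.
fn peer_info_for_version(version: u32, addr: Ipv6Addr, if_name: &str) -> PeerInfo {
    PeerInfo {
        addr,
        // v1 predates `if_name`, so it is stripped for old clients;
        // an optional field keeps the v1 wire shape unchanged.
        if_name: (version >= 2).then(|| if_name.to_string()),
    }
}
```

Making the new field optional is what keeps V1 consumers working: they never see a key they don't know about, while V2 consumers can rely on it for sled-to-port mapping.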
Thanks @taspelund.
Actually, this was removed, as we didn't need it (more in the next response).
So, per RFD 488's API-driven approach, Omicron is entirely responsible for programming the OPTE MRIB (functionally, M2P mappings) and the switch/DPD state (multicast group table state). I'm working on clearing up many bits of this RFD as we've been working through the end-to-end implementation.

So, Omicron does the work here. Because multicast groups can be joined dynamically (and implicitly) at any time, this approach is preferred over our two-fold approach for unicast (w.r.t. Dendrite/DPD). DDM's job is distributing multicast group membership across the underlay, not programming the data plane. I was following bits of the tunnel routing implementation and mistakenly included OPTE M2P writes (tunnel routing writes to the V2B); I've now removed the direct OPTE M2P writes from DDM. The data flow is now (and should be):
```rust
// TODO: Use DDM as the primary source of truth for sled→port
// mapping, with inventory as cross-validation.
//
// Currently we trust inventory (MGS/SP topology) for sled→port
// mapping. DDM (maghemite/ddmd) on switches has authoritative
// knowledge of which sleds are reachable on which ports.
//
// Future approach:
// 1. Query DDM for operational sled→port mapping
//    // TODO: Add GET /peers endpoint to ddm-admin-client (`get_peers` already here)
//    // returning Map<peer_addr, PeerInfo> where PeerInfo
//    // includes port/interface field ---> I had this about ready & it's up now
// 2. Use DDM mapping as primary source for multicast routing
// 3. Cross-validate against inventory to detect mismatches
// 4. On mismatch: invalidate cache, log warning, potentially
//    trigger inventory reconciliation
//
// This catches cases where inventory is stale or a sled moved
// but inventory hasn't updated yet.
```

DDM distributes and does peer exchange. Omicron programs the data plane. The Omicron side of consuming DDM's learned multicast state is the next piece of work.

On the MRIB entry side, static routes flow in via PUT /static/mroute and DELETE /static/mroute on the mg-admin API. The future mcastd IGMP host-proxy will provide the dynamic route path handling.

Regarding peer expiry, the DB cleanup and withdrawal redistribution to peers now correctly handles multicast routes on peer expiry.

Regarding RPF, we operate at two layers. At the DDM CP level, path-vector loop detection serves as CP RPF: MulticastPathHop carries the router ID and underlay address, and announcements where our hostname already appears in the path are dropped.
Adds multicast group subscription distribution to the DDM exchange protocol with a V4 version bump.
Key changes:
References
Stacked on zl/mrib (MRIB: Multicast RIB implementation #675) and depends on oxidecomputer/opte#924.