In #359, `local_pref` was renamed to `rib_priority` and changed from an `Option<u32>` to a `u8`. This has caused a misalignment with Omicron, where route priorities are optional.
When Nexus has a route with no specified priority to reconcile to maghemite, the non-optional `rib_priority` forces Nexus to choose a sentinel value when sending the route. On the next reconciler run, Nexus sees that the route in its database has no priority, but the route it sees from maghemite has one, so it deletes the route and recreates it. We see this in the following logs.
Deletes:
21:53:39.577Z INFO 1a90c686-13b4-4d8f-8ecb-7f6bd4a040cf (ServerContext): deleting static routes
background_task = switch_port_config_manager
file = nexus/src/app/background/tasks/sync_switch_configuration.rs:457
rack_id = 73cdd12c-a64a-43cc-a5c0-12c3add70029
routes = {Switch1: DeleteStaticRouteRequest { v4: DeleteStaticRoute4Request { routes: StaticRoute4List { list: [StaticRoute4 { nexthop: 172.20.15.65, prefix: Prefix4 { value: 0.0.0.0, length: 0 }, rib_priority: 1, vlan_id: None }] } }, v6: DeleteStaticRoute6Request { routes: StaticRoute6List { list: [] } } }, Switch0: DeleteStaticRouteRequest { v4: DeleteStaticRoute4Request { routes: StaticRoute4List { list: [StaticRoute4 { nexthop: 172.20.15.65, prefix: Prefix4 { value: 0.0.0.0, length: 0 }, rib_priority: 1, vlan_id: None }] } }, v6: DeleteStaticRoute6Request { routes: StaticRoute6List { list: [] } } }}
21:53:39.577Z INFO 1a90c686-13b4-4d8f-8ecb-7f6bd4a040cf (ServerContext): removing static routes
background_task = switch_port_config_manager
file = nexus/src/app/background/tasks/sync_switch_configuration.rs:2343
rack_id = 73cdd12c-a64a-43cc-a5c0-12c3add70029
request = DeleteStaticRouteRequest { v4: DeleteStaticRoute4Request { routes: StaticRoute4List { list: [StaticRoute4 { nexthop: 172.20.15.65, prefix: Prefix4 { value: 0.0.0.0, length: 0 }, rib_priority: 1, vlan_id: None }] } }, v6: DeleteStaticRoute6Request { routes: StaticRoute6List { list: [] } } }
switch_location = Switch1
21:53:39.577Z INFO 1a90c686-13b4-4d8f-8ecb-7f6bd4a040cf (ServerContext): removing static routes
background_task = switch_port_config_manager
file = nexus/src/app/background/tasks/sync_switch_configuration.rs:2343
rack_id = 73cdd12c-a64a-43cc-a5c0-12c3add70029
request = DeleteStaticRouteRequest { v4: DeleteStaticRoute4Request { routes: StaticRoute4List { list: [StaticRoute4 { nexthop: 172.20.15.65, prefix: Prefix4 { value: 0.0.0.0, length: 0 }, rib_priority: 1, vlan_id: None }] } }, v6: DeleteStaticRoute6Request { routes: StaticRoute6List { list: [] } } }
switch_location = Switch0
Recreates:
21:53:39.578Z INFO 1a90c686-13b4-4d8f-8ecb-7f6bd4a040cf (ServerContext): adding static routes
background_task = switch_port_config_manager
file = nexus/src/app/background/tasks/sync_switch_configuration.rs:463
rack_id = 73cdd12c-a64a-43cc-a5c0-12c3add70029
routes = {Switch0: AddStaticRouteRequest { v4: AddStaticRoute4Request { routes: StaticRoute4List { list: [StaticRoute4 { nexthop: 172.20.15.65, prefix: Prefix4 { value: 0.0.0.0, length: 0 }, rib_priority: 1, vlan_id: None }] } }, v6: AddStaticRoute6Request { routes: StaticRoute6List { list: [] } } }, Switch1: AddStaticRouteRequest { v4: AddStaticRoute4Request { routes: StaticRoute4List { list: [StaticRoute4 { nexthop: 172.20.15.65, prefix: Prefix4 { value: 0.0.0.0, length: 0 }, rib_priority: 1, vlan_id: None }] } }, v6: AddStaticRoute6Request { routes: StaticRoute6List { list: [] } } }}
21:53:39.578Z INFO 1a90c686-13b4-4d8f-8ecb-7f6bd4a040cf (ServerContext): adding static routes
background_task = switch_port_config_manager
file = nexus/src/app/background/tasks/sync_switch_configuration.rs:2388
rack_id = 73cdd12c-a64a-43cc-a5c0-12c3add70029
request = AddStaticRouteRequest { v4: AddStaticRoute4Request { routes: StaticRoute4List { list: [StaticRoute4 { nexthop: 172.20.15.65, prefix: Prefix4 { value: 0.0.0.0, length: 0 }, rib_priority: 1, vlan_id: None }] } }, v6: AddStaticRoute6Request { routes: StaticRoute6List { list: [] } } }
switch_location = Switch0
21:53:39.579Z INFO 1a90c686-13b4-4d8f-8ecb-7f6bd4a040cf (ServerContext): adding static routes
background_task = switch_port_config_manager
file = nexus/src/app/background/tasks/sync_switch_configuration.rs:2388
rack_id = 73cdd12c-a64a-43cc-a5c0-12c3add70029
request = AddStaticRouteRequest { v4: AddStaticRoute4Request { routes: StaticRoute4List { list: [StaticRoute4 { nexthop: 172.20.15.65, prefix: Prefix4 { value: 0.0.0.0, length: 0 }, rib_priority: 1, vlan_id: None }] } }, v6: AddStaticRoute6Request { routes: StaticRoute6List { list: [] } } }
switch_location = Switch1
The only difference between the existing and desired routes is the priority:
21:53:39.577Z INFO 1a90c686-13b4-4d8f-8ecb-7f6bd4a040cf (ServerContext): retrieved existing routes
background_task = switch_port_config_manager
file = nexus/src/app/background/tasks/sync_switch_configuration.rs:433
rack_id = 73cdd12c-a64a-43cc-a5c0-12c3add70029
routes = {Switch1: {V4(SwitchStaticRouteV4 { nexthop: 172.20.15.65, prefix: Prefix4 { value: 0.0.0.0, length: 0 }, vlan: None, priority: Some(1) })}, Switch0: {V4(SwitchStaticRouteV4 { nexthop: 172.20.15.65, prefix: Prefix4 { value: 0.0.0.0, length: 0 }, vlan: None, priority: Some(1) })}}
21:53:39.577Z INFO 1a90c686-13b4-4d8f-8ecb-7f6bd4a040cf (ServerContext): retrieved desired routes
background_task = switch_port_config_manager
file = nexus/src/app/background/tasks/sync_switch_configuration.rs:437
rack_id = 73cdd12c-a64a-43cc-a5c0-12c3add70029
routes = {Switch1: {V4(SwitchStaticRouteV4 { nexthop: 172.20.15.65, prefix: Prefix4 { value: 0.0.0.0, length: 0 }, vlan: None, priority: None })}, Switch0: {V4(SwitchStaticRouteV4 { nexthop: 172.20.15.65, prefix: Prefix4 { value: 0.0.0.0, length: 0 }, vlan: None, priority: None })}}
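The churn seen in these logs can be reproduced with a minimal sketch. It assumes the reconciler compares existing and desired routes by plain equality and computes set differences. The `Route` type, the `SENTINEL_PRIORITY` constant, and both functions are hypothetical stand-ins for illustration, not the actual Nexus code.

```rust
use std::collections::HashSet;
use std::net::Ipv4Addr;

/// Simplified, hypothetical stand-in for the route shape seen in the logs.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Route {
    nexthop: Ipv4Addr,
    prefix: (Ipv4Addr, u8),
    priority: Option<u8>,
}

/// Assumed sentinel used when the database row has no priority.
const SENTINEL_PRIORITY: u8 = 1;

/// Models a route passing through a non-optional `u8` priority field:
/// an unset priority comes back as `Some(sentinel)`.
fn round_trip_through_u8(route: &Route) -> Route {
    let rib_priority: u8 = route.priority.unwrap_or(SENTINEL_PRIORITY);
    Route {
        priority: Some(rib_priority),
        ..route.clone()
    }
}

/// Equality-based reconciliation: returns `(to_delete, to_add)`.
fn reconcile(
    existing: &HashSet<Route>,
    desired: &HashSet<Route>,
) -> (Vec<Route>, Vec<Route>) {
    let to_delete = existing.difference(desired).cloned().collect();
    let to_add = desired.difference(existing).cloned().collect();
    (to_delete, to_add)
}
```

With a desired route whose priority is `None`, the route reported back carries `Some(1)`, so `reconcile` returns one route to delete and one to add on every run, even though nothing has changed.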
While this could be fixed in Nexus by teaching the reconciler to treat the sentinel value as equal to an unset priority, that seems quite fraught. We should have maghemite line up with Omicron by making route priorities optional.
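As a rough sketch of the proposed direction, the types below model `rib_priority` as an `Option<u8>`, so an unset priority survives the round trip unchanged. The structs are simplified and hypothetical: the field names mirror the logs, but the `u16` VLAN type and the `From` conversions are assumptions, not the real maghemite or Nexus definitions.

```rust
use std::net::Ipv4Addr;

/// Simplified, hypothetical prefix type mirroring the logged shape.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Prefix4 {
    value: Ipv4Addr,
    length: u8,
}

/// Hypothetical maghemite-side route with an optional priority.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct StaticRoute4 {
    nexthop: Ipv4Addr,
    prefix: Prefix4,
    rib_priority: Option<u8>, // assumption: optional instead of a bare u8
    vlan_id: Option<u16>,
}

/// Hypothetical Nexus-side route, as compared by the reconciler.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct SwitchStaticRouteV4 {
    nexthop: Ipv4Addr,
    prefix: Prefix4,
    vlan: Option<u16>,
    priority: Option<u8>,
}

impl From<&SwitchStaticRouteV4> for StaticRoute4 {
    fn from(r: &SwitchStaticRouteV4) -> Self {
        // No sentinel needed: `None` is passed through as-is.
        StaticRoute4 {
            nexthop: r.nexthop,
            prefix: r.prefix,
            rib_priority: r.priority,
            vlan_id: r.vlan,
        }
    }
}

impl From<&StaticRoute4> for SwitchStaticRouteV4 {
    fn from(r: &StaticRoute4) -> Self {
        SwitchStaticRouteV4 {
            nexthop: r.nexthop,
            prefix: r.prefix,
            vlan: r.vlan_id,
            priority: r.rib_priority,
        }
    }
}
```

Because both sides carry an `Option`, converting a route to the maghemite shape and back yields a value equal to the original, so an equality-based reconciler would find nothing to delete or recreate.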