Skip to content

[TAM]: Add telemetry stats APIs and maximal polling interval for stream telemetry#2214

Open
Pterosaur wants to merge 1 commit intoopencomputeproject:masterfrom
Pterosaur:enhance_stream_telemetry
Open

[TAM]: Add telemetry stats APIs and maximal polling interval for stream telemetry#2214
Pterosaur wants to merge 1 commit intoopencomputeproject:masterfrom
Pterosaur:enhance_stream_telemetry

Conversation

@Pterosaur
Copy link
Collaborator

@Pterosaur Pterosaur commented Oct 5, 2025

Description

Enhance stream telemetry (Phase 2) with the following changes:

1. TAM telemetry stats APIs (inc/saitam.h)

Add stats APIs for TAM telemetry objects for debugging purposes:

  • sai_get_tam_telemetry_stats_fn — get telemetry stats (deprecated, for backward compatibility)
  • sai_get_tam_telemetry_stats_ext_fn — get telemetry stats extended
  • sai_clear_tam_telemetry_stats_fn — clear telemetry stats

New stat counters (sai_tam_telemetry_stat_t):

  • SAI_TAM_TELEMETRY_STAT_INGESTED_RECORDS — total records ingested
  • SAI_TAM_TELEMETRY_STAT_PENDING_READ_RECORDS — records pending processing (gauge)
  • SAI_TAM_TELEMETRY_STAT_CONSUMED_RECORDS — total records consumed
  • SAI_TAM_TELEMETRY_STAT_DROPPED_RECORDS — total records dropped

2. Maximal polling interval (inc/saitypes.h)

Add maximal_polling_interval field to sai_stat_st_capability_t. The collector handles single-cycle counter rollovers, but vendors must ensure data does not roll over twice between two collection intervals. This field allows vendors to specify the upper bound.

Note: Adding a new member to sai_stat_st_capability_t is an ABI-breaking change that causes metadata CI to fail. See PR comments for discussion and proposed workaround.

@Pterosaur Pterosaur force-pushed the enhance_stream_telemetry branch from cab7604 to 0e2f3a0 Compare October 6, 2025 11:14
@opencomputeproject opencomputeproject deleted a comment from azure-pipelines bot Oct 6, 2025
@opencomputeproject opencomputeproject deleted a comment from azure-pipelines bot Oct 6, 2025
@Pterosaur Pterosaur force-pushed the enhance_stream_telemetry branch from 0e2f3a0 to ee630ed Compare October 7, 2025 23:20
@kcudnik
Copy link
Collaborator

kcudnik commented Feb 5, 2026

it will not build, you are breaking backward compatybility with structure item shift

@Pterosaur
Copy link
Collaborator Author

it will not build, you are breaking backward compatybility with structure item shift

Hi @kcudnik , long time no talk. Glad to have your eyes on this.

I’m aware that this PR breaks backward compatibility. The root cause is that the original design didn’t provide APIs for querying SAI statistics of TAM objects.

I attempted to add the new APIs at the end of _sai_tam_api_t to preserve backward compatibility. However, the PR checker rejects it because it requires functions of the same type to be placed consecutively.

Do you have any suggestion?

@kcudnik
Copy link
Collaborator

kcudnik commented Feb 12, 2026

for apis you can add then at the end of api struct, but for extended struct you would need to break compatybility and add exception

@Pterosaur Pterosaur force-pushed the enhance_stream_telemetry branch 2 times, most recently from fd6659b to eda591e Compare March 9, 2026 04:51
@opencomputeproject opencomputeproject deleted a comment from azure-pipelines bot Mar 9, 2026
@opencomputeproject opencomputeproject deleted a comment from azure-pipelines bot Mar 9, 2026
@opencomputeproject opencomputeproject deleted a comment from azure-pipelines bot Mar 9, 2026
@opencomputeproject opencomputeproject deleted a comment from azure-pipelines bot Mar 9, 2026
@opencomputeproject opencomputeproject deleted a comment from azure-pipelines bot Mar 9, 2026
@Pterosaur Pterosaur force-pushed the enhance_stream_telemetry branch from eda591e to f949373 Compare March 9, 2026 05:27
@Pterosaur
Copy link
Collaborator Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@Pterosaur Pterosaur force-pushed the enhance_stream_telemetry branch from f949373 to 90bf1aa Compare March 9, 2026 05:52
@Pterosaur
Copy link
Collaborator Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@opencomputeproject opencomputeproject deleted a comment from azure-pipelines bot Mar 9, 2026
@Pterosaur Pterosaur force-pushed the enhance_stream_telemetry branch from 90bf1aa to a542cd5 Compare March 9, 2026 06:18
@Pterosaur
Copy link
Collaborator Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@Pterosaur Pterosaur marked this pull request as ready for review March 9, 2026 06:32
@Pterosaur Pterosaur requested a review from tjchadaga March 9, 2026 07:07
Signed-off-by: Ze Gan <ganze718@gmail.com>
@Pterosaur Pterosaur force-pushed the enhance_stream_telemetry branch from a542cd5 to c160b59 Compare March 9, 2026 07:16
@Pterosaur
Copy link
Collaborator Author

Pterosaur commented Mar 9, 2026

@tjchadaga This PR adds a new field maximal_polling_interval to sai_stat_st_capability_t, which changes the struct member count from 2 to 3. This causes the metadata CI checks (meta/structs.pl struct history validation and meta/test.pm size assertion) to fail.

We could work around this by adding sai_stat_st_capability_t to the extensible struct allowlist in both files. Here's the patch that would make CI pass:

meta/structs.pl
--- a/meta/structs.pl
+++ b/meta/structs.pl
@@ -164,18 +164,27 @@ sub BuildCommitHistory
         # of union may not increase by adding members, and actual union size
         # check is performed by sai sanity check
 
+        # Structs that are allowed to append new members at the end (ABI extension).
+        # api_t structs are always allowed. Add other structs here as needed.
+        my @extensible_structs = (
+            "sai_switch_health_data_t",
+            "sai_port_oper_status_notification_t",
+            "sai_stat_st_capability_t",
+        );
+
+        my %extensible = map { $_ => 1 } @extensible_structs;
+
         if ($currCount != $histCount and not $structTypeName =~ /^sai_\w+_api_t$/
-                and $structTypeName ne "sai_switch_health_data_t"
-                and $structTypeName ne "sai_port_oper_status_notification_t")
+                and not $extensible{$structTypeName})
         {
             LogError "FATAL: struct $structTypeName member count differs, was $histCount but is $currCount on commit $commit" if $type eq "struct";
         }
 
         if ($histCount > $currCount)
         {
-            if ($structTypeName eq "sai_port_oper_status_notification_t")
+            if ($extensible{$structTypeName})
             {
-                # we allow this to change back backward compatibility
+                # we allow extensible structs to change for backward compatibility
             }
             else
             {
meta/test.pm
--- a/meta/test.pm
+++ b/meta/test.pm
@@ -646,7 +646,14 @@ sub CreateStructUnionSizeCheckTest
             $STRUCTS{$name} = $name;
 
             next if $name =~ /^sai_\w+_api_t$/; # skip api structs
-            next if $name eq "sai_switch_health_data_t";
+
+            # Skip extensible structs that are allowed to grow (see also structs.pl)
+            my %extensible_structs = map { $_ => 1 } (
+                "sai_switch_health_data_t",
+                "sai_port_oper_status_notification_t",
+                "sai_stat_st_capability_t",
+            );
+            next if $extensible_structs{$name};
 
             my $upname = uc($name);

However, I don't think sai_stat_st_capability_t warrants a special exception in the allowlist — it's a straightforward data struct, not something that needs an extensibility mechanism. A force merge with the ABI break seems more appropriate. Could you advise on the preferred approach?

@Pterosaur
Copy link
Collaborator Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@Pterosaur Pterosaur changed the title [TAM]: Enhance stream telemetry [TAM]: Add telemetry stats APIs and maximal polling interval for stream telemetry Mar 9, 2026
@Pterosaur
Copy link
Collaborator Author

@tjchadaga I've also created a draft PR #2260 that includes the same functional changes plus the meta/structs.pl and meta/test.pm modifications to make CI pass. If you think adding sai_stat_st_capability_t to the extensible struct allowlist is acceptable, we can apply those changes from the draft PR. Otherwise, a force merge on this PR would be the alternative.

@kcudnik
Copy link
Collaborator

kcudnik commented Mar 9, 2026

just to inform, that breaking compatibility can cause some errors or read uninitialized memory, potentially a crash, if in notification struct is differetnt between what syncd expects and what library support, especially if there is a array of structs the alignement will fail, fields will shift, and garbage data will be read

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants