55 changes: 55 additions & 0 deletions README.md
@@ -11,6 +11,7 @@ Currently supported collectors:
- [VLAN collector](internal/collector/vlan_collector.go): collects VLAN and VLAN member state from SONiC Redis.
- [LAG collector](internal/collector/lag_collector.go): collects PortChannel and member state from SONiC Redis.
- [FDB collector](internal/collector/fdb_collector.go): collects FDB summary metrics from SONiC ASIC DB.
- [System collector](internal/collector/system_collector.go): experimental collector for switch identity, software metadata, and uptime using read-only sources (disabled by default).

# Usage

@@ -52,6 +53,55 @@ Environment variables:
- `FDB_MAX_ENTRIES` - maximum number of ASIC FDB entries processed per refresh. Default: `50000`.
- `FDB_MAX_PORTS` - maximum number of per-port FDB series exported. Default: `1024`.
- `FDB_MAX_VLANS` - maximum number of per-VLAN FDB series exported. Default: `4096`.
- `SYSTEM_ENABLED` - enable system metadata collector (experimental). Default: `false`.
- `SYSTEM_REFRESH_INTERVAL` - system metadata cache refresh interval. Default: `60s`.
- `SYSTEM_TIMEOUT` - timeout for one system metadata refresh cycle. Default: `4s`.
- `SYSTEM_COMMAND_ENABLED` - allow read-only command fallbacks (`show platform summary --json`, `show version`, `show platform syseeprom`). Default: `true`.
- `SYSTEM_COMMAND_TIMEOUT` - timeout for one allowed command execution. Default: `2s`.
- `SYSTEM_COMMAND_MAX_OUTPUT_BYTES` - maximum number of bytes read from one command's output. Default: `262144` (256 KiB).
- `SYSTEM_VERSION_FILE` - path to SONiC version metadata file. Default: `/etc/sonic/sonic_version.yml`.
- `SYSTEM_MACHINE_CONF_FILE` - path to machine config file. Default: `/host/machine.conf`.
- `SYSTEM_HOSTNAME_FILE` - path to hostname file. Default: `/etc/hostname`.
- `SYSTEM_UPTIME_FILE` - path to uptime file. Default: `/proc/uptime`.

## System Collector (Experimental)

The `system_collector` is currently experimental and is disabled by default for stability.

To enable it:
```bash
$ SYSTEM_ENABLED=true ./sonic-exporter
```

What this collector exports:

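- `sonic_system_identity_info` - info metric (value is always `1`) with labels hostname, platform, hwsku, asic, asic_count, serial, model, revision.
- `sonic_system_software_info` - info metric (value is always `1`) with labels for SONiC version, SONiC OS version, Debian version, kernel version, and build metadata (commit, date, builder, branch, release).
- `sonic_system_uptime_seconds` - switch uptime in seconds.
- `sonic_system_collector_success`, `sonic_system_scrape_duration_seconds`, `sonic_system_cache_age_seconds` - collector status: whether the collector succeeded, scrape duration, and age of the cached metadata.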
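
The uptime value lines up with the first field of `/proc/uptime`: the test fixture `system_uptime` in this PR contains `12345.00 11111.00`, and the test expects `sonic_system_uptime_seconds 12345`. A minimal sketch of that parse, assuming the collector reads the file named by `SYSTEM_UPTIME_FILE`; `parseUptime` and `readUptime` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseUptime returns the first field of /proc/uptime-style content
// (seconds since boot). The second field, idle time summed across CPUs,
// is ignored.
func parseUptime(content string) (float64, error) {
	fields := strings.Fields(content)
	if len(fields) == 0 {
		return 0, fmt.Errorf("empty uptime content")
	}
	return strconv.ParseFloat(fields[0], 64)
}

// readUptime reads and parses an uptime file, e.g. the SYSTEM_UPTIME_FILE path.
func readUptime(path string) (float64, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return parseUptime(string(data))
}

func main() {
	up, err := parseUptime("12345.00 11111.00\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(up) // prints 12345
}
```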

Data sources and fallback order:

1. Redis first (`DEVICE_METADATA|localhost`, `CHASSIS_INFO|chassis 1`)
2. Local read-only files (`/etc/sonic/sonic_version.yml`, `/host/machine.conf`, `/etc/hostname`, `/proc/uptime`)
3. Optional read-only command fallback (if `SYSTEM_COMMAND_ENABLED=true`):
   - `show platform summary --json`
   - `show version`
   - `show platform syseeprom`
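
The fallback order can be read as "the first non-empty value wins, otherwise `unknown`". The test fixtures in this PR are consistent with that: `system_machine.conf` contains `onie_platform=x86_64-dellemc_s5232f_c3538-r0`, yet the test expects `platform="x86_64-vendor_switch-r0"`, the value from `DEVICE_METADATA|localhost` in Redis. A minimal sketch of that resolution step plus a `key=value` parser for `machine.conf`; `source`, `resolve`, `parseMachineConf`, and the `onie_platform` to `platform` mapping are assumptions for illustration, not the collector's actual code.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// source is one candidate provider of metadata fields (hypothetical type).
type source struct {
	name   string
	fields map[string]string
}

// parseMachineConf reads key=value lines, skipping blanks, comments, and
// lines without "=". Only the first "=" splits key from value.
func parseMachineConf(content string) map[string]string {
	out := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		out[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return out
}

// resolve returns the first non-empty value for field across sources in
// priority order, plus the name of the source that provided it. A field
// found nowhere becomes "unknown" instead of failing the scrape.
func resolve(field string, sources []source) (string, string) {
	for _, s := range sources {
		if v := strings.TrimSpace(s.fields[field]); v != "" {
			return v, s.name
		}
	}
	return "unknown", "none"
}

func main() {
	machine := parseMachineConf("onie_platform=x86_64-dellemc_s5232f_c3538-r0\n")
	sources := []source{
		{name: "redis", fields: map[string]string{"platform": "x86_64-vendor_switch-r0"}},
		// Assumed mapping: machine.conf's onie_platform feeds the platform label.
		{name: "machine.conf", fields: map[string]string{"platform": machine["onie_platform"]}},
	}
	for _, f := range []string{"platform", "serial"} {
		v, from := resolve(f, sources)
		fmt.Printf("%s=%s (source: %s)\n", f, v, from)
	}
	// prints:
	// platform=x86_64-vendor_switch-r0 (source: redis)
	// serial=unknown (source: none)
}
```

Returning the source name alongside the value is what makes the debug-mode logging described below cheap: the caller already knows which source won.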

Read-only and safety behavior:

- No Redis writes and no file writes.
- Command execution is allowlisted.
- Command timeout and output size limits are enforced.
- Missing fields are reported as `unknown` instead of failing the scrape.
- Metadata is cached and refreshed every `SYSTEM_REFRESH_INTERVAL`, so `/metrics` stays responsive.

Debug mode behavior (`--log.level=debug`):

- Logs when fields are missing but expected.
- Logs which data source populated each field.
- Logs when fallback sources are skipped because a higher-priority source already set the field.

## Validated Platforms

@@ -109,6 +159,11 @@ These examples are synthetic and anonymized. Use them as query patterns. Labels
- `sonic_fdb_entries_by_port{port="Ethernet88"} 214`
- Query: `topk(10, sonic_fdb_entries_by_port)`

- **System collector (experimental, when enabled)** - switch identity and software metadata
- `sonic_system_identity_info{hostname="switch01.example.net",platform="x86_64-vendor_switch-r0",hwsku="Example-SKU",asic="broadcom",asic_count="1",serial="ABC123456",model="Model-X",revision="A01"} 1`
- `sonic_system_software_info{sonic_version="SONiC.202012.example",debian_version="10.13",kernel_version="4.19.0-12-2-amd64",build_commit="193959ba2"} 1`
- Query: `sonic_system_uptime_seconds`

- **node_exporter collectors** - host CPU, memory, filesystem
- `node_cpu_seconds_total{cpu="0",mode="idle"} 5.93e+06`
- `node_memory_MemAvailable_bytes 1.24e+10`
4 changes: 4 additions & 0 deletions cmd/sonic-exporter/main.go
@@ -60,6 +60,7 @@ func main() {
	vlanCollector := collector.NewVlanCollector(logger)
	lagCollector := collector.NewLagCollector(logger)
	fdbCollector := collector.NewFdbCollector(logger)
	systemCollector := collector.NewSystemCollector(logger)
	prometheus.MustRegister(interfaceCollector)
	prometheus.MustRegister(hwCollector)
	prometheus.MustRegister(crmCollector)
@@ -76,6 +77,9 @@ func main() {
	if fdbCollector.IsEnabled() {
		prometheus.MustRegister(fdbCollector)
	}
	if systemCollector.IsEnabled() {
		prometheus.MustRegister(systemCollector)
	}

	// Node exporter collectors
	nodeCollector, err := nodecollector.NewNodeCollector(logger,
5 changes: 5 additions & 0 deletions fixtures/test/config_db_data.json
@@ -1,6 +1,11 @@
{
"id": "CONFIG_DB",
"data": {
"DEVICE_METADATA|localhost": {
"hostname": "switch01.example.net",
"hwsku": "Example-SKU-48X",
"platform": "x86_64-vendor_switch-r0"
},
"PORT|Ethernet0": {
"admin_status": "up",
"alias": "twentyfiveGigE1",
5 changes: 3 additions & 2 deletions fixtures/test/state_db_data.json
@@ -99,8 +99,9 @@
},
"CHASSIS_INFO|chassis 1": {
"psu_num": "2",
-"serial": "123456",
-"model": "006Y6V"
+"serial": "SN-TEST-0001",
+"model": "Model-X",
+"revision": "A01"
}
}
}
1 change: 1 addition & 0 deletions fixtures/test/system_hostname
@@ -0,0 +1 @@
switch01.example.net
1 change: 1 addition & 0 deletions fixtures/test/system_machine.conf
@@ -0,0 +1 @@
onie_platform=x86_64-dellemc_s5232f_c3538-r0
11 changes: 11 additions & 0 deletions fixtures/test/system_sonic_version.yml
@@ -0,0 +1,11 @@
build_version: SONiC.SONIC.202012.test
debian_version: '10.13'
kernel_version: '4.19.0-12-2-amd64'
asic_type: broadcom
asic_count: '1'
commit_id: 'abcdef123'
branch: 'test-branch'
release: 'test-release'
build_date: Tue Jan 02 12:34:56 UTC 2024
built_by: ubuntu@sonic-exporter.test
sonic_os_version: '10'
1 change: 1 addition & 0 deletions fixtures/test/system_uptime
@@ -0,0 +1 @@
12345.00 11111.00
70 changes: 70 additions & 0 deletions internal/collector/collector_test.go
@@ -81,6 +81,12 @@ func TestMain(m *testing.M) {
	os.Setenv("VLAN_ENABLED", "true")
	os.Setenv("LAG_ENABLED", "true")
	os.Setenv("FDB_ENABLED", "true")
	os.Setenv("SYSTEM_ENABLED", "true")
	os.Setenv("SYSTEM_COMMAND_ENABLED", "false")
	os.Setenv("SYSTEM_VERSION_FILE", "../../fixtures/test/system_sonic_version.yml")
	os.Setenv("SYSTEM_MACHINE_CONF_FILE", "../../fixtures/test/system_machine.conf")
	os.Setenv("SYSTEM_HOSTNAME_FILE", "../../fixtures/test/system_hostname")
	os.Setenv("SYSTEM_UPTIME_FILE", "../../fixtures/test/system_uptime")
	err = populateRedisData()
	if err != nil {
		slog.Error("failed to populate redis data", "error", err)
@@ -96,6 +102,12 @@ func TestMain(m *testing.M) {
	os.Unsetenv("VLAN_ENABLED")
	os.Unsetenv("LAG_ENABLED")
	os.Unsetenv("FDB_ENABLED")
	os.Unsetenv("SYSTEM_ENABLED")
	os.Unsetenv("SYSTEM_COMMAND_ENABLED")
	os.Unsetenv("SYSTEM_VERSION_FILE")
	os.Unsetenv("SYSTEM_MACHINE_CONF_FILE")
	os.Unsetenv("SYSTEM_HOSTNAME_FILE")
	os.Unsetenv("SYSTEM_UPTIME_FILE")
	os.Exit(exitCode)
}

@@ -472,3 +484,61 @@ func TestFdbCollector(t *testing.T) {
		t.Errorf("unexpected collecting result:\n%s", err)
	}
}

func TestSystemCollector(t *testing.T) {
	promslogConfig := &promslog.Config{}
	logger := promslog.New(promslogConfig)

	systemCollector := NewSystemCollector(logger)

	problems, err := testutil.CollectAndLint(systemCollector)
	if err != nil {
		t.Error("metric lint completed with errors")
	}

	for _, problem := range problems {
		t.Errorf("metric %v has a problem: %v", problem.Metric, problem.Text)
	}

	identityMetadata := `
# HELP sonic_system_identity_info Switch identity metadata, value is always 1
# TYPE sonic_system_identity_info gauge
`

	identityExpected := `
sonic_system_identity_info{asic="broadcom",asic_count="1",hostname="switch01.example.net",hwsku="Example-SKU-48X",model="Model-X",platform="x86_64-vendor_switch-r0",revision="A01",serial="SN-TEST-0001"} 1
`

	if err := testutil.CollectAndCompare(systemCollector, strings.NewReader(identityMetadata+identityExpected), "sonic_system_identity_info"); err != nil {
		t.Errorf("unexpected collecting result:\n%s", err)
	}

	softwareMetadata := `
# HELP sonic_system_software_info Switch software metadata, value is always 1
# TYPE sonic_system_software_info gauge
`

	softwareExpected := `
sonic_system_software_info{branch="test-branch",build_commit="abcdef123",build_date="Tue Jan 02 12:34:56 UTC 2024",built_by="ubuntu@sonic-exporter.test",debian_version="10.13",kernel_version="4.19.0-12-2-amd64",release="test-release",sonic_os_version="10",sonic_version="SONiC.SONIC.202012.test"} 1
`

	if err := testutil.CollectAndCompare(systemCollector, strings.NewReader(softwareMetadata+softwareExpected), "sonic_system_software_info"); err != nil {
		t.Errorf("unexpected collecting result:\n%s", err)
	}

	statusMetadata := `
# HELP sonic_system_collector_success Whether system collector succeeded
# TYPE sonic_system_collector_success gauge
# HELP sonic_system_uptime_seconds Switch uptime in seconds
# TYPE sonic_system_uptime_seconds gauge
`

	statusExpected := `
sonic_system_collector_success 1
sonic_system_uptime_seconds 12345
`

	if err := testutil.CollectAndCompare(systemCollector, strings.NewReader(statusMetadata+statusExpected), "sonic_system_collector_success", "sonic_system_uptime_seconds"); err != nil {
		t.Errorf("unexpected collecting result:\n%s", err)
	}
}