Merged
37 changes: 37 additions & 0 deletions README.md
@@ -12,6 +12,7 @@ Currently supported collectors:
- [LAG collector](internal/collector/lag_collector.go): collects PortChannel and member state from SONiC Redis.
- [FDB collector](internal/collector/fdb_collector.go): collects FDB summary metrics from SONiC ASIC DB.
- [System collector](internal/collector/system_collector.go): experimental collector for switch identity, software metadata, and uptime using read-only sources (disabled by default).
- [Docker collector](internal/collector/docker_collector.go): experimental collector for container runtime metrics from `STATE_DB` `DOCKER_STATS` (disabled by default).

# Usage

@@ -63,6 +64,11 @@ Environment variables:
- `SYSTEM_MACHINE_CONF_FILE` - path to machine config file. Default: `/host/machine.conf`.
- `SYSTEM_HOSTNAME_FILE` - path to hostname file. Default: `/etc/hostname`.
- `SYSTEM_UPTIME_FILE` - path to uptime file. Default: `/proc/uptime`.
- `DOCKER_ENABLED` - enable docker collector (experimental). Default: `false`.
- `DOCKER_REFRESH_INTERVAL` - docker cache refresh interval. Default: `60s`.
- `DOCKER_TIMEOUT` - timeout for one docker refresh cycle. Default: `2s`.
- `DOCKER_MAX_CONTAINERS` - maximum number of container entries exported per refresh. Default: `128`.
- `DOCKER_SOURCE_STALE_THRESHOLD` - source age threshold after which docker source is marked stale. Default: `5m`.

## System Collector (Experimental)

@@ -103,6 +109,32 @@ Debug mode behavior (`--log.level=debug`):
- Logs which data source populated each field.
- Logs when fallback sources are skipped because a higher-priority source already set the field.

## Docker Collector (Experimental)

The `docker_collector` is currently experimental and is disabled by default for stability.

To enable it:
```bash
$ DOCKER_ENABLED=true ./sonic-exporter
```

What this collector exports:

- `sonic_docker_container_info{container="..."}` - container identity metric (value always `1`).
- `sonic_docker_container_cpu_percent`, `sonic_docker_container_memory_usage_bytes`, `sonic_docker_container_memory_limit_bytes`, `sonic_docker_container_memory_percent`.
- `sonic_docker_container_network_receive_bytes_total`, `sonic_docker_container_network_transmit_bytes_total`.
- `sonic_docker_container_block_read_bytes_total`, `sonic_docker_container_block_write_bytes_total`, `sonic_docker_container_pids`.
- `sonic_docker_containers`, `sonic_docker_entries_skipped`, `sonic_docker_source_stale`, `sonic_docker_source_age_seconds`, `sonic_docker_source_last_update_timestamp_seconds`.
- `sonic_docker_collector_success`, `sonic_docker_scrape_duration_seconds`, `sonic_docker_cache_age_seconds`.

Data source and safety behavior:

- Reads only from `STATE_DB` keys `DOCKER_STATS|*` and `DOCKER_STATS|LastUpdateTime`.
- No Docker socket access, no command execution, no writes.
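- Exported series carry only the `container` label, which keeps cardinality bounded.
- Results are cached between refreshes, and each refresh exports at most `DOCKER_MAX_CONTAINERS` entries.
- Source freshness is tracked against `DOCKER_SOURCE_STALE_THRESHOLD`; once the source age exceeds it, `sonic_docker_source_stale` reports `1`.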
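
The behavior listed above can be sketched as follows. This is not the collector's implementation, which this diff does not show. It models `STATE_DB` as an in-memory map instead of a Redis client, and the skip rules (a missing `NAME`, an unparsable `CPU%`, or the cap being reached) are assumptions chosen to be consistent with the test expectations in this change. The names `containerStats`, `selectContainers`, and `sampleDB` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// containerStats holds the subset of DOCKER_STATS fields used in this sketch.
type containerStats struct {
	Name       string
	CPUPercent float64
}

// selectContainers walks DOCKER_STATS|* hashes in sorted key order and returns
// the usable entries plus a count of skipped ones. Assumed skip rules: missing
// NAME, unparsable CPU%, or the maxContainers cap already reached.
func selectContainers(db map[string]map[string]string, maxContainers int) ([]containerStats, int) {
	keys := make([]string, 0, len(db))
	for k := range db {
		// LastUpdateTime shares the prefix but is metadata, not a container.
		if strings.HasPrefix(k, "DOCKER_STATS|") && k != "DOCKER_STATS|LastUpdateTime" {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // deterministic order, so the cap keeps the same entries

	var kept []containerStats
	skipped := 0
	for _, k := range keys {
		fields := db[k]
		cpu, err := strconv.ParseFloat(fields["CPU%"], 64)
		if fields["NAME"] == "" || err != nil || len(kept) >= maxContainers {
			skipped++
			continue
		}
		kept = append(kept, containerStats{Name: fields["NAME"], CPUPercent: cpu})
	}
	return kept, skipped
}

// sampleDB mirrors the shape of the test fixture: two complete entries and
// one entry without a NAME field.
func sampleDB() map[string]map[string]string {
	return map[string]map[string]string{
		"DOCKER_STATS|0001":           {"NAME": "swss", "CPU%": "1.5"},
		"DOCKER_STATS|0002":           {"NAME": "syncd", "CPU%": "0.5"},
		"DOCKER_STATS|0003":           {"CPU%": "0.1"},
		"DOCKER_STATS|LastUpdateTime": {"lastupdate": "2020-01-01 00:00:00.000000"},
	}
}

func main() {
	kept, skipped := selectContainers(sampleDB(), 128)
	fmt.Println(len(kept), skipped)
	for _, c := range kept {
		fmt.Printf("sonic_docker_container_cpu_percent{container=%q} %g\n", c.Name, c.CPUPercent)
	}
}
```

Sorting the keys before applying the cap is a design choice of this sketch: without it, Go's randomized map iteration would make the set of exported containers vary between refreshes.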

## Validated Platforms

The exporter has been validated on the following platforms:
@@ -164,6 +196,11 @@ These examples are synthetic and anonymized. Use them as query patterns. Labels
- `sonic_system_software_info{sonic_version="SONiC.202012.example",debian_version="10.13",kernel_version="4.19.0-12-2-amd64",build_commit="193959ba2"} 1`
- Query: `sonic_system_uptime_seconds`

- **Docker collector (experimental, when enabled)** - container runtime metrics from SONiC `STATE_DB`
- `sonic_docker_container_cpu_percent{container="swss"} 1.5`
- `sonic_docker_container_memory_usage_bytes{container="syncd"} 2.09e+08`
- Query: `sonic_docker_source_stale == 1`

- **node_exporter collectors** - host CPU, memory, filesystem
- `node_cpu_seconds_total{cpu="0",mode="idle"} 5.93e+06`
- `node_memory_MemAvailable_bytes 1.24e+10`
4 changes: 4 additions & 0 deletions cmd/sonic-exporter/main.go
@@ -61,6 +61,7 @@ func main() {
	lagCollector := collector.NewLagCollector(logger)
	fdbCollector := collector.NewFdbCollector(logger)
	systemCollector := collector.NewSystemCollector(logger)
	dockerCollector := collector.NewDockerCollector(logger)
	prometheus.MustRegister(interfaceCollector)
	prometheus.MustRegister(hwCollector)
	prometheus.MustRegister(crmCollector)
@@ -80,6 +81,9 @@ func main() {
	if systemCollector.IsEnabled() {
		prometheus.MustRegister(systemCollector)
	}
	if dockerCollector.IsEnabled() {
		prometheus.MustRegister(dockerCollector)
	}

	// Node exporter collectors
	nodeCollector, err := nodecollector.NewNodeCollector(logger,
30 changes: 30 additions & 0 deletions fixtures/test/state_db_data.json
@@ -102,6 +102,36 @@
"serial": "SN-TEST-0001",
"model": "Model-X",
"revision": "A01"
},
"DOCKER_STATS|0001": {
"NAME": "swss",
"CPU%": "1.5",
"MEM_BYTES": "104857600",
"MEM_LIMIT_BYTES": "2147483648",
"MEM%": "4.88",
"NET_IN_BYTES": "123456",
"NET_OUT_BYTES": "654321",
"BLOCK_IN_BYTES": "1024",
"BLOCK_OUT_BYTES": "2048",
"PIDS": "42"
},
"DOCKER_STATS|0002": {
"NAME": "syncd",
"CPU%": "0.5",
"MEM_BYTES": "209715200",
"MEM_LIMIT_BYTES": "2147483648",
"MEM%": "9.76",
"NET_IN_BYTES": "2222",
"NET_OUT_BYTES": "3333",
"BLOCK_IN_BYTES": "4444",
"BLOCK_OUT_BYTES": "5555",
"PIDS": "17"
},
"DOCKER_STATS|0003": {
"CPU%": "0.1"
},
"DOCKER_STATS|LastUpdateTime": {
"lastupdate": "2020-01-01 00:00:00.000000"
}
}
}
93 changes: 93 additions & 0 deletions internal/collector/collector_test.go
@@ -82,6 +82,7 @@ func TestMain(m *testing.M) {
	os.Setenv("LAG_ENABLED", "true")
	os.Setenv("FDB_ENABLED", "true")
	os.Setenv("SYSTEM_ENABLED", "true")
	os.Setenv("DOCKER_ENABLED", "true")
	os.Setenv("SYSTEM_COMMAND_ENABLED", "false")
	os.Setenv("SYSTEM_VERSION_FILE", "../../fixtures/test/system_sonic_version.yml")
	os.Setenv("SYSTEM_MACHINE_CONF_FILE", "../../fixtures/test/system_machine.conf")
@@ -103,6 +104,7 @@ func TestMain(m *testing.M) {
	os.Unsetenv("LAG_ENABLED")
	os.Unsetenv("FDB_ENABLED")
	os.Unsetenv("SYSTEM_ENABLED")
	os.Unsetenv("DOCKER_ENABLED")
	os.Unsetenv("SYSTEM_COMMAND_ENABLED")
	os.Unsetenv("SYSTEM_VERSION_FILE")
	os.Unsetenv("SYSTEM_MACHINE_CONF_FILE")
@@ -542,3 +544,94 @@ func TestSystemCollector(t *testing.T) {
		t.Errorf("unexpected collecting result:\n%s", err)
	}
}

func TestDockerCollector(t *testing.T) {
	promslogConfig := &promslog.Config{}
	logger := promslog.New(promslogConfig)

	dockerCollector := NewDockerCollector(logger)

	// Lint every metric the collector exposes before checking values.
	problems, err := testutil.CollectAndLint(dockerCollector)
	if err != nil {
		t.Errorf("metric lint completed with errors: %v", err)
	}

	for _, problem := range problems {
		t.Errorf("metric %v has a problem: %v", problem.Metric, problem.Text)
	}

	// Collector-level status: the fixture has two valid entries, one entry
	// without NAME (skipped), and an old LastUpdateTime (stale).
	statusMetadata := `
# HELP sonic_docker_collector_success Whether docker collector succeeded
# TYPE sonic_docker_collector_success gauge
# HELP sonic_docker_containers Number of containers with DOCKER_STATS entries
# TYPE sonic_docker_containers gauge
# HELP sonic_docker_entries_skipped Number of docker entries skipped during latest refresh
# TYPE sonic_docker_entries_skipped gauge
# HELP sonic_docker_source_stale Whether DOCKER_STATS source data is stale (1=yes, 0=no)
# TYPE sonic_docker_source_stale gauge
`

	statusExpected := `
sonic_docker_collector_success 1
sonic_docker_containers 2
sonic_docker_entries_skipped 1
sonic_docker_source_stale 1
`

	if err := testutil.CollectAndCompare(dockerCollector, strings.NewReader(statusMetadata+statusExpected), "sonic_docker_collector_success", "sonic_docker_containers", "sonic_docker_entries_skipped", "sonic_docker_source_stale"); err != nil {
		t.Errorf("unexpected collecting result:\n%s", err)
	}

	containerMetadata := `
# HELP sonic_docker_container_info Container metadata from SONiC DOCKER_STATS, value is always 1
# TYPE sonic_docker_container_info gauge
`

	containerExpected := `
sonic_docker_container_info{container="swss"} 1
sonic_docker_container_info{container="syncd"} 1
`

	if err := testutil.CollectAndCompare(dockerCollector, strings.NewReader(containerMetadata+containerExpected), "sonic_docker_container_info"); err != nil {
		t.Errorf("unexpected collecting result:\n%s", err)
	}

	cpuMetadata := `
# HELP sonic_docker_container_cpu_percent Container CPU usage percent
# TYPE sonic_docker_container_cpu_percent gauge
`

	cpuExpected := `
sonic_docker_container_cpu_percent{container="swss"} 1.5
sonic_docker_container_cpu_percent{container="syncd"} 0.5
`

	if err := testutil.CollectAndCompare(dockerCollector, strings.NewReader(cpuMetadata+cpuExpected), "sonic_docker_container_cpu_percent"); err != nil {
		t.Errorf("unexpected collecting result:\n%s", err)
	}
}

func TestDockerCollectorMaxContainers(t *testing.T) {
	t.Setenv("DOCKER_MAX_CONTAINERS", "1")

	promslogConfig := &promslog.Config{}
	logger := promslog.New(promslogConfig)

	dockerCollector := NewDockerCollector(logger)

	metadata := `
# HELP sonic_docker_containers Number of containers with DOCKER_STATS entries
# TYPE sonic_docker_containers gauge
# HELP sonic_docker_entries_skipped Number of docker entries skipped during latest refresh
# TYPE sonic_docker_entries_skipped gauge
`

	expected := `
sonic_docker_containers 1
sonic_docker_entries_skipped 2
`

	if err := testutil.CollectAndCompare(dockerCollector, strings.NewReader(metadata+expected), "sonic_docker_containers", "sonic_docker_entries_skipped"); err != nil {
		t.Errorf("unexpected collecting result:\n%s", err)
	}
}