tracerbench

A Linux kernel module that benchmarks the latency overhead of local_irq_disable/enable(), preempt_disable/enable(), and local_irq_save/restore() operations. Its primary purpose is to quantify the performance impact of enabling the tracepoints in these kernel functions. Results are reported in raw CPU cycles (get_cycles()) for minimal timing overhead.

Prerequisites

Kernel headers for the running kernel (for out-of-tree module build)
CONFIG_DEBUG_FS enabled in the kernel configuration
Root access (for loading the module and accessing debugfs)

Building

make -C /lib/modules/$(uname -r)/build M=$PWD modules

Loading

sudo insmod tracerbench.ko

Key Features

Per-CPU benchmarking: spawns one kernel thread per online CPU
Configurable sampling: each thread performs nr_samples timing measurements of:
1. local_irq_disable() + local_irq_enable()
2. preempt_disable() + preempt_enable()
3. local_irq_save() + local_irq_restore()
Statistical analysis: median, average, maximum, and average of the top-N highest samples (max_avg)
Percentile computation: independent single-CPU operation for arbitrary percentile calculation
Results exported via debugfs

How It Works

Benchmark

Writing to the benchmark file triggers a blocking, system-wide benchmark run. The write does not return until all CPUs have finished sampling and the results have been aggregated. Only one benchmark can run at a time; concurrent writes are serialized.

Before spawning threads, the current nr_samples and nr_highest values are snapshotted so that configuration changes via debugfs do not affect a running benchmark. Note that nr_highest is clamped to nr_samples if it exceeds it.

Each CPU then simultaneously collects nr_samples timing measurements of the disable/enable pairs and computes per-CPU statistics (median, average, max). After all CPUs finish, results are aggregated into the global statistics:

median: median of per-CPU medians
avg: mean of per-CPU averages (valid because sample counts are equal across CPUs)
max: maximum across all CPUs
max_avg: average of the globally highest samples, tracked via min-heaps that each CPU feeds into

Percentile

Writing a value (1-100) to the percentile file collects fresh samples on the current CPU only, sorts them, and computes the nth percentile. This is independent of the benchmark flow and uses the live nr_samples value (not the snapshotted benchmark value), so changing nr_samples between a benchmark and a percentile run will affect the number of samples collected.

debugfs Interface

After loading the module, the following tree is created under debugfs:

/sys/kernel/debug/tracerbench/
    nr_samples          (rw)  configuration
    nr_highest          (rw)  configuration
    do_work             (rw)  configuration
    benchmark           (-w)  trigger
    percentile          (-w)  trigger
    irq/
        median          (r-)  result
        average         (r-)  result
        max             (r-)  result
        max_avg         (r-)  result
        percentile      (r-)  result
    preempt/
        median          (r-)  result
        average         (r-)  result
        max             (r-)  result
        max_avg         (r-)  result
        percentile      (r-)  result
    irq_save/
        median          (r-)  result
        average         (r-)  result
        max             (r-)  result
        max_avg         (r-)  result
        percentile      (r-)  result

Configuration Files

File	Description
`nr_samples`	Number of samples per CPU (default: 10,000)
`nr_highest`	Number of highest samples to track for `max_avg` (default: 100)
`do_work`	Simulate critical section work between disable/enable (default: 0)

nr_samples and nr_highest are readable and writable. Zero values are rejected with -EINVAL. do_work is a boolean toggle (0 or 1).

Trigger Files (write-only)

File	Description
`benchmark`	Write anything to start the full per-CPU benchmark run
`percentile`	Write a value between 1 and 100 to compute that percentile

Result Files (read-only)

Results are organized in three subdirectories: irq/, preempt/, and irq_save/.

Each contains:

File	Description
`median`	Median latency (cycles)
`average`	Average latency (cycles)
`max`	Maximum single-sample latency (cycles)
`max_avg`	Average of the top-N highest samples (cycles)
`percentile`	Nth percentile from the last `percentile` run

Example Usage

# Mount debugfs if not already done
mount -t debugfs none /sys/kernel/debug

cd /sys/kernel/debug/tracerbench/

# Configure parameters
echo 50000 > nr_samples
echo 250 > nr_highest

# Run the benchmark (blocks until complete)
echo 1 > benchmark

# View results (values are in CPU cycles)
cat irq/average
cat preempt/max
cat irq_save/average

# Run again with simulated critical section work
echo 1 > do_work
echo 1 > benchmark
cat irq/average        # compare with previous run

# Calculate the 99th percentile (runs on current CPU only)
echo 99 > percentile
cat irq/percentile
cat preempt/percentile
cat irq_save/percentile

Design

The module uses smpboot_register_percpu_thread() to create worker threads and holds cpus_read_lock for the entire run to prevent CPU hotplug from changing the set of online CPUs mid-benchmark.

All threads wait on a completion barrier so they begin sampling at the same time. The time_diff() macro uses token-pasting to expand local_irq into local_irq_disable()/local_irq_enable() (and likewise for preempt), measuring the elapsed time via get_cycles(). A separate time_diff_save_restore() macro handles the local_irq_save()/local_irq_restore() pair, which requires a flags argument.

When do_work is enabled, a noinline function simulate_critical_section() is called between each disable/enable pair. It performs a percpu read-modify-write with a data-dependent branch and a current->pid access, representative of what real critical sections do. The cost of this simulated work is included in the timer overhead measurement and subtracted from each sample, so results still reflect only the disable/enable cost.

Per-CPU statistics are computed locally. Sorting (for median and max) is done in a single pass via median_and_max(). Each CPU then feeds its top nr_highest samples into shared min-heaps under a mutex. The min-heap root is always the smallest of the top-N values, so new samples only replace it if they are larger, efficiently tracking the globally worst-case latencies without requiring O(total_samples) global memory.

Global aggregation computes median-of-medians, mean-of-means (valid because all CPUs contribute equal sample counts), and max-of-maxes. The max_avg statistic is the arithmetic mean of the min-heap contents.

Memory management uses RAII-style __free(kvfree) annotations for automatic cleanup of per-thread buffers. Heap memory is managed manually in benchmark_write() because ownership is transferred out via no_free_ptr() in init_heaps(). Overflow-safe arithmetic (check_add_overflow(), check_mul_overflow()) is used throughout sample accumulation.

Unloading

sudo rmmod tracerbench

License

GPL-2.0-only

Name		Name	Last commit message	Last commit date
Latest commit History 60 Commits
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
gpl-2.0.txt		gpl-2.0.txt
tracerbench.c		tracerbench.c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

tracerbench

Prerequisites

Building

Loading

Key Features

How It Works

Benchmark

Percentile

debugfs Interface

Configuration Files

Trigger Files (write-only)

Result Files (read-only)

Example Usage

Design

Unloading

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

tracerbench

Prerequisites

Building

Loading

Key Features

How It Works

Benchmark

Percentile

debugfs Interface

Configuration Files

Trigger Files (write-only)

Result Files (read-only)

Example Usage

Design

Unloading

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages