r-lapins/Process-Data-Toolkit

Process Data Toolkit (PDT)

Modern C++20 library and CLI tools for CSV time-series processing and WAV signal analysis.


Related project

This library powers a desktop application:

👉 Process Data Viewer (Qt)

The viewer provides an interactive GUI for:

  • CSV anomaly analysis
  • WAV signal and spectrum analysis
  • visualization and export tools

Project goals

This project demonstrates modern C++ development practices and serves as a portfolio example.

Key aspects:

  • Modern C++20 design
  • Clean separation between CLI and reusable core library
  • Reproducible builds using CMake presets
  • CI (GCC + Clang)
  • Sanitizers (ASan + UBSan)
  • Static analysis (clang-tidy)
  • Debugging and memory analysis (GDB + Valgrind)
  • Unit testing

Development

Development notes and CI instructions are available here: docs/DEVELOPMENT.md.


Features

CSV data processing

Notes and instructions are available here: docs/CSV.md.

  • CLI data processing tool for CSV files (pdt_csv_cli)
  • CSV parser with import summary (parsed_ok, skipped)
  • Optional display of skipped CSV rows with line numbers (--skipped)
  • ISO 8601 timestamp parsing using std::chrono
  • Data filtering by sensor and time range
  • Domain model based on DataSet class
  • Statistical analysis (count, mean, min, max, stddev)
  • Per-sensor statistics mode (--per-sensor)
  • Configurable anomaly detection (zscore, iqr, mad) with threshold and top-N output (--z, --method, --top)
  • JSON report export (--out)
  • CSV export with anomaly markers for top detected anomalies (--out-marked-csv)

WAV signal analysis

Notes and instructions are available here: docs/WAV.md.

  • CLI spectrum analysis tool for WAV files (pdt_wav_cli)
  • Discrete Fourier Transform (DFT)
  • Radix-2 Fast Fourier Transform (FFT)
  • Automatic DFT / FFT selection depending on segment length
  • Single-sided spectrum computation
  • Window functions: Hann and Hamming
  • Spectral peak detection (ThresholdOnly, LocalMaxima)
  • Peak detection and dominant-peak selection as separate steps
  • WAV reader (RIFF/WAVE PCM16 mono)
  • Synthetic signal spectrum analysis demo (pdt_wav_synth_demo)
  • CSV export of computed spectrum (--out)
  • Text report export (--out-r)
  • DFT vs FFT runtime benchmark tool (fft_benchmark)

Example outputs

CSV CLI can:

  • print import/skipped row summaries
  • generate JSON reports
  • export anomaly-marked CSV files

WAV CLI can:

  • print spectral peak reports
  • export spectrum CSV
  • export text reports

Quick start

cmake --preset debug
cmake --build --preset debug

Run CSV CLI:

./build/debug/pdt_csv_cli --in examples/sample.csv

Run WAV CLI:

./build/debug/pdt_wav_cli --in examples/HDSDR_20230515_072359Z_15047kHz_AF.wav

Project structure

include/pdt/        public API
include/pdt/csv/    CSV data processing API
include/pdt/wav/    WAV signal processing API

src/csv/            CSV data processing implementation
src/wav/            WAV signal processing implementation

app/                CLI applications
tests/              unit tests
examples/           sample CSV and WAV inputs and outputs
bench/              performance benchmarks
.github/            CI workflows

Modules

  • pdt/csv — CSV parsing, filtering, statistics, anomaly detection, report/output helpers
  • pdt/wav — WAV parsing, windowing, DFT/FFT, spectrum computation, peak detection, spectrum export

Requirements

  • CMake 3.25+
  • Ninja
  • C++20 compatible compiler
  • A Linux environment is recommended

Algorithms

Standard deviation (population form, where μ is the mean of the N samples):

σ = sqrt( Σ(x - μ)² / N )
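
As an illustration of the formula above, here is a minimal C++ sketch of the population standard deviation. The function name `population_stddev` and the choice to return 0.0 for empty input are assumptions made for this example, not the library's API.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper, not part of pdt: sqrt( sum((x - mean)^2) / N ).
// Returns 0.0 for an empty input (an assumption of this sketch).
double population_stddev(const std::vector<double>& xs) {
    if (xs.empty()) return 0.0;
    const double n = static_cast<double>(xs.size());

    double sum = 0.0;
    for (double x : xs) sum += x;
    const double mean = sum / n;

    double squared = 0.0;
    for (double x : xs) squared += (x - mean) * (x - mean);

    return std::sqrt(squared / n);
}
```

For the samples 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5, the squared deviations sum to 32, and the result is sqrt(32 / 8) = 2.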

Anomaly detection methods:

The CSV CLI supports three anomaly detection methods:

- Z-score

z = (x - μ) / σ

Samples with |z| > threshold are reported as anomalies.
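
The Z-score rule can be sketched as follows. This is only an illustration: `zscore_anomalies` is a hypothetical name, the function returns sample indices, and it treats a zero standard deviation as "no anomalies", which is an assumption rather than documented behaviour.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of pdt.
// Returns the indices of samples whose |z| exceeds `threshold`.
std::vector<std::size_t> zscore_anomalies(const std::vector<double>& xs,
                                          double threshold) {
    std::vector<std::size_t> out;
    if (xs.empty()) return out;
    const double n = static_cast<double>(xs.size());

    double mean = 0.0;
    for (double x : xs) mean += x;
    mean /= n;

    double var = 0.0;
    for (double x : xs) var += (x - mean) * (x - mean);
    const double sigma = std::sqrt(var / n);  // population standard deviation
    if (sigma == 0.0) return out;             // all samples equal: nothing to flag

    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double z = (xs[i] - mean) / sigma;
        if (std::fabs(z) > threshold) out.push_back(i);
    }
    return out;
}
```

With nine samples equal to 10 and one equal to 100, the mean is 19 and σ is 27, so the last sample has z = 3 and is the only one flagged at a threshold of 2.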

- IQR

The interquartile range method uses:

IQR = Q3 - Q1

Samples outside the interval

[Q1 - threshold · IQR, Q3 + threshold · IQR]

are reported as anomalies.
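
A possible sketch of the IQR rule is shown below. Quartile definitions vary between tools; this example assumes linear interpolation between order statistics, which may differ from what the toolkit does. The names `quantile_sorted` and `iqr_anomalies` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: quantile by linear interpolation (one of several conventions).
// `sorted` must be non-empty and ascending; q must lie in [0, 1].
double quantile_sorted(const std::vector<double>& sorted, double q) {
    const double pos = q * static_cast<double>(sorted.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
    const std::size_t hi = std::min(lo + 1, sorted.size() - 1);
    const double frac = pos - static_cast<double>(lo);
    return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
}

// Returns indices of samples outside [Q1 - t * IQR, Q3 + t * IQR].
std::vector<std::size_t> iqr_anomalies(const std::vector<double>& xs,
                                       double threshold) {
    std::vector<std::size_t> out;
    if (xs.empty()) return out;

    std::vector<double> sorted = xs;  // keep the caller's order intact
    std::sort(sorted.begin(), sorted.end());

    const double q1 = quantile_sorted(sorted, 0.25);
    const double q3 = quantile_sorted(sorted, 0.75);
    const double iqr = q3 - q1;
    const double lower = q1 - threshold * iqr;
    const double upper = q3 + threshold * iqr;

    for (std::size_t i = 0; i < xs.size(); ++i) {
        if (xs[i] < lower || xs[i] > upper) out.push_back(i);
    }
    return out;
}
```

For the samples 1..8 followed by 100, this convention gives Q1 = 3 and Q3 = 7, so with a threshold of 1.5 the interval is [-3, 13] and only the value 100 is reported.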

- MAD

The median absolute deviation method uses:

MAD = median(|x - median(x)|)

A robust anomaly score is computed:

score = (x - median(x)) / MAD

Samples with |score| > threshold are reported as anomalies.
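
The MAD score can be illustrated with the sketch below. It follows the unscaled formula given above; `median_of` and `mad_anomalies` are hypothetical names, and returning no anomalies when MAD is zero is an assumption of this example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: median of a non-empty vector (works on a copy).
double median_of(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return (n % 2 == 1) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Returns indices of samples whose |(x - median) / MAD| exceeds `threshold`.
std::vector<std::size_t> mad_anomalies(const std::vector<double>& xs,
                                       double threshold) {
    std::vector<std::size_t> out;
    if (xs.empty()) return out;

    const double med = median_of(xs);

    std::vector<double> deviations;
    deviations.reserve(xs.size());
    for (double x : xs) deviations.push_back(std::fabs(x - med));

    const double mad = median_of(deviations);
    if (mad == 0.0) return out;  // degenerate case: score is undefined

    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double score = (xs[i] - med) / mad;
        if (std::fabs(score) > threshold) out.push_back(i);
    }
    return out;
}
```

For the samples 1, 2, 3, 4, 100 the median is 3 and the MAD is 1, so the value 100 scores 97 while every other sample scores at most 2 in magnitude.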

WAV signal processing methods:

- Discrete Fourier Transform (DFT)

X[k] = Σ x[n] · e^(−j2πkn/N),  k = 0..N−1

The current implementation is O(N²) and serves as a reference.
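
A direct evaluation of the DFT sum might look like the sketch below. It is written only to illustrate the formula and its O(N²) double loop; the function name `dft` and its signature are assumptions, not the toolkit's interface.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical reference DFT: X[k] = sum over n of x[n] * e^(-j*2*pi*k*n/N).
std::vector<std::complex<double>> dft(const std::vector<double>& x) {
    const std::size_t N = x.size();
    const double two_pi = 2.0 * std::acos(-1.0);

    std::vector<std::complex<double>> X(N);
    for (std::size_t k = 0; k < N; ++k) {
        std::complex<double> acc{0.0, 0.0};
        for (std::size_t n = 0; n < N; ++n) {
            const double angle =
                -two_pi * static_cast<double>(k * n) / static_cast<double>(N);
            acc += x[n] * std::polar(1.0, angle);  // unit phasor at `angle`
        }
        X[k] = acc;
    }
    return X;
}
```

For the four samples 1, 0, -1, 0 (one cosine period), the energy lands in bins 1 and 3 with value 2 each, while bins 0 and 2 are zero up to rounding.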

- Fast Fourier Transform (FFT)

The project implements a radix-2 Cooley–Tukey FFT algorithm.

The FFT recursively splits the DFT into two half-length DFTs, E over the even-indexed samples and O over the odd-indexed samples, and combines them for k = 0..N/2−1:

X[k] = E[k] + W_N^k · O[k]
X[k + N/2] = E[k] - W_N^k · O[k]

where:

W_N^k = e^(−j2πk/N)

The algorithm requires the input size to be a power of two and has time complexity O(N log N).
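
The butterfly equations above translate fairly directly into a recursive sketch. This is an illustrative version that allocates temporary vectors at each level; the project's actual implementation may be iterative or in-place. The name `fft_radix2` and the choice to throw on invalid sizes are assumptions.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

using cd = std::complex<double>;

// Hypothetical recursive radix-2 Cooley-Tukey FFT (not the library's API).
std::vector<cd> fft_radix2(const std::vector<cd>& x) {
    const std::size_t N = x.size();
    if (N == 0 || (N & (N - 1)) != 0) {
        throw std::invalid_argument("input size must be a power of two");
    }
    if (N == 1) return x;

    // Split into even-indexed and odd-indexed samples.
    std::vector<cd> even(N / 2), odd(N / 2);
    for (std::size_t i = 0; i < N / 2; ++i) {
        even[i] = x[2 * i];
        odd[i] = x[2 * i + 1];
    }
    const std::vector<cd> E = fft_radix2(even);
    const std::vector<cd> O = fft_radix2(odd);

    // Combine: X[k] = E[k] + W * O[k], X[k + N/2] = E[k] - W * O[k].
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<cd> X(N);
    for (std::size_t k = 0; k < N / 2; ++k) {
        const cd w = std::polar(
            1.0, -two_pi * static_cast<double>(k) / static_cast<double>(N));
        X[k] = E[k] + w * O[k];
        X[k + N / 2] = E[k] - w * O[k];
    }
    return X;
}
```

For the input 1, 0, -1, 0 this produces the same spectrum as a direct DFT: 0, 2, 0, 2.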

- Spectral peak detection

Two strategies:

ThresholdOnly

X[i] >= threshold_ratio · max(X)

LocalMaxima

X[i] > X[i-1] && X[i] > X[i+1]
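
The two strategies can be compared with the sketch below. It assumes that X holds non-negative spectrum magnitudes and that each strategy is applied on its own; whether the toolkit combines them with further conditions is not stated here. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: bins whose magnitude is at least threshold_ratio * max.
std::vector<std::size_t> peaks_threshold_only(const std::vector<double>& mag,
                                              double threshold_ratio) {
    std::vector<std::size_t> out;
    if (mag.empty()) return out;
    const double limit =
        threshold_ratio * *std::max_element(mag.begin(), mag.end());
    for (std::size_t i = 0; i < mag.size(); ++i) {
        if (mag[i] >= limit) out.push_back(i);
    }
    return out;
}

// Hypothetical helper: interior bins strictly greater than both neighbours.
std::vector<std::size_t> peaks_local_maxima(const std::vector<double>& mag) {
    std::vector<std::size_t> out;
    for (std::size_t i = 1; i + 1 < mag.size(); ++i) {
        if (mag[i] > mag[i - 1] && mag[i] > mag[i + 1]) out.push_back(i);
    }
    return out;
}
```

For the magnitudes 0, 1, 0.5, 4, 3.5, 0.2 and a ratio of 0.5, the threshold strategy reports the adjacent bins 3 and 4, while the local-maxima strategy reports bins 1 and 3. The first can return several neighbouring bins of one broad peak; the second can return small ripples unless it is paired with a threshold.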

Library usage

Example:

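#include <pdt/csv/dataset.h>
#include <pdt/csv/csv_reader.h>

#include <fstream>
#include <utility>

int main() {
    // Open the sample input and stop early if it cannot be read.
    std::ifstream in("examples/sample.csv");
    if (!in) {
        return 1;
    }

    // Parse the CSV stream into an import result.
    auto imported = pdt::read_csv(in);

    // Move the parsed samples into the domain model.
    pdt::DataSet ds{std::move(imported.samples)};

    // Compute summary statistics for the data set.
    auto stats = ds.stats();

    return 0;
}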

Future work

Possible next steps:

  • Streaming / online anomaly detection
  • Additional window functions
  • Spectrogram computation

License

MIT License

About

Modern C++20 project implementing time-series data processing and basic signal analysis (DFT, anomaly detection). Emphasis on clean architecture, reproducible builds (CMake), testing, sanitizers, static analysis, and CI.
