PeriodFind

A collection of CUDA-accelerated periodicity detection algorithms, with both C++ and Python APIs. Includes a Rust-based CPU backend for environments without GPU hardware.

Algorithms

Period-Finding

| Algorithm | Unified API | GPU (CUDA) | CPU (Rust) |
| --- | --- | --- | --- |
| Conditional Entropy | periodfind.ConditionalEntropy | periodfind.gpu.ConditionalEntropy | periodfind.cpu.ConditionalEntropy |
| Analysis of Variance | periodfind.AOV | periodfind.gpu.AOV | periodfind.cpu.AOV |
| Lomb-Scargle | periodfind.LombScargle | periodfind.gpu.LombScargle | periodfind.cpu.LombScargle |
| Fast Phase-folding Weighted | periodfind.FPW | periodfind.gpu.FPW | periodfind.cpu.FPW |
| Box Least Squares | periodfind.BoxLeastSquares | periodfind.gpu.BoxLeastSquares | periodfind.cpu.BoxLeastSquares |

Feature Extraction

| Algorithm | Unified API | CPU (Rust) |
| --- | --- | --- |
| Fourier Decomposition | periodfind.FourierDecomposition | periodfind.cpu.FourierDecomposition |

Fourier decomposition computes weighted linear least-squares Fourier fits with BIC model selection (0-5 harmonics) for a batch of light curves given pre-determined periods. Returns 14 features per curve: [power, BIC, offset, slope, A1, B1, A2, B2, A3, B3, A4, B4, A5, B5]. This replaces the per-source scipy.optimize.curve_fit approach with a direct Cholesky solve, giving identical results orders of magnitude faster.
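To make the description above concrete, here is a minimal NumPy sketch of a weighted linear least-squares Fourier fit with BIC selection over 0-5 harmonics, solved through a Cholesky factorization of the normal equations. It is not the library's Rust implementation. The function name `fourier_fit`, the cosine/sine column ordering, the time centring for the slope term, and the BIC form (chi-squared plus parameter count times ln n) are all assumptions made for illustration.

```python
import numpy as np

def fourier_fit(t, y, err, period, max_harmonics=5):
    """Illustrative weighted least-squares Fourier fit with BIC model selection.

    Returns (bic, n_harmonics, beta), where beta holds
    [offset, slope, cos_1, sin_1, ..., cos_k, sin_k] for the selected model.
    """
    t = np.asarray(t, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    w = 1.0 / np.asarray(err, dtype=np.float64) ** 2   # inverse-variance weights
    n = t.size
    t0 = t - t.mean()                  # centred time keeps the slope column well scaled
    phase = 2.0 * np.pi * t / period
    best = None
    for k in range(max_harmonics + 1):
        # Design matrix: offset, slope, then a cos/sin pair per harmonic 1..k
        cols = [np.ones(n), t0]
        for h in range(1, k + 1):
            cols += [np.cos(h * phase), np.sin(h * phase)]
        X = np.column_stack(cols)
        # Normal equations (X^T W X) beta = X^T W y, solved via Cholesky
        A = X.T @ (w[:, None] * X)
        b = X.T @ (w * y)
        L = np.linalg.cholesky(A)
        beta = np.linalg.solve(L.T, np.linalg.solve(L, b))
        chi2 = float(np.sum(w * (y - X @ beta) ** 2))
        bic = chi2 + X.shape[1] * np.log(n)   # assumed BIC form for known Gaussian errors
        if best is None or bic < best[0]:
            best = (bic, k, beta)
    return best

# Synthetic curve (hypothetical values): one cosine term plus a second-harmonic sine term
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 100.0, 200))
y = 15.0 + 0.5 * np.cos(2 * np.pi * t / 2.5) + 0.2 * np.sin(4 * np.pi * t / 2.5)
bic, n_harm, beta = fourier_fit(t, y, np.full(t.size, 0.01), period=2.5)
```

Because the model is linear in its coefficients once the period is fixed, each candidate needs only one small linear solve, with no iterative optimizer and no starting guess.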

Device API

Periodfind provides a PyTorch-style device abstraction so you can write device-agnostic code. When no device is set, it auto-detects GPU availability (tries to import the CUDA extensions and runs nvidia-smi).

import periodfind

# Set the global default device
periodfind.set_device('cpu')   # or 'gpu'
print(periodfind.get_device()) # 'cpu'

# Factory functions dispatch to the right backend
ce  = periodfind.ConditionalEntropy(n_phase=10, n_mag=10)
aov = periodfind.AOV(n_phase=15)
ls  = periodfind.LombScargle()
fpw = periodfind.FPW(n_bins=10)
bls = periodfind.BoxLeastSquares(n_bins=50, qmin=0.01, qmax=0.5)
fd  = periodfind.FourierDecomposition()  # CPU-only for now

# Per-call override (ignores the global default)
ce_gpu = periodfind.ConditionalEntropy(n_phase=10, n_mag=10, device='gpu')

You can still import backends directly:

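# Alias the two backends so the second import does not shadow the first
from periodfind.gpu import ConditionalEntropy as GpuConditionalEntropy  # CUDA backend
from periodfind.cpu import ConditionalEntropy as CpuConditionalEntropy  # Rust CPU backend
from periodfind.cpu import FourierDecomposition  # Rust CPU only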
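The auto-detection described at the top of this section (try the CUDA extension import, then run nvidia-smi) could be sketched roughly as follows. This is an assumption about the general approach, not periodfind's actual code. `detect_device` is a hypothetical helper, and only the module path `periodfind.gpu` is taken from the imports above.

```python
import importlib
import shutil
import subprocess

def detect_device(cuda_module="periodfind.gpu"):
    """Return 'gpu' if the CUDA backend imports and nvidia-smi succeeds, else 'cpu'.

    Hypothetical helper for illustration only.
    """
    try:
        importlib.import_module(cuda_module)
    except ImportError:
        return "cpu"            # CUDA extension not built or not installed
    if shutil.which("nvidia-smi") is None:
        return "cpu"            # no NVIDIA driver tools on PATH
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return "cpu"
    # nvidia-smi exits non-zero when it cannot talk to a driver or device
    return "gpu" if result.returncode == 0 else "cpu"

device = detect_device()
```

Checking both conditions guards against the two common failure cases: a machine with a GPU but no compiled CUDA extension, and a machine with the extension installed but no usable device.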

Box Least Squares Usage

BLS searches for periodic box-shaped (flat-bottom) transit dips in time-series data (Kovacs, Zucker & Mazeh 2002). It is particularly well-suited for detecting eclipsing binaries and transiting exoplanets.

import numpy as np
import periodfind

bls = periodfind.BoxLeastSquares(
    n_bins=50,     # number of phase bins
    qmin=0.01,     # minimum transit duration (fraction of period)
    qmax=0.5,      # maximum transit duration (fraction of period)
)

# times, mags: lists of float32 arrays (one per light curve)
# errs: optional list of float32 uncertainty arrays
periods = np.linspace(0.5, 10.0, 5000, dtype=np.float32)
period_dts = np.array([0.0], dtype=np.float32)

# Get best-period statistics
stats = bls.calc(times, mags, periods, period_dts, errs=errs, output="stats")
print(stats[0].params[0])  # detected period

# Get full periodogram
pgrams = bls.calc(times, mags, periods, period_dts, output="periodogram")

# Get top-N peaks (memory-efficient for large grids)
peaks = bls.calc(times, mags, periods, period_dts, output="peaks", n_peaks=32)
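The snippet above assumes that `times`, `mags`, and `errs` already exist. For a quick trial without real data, a sketch like the following builds synthetic box-shaped light curves in the stated format (lists of float32 arrays, one per curve). `make_transit_curves` and all of its parameters are hypothetical. The dip is modelled as a decrease in the measured value, which is an assumption; flip the sign of `depth` if your data are magnitudes, where dimming increases the value.

```python
import numpy as np

def make_transit_curves(n_curves=4, n_points=512, period=3.0, depth=0.05,
                        duration=0.1, noise=0.01, seed=0):
    """Build synthetic light curves with a periodic box-shaped dip (illustrative)."""
    rng = np.random.default_rng(seed)
    times, mags, errs = [], [], []
    for _ in range(n_curves):
        # Irregularly sampled observation times over a 100-day baseline
        t = np.sort(rng.uniform(0.0, 100.0, n_points))
        phase = (t / period) % 1.0
        in_transit = phase < duration          # duration is a fraction of the period
        # Assumed sign convention: the value drops by `depth` during transit
        m = 15.0 - depth * in_transit + rng.normal(0.0, noise, n_points)
        times.append(t.astype(np.float32))
        mags.append(m.astype(np.float32))
        errs.append(np.full(n_points, noise, dtype=np.float32))
    return times, mags, errs

times, mags, errs = make_transit_curves()
```

With the default `period=3.0`, the injected signal falls inside the 0.5 to 10.0 trial-period grid used in the example above.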

Fourier Decomposition Usage

import numpy as np
import periodfind

fd = periodfind.FourierDecomposition()

# times, mags, errs: lists of float32 arrays (one per light curve)
# periods: float32 array with one period per curve
features = fd.calc(times, mags, errs, periods)
# features.shape == (n_curves, 14)
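Given the 14-column layout listed earlier ([power, BIC, offset, slope, A1, B1, ..., A5, B5]), per-harmonic amplitudes can be derived from the coefficient pairs. The sketch below uses a stand-in array rather than real `fd.calc` output. `FEATURE_NAMES` and `harmonic_amplitudes` are hypothetical names. The amplitude sqrt(A_k^2 + B_k^2) does not depend on which coefficient multiplies the sine and which the cosine, so no assumption about that convention is needed.

```python
import numpy as np

# Column order as documented for the returned feature array
FEATURE_NAMES = ["power", "BIC", "offset", "slope",
                 "A1", "B1", "A2", "B2", "A3", "B3", "A4", "B4", "A5", "B5"]

def harmonic_amplitudes(features):
    """Return an (n_curves, 5) array of sqrt(A_k^2 + B_k^2) for k = 1..5."""
    features = np.asarray(features)
    a = features[:, 4::2]    # columns 4, 6, 8, 10, 12 -> A1..A5
    b = features[:, 5::2]    # columns 5, 7, 9, 11, 13 -> B1..B5
    return np.hypot(a, b)

# Stand-in for the output of fd.calc (hypothetical values, two curves)
demo = np.zeros((2, len(FEATURE_NAMES)), dtype=np.float32)
demo[0, 4:6] = [3.0, 4.0]    # A1 = 3, B1 = 4 for the first curve
amps = harmonic_amplitudes(demo)
```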

Throughput Benchmarks

Measured on a batch of 100 light curves over 1000 trial periods (single period_dt). CPU = Rust/Rayon (28 cores, Skylake Xeon); GPU = NVIDIA Tesla P100 (12 GB). Timings are the median of 3 runs after warmup.

Throughput table (points/sec)

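| pts/curve | Backend | CE | AOV | LS | FPW | BLS |
| --- | --- | --- | --- | --- | --- | --- |
| 256 | CPU | 140K | 184K | 146K | 245K | 121K |
| 256 | 1x P100 | 1.1M | 1.1M | 1.2M | 1.1M | 1.0M |
| 256 | 2x P100 | 1.1M | 1.2M | 1.4M | 1.2M | 1.2M |
| 1024 | CPU | 176K | 211K | 181K | 290K | 228K |
| 1024 | 1x P100 | 3.8M | 3.1M | 4.5M | 2.7M | 3.2M |
| 1024 | 2x P100 | 4.1M | 3.9M | 5.1M | 3.6M | 4.1M |
| 4096 | CPU | 185K | 217K | 194K | 307K | 293K |
| 4096 | 1x P100 | 9.8M | 3.2M | 13.2M | 3.5M | 6.2M |
| 4096 | 2x P100 | 12.7M | 5.6M | 16.5M | 6.1M | 9.6M |
| 8192 | CPU | 186K | 219K | 199K | 309K | 307K |
| 8192 | 1x P100 | 13.7M | 3.7M | 19.8M | 5.6M | 5.5M |
| 8192 | 2x P100 | 19.6M | 6.8M | 27.6M | 9.9M | 9.8M |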

GPU kernels use a hybrid atomic/privatization strategy: shared-memory atomics for small point counts (low overhead, no register pressure) and per-thread register privatization with warp-shuffle reduction for large point counts (no atomic contention). This eliminates the throughput dip that pure privatization caused at small N while preserving scalability at large N.

Throughput plot (log-log scale)

[Figure: Throughput benchmark]

Solid lines = 1x P100, dash-dot lines = 2x P100, dashed lines = CPU (Rust). All algorithms benefit from the GPU across the full range of point counts. At 8192 points per curve, LS reaches about 20M pts/sec on 1x P100 (roughly 100x the CPU rate) and BLS reaches 5.5M pts/sec on 1x P100 (roughly 18x the CPU rate).
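The quoted speedups follow directly from the 8192-point rows of the throughput table. The short script below redoes that arithmetic and also reports how much a second P100 adds. The numbers are copied from the table and the variable names are illustrative.

```python
# Throughput at 8192 points per curve, in points/sec, copied from the table above
cpu    = {"CE": 186e3,  "AOV": 219e3, "LS": 199e3,  "FPW": 309e3, "BLS": 307e3}
gpu_1x = {"CE": 13.7e6, "AOV": 3.7e6, "LS": 19.8e6, "FPW": 5.6e6, "BLS": 5.5e6}
gpu_2x = {"CE": 19.6e6, "AOV": 6.8e6, "LS": 27.6e6, "FPW": 9.9e6, "BLS": 9.8e6}

# Single-GPU speedup over the 28-core CPU run
speedup = {alg: gpu_1x[alg] / cpu[alg] for alg in cpu}
# Gain from adding a second GPU (2.0 would be perfect scaling)
scaling = {alg: gpu_2x[alg] / gpu_1x[alg] for alg in cpu}

for alg in cpu:
    print(f"{alg:>3}: {speedup[alg]:6.1f}x over CPU, "
          f"{scaling[alg]:.2f}x from the second GPU")
```

On these figures the second GPU helps AOV, FPW, and BLS more than it helps CE and LS at this curve size.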

See the benchmarks page for the complete table, 2x P100 data, and methodology.

To reproduce, run python benchmarks/throughput_bench.py followed by python benchmarks/plot_throughput.py. Use sbatch benchmarks/run_bench.sh for multi-GPU benchmarks on a SLURM cluster.

Installing

GPU backend (CUDA)

Requires CUDA installed with nvcc on your PATH (or set $CUDA_HOME).

pip install cython numpy
pip install -e .

CPU backend (Rust)

Requires a Rust toolchain and maturin:

pip install maturin
cd rust && maturin develop --release

This builds the periodfind.cpu module using Rayon for multithreaded parallelism. No GPU needed.

Python API

Ensure that Cython and numpy are both installed, then run:

python setup.py install

periodfind should now be installed.

Testing

Run the full test suite with pytest:

pytest tests/ -v

Tests are organized into four categories:

  • Unit tests (test_periodfind.py): Statistics, Periodogram, and utility tests (no GPU or Rust needed)
  • CPU standalone tests (test_cpu_standalone.py): Tests for the Rust CPU backend (period-finding algorithms)
  • Fourier tests (test_fourier.py): Tests for Fourier decomposition (output shape, known signal recovery, edge cases, input validation)
  • GPU integration tests (test_cpu_vs_cuda.py): CUDA algorithm tests (auto-skipped if no GPU is available)

To run only CPU tests (no GPU required):

pytest tests/test_periodfind.py tests/test_cpu_standalone.py tests/test_fourier.py -v

CI

GitHub Actions runs CPU tests automatically on every push and PR. See .github/workflows/tests.yml. GPU tests run on self-hosted runners when available.

Compatibility

This package has been tested only on Linux hosts running CUDA 10.2 and CUDA 11. Other operating systems and CUDA versions may work, but this is not guaranteed.

Acknowledgements

Funding for this project was provided by the Larson Scholar Fellowship as part of the SURF program.

License

This package is licensed under the BSD 3-clause license. The copyright holder is the California Institute of Technology (Caltech).

setup.py and MANIFEST.in are based on an example project at https://github.com/rmcgibbo/npcuda-example/, licensed under the BSD 2-clause license.
