Machine-Learning-assisted Beat and Downbeat Tracking Framework for macOS.
BeatIt is designed to stay fast on full-song material without giving up the sparse correction stages that make the result drift-resistant.
On macOS, BeatIt makes heavy use of Accelerate where that is a good fit for the DSP and feature pipeline. The project is intentionally built as a native C++/Objective-C++ pipeline rather than as a Python-first stack that orchestrates model code and feature math from the interpreter layer. In practice, that matters: even when Python/PyTorch solutions eventually call optimized math kernels underneath, the overall application path still carries Python runtime, tensor plumbing, and export/runtime indirection that BeatIt avoids in its default path.
- input: 19052559 decoded samples at 44100 Hz, playtime 432.031 sec
- total BeatIt runtime: 3.171 sec
That is roughly 136x faster than real time for a 7.2-minute song in this configuration.
BeatIt is not just a thin wrapper around the underlying models. The current pipeline adds window selection, tempo fitting, sparse edge correction, downbeat phase handling, and projected-grid cleanup on top of the raw model output.
In practice, that postprocessing is where a large part of the final quality comes from:
- it reduces drift that is still present in the raw model/default decode
- it fixes beat-phase and downbeat-phase mistakes that the model alone still makes
- it produces a much more stable projected grid over the full file
The maintained reference set in `training/test_wavs/` documents this explicitly, file by file.
Several canonical files are currently exact or perceptually exact in tempo, drift, and bar
phase after BeatIt postprocessing, while the same material was visibly worse with the model
output alone.
BeatIt is intentionally not locked to one inference stack. The project currently supports two entirely different backend families:
- CoreML for the native macOS path
- Torch for the PyTorch model path
Those backends are loaded through plugins rather than hard-linked into one monolithic binary. That matters in practice:
- the default CoreML path stays lightweight
- the heavy Torch runtime closure, easily well above 200 MB, remains optional
- the project can ship one product while still tracking multiple model ecosystems
This is not just a packaging detail. It is a deliberate architectural choice:
- BeatIt can stay fast and native on the default macOS path
- while still being able to adopt new model exports, new research code, or new runtime stacks when machine-learning approaches to tempo, beat, and downbeat detection move forward
That flexibility is what lets BeatIt combine a strongly optimized native path with the ability to follow the latest model trends instead of freezing the project around one runtime forever.
For normal single-song analysis, BeatIt does not trust one dense window blindly. The high-level sparse workflow is:
1. pick widely separated measurement regions
   - BeatIt starts from left and right anchor regions rather than dense full-file inference
   - this reduces long-range drift risk immediately
2. quality-gate those regions
   - weak intros, weak outros, long breaks, and unstable onset regions are rejected or moved
   - the goal is to measure where the rhythm is actually observable
3. estimate a global tempo from the anchor evidence
   - the separated windows are fused into one constant-tempo hypothesis
   - tempo-mode mismatches such as half/double-time are normalized before the final choice
4. validate the interior
   - BeatIt inspects interior regions to check whether the projected grid still matches the file away from the edges
   - those interior checks are also quality-gated instead of assuming the geometric middle is usable
5. refine the projected grid
   - sparse edge correction adjusts the global grid using the left and right usable evidence
   - downbeat phase is then preserved or corrected on top of that beat grid
6. emit one constant, drift-resistant result
   - final output is a single BPM plus projected beat/downbeat events across the whole file
That is the main reason BeatIt can outperform the raw model/default decode on tempo drift, beat phase, and downbeat phase without resorting to dense full-file processing.
BeatIt assumes constant tempo for normal single-song material and fits one global tempo from multiple sparse measurements instead of trusting a single local estimate.
The high-level fitting process is:
1. measure tempo candidates in separated windows
   - left and right anchor regions provide independent local tempo evidence
2. normalize tempo modes
   - half-time, double-time, triplet-like, and related mode errors are mapped into the same tempo neighborhood before comparison
3. fuse the anchor evidence
   - BeatIt chooses the tempo that best explains both anchors together rather than whichever window looked strongest in isolation
4. project a constant grid across the file
   - the chosen BPM is used to generate one global beat grid over the full duration
5. validate and lightly correct
   - interior checks and sparse edge fitting verify whether that constant grid still lands on the actual rhythmic material
   - if not, BeatIt applies small corrections instead of re-running a dense decode over the whole file
This is why BeatIt can be exact in tempo and still stay drift-resistant over the full playtime: tempo is treated as a global fit problem, not as one local guess copied across the file.
Once BeatIt has a constant-tempo hypothesis, it still does not assume the projected grid is finished. The sparse edge-correction stage is responsible for removing the remaining long-range phase error.
The high-level correction process is:
1. anchor the grid on usable left and right evidence
   - BeatIt compares the projected beat grid against the actually measurable material near the left and right sparse windows
2. estimate edge error
   - if the projected grid is slightly early or late at either edge, BeatIt measures that as a timing offset rather than immediately throwing away the whole tempo fit
3. refit the global line
   - the start and end offsets are used to slightly retune the projected constant grid across the full file
   - this is the main mechanism that removes slow accumulated drift
4. validate the interior again
   - after edge refitting, BeatIt checks whether the interior still agrees with the corrected grid
   - weak interior regions are not trusted blindly; they are quality-gated first
5. guard against phase mistakes
   - BeatIt can try small phase alternatives when the corrected beat grid still looks wrong
   - those alternatives are accepted only if they improve the full-file alignment rather than just one local window
6. preserve the musical result
   - beat drift is corrected first
   - downbeat/bar phase is then preserved or repaired on top of that corrected beat grid
This stage is the reason BeatIt can often turn a "nearly right" model decode into a grid that holds from first beat to last beat without visibly drifting away.
BeatIt treats downbeat phase as a separate problem from plain beat placement. A beat grid can have the correct tempo and still mark the wrong beat of the bar as beat one.
The high-level downbeat process is:
1. decode beat and downbeat evidence separately
   - beat activations establish the pulse
   - downbeat activations provide the bar-phase evidence
2. build the beat grid first
   - BeatIt first solves tempo and beat drift
   - downbeat handling is applied on top of that corrected beat grid rather than mixed into the initial tempo fit
3. score bar-phase candidates
   - candidate bar phases are compared against the available downbeat evidence
   - if downbeat evidence is weak, BeatIt falls back to beat-structure heuristics instead of blindly trusting noise
4. reject musically implausible starts
   - the first projected full bar is checked against the observed material
   - this prevents obvious one-beat-early or one-beat-late bar starts from surviving just because the pulse itself looked plausible
5. preserve bar phase through later correction
   - once a bar phase has been chosen, BeatIt tries to keep that phase stable while sparse edge correction refines the beat grid
   - this avoids fixing drift only to lose the correct downbeat again
6. emit beat and downbeat events together
   - final output keeps one constant beat grid plus a consistent bar-phase assignment across the file
This separation is important: tempo accuracy, beat phase, and downbeat phase are related, but they are not the same problem and they are not solved by the same evidence.
```
cmake -S . -B build -G Ninja
cmake --build build
```

BeatIt loads inference backends through a plugin layer instead of hard-linking every backend into the main binary/framework.
Current backend plugins:
- CoreML
- Torch
This has two practical benefits:
- the default CLI/framework can start without dragging optional backend runtimes into process startup
- backend-specific dependencies, especially the Torch runtime closure, stay isolated behind the backend that actually needs them
That plugin split is the reason the CoreML path can stay lightweight while Torch remains available as an optional backend.
Basic:

```
./build/beatit --input /path/to/audio.wav
```

Model selection:

```
# BeatThis CoreML (default)
./build/beatit --input training/manucho.wav

# BeatTrack CoreML
./build/beatit --input training/manucho.wav --beattrack

# BeatThis Torch
./build/beatit --input training/manucho.wav --backend torch --torch-model models/BeatThis_small0.pt
```

Important options:

- `-i, --input <path>`
- `--backend <coreml|torch>`
- `--preset <beattrack|beatthis>`
- `--device <auto|cpu|gpu|neural>`
- `--model <path>`
- `--dump-events`
- `--min-bpm <bpm>` / `--max-bpm <bpm>` (validated in `[70, 180]`)
- `--dbn` / `--no-dbn`
- `--log-level <error|warn|info|debug>`
- `--model-info`
Use `./build/beatit --help` for the authoritative option list.
Device guidance:

- CoreML:
  - default: `--device auto`
  - recommendation: keep `auto` for normal use
  - `auto` maps to CoreML `MLComputeUnitsAll`, so CoreML may choose CPU, GPU, Neural Engine, or a mixed execution plan
  - use `--device gpu` or `--device neural` only for explicit benchmarking or backend comparison
  - use `--device cpu` for debugging, deterministic fallback, or CI-style CPU validation
- Torch:
  - default: `--device auto`
  - recommendation: use `--device gpu` on Apple Silicon when validating Torch MPS explicitly
  - current Torch mapping treats `auto` and `gpu` as "prefer MPS"
  - if MPS is unavailable or unsupported for a given model/runtime combination, BeatIt falls back to CPU
  - use `--device cpu` when you want the most conservative Torch path
  - `--device neural` has no Torch equivalent and currently falls back to CPU with a warning
```cpp
#include "beatit/stream.h"

beatit::BeatitConfig cfg;
if (auto preset = beatit::make_coreml_preset("beatthis")) {
    preset->apply(cfg);
}

beatit::BeatitStream stream(sample_rate, cfg, true);

double start_s = 0.0;
double duration_s = 0.0;
if (stream.request_analysis_window(&start_s, &duration_s)) {
    beatit::AnalysisResult result =
        stream.analyze_window(start_s, duration_s, total_duration_s, provider);
}
```

Contract notes:

- `request_analysis_window(...)` returns the preferred seed window size/start.
- `analyze_window(...)` is a single call from the integrator side.
- In sparse mode, BeatIt may call `provider(start, duration, out_samples)` multiple times internally (left/right/interior probes) before returning the final result.
- The provider must therefore be re-entrant for arbitrary `(start, duration)` requests within the file bounds.
Provider contract:

`BeatitStream::SampleProvider`:

```cpp
using SampleProvider =
    std::function<std::size_t(double start_seconds,
                              double duration_seconds,
                              std::vector<float>* out_samples)>;
```

What BeatIt expects:

- `start_seconds` / `duration_seconds` are absolute requests on the original file timeline.
- Returned samples must be mono float PCM at the same sample rate used to construct `BeatitStream`.
- The callback must fill `*out_samples` and return the exact valid sample count.
- On out-of-range or read failure: clear `*out_samples` and return `0`.
- The callback can be called multiple times per `analyze_window(...)` call in sparse mode, so keep it re-entrant and deterministic.
The packaged BeatIt.framework ships with both backend plugins:
- CoreML plugin
- Torch plugin
It also ships the Torch runtime closure needed by the Torch plugin.
If you only use the default CoreML path, you can remove the Torch side from the framework and still keep full CoreML functionality. Concretely, the CoreML-only framework can drop:
- `Resources/plugins/libbeatit_backend_torch.dylib`
- the bundled Torch dylibs in `Resources/plugins/`, such as:
  - `libtorch.dylib`
  - `libtorch_cpu.dylib`
  - `libtorch_global_deps.dylib`
  - `libc10.dylib`
  - and any additional Torch dependency closure copied beside them
- `Resources/models/BeatThis_small0.pt`

What must remain for the CoreML path:

- `Resources/plugins/libbeatit_backend_coreml.dylib`
- `Resources/models/BeatThis_small0.mlpackage`
So if your app never selects the Torch backend, you can trim the framework footprint substantially without losing the default BeatIt experience.
Run all:

```
ctest --test-dir build --output-on-failure
```

CPU-only (for environments where GPU/MPS is unavailable):

```
BEATIT_TEST_CPU_ONLY=1 ctest --test-dir build --output-on-failure
```

- BeatTrack — Matthew Rice — https://github.com/mhrice/BeatTrack — MIT
- Beat This! — Francesco Foscarin, Jan Schlueter, Gerhard Widmer — https://github.com/CPJKU/beat_this — MIT
