Machine-Learning-assisted Beat and Downbeat Tracking Framework for macOS.
BeatIt is designed to stay fast on full-song material without giving up the sparse correction stages that make the result drift-resistant.
On macOS, BeatIt makes heavy use of Accelerate where that is a good fit for the DSP and feature pipeline. The project is intentionally built as a native C++/Objective-C++ pipeline rather than as a Python-first stack that orchestrates model code and feature math from the interpreter layer. In practice, that matters: even when Python/PyTorch solutions eventually call optimized math kernels underneath, the overall application path still carries Python runtime, tensor plumbing, and export/runtime indirection that BeatIt avoids in its default path.
- input: 19052559 decoded samples at 44100 Hz, playtime 432.031 sec
- total BeatIt runtime: 3.171 sec
That is roughly 136x faster than real time for a 7.2-minute song in this configuration.
BeatIt is not just a thin wrapper around the underlying models. The current pipeline adds window selection, tempo fitting, sparse edge correction, downbeat phase handling, and projected-grid cleanup on top of the raw model output.
In practice, that postprocessing is where a large part of the final quality comes from:
- it reduces drift that is still present in the raw model/default decode
- it fixes beat-phase and downbeat-phase mistakes that the model alone still makes
- it produces a much more stable projected grid over the full file
The maintained reference set in `training/test_wavs/` documents this explicitly, file by file.
Several canonical files are currently exact or perceptually exact in tempo, drift, and bar
phase after BeatIt postprocessing, while the same material was visibly worse with the model
output alone.
BeatIt is intentionally not locked to one inference stack. The project currently supports two entirely different backend families:
- CoreML for the native macOS path
- Torch for the PyTorch model path
Those backends are loaded through plugins rather than hard-linked into one monolithic binary. That matters in practice:
- the default CoreML path stays lightweight
- the heavy Torch runtime closure, easily well above 200 MB, remains optional
- the project can ship one product while still tracking multiple model ecosystems
This is not just a packaging detail. It is a deliberate architectural choice:
- BeatIt can stay fast and native on the default macOS path
- while still being able to adopt new model exports, new research code, or new runtime stacks when machine-learning approaches to tempo, beat, and downbeat detection move forward
That flexibility is what lets BeatIt combine a strongly optimized native path with the ability to follow the latest model trends instead of freezing the project around one runtime forever.
For normal single-song analysis, BeatIt does not trust one dense window blindly. The high-level sparse workflow is:
1. pick widely separated measurement regions
   - BeatIt starts from left and right anchor regions rather than dense full-file inference
   - this reduces long-range drift risk immediately
2. quality-gate those regions
   - weak intros, weak outros, long breaks, and unstable onset regions are rejected or moved
   - the goal is to measure where the rhythm is actually observable
3. estimate a global tempo from the anchor evidence
   - the separated windows are fused into one constant-tempo hypothesis
   - tempo-mode mismatches such as half/double-time are normalized before the final choice
4. validate the interior
   - BeatIt inspects interior regions to check whether the projected grid still matches the file away from the edges
   - those interior checks are also quality-gated instead of assuming the geometric middle is usable
5. refine the projected grid
   - sparse edge correction adjusts the global grid using the left and right usable evidence
   - downbeat phase is then preserved or corrected on top of that beat grid
6. emit one constant, drift-resistant result
   - final output is a single BPM plus projected beat/downbeat events across the whole file
That is the main reason BeatIt can outperform the raw model/default decode on tempo drift, beat phase, and downbeat phase without resorting to dense full-file processing.
BeatIt assumes constant tempo for normal single-song material and fits one global tempo from multiple sparse measurements instead of trusting a single local estimate.
The high-level fitting process is:
1. measure tempo candidates in separated windows
   - left and right anchor regions provide independent local tempo evidence
2. normalize tempo modes
   - half-time, double-time, triplet-like, and related mode errors are mapped into the same tempo neighborhood before comparison
3. fuse the anchor evidence
   - BeatIt chooses the tempo that best explains both anchors together rather than whichever window looked strongest in isolation
4. project a constant grid across the file
   - the chosen BPM is used to generate one global beat grid over the full duration
5. validate and lightly correct
   - interior checks and sparse edge fitting verify whether that constant grid still lands on the actual rhythmic material
   - if not, BeatIt applies small corrections instead of re-running a dense decode over the whole file
This is why BeatIt can be exact in tempo and still stay drift-resistant over the full playtime: tempo is treated as a global fit problem, not as one local guess copied across the file.
Once BeatIt has a constant-tempo hypothesis, it still does not assume the projected grid is finished. The sparse edge-correction stage is responsible for removing the remaining long-range phase error.
The high-level correction process is:
1. anchor the grid on usable left and right evidence
   - BeatIt compares the projected beat grid against the actually measurable material near the left and right sparse windows
2. estimate edge error
   - if the projected grid is slightly early or late at either edge, BeatIt measures that as a timing offset rather than immediately throwing away the whole tempo fit
3. refit the global line
   - the start and end offsets are used to slightly retune the projected constant grid across the full file
   - this is the main mechanism that removes slow accumulated drift
4. validate the interior again
   - after edge refitting, BeatIt checks whether the interior still agrees with the corrected grid
   - weak interior regions are not trusted blindly; they are quality-gated first
5. guard against phase mistakes
   - BeatIt can try small phase alternatives when the corrected beat grid still looks wrong
   - those alternatives are accepted only if they improve the full-file alignment rather than just one local window
6. preserve the musical result
   - beat drift is corrected first
   - downbeat/bar phase is then preserved or repaired on top of that corrected beat grid
This stage is the reason BeatIt can often turn a "nearly right" model decode into a grid that holds from first beat to last beat without visibly drifting away.
BeatIt treats downbeat phase as a separate problem from plain beat placement. A beat grid can have the correct tempo and still mark the wrong beat of the bar as beat one.
The high-level downbeat process is:
1. decode beat and downbeat evidence separately
   - beat activations establish the pulse
   - downbeat activations provide the bar-phase evidence
2. build the beat grid first
   - BeatIt first solves tempo and beat drift
   - downbeat handling is applied on top of that corrected beat grid rather than mixed into the initial tempo fit
3. score bar-phase candidates
   - candidate bar phases are compared against the available downbeat evidence
   - if downbeat evidence is weak, BeatIt falls back to beat-structure heuristics instead of blindly trusting noise
4. reject musically implausible starts
   - the first projected full bar is checked against the observed material
   - this prevents obvious one-beat-early or one-beat-late bar starts from surviving just because the pulse itself looked plausible
5. preserve bar phase through later correction
   - once a bar phase has been chosen, BeatIt tries to keep that phase stable while sparse edge correction refines the beat grid
   - this avoids fixing drift only to lose the correct downbeat again
6. emit beat and downbeat events together
   - final output keeps one constant beat grid plus a consistent bar-phase assignment across the file
This separation is important: tempo accuracy, beat phase, and downbeat phase are related, but they are not the same problem and they are not solved by the same evidence.
```
cmake -S . -B build -G Ninja
cmake --build build
```

BeatIt loads inference backends through a plugin layer instead of hard-linking every backend into the main binary/framework.
Current backend plugins:
- CoreML
- Torch
This has two practical benefits:
- the default CLI/framework can start without dragging optional backend runtimes into process startup
- backend-specific dependencies, especially the Torch runtime closure, stay isolated behind the backend that actually needs them
That plugin split is the reason the CoreML path can stay lightweight while Torch remains available as an optional backend.
Basic:

```
./build/beatit --input /path/to/audio.wav
```

Model selection:

```
# BeatThis CoreML (default)
./build/beatit --input training/manucho.wav

# BeatTrack CoreML
./build/beatit --input training/manucho.wav --beattrack

# BeatThis Torch
./build/beatit --input training/manucho.wav --backend torch --torch-model models/BeatThis_small0.pt
```

Important options:

- `-i, --input <path>`
- `--backend <coreml|torch>`
- `--preset <beattrack|beatthis>`
- `--device <auto|cpu|gpu|neural>`
- `--model <path>`
- `--dump-events`
- `--min-bpm <bpm>` / `--max-bpm <bpm>` (validated in `[70, 180]`)
- `--dbn` / `--no-dbn`
- `--log-level <error|warn|info|debug>`
- `--model-info`
Use `./build/beatit --help` for the authoritative option list.
Device guidance:

- CoreML:
  - default: `--device auto`
  - recommendation: keep `auto` for normal use
  - `auto` maps to CoreML `MLComputeUnitsAll`, so CoreML may choose CPU, GPU, Neural Engine, or a mixed execution plan
  - use `--device gpu` or `--device neural` only for explicit benchmarking or backend comparison
  - use `--device cpu` for debugging, deterministic fallback, or CI-style CPU validation
- Torch:
  - default: `--device auto`
  - recommendation: use `--device gpu` on Apple Silicon when validating Torch MPS explicitly
  - current Torch mapping treats `auto` and `gpu` as "prefer MPS"
  - if MPS is unavailable or unsupported for a given model/runtime combination, BeatIt falls back to CPU
  - use `--device cpu` when you want the most conservative Torch path
  - `--device neural` has no Torch equivalent and currently falls back to CPU with a warning
```cpp
#include "beatit/stream.h"

beatit::BeatitConfig cfg;
if (auto preset = beatit::make_coreml_preset("beatthis")) {
    preset->apply(cfg);
}

beatit::BeatitStream stream(sample_rate, cfg, true);

double start_s = 0.0;
double duration_s = 0.0;
if (stream.request_analysis_window(&start_s, &duration_s)) {
    beatit::AnalysisResult result =
        stream.analyze_window(start_s, duration_s, total_duration_s, provider);
}
```

Contract notes:

- `request_analysis_window(...)` returns the preferred seed window size/start.
- `analyze_window(...)` is a single call from the integrator side.
- In sparse mode, BeatIt may call `provider(start, duration, out_samples)` multiple times internally (left/right/interior probes) before returning the final result.
- The provider must therefore be re-entrant for arbitrary `(start, duration)` requests within the file bounds.
Provider contract:

`BeatitStream::SampleProvider`:

```cpp
using SampleProvider =
    std::function<std::size_t(double start_seconds,
                              double duration_seconds,
                              std::vector<float>* out_samples)>;
```

What BeatIt expects:

- `start_seconds` / `duration_seconds` are absolute requests on the original file timeline.
- Returned samples must be mono float PCM at the same sample rate used to construct `BeatitStream`.
- The callback must fill `*out_samples` and return the exact valid sample count.
- On out-of-range or read failure: clear `*out_samples` and return `0`.
- The callback can be called multiple times per `analyze_window(...)` call in sparse mode, so keep it re-entrant and deterministic.
The packaged BeatIt.framework ships with both backend plugins:
- CoreML plugin
- Torch plugin
It also ships the Torch runtime closure needed by the Torch plugin.
If you only use the default CoreML path, you can remove the Torch side from the framework and still keep full CoreML functionality. Concretely, the CoreML-only framework can drop:
- `Resources/plugins/libbeatit_backend_torch.dylib`
- the bundled Torch dylibs in `Resources/plugins/`, such as:
  - `libtorch.dylib`
  - `libtorch_cpu.dylib`
  - `libtorch_global_deps.dylib`
  - `libc10.dylib`
  - and any additional Torch dependency closure copied beside them
- `Resources/models/BeatThis_small0.pt`

What must remain for the CoreML path:

- `Resources/plugins/libbeatit_backend_coreml.dylib`
- `Resources/models/BeatThis_small0.mlpackage`
So if your app never selects the Torch backend, you can trim the framework footprint substantially without losing the default BeatIt experience.
Run all:

```
ctest --test-dir build --output-on-failure
```

CPU-only (for environments where GPU/MPS is unavailable):

```
BEATIT_TEST_CPU_ONLY=1 ctest --test-dir build --output-on-failure
```

- BeatTrack — Matthew Rice — https://github.com/mhrice/BeatTrack — MIT
- Beat This! — Francesco Foscarin, Jan Schlueter, Gerhard Widmer — https://github.com/CPJKU/beat_this — MIT
