An audio fingerprinting and recognition system built from scratch in C and Python. It identifies songs by computing spectral fingerprints from WAV audio files and matching them against a reference database — the same core algorithm behind Shazam.
The system follows a classic audio fingerprinting pipeline:
- WAV Parsing — Reads raw PCM audio data from `.wav` files (16-bit, mono).
- STFT (Short-Time Fourier Transform) — Splits the audio into overlapping windows (2048 samples, 75% overlap) and applies the Cooley-Tukey radix-2 FFT with a Hann window to each frame.
- Max Filter — A 2D max filter (radius 20) is applied across time and frequency axes to suppress noise and emphasize dominant features.
- Peak Detection — Points where the original magnitude equals the max-filtered value are identified as spectral peaks (constellation points).
- Hashing — Peak pairs are formed using a target zone (fan-out of up to 10 pairs per anchor) and encoded into 30-bit hashes: `(f_anchor << 20) | (f_target << 10) | delta_t`.
- Matching — A Python script compares hashes from a sample against reference hashes. A strong spike in the `delta_t` histogram confirms a match.
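As a rough illustration, the peak-detection and hashing steps above can be sketched in Python. This is a sketch under stated assumptions, not the repo's C implementation: it uses `scipy.ndimage.maximum_filter` in place of the hand-rolled 2D max filter, and the function names (`stft`, `find_peaks`, `pack_hash`) are illustrative.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def stft(signal, n_fft=2048, hop=512):
    """Magnitude STFT: 2048-sample Hann windows with 75% overlap (hop = 512)."""
    window = np.hanning(n_fft)
    frames = [np.abs(np.fft.rfft(signal[i:i + n_fft] * window))
              for i in range(0, len(signal) - n_fft + 1, hop)]
    return np.array(frames)  # shape: (num_frames, n_fft // 2 + 1)

def find_peaks(spec, radius=20):
    """Constellation points: bins whose magnitude equals the local maximum
    within a (2*radius+1)-wide neighborhood across time and frequency."""
    return np.argwhere(spec == maximum_filter(spec, size=2 * radius + 1))

def pack_hash(f_anchor, f_target, delta_t):
    """30-bit hash built from three 10-bit fields, as described above."""
    return (f_anchor << 20) | (f_target << 10) | delta_t
```

Packing the fields into a single integer makes each landmark a cheap dictionary key, so matching reduces to exact hash lookups rather than comparing raw spectra.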
The C program also renders an interactive spectrogram visualization using SDL2, with detected peaks highlighted in cyan.
```
shazam-clone/
├── main.c                  # Main program: fingerprinting pipeline + spectrogram visualization
├── fourier_transform.c/h   # DFT and Cooley-Tukey FFT implementations
├── helpers.c/h             # Hann window and complex magnitude utilities
├── matches.py              # Python script: hash matching + delta_t histogram plots
├── requirements.txt        # Python dependencies (matplotlib)
├── fonts/
│   └── Helvetica.ttc       # Font for spectrogram axis labels
├── songs/                  # Full reference songs (WAV format)
│   ├── 0.wav
│   └── 1.wav
├── samples/                # Short audio clips to identify
│   └── 0.wav
├── hashes0.json            # Pre-computed fingerprints for songs/0.wav
├── hashes1.json            # Pre-computed fingerprints for songs/1.wav
└── hashes0_sample.json     # Pre-computed fingerprints for samples/0.wav
```
- C compiler (gcc or clang)
- SDL2 and SDL2_ttf — for spectrogram visualization
- Python 3 with matplotlib — for hash matching
```sh
brew install sdl2 sdl2_ttf
pip install -r requirements.txt
```

Edit the `song_dir`, `hashes_file`, and `song_id` variables at the top of `main()` in `main.c` to point to the WAV file you want to fingerprint:
```c
char song_dir[256] = "songs/0.wav";
char hashes_file[256] = "hashes0.json";
int song_id = 0;
```

Compile and run:
```sh
gcc main.c helpers.c fourier_transform.c \
    -I$(brew --prefix sdl2)/include/SDL2 \
    -I$(brew --prefix sdl2_ttf)/include/SDL2 \
    -L$(brew --prefix sdl2)/lib \
    -L$(brew --prefix sdl2_ttf)/lib \
    -lSDL2 -lSDL2_ttf -o main
./main
```

This will:
- Compute the STFT, detect peaks, and generate fingerprint hashes
- Save them to the specified JSON file (e.g., `hashes0.json`)
- Open a spectrogram window showing the frequency content and detected peaks (cyan dots)
Close the spectrogram window to exit.
Repeat this for each song in your database, incrementing `song_id` and changing the file paths.
Do the same for a short audio clip you want to identify:
```c
char song_dir[256] = "samples/0.wav";
char hashes_file[256] = "hashes0_sample.json";
int song_id = 0;
```

Recompile and run `./main`.
```sh
python matches.py -sh hashes0_sample.json -h hashes0.json hashes1.json
```

- `-sh` — Path to the sample's hash file
- `-h` — One or more reference hash files to compare against
The script will display a histogram of delta_t values (time offset differences) for each song. A tall, narrow spike in the histogram indicates the sample matches that song — the spike's position corresponds to where in the song the sample was taken from.
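A minimal sketch of this voting scheme, assuming each hash file yields a list of `(hash, t)` pairs (the actual JSON layout used by `matches.py` may differ):

```python
from collections import Counter, defaultdict

def best_offset(sample_hashes, ref_hashes):
    """Vote over delta_t = t_ref - t_sample for every hash collision;
    a dominant bin means the sample aligns with the reference at that offset.
    Returns (offset, vote_count)."""
    index = defaultdict(list)
    for h, t in ref_hashes:
        index[h].append(t)
    votes = Counter(rt - t
                    for h, t in sample_hashes
                    for rt in index.get(h, ()))
    return votes.most_common(1)[0] if votes else (None, 0)
```

Random hash collisions scatter across many `delta_t` bins, while a true match piles votes into one bin, which is exactly the tall, narrow spike the histogram shows.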
Input files must be WAV format with:
- 16-bit signed integer PCM
- Mono channel (single channel)
You can convert audio files using `ffmpeg`:

```sh
ffmpeg -i input.mp3 -ar 44100 -ac 1 -sample_fmt s16 output.wav
```
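If in doubt, a file's format can be verified with Python's standard `wave` module. This quick check is not part of the repo; `check_wav` is an illustrative helper:

```python
import wave

def check_wav(path):
    """Confirm a file is 16-bit mono PCM; return (sample_rate, num_frames)."""
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 1, "must be mono"
        assert w.getsampwidth() == 2, "must be 16-bit samples"
        return w.getframerate(), w.getnframes()
```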