An audio fingerprinting and recognition system built from scratch in C and Python. It identifies songs by computing spectral fingerprints from WAV audio files and matching them against a reference database — the same core algorithm behind Shazam.
The system follows a classic audio fingerprinting pipeline:
- WAV Parsing — Reads raw PCM audio data from `.wav` files (16-bit, mono).
- STFT (Short-Time Fourier Transform) — Splits the audio into overlapping windows (2048 samples, 75% overlap) and applies the Cooley-Tukey radix-2 FFT with a Hann window to each frame.
- Max Filter — A 2D max filter (radius 20) is applied across time and frequency axes to suppress noise and emphasize dominant features.
- Peak Detection — Points where the original magnitude equals the max-filtered value are identified as spectral peaks (constellation points).
- Hashing — Peak pairs are formed using a target zone (fan-out of up to 10 pairs per anchor) and encoded into 30-bit hashes: `(f_anchor << 20) | (f_target << 10) | delta_t`.
- Matching — A Python script compares hashes from a sample against reference hashes. A strong spike in the `delta_t` histogram confirms a match.
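As a rough illustration, the peak-detection and hashing steps above can be sketched in Python. This is a sketch under stated assumptions, not the repo's C implementation: it uses `scipy.ndimage.maximum_filter` in place of the hand-rolled 2D max filter, and the function names (`stft`, `find_peaks`, `pack_hash`) are illustrative.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def stft(signal, n_fft=2048, hop=512):
    """Magnitude STFT: 2048-sample Hann windows with 75% overlap (hop = 512)."""
    window = np.hanning(n_fft)
    frames = [np.abs(np.fft.rfft(signal[i:i + n_fft] * window))
              for i in range(0, len(signal) - n_fft + 1, hop)]
    return np.array(frames)  # shape: (num_frames, n_fft // 2 + 1)

def find_peaks(spec, radius=20):
    """Constellation points: bins whose magnitude equals the local maximum
    within a (2*radius+1)-wide neighborhood across time and frequency."""
    return np.argwhere(spec == maximum_filter(spec, size=2 * radius + 1))

def pack_hash(f_anchor, f_target, delta_t):
    """30-bit hash built from three 10-bit fields, as described above."""
    return (f_anchor << 20) | (f_target << 10) | delta_t
```

Packing the fields into a single integer makes each landmark a cheap dictionary key, so matching reduces to exact hash lookups rather than comparing raw spectra.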
The C program also renders an interactive spectrogram visualization using SDL2, with detected peaks highlighted in cyan.
```
shazam-clone/
├── main.c                  # Main program: fingerprinting pipeline + spectrogram visualization
├── fourier_transform.c/h   # DFT and Cooley-Tukey FFT implementations
├── helpers.c/h             # Hann window and complex magnitude utilities
├── matches.py              # Python script: hash matching + delta_t histogram plots
├── requirements.txt        # Python dependencies (matplotlib)
├── fonts/
│   └── Helvetica.ttc       # Font for spectrogram axis labels
├── songs/                  # Full reference songs (WAV format)
│   ├── 0.wav
│   └── 1.wav
├── samples/                # Short audio clips to identify
│   └── 0.wav
├── hashes0.json            # Pre-computed fingerprints for songs/0.wav
├── hashes1.json            # Pre-computed fingerprints for songs/1.wav
└── hashes0_sample.json     # Pre-computed fingerprints for samples/0.wav
```
- C compiler (gcc or clang)
- SDL2 and SDL2_ttf — for spectrogram visualization
- Python 3 with matplotlib — for hash matching
```sh
brew install sdl2 sdl2_ttf
pip install -r requirements.txt
```

Edit the `song_dir`, `hashes_file`, and `song_id` variables at the top of `main()` in `main.c` to point to the WAV file you want to fingerprint:
```c
char song_dir[256] = "songs/0.wav";
char hashes_file[256] = "hashes0.json";
int song_id = 0;
```

Compile and run:
```sh
gcc main.c helpers.c fourier_transform.c \
    -I$(brew --prefix sdl2)/include/SDL2 \
    -I$(brew --prefix sdl2_ttf)/include/SDL2 \
    -L$(brew --prefix sdl2)/lib \
    -L$(brew --prefix sdl2_ttf)/lib \
    -lSDL2 -lSDL2_ttf -o main
./main
```

This will:
- Compute the STFT, detect peaks, and generate fingerprint hashes
- Save them to the specified JSON file (e.g., `hashes0.json`)
- Open a spectrogram window showing the frequency content and detected peaks (cyan dots)
Close the spectrogram window to exit.
Repeat this for each song in your database, incrementing `song_id` and changing the file paths.
Do the same for a short audio clip you want to identify:
```c
char song_dir[256] = "samples/0.wav";
char hashes_file[256] = "hashes0_sample.json";
int song_id = 0;
```

Recompile and run `./main`.
```sh
python matches.py -sh hashes0_sample.json -h hashes0.json hashes1.json
```

- `-sh` — Path to the sample's hash file
- `-h` — One or more reference hash files to compare against
The script will display a histogram of delta_t values (time offset differences) for each song. A tall, narrow spike in the histogram indicates the sample matches that song — the spike's position corresponds to where in the song the sample was taken from.
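A minimal sketch of this voting scheme, assuming each hash file yields a list of `(hash, t)` pairs (the actual JSON layout used by `matches.py` may differ):

```python
from collections import Counter, defaultdict

def best_offset(sample_hashes, ref_hashes):
    """Vote over delta_t = t_ref - t_sample for every hash collision;
    a dominant bin means the sample aligns with the reference at that offset.
    Returns (offset, vote_count)."""
    index = defaultdict(list)
    for h, t in ref_hashes:
        index[h].append(t)
    votes = Counter(rt - t
                    for h, t in sample_hashes
                    for rt in index.get(h, ()))
    return votes.most_common(1)[0] if votes else (None, 0)
```

Random hash collisions scatter across many `delta_t` bins, while a true match piles votes into one bin, which is exactly the tall, narrow spike the histogram shows.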
Input files must be WAV format with:
- 16-bit signed integer PCM
- Mono channel (single channel)
You can convert audio files using `ffmpeg`:

```sh
ffmpeg -i input.mp3 -ar 44100 -ac 1 -sample_fmt s16 output.wav
```
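If in doubt, a file's format can be verified with Python's standard `wave` module. This quick check is not part of the repo; `check_wav` is an illustrative helper:

```python
import wave

def check_wav(path):
    """Confirm a file is 16-bit mono PCM; return (sample_rate, num_frames)."""
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 1, "must be mono"
        assert w.getsampwidth() == 2, "must be 16-bit samples"
        return w.getframerate(), w.getnframes()
```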