BernardoOlisan/shazam-algorithm

Shazam's Clone Algorithm

An audio fingerprinting and recognition system built from scratch in C and Python. It identifies songs by computing spectral fingerprints from WAV audio files and matching them against a reference database — the same core algorithm behind Shazam.

How It Works

The system follows a classic audio fingerprinting pipeline:

  1. WAV Parsing — Reads raw PCM audio data from .wav files (16-bit, mono).
  2. STFT (Short-Time Fourier Transform) — Splits the audio into overlapping windows (2048 samples, 75% overlap) and applies the Cooley-Tukey radix-2 FFT with a Hann window to each frame.
  3. Max Filter — A 2D max filter (radius 20) is applied across time and frequency axes to suppress noise and emphasize dominant features.
  4. Peak Detection — Points where the original magnitude equals the max-filtered value are identified as spectral peaks (constellation points).
  5. Hashing — Peak pairs are formed using a target zone (fan-out of up to 10 pairs per anchor) and encoded into 30-bit hashes: (f_anchor << 20) | (f_target << 10) | delta_t.
  6. Matching — A Python script compares hashes from a sample against reference hashes. A strong spike in the delta_t histogram confirms a match.
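
The hash layout in step 5 can be illustrated with a short Python sketch. It packs and unpacks the three 10-bit fields as described above and pairs each anchor with up to 10 later peaks. The function names, the `max_dt` bound, and the choice of "next peaks in time order" as the target zone are assumptions made for illustration; the actual target-zone rules are whatever main.c implements.

```python
FAN_OUT = 10          # up to 10 pairs per anchor (from the description above)
FIELD_BITS = 10       # each field takes 10 bits of the 30-bit hash
FIELD_MASK = (1 << FIELD_BITS) - 1


def pack_hash(f_anchor, f_target, delta_t):
    """Encode (f_anchor << 20) | (f_target << 10) | delta_t."""
    for value in (f_anchor, f_target, delta_t):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError("each field must fit in 10 bits")
    return (f_anchor << 20) | (f_target << 10) | delta_t


def unpack_hash(h):
    """Recover (f_anchor, f_target, delta_t) from a packed hash."""
    return (h >> 20) & FIELD_MASK, (h >> 10) & FIELD_MASK, h & FIELD_MASK


def pair_peaks(peaks, max_dt=FIELD_MASK):
    """Return (hash, anchor_time) for each anchor/target pair.

    peaks: list of (time_frame, freq_bin) tuples.
    Assumption: the target zone is simply the next peaks in time order
    with 0 < delta_t <= max_dt; the real zone in main.c may differ.
    """
    peaks = sorted(peaks)
    out = []
    for i, (t_a, f_a) in enumerate(peaks):
        paired = 0
        for t_b, f_b in peaks[i + 1:]:
            dt = t_b - t_a
            if dt <= 0:
                continue  # same frame as the anchor, skip
            if dt > max_dt or paired == FAN_OUT:
                break     # outside the zone, or fan-out limit reached
            out.append((pack_hash(f_a, f_b, dt), t_a))
            paired += 1
    return out
```

Storing the anchor time next to each hash is what lets the matching step compare where a hash occurs in the sample with where it occurs in a reference song.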

The C program also renders an interactive spectrogram visualization using SDL2, with detected peaks highlighted in cyan.
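
The highlighted peaks come from steps 3 and 4 of the pipeline. As a rough illustration in Python (not the C code from main.c), the sketch below applies a 2D max filter to a small magnitude grid and keeps the cells that equal their own filtered value. The square neighbourhood, the clipping at the edges, the `min_magnitude` floor, and the function names are all assumptions; the project may define its radius-20 neighbourhood differently.

```python
def max_filter_2d(spec, radius):
    """Return a grid where each cell holds the maximum of its
    (2*radius+1) x (2*radius+1) neighbourhood, clipped at the edges.

    spec: list of rows, indexed as spec[time_frame][freq_bin].
    """
    rows, cols = len(spec), len(spec[0])
    out = [[0.0] * cols for _ in range(rows)]
    for t in range(rows):
        t_lo, t_hi = max(0, t - radius), min(rows, t + radius + 1)
        for f in range(cols):
            f_lo, f_hi = max(0, f - radius), min(cols, f + radius + 1)
            out[t][f] = max(
                spec[i][j]
                for i in range(t_lo, t_hi)
                for j in range(f_lo, f_hi)
            )
    return out


def find_peaks(spec, radius=20, min_magnitude=0.0):
    """Return (time_frame, freq_bin) for cells equal to the filtered value.

    min_magnitude is an assumed floor so that silent, all-zero regions
    are not reported as peaks.
    """
    filtered = max_filter_2d(spec, radius)
    return [
        (t, f)
        for t, row in enumerate(spec)
        for f, mag in enumerate(row)
        if mag > min_magnitude and mag == filtered[t][f]
    ]
```

This brute-force version is only meant to show the idea. Its cost grows with the square of the radius, so a real implementation would normally use a faster filter.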

Project Structure

shazam-clone/
├── main.c                  # Main program: fingerprinting pipeline + spectrogram visualization
├── fourier_transform.c/h   # DFT and Cooley-Tukey FFT implementations
├── helpers.c/h             # Hann window and complex magnitude utilities
├── matches.py              # Python script: hash matching + delta_t histogram plots
├── requirements.txt        # Python dependencies (matplotlib)
├── fonts/
│   └── Helvetica.ttc       # Font for spectrogram axis labels
├── songs/                  # Full reference songs (WAV format)
│   ├── 0.wav
│   └── 1.wav
├── samples/                # Short audio clips to identify
│   └── 0.wav
├── hashes0.json            # Pre-computed fingerprints for songs/0.wav
├── hashes1.json            # Pre-computed fingerprints for songs/1.wav
└── hashes0_sample.json     # Pre-computed fingerprints for samples/0.wav

Prerequisites

  • C compiler (gcc or clang)
  • SDL2 and SDL2_ttf — for spectrogram visualization
  • Python 3 with matplotlib — for hash matching

Install dependencies (macOS with Homebrew)

brew install sdl2 sdl2_ttf
pip install -r requirements.txt

Usage

Step 1: Fingerprint a Song

Edit the song_dir, hashes_file, and song_id variables at the top of main() in main.c to point to the WAV file you want to fingerprint:

char song_dir[256] = "songs/0.wav";
char hashes_file[256] = "hashes0.json";
int song_id = 0;

Compile and run:

gcc main.c helpers.c fourier_transform.c \
    -I$(brew --prefix sdl2)/include/SDL2 \
    -I$(brew --prefix sdl2_ttf)/include/SDL2 \
    -L$(brew --prefix sdl2)/lib \
    -L$(brew --prefix sdl2_ttf)/lib \
    -lSDL2 -lSDL2_ttf -o main

./main

This will:

  • Compute the STFT, detect peaks, and generate fingerprint hashes
  • Save them to the specified JSON file (e.g., hashes0.json)
  • Open a spectrogram window showing the frequency content and detected peaks (cyan dots)

Close the spectrogram window to exit.

Repeat this for each song in your database, incrementing song_id and changing the file paths.

Step 2: Fingerprint a Sample

Do the same for a short audio clip you want to identify:

char song_dir[256] = "samples/0.wav";
char hashes_file[256] = "hashes0_sample.json";
int song_id = 0;

Recompile and run ./main.

Step 3: Match the Sample Against Reference Songs

python matches.py -sh hashes0_sample.json -h hashes0.json hashes1.json

  • -sh: Path to the sample's hash file
  • -h: One or more reference hash files to compare against

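The script displays a histogram of delta_t values (time offset differences) for each reference song. Here delta_t is the difference between a hash's time in the reference song and its time in the sample, not the anchor-to-target delta_t packed into each hash. A tall, narrow spike means that many matching hashes agree on the same offset, which indicates that the sample matches that song. The spike's position corresponds to where in the song the sample was taken.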
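
The offset histogram can be sketched in a few lines of Python. This is not the code from matches.py: it assumes the fingerprints have already been loaded into lists of (hash, time_frame) pairs, and the function name `best_offset` is hypothetical. The layout of the hashes*.json files is not described here, so the sketch does not parse them.

```python
from collections import Counter


def best_offset(sample, reference):
    """Find the most common time offset between shared hashes.

    sample, reference: lists of (hash, time_frame) pairs.
    Returns (offset, votes), where offset is reference_time - sample_time,
    or (None, 0) if the two lists share no hashes.
    """
    # Index the reference song: hash -> every time at which it occurs.
    times_by_hash = {}
    for h, t_ref in reference:
        times_by_hash.setdefault(h, []).append(t_ref)

    # Each shared hash votes for the offset it implies.
    histogram = Counter()
    for h, t_sample in sample:
        for t_ref in times_by_hash.get(h, ()):
            histogram[t_ref - t_sample] += 1

    if not histogram:
        return None, 0
    return histogram.most_common(1)[0]
```

For a true match, most votes pile up on one offset. For an unrelated song, the few accidental hash collisions spread over many offsets, so no bin stands out.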

Audio Format Requirements

Input files must be WAV format with:

  • 16-bit signed integer PCM
  • Mono channel (single channel)
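
Before fingerprinting, it can help to confirm that a file meets these requirements. The following is a minimal sketch using Python's standard `wave` module; the helper name `check_wav` is hypothetical and is not part of this repository.

```python
import wave


def check_wav(path):
    """Return a list of problems; an empty list means the file
    looks like 16-bit mono PCM."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_bytes = wav.getsampwidth()  # bytes per sample, 2 for 16-bit
        if channels != 1:
            problems.append(f"expected 1 channel, found {channels}")
        if sample_bytes != 2:
            problems.append(f"expected 16-bit samples, found {8 * sample_bytes}-bit")
    return problems
```

Calling `check_wav("songs/0.wav")` should return an empty list for a valid input. `wave.open` raises `wave.Error` for files it cannot read as WAV.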

You can convert audio files using ffmpeg:

ffmpeg -i input.mp3 -ar 44100 -ac 1 -sample_fmt s16 output.wav
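
With the 44100 Hz sample rate from the ffmpeg command and the STFT parameters listed under How It Works (2048-sample window, 75% overlap), the time and frequency resolution can be worked out directly. The Python sketch below is only arithmetic. The frame count assumes that no padding is applied to the end of the signal, which may not match main.c.

```python
SAMPLE_RATE = 44100   # Hz, from the ffmpeg command above
WINDOW = 2048         # samples per STFT frame
OVERLAP = 0.75        # 75% overlap between consecutive frames

hop = int(WINDOW * (1 - OVERLAP))    # samples between frame starts
bin_width_hz = SAMPLE_RATE / WINDOW  # frequency spacing between FFT bins
hop_seconds = hop / SAMPLE_RATE      # time step between frames


def frame_count(num_samples):
    """Number of full frames that fit (assumes no padding)."""
    if num_samples < WINDOW:
        return 0
    return 1 + (num_samples - WINDOW) // hop


print(hop, round(bin_width_hz, 2), round(hop_seconds * 1000, 2))
# prints: 512 21.53 11.61
```

Each frame therefore advances by 512 samples (about 11.6 ms), and each FFT bin covers about 21.5 Hz. If only the lower WINDOW // 2 = 1024 bins are used, a bin index from 0 to 1023 fits in one of the 10-bit frequency fields of the hash.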
