
# Conformer-Based Long-Tail Automatic Chord Recognition


## Overview & Architecture

This library implements a multi-headed Conformer network designed to address the "long-tail" problem in Automatic Chord Recognition (ACR): a handful of common chords (chiefly major and minor triads) dominate annotated datasets, while rarer qualities and inversions appear too infrequently for conventional classifiers to learn them well.

Architecture Diagram
Figure 1: The multi-headed Conformer architecture branching into Root, Bass, and Quality predictions.

To counteract this imbalance, `conformer-acr` makes use of:

* **The Conformer Backbone:** Combines Convolutional Neural Networks (CNNs), which capture local acoustic texture and timbre, with Transformer self-attention, which maintains global harmonic context.
* **Structured Multi-Task Heads:** Instead of predicting a single monolithic chord string, the network branches into three distinct classification heads: **Root**, **Bass**, and **Quality**. This explicitly forces the model to understand inversions without causing a combinatorial explosion in the target vocabulary.
* **Synthetic Pre-Training (Harmonic Prior):** Because Conformers are memory- and data-hungry, the model is pre-trained on perfectly annotated synthetic multitracks (the AAM dataset) using the Bede NVLink GPU cluster. This establishes a mathematically pure "harmonic prior" before the model is fine-tuned on noisy, real-world acoustic audio.
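The multi-task branching can be sketched as three linear heads sharing one encoder output. A minimal sketch in PyTorch — the class counts below (13 roots/basses including a "no bass" class, 15 qualities) are illustrative assumptions, not the library's actual vocabulary:

```python
import torch
import torch.nn as nn

class MultiTaskChordHeads(nn.Module):
    """Three independent classifiers over a shared encoder representation.
    Splitting root/bass/quality keeps each vocabulary small, instead of
    one head over every (root, bass, quality) combination."""

    def __init__(self, d_model=256, n_roots=13, n_basses=13, n_qualities=15):
        super().__init__()
        self.root = nn.Linear(d_model, n_roots)
        self.bass = nn.Linear(d_model, n_basses)
        self.quality = nn.Linear(d_model, n_qualities)

    def forward(self, h):
        # h: (batch, frames, d_model) — per-frame encoder output
        return self.root(h), self.bass(h), self.quality(h)

heads = MultiTaskChordHeads()
h = torch.randn(2, 100, 256)  # dummy encoder output: 2 clips, 100 frames
root_logits, bass_logits, quality_logits = heads(h)
```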

## Install

```bash
# editable install (for development)
pip install -e .

# with dev tools (pytest, etc.)
pip install -e ".[dev]"
```



## Quick Start

```python
import conformer_acr as acr

# feature extraction
cqt = acr.preprocess_audio("song.mp3")

# inference (requires a trained checkpoint)
chords = acr.predict("song.mp3", checkpoint_path="model.pt")

# model
model = acr.ConformerACR(d_model=256, n_heads=4, n_layers=4)

# chord vocabulary
idx   = acr.chord_to_index("C:maj")   # → 0
label = acr.index_to_chord(0)         # → 'C:maj'
```
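For illustration, a chord ↔ index vocabulary of the kind `theory/vocabulary.py` provides might be built like this. The root/quality ordering here is an assumption chosen so that `C:maj` maps to 0; it is not the library's actual table:

```python
# Hypothetical chord vocabulary: every (root, quality) pair gets one index.
ROOTS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
QUALITIES = ["maj", "min", "7", "maj7", "min7"]  # assumed subset of qualities

CHORDS = [f"{root}:{quality}" for root in ROOTS for quality in QUALITIES]
CHORD_TO_INDEX = {chord: i for i, chord in enumerate(CHORDS)}

def chord_to_index(label: str) -> int:
    return CHORD_TO_INDEX[label]

def index_to_chord(idx: int) -> str:
    return CHORDS[idx]
```

Keeping the mapping as a flat list plus a reverse dictionary makes both directions O(1) and keeps the integer targets stable across training runs.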

## Library Structure

```
conformer_acr/
├── __init__.py          # flat public API
├── config.py            # constants (SR, CQT bins, hop length)
├── core.py              # high-level inference pipeline
├── models/
│   └── conformer.py     # ConformerACR (encoder + 3 heads)
├── data/
│   ├── dataset.py       # AAM & Isophonics Dataset classes
│   └── preprocess.py    # audio loading & CQT extraction
├── theory/
│   └── vocabulary.py    # chord ↔ integer mappings
├── training/
│   ├── trainer.py       # training loop
│   └── losses.py        # focal loss
└── utils/
    └── distributed.py   # Bede/DDP helpers
```
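The focal loss in `training/losses.py` targets the same long-tail imbalance: it down-weights examples the model already classifies confidently, so gradients concentrate on rare chord classes. A minimal sketch of the standard formulation (Lin et al., 2017) — the `gamma` default is the common choice, not necessarily this repo's setting:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Focal loss: (1 - p_t)^gamma * cross-entropy.
    With gamma=0 this reduces exactly to standard cross-entropy."""
    log_probs = F.log_softmax(logits, dim=-1)
    # log-probability of the true class for each example
    log_pt = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()

logits = torch.randn(4, 170)                # e.g. 170 chord classes (assumed)
targets = torch.randint(0, 170, (4,))
loss = focal_loss(logits, targets)
```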

## lit_review/

The `lit_review/` directory contains standalone research scripts and datasets used during the literature-review phase. It is not part of the installable library.

## Acknowledgements

This work is part of the N8 Centre for Computationally Intensive Research project "Deep Learning Models for Automatic Chord Recognition in Polyphonic Audio" for the EPSRC-funded Bede Supercomputer studentship. Supervised by Dr. Karolina Prawda.
