
FlexSED: Towards Open-Vocabulary Sound Event Detection

arXiv | Hugging Face Models | Hugging Face Space

FlexSED is an easy-to-use, open-vocabulary sound event detection (SED) system. It can be used for data annotation and labeling, and for building evaluation metrics for audio generation.

News

  • Oct 2025: 📦 Released code and pretrained checkpoint
  • Sep 2025: 🎉 FlexSED Spotlighted at WASPAA 2025

Installation

Clone the repository:

git clone https://github.com/JHU-LCAP/FlexSED.git 

Install the dependencies:

cd FlexSED
pip install -r requirements.txt

Usage

from api import FlexSED
import torch
import soundfile as sf

# load model
flexsed = FlexSED(device='cuda')

# run inference
events = ["Door", "Male Speech", "Laughter", "Dog"]
preds = flexsed.run_inference("example.wav", events)

# visualize predictions
flexsed.to_multi_plot(preds, events, fname="example")

# (Optional) visualize predictions as a video
# flexsed.to_multi_video(preds, events, audio_path="example.wav", fname="example")
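
If you also want event onset/offset timestamps rather than plots, a simple threshold-and-merge pass over the predictions works. The sketch below is a minimal example, not part of the FlexSED API: it assumes preds behaves like a (num_events, num_frames) array of per-frame probabilities on the CPU, and the frame rate and threshold shown are placeholder values to check against the FlexSED config.

import numpy as np

def probs_to_intervals(probs, frame_rate=25.0, threshold=0.5):
    # frame_rate (frames per second) and threshold are assumed values;
    # check the FlexSED config for the model's actual frame resolution.
    active = np.asarray(probs) >= threshold
    # Pad with zeros so every active run has a rising and a falling edge.
    padded = np.concatenate(([0], active.astype(int), [0]))
    edges = np.diff(padded)
    onsets = np.where(edges == 1)[0] / frame_rate
    offsets = np.where(edges == -1)[0] / frame_rate
    return list(zip(onsets, offsets))

# Print detected segments for each queried event.
for event, probs in zip(events, preds):
    for onset, offset in probs_to_intervals(probs):
        print(f"{event}: {onset:.2f}s - {offset:.2f}s")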

Training

  1. Download the AudioSet-Strong subset. The dataset is available from both WavCaps and HF-AS-Strong. Thanks to the contributors for providing these resources.

  2. Prepare metadata following the preprocessing steps. Feel free to check the processed metadata.

    (If you wish to create a validation split, remove a subset of samples from the training metadata and format them the same way as the test metadata. Recommended: ~2000 samples across ~50 sound classes; see the sketch after these steps.)

  3. Update file paths for both metadata and audio in src/configs.

  4. Extract CLAP embeddings:

    python src/prepare_clap.py
  5. Run training:

    python src/train.py
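
For the optional validation split in step 2, the sketch below shows one way to carve validation samples out of the training metadata. The file names and the "labels" field are assumptions about the metadata layout rather than the repository's actual schema; match them to the processed metadata you use, and remember to reformat the held-out records the same way as the test metadata.

import json
import random

# Hypothetical file name; point this at your training metadata.
with open("train_metadata.json") as f:
    records = json.load(f)

random.seed(0)
random.shuffle(records)

val, train, val_classes = [], [], set()
for rec in records:
    # Move clips into the validation set until it reaches ~2000 samples
    # covering ~50 sound classes; everything else stays in training.
    if len(val) < 2000 or len(val_classes) < 50:
        val.append(rec)
        val_classes.update(rec.get("labels", []))  # "labels" is an assumed field
    else:
        train.append(rec)

with open("val_metadata.json", "w") as f:
    json.dump(val, f, indent=2)
with open("train_metadata_reduced.json", "w") as f:
    json.dump(train, f, indent=2)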

Reference

If you find the code useful for your research, please consider citing:

@article{hai2025flexsed,
  title={FlexSED: Towards Open-Vocabulary Sound Event Detection},
  author={Hai, Jiarui and Wang, Helin and Guo, Weizhe and Elhilali, Mounya},
  journal={arXiv preprint arXiv:2509.18606},
  year={2025}
}
