Skip to content

BF667-IDLE/vsep

Repository files navigation

vsep logo
vsep β€” Lightning-Fast Audio Stem Separator
Wiki Β· Colab Demo Β· MIT License

Python 3.10+ License: MIT Colab Demo

vsep is a high-performance audio stem separator that splits music into individual components β€” vocals, drums, bass, and other instruments β€” using state-of-the-art AI models originally developed for Ultimate Vocal Remover (UVR). It supports 100+ pre-trained models across four major architectures (Demucs, MDX-Net, VR, and Roformer/MDXC), runs on any hardware from CPUs to NVIDIA GPUs and Apple Silicon, and offers parallel model downloads with resume support for a seamless experience. Whether you are a music producer isolating vocals for a remix, a podcaster cleaning up background noise, or a researcher experimenting with source separation, vsep provides a simple CLI and Python API to get results fast.


Table of Contents


πŸš€ Features

Feature Description
⚑ Fast Parallel Downloads Downloads models with up to 8 parallel threads and automatic resume, achieving 4–8Γ— speedup over sequential downloads.
πŸ”„ Resume Support Automatically detects and resumes interrupted downloads using HTTP Range headers.
🎯 Four Architectures First-class support for MDX-Net, VR Band Split, Demucs v4 (Hybrid Transformer), and Roformer/MDXC models.
🎚️ Ensemble Mode Combine multiple models with weighted or algorithmic ensembling (11 algorithms) for higher-quality output.
πŸ’» Universal Hardware Runs on NVIDIA CUDA, Apple MPS (Metal), AMD/Intel DirectML, or plain CPU β€” auto-detected at startup.
πŸ”Š Audio Chunking Process arbitrarily long audio files in fixed-length chunks, keeping memory usage bounded.
πŸ”§ Centralized Config All repository URLs, download settings, and tuning knobs live in config/variables.py.
🌐 Remote API Deploy as a cloud service on Modal or Google Cloud Run and call it via REST client.
πŸ“Š Model Scoring Each model ships with SDR/SIR/SAR/ISR benchmark scores to help you choose the right one.

🎯 Quick Start

Prerequisites

  • Python 3.10+ (3.13 supported with audioop-lts fallback)
  • FFmpeg β€” required by pydub for audio I/O. Install via your system package manager:
    • Ubuntu/Debian: sudo apt install ffmpeg
    • macOS: brew install ffmpeg
    • Windows: download from ffmpeg.org or choco install ffmpeg
  • PyTorch 2.3+ β€” installed automatically with requirements.txt; for GPU support see the platform-specific install guide.
  • Git β€” to clone the repository.

Installation

# 1. Clone the repository
git clone https://github.com/BF667-IDLE/vsep.git
cd vsep

# 2. Install core dependencies
pip install -r requirements.txt

# 3. Verify the installation
python -c "from separator import Separator; print('vsep installed successfully!')"
Development installation
# Includes pytest, black, and other dev tooling
pip install -r requirements-dev.txt
GPU acceleration (NVIDIA CUDA)
# Replace the CPU onnxruntime with the GPU variant
pip uninstall onnxruntime -y && pip install onnxruntime-gpu

# Ensure PyTorch was installed with CUDA support
python -c "import torch; print(torch.cuda.is_available())"

See INSTALL.md for full platform-specific instructions (Windows NVIDIA/AMD, macOS Apple Silicon, Linux CUDA).

Basic Usage (CLI)

The command-line interface lives at utils/cli.py. All examples below assume you are in the vsep/ project root.

# Separate a song using the default model (BS-Roformer)
python utils/cli.py your_song.mp3

# Use a specific model
python utils/cli.py your_song.mp3 -m UVR-MDX-NET-Inst_1.onnx

# Specify output format and directory
python utils/cli.py your_song.mp3 -m ht-demucs_ft.yaml --output_format WAV --output_dir ./output

# Extract only the vocals stem
python utils/cli.py your_song.mp3 --single_stem Vocals

# List all 100+ supported models with scores
python utils/cli.py --list_models

# Filter model list by stem type (e.g., only vocal models)
python utils/cli.py --list_models --list_stem vocals

# Filter by architecture (e.g., only Roformer models)
python utils/cli.py --list_models --list_type MDXC

# Show models grouped by task category
python utils/cli.py --list_models --list_format categories

# Download a model without separating
python utils/cli.py --download_model_only BS-Roformer-Viperx-1297.ckpt

# Enable debug logging
python utils/cli.py your_song.mp3 -d

Basic Usage (Python API)

from separator import Separator

# Initialize with defaults (auto-detects GPU, uses BS-Roformer model)
separator = Separator()

# Separate an audio file β€” returns list of output file paths
output_files = separator.separate("your_song.mp3")
print(f"Separated files: {output_files}")
# Example output: ['your_song_(Vocals).flac', 'your_song_(Instrumental).flac']

Advanced usage with custom settings:

from separator import Separator
import config.variables as cfg

# Tweak download performance
cfg.MAX_DOWNLOAD_WORKERS = 8       # More parallel download threads
cfg.DOWNLOAD_CHUNK_SIZE = 524288   # 512 KB chunks (default: 256 KB)

# Initialize with custom parameters
separator = Separator(
    model_file_dir="./models",      # Where to store/download models
    output_dir="./output",          # Where to write separated stems
    output_format="FLAC",           # Output audio format
    sample_rate=44100,              # Output sample rate
    normalization_threshold=0.9,    # Peak normalization
    use_autocast=True,              # Faster GPU inference (FP16)
    mdx_params={
        "segment_size": 256,
        "overlap": 0.25,
        "batch_size": 4,
        "enable_denoise": True,
    },
)

output_files = separator.separate("your_song.mp3")

πŸ—οΈ Supported Architectures

vsep wraps four major source-separation architectures, each with its own model format and parameter space:

Architecture File Extension Description Typical Stems
Demucs v4 (Hybrid Transformer) .yaml Meta's hybrid transformer model with multi-band processing. Highest quality for 4-stem separation. vocals, drums, bass, other
MDX-Net .onnx Open Neural Network Exchange models trained by UVR community. Fast and well-tested. vocals, instrumental
VR Band Split .pth Band-split RNN models from early UVR versions. Good compatibility, supports TTA. vocals, instrumental
Roformer / MDXC .ckpt Rotary-former models (BS-Roformer, Mel-Band-Roformer). State-of-the-art vocal quality with SDR scores up to 13+. vocals, instrumental

The architecture is auto-detected from the model filename. You do not need to specify it manually.


🎡 Model Gallery

vsep supports 100+ models from the UVR ecosystem. Here is a curated selection of the most popular and highest-scoring models:

Model Filename Architecture Stems SDR (Vocals) Notes
model_bs_roformer_ep_317_sdr_12.9755.ckpt Roformer vocals, inst 12.98 Default model β€” best overall vocal quality
Mel-Roformer-Viperx-1053.ckpt Roformer vocals, inst 12.61 Mel-band variant, excels on complex mixes
ht-demucs_ft.yaml Demucs v4 vocals, drums, bass, other 11.27 Best 4-stem separation
MDX23C-8KFFT-InstVoc_HQ.ckpt MDXC vocals, inst 11.95 High-quality instrumental extraction
UVR-MDX-NET-Inst_1.onnx MDX-Net vocals, inst 10.65 Fast and reliable classic
2_HP-UVR.pth VR vocals, inst β€” Lightweight, supports TTA & post-processing

Tip: Run python utils/cli.py --list_models to see the full list with SDR/SIR/SAR/ISR scores and filter by stem type, filename, or architecture.


βš™οΈ Configuration

All configurable values are centralized in config/variables.py. You can override them at runtime before importing the Separator:

import config.variables as cfg

# Repository URLs
cfg.UVR_PUBLIC_REPO_URL = "https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models"
cfg.UVR_VIP_REPO_URL = "https://github.com/Anjok0109/ai_magic/releases/download/v5"

# Download behavior
cfg.MAX_DOWNLOAD_WORKERS = 4      # Parallel download threads (default: 4)
cfg.DOWNLOAD_CHUNK_SIZE = 262144  # Bytes per chunk (default: 256 KB)
cfg.DOWNLOAD_TIMEOUT = 300        # Seconds before timeout (default: 300)
cfg.HTTP_POOL_CONNECTIONS = 10    # Connection pool size (default: 10)
cfg.HTTP_POOL_MAXSIZE = 10        # Max pool connections (default: 10)

You can also set the model directory via the VSEP_MODEL_DIR environment variable:

export VSEP_MODEL_DIR=/path/to/my/models
python utils/cli.py song.mp3

See config/README.md for the full configuration reference.


πŸ› οΈ Advanced Features

Ensemble Separation

Ensembling combines the outputs of multiple models to produce a single, higher-quality result. vsep supports 11 ensemble algorithms and named presets.

# Use a built-in ensemble preset
python utils/cli.py song.mp3 --ensemble_preset vocals_ensemble

# Custom ensemble: specify primary + extra models with an algorithm
python utils/cli.py song.mp3 \
  -m model_bs_roformer_ep_317_sdr_12.9755.ckpt \
  --extra_models UVR-MDX-NET-Inst_1.onnx Mel-Roformer-Viperx-1053.ckpt \
  --ensemble_algorithm median_wave

# Weighted ensemble (weights must match number of models)
python utils/cli.py song.mp3 \
  -m model1.ckpt --extra_models model2.onnx \
  --ensemble_algorithm avg_wave \
  --ensemble_weights 0.6 0.4

# List available presets
python utils/cli.py --list_presets

Available ensemble algorithms:

Algorithm Domain Behavior
avg_wave Time Average the waveforms (default)
median_wave Time Median waveform β€” removes outlier artifacts
min_wave Time Minimum amplitude per sample
max_wave Time Maximum amplitude per sample
avg_fft Frequency Average in the frequency domain
median_fft Frequency Median in the frequency domain
min_fft Frequency Minimum magnitude spectrum
max_fft Frequency Maximum magnitude spectrum
uvr_max_spec Frequency UVR-style max spectral magnitude
uvr_min_spec Frequency UVR-style min spectral magnitude
ensemble_wav Time UVR's native waveform ensembling

Chunking for Long Files

Processing a 2-hour DJ set or podcast in one pass can exhaust GPU memory. Enable chunking to split the input into fixed-length segments, process them independently, and concatenate the results:

separator = Separator(
    chunk_duration=600,  # Split into 10-minute chunks
)
output_files = separator.separate("long_podcast.mp3")
python utils/cli.py long_mix.mp3 --chunk_duration 600

Note: Chunks are concatenated without overlap or crossfade. For most use cases this produces imperceptible seams. If you hear artifacts at chunk boundaries, try shorter durations (e.g., 120–300 seconds).

Output Customization

# Output only the vocals stem (skip instrumental)
python utils/cli.py song.mp3 --single_stem Vocals

# Normalize peak amplitude to 0.9 (default)
python utils/cli.py song.mp3 --normalization 0.9

# Amplify quiet output to at least 0.5 peak
python utils/cli.py song.mp3 --amplification 0.5

# Custom output format and bitrate
python utils/cli.py song.mp3 --output_format MP3 --output_bitrate 320k

# Custom output stem names
python utils/cli.py song.mp3 --custom_output_names '{"Vocals": "lead_vocals", "Instrumental": "backing_track"}'

# Use soundfile for output (avoids OOM on very large files)
python utils/cli.py song.mp3 --use_soundfile

# Invert secondary stem via spectrogram (may improve quality)
python utils/cli.py song.mp3 --invert_spect

Remote Deployment

Deploy vsep as a cloud API to offload GPU work to a server. Two deployment targets are supported:

Modal (recommended β€” $30/month free GPU credits):

pip install modal
modal setup
python remote/deploy_modal.py deploy

Google Cloud Run:

python remote/deploy_cloudrun.py deploy

Once deployed, use the Python API client or CLI to send jobs remotely:

from remote import AudioSeparatorAPIClient

client = AudioSeparatorAPIClient("https://your-deployment.modal.run")
result = client.separate_audio_and_wait("song.mp3", model="model_bs_roformer_ep_317_sdr_12.9755.ckpt")

See remote/README.md for full deployment and API documentation.


πŸ“Š Performance Benchmarks

All benchmarks measured on a single song (~4 minutes, stereo, 44100 Hz).

Download Speed (100 MB model, 100 Mbps connection):

Method Time Speedup
Sequential (single-thread) ~60 s 1Γ—
vsep parallel (4 workers) ~15 s 4Γ—
vsep parallel (8 workers) ~8 s 7.5Γ—

Separation Speed (per 4-minute song):

Model CPU (Intel i7) GPU (NVIDIA RTX 3060)
BS-Roformer ~60 s ~15 s
Demucs v4 (ht-demucs-ft) ~30 s ~8 s
MDX-Net (UVR-MDX-NET-Inst_1) ~45 s ~12 s
VR Band Split (2_HP-UVR) ~50 s ~10 s

Your actual performance will vary depending on the specific model, audio length, sample rate, hardware, and driver versions. GPU acceleration requires CUDA (NVIDIA) or MPS (Apple Silicon) support.


πŸ“ Project Structure

vsep/
β”œβ”€β”€ config/                    # Centralized configuration
β”‚   β”œβ”€β”€ variables.py           #   All tunable settings & URLs
β”‚   β”œβ”€β”€ example_usage.py       #   Config usage examples
β”‚   β”œβ”€β”€ __init__.py            #   Package exports
β”‚   └── README.md              #   Configuration reference
β”œβ”€β”€ separator/                 # Core separation engine
β”‚   β”œβ”€β”€ separator.py           #   Main Separator class (entry point)
β”‚   β”œβ”€β”€ common_separator.py    #   Shared logic for all architectures
β”‚   β”œβ”€β”€ ensembler.py           #   Ensemble algorithm implementations
β”‚   β”œβ”€β”€ audio_chunking.py      #   Chunk-based processing for long files
β”‚   β”œβ”€β”€ architectures/         #   Per-architecture implementations
β”‚   β”‚   β”œβ”€β”€ mdx_separator.py   #     MDX-Net architecture
β”‚   β”‚   β”œβ”€β”€ vr_separator.py    #     VR Band Split architecture
β”‚   β”‚   β”œβ”€β”€ demucs_separator.py#     Demucs v4 architecture
β”‚   β”‚   └── mdxc_separator.py  #     MDXC / Roformer architecture
β”‚   β”œβ”€β”€ roformer/              #   Roformer model loader & validation
β”‚   β”‚   β”œβ”€β”€ roformer_loader.py
β”‚   β”‚   β”œβ”€β”€ parameter_validator.py
β”‚   β”‚   β”œβ”€β”€ configuration_normalizer.py
β”‚   β”‚   └── ...
β”‚   └── uvr_lib_v5/            #   UVR processing library (STFT, spectrograms, etc.)
β”‚       β”œβ”€β”€ demucs/            #     Demucs model implementations
β”‚       └── roformer/          #     Roformer network implementations
β”œβ”€β”€ remote/                    # Cloud deployment
β”‚   β”œβ”€β”€ deploy_modal.py        #   Modal.com deployment script
β”‚   β”œβ”€β”€ deploy_cloudrun.py     #   Google Cloud Run deployment script
β”‚   β”œβ”€β”€ api_client.py          #   Python API client for remote service
β”‚   β”œβ”€β”€ cli.py                 #   Remote CLI tool
β”‚   └── README.md              #   Deployment guide
β”œβ”€β”€ utils/                     # Utilities
β”‚   └── cli.py                 #   Command-line interface
β”œβ”€β”€ notebooks/                 # Jupyter / Google Colab demos
β”‚   └── vsep_demo.ipynb
β”œβ”€β”€ docs/                      # Detailed documentation
β”‚   β”œβ”€β”€ logo.svg               #   App logo (vector)
β”‚   β”œβ”€β”€ logo.png               #   App logo (raster)
β”‚   β”œβ”€β”€ API-Reference.md       #   Full API reference (Separator, CLI, config, ensemble)
β”‚   β”œβ”€β”€ Architecture.md        #   Architecture overview with diagrams
β”œβ”€β”€ wiki/                      # Wiki pages (mirrored to GitHub Wiki)
β”‚   β”œβ”€β”€ Home.md                #   Wiki home page
β”‚   β”œβ”€β”€ _Sidebar.md            #   Wiki navigation sidebar
β”‚   └── ...                    #   Installation, CLI, Models, API, etc.
β”œβ”€β”€ tools/                     # Development tools
β”‚   β”œβ”€β”€ calculate-model-hashes.py
β”‚   └── sync-to-github.py
β”œβ”€β”€ requirements.txt           # Core dependencies
β”œβ”€β”€ requirements-dev.txt       # Dev + test dependencies
β”œβ”€β”€ pyproject.toml             # Project config (black, pytest)
β”œβ”€β”€ pytest.ini                 # Test runner configuration
β”œβ”€β”€ ensemble_presets.json      # Named ensemble presets
β”œβ”€β”€ models.json                # Model registry
β”œβ”€β”€ models-scores.json         # Benchmark scores for all models
β”œβ”€β”€ model-data.json            # Model parameter metadata
β”œβ”€β”€ LICENSE                    # MIT License
β”œβ”€β”€ INSTALL.md                 # Platform-specific installation guide
β”œβ”€β”€ CONTRIBUTING.md            # Contribution guidelines
└── README.md                  # This file

πŸ”§ Development

Setting Up a Development Environment

# Clone and enter the project
git clone https://github.com/BF667-IDLE/vsep.git
cd vsep

# Create a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate   # Linux/macOS
# .venv\Scripts\activate    # Windows

# Install all dependencies (including dev tools)
pip install -r requirements-dev.txt

Running Tests

# Run the full test suite with coverage
pytest tests/ -v --cov=separator --cov-report=term-missing

# Run only a specific test file
pytest tests/unit/test_parameter_validator.py -v

Code Formatting

vsep uses Black with a 140-character line length:

# Format all Python files
black . --line-length 140

# Check without modifying
black . --line-length 140 --check

Building from Source

poetry build

See CONTRIBUTING.md for detailed contribution guidelines, code of conduct, and PR workflow.

Documentation

For in-depth documentation beyond this README:

Document Description
Wiki Complete documentation β€” Installation, CLI, Models, API, Config, Architecture, Colab, Troubleshooting
docs/API-Reference.md Full API reference β€” Separator class, CLI arguments, configuration variables, ensemble algorithms, remote API client
docs/Architecture.md Architecture overview with diagrams β€” system design, model download pipeline, hardware acceleration, all 4 separation architectures

🀝 Contributing

We welcome contributions from everyone! Whether it's a bug fix, a new feature, improved documentation, or a model compatibility patch β€” every contribution helps.

Please read CONTRIBUTING.md for the full contribution workflow, but here's a quick summary:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-feature)
  3. Make your changes and add tests where applicable
  4. Format code with black . --line-length 140
  5. Commit with a clear message (git commit -m 'Add feature X')
  6. Push to your fork (git push origin feature/my-feature)
  7. Open a Pull Request against the main branch

πŸ“„ License

This project is licensed under the MIT License β€” see the LICENSE file for the full text. You are free to use, modify, and distribute this software for personal and commercial purposes.

Note on models: Individual AI models downloaded through vsep may have their own licenses. Please check the license of each model before using it in commercial projects. VIP models require a paid subscription to Anjok07's Patreon.


πŸ™ Acknowledgments

vsep is built on the shoulders of giants. We gratefully acknowledge:

  • Anjok07 β€” Primary model trainer and creator of Ultimate Vocal Remover, whose models form the backbone of vsep. Please consider supporting UVR on Patreon to fund ongoing model training.
  • TRvlvr β€” Maintainer of the UVR model repository and application data.
  • NomadKaraoke β€” Creator of the python-audio-separator project, which vsep extends.
  • Meta Research β€” Developers of the Demucs architecture.
  • The UVR community β€” The many model trainers and contributors who make these models freely available.

πŸ’¬ Support

Channel Link
Bug Reports GitHub Issues
Feature Requests GitHub Issues
Discussions GitHub Discussions
Try It Free Google Colab Demo
Support UVR Patreon ❀️

Support original UVR project

About

Lightning-fast audio stem separator. 100+ AI models across 4 architectures (Demucs, MDX-Net, VR, Roformer). CLI, Python API, and Google Colab.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Contributors