Enhanced RVC Training System with 20 optimizers, full TypeScript WebUI, Python FastAPI backend, WebSocket real-time monitoring, and Google Colab support.
Colab · Install · WebUI · Optimizers · Workflow
Built on PolTrain · Side project of RVC Starter
```
┌────────────────────────────────────────┐
│          TypeScript Frontend           │
│         Next.js 16 · Port 3000         │
│  Dashboard · Config · Monitor · Guide  │
├────────────────────────────────────────┤
│            WebSocket Bridge            │
│            Bun · Port 3003             │
│ REST Proxy + WS Relay + Auto-Reconnect │
├────────────────────────────────────────┤
│             Python Backend             │
│          FastAPI · Port 7861           │
│  Training · GPU · System · WebSocket   │
└────────────────────────────────────────┘
```
| Layer | Technology | Port | Purpose |
| --- | --- | --- | --- |
| Frontend | Next.js 16, React 19, TypeScript 5, Tailwind CSS 4, shadcn/ui | 3000 | WebUI with 4 tabs |
| Bridge | Bun, native WebSocket + ws library | 3003 | REST proxy + WebSocket relay |
| Backend | Python 3.8+, FastAPI, Uvicorn, PyTorch 2.0+ | 7861 | Training pipeline + GPU management |
Open colab_webui.ipynb in Google Colab and run all cells. It automatically handles everything:
| Step | What happens |
| --- | --- |
| GPU Check | Detects GPU name, VRAM, and temperature |
| Install Deps | Installs PyTorch (CUDA), FastAPI, Node.js, and all dependencies |
| Download Models | Fetches RMVPE + ContentVec pre-trained models |
| Upload Dataset | Connects Google Drive or uploads audio files directly |
| Start Backend | Launches the Python FastAPI server on port 7861 |
| Build Frontend | Installs npm packages and starts Next.js on port 3000 |
| ngrok Tunnels | Creates public URLs for remote access from any device |
The entire process is idempotent — safe to re-run any cell. Works with Colab's free T4 GPU (16GB VRAM).
- Use the T4 GPU (free) for models up to batch size 8
- Set the optimizer to AdamW or Ranger for best results on the T4
- Use Adafactor if you hit OOM errors (lowest VRAM usage)
- Connect Google Drive for persistent storage across sessions
- 300 epochs takes roughly 1-2 hours on a T4, depending on dataset size
- Python 3.8+
- Node.js 18+ / Bun 1.0+
- PyTorch 2.0+ with CUDA support (optional; CPU/MPS also works)
- GPU: NVIDIA with CUDA 11.7+ (optional)
- RAM: 8GB+ recommended
Step 1: Clone & Install Python Dependencies
```bash
git clone https://github.com/BF667-IDLE/VCTrain.git
cd VCTrain

# Install Python ML dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# Install Python backend dependencies
pip install fastapi "uvicorn[standard]" websockets
```
Step 2: Install Frontend Dependencies
```bash
# Using bun (recommended)
bun install

# Or using npm
npm install
```
Step 3: Install WebSocket Bridge
```bash
cd mini-services/ws-bridge
bun install
cd ../..
```
Step 4: Start All Services
Open three terminal windows:
```bash
# Terminal 1 — Python Backend (port 7861)
python -m webui.server

# Terminal 2 — WebSocket Bridge (port 3003)
cd mini-services/ws-bridge && bun run dev

# Terminal 3 — Next.js Frontend (port 3000)
bun run dev
```
Navigate to http://localhost:3000 in your browser.
Note: The WebUI works in demo mode even without the Python backend running. When the backend is offline, it shows mock data with a "Backend Not Connected" banner. Start the backend to switch to live training mode.
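If you are wiring your own tooling to the backend, a quick reachability probe against the documented GET /api/health endpoint is enough to decide between live and demo mode. A minimal sketch (the timeout value here is an arbitrary choice, not from the project):

```python
import urllib.error
import urllib.request

def backend_online(base_url: str = "http://localhost:7861", timeout: float = 2.0) -> bool:
    """Return True if the FastAPI backend answers GET /api/health."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: backend offline, WebUI falls back to demo mode
        return False
```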
- Real-time experiment list fetched from the filesystem
- Active training jobs with live status (running / completed / failed)
- GPU monitoring — GPU name, memory usage, CUDA version
- Connection indicator — green dot when backend is online, gray when offline
- Quick action buttons — New Training, Compare Models, GPU Monitor
Complete form matching all train.py CLI arguments:

- Experiment directory, model name, total epochs, save interval, batch size
- Sample rate (32k / 40k / 48k), vocoder (HiFi-GAN / MRF / RefineGAN)
- Selection from all 20 optimizers with a live info panel
- Pretrained model paths, GPU device IDs, save-to-ZIP toggle
- Live CLI command preview — see the exact command that will run, with a copy button
- Starts real training via the FastAPI backend when connected
- Shows the job ID on success and auto-switches to the Monitor tab
- Real-time WebSocket metrics — losses, mel similarity, gradient norms, learning rate
- 4 interactive Recharts charts:
  - Loss Curves (discriminator, generator, mel, KL)
  - Mel Spectrogram Similarity (%) over epochs
  - Gradient Norms (generator vs discriminator)
  - Learning Rate Schedule with cosine decay visualization
- Scrollable training log viewer — see raw training output in real time
- Demo mode — shows realistic mock data when the backend is offline
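The Learning Rate Schedule chart plots cosine decay. The curve itself is simple to reproduce; the sketch below assumes a hypothetical base LR and no warmup, so the exact schedule in train.py may differ:

```python
import math

def cosine_lr(epoch: int, total_epochs: int,
              base_lr: float = 2e-4, min_lr: float = 0.0) -> float:
    """Cosine-decayed learning rate: starts at base_lr, ends at min_lr."""
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```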
- 6 quick recommendation cards: Best Overall, Fastest, Memory Efficient, Zero LR Tuning, Maximum Quality, Large Batch
- All 20 optimizers organized by category with expandable detail cards
- Star ratings for Speed, Quality, Memory Efficiency, and Stability
- Click any optimizer card to see full details: description, recommended LR range, key feature, best use case
```
VCTrain/
├── rvc/                          # Core training code (Python)
│   ├── train/
│   │   ├── train.py              # Main training script (20 optimizers)
│   │   ├── utils/
│   │   │   ├── optimizers/       # 20 optimizer implementations
│   │   │   │   ├── Adam.py
│   │   │   │   ├── AdamW.py
│   │   │   │   ├── AdamP.py
│   │   │   │   ├── AdaBelief.py
│   │   │   │   ├── AdaBeliefV2.py
│   │   │   │   ├── Adafactor.py
│   │   │   │   ├── AMSGrad.py
│   │   │   │   ├── Apollo.py
│   │   │   │   ├── CAME.py
│   │   │   │   ├── DAdaptAdam.py
│   │   │   │   ├── LAMB.py
│   │   │   │   ├── Lion.py
│   │   │   │   ├── Lookahead.py
│   │   │   │   ├── NovoGrad.py
│   │   │   │   ├── Prodigy.py
│   │   │   │   ├── RAdam.py
│   │   │   │   ├── Ranger.py
│   │   │   │   ├── SignSGD.py
│   │   │   │   ├── SGD.py
│   │   │   │   └── Sophia.py
│   │   │   ├── train_utils.py
│   │   │   └── data_utils.py
│   │   ├── preprocess/           # Audio preprocessing
│   │   ├── losses.py             # GAN loss functions
│   │   ├── mel_processing.py     # Mel spectrogram processing
│   │   └── visualization.py      # TensorBoard logging
│   ├── lib/                      # Model architectures
│   │   ├── algorithm/            # Synthesizer, discriminator, generator
│   │   └── configs/              # Sample rate configs (32k/40k/48k)
│   └── configs/                  # JSON config templates
│
├── webui/                        # Python Backend (FastAPI)
│   ├── __init__.py
│   ├── server.py                 # FastAPI server (port 7861)
│   └── requirements.txt          # fastapi, uvicorn, websockets
│
├── src/                          # TypeScript Frontend (Next.js)
│   ├── app/
│   │   ├── page.tsx              # Main page with tab navigation
│   │   ├── layout.tsx            # Root layout with QueryProvider
│   │   ├── globals.css           # Theme colors (amber/orange)
│   │   └── api/training/route.ts # CLI command generator API
│   ├── components/
│   │   ├── vctrain/              # Main tab components
│   │   │   ├── dashboard-tab.tsx
│   │   │   ├── training-config-tab.tsx
│   │   │   ├── training-monitor-tab.tsx
│   │   │   └── optimizer-guide-tab.tsx
│   │   └── ui/                   # shadcn/ui components
│   ├── lib/
│   │   ├── api.ts                # REST client + WebSocket hook
│   │   ├── store.ts              # Zustand state management
│   │   ├── query-provider.tsx    # React Query configuration
│   │   └── training-data.ts      # Optimizer definitions + mock data
│   └── types/
│       └── vctrain.ts            # TypeScript interfaces
│
├── mini-services/                # WebSocket Bridge
│   └── ws-bridge/
│       ├── index.ts              # Bridge server (port 3003)
│       ├── package.json
│       └── tsconfig.json
│
├── colab.ipynb                   # Original Colab notebook (CLI)
├── colab_webui.ipynb             # New Colab notebook (WebUI)
├── package.json                  # Frontend dependencies
├── requirements.txt              # Python ML dependencies
└── download_files.py             # Pre-trained model downloader
```
```bash
# Default training with AdamW
python rvc/train/train.py \
  --experiment_dir "experiments" \
  --model_name "my_voice" \
  --optimizer "AdamW" \
  --total_epoch 300 \
  --batch_size 8 \
  --sample_rate 48000 \
  --gpus "0"

# With Ranger for best generalization
python rvc/train/train.py \
  --experiment_dir "experiments" \
  --model_name "my_voice" \
  --optimizer "Ranger" \
  --total_epoch 300 \
  --batch_size 8

# With Prodigy (no LR tuning needed!)
python rvc/train/train.py \
  --experiment_dir "experiments" \
  --model_name "my_voice" \
  --optimizer "Prodigy" \
  --total_epoch 300

# Memory-efficient with Adafactor
python rvc/train/train.py \
  --experiment_dir "experiments" \
  --model_name "my_voice" \
  --optimizer "Adafactor" \
  --total_epoch 300

# Multi-GPU training (GPUs 0 and 1)
python rvc/train/train.py \
  --experiment_dir "experiments" \
  --model_name "my_voice" \
  --optimizer "Sophia" \
  --gpus "0-1"
```
1. Open http://localhost:3000 (or the Colab ngrok URL)
2. Go to the Training Config tab
3. Fill in the model name and adjust parameters
4. Select your preferred optimizer from the dropdown
5. Click Start Training — it switches to the Monitor tab automatically
6. Watch the real-time charts and logs update live
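The same job can be started without the WebUI by POSTing to the backend. A sketch of building that request follows; the JSON field names mirror the train.py flags and are assumptions, not the verified request schema:

```python
import json
from urllib import request

def build_start_request(model_name: str, optimizer: str = "AdamW",
                        total_epoch: int = 300, batch_size: int = 8) -> request.Request:
    """Build a POST /api/training/start request (send it with urlopen)."""
    payload = {
        "model_name": model_name,   # field names assumed from the CLI flags
        "optimizer": optimizer,
        "total_epoch": total_epoch,
        "batch_size": batch_size,
    }
    return request.Request(
        "http://localhost:7861/api/training/start",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```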
| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/health | Backend health check |
| POST | /api/training/start | Start a new training job |
| GET | /api/training/status | Get all job statuses |
| POST | /api/training/stop/{job_id} | Stop a running job |
| DELETE | /api/training/job/{job_id} | Delete a job record |
| GET | /api/experiments | List filesystem experiments |
| GET | /api/system/info | GPU and system info |
| GET | /api/optimizers | List available optimizers |
| WS | /ws/training/{job_id} | Real-time training metrics stream |
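A client of the /ws/training/{job_id} stream just decodes JSON frames into metric updates. The field names below are hypothetical illustrations — check the backend's actual payload before relying on them:

```python
import json

def parse_metrics(frame: str) -> dict:
    """Decode one WebSocket frame into a flat metrics dict (assumed fields)."""
    msg = json.loads(frame)
    return {
        "epoch": int(msg.get("epoch", 0)),
        "loss_gen": float(msg.get("loss_gen", 0.0)),
        "loss_disc": float(msg.get("loss_disc", 0.0)),
        "lr": float(msg.get("lr", 0.0)),
    }
```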
🎯 20 Optimizers with Gradient Centralization
All custom optimizers support:

- torch._foreach acceleration for fast vectorized operations
- Optional Gradient Centralization (GC) for improved GAN training stability
- Decoupled weight decay following Loshchilov & Hutter (2019)
- Both single-tensor and foreach step implementations
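Gradient Centralization itself is a one-line transform: subtract the per-filter mean from any gradient of rank greater than 1. A numpy sketch of the idea (the in-repo versions operate on torch tensors via torch._foreach):

```python
import numpy as np

def centralize_gradient(grad: np.ndarray) -> np.ndarray:
    """Subtract the mean over all axes except the first (output) axis.
    Biases and other 1-D parameters are left untouched."""
    if grad.ndim <= 1:
        return grad
    axes = tuple(range(1, grad.ndim))
    return grad - grad.mean(axis=axes, keepdims=True)
```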
| Optimizer | Description | LR Range | Key Feature |
| --- | --- | --- | --- |
| AdamW | Adam + decoupled weight decay + GC | 1e-4 to 3e-4 | Custom impl with GC |
| Adam | Classic adaptive optimizer + GC | 1e-4 to 3e-4 | Fast convergence |
| AMSGrad | Adam with max variance tracking + GC | 1e-4 to 3e-4 | Prevents oscillations |
| RAdam | Rectified Adam + GC | 1e-4 to 3e-4 | Stable early training |
| AdaBelief | Belief-based adaptive LR + GC | 1e-4 to 3e-4 | Better generalization |
| AdaBeliefV2 | AdaBelief + AMSGrad | 1e-4 to 3e-4 | Very stable, long training |
| Adafactor | Factored moments, memory-efficient | Auto (relative step) | Lowest VRAM usage |
| NovoGrad | Normalized gradient, per-layer LR | 1e-4 to 3e-4 | Naturally per-layer adaptive |
| LAMB | Layer-wise Adaptive Moments | 1e-4 to 3e-4 | Large-batch training |
| DAdaptAdam | D-Adaptation for automatic LR | Auto (set lr=1.0) | No LR tuning needed |
| Optimizer | Description | LR Range | Key Feature |
| --- | --- | --- | --- |
| Lion | Evolved Sign Momentum + GC | 1e-5 to 5e-5 | Only stores momentum |
| SignSGD | Sign of momentum + GC | 1e-5 to 5e-5 | Ultra memory-efficient |
| Optimizer | Description | LR Range | Key Feature |
| --- | --- | --- | --- |
| Sophia | Second-order clipping (Sophia-G) + GC | 5e-5 to 2e-4 | Curvature-aware |
| CAME | Clipped Absolute Moment Estimation + GC | 5e-4 to 1e-3 | Dual variance estimates |
| Apollo | Curvature-aware near-optimal + GC | 1e-3 to 1e-2 | Approx. second-order |
| Optimizer | Description | LR Range | Key Feature |
| --- | --- | --- | --- |
| AdamP | Adam with perturbation projection + GC | 1e-4 to 3e-4 | Anti-filter-noise |
| Ranger | RAdam + Lookahead + GC | 1e-4 to 3e-4 | Best generalization |
| SGD | Nesterov momentum + GC | 1e-3 to 1e-2 | Strong regularization |
| Lookahead | Wrapper for any base optimizer | N/A | Enhances any optimizer |
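The Lookahead wrapper is simple enough to sketch: after every k fast-optimizer steps, the slow weights move a fraction alpha toward the fast weights, and the fast weights are reset to them. An illustrative numpy version (not the repo's implementation):

```python
import numpy as np

class Lookahead:
    """Lookahead update rule (Zhang et al., 2019), weights-only sketch."""

    def __init__(self, params: np.ndarray, k: int = 5, alpha: float = 0.5):
        self.slow = params.copy()
        self.k, self.alpha, self.step_count = k, alpha, 0

    def after_fast_step(self, fast: np.ndarray) -> np.ndarray:
        """Call after each inner-optimizer step; returns the (possibly synced) weights."""
        self.step_count += 1
        if self.step_count % self.k == 0:
            self.slow += self.alpha * (fast - self.slow)  # slow chases fast
            fast = self.slow.copy()                       # fast resets to slow
        return fast
```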
| Optimizer | Description | LR Range | Key Feature |
| --- | --- | --- | --- |
| Prodigy | Automatic LR via D-Adaptation + GC | Auto (set lr=1.0) | Zero-tuning |
| DAdaptAdam | D-Adaptation for Adam | Auto (set lr=1.0) | Self-adjusting |
| Use Case | Optimizer | Why |
| --- | --- | --- |
| Default / General | AdamW or Ranger | Best overall for RVC |
| Low VRAM | Adafactor | Factored moments, least memory |
| Best Quality | Sophia or CAME | Fast convergence, stable |
| No LR Tuning | Prodigy or DAdaptAdam | Auto-finds optimal LR |
| Large Batch | LAMB | Trust ratio prevents divergence |
| Fast Training | Lion or SignSGD | Minimal memory, fast per-step |
| GAN Stability | Ranger or AdamW + GC | Lookahead + GC |
| Quick Test | SGD Nesterov | Simple, strong regularization |
| Optimizer | Speed | Quality | Memory | Stability |
| --- | --- | --- | --- | --- |
| AdamW | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ |
| Adam | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ |
| AMSGrad | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| RAdam | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| Ranger | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| AdaBelief | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ |
| AdaBeliefV2 | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| Adafactor | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Apollo | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| CAME | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐⭐ |
| DAdaptAdam | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| LAMB | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ |
| NovoGrad | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Prodigy | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| Lion | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| SignSGD | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Sophia | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
1. Prepare Data — Collect clean audio files (WAV, 32kHz+). A minimum of 10 minutes of speech is recommended.
2. Preprocess — Slice audio, extract features, and build the filelist, via the command line or the WebUI.
3. Configure — Set parameters in the Training Config tab. Choose from 20 optimizers; set epochs, batch size, sample rate, and vocoder.
4. Train — Click Start Training. The backend launches training as a subprocess and streams metrics via WebSocket.
5. Monitor — Watch real-time loss curves, mel similarity, gradient norms, and learning rate in the Monitor tab.
6. Export — Download the trained model weights for inference with RVC.
- Use clean audio without background noise
- Minimum 10 minutes of speech recommended
- Consistent volume levels across samples
- Remove silence and breaths for best results
- Start with 100 epochs for quick testing
- Use 300+ epochs for production quality
- Monitor mel similarity (target: 70%+)
- Save checkpoints regularly (every 25 epochs by default)
| VRAM | Batch Size | Recommended Optimizer |
| --- | --- | --- |
| 4 GB | 2-4 | Adafactor or SignSGD |
| 8 GB | 4-8 | AdamW or Lion |
| 12 GB | 8-16 | Any optimizer |
| 16+ GB | 16-32 | Sophia or CAME |
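The table above reduces to a small lookup if you want to pick a safe starting point programmatically. This is a hypothetical helper using the lower bound of each recommended range, not something shipped with the project:

```python
def suggest_batch_size(vram_gb: float) -> int:
    """Conservative batch size per the VRAM table (lower bound of each range)."""
    if vram_gb < 8:
        return 2
    if vram_gb < 12:
        return 4
    if vram_gb < 16:
        return 8
    return 16
```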
- Prodigy / DAdaptAdam: set lr=1.0; the optimizer auto-adjusts
- Lion / SignSGD: use a lower LR than Adam (typically 10x lower)
- Sophia: an update period of 2-3 steps works best
- Ranger: a good default choice, no tuning needed
- Adafactor: uses relative_step=True for automatic LR
- CAME: a higher LR (10x base) works best due to clipping
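Adafactor's relative_step mode replaces a fixed LR with a schedule derived from the step count and parameter scale. The standard formula from Shazeer & Stern (2018), which the repo's Adafactor may follow (sketch, not the verified implementation):

```python
import math

def adafactor_relative_lr(step: int, param_rms: float, eps2: float = 1e-3) -> float:
    """Relative step size: rho_t = min(1e-2, 1/sqrt(t)), scaled by max(eps2, RMS(param))."""
    rho = min(1e-2, 1.0 / math.sqrt(step))
    return max(eps2, param_rms) * rho
```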
| Component | Technology |
| --- | --- |
| Frontend Framework | Next.js 16, React 19, TypeScript 5 |
| Styling | Tailwind CSS 4, shadcn/ui (New York) |
| Charts | Recharts |
| Animations | Framer Motion |
| State Management | Zustand, React Query (TanStack Query) |
| WebSocket Bridge | Bun, native WebSocket + ws library |
| Python Backend | FastAPI, Uvicorn, PyTorch 2.0+ |
| ML Training | PyTorch DDP, TensorBoard |
- PolTrain — Base project
- RVC — Voice conversion technology
- PyTorch — Deep learning framework
- Next.js — React framework
- FastAPI — Python web framework
- AdamW (Loshchilov & Hutter, 2019)
- Lion (Chen et al., 2023)
- Sophia (Liu et al., 2023)
- RAdam (Liu et al., 2020)
- Ranger (Less Wright, 2020)
- AdaBelief (Zhuang et al., 2020)
- Lookahead (Zhang et al., 2019)
- Prodigy (Mishchenko & Defazio, 2023)
- D-Adaptation (Defazio & Mishchenko, 2023)
- CAME (Luo et al., 2023)
- Apollo (Shi et al., 2022)
- LAMB (You et al., 2019)
- NovoGrad (Ginsburg et al., 2019)
- SignSGD (Bernstein et al., 2018)
Same license as the original PolTrain project.
Happy Training! 🎤