A real-time audio denoising system built on deep learning, featuring a custom UNet architecture designed for high-quality noise reduction
Features • Installation • Usage • Architecture • Demo
- Real-Time Processing: Live audio recording and denoising with minimal latency
- Advanced Architecture: Custom UNet design optimized for 1D audio signal processing
- Flexible Deployment: Supports both CPU and GPU inference with automatic device detection
- Seamless Audio: Processes audio in overlapping chunks for artifact-free output
- Complete Pipeline: Includes training, inference, evaluation, and visualization scripts
- Modern Web Interface: Responsive frontend for easy interaction
- Training Monitoring: Real-time loss tracking with early stopping and learning rate scheduling
| Metric | Value |
|---|---|
| Sampling Rate | 16 kHz |
| Processing Latency | ~100ms |
| Chunk Size | 32,000 samples (2 seconds) |
| Overlap | 1,600 samples (10%) |
| Model Parameters | ~2.5M trainable |
| Model Size | ~10 MB |
| Platform Support | Windows, Linux, macOS |
- Python 3.8 or higher
- CUDA-compatible GPU (optional, for training acceleration)
- 4GB+ RAM recommended
- Clone the repository:

```bash
git clone https://github.com/aadi611/ANC-M1.git
cd ANC-M1
```

- Create a virtual environment (recommended):

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Verify the installation:

```bash
python -c "import torch; print(f'PyTorch {torch.__version__} - CUDA Available: {torch.cuda.is_available()}')"
```

Quick start from Python:

```python
from denosiedaudiofinal import process_audio, load_model
import soundfile as sf

# Load the trained model
model, device = load_model('best_model.pth')

# Denoise an audio file
denoised_audio = process_audio(model, "noisy_audio.wav", device)

# Save the result
sf.write("clean_audio.wav", denoised_audio, 16000)
```

Or run the script directly:

```bash
python denosiedaudiofinal.py
```

This will:
- Record 15 seconds of audio from your microphone
- Process the audio through the UNet model
- Save the denoised output as `denoised_output.wav`
- Generate comparison plots
```bash
python training_final.py
```

Training Configuration:
- Modify `DATASET_PATHS` in `training_final.py` to point to your dataset
- Adjust hyperparameters (learning rate, batch size, epochs)
- Monitor training progress in real-time
- Best model automatically saved based on validation loss
- Open `audio_denoiser_frontend.html` in a web browser
- Choose between:
- Upload Tab: Drag & drop audio files
- Record Tab: Record directly from microphone
- Click "Denoise Audio" to process
- Download the cleaned audio
ANC-M1 is a modular audio denoising system built on PyTorch, implementing a UNet-based encoder-decoder architecture for real-time noise suppression.
```
┌───────────────────────────────────────────────────────────────┐
│                  ANC-M1 System Architecture                   │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌─────────────┐      ┌──────────────┐      ┌────────────┐    │
│  │   Input     │─────▶│ Preprocessing│─────▶│   UNet     │    │
│  │   Audio     │      │  (Chunking)  │      │   Model    │    │
│  └─────────────┘      └──────────────┘      └────────────┘    │
│                                                    │          │
│                                                    ▼          │
│  ┌─────────────┐      ┌──────────────┐      ┌────────────┐    │
│  │  Denoised   │◀─────│Postprocessing│◀─────│ Inference  │    │
│  │   Output    │      │ (Stitching)  │      │  Engine    │    │
│  └─────────────┘      └──────────────┘      └────────────┘    │
│                                                               │
└───────────────────────────────────────────────────────────────┘
```
Responsibilities:
- Load paired noisy-clean audio files
- Handle audio preprocessing (resampling, normalization)
- Implement data augmentation for training
- Provide batched data to training pipeline
Key Classes:
```
AudioDataset(Dataset)
├── __init__(clean_folder, noisy_folder, sr, target_length)
├── __len__()
├── __getitem__(idx)
└── _process_audio(audio)
```

Features:
- Automatic file pair matching
- Dynamic padding/cropping to target length
- Mono audio conversion
- Path validation with meaningful errors
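The file-pair matching and fixed-length handling above can be sketched as follows. This is an illustrative sketch, not the repo's actual internals: the helper names `match_pairs` and `fix_length` are assumptions, and the layout follows the `clean/` / `noisy/` dataset structure shown later in this README.

```python
from pathlib import Path
import numpy as np

def match_pairs(clean_folder, noisy_folder):
    """Pair clean/noisy files by filename, failing loudly on mismatches."""
    clean = {p.name: p for p in Path(clean_folder).glob("*.wav")}
    noisy = {p.name: p for p in Path(noisy_folder).glob("*.wav")}
    unpaired = set(clean) ^ set(noisy)  # names present on only one side
    if unpaired:
        raise FileNotFoundError(f"Unpaired files: {sorted(unpaired)}")
    return [(clean[name], noisy[name]) for name in sorted(clean)]

def fix_length(audio, target_length=32000):
    """Zero-pad or crop to the model's fixed chunk length."""
    if len(audio) < target_length:
        return np.pad(audio, (0, target_length - len(audio)))
    return audio[:target_length]

print(fix_length(np.ones(5), 8))  # [1. 1. 1. 1. 1. 0. 0. 0.]
```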
Architecture: UNet for 1D Audio Signals

```
Input: (batch, 1, 32000)
  │
  ├─[Encoder 1]─▶ (batch, 64, 32000) ────────┐
  │                                          │
  ├─[Encoder 2]─▶ (batch, 128, 16000) ─────┐ │
  │                                        │ │
  ├─[Encoder 3]─▶ (batch, 256, 8000) ────┐ │ │
  │                                      │ │ │
  ├─[Encoder 4]─▶ (batch, 512, 4000) ──┐ │ │ │
  │                                    │ │ │ │
  └─[Bottleneck]─▶ (batch, 512, 2000)  │ │ │ │
        │                              │ │ │ │
  [Upsample + Skip]────────────────────┘ │ │ │
        │                                │ │ │
  [Decoder 3]────────────────────────────┘ │ │
        │                                  │ │
  [Decoder 2]──────────────────────────────┘ │
        │                                    │
  [Decoder 1]────────────────────────────────┘
        │
  [Output Conv]
        │
Output: (batch, 1, 32000)
```
Key Components:
```
UNetANC(nn.Module)
├── Encoder Blocks (4 levels)
│   ├── DoubleConv (Conv1d → BatchNorm → LeakyReLU → Dropout)
│   └── MaxPool1d (downsampling)
│
├── Bottleneck
│   └── DoubleConv (feature extraction at lowest resolution)
│
├── Decoder Blocks (4 levels)
│   ├── Upsample (linear interpolation)
│   ├── Skip Connection (concatenation)
│   └── DoubleConv (reconstruction)
│
└── Output Layer
    ├── Conv1d (channel reduction)
    └── Tanh (output normalization to [-1, 1])
```

Design Rationale:
- Skip Connections: Preserve high-frequency details lost in downsampling
- LeakyReLU: Prevent dying ReLU problem, better gradient flow
- BatchNorm: Stabilize training, allow higher learning rates
- Dropout: Regularization to prevent overfitting
- 1D Convolutions: Optimized for temporal audio data
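The DoubleConv building block described above can be sketched as below. The kernel size, LeakyReLU slope, and dropout rate are illustrative assumptions, not the repo's exact values; the Conv1d → BatchNorm → LeakyReLU → Dropout ordering follows the component tree.

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two Conv1d -> BatchNorm -> LeakyReLU -> Dropout stages, the
    encoder/decoder building block described above (parameters assumed)."""
    def __init__(self, in_ch, out_ch, dropout=0.2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm1d(out_ch),
            nn.LeakyReLU(0.1),
            nn.Dropout(dropout),
            nn.Conv1d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm1d(out_ch),
            nn.LeakyReLU(0.1),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.block(x)

# Matches the first encoder stage in the diagram: (batch, 1, 32000) -> (batch, 64, 32000)
y = DoubleConv(1, 64)(torch.randn(2, 1, 32000))
print(tuple(y.shape))  # (2, 64, 32000)
```

Because `padding=1` with `kernel_size=3` preserves length, downsampling is left entirely to the MaxPool1d between stages.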
Training Pipeline:
```
Dataset Loading
      │
Data Splitting (80/20)
      │
DataLoader Creation
      │
Model Initialization
      │
┌─────────────────────────────────┐
│    Training Loop (per epoch)    │
│                                 │
│  1. Forward Pass                │
│  2. Loss Calculation (MSE)      │
│  3. Backward Pass               │
│  4. Gradient Clipping           │
│  5. Optimizer Step              │
│  6. Validation                  │
│  7. Learning Rate Scheduling    │
│  8. Checkpoint Saving           │
│  9. Early Stopping Check        │
│  10. Progress Visualization     │
└─────────────────────────────────┘
      │
Best Model Selection
```
Key Features:
- Early Stopping: Prevents overfitting (patience=5, min_delta=1e-4)
- Learning Rate Scheduling: ReduceLROnPlateau (factor=0.5, patience=3)
- Gradient Clipping: Prevents exploding gradients (max_norm=1.0)
- Checkpoint Management: Saves best model based on validation loss
- Real-time Monitoring: Loss curves plotted every epoch
Hyperparameters:
```python
{
    'learning_rate': 0.001,
    'batch_size': 32,
    'num_epochs': 50,
    'weight_decay': 1e-5,
    'optimizer': 'Adam',
    'loss_function': 'MSELoss'
}
```

Processing Pipeline:
```
Audio Input (any length)
      │
Chunk Splitting (32000 samples)
      │
┌────────────────────────┐
│   For each chunk:      │
│   1. Normalize         │
│   2. To Tensor         │
│   3. Model Forward     │
│   4. Post-process      │
└────────────────────────┘
      │
Chunk Stitching (overlap handling)
      │
Trim to Original Length
      │
Denoised Output
```
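The chunk-splitting and overlap stitching steps above can be sketched with the README's numbers (32,000-sample chunks, 1,600-sample overlap). The linear crossfade in the overlap region is an assumption; the project may weight overlaps differently, but any fade pair summing to one gives artifact-free joins.

```python
import numpy as np

CHUNK, OVERLAP = 32000, 1600
HOP = CHUNK - OVERLAP

def split_chunks(audio):
    """Split into overlapping fixed-size chunks, zero-padding the tail."""
    n = max(1, int(np.ceil((len(audio) - OVERLAP) / HOP)))
    padded = np.pad(audio, (0, n * HOP + OVERLAP - len(audio)))
    return [padded[i * HOP: i * HOP + CHUNK] for i in range(n)], len(audio)

def stitch_chunks(chunks, orig_len):
    """Overlap-add with a linear crossfade, then trim to original length."""
    fade_in = np.linspace(0.0, 1.0, OVERLAP)
    out = np.zeros(HOP * len(chunks) + OVERLAP)
    for i, c in enumerate(chunks):
        c = c.copy()
        if i > 0:
            c[:OVERLAP] *= fade_in          # fade the new chunk in
        if i < len(chunks) - 1:
            c[-OVERLAP:] *= fade_in[::-1]   # fade the old chunk out
        out[i * HOP: i * HOP + CHUNK] += c
    return out[:orig_len]

# Without a model in between, split + stitch is an identity pass-through.
audio = np.random.randn(70000)
chunks, orig_len = split_chunks(audio)
print(np.allclose(stitch_chunks(chunks, orig_len), audio))  # True
```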
Key Functions:
```python
# Model loading with error handling
load_model(model_path, device) -> (model, device)

# Audio processing in chunks
process_audio(model, audio_file, device, chunk_size, sr) -> np.ndarray

# Recording from microphone
record_audio(filename, duration, sr) -> None

# Visualization
plot_audio_comparison(noisy, denoised, sr, save_path) -> None
```

User Interface Architecture:
```
┌────────────────────────────────────────────────────────┐
│                 Web Interface (HTML5)                  │
├────────────────────────────────────────────────────────┤
│                                                        │
│   ┌─────────────┐          ┌──────────────┐            │
│   │ Upload Tab  │          │  Record Tab  │            │
│   ├─────────────┤          ├──────────────┤            │
│   │ - Drag/Drop │          │ - Mic Access │            │
│   │ - Preview   │          │ - Timer      │            │
│   │ - Waveform  │          │ - Controls   │            │
│   └─────────────┘          └──────────────┘            │
│          │                        │                    │
│          └─────────┬──────────────┘                    │
│                    ▼                                   │
│          ┌──────────────────┐                          │
│          │ Audio Processor  │                          │
│          │   (JavaScript)   │                          │
│          └──────────────────┘                          │
│                    │                                   │
│                    ▼                                   │
│          ┌──────────────────┐                          │
│          │ Results Display  │                          │
│          │  - Comparison    │                          │
│          │  - Download      │                          │
│          └──────────────────┘                          │
│                                                        │
└────────────────────────────────────────────────────────┘
```
Features:
- Responsive design (mobile & desktop)
- Real-time waveform visualization
- Drag-and-drop file upload
- Microphone recording with timer
- Side-by-side audio comparison
- One-click download
```
┌─────────────┐
│  Raw Audio  │
│   (Noisy)   │
└──────┬──────┘
       │
       ▼
┌─────────────────────┐
│   Preprocessing     │
│ - Load (librosa)    │
│ - Resample (16kHz)  │
│ - Normalize         │
└──────┬──────────────┘
       │
       ▼
┌─────────────────────┐
│   Chunking          │
│ - Split into 32k    │
│ - Add padding       │
│ - Convert to tensor │
└──────┬──────────────┘
       │
       ▼
┌─────────────────────┐
│   Model Inference   │
│ - Encoder           │
│ - Bottleneck        │
│ - Decoder           │
│ - Skip connections  │
└──────┬──────────────┘
       │
       ▼
┌─────────────────────┐
│   Postprocessing    │
│ - Concatenate       │
│ - Trim to length    │
│ - Denormalize       │
└──────┬──────────────┘
       │
       ▼
┌─────────────┐
│ Clean Audio │
│ (Denoised)  │
└─────────────┘
```
| Layer | Technology | Purpose |
|---|---|---|
| Deep Learning | PyTorch 2.0+ | Model implementation & training |
| Audio Processing | librosa, soundfile | Audio I/O and manipulation |
| Numerical Computing | NumPy | Array operations |
| Visualization | Matplotlib | Training curves & spectrograms |
| Frontend | HTML5, CSS3, JavaScript | Web interface |
| Recording | sounddevice | Microphone input |
| Data Loading | torch.utils.data | Efficient batching |
Each component (dataset, model, training, inference) is self-contained with clear interfaces.
```python
def load_model(model_path, device=None):
    """Factory function for model instantiation"""
    if device is None:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = UNetANC().to(device)
    # ... load weights
    return model, device
```

Different audio processing strategies (chunk-based, streaming) can be swapped in.
Training callbacks for early stopping, learning rate scheduling, and checkpointing.
Robust error handling at every layer:
```python
# Input validation
if not Path(audio_file).exists():
    raise FileNotFoundError(f"Audio file not found: {audio_file}")

# Device compatibility
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Graceful degradation: retry on CPU if GPU inference fails
try:
    output = model(chunk.to(device))
except RuntimeError:
    output = model.cpu()(chunk.cpu())
```

- Batch Processing: Process multiple audio chunks simultaneously
- GPU Acceleration: Automatic CUDA utilization when available
- Memory Management: Chunk-based processing for large files
- Mixed Precision: Optional FP16 training for roughly 2× speedup on supported GPUs
- Pin Memory: Faster data transfer between CPU and GPU
- Gradient Accumulation: Handle larger effective batch sizes
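The mixed-precision and gradient-accumulation items above can be sketched together. This is a hedged sketch: the small Linear model and MSE loss are stand-ins for UNetANC's training step, and autocast/GradScaler become no-ops on CPU, so the code runs anywhere but only speeds things up on a CUDA device.

```python
import torch
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(256, 256).to(device)  # stand-in for UNetANC
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device.type == "cuda"))
ACCUM_STEPS = 4  # effective batch size = batch_size * ACCUM_STEPS

optimizer.zero_grad()
for step in range(ACCUM_STEPS):
    x = torch.randn(8, 256, device=device)
    with torch.autocast(device.type, enabled=(device.type == "cuda")):
        loss = F.mse_loss(model(x), x)           # placeholder denoising loss
    scaler.scale(loss / ACCUM_STEPS).backward()  # accumulate scaled grads

scaler.unscale_(optimizer)                       # unscale before clipping
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
scaler.step(optimizer)
scaler.update()
```

Dividing the loss by `ACCUM_STEPS` keeps gradient magnitudes comparable to a single large batch.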
Current Limitations:
- Single-threaded inference
- No distributed training support
- Fixed sampling rate (16kHz)
Future Enhancements:
- Multi-GPU training with DistributedDataParallel
- REST API with FastAPI for remote inference
- Docker containerization
- Real-time streaming support with WebRTC
- Support for multiple sampling rates
- Model quantization for edge deployment
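One hedged route to the model-quantization item above is post-training dynamic quantization. The tiny Sequential model here is a stand-in for the repo's UNetANC; dynamic quantization targets Linear layers on CPU, so a convolution-heavy model would likely need static quantization instead.

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(256, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 256),
)

# Replace Linear layers with int8 dynamically-quantized equivalents
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 256)
print(quantized(x).shape)  # torch.Size([1, 256])
```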
```
ANC-M1/
├── dataset.py                     # Dataset handling and preprocessing
├── training_final.py              # Training pipeline with monitoring
├── denosiedaudiofinal.py          # Inference engine and recording
├── unet_anc_model.py              # UNet model architecture
├── audio_denoiser_frontend.html   # Web interface
├── requirements.txt               # Python dependencies
├── best_model.pth                 # Trained model checkpoint
├── training_progress.png          # Loss curves
└── README.md                      # Documentation
```
```bash
# Record and denoise
$ python denosiedaudiofinal.py
Recording for 15 seconds...
✓ Recording saved as recorded_noisy.wav
Processing audio...
✓ Model loaded on cuda
✓ Denoised audio saved as denoised_output.wav
✓ Comparison plot saved as audio_comparison.png
```

```python
from denosiedaudiofinal import process_audio, load_model
import soundfile as sf

# Load model
model, device = load_model('best_model.pth')

# Process audio
denoised = process_audio(model, "noisy.wav", device)

# Save result
sf.write("clean.wav", denoised, 16000)
```

- Open `audio_denoiser_frontend.html`
- Upload or record audio
- Click "Denoise Audio"
- Compare and download results
```
dataset/
├── clean/
│   ├── audio_001.wav
│   ├── audio_002.wav
│   └── ...
└── noisy/
    ├── audio_001.wav
    ├── audio_002.wav
    └── ...
```
Edit `training_final.py`:

```python
DATASET_PATHS = {
    'clean_testset': '/path/to/clean',
    'noisy_dataset': '/path/to/noisy'
}

# Hyperparameters
BATCH_SIZE = 32
LEARNING_RATE = 0.001
NUM_EPOCHS = 50
```

Then start training:

```bash
python training_final.py
```

Expected Output:
```
Device: cuda
GPU: NVIDIA GeForce RTX 3080
✓ Dataset loaded: 5000 samples
  Train samples: 4000, Val samples: 1000
✓ Model parameters: 2,547,201
============================================================
Epoch 1/50
============================================================
Epoch 1 [100/125] Loss: 0.023456
...
Validation loss improved: 0.034567 → 0.028901
✓ Checkpoint saved: best_model.pth
Epoch 1 Summary:
  Train Loss: 0.028234
  Val Loss: 0.028901
  LR: 0.001000
```
```python
# Modify unet_anc_model.py
model = UNetANC(
    in_channels=1,
    base_channels=64,   # Increase for more capacity
    dropout=0.2         # Adjust regularization
)
```

```python
# In training_final.py
criterion = torch.nn.L1Loss()  # MAE instead of MSE

# or a user-defined frequency-domain loss:
criterion = CustomSpectralLoss()
```

```python
# In dataset.py
def __getitem__(self, idx):
    noisy, clean = self.load_audio_pair(idx)
    # Add augmentation
    if self.augment:
        noisy = add_gaussian_noise(noisy, snr=random.uniform(0, 20))
        noisy, clean = random_time_shift(noisy, clean)
    return noisy, clean
```

We welcome contributions! Here's how you can help:
- Check existing issues
- Create detailed bug report with:
- System information
- Steps to reproduce
- Expected vs actual behavior
- Error logs
- Open an issue with the `[Feature Request]` tag
- Describe the enhancement
- Explain use case and benefits
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Make your changes with clear commit messages
- Add tests if applicable
- Update documentation
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request with a detailed description
- Follow PEP 8 for Python code
- Use type hints
- Add docstrings to functions
- Keep functions focused and modular
This project is licensed under the MIT License - see below for details:
MIT License
Copyright (c) 2024 Aadityan Gupta
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
- Documentation: Read this README thoroughly
- Bug Reports: Open an issue
- Feature Requests: Start a discussion
- Questions: Ask in Discussions
Special thanks to:
- PyTorch Team for the incredible deep learning framework
- librosa developers for audio processing tools
- Open-source community for inspiration and support
- Contributors who help improve this project
- Shiv Nadar University for providing research resources
If you use this project in your research, please cite:
```bibtex
@software{anc_m1_2024,
  author = {Gupta, Aadityan},
  title = {ANC-M1: Deep Learning-Based Active Noise Cancellation},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/aadi611/ANC-M1}
}
```

- Basic UNet architecture
- Training pipeline
- Real-time inference
- Web interface
- Documentation
- REST API with FastAPI
- Real-time streaming support
- Multiple sampling rates
- Advanced loss functions
- Model ensemble
- Transformer-based architecture
- Multi-speaker separation
- Docker deployment
- Cloud integration
- Mobile app
⭐ Star this repo if you find it helpful! ⭐
Made with ❤️ by Aadityan Gupta