Skip to content

harrywillss/ManuAI

Repository files navigation

ManuAI 🦜 - New Zealand Bird Sound Classification

A machine learning project that classifies New Zealand bird sounds using Vision Transformer (ViT) models fine-tuned with LoRA (Low-Rank Adaptation) on mel spectrogram representations of audio recordings.

🎯 Project Overview

ManuAI transforms bird audio recordings into mel spectrograms and uses computer vision techniques to classify different New Zealand bird species. The project leverages the unique approach of treating audio classification as an image classification problem, using Google's Vision Transformer (ViT) model fine-tuned with LoRA for efficient training.

Key takeaway: Visually classifying sound --> seeing sounds...

Key Features

  • Audio-to-Image Conversion: Converts bird audio recordings to mel spectrograms for visual processing
  • LoRA Fine-tuning: Efficient parameter-efficient fine-tuning of pre-trained ViT models
  • Class Imbalance Handling: Implements augmentation to handle imbalanced datasets
  • Automated Data Pipeline: Complete pipeline from data download to model training
  • Early Stopping: Prevents overfitting with configurable early stopping callbacks

πŸš€ Getting Started

πŸ§‘β€πŸ’» How to Use

Follow these steps to train and use ManuAI:

  1. Download Data

    • Run download_data.py to fetch New Zealand bird recordings from Xeno-canto and Kaggle.
    • Example:
      python download_data.py
  2. Preprocess Data

    • Run preprocess_data.py to segment and convert audio files into mel spectrograms.
    • Example:
      python preprocess_data.py
  3. Fine-tune the Model

    • Open and run all cells in lora-finetune.ipynb to fine-tune the Vision Transformer model using LoRA.
  4. Run Inference

    • Use inference.py to classify new bird audio samples.
    • Example:
    python inference.py

See each script/notebook for additional options and configuration details.

πŸ“Š Dataset

The project uses bird recordings from Xeno-canto, a citizen science project focused on sharing bird sounds from around the world, as well as the 'New Zealand Bird Sound' Kaggle dataset

Data Processing Pipeline

  1. Download: Fetches New Zealand bird recordings via Xeno-canto API and Kaggle.
  2. Segmentation: Splits recordings into 4-second segments
  3. Quality Filtering: Removes low-quality or silent segments
  4. Spectrogram Conversion: Converts audio to mel spectrograms
  5. Augmentation: Applies data augmentation techniques

Supported Bird Species

The model currently supports classification of 20 New Zealand bird species including:

  • Tui (Prosthemadera novaeseelandiae)
  • Bellbird (Anthornis melanura)
  • Kaka (Nestor meridionalis)
  • Robin (Petroica species)
  • Morepork (Ninox novaeseelandiae)
  • Fantail (Rhipidura fuliginosa)
  • And many more...

πŸ€– Model Architecture

Base Model

  • Google ViT-Base-Patch16-224: Pre-trained Vision Transformer
  • Input Size: 224x224 RGB images (mel spectrograms)
  • Patch Size: 16x16 pixels

Training Features

  • Class Weighting: Handles imbalanced datasets
  • Early Stopping: Prevents overfitting
  • Learning Rate Scheduling: Warmup and decay
  • Mixed Precision: Optional FP16 training

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Setup

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

πŸ“ License

This project is licensed under the MIT License.

πŸ™ Acknowledgments

  • Xeno-canto for providing the bird sound database
  • Hugging Face for the Transformers library and model hosting
  • Google Research for the Vision Transformer architecture
  • Microsoft for the LoRA (Low-Rank Adaptation) technique

πŸ“š Citation

If you use this project in your research, please cite:

@misc{manuai2025,
  title={ManuAI: New Zealand Bird Sound Classification using Vision Transformers},
  author={Harry Wills},
  year={2025},
  url={https://github.com/harrywillss/ManuAI}
}

πŸ“ž Contact

Harry Wills - @harrywillss

Project Link: https://github.com/harrywillss/ManuAI


Made with ❀️ for New Zealand's native bird conservation

Releases

No releases published

Packages

 
 
 

Contributors