A machine learning project that classifies New Zealand bird sounds using Vision Transformer (ViT) models fine-tuned with LoRA (Low-Rank Adaptation) on mel spectrogram representations of audio recordings.
ManuAI transforms bird audio recordings into mel spectrograms and uses computer vision techniques to classify New Zealand bird species. The project treats audio classification as an image classification problem: Google's Vision Transformer (ViT) is fine-tuned with LoRA for parameter-efficient training on the spectrogram images.
Key takeaway: visually classifying sound, i.e. "seeing" sounds as images.
- Audio-to-Image Conversion: Converts bird audio recordings to mel spectrograms for visual processing
- LoRA Fine-tuning: Parameter-efficient fine-tuning of pre-trained ViT models
- Class Imbalance Handling: Implements augmentation to handle imbalanced datasets
- Automated Data Pipeline: Complete pipeline from data download to model training
- Early Stopping: Prevents overfitting with configurable early stopping callbacks
Follow these steps to train and use ManuAI:
1. Download Data: run `download_data.py` to fetch New Zealand bird recordings from Xeno-canto and Kaggle.

   ```bash
   python download_data.py
   ```

2. Preprocess Data: run `preprocess_data.py` to segment and convert audio files into mel spectrograms.

   ```bash
   python preprocess_data.py
   ```

3. Fine-tune the Model: open and run all cells in `lora-finetune.ipynb` to fine-tune the Vision Transformer model using LoRA.

4. Run Inference: use `inference.py` to classify new bird audio samples.

   ```bash
   python inference.py
   ```
See each script/notebook for additional options and configuration details.
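The inference step above can be sketched with the Hugging Face Transformers API (the helper name and checkpoint path are hypothetical; `inference.py` holds the project's actual logic):

```python
import torch
from transformers import ViTForImageClassification, ViTImageProcessor

def classify_spectrogram(model, processor, image):
    """Run one mel-spectrogram image through a fine-tuned ViT classifier."""
    inputs = processor(images=image.convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Map the highest-scoring class index back to its label string
    return model.config.id2label[logits.argmax(-1).item()]

# Hypothetical usage with a saved fine-tuned checkpoint directory:
# model = ViTForImageClassification.from_pretrained("./checkpoint")
# processor = ViTImageProcessor.from_pretrained("./checkpoint")
# print(classify_spectrogram(model, processor, Image.open("segment.png")))
```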
The project uses bird recordings from Xeno-canto, a citizen-science project focused on sharing bird sounds from around the world, as well as the 'New Zealand Bird Sound' Kaggle dataset.
- Download: Fetches New Zealand bird recordings via Xeno-canto API and Kaggle.
- Segmentation: Splits recordings into 4-second segments
- Quality Filtering: Removes low-quality or silent segments
- Spectrogram Conversion: Converts audio to mel spectrograms
- Augmentation: Applies data augmentation techniques
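A minimal sketch of the segmentation and quality-filtering stages (the -40 dB silence threshold and function name are illustrative assumptions; see `preprocess_data.py` for the actual parameters):

```python
import numpy as np

def segment_waveform(y, sr, seg_seconds=4, silence_db=-40.0):
    """Split a waveform into fixed-length segments, dropping near-silent ones."""
    seg_len = seg_seconds * sr
    segments = []
    for start in range(0, len(y) - seg_len + 1, seg_len):
        seg = y[start:start + seg_len]
        # RMS level in dB relative to full scale; silent audio goes very negative
        rms = np.sqrt(np.mean(seg ** 2))
        level_db = 20 * np.log10(rms + 1e-10)
        if level_db > silence_db:
            segments.append(seg)
    return segments
```

Fixed 4-second windows keep every spectrogram the same width, so all training images share one shape.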
The model currently supports classification of 20 New Zealand bird species including:
- Tui (Prosthemadera novaeseelandiae)
- Bellbird (Anthornis melanura)
- Kaka (Nestor meridionalis)
- Robin (Petroica species)
- Morepork (Ninox novaeseelandiae)
- Fantail (Rhipidura fuliginosa)
- And many more...
- Google ViT-Base-Patch16-224: Pre-trained Vision Transformer
- Input Size: 224x224 RGB images (mel spectrograms)
- Patch Size: 16x16 pixels
- Class Weighting: Handles imbalanced datasets
- Early Stopping: Prevents overfitting
- Learning Rate Scheduling: Warmup and decay
- Mixed Precision: Optional FP16 training
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License.
- Xeno-canto for providing the bird sound database
- Hugging Face for the Transformers library and model hosting
- Google Research for the Vision Transformer architecture
- Microsoft for the LoRA (Low-Rank Adaptation) technique
If you use this project in your research, please cite:
```bibtex
@misc{manuai2025,
  title={ManuAI: New Zealand Bird Sound Classification using Vision Transformers},
  author={Harry Wills},
  year={2025},
  url={https://github.com/harrywillss/ManuAI}
}
```

Harry Wills - @harrywillss
Project Link: https://github.com/harrywillss/ManuAI
Made with ❤️ for New Zealand's native bird conservation