This project provides tools for analyzing and improving English pronunciation using AI models. It includes a web application and a command-line interface (CLI) for comparing audio files against expected text, providing detailed feedback on phonemes and prosody, and visualizing visemes. It leverages Wav2Vec 2.0 for audio feature extraction and dynamic time warping (DTW) for phoneme alignment.
See my blog post for more details about the approach.
You can explore the project and test the pronunciation analysis directly in the provided Jupyter notebook:
This notebook walks through:
- loading an audio sample,
- transcribing speech with Wav2Vec2,
- comparing predicted and expected phonemes,
- visualizing pronunciation and prosody scores.
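The phoneme-comparison step relies on DTW. As a rough illustration of the idea (a simplified 0/1 substitution cost between phoneme symbols, not the project's actual implementation), a minimal DTW distance can be sketched as:

```python
def dtw_distance(expected, predicted):
    """Minimal DTW distance between two phoneme sequences (0/1 cost)."""
    n, m = len(expected), len(predicted)
    # dp[i][j] = minimal alignment cost of expected[:i] vs predicted[:j]
    dp = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if expected[i - 1] == predicted[j - 1] else 1.0
            dp[i][j] = cost + min(dp[i - 1][j],      # stretch expected
                                  dp[i][j - 1],      # stretch predicted
                                  dp[i - 1][j - 1])  # match / substitution
    return dp[n][m]

# Identical sequences align with zero cost; mismatches accumulate cost
print(dtw_distance(['h', 'ə', 'l', 'oʊ'], ['h', 'ɛ', 'l']))  # → 2.0
```

A small distance means the predicted phonemes closely track the expected ones.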
```shell
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```

```python
import audio
import speech

audio_path = "example.mp3"
expected_text = "Hello I am a developer"

sound = audio.load(audio_path)
prediction = speech.compare_audio_with_text(sound, expected_text)
print(prediction.score)
```

```shell
python cli.py <file.mp3 or file.wav> "Expected text"
```

You will get JSON with the following fields:
Phonemes

- score: [0-100] pronunciation score
- differences.phoneme_distance: DTW distance between expected and predicted phonemes
- differences.phonemes: list of phonemes with their start and end times
- differences.errors: array of errors, each with:
  - position: index of the word in the expected text
  - expected: expected phonemes
  - actual: predicted phonemes
  - word: the expected word

For example, an error can be:

```json
{ "position": 0, "expected": "hæloʊ", "actual": "hɛl", "word": "Hell" }
```

- transcribe: the transcription of the audio
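A downstream script can consume this JSON directly. A hedged sketch (the payload below is illustrative, built from the fields documented above, not real CLI output):

```python
import json

# Illustrative payload mirroring the documented fields (not real CLI output)
raw = """
{
  "score": 72,
  "transcribe": "hello i am a developer",
  "differences": {
    "phoneme_distance": 3.0,
    "errors": [
      { "position": 0, "expected": "h\u00e6lo\u028a", "actual": "h\u025bl", "word": "Hell" }
    ]
  }
}
"""

result = json.loads(raw)
print(f"score: {result['score']}/100")
for err in result["differences"]["errors"]:
    print(f"word #{err['position']} {err['word']!r}: "
          f"expected /{err['expected']}/, heard /{err['actual']}/")
```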
Prosody

You also get prosody information, with:

- prosody.f0: the fundamental frequency (pitch) contour
- prosody.energy: the energy (loudness) contour
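An energy contour of this kind is commonly computed as frame-wise RMS over the waveform. A minimal numpy sketch under that assumption (the project's actual computation may differ):

```python
import numpy as np

def energy_contour(samples, frame_length=1024, hop_length=256):
    """Frame-wise RMS energy of a mono waveform."""
    frames = [samples[start:start + frame_length]
              for start in range(0, len(samples) - frame_length + 1, hop_length)]
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

# A steady 440 Hz tone has a flat contour with RMS ~ amplitude / sqrt(2)
t = np.linspace(0, 1, 16000, endpoint=False)
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
contour = energy_contour(tone)
```

Loud segments show up as peaks in the contour, silences as near-zero values.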
This application comes with a (very!) basic phoneme-to-viseme JavaScript implementation for the English language. A better implementation could be built using dedicated models.
```js
import { Viseme } from "/static/viseme.js";

const mouthImage = document.getElementById('the-img-node-you-want-to-use');
const viseme = new Viseme(mouthImage);

// Play the phonemes
viseme.play(['həloʊ', 'huː', 'ɑːɹ', 'juː']);
```

Start the server:

```shell
python -m uvicorn server:app --host 0.0.0.0 --port 8000 --reload
```

Note: recording with your microphone is not possible in Chrome in a non-HTTPS local environment.
You can also run the application using Streamlit:
```shell
streamlit run streamlit_app.py
```

The Streamlit app provides:
- Text input for the expected pronunciation
- Audio file upload (WAV, MP3, M4A, OGG, WEBM)
- Text-to-speech to listen to the correct pronunciation
- Detailed analysis results with scores, transcription, and word-by-word feedback
- Interactive charts for prosody (F0 and energy) and phoneme comparison
Please keep the pytest suite up to date:

```shell
pytest -v
```

The viseme images come from the HumanBeanCMU39 viseme set.
This project is licensed under the MIT License - see the LICENSE file for details
