uoft-tmi/context-based-captioning

A simple ASR pipeline that uses Shallow Fusion to fix domain-specific words (hotwords) in lecture audio. It catches things Whisper usually misses (like "eigenvalue" vs "icon value") by combining phonetic matching with a local language model.

How it works

The code captures audio in chunks and transcribes each one with Whisper. When Whisper reports low confidence for a word, the system:

  1. Scans hotwords.txt for similar-sounding terms.
  2. Uses GPT-2 to check if a candidate hotword actually makes sense in the current sentence.
  3. Swaps in the hotword if its combined confidence (ASR + LM) beats the original word's.
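The swap decision in step 3 can be sketched as a linear interpolation of the two scores, which is the core of shallow fusion. This is an illustrative sketch, not the repo's actual code: the function names and the interpolation weight LAMBDA are assumptions.

```python
# Sketch of the shallow-fusion swap decision. LAMBDA and all names here
# are illustrative assumptions, not taken from fusion_processor.py.

LAMBDA = 0.6  # assumed weight on the ASR confidence vs. the LM score

def combined_score(asr_conf: float, lm_prob: float, lam: float = LAMBDA) -> float:
    """Linearly interpolate ASR confidence and LM probability (shallow fusion)."""
    return lam * asr_conf + (1.0 - lam) * lm_prob

def maybe_swap(original: str, orig_conf: float, orig_lm: float,
               candidate: str, cand_conf: float, cand_lm: float) -> str:
    """Keep the hotword candidate only if its fused score beats the original's."""
    if combined_score(cand_conf, cand_lm) > combined_score(orig_conf, orig_lm):
        return candidate
    return original

# Whisper heard "icon value" with low LM support; GPT-2 strongly
# prefers "eigenvalue" in this context, so the swap wins:
print(maybe_swap("icon value", 0.40, 0.05, "eigenvalue", 0.35, 0.60))
# → eigenvalue
```

With these numbers the candidate's fused score (0.6·0.35 + 0.4·0.60 = 0.45) beats the original's (0.6·0.40 + 0.4·0.05 = 0.26), so the hotword replaces the misheard phrase.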

Setup

  1. Install dependencies:
    pip install sounddevice openai-whisper transformers jellyfish Metaphone numpy torch
  2. Put whatever jargon you need in hotwords.txt.
  3. Run the live listener:
    python3 main.py
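The hotword file format isn't documented here; assuming the common one-term-per-line convention, a hotwords.txt for a linear algebra lecture might look like:

```
eigenvalue
eigenvector
orthogonal
diagonalizable
```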

Files

  • main.py: Entry point for the live audio stream.
  • fusion_processor.py: The actual rescoring logic.
  • phonetic_matcher.py: Metaphone + Levenshtein fuzzy matching.
  • asr_engine.py / lm_rescorer.py: Model wrappers for Whisper and GPT-2.
  • test_fusion.py: Quick script to verify rescoring logic without needing a mic.
