A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Updated Mar 17, 2026 - Python
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition capability.
A SOTA industrial-grade all-in-one ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singing ASR. FireRedVAD supports speech/singing/music in 100+ languages. FireRedLID supports 100+ languages and 20+ Chinese dialects. FireRedPunc supports Chinese and English.
TASU: A New Style of Alignment of Speech LLM with only Text Training Data, zero-shot on ASR and Other SU tasks
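FunASR real-time speech recognition edition: recognizes audio from the microphone and sound played on the computer; voice typing software for PC.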
[ICLR 2026] StableToken: A state-of-the-art noise-robust semantic speech tokenizer featuring Voting-LFQ for resilient SpeechLLMs.
SHALLOW, the first hallucination benchmark for ASR models
🎤 Enable voice recognition for the Doubao input method using Python; ideal for learning and research with a focus on audio processing.
Provide accurate voice activity and audio event detection in 100+ languages with high-performance streaming and non-streaming capabilities.
🎨 Explore clip-path techniques in HTML and CSS to create interactive menus and dynamic shapes without JavaScript for responsive design.