Feature Request: Automatic Model Selection Based on Language Detection

Description

In the current proof of concept, the user selects the transcription model manually: either one of the default Whisper models or a KB-Whisper variant.
To streamline the user experience, we propose an automated default setting where the system selects the most suitable model based on language detection.

⸻

Expected Behavior
• If the user does not select a specific model, the system should automatically detect the language of the input audio.
• Based on the detected language, the system selects the optimal model:
  • If the language is Swedish → Use KB-Whisper Large
  • If another language is detected → Use Whisper Large (OpenAI)
  • If the language is unknown or confidence is low → Fall back to Whisper Large (OpenAI)
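
The rules above can be sketched as a small pure function. This is only an illustration: the model identifiers, the function names, and the 0.9 default threshold are assumptions (the threshold mirrors the ">90%" example in the Technical Guidelines), and the real names depend on how the proof of concept loads its models.

```python
from typing import Optional

# Placeholder identifiers (assumptions); replace with the names the project uses.
KB_WHISPER_LARGE = "kb-whisper-large"
WHISPER_LARGE = "whisper-large"


def select_model(language: Optional[str], confidence: float,
                 threshold: float = 0.9) -> str:
    """Pick a model from the detected language and its confidence."""
    if language == "sv" and confidence > threshold:
        return KB_WHISPER_LARGE
    # Any other language, an unknown language, or low confidence falls back.
    return WHISPER_LARGE


def resolve_model(user_choice: Optional[str], language: Optional[str],
                  confidence: float) -> str:
    """An explicit user choice always wins; otherwise select automatically."""
    if user_choice:
        return user_choice
    return select_model(language, confidence)
```

Keeping the decision separate from detection and model loading makes the fallback behavior easy to unit-test and to adjust later.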

⸻

Technical Guidelines
1. Perform fast language detection on the first 30 seconds of the audio
   • Whisper has built-in language detection: detect_language returns a probability per language, and the result dict from transcribe includes a language field (which carries no confidence score on its own).
   • Alternatively, faster detection tools like langdetect or fastText can be considered. Both classify text rather than audio, so they would need a preliminary transcript.
2. Select the model based on the detected language
   • If sv (Swedish) is detected with high confidence (e.g., >90%), load KB-Whisper Large.
   • Otherwise, default to Whisper Large (OpenAI).
3. Log the model selection decision
   • Store this information in a sidecar file for traceability and to allow analysis/improvement of the fallback logic over time.
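
Steps 1 and 3 might look roughly like the sketch below, assuming the openai-whisper Python package. The detection calls follow that package's documented usage; the choice of a small "base" model for detection, the function names, the sidecar file name, and the JSON fields are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional, Tuple


def detect_language(audio_path: str, detector: str = "base") -> Tuple[str, float]:
    """Detect the spoken language from the first 30 seconds of audio."""
    import whisper  # imported lazily so the logging helper works without it

    model = whisper.load_model(detector)  # small model for speed (assumption)
    # pad_or_trim cuts or pads the waveform to Whisper's 30-second window.
    audio = whisper.pad_or_trim(whisper.load_audio(audio_path))
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels)
    _, probs = model.detect_language(mel.to(model.device))
    language = max(probs, key=probs.get)
    return language, float(probs[language])


def write_sidecar(audio_path: str, language: Optional[str], confidence: float,
                  model_name: str, user_selected: bool = False) -> Path:
    """Write the selection decision to a JSON file next to the audio file."""
    audio = Path(audio_path)
    # Hypothetical naming scheme: clip.wav -> clip.wav.selection.json
    sidecar = audio.with_name(audio.name + ".selection.json")
    record = {
        "audio_file": audio.name,
        "detected_language": language,
        "confidence": confidence,
        "selected_model": model_name,
        "user_selected": user_selected,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Recording the confidence alongside the chosen model is what makes it possible to revisit the threshold later using real data.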


Feature Request: Automatic Model Selection Based on Language Detection #3
