Feature Request: Automatic Model Selection Based on Language Detection #3

@erklu

Description

In the current proof of concept, the user selects the transcription model manually: either one of the default Whisper models or a KB-Whisper variant.
To streamline the user experience, we propose an automatic default setting in which the system selects the most suitable model based on the detected spoken language.

Expected Behavior
• If the user does not select a specific model, the system should automatically detect the language of the input audio.
• Based on the detected language, the system selects the optimal model:
    • If the language is Swedish → Use KB-Whisper Large
    • If another language is detected → Use Whisper Large (OpenAI)
    • If the language is unknown or confidence is low → Fall back to Whisper Large (OpenAI)
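The decision rule above, combined with the >90% confidence threshold from the technical guidelines, could be expressed as a small pure function. This is a minimal sketch: the function name `select_model` and both model identifiers (assumed Hugging Face Hub IDs for KB-Whisper Large and OpenAI Whisper Large) are hypothetical and not taken from the current proof of concept.

```python
from typing import Optional

# Hypothetical model identifiers (assumed Hugging Face Hub IDs, not confirmed by this issue).
KB_WHISPER_LARGE = "KBLab/kb-whisper-large"
OPENAI_WHISPER_LARGE = "openai/whisper-large-v3"


def select_model(language: Optional[str], probability: float, threshold: float = 0.9) -> str:
    """Map a detected language code and its confidence to a model ID.

    Swedish ("sv") with a probability strictly above `threshold` selects
    KB-Whisper Large. Any other language, an unknown language (None), or
    low confidence falls back to OpenAI Whisper Large.
    """
    if language == "sv" and probability > threshold:
        return KB_WHISPER_LARGE
    return OPENAI_WHISPER_LARGE


print(select_model("sv", 0.97))  # KBLab/kb-whisper-large
print(select_model("sv", 0.62))  # openai/whisper-large-v3
print(select_model("en", 0.99))  # openai/whisper-large-v3
print(select_model(None, 0.0))   # openai/whisper-large-v3
```

Keeping the rule separate from detection and model loading makes it easy to unit-test and to adjust the threshold later without touching the audio pipeline.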

Technical Guidelines

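  1. Perform fast language detection on the first 30 seconds of the audio.
    • Whisper has a built-in method for this: `detect_language()` returns per-language probabilities for a single 30-second window, and `transcribe()` also reports the detected language in the `language` field of its result dict.
    • Text-based identifiers such as langdetect or fastText are faster, but they classify text, not audio, so they could only be applied to a preliminary transcript.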
  2. Select the model based on the detected language
    • If sv (Swedish) is detected with high confidence (e.g., >90%), load KB-Whisper Large.
    • Otherwise, default to Whisper Large (OpenAI).
  3. Log the model selection decision
    • Store the detected language, its confidence, and the selected model in a sidecar file for traceability, so that the fallback logic can be analyzed and improved over time.
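Steps 1 and 3 could be sketched as follows, assuming the `openai-whisper` package (and ffmpeg) is used for detection. The helper names `detect_language` and `write_sidecar`, the choice of the small `base` model as a fast detector, the `<name>.model.json` sidecar naming, and the JSON field names are all hypothetical choices for illustration, not part of the current proof of concept.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Tuple


def detect_language(audio_path: str, detector_model: str = "base") -> Tuple[str, float]:
    """Detect the spoken language from the first 30 seconds with openai-whisper.

    Assumption: a small detector model ("base") is accurate enough and keeps
    detection fast; this is not verified in this issue.
    """
    import whisper  # imported lazily so write_sidecar works without whisper installed

    model = whisper.load_model(detector_model)
    audio = whisper.load_audio(audio_path)
    audio = whisper.pad_or_trim(audio)  # pad/trim to Whisper's 30-second input window
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)  # dict: language code -> probability
    language = max(probs, key=probs.get)
    return language, float(probs[language])


def write_sidecar(audio_path: str, language: str, probability: float,
                  model_id: str, threshold: float = 0.9) -> Path:
    """Write the model selection decision next to the audio file as JSON.

    Hypothetical layout: "interview.wav" -> "interview.model.json".
    """
    audio = Path(audio_path)
    record = {
        "audio_file": audio.name,
        "detected_language": language,
        "language_probability": round(probability, 4),
        "threshold": threshold,
        "selected_model": model_id,
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = audio.with_name(audio.stem + ".model.json")
    sidecar.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return sidecar
```

In a pipeline, the language and probability returned by detection would feed the selection rule, and the resulting model ID would be recorded with `write_sidecar` before the full transcription starts. One JSON file per recording is simple to collect and aggregate later when tuning the threshold.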
