From a5458fceb2c363ac80ff741933e220f90e499b7c Mon Sep 17 00:00:00 2001
From: Bearclaw
Date: Fri, 20 Mar 2026 23:54:07 +0000
Subject: [PATCH] Add troubleshooting section for ALD accented speech misdetection

Accented English speech can be misidentified as other languages (Finnish, Latvian, Latin, Arabic), causing the model to transcribe IN that language rather than just mislabeling it. This is a recurring source of customer confusion.

The new section explains the failure mode and recommends mitigations: constraining expected_languages, using confidence thresholds, and monitoring language_confidence in responses.

Co-Authored-By: Claude Opus 4.6
---
 .../automatic-language-detection.mdx | 34 +++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/fern/pages/speech-to-text/pre-recorded-audio/automatic-language-detection.mdx b/fern/pages/speech-to-text/pre-recorded-audio/automatic-language-detection.mdx
index f4ed9cbd..3ce298e9 100644
--- a/fern/pages/speech-to-text/pre-recorded-audio/automatic-language-detection.mdx
+++ b/fern/pages/speech-to-text/pre-recorded-audio/automatic-language-detection.mdx
@@ -938,3 +938,37 @@ while (true) {
  an error message like `detected language 'bg', confidence 0.2949, is below the requested confidence threshold value of '0.4'`.
+
+## Troubleshooting
+
+### Accented speech detected as the wrong language
+
+Automatic Language Detection uses Whisper-based language identification, which can misidentify heavily accented speech as a different language. For example, English spoken with a strong accent may be detected as Finnish, Latvian, Latin, or Arabic.
+
+When this happens, the model doesn't just return a wrong language label -- it **transcribes the audio in the incorrectly detected language**. This effectively translates the speech rather than transcribing it, producing output in a language the speaker wasn't using.
+
+
+  This can occur even with high confidence scores. A misdetection at 93% confidence still results in transcription in the wrong language.
+
+
+### Recommended mitigations
+
+**Use `expected_languages` to constrain detection (most effective).** If you know which languages your audio may contain, set `expected_languages` to only those languages. This prevents the model from selecting an unexpected language entirely.
+
+For example, if your application processes interviews in English, Spanish, and French:
+
+```json
+{
+  "language_detection": true,
+  "language_detection_options": {
+    "expected_languages": ["en", "es", "fr"],
+    "fallback_language": "en"
+  }
+}
+```
+
+Setting `fallback_language` to your most common language (e.g., `"en"`) ensures that if the model can't confidently choose between the expected languages, it defaults to the language most likely to produce a useful transcript.
+
+**Use `language_confidence_threshold` to reject low-confidence detections.** Setting a threshold (e.g., `0.7`) causes the API to return an error instead of a transcript when confidence is low. This helps catch some misdetections, but not cases where the model is confidently wrong.
+
+**Monitor `language_confidence` in responses.** Log the `language_code` and `language_confidence` fields from your transcript responses. Unexpected language codes or unusual confidence patterns can help you identify misdetection issues early and decide whether to retry with `expected_languages` or flag the transcript for review.
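
The monitoring mitigation could be sketched roughly as follows. This is a minimal illustration in Python, not an SDK call: it assumes the transcript response has already been parsed into a plain dict carrying the `language_code` and `language_confidence` fields named above. The helper name `flag_language_detection`, the logger name, and the `0.7` default (borrowed from the threshold example) are hypothetical choices made for this sketch.

```python
import logging

logger = logging.getLogger("ald_monitor")  # hypothetical logger name


def flag_language_detection(transcript, expected_languages, min_confidence=0.7):
    """Return the reasons (possibly none) to flag a transcript for review.

    `transcript` is assumed to be a dict parsed from the transcript response.
    """
    code = transcript.get("language_code")
    confidence = transcript.get("language_confidence")

    # Log both fields on every transcript so patterns are visible over time.
    logger.info("language_code=%s language_confidence=%s", code, confidence)

    reasons = []
    # A confident misdetection only shows up as an unexpected language code.
    if code not in expected_languages:
        reasons.append(f"unexpected language_code: {code!r}")
    # A missing or low confidence value is flagged separately.
    if confidence is None or confidence < min_confidence:
        reasons.append(f"low language_confidence: {confidence!r}")
    return reasons
```

Checking the language code independently of the confidence value matters here: a transcript detected as `"fi"` at 0.93 confidence passes any confidence check, but is still flagged because `"fi"` is not among the expected languages.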