From 456510929949d7722602a4e74e4c515f262f34c2 Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Wed, 18 Mar 2026 02:27:40 +0000
Subject: [PATCH] docs: add warning that speaker_options apply per-channel with multichannel transcription

Co-Authored-By: Ryan Seams
---
 .../pre-recorded-audio/multichannel-transcription.mdx | 4 ++++
 .../speech-to-text/pre-recorded-audio/speaker-diarization.mdx | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/fern/pages/speech-to-text/pre-recorded-audio/multichannel-transcription.mdx b/fern/pages/speech-to-text/pre-recorded-audio/multichannel-transcription.mdx
index 65f8d3a5..570c7ca1 100644
--- a/fern/pages/speech-to-text/pre-recorded-audio/multichannel-transcription.mdx
+++ b/fern/pages/speech-to-text/pre-recorded-audio/multichannel-transcription.mdx
@@ -184,6 +184,10 @@ while (true) {
 If you have a multichannel audio file where individual channels may contain multiple speakers, you can combine `multichannel` and `speaker_labels` to perform diarization within each channel.
+
+  When using `multichannel` with `speaker_labels`, the `speaker_options` parameters (`min_speakers_expected` and `max_speakers_expected`) are applied **per channel**, not globally across the entire file. For example, setting `min_speakers_expected: 5` and `max_speakers_expected: 7` on a 5-channel file means the model will find 5–7 speakers on _each_ channel, resulting in 25–35 total speakers. Adjust your speaker options accordingly when using multichannel transcription.
+
+
 When both parameters are enabled:
 - Channels are labeled numerically (1, 2, 3, etc.)
diff --git a/fern/pages/speech-to-text/pre-recorded-audio/speaker-diarization.mdx b/fern/pages/speech-to-text/pre-recorded-audio/speaker-diarization.mdx
index 81bcb86d..3f0f77e5 100644
--- a/fern/pages/speech-to-text/pre-recorded-audio/speaker-diarization.mdx
+++ b/fern/pages/speech-to-text/pre-recorded-audio/speaker-diarization.mdx
@@ -430,6 +430,10 @@ This parameter is suitable for use cases where there is a known minimum/maximum
 labels.
+
+  When using `multichannel` with `speaker_labels`, the `speaker_options` parameters are applied **per channel**, not globally across the entire file. For example, setting `min_speakers_expected: 5` and `max_speakers_expected: 7` on a 5-channel file means the model will find 5–7 speakers on _each_ channel, resulting in 25–35 total speakers. Adjust your speaker options accordingly when using [multichannel transcription](/docs/speech-to-text/pre-recorded-audio/multichannel-transcription).
+
+
 Building on the [Quickstart](#quickstart) above, add `speaker_options` to your transcription config:
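
Reviewer note: the per-channel arithmetic in the added warning can be checked with a short sketch. The parameter names (`multichannel`, `speaker_labels`, `speaker_options`, `min_speakers_expected`, `max_speakers_expected`) come from the patch. The helper names `total_speaker_range` and `build_config`, and the plain-dict shape of the config, are hypothetical and only illustrate the idea. This is not the SDK's API.

```python
from typing import Any


def total_speaker_range(
    num_channels: int, min_per_channel: int, max_per_channel: int
) -> tuple[int, int]:
    """Return the (lowest, highest) total speaker count across all channels.

    Assumes, as the warning states, that the speaker bounds are applied
    independently to every channel.
    """
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    if not 1 <= min_per_channel <= max_per_channel:
        raise ValueError("expected 1 <= min_per_channel <= max_per_channel")
    return num_channels * min_per_channel, num_channels * max_per_channel


def build_config(min_per_channel: int, max_per_channel: int) -> dict[str, Any]:
    """Collect the parameters named in the warning into a plain dict.

    Hypothetical shape for illustration; check the real request format
    in the API reference before using it.
    """
    return {
        "multichannel": True,
        "speaker_labels": True,
        "speaker_options": {
            "min_speakers_expected": min_per_channel,
            "max_speakers_expected": max_per_channel,
        },
    }


if __name__ == "__main__":
    # The example from the warning: 5 channels, 5 to 7 speakers per channel.
    print(total_speaker_range(5, 5, 7))  # (25, 35)
```

For a two-party call recorded on two channels, where each channel may occasionally carry a second voice, per-channel bounds of 1 and 2 would give a total range of 2 to 4 speakers, which is likely closer to the intent than passing the file-wide count.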