[Inworld] Flush to drain decoder on every audio chunk from server #4983

Open
ianbbqzy wants to merge 1 commit into livekit:main from ianbbqzy:ian/inworld-flush

Conversation

@ianbbqzy
Contributor

@ianbbqzy ianbbqzy commented Mar 3, 2026

The emitter's non-PCM path still routes through _decode_task, which on FlushSegment calls audio_decoder.end_input(), awaits decode_atask, and then sets audio_decoder = None. This means:

  1. Without per-chunk flush(): All audio bytes accumulate in the decoder. The final output_emitter.flush() at line 1137 is the only FlushSegment, so TTFB = time to receive all audio from the server, not just the first chunk.

  2. With per-chunk flush(): Each chunk triggers FlushSegment → decoder drains → _flush_frame() → SynthesizedAudio is emitted. TTFB = time to first chunk from the server (correct).
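
To make the difference between the two cases concrete, here is a toy model. It is only a sketch: ToyDecoder and first_emit_index are hypothetical names for illustration, not the emitter's actual classes, and the only assumption is that the decoder releases nothing until it is flushed.

```python
class ToyDecoder:
    """Hypothetical stand-in for a decoder that only releases audio on flush."""

    def __init__(self):
        self._buf = bytearray()

    def push(self, data: bytes) -> None:
        # Encoded bytes accumulate until someone flushes.
        self._buf.extend(data)

    def flush(self) -> bytes:
        # Drain everything buffered so far.
        out = bytes(self._buf)
        self._buf.clear()
        return out


def first_emit_index(chunks, flush_per_chunk):
    """Return the 0-based index of the chunk after which audio is first emitted."""
    decoder = ToyDecoder()
    for i, chunk in enumerate(chunks):
        decoder.push(chunk)
        if flush_per_chunk and decoder.flush():
            return i
    # Single end-of-stream flush, analogous to the final flush in case 1.
    return len(chunks) - 1 if decoder.flush() else None


chunks = [b"aa", b"bb", b"cc"]
print(first_emit_index(chunks, flush_per_chunk=True))   # 0
print(first_emit_index(chunks, flush_per_chunk=False))  # 2
```

With per-chunk flushing the first audio is available right after chunk 0. Without it, nothing comes out until the last chunk has arrived.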

This is an Inworld-specific issue because Inworld's HTTP API returns one JSON line per audio chunk (each with its own WAV payload), unlike other providers, which return a single continuous raw PCM byte stream. For Inworld, each JSON line is a discrete audio segment, and delaying the flush until the end defeats the purpose of streaming.
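
For illustration, a rough sketch of what "one JSON line per chunk, each with its own WAV payload" implies for the consumer. The field name "audio" and the base64 encoding are assumptions made for this example, not the documented response schema, and make_wav and iter_pcm_segments are hypothetical helpers.

```python
import base64
import io
import json
import wave


def make_wav(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container (used to build test data)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()


def iter_pcm_segments(lines):
    """Yield the PCM frames of each JSON line as its own discrete segment."""
    for line in lines:
        if not line.strip():
            continue
        payload = json.loads(line)
        # "audio" is a hypothetical field name; base64 is assumed.
        wav_bytes = base64.b64decode(payload["audio"])
        # Each line carries a complete WAV file with its own header.
        with wave.open(io.BytesIO(wav_bytes), "rb") as w:
            yield w.readframes(w.getnframes())


lines = [
    json.dumps({"audio": base64.b64encode(make_wav(b"\x01\x00" * 4)).decode()}),
    json.dumps({"audio": base64.b64encode(make_wav(b"\x02\x00" * 8)).decode()}),
]
segments = list(iter_pcm_segments(lines))
print([len(s) for s in segments])  # [8, 16]
```

Because every line is a self-contained WAV file, each one can be decoded and emitted as soon as it arrives instead of waiting for the end of the response.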

====

Also updated the timestamp parsing logic to always add a trailing space, regardless of whether the word is at the end of a chunk. In Inworld's case, the end of a chunk may be just the end of a phrase rather than the end of a sentence.
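
A small sketch of why the trailing space matters when chunks end mid-sentence. join_words is a hypothetical helper, and the "old" branch (no space after the last word of a chunk) is my reading of the previous behavior rather than a copy of the plugin code.

```python
def join_words(chunks, always_trailing_space):
    """Concatenate per-chunk word lists into the running transcript text."""
    text = ""
    for words in chunks:
        for i, word in enumerate(words):
            is_last = i == len(words) - 1
            if always_trailing_space or not is_last:
                text += word + " "
            else:
                # Assumed old behavior: no space after a chunk's final word.
                text += word
    return text


# The first chunk ends at a phrase boundary, not at the end of the sentence.
chunks = [["Hello", "there,"], ["how", "are", "you?"]]
print(repr(join_words(chunks, always_trailing_space=False)))  # 'Hello there,how are you?'
print(repr(join_words(chunks, always_trailing_space=True)))   # 'Hello there, how are you? '
```

Without the unconditional space, the last word of one chunk runs into the first word of the next.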

devin-ai-integration[bot]

This comment was marked as resolved.

@ianbbqzy ianbbqzy force-pushed the ian/inworld-flush branch from 839b7ba to daf64bb on March 3, 2026 01:26
@ianbbqzy
Contributor Author

ianbbqzy commented Mar 3, 2026

@tinalenguyen @davidzhao PTAL, Thanks!

@tinalenguyen
Member

Hi @ianbbqzy, thanks for the PR! Few notes and Q's:

  • Could you update the TTS docstring with the new defaults?
  • If each chunk is already flushed, might be worth removing the outer flush here

> Also updated timestamps parsing logic to always add a trailing space regardless if it's the end of a chunk or not. Because in Inworld case, end of a chunk could just be end of a phrase rather than end of a sentence

Could you elaborate more on this? I also noticed that the last punctuation is not present, not sure if that was pre-existing behavior
