
Fix Spark-TTS model hanging issue and add further optimization #7

Merged
huwenjie333 merged 2 commits into deploy from fix-tts on Feb 27, 2026

Conversation


@huwenjie333 huwenjie333 commented Feb 25, 2026

This PR fixes the hanging issue of the Spark-TTS model on the Modal platform with the following changes:

  • Replaced vLLM's LLM class with AsyncLLMEngine for thread safety. The hanging occurred because the default LLM class is not thread-safe: when multiple requests arrive concurrently, they can trigger race conditions and deadlock. AsyncLLMEngine, which is also what the vllm serve CLI command uses, is safe for concurrent use.
  • Updated the @modal.fastapi_endpoint(...) implementation of generate to async def generate(...).
  • Updated the prompt generation step to interact with AsyncLLMEngine.generate asynchronously, gathering multiple results concurrently with asyncio.gather.
  • Offloaded the audio detokenization step via asyncio.to_thread so that blocking PyTorch GPU operations do not stall Modal's Python event loop for prolonged durations.
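The combined effect of these changes can be sketched with plain asyncio. This is an illustrative stand-in, not the PR's actual code: fake_generate substitutes for AsyncLLMEngine.generate, and detokenize substitutes for the blocking PyTorch detokenization step.

```python
import asyncio
import time

# Stand-in for AsyncLLMEngine.generate: awaitable per-prompt generation.
async def fake_generate(prompt: str) -> str:
    await asyncio.sleep(0.05)  # simulated token generation
    return f"tokens({prompt})"

# Stand-in for the blocking, GPU-bound detokenization step.
def detokenize(tokens: str) -> bytes:
    time.sleep(0.05)  # simulated blocking work
    return tokens.encode()

async def generate(sentences: list[str]) -> bytes:
    # Generate tokens for all sentences concurrently instead of sequentially.
    token_batches = await asyncio.gather(*(fake_generate(s) for s in sentences))
    # Offload blocking detokenization to threads so the event loop stays free.
    chunks = await asyncio.gather(
        *(asyncio.to_thread(detokenize, t) for t in token_batches)
    )
    return b"".join(chunks)

audio = asyncio.run(generate(["hello", "world"]))
print(audio)  # b'tokens(hello)tokens(world)'
```

Because asyncio.gather preserves input order, the audio chunks are concatenated in sentence order even though they are produced concurrently.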

In addition, further optimizations were implemented:

  • Increased the maximum number of concurrent requests per container from 10 to 100, since the GPU has sufficient memory (24 GB on an NVIDIA L4) to hold the 0.5B model (~1 GB) and still reserve a large KV cache. This avoids waiting out the cold-start time for a new container.
  • Returned a FastAPI Response instead of a StreamingResponse, so that Uvicorn's internal thread pool is not overloaded by chunking the audio into many small byte writes. This reduced total latency from 100+ seconds to a few seconds when a large number of requests arrive concurrently.
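Under these assumptions, the two optimizations might look roughly like the following in the Modal app. The class skeleton and wav_bytes are illustrative; the decorator names follow Modal's public API, but this is a sketch, not the PR's actual diff.

```python
import modal
from fastapi import Response

app = modal.App("spark-tts")

@app.cls(gpu="L4")
@modal.concurrent(max_inputs=100)  # was 10: one warm container absorbs bursts
class SparkTTS:
    @modal.fastapi_endpoint(method="POST")
    async def generate(self, text: str, speaker_id: str) -> Response:
        wav_bytes = ...  # inference + detokenization as described above
        # A single Response body means one write back to the client,
        # with no per-chunk round trips through Uvicorn's thread pool.
        return Response(content=wav_bytes, media_type="audio/wav")
```

The trade-off is that the client receives no audio until generation finishes; for clips of a few seconds this is a net win over streaming overhead.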

A stress test was run by sending 80 requests at 0.3-second intervals, where each request contains 3 sentences (~40 words). All requests were processed successfully in ~4 seconds on a single GPU container.

for i in {1..80}; do
  curl -X POST --get "https://sb-modal-ws--spark-tts-salt-sparktts-generate.modal.run" \
    --data-urlencode "text=I am a nurse who takes care of many people who have cancer.I am a nurse who takes care of many people who have cancer.I am a nurse who takes care of many people who have fever." \
    --data-urlencode "speaker_id=248" \
    --output "output_$i.wav" &
  sleep 0.3
done; wait
(screenshot of the stress-test output omitted)

@huwenjie333 changed the title from "fixed" to "Fix Spark-TTS hanging issues and add future optimization" on Feb 25, 2026
@huwenjie333 changed the title from "Fix Spark-TTS hanging issues and add future optimization" to "Fix Spark-TTS model hanging issue and add further optimization" on Feb 25, 2026

@PatrickCmd PatrickCmd left a comment


LGTM


jqug commented Feb 26, 2026

Thanks for this! LGTM

For the future, we might want to experiment with different settings to see how they affect latency and throughput. Since latency is particularly important, we could check whether increasing max_num_seqs makes the TTFT slower. I noticed this in a vLLM optimisation guide:
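For reference, max_num_seqs is a vLLM engine argument, so the experiment would be a small configuration change. The model name and value below are illustrative assumptions, not settings from this PR.

```python
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

# max_num_seqs caps how many sequences the scheduler batches together:
# larger values raise throughput but can lengthen TTFT under load.
engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(
        model="SparkAudio/Spark-TTS-0.5B",  # illustrative model id
        max_num_seqs=64,                    # value to sweep in the experiment
    )
)
```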

(screenshot from the vLLM optimisation guide omitted)

@huwenjie333 huwenjie333 merged commit 22e48e8 into deploy Feb 27, 2026
1 check failed


3 participants