
Spark-tts-salt deployment #2

Merged
PatrickCmd merged 8 commits into deploy from spark-tts-salt on Jan 7, 2026


Conversation


@huwenjie333 (Collaborator) commented Dec 17, 2025

This PR deploys the Sunbird/spark-tts-salt model to the Modal platform.

Usage Example:

 curl -X POST --get "https://sb-modal-ws--spark-tts-salt-sparktts-generate.modal.run" \
   --data-urlencode "text=I am a nurse who takes care of many people who have cancer." \
   --data-urlencode "speaker_id=248" \
   --output output.wav

Speaker ID:
241: Acholi (female)
242: Ateso (female)
243: Runyankore (female)
245: Lugbara (female)
246: Swahili (male)
248: Luganda (female)
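For reference, the same request can be issued from Python using only the standard library. This is a sketch assuming the endpoint behaves exactly as in the curl example above (the URL and the `text`/`speaker_id` parameter names are taken from it; like `curl --get -X POST`, parameters are sent in the query string with a forced POST method):

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://sb-modal-ws--spark-tts-salt-sparktts-generate.modal.run"

def build_request(text: str, speaker_id: int) -> Request:
    # Mirror `curl --get --data-urlencode`: parameters go in the query
    # string, but the HTTP method is forced to POST.
    query = urlencode({"text": text, "speaker_id": speaker_id})
    return Request(f"{BASE_URL}?{query}", method="POST")

def synthesize(text: str, speaker_id: int, out_path: str = "output.wav") -> None:
    # Network call; requires the Modal endpoint to be deployed and reachable.
    with urlopen(build_request(text, speaker_id)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

For example, `synthesize("I am a nurse who takes care of many people who have cancer.", 248)` would write `output.wav`, matching the curl invocation above.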

This PR also includes additional code for testing the latency of the Sunflower Ultravox model.

@huwenjie333 changed the title from [WIP] Spark-tts-salt deployment to Spark-tts-salt deployment on Jan 5, 2026
@huwenjie333 requested review from PatrickCmd and jqug on Jan 5, 2026
@jqug

jqug commented Jan 5, 2026

Thanks @huwenjie333 this is great. You mention that generating a sentence takes 3-4 seconds, is it possible to find out which part of the processing is taking this time? It should be a few hundred milliseconds, max 1 second I think. At least that's what I was getting when testing on an RTX4090 GPU.


@huwenjie333 (Collaborator, Author) commented Jan 5, 2026

> Thanks @huwenjie333 this is great. You mention that generating a sentence takes 3-4 seconds, is it possible to find out which part of the processing is taking this time? It should be a few hundred milliseconds, max 1 second I think. At least that's what I was getting when testing on an RTX4090 GPU.

I can get ~1s when enforce_eager=False and cuda graph is built during the model initialization. However, it increases the cold startup time from 1 min to 1.5-2 mins.

Text chunking time: 0.00s 
Prompt preparation time: 0.00s
Model generation time: 1.05s
Audio decoding time: 0.04s 
Audio saving time: 0.00s 
Total generation time: 1.10s

When enforce_eager=True, the inference becomes ~3s for one sentence.

Text chunking time: 0.00s 
Prompt preparation time: 0.00s
Model generation time: 3.13s
Audio decoding time: 0.06s 
Audio saving time: 0.00s 
Total generation time: 3.19s
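The per-stage numbers above can be collected with a small wall-clock timing helper. This is a hedged sketch, not the deployment code itself: the `stage` context manager is hypothetical, and the stage bodies are placeholders for the real chunking/generation calls.

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def stage(name: str):
    # Record wall-clock time for one named pipeline stage.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start

# Usage mirrors the reported breakdown (real calls would go in the bodies):
with stage("Text chunking"):
    pass  # chunk_text(...)
with stage("Model generation"):
    pass  # llm.generate(...)

total = sum(timings.values())
print(f"Total generation time: {total:.2f}s")
```

Printing each entry of `timings` after a request reproduces the per-stage log lines shown above.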

@jqug

jqug commented Jan 5, 2026

Ah OK that makes sense. I think we would want to use enforce_eager=False in practice, as otherwise the latency is too high to make it feel like a natural response. There could be some tricks like to already warm up the endpoint as soon as someone loads the frontend (i.e. it could be starting up in parallel with the LLM).
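The warm-up trick above can be as simple as a fire-and-forget request sent when the frontend loads, so the Modal container spins up in parallel with everything else. A minimal sketch, assuming the endpoint tolerates being pinged (the URL is the one from the PR description; `warm_up_async` is a hypothetical helper, not part of this PR):

```python
import threading
import urllib.request

WARMUP_URL = "https://sb-modal-ws--spark-tts-salt-sparktts-generate.modal.run"

def warm_up(url: str = WARMUP_URL, timeout: float = 5.0) -> None:
    # Fire-and-forget: we only care that the request reaches Modal and
    # triggers a cold start; errors and the response body are ignored.
    try:
        urllib.request.urlopen(url, timeout=timeout).read()
    except Exception:
        pass

def warm_up_async(url: str = WARMUP_URL) -> threading.Thread:
    # Daemon thread so a slow cold start never blocks the caller.
    t = threading.Thread(target=warm_up, args=(url,), daemon=True)
    t.start()
    return t
```

Calling `warm_up_async()` from the frontend's page-load handler would hide most of the 1.5-2 minute cold start behind the user's first interaction with the LLM.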

@jqug

jqug commented Jan 5, 2026

Does Modal have any persistent storage that could be used to cache the compiled model? Much of that increased cold-start time is model setup, which perhaps is possible to avoid doing every time... (Pointer here on vllm caching)
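Modal does offer persistent storage via Volumes, which can be mounted where vLLM keeps its compile cache. The following is a deployment-config sketch only (class name, GPU type, and cache path are assumptions; vLLM's cache location defaults to `~/.cache/vllm`, but verify for the version in use):

```python
import modal

app = modal.App("spark-tts-salt")

# Persistent Volume shared across containers; contents survive cold starts.
cache_vol = modal.Volume.from_name("vllm-compile-cache", create_if_missing=True)

@app.cls(
    gpu="A10G",                               # assumption: match the GPU actually used
    volumes={"/root/.cache/vllm": cache_vol},  # assumption: vLLM's default cache dir
)
class SparkTTS:
    @modal.enter()
    def load(self):
        # With enforce_eager=False, the CUDA-graph / torch.compile artifacts
        # built here land on the Volume, so later cold starts can reuse them
        # instead of recompiling from scratch.
        ...

    @modal.method()
    def generate(self, text: str, speaker_id: int) -> bytes:
        ...
```

Whether this recovers the full 30-60s difference depends on how much of the startup cost is compilation versus weight loading, so it is worth profiling before relying on it.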

@PatrickCmd PatrickCmd merged commit dc9ce99 into deploy Jan 7, 2026
1 check failed

3 participants