Conversation
Thanks @huwenjie333, this is great. You mention that generating a sentence takes 3-4 seconds; is it possible to find out which part of the processing is taking this time? It should be a few hundred milliseconds, max 1 second I think. At least that's what I was getting when testing on an RTX4090 GPU.
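One cheap way to answer "which part is slow" is to wrap each pipeline stage in a timing context manager and print a breakdown. A minimal sketch — the stage names below are hypothetical placeholders, not the actual functions in this PR:

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def timed(stage: str):
    # Record wall-clock time for one named pipeline stage.
    # (For GPU work, call torch.cuda.synchronize() before reading the clock,
    # otherwise async kernel launches make the numbers misleading.)
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start

# Hypothetical stage names standing in for the real TTS pipeline steps.
with timed("tokenize"):
    time.sleep(0.01)
with timed("llm_generate"):
    time.sleep(0.05)
with timed("vocoder"):
    time.sleep(0.02)

# Print stages slowest-first to see where the 3-4 s is going.
for stage, secs in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage}: {secs * 1000:.1f} ms")
```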
I can get ~1s when `enforce_eager=False`, but then cold startup takes ~2 min.

Ah OK, that makes sense. I think we would want to use `enforce_eager=False` then.

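For reference, `enforce_eager` is a constructor argument on the vLLM engine. A minimal config sketch (the model name is from this PR; it needs a GPU and the model weights to actually run, so this is illustrative only):

```python
from vllm import LLM

# enforce_eager=False lets vLLM capture CUDA graphs: slower cold start
# (the ~2 min observed above) but much faster steady-state generation.
# enforce_eager=True skips graph capture: fast startup, slower decoding.
llm = LLM(model="Sunbird/spark-tts-salt", enforce_eager=False)
```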
Does Modal have any persistent storage that could be used to cache the compiled model? Much of that increased cold-start time is model setup, which it may be possible to avoid doing every time... (Pointer here on vLLM caching)
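Modal does have persistent storage: a `modal.Volume` can be mounted into the container and used as a cache directory for weights and compile artifacts, so they survive across cold starts. A rough sketch, not tested against this deployment — the volume name, mount path, and GPU type below are made up:

```python
import modal

app = modal.App("spark-tts-salt")

# Persistent volume; contents survive across container cold starts.
model_cache = modal.Volume.from_name("tts-model-cache", create_if_missing=True)

@app.function(
    gpu="A100",                       # illustrative; match the deployment's GPU
    volumes={"/cache": model_cache},  # mounted at /cache inside the container
)
def warm_up():
    import os
    # Point Hugging Face (and vLLM's cache, if applicable) at the volume so
    # downloads and compile artifacts happen once and are reused afterwards.
    os.environ["HF_HOME"] = "/cache/huggingface"
    os.environ["VLLM_CACHE_ROOT"] = "/cache/vllm"  # assumption: vLLM honors this
    # ... load / compile the model here ...
    model_cache.commit()  # persist anything written to /cache
```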

This PR deploys the Sunbird/spark-tts-salt model to the Modal platform. With `enforce_eager=False`, it takes ~2 min for cold startup, and ~1s for generating a sentence.

Usage Example:
Speaker ID:
241: Acholi (female)
242: Ateso (female)
243: Runyankore (female)
245: Lugbara (female)
246: Swahili (male)
248: Luganda (female)
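For convenience, the speaker table above can be kept as a lookup mapping. This is a hypothetical helper for callers, not part of the deployed code:

```python
# Speaker IDs supported by the deployment, per the table above.
SPEAKERS = {
    241: "Acholi (female)",
    242: "Ateso (female)",
    243: "Runyankore (female)",
    245: "Lugbara (female)",
    246: "Swahili (male)",
    248: "Luganda (female)",
}

def speaker_label(speaker_id: int) -> str:
    """Return the language/voice label, or raise for unknown IDs."""
    try:
        return SPEAKERS[speaker_id]
    except KeyError:
        raise ValueError(
            f"unknown speaker_id {speaker_id}; valid IDs: {sorted(SPEAKERS)}"
        )

print(speaker_label(248))  # → Luganda (female)
```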
This PR also includes additional code for testing the latency of the Sunflower Ultravox model.