Add vLLM cache to TTS model#3

Closed
huwenjie333 wants to merge 28 commits into `main` from `deploy`
Conversation

@huwenjie333
Collaborator

This PR adds the vLLM cache to persistent storage in the deployment at `/root/.cache/vllm`, reducing cold start time from 1.5 minutes to 1 minute.

[Screenshot from 2026-01-09 showing the reduced cold start time]
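For reviewers: the reason persisting `/root/.cache/vllm` helps is that vLLM writes its compilation and other startup artifacts under a cache root that defaults to `~/.cache/vllm` (overridable via the `VLLM_CACHE_ROOT` environment variable). A minimal sketch of that resolution logic, assuming only the documented default and env var (the function name here is our own, not vLLM's):

```python
import os
from pathlib import Path

def resolve_vllm_cache_root() -> Path:
    """Resolve the directory vLLM uses for its on-disk cache.

    Mirrors vLLM's behavior: honor VLLM_CACHE_ROOT if set,
    otherwise fall back to ~/.cache/vllm -- the path this PR
    places on the persistent volume so it survives cold starts.
    """
    return Path(os.environ.get("VLLM_CACHE_ROOT", "~/.cache/vllm")).expanduser()
```

Mounting a persistent volume at whatever path this resolves to means compiled artifacts from a previous container survive into the next one, which is where the cold-start saving comes from.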

PatrickCmd and others added 28 commits November 14, 2025 12:47
…npod-workers#236)

Remove unsupported CUDA versions (12.1-12.3) from hub.json and tests.json
to fix compatibility issues with worker deployment.

- hub.json: remove 12.1, 12.2, 12.3 from `allowedCudaVersions`
- tests.json: remove 12.1, 12.2, 12.3, 12.4 from `allowedCudaVersions`

refs: AE-1452
)

* Enable expert parallel arg for MoE models

* Add ENABLE_EXPERT_PARALLEL to hub config

Sunflower-Ultravox deployment with Modal platform
@huwenjie333 huwenjie333 closed this Jan 9, 2026
5 participants