feat: Qwen3-TTS 1.7B VoiceDesign ONNX backend #2

Closed
wavekat-eason wants to merge 13 commits into main from feat/qwen3-tts-1.7b-int4

Conversation

@wavekat-eason
Contributor

Summary

  • Adds the Qwen3-TTS-12Hz-1.7B-VoiceDesign ONNX backend with auto-download from HF Hub
  • Supports INT4 (default) and FP32 precision via ModelPrecision enum
  • VoiceDesign instruction API (SynthesizeRequest::with_instruction) for style control
  • Interactive CLI example with live language/instruction switching
  • Model loading progress reported during download and session init

Test plan

  • make test-qwen3 passes
  • cargo run --example synthesize --features qwen3-tts,hound -- "Hello" downloads INT4 and produces audio
  • cargo run --example synthesize --features qwen3-tts,hound -- --precision fp32 "Hello" downloads FP32 and produces audio
  • cargo run --example synthesize --features qwen3-tts,hound -- -i enters interactive mode
  • WAVEKAT_MODEL_DIR=<path> cargo run --example synthesize --features qwen3-tts,hound -- "Hello" loads from local dir

🤖 Generated with Claude Code

wavekat-eason and others added 13 commits April 6, 2026 20:19
Switch qwen3-tts backend from 0.6B/elbruno to the WaveKat
1.7B VoiceDesign ONNX repo with INT4 by default.

- download: replace ureq HTTP with hf-hub client; downloads
  int4/ ONNX + embeddings/ + tokenizer/ from
  wavekat/Qwen3-TTS-1.7B-VoiceDesign-ONNX (pinned revision)
- model: HIDDEN_DIM 1024→2048, MAX_NEW_TOKENS 2048→8192,
  sampling updated to config.json values (top_k=50, temp=0.9),
  non-streaming prefill (all text in prefill, matches
  generate_onnx.py), fix last-position logits extraction bug,
  min_new_tokens=2, VoiceDesign codec prefix (4 tokens, no
  speaker slot), int4/ and embeddings/ subdirectory paths
- sampler: replace top_p with top_k
- tokenizer: load from tokenizer/ subdirectory

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
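For readers unfamiliar with the sampler change in this commit, here is a minimal sketch of top-k sampling with temperature. It is not the code from this PR: the function name `sample_top_k` and its signature are hypothetical, and the caller supplies the uniform random number `u` so the example needs no external crate. NaN logits are not handled.

```rust
/// Sample a token index from `logits` using temperature scaling and top-k
/// filtering. `u` is a uniform random number in [0, 1) supplied by the caller.
/// Returns `None` for empty logits, `top_k == 0`, or a non-positive temperature.
pub fn sample_top_k(logits: &[f32], top_k: usize, temperature: f32, u: f32) -> Option<usize> {
    if logits.is_empty() || top_k == 0 || temperature <= 0.0 {
        return None;
    }

    // Order token indices by logit, highest first, and keep only the top k.
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));
    idx.truncate(top_k.min(logits.len()));

    // Unnormalized softmax weights over the kept logits. Subtracting the
    // maximum before exponentiating keeps the computation numerically stable.
    let max = logits[idx[0]];
    let weights: Vec<f32> = idx
        .iter()
        .map(|&i| ((logits[i] - max) / temperature).exp())
        .collect();
    let total: f32 = weights.iter().sum();

    // Inverse-CDF sampling: walk the cumulative weights until they exceed u * total.
    let target = u.clamp(0.0, 1.0) * total;
    let mut acc = 0.0;
    for (&i, &w) in idx.iter().zip(&weights) {
        acc += w;
        if target < acc {
            return Some(i);
        }
    }
    // Rounding can leave `target` at or just above the final sum; fall back to the last kept token.
    idx.last().copied()
}
```

With the values listed in the commit, a call would look like `sample_top_k(&logits, 50, 0.9, u)`. Unlike top-p, the number of candidate tokens is fixed at `top_k` regardless of how peaked the distribution is.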
- Add `instruction` field to `SynthesizeRequest` with `with_instruction()`
- Qwen3-TTS backend builds user-turn prefix from instruction tokens;
  warns on stderr when no instruction is provided
- Upgrade hf-hub 0.3 → 0.5 (fixes relative-Location redirect bug)
- synthesize example: default instruction, /lang, /langs, /instruct,
  /status, /help commands for live session control

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
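The slash commands added to the interactive example could be dispatched along the lines of the sketch below. Only the command names (`/lang`, `/langs`, `/instruct`, `/status`, `/help`) come from the commit message; the `Input` enum, the `parse_line` function, and the handling of unknown commands are assumptions, and argument validation is left to the caller.

```rust
/// One line of interactive input, classified. All names here are hypothetical.
#[derive(Debug, PartialEq, Eq)]
pub enum Input {
    SetLang(String),
    ListLangs,
    SetInstruction(String),
    Status,
    Help,
    /// A line starting with '/' that matches no known command.
    Unknown(String),
    /// Anything else is treated as text to synthesize.
    Text(String),
}

/// Classify a line typed at the interactive prompt.
pub fn parse_line(line: &str) -> Input {
    let line = line.trim();
    if !line.starts_with('/') {
        return Input::Text(line.to_string());
    }

    // Split the command word from its (possibly empty) argument.
    let (cmd, arg) = match line.split_once(char::is_whitespace) {
        Some((c, a)) => (c, a.trim()),
        None => (line, ""),
    };

    match cmd {
        "/lang" => Input::SetLang(arg.to_string()),
        "/langs" => Input::ListLangs,
        "/instruct" => Input::SetInstruction(arg.to_string()),
        "/status" => Input::Status,
        "/help" => Input::Help,
        other => Input::Unknown(other.to_string()),
    }
}
```

Keeping parsing separate from the session state makes the command set easy to unit-test without loading the model.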
@wavekat-eason wavekat-eason deleted the feat/qwen3-tts-1.7b-int4 branch April 7, 2026 02:53