frankdoer

Frank Du (frankdoer)

MS student @ NTU. Working on multimodal and speech foundation models. Trying to make machines see, hear, and reason together.

Popular repositories

mm-reason-bench Public

Multimodal reasoning benchmark for vision-language-audio models with cross-modal dependency analysis

Python

unified-speech-codec Public

Unified speech codec framework with pluggable quantization (RVQ, BSQ) and semantic supervision

Python

av-align Public

Audio-visual alignment and evaluation toolkit for multimodal models

Python

frankdoer Public

Profile README

AgentsMesh Public

Forked from AgentsMesh/AgentsMesh

AI Agents Command Center

Go

WAM-Flow Public

Forked from fudan-generative-vision/WAM-Flow

[CVPR 2026] WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving

Python