Hi, I’ve successfully run the InteractAgent pipeline with a TRUMANS scene image. The motion quality is great, but I have two questions regarding scene-aware deployment:
- Initial Position Alignment: The generated motions always start from a default/random origin rather than a meaningful place in the scene. How can I align the character's starting point with a specific semantic location (e.g., "on the sofa")? Is there a script that maps the LLM's spatial commands to global scene coordinates?
- Visual Feedback Implementation: In the paper, the reflection process uses a "ghost image" or video that overlays the trajectory on the scene. Currently, the code seems to generate motions and scenes separately. Could you provide the implementation or guidance for rendering this composite visual feedback for the MLLM to "see"?
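To clarify what I mean by the first question, here is my current workaround for alignment. Everything here is my own assumption, not your code: I build a label-to-centroid lookup from the TRUMANS scene annotations by hand, and I assume the motion is a `(T, J, 3)` joint-position array with joint 0 as the root and a y-up convention:

```python
import numpy as np

# Hypothetical semantic-label -> world-coordinate lookup that I built
# manually from the scene annotations (assumed: label -> (x, y, z) centroid).
scene_annotations = {
    "sofa": np.array([1.2, 0.0, -0.5]),
    "table": np.array([-0.8, 0.0, 1.1]),
}

def align_motion_to_label(motion, label, annotations):
    """Translate a generated motion so its first root position sits at
    the annotated location of `label`.

    motion: (T, J, 3) array of joint positions; joint 0 assumed to be the root.
    """
    target = annotations[label]
    offset = target - motion[0, 0]   # shift from current start to target
    offset[1] = 0.0                  # keep the original height (y-up assumed)
    return motion + offset           # broadcast over all frames and joints

# Tiny example: a 2-frame, 1-joint "motion" starting at the origin.
motion = np.zeros((2, 1, 3))
aligned = align_motion_to_label(motion, "sofa", scene_annotations)
print(aligned[0, 0])  # -> [ 1.2  0.   -0.5]
```

This only handles translation; is there an intended way to also orient the character (e.g., facing the sofa), or a released mapping script I should be using instead?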
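For the second question, my rough understanding is that the "ghost image" needs the 3D trajectory projected into the scene camera before it can be drawn on top of the rendered image. Below is a minimal pinhole-projection sketch of what I had in mind; the camera parameters `K`, `R`, `t` are made-up placeholders, since I couldn't find where the renderer's camera is exposed in the released code:

```python
import numpy as np

def project_points(points_w, K, R, t):
    """Project 3D world points to pixel coordinates with a pinhole camera.

    points_w: (N, 3) world-space points (e.g., the root trajectory).
    K: (3, 3) intrinsics; R, t: world-to-camera rotation and translation.
    """
    cam = points_w @ R.T + t          # world frame -> camera frame
    uv = cam @ K.T                    # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]     # perspective divide -> (N, 2) pixels

# Placeholder camera: identity rotation, 5 m in front of the scene.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

traj = np.array([[0.0, 0.0, 0.0],     # toy 2-point root trajectory
                 [1.0, 0.0, 0.0]])
pix = project_points(traj, K, R, t)
print(pix)  # -> [[320. 240.] [420. 240.]]
```

My plan was to then draw the projected polyline onto the rendered scene image (e.g., with `cv2.polylines`) and feed that composite to the MLLM. Is this roughly what the reflection step does, or does the pipeline render the overlay inside the 3D scene before capturing the image?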
Thanks for the great work!