Question about depth reward #2
Open
Description
Hi, thank you for open-sourcing this great work. I have a quick question about the GRPO-SIF setup.

In the current prompt, the model is asked to output interleaved and , but it is not explicitly required to include a "depth" field in each JSON item. Meanwhile, the depth_consistency reward appears to depend on parsing that depth value. Could this mismatch be why the depth reward is often 0?

I would appreciate your guidance on whether this is expected, or whether the prompt should explicitly require depth output.
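To make the suspected failure mode concrete, here is a minimal sketch of how I imagine it could happen. It is not the repository's actual code: `parse_depths`, this version of `depth_consistency_reward`, the `target_depth` and `tol` parameters, and the example JSON fields are all hypothetical and written only for illustration. The assumption is that the reward falls back to 0 whenever no numeric "depth" value can be parsed from the completion.

```python
import json


def parse_depths(completion: str) -> list[float]:
    """Collect numeric "depth" values from top-level JSON objects in the text.

    Hypothetical helper: scans for '{' and tries to decode a JSON object there.
    """
    decoder = json.JSONDecoder()
    depths = []
    pos = 0
    while True:
        start = completion.find("{", pos)
        if start == -1:
            break
        try:
            obj, end = decoder.raw_decode(completion, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here, keep scanning
            continue
        pos = end
        if isinstance(obj, dict):
            value = obj.get("depth")
            # bool is a subclass of int, so exclude it explicitly
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                depths.append(float(value))
    return depths


def depth_consistency_reward(completion: str, target_depth: float, tol: float = 0.5) -> float:
    """Hypothetical reward: fraction of parsed depths within tol of the target.

    Returns 0.0 when no "depth" field is present, which is the case in question.
    """
    depths = parse_depths(completion)
    if not depths:
        return 0.0
    hits = sum(abs(d - target_depth) <= tol for d in depths)
    return hits / len(depths)


# The prompt never asks for "depth", so the model omits it.
without_depth = 'Items: [{"label": "cup", "bbox": [10, 20, 30, 40]}]'
# The same item if the prompt explicitly required a "depth" field.
with_depth = 'Items: [{"label": "cup", "bbox": [10, 20, 30, 40], "depth": 1.2}]'

reward_without = depth_consistency_reward(without_depth, target_depth=1.0)
reward_with = depth_consistency_reward(with_depth, target_depth=1.0)
```

If the real reward behaves like this sketch, any completion that follows the current prompt literally would score 0 on depth regardless of how good the rest of the output is.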