Question about depth reward #2

@Messimanda

Hi, thank you for open-sourcing this great work. I have a quick clarification about the GRPO-SIF setup. In the current prompt, the model is asked to output interleaved and , but it does not seem to explicitly require a "depth" field inside each JSON item. Meanwhile, the depth_consistency reward appears to depend on parsing that depth value. Could this mismatch be the reason why the depth reward is often 0? I would really appreciate your guidance on whether this is expected or whether the prompt should explicitly enforce depth output.
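
To make the suspected failure mode concrete, here is a minimal sketch of how a depth reward of this kind could collapse to 0. It is not the repository's actual code: the function names, the JSON item layout (`label`, `bbox`, `depth`), the tolerance `tol`, and the scoring rule are all assumptions made for illustration.

```python
import json
import re


def extract_json_items(completion: str) -> list[dict]:
    """Collect flat JSON objects from a completion (hypothetical output format)."""
    items = []
    # Assumption: each item is a non-nested {...} object embedded in the text.
    for match in re.finditer(r"\{[^{}]*\}", completion):
        try:
            obj = json.loads(match.group(0))
        except json.JSONDecodeError:
            continue  # skip brace spans that are not valid JSON
        if isinstance(obj, dict):
            items.append(obj)
    return items


def depth_consistency_reward(completion: str, target_depth: float, tol: float = 0.5) -> float:
    """Hypothetical reward: fraction of parsed depths within `tol` of the target."""
    items = extract_json_items(completion)
    # Only items that actually carry a numeric "depth" field can be scored.
    depths = [it["depth"] for it in items if isinstance(it.get("depth"), (int, float))]
    if not depths:
        # No "depth" field anywhere: nothing to compare, so the reward is 0.
        return 0.0
    hits = sum(abs(d - target_depth) <= tol for d in depths)
    return hits / len(depths)


# The prompt never asks for "depth", so the model omits it.
without_depth = 'step {"label": "chair", "bbox": [10, 20, 30, 40]}'
# The prompt explicitly requires a "depth" field in each item.
with_depth = 'step {"label": "chair", "bbox": [10, 20, 30, 40], "depth": 2.1}'

print(depth_consistency_reward(without_depth, target_depth=2.0))
print(depth_consistency_reward(with_depth, target_depth=2.0))
```

Under these assumptions, a completion that follows the prompt perfectly but omits "depth" always scores 0, no matter how good the rest of the output is. That would match the behaviour I am seeing.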
