Conversation
Code Review
This pull request updates the forward method in src/mcore_bridge/patcher.py to include packed_seq_params, which is necessary for handling sequence packing. However, a critical issue was identified where the forward method returns a 3-tuple while the caller expects a single tensor, which will lead to a runtime error.
    input_ids=input_ids,
    position_ids=position_ids,
    embedding=embedding,
    packed_seq_params=packed_seq_params,
The addition of packed_seq_params to the _get_embeddings call is correct and necessary for MTP to properly handle sequence packing (THD format). This ensures that the internal roll_tensor call correctly handles sequence boundaries.
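A minimal sketch of the pattern being approved here, using stub classes rather than real Megatron-Core code: `_get_embeddings` and its parameters come from the diff above, while the stub bodies are illustrative assumptions. The point is simply that `packed_seq_params` must be threaded through the patched `forward` so the embedding path can see packed-sequence (THD) boundaries.

```python
# Hypothetical stub illustrating the propagation of packed_seq_params.
# Only the argument threading mirrors the patch; everything else is a stand-in.

class MTPStub:
    def _get_embeddings(self, input_ids, position_ids, embedding,
                        packed_seq_params=None):
        # In the real code, packed_seq_params carries sequence boundaries
        # (THD format) so the internal roll_tensor call does not shift
        # tokens across packed-sequence edges.
        return {"ids": input_ids, "packed": packed_seq_params}

    def forward(self, input_ids, position_ids, embedding,
                packed_seq_params=None):
        return self._get_embeddings(
            input_ids=input_ids,
            position_ids=position_ids,
            embedding=embedding,
            packed_seq_params=packed_seq_params,  # the argument added by this PR
        )

out = MTPStub().forward([1, 2], [0, 1], None, packed_seq_params="boundaries")
assert out["packed"] == "boundaries"  # forwarded, not dropped
```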
Critical Issue: Note that this patched forward method returns a 3-tuple (hidden_states, input_ids, position_ids) at line 470. However, the caller in GPTModel._postprocess (at src/mcore_bridge/model/gpt_model.py:398) expects a single return value: hidden_states = self.mtp(...). This mismatch will cause a TypeError or ValueError in subsequent operations (like torch.chunk at line 413 of gpt_model.py) because hidden_states will be a tuple instead of a tensor.
Since the caller handles its own label/mask shifting and logging, you should consider updating the return statement at line 470 to only return hidden_states to maintain compatibility with the standard Megatron-Core API and the existing caller logic.
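The mismatch and the suggested fix can be sketched as follows. The function names and the caller pattern are taken from the review; the bodies are hypothetical stand-ins, not the actual `patcher.py` implementation.

```python
# Sketch of the return-shape mismatch. The caller in GPTModel._postprocess
# does roughly `hidden_states = self.mtp(...)` and later chunks the result,
# so a tuple return breaks it.

def patched_mtp_forward(hidden_states, input_ids, position_ids):
    # ... embedding / transformer work elided ...
    # Current patched code: returns a 3-tuple, which the caller then
    # treats as a single tensor (e.g. passes it to torch.chunk) -> error.
    return hidden_states, input_ids, position_ids

def fixed_mtp_forward(hidden_states, input_ids, position_ids):
    # ... same computation ...
    # Suggested fix: return only hidden_states, matching the standard
    # Megatron-Core API and the existing caller logic.
    return hidden_states

assert isinstance(patched_mtp_forward([1.0], [0], [0]), tuple)   # the bug
assert not isinstance(fixed_mtp_forward([1.0], [0], [0]), tuple)  # the fix
```

Since the caller already performs its own label/mask shifting and logging, nothing downstream needs the extra `input_ids`/`position_ids` return values.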