feat: add use_flex_decoding parameter for non-CUDA devices #326
Open
Mr-Neutr0n wants to merge 1 commit into vikhyat:main from
Conversation
Add a `use_flex_decoding` constructor parameter to allow users to disable
flex decoding on non-CUDA devices (e.g., MPS on Mac).
Changes:
- MoondreamModel: Add `use_flex_decoding` parameter (default: True)
- HfConfig: Add `use_flex_decoding` config option
- HfMoondream: Pass through `use_flex_decoding` to MoondreamModel
Usage:
```python
# For HuggingFace loading on Mac with MPS
from transformers import AutoModelForCausalLM, AutoConfig
config = AutoConfig.from_pretrained('moondream/moondream3', use_flex_decoding=False)
model = AutoModelForCausalLM.from_pretrained('moondream/moondream3', config=config)
```
Fixes vikhyat#316
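Conceptually, the new parameter acts as a simple attention-backend switch. A minimal sketch of that idea, assuming a hypothetical `ModelSketch` class — the real `MoondreamModel` internals are not shown in this PR description:

```python
# Hypothetical sketch of how a use_flex_decoding flag can gate the
# attention backend. ModelSketch and attention_backend are illustrative
# names, not the actual MoondreamModel API.
class ModelSketch:
    def __init__(self, use_flex_decoding: bool = True):
        # Default True preserves existing CUDA behavior; pass False on
        # MPS/CPU, where flex attention's create_block_mask is unavailable.
        self.use_flex_decoding = use_flex_decoding

    def attention_backend(self) -> str:
        # Use flex decoding only when enabled; otherwise fall back to
        # standard scaled dot-product attention (SDPA).
        return "flex" if self.use_flex_decoding else "sdpa"

print(ModelSketch().attention_backend())                         # flex
print(ModelSketch(use_flex_decoding=False).attention_backend())  # sdpa
```

Defaulting to `True` keeps the change backward compatible: existing CUDA users see no behavior change, and only non-CUDA users need to opt out.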
Author
Friendly follow-up - is there anything I can improve in this PR? Happy to address any feedback!
Author
Friendly bump! Let me know if there's anything I should update or improve to help move this forward.
Summary

Add a `use_flex_decoding` constructor parameter to `MoondreamModel` and `HfConfig` to allow users to disable flex decoding on non-CUDA devices (e.g., MPS on Mac).

Problem

Moondream 3 uses flex decoding, which requires CUDA. Users on Mac with MPS cannot run the model because `create_block_mask` is CUDA-only. Currently, users have to manually patch the source code to set `use_flex_decoding = False`.

Solution

Add a `use_flex_decoding` parameter that defaults to `True` (backward compatible) but can be set to `False` for non-CUDA devices.

Changes

- `moondream/torch/moondream.py`: Add `use_flex_decoding` parameter to `MoondreamModel.__init__`
- `moondream/torch/hf_moondream.py`: Add `use_flex_decoding` to `HfConfig` and pass through to `MoondreamModel`

Usage

Test plan

- `use_flex_decoding=True` on CUDA (default behavior unchanged)
- `use_flex_decoding=False` on MPS (Mac)

Fixes #316
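Since flex decoding only works on CUDA, callers could derive the flag from the target device string instead of hard-coding it. A small sketch — `pick_use_flex_decoding` is a hypothetical helper, not part of this PR:

```python
def pick_use_flex_decoding(device: str) -> bool:
    # Flex decoding requires CUDA; disable it on mps/cpu devices.
    return device.startswith("cuda")

print(pick_use_flex_decoding("cuda:0"))  # True
print(pick_use_flex_decoding("mps"))     # False
```

The result could then be passed as the `use_flex_decoding` kwarg when building the config, as in the usage example above.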