
feat: add use_flex_decoding parameter for non-CUDA devices#326

Open
Mr-Neutr0n wants to merge 1 commit into vikhyat:main from Mr-Neutr0n:fix/flex-decoding-param

Conversation

@Mr-Neutr0n

Summary

Add a use_flex_decoding constructor parameter to MoondreamModel and HfConfig to allow users to disable flex decoding on non-CUDA devices (e.g., MPS on Mac).

Problem

Moondream 3 uses flex decoding, which requires CUDA. Users on Mac with MPS cannot run the model because create_block_mask is CUDA-only. Currently, users must manually patch the source code to set use_flex_decoding = False.
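The gating this PR enables can be sketched as a tiny helper (the name `should_use_flex_decoding` is hypothetical; in the actual change the logic lives inside `MoondreamModel`):

```python
def should_use_flex_decoding(device_type: str, use_flex_decoding: bool = True) -> bool:
    # create_block_mask (flex attention) currently requires CUDA, so flex
    # decoding must stay off on MPS/CPU even if the flag is left at its default.
    return use_flex_decoding and device_type == "cuda"
```

With such a check, an MPS user only has to flip the flag instead of editing the source.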

Solution

Add a use_flex_decoding parameter that defaults to True (backward compatible) but can be set to False for non-CUDA devices.

Changes

  • moondream/torch/moondream.py: Add use_flex_decoding parameter to MoondreamModel.__init__
  • moondream/torch/hf_moondream.py: Add use_flex_decoding to HfConfig and pass through to MoondreamModel
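A minimal sketch of the constructor change (simplified; the real `MoondreamModel.__init__` takes the model config and other arguments not shown here):

```python
class MoondreamModel:
    # Simplified stand-in for moondream/torch/moondream.py.
    def __init__(self, use_flex_decoding: bool = True):
        # Defaulting to True keeps existing CUDA behavior unchanged;
        # MPS/CPU users pass False to skip the CUDA-only block-mask path.
        self.use_flex_decoding = use_flex_decoding
```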

Usage

```python
# For HuggingFace loading on Mac with MPS
from transformers import AutoModelForCausalLM, AutoConfig

config = AutoConfig.from_pretrained('moondream/moondream3', use_flex_decoding=False)
model = AutoModelForCausalLM.from_pretrained('moondream/moondream3', config=config)
model.to('mps')
```

Test plan

  • Load model with use_flex_decoding=True on CUDA (default behavior unchanged)
  • Load model with use_flex_decoding=False on MPS (Mac)
  • Verify inference works correctly with flex decoding disabled
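The first two points can be smoke-tested without GPU hardware by stubbing the config (`FakeHfConfig` below is a stand-in, not the real `HfConfig`):

```python
class FakeHfConfig:
    # Stand-in for HfConfig, just to exercise the new option's default.
    def __init__(self, use_flex_decoding: bool = True):
        self.use_flex_decoding = use_flex_decoding

def test_default_unchanged():
    # Omitting the flag must preserve current CUDA behavior.
    assert FakeHfConfig().use_flex_decoding is True

def test_disable_for_mps():
    # Passing False is what Mac/MPS users would do.
    assert FakeHfConfig(use_flex_decoding=False).use_flex_decoding is False

test_default_unchanged()
test_disable_for_mps()
```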

Fixes #316

Add a `use_flex_decoding` constructor parameter to allow users to disable
flex decoding on non-CUDA devices (e.g., MPS on Mac).

Changes:
- MoondreamModel: Add `use_flex_decoding` parameter (default: True)
- HfConfig: Add `use_flex_decoding` config option
- HfMoondream: Pass through `use_flex_decoding` to MoondreamModel

Usage:
```python
# For HuggingFace loading on Mac with MPS
from transformers import AutoModelForCausalLM, AutoConfig
config = AutoConfig.from_pretrained('moondream/moondream3', use_flex_decoding=False)
model = AutoModelForCausalLM.from_pretrained('moondream/moondream3', config=config)
```

Fixes vikhyat#316
@Mr-Neutr0n
Author

Friendly follow-up - is there anything I can improve in this PR? Happy to address any feedback!

@Mr-Neutr0n
Author

Friendly bump! Let me know if there's anything I should update or improve to help move this forward.



Development

Successfully merging this pull request may close these issues.

Make it easier to unset use_flex_decoding for Moondream 3 on non-CUDA devices
