
Support MXINT4 scheme#1666

Open
mengniwang95 wants to merge 7 commits into main from mengni/mx_int4

Conversation

@mengniwang95
Contributor

Description

Support MXINT4 scheme
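For readers unfamiliar with the MX format family: MXINT4 quantizes weights in blocks (group size 32 in this PR, per the "w4g32" model name), sharing one power-of-two scale per block with 4-bit signed integer elements. The following is a minimal illustrative sketch under those assumptions, not the actual auto-round implementation; the function names and the exact rounding/scale-selection policy here are hypothetical.

```python
import numpy as np

def mxint4_quantize(x, block=32):
    """Sketch of block-wise INT4 quantization with a shared power-of-two scale."""
    x = np.asarray(x, dtype=np.float64).reshape(-1, block)
    amax = np.abs(x).max(axis=1, keepdims=True)
    # Pick the smallest power-of-two scale that maps amax into the int4 range [-8, 7].
    exp = np.where(amax > 0, np.ceil(np.log2(amax / 7)), 0.0)
    scale = 2.0 ** exp
    q = np.clip(np.round(x / scale), -8, 7)  # 4-bit signed integer codes
    return q, scale

def mxint4_dequantize(q, scale):
    """Reconstruct the (lossy) floating-point values from codes and block scales."""
    return q * scale
```

Round-trip error per element is bounded by half the block's scale, which is the usual trade-off of sharing one coarse scale across 32 values.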

How to use:

model quantization:

CUDA_VISIBLE_DEVICES=0 auto-round --model /models/Llama-3.2-3B/ --scheme MXINT4 --iters 0 --format auto_round

inference with transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Path produced by the quantization command above
model_name = "tmp_autoround/Llama-3.2-3B-mxint-w4g32/"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Tokenize a prompt and move it to the model's device
input_ids = tokenizer("Hello my name is", return_tensors="pt").input_ids.to(
    model.device
)
output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0]))

Type of Change

  • [ ] Bug fix
  • [x] New feature
  • [ ] Documentation update
  • [ ] Performance improvement
  • [ ] Code refactoring
  • [ ] Other (please specify):

mengniwang95 and others added 5 commits April 7, 2026 17:33
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
