🎯 Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding, which are well suited to Reinforcement Learning, but also emphasizes open-ended solutions. Our goal is to build a general model for agentic applications, incorporating comprehensive planning and function-calling capabilities.
-
[Coming Soon] 🏃 Marco-o1 Agentic: A more powerful agentic model is coming soon...
-
[2025/02/09] 🔥 EDPO (Difficulty-Estimated Policy Optimization): We propose an optimization algorithm based on an online data-difficulty selector. To our knowledge, this is the first work on online data selection. Experiments show that, compared with GRPO, EDPO better resists the noise caused by zero-advantage samples, achieving an average performance improvement of 2.4%. The same online selector can also provide multi-scale routing based on prompt difficulty in large-scale online services.
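The paper's exact algorithm is not reproduced here, but the core idea of filtering zero-advantage prompts can be illustrated with a minimal sketch. In GRPO-style training, when every rollout for a prompt receives the same reward, the group-normalized advantage is zero for all of them, so the prompt contributes only noise to the update. All function and variable names below are hypothetical, not from the EDPO codebase:

```python
def select_prompts(rollout_rewards, low=0.0, high=1.0):
    """Keep prompts whose sampled rollouts show mixed outcomes.

    rollout_rewards maps a prompt id to the list of 0/1 rewards from its
    G sampled rollouts. A pass rate of exactly 0 or 1 means all rewards
    are identical, so the GRPO-style group advantage is zero; an online
    difficulty selector can drop such prompts before the policy update.
    Returns the selected prompts with their estimated difficulty.
    """
    selected = {}
    for prompt, rewards in rollout_rewards.items():
        pass_rate = sum(rewards) / len(rewards)  # estimated difficulty
        if low < pass_rate < high:               # mixed outcomes only
            selected[prompt] = pass_rate
    return selected

# Three prompts with G = 4 rollouts each: too easy, useful, too hard.
batch = {
    "p_easy": [1, 1, 1, 1],
    "p_mid":  [1, 0, 1, 0],
    "p_hard": [0, 0, 0, 0],
}
print(select_prompts(batch))  # → {'p_mid': 0.5}
```

The same pass-rate estimate could plausibly drive the multi-scale routing mentioned above, e.g. sending easy prompts to a smaller model in serving.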
-
[2025/02/09] 🔥 The paper A State-Transition Framework for Efficient LLM Reasoning has been accepted by ICLR 2026.
-
[2025/02/09] 🔥 We released Marco-o1 v3. By training a pluggable linear component, MAM (Mixed Attention Module), on top of the existing dense model, we can dynamically compress the model to save context tokens. We also introduced TTT (Test-Time Training), ultimately achieving a 20% reduction in inference cost alongside an average performance improvement of 4.7%.
-
[2025/05/15] 🔥 The paper Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models has been accepted by ACL 2025.
-
[2025/02/14] 🔥 We released Marco-o1 v2, which relies entirely on self-built data and has undergone DPO. It has been more comprehensively optimized for mathematical problem-solving, planning, and instruction-following capabilities. 🍬 This time, our model's ability to count letters is quite impressive! 😁
-
[2024/11/13] 🔥 We released Marco-o1 v1, a step towards open reasoning models for open-ended solutions. This release includes our reasoning model, optimized for complex problem-solving and versatile applications across various domains.
To install Marco-o1, follow these steps:
# Clone the repository
git clone https://github.com/AIDC-AI/Marco-o1
# Change to the Marco-o1 directory
cd Marco-o1
# Install required packages
pip install -r requirements.txt
-
Load Marco-o1-CoT model:
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AIDC-AI/Marco-o1")
model = AutoModelForCausalLM.from_pretrained("AIDC-AI/Marco-o1")
Inference:
Execute the inference script (you can give any customized inputs inside):
./src/output/talk_with_model.py

# Use vLLM
./src/output/talk_with_model_vllm.py
Deploy using FastAPI:
Check the README.md file in the examples folder.
From MarcoPolo Team, AI Business, Alibaba International Digital Commerce:
If you find Marco-o1 useful for your research and applications, please cite:
@misc{zhao2024marcoo1openreasoningmodels,
title={Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions},
author={Yu Zhao and Huifeng Yin and Bo Zeng and Hao Wang and Tianqi Shi and Chenyang Lyu and Longyue Wang and Weihua Luo and Kaifu Zhang},
year={2024},
eprint={2411.14405},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2411.14405},
}
@misc{yin2025wideningdistillationbottleneckreasoning,
title={Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models},
author={Huifeng Yin and Yu Zhao and Minghao Wu and Xuanfan Ni and Bo Zeng and Hao Wang and Tianqi Shi and Liangying Shao and Chenyang Lyu and Longyue Wang and Weihua Luo and Kaifu Zhang},
year={2025},
eprint={2503.01461},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2503.01461},
}
@misc{zhang2026statetransitionframeworkefficientllm,
title={A State-Transition Framework for Efficient LLM Reasoning},
author={Liang Zhang and Yu Zhao and Longyue Wang and Tianqi Shi and Weihua Luo and Kaifu Zhang and Jinsong Su},
year={2026},
eprint={2602.01198},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2602.01198},
}
@misc{zhao2026difficultyestimatedpolicyoptimization,
title={Difficulty-Estimated Policy Optimization},
author={Yu Zhao and Fan Jiang and Tianle Liu and Bo Zeng and Yu Liu and Longyue Wang and Weihua Luo},
year={2026},
eprint={2602.06375},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2602.06375},
}
This project is licensed under the Apache License, Version 2.0 (SPDX-License-Identifier: Apache-2.0).
We used compliance-checking algorithms during the training process to ensure, to the best of our ability, the compliance of the trained model and dataset. Due to the complexity of the data and the diversity of language-model usage scenarios, we cannot guarantee that the model is completely free of copyright issues or improper content. If you believe anything infringes on your rights or generates improper content, please contact us, and we will promptly address the matter.

