no-sft

Here is 1 public repository matching this topic...

beingdutta / R1-0-Style-Training-for-Small-Generative-Models

We build and test quantitative reasoning abilities in small generative models by skipping the SFT phase and going directly to the RL phase, building reasoning knowledge without data supervision.

reinforcement-learning chain-of-thought vision-language-models qwen2-vl deepseek-r1 grpo deepseek-r1-zero smolvlm no-sft

Updated Jan 6, 2026
Jupyter Notebook
