This paper studies prompt robustness and ambiguity handling for small instruction-tuned LLMs (Qwen2.5-1.5B/3B) in educational tutoring. It evaluates corruption-augmented supervised fine-tuning on GSM8K and DPO in two roles: (i) improving robustness of math reasoning under noisy prompts, and (ii) inducing clarification-seeking behavior on ambiguous prompts.
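The corruption-augmented fine-tuning idea can be sketched with a simple character-level noising function applied to training prompts. This is a hypothetical illustration, assuming delete/swap/duplicate operators and a per-character noise rate `p`; the paper's exact corruption scheme is not specified here.

```python
import random

def corrupt_prompt(prompt: str, p: float = 0.1, seed: int = 0) -> str:
    """Apply character-level noise to a prompt for robustness training.

    Hypothetical sketch: each character is independently corrupted with
    probability p, via one of three operators (delete, swap with the next
    character, or duplicate). A fixed seed keeps augmentation reproducible.
    """
    rng = random.Random(seed)
    chars = list(prompt)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < p:
            op = rng.choice(["delete", "swap", "duplicate"])
            if op == "delete":
                i += 1          # drop this character
                continue
            if op == "swap" and i + 1 < len(chars):
                out.append(chars[i + 1])  # transpose adjacent characters
                out.append(chars[i])
                i += 2
                continue
            if op == "duplicate":
                out.append(chars[i])      # repeat this character
        out.append(chars[i])
        i += 1
    return "".join(out)
```

During SFT, each GSM8K question would be paired with its clean answer but fed through `corrupt_prompt` on the input side, so the model learns to reason correctly despite noisy prompts. With `p=0.0` the function is the identity.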
Updated Mar 27, 2026 - Python