VibeThinker-3B: 3B Model Matches Larger Systems on Reasoning Tasks

Original: VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

Why This Matters

The result suggests that small language models can match much larger systems on verifiable reasoning tasks, challenging common assumptions about how much scale such performance requires.

VibeThinker-3B, a 3-billion-parameter model, achieves frontier-level reasoning performance through supervised fine-tuning and reinforcement learning, scoring 94.3 on AIME26 and matching larger flagship models like DeepSeek V3.2 and Gemini 3 Pro.

Researchers introduced VibeThinker-3B, a compact 3-billion-parameter language model built to probe the limits of verifiable reasoning at small scale. Its training uses the Spectrum-to-Signal post-training paradigm together with curriculum-based supervised fine-tuning (SFT), multi-domain reinforcement learning with GRPO (Group Relative Policy Optimization), and offline self-distillation.

On the AIME26 benchmark, VibeThinker-3B scored 94.3, rising to 97.1 with claim-level test-time scaling. It achieved 80.2 Pass@1 on LiveCodeBench v6 and showed strong out-of-distribution generalization, with a 96.1% acceptance rate on unseen LeetCode contests. It also scored 93.4 on IFEval, indicating that the reasoning-focused training did not erode its ability to follow instructions. According to the researchers, these results match or exceed those of DeepSeek V3.2, GLM-5, and Gemini 3 Pro, all of which have substantially more parameters.

The researchers propose the Parametric Compression-Coverage Hypothesis: verifiable reasoning can be compressed into a compact reasoning core, whereas open-domain knowledge requires broader parameter coverage. On this view, small models are a complementary path to frontier performance rather than merely cheaper substitutes for deployment.
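The article does not describe the paper's GRPO configuration. As a rough illustration of the group-relative idea behind GRPO in general, the sketch below scores a group of sampled answers to one problem with a verifiable reward and normalizes each reward against the rest of its group, so no learned value network (critic) is needed. The function names, the exact-match reward rule, and the sample answers are hypothetical assumptions, and the sketch omits the policy-gradient update, ratio clipping, and KL penalty of a full GRPO step.

```python
import statistics
from typing import List


def exact_match_reward(predicted: str, reference: str) -> float:
    # Hypothetical verifiable reward: 1.0 if the final answer matches the
    # reference exactly (after trimming whitespace), else 0.0.
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    # GRPO-style advantage: each sampled response is compared with the other
    # responses drawn for the same prompt instead of with a learned critic.
    mean = statistics.fmean(rewards)
    # Population standard deviation; some implementations use the sample std.
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every reward in the group is identical.
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical sampled answers to one problem whose reference answer is "204".
samples = ["204", "210", "204", "17"]
rewards = [exact_match_reward(s, "204") for s in samples]
advantages = group_relative_advantages(rewards)
print(rewards)                             # [1.0, 0.0, 1.0, 0.0]
print([round(a, 3) for a in advantages])   # [1.0, -1.0, 1.0, -1.0]
```

A group in which every answer is correct (or every answer is wrong) yields all-zero advantages and therefore no learning signal from that prompt. This is a general property of the group-relative formulation, not a detail reported for VibeThinker-3B.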

Source

arxiv.org