VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

t_e_s_t 15 hours ago

[flagged]

maxignol 15 hours ago

A 3B-param model on par with Opus 4.5 sounds interesting. Will read the full article before making up my mind.

zkmon 17 hours ago

Does Python coding depend on political facts of the world?

It might appear not, but the process of reasoning is not an isolated act. The right and wrong ways of doing things are codified in a social evolution that has absorbed all facets of life. Why should you optimize a piece of code for performance? Why is performance needed? What is a bug? What features and UI themes would be more intuitive for humans?

There is a butterfly effect. Everything affects everything to some extent.

CamperBob2 1 hour ago

True, but this model provides something of a lower bound on just how much world knowledge is really needed for unrelated reasoning tasks. That lower bound appears to be quite low indeed. Lower than I thought it would turn out to be.

This thing is just bonkers.

spacebacon 16 hours ago

[dead]

kmchandy 9 hours ago

The paper makes a clear claim: "it provides an important and concrete proof: on well-constrained, verifiable reasoning tasks, first-tier performance is no longer the exclusive domain of ultra-large models." And that's exciting.