Posted by timhigins 17 hours ago
Seems like a really good model to use in an IDE when you still want control over the code structure then.
> these findings motivate the Parametric Compression-Coverage Hypothesis, which views verifiable reasoning as compressible into compact reasoning cores, while open-domain knowledge and general-purpose competence require broad parameter coverage over facts, concepts, and long-tail scenarios.
These kinds of models might be more useful as tools called by larger orchestrator models than as the orchestrators themselves.
Here's what I got
https://9ol.es/tmp/pelican.png
with https://9ol.es/tmp/prompt_pelican.txt
using prithivMLmods/VibeThinker-3B-GGUF:Q4_K_M
It would look really dumb if someone asked it that, but that's fine. You're trying to make a model that is optimized for efficiency for a specific task. As much as possible, you should prune uncorrelated things.
Assuming tool calling hasn’t been entirely stripped out of this model.
(Edit) No tool calling, per this comment: https://news.ycombinator.com/item?id=48640189
VibeThinker-3B is developed through a staged post-training pipeline built upon Qwen2.5-Coder-3B base, a compact 3B foundation model.
Qwen2.5 is ancient by LLM standards.
Prompt: If A goes to B who then goes to C, can A send something to C?
Response:
We need to interpret best. The phrase "If A goes to B who then goes to C, can A send something to C?" could be a puzzle about the concept of sending something (like passing a ball) and the relationships.
Scenario: A gives something to B, and B passes it on to C. Question: Can A also give the same thing to C? Answer: Only if A can obtain a second copy (e.g., the thing was duplicated). Otherwise, after handing it to B, A no longer holds it and cannot “send” it unless a copy exists.
[Lots of other unnecessary commentary and "scenarios" that make even less sense]
My hunch is that Opus-scale models probably have shortcuts encoded into the model that handle these ambiguous cases, whereas this model has learned a program to reason through the edge case (crystallized vs. fluid intelligence): remembering the probability (frontier) vs. calculating it on the fly (VibeThinker).
> [...]
> LLM-based Query Quality Filtering. We utilize capable LLMs to assess query quality, filtering out samples with incomplete descriptions, unreasonable conditions, invalid logic, or an inability to effectively assess target knowledge points.
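For what it's worth, the filtering step quoted above could be sketched roughly like this. This is only a guess at the shape of it, not the authors' pipeline: `filter_queries`, `toy_judge`, and the reason strings are hypothetical names, and the judge here is a trivial stand-in for a real LLM call.

```python
from typing import Callable, Iterable, List

# The four rejection criteria named in the quoted passage (labels are my own).
REJECT_REASONS = (
    "incomplete description",
    "unreasonable conditions",
    "invalid logic",
    "cannot assess target knowledge",
)


def filter_queries(
    queries: Iterable[str],
    judge: Callable[[str], List[str]],
) -> List[str]:
    """Keep only the queries for which the judge reports no problems.

    `judge` is a hypothetical wrapper around whatever LLM you use; it
    returns the subset of REJECT_REASONS that apply to a query.
    """
    kept = []
    for query in queries:
        reasons = judge(query)
        if not reasons:
            kept.append(query)
    return kept


def toy_judge(query: str) -> List[str]:
    # Stand-in for a real LLM call: flags very short queries as incomplete.
    if len(query.split()) < 4:
        return ["incomplete description"]
    return []


sample = ["Solve x", "Find all integers n such that n^2 + n is even."]
kept = filter_queries(sample, toy_judge)
print(kept)
```

In practice the judge would prompt a capable model and parse its verdict; the loop around it stays about this simple.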
So who has suggestions on small models with excellent tool calling capabilities?
A lot of randomness in it
Please don't hype