Posted by simonw 13 hours ago

Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7 (simonwillison.net)
354 points | 76 comments | page 2
atonse 4 hours ago|
Wonder what would happen if we unleashed Karpathy’s autoresearch on the pelican bicycle test. And had it read back the image to judge it.

Oh maybe it might continue to iterate on the existing drawing?

quux 5 hours ago||
This is a useless benchmark nowadays; every model provider trains their models on making good pelicans. Some have even trained on every combination of animal and mode of transportation.
henry2023 4 hours ago|
Every model provider except OpenAI?
aliljet 10 hours ago||
I'm really curious about what competes with Claude Code to drive a local LLM like Qwen 3.6?
chabes 8 hours ago||
OpenCode or Pi are popular agent harnesses. Lots of IDEs integrate LLMs now. I believe there’s also a Qwen Code that exists, but I have yet to try it.
smashed 10 hours ago||
OpenCode?
comandillos 11 hours ago||
I've been using Qwen3.5-35B-A3B for a bit via OpenCode and oMLX on an M5 Max with 128 GB of RAM, and I have to say it's impressively good for a model of that size. I've seen a huge jump in the quality of the tool calls and how well it handles the agentic workflow.
iib 11 hours ago|
This is about the newly released Qwen3.6. Just wanted to make sure you caught that.
bottlepalm 9 hours ago||
I really wish they spent some time training for computer use. This model is incapable of finding anywhere near the correct x,y coordinate of a simple object in a picture.
Havoc 6 hours ago||
Between the legs and the beak, I'd still rate the Opus pelican higher.
justinbaker84 9 hours ago||
I love this benchmark!
lofaszvanitt 10 hours ago||
That Qwen flamingo on the unicycle is actually quite good. A work of art.
kburman 7 hours ago||
Looks like Opus has been nerfed since day 1.
yieldcrv 8 hours ago|
All those models that were just at version 1.x in 2024

That’s so wild
