
Posted by simonw 7 hours ago

Something is afoot in the land of Qwen (simonwillison.net)
401 points | 198 comments
lacoolj 3 hours ago|
I wonder if an American company poached one or all of them. They've been pretty much at the bleeding edge of open models, and it would not surprise me if Amazon or Google snatched them up.
ferfumarma 3 hours ago|
It would surprise me if they're willing to come to the US given the current DHS and ICE situation.
ilaksh 6 hours ago||
Does anyone know when the small Qwen 3.5 models are going to be on OpenRouter?
armanj 5 hours ago|
They're already there: https://openrouter.ai/qwen/qwen3.5-27b
yorwba 5 hours ago|||
There are smaller ones on HuggingFace https://huggingface.co/models?other=qwen3_5&sort=least_param... with 0.8B, 2B, 4B and 9B parameters.
ilaksh 5 hours ago|||
Like 4B, 2B, 9B. Supposedly they are surprisingly smart.
Sakthimm 4 hours ago||
Yep. The 9B has excellent image recognition. I showed it a PCB photo and it correctly identified all the components and the board type from part numbers and shape. OCR quality was solid. Tool calling with opencode worked without issues, but general coding ability is still far from Sonnet-tier. I asked it to add a feature to an existing React app; it couldn't produce an error-free build and fell into a delete-redo loop. Even when I fixed the errors, the UI looked really bad. A more explicit prompt probably would have helped. Opus one-shotted it with the same prompt, and the component looked exactly as expected.

But I'll be running this locally for note summarization, code review, and OCR. Very coherent for its size.

nurettin 3 hours ago||
I am singularly impressed by 35B/A3, hope that is not the reason he had to leave.
ChrisArchitect 6 hours ago||
More discussion:

https://news.ycombinator.com/item?id=47246746

w10-1 3 hours ago||
It sounds like the lead was demoted to attract new talent, quit as a result, and the rest of the team also resigned to force management to change their minds.

If so, I'm happy that the team held together, and I hope that endogenous tech leads get to control their own career and tech destiny after hard work leads to great products. (It's almost as inspiring as tank man, and the tank commanders who tried to avoid harming him...)

(ducking the downvote for challenging the primacy of equity...)

hwers 6 hours ago||
My conspiracy theory is that investors who also have a stake in OpenAI are somehow sabotaging them, like they did when kicking Emad out of Stability AI.
storus 5 hours ago||
More likely some high ranking party member's nepobaby from Gemini sniffed success with Qwen and the original folks just walked away as their reward disappeared.
ahmadyan 3 hours ago||
source?
WarmWash 3 hours ago||
There is no source. But the party in China does have ultimate control.

There would never be an Anthropic/Pentagon situation in China, because in China there isn't actually separation between the military and any given AI company. The party is fully in control.

liuliu 5 hours ago||
Apples vs. oranges. The latter is true: Emad did get sabotaged (for not being able to raise money in time, about 8 months before his departure). Junyang didn't have that long an arc of incidents.
raffael_de 6 hours ago||
> me stepping down. bye my beloved qwen.

the qwen is dead, long live the qwen.

vonneumannstan 6 hours ago||
Were they kneecapped by Anthropic blocking their distillation attempts?
zozbot234 3 hours ago|
What Anthropic was complaining about is training on mass-elicited chat logs. It is very much a ToS violation (you aren't allowed to exploit the service for the purpose of building a competitor), so the complaint is well-founded, but (1) it's not "distillation" properly understood, because you have no access to the actual weights; it can only feasibly extract the same kind of narrow knowledge you'd read out of chat logs, perhaps including primitive "let's think step by step" output (which is not true fine-tuned reasoning tokens); and (2) it's something Western AI firms are very much believed to do to one another, and to Chinese models, all the time anyway. Hence the brouhaha about Western models claiming to be DeepSeek when they answer in Chinese.
red2awn 3 hours ago||
The "distillation attacks" are mostly using Claude as an LLM-as-a-judge. They are not training on the reasoning chains in an SFT fashion.
zozbot234 3 hours ago||
So they're paying expensive input tokens to extract at best a tiny amount of information ("judgment") per request? That's even less like "distillation" than the other claim of them trying to figure out reasoning by asking the model to think step by step.
red2awn 25 minutes ago||
LLM-as-a-judge is quite an effective method for RLing a model, similar to RLHF but more objective and scalable. But yes, Anthropic is making it more serious than it is. Plus, DeepSeek only did it for 125k requests, significantly fewer than the other labs, but Anthropic still listed them first to create FUD.
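To make the LLM-as-a-judge pattern being discussed concrete, here is a minimal sketch of the usual shape: each candidate answer is sent to a judge model with a rubric prompt, and a scalar reward is parsed from the reply for use in an RL objective. The `call_judge` function below is a hypothetical stub (a real pipeline would call an actual model API, e.g. Claude); only the prompt-and-parse structure is the point.

```python
import re

# Rubric prompt asking the judge for a machine-parseable score.
JUDGE_PROMPT = """Rate the following answer for correctness and clarity
on a scale of 1-10. Reply with 'Score: <n>' on the last line.

Question: {question}
Answer: {answer}"""

def call_judge(prompt: str) -> str:
    # Stub standing in for a real judge-model API call (assumption:
    # in practice this would query a strong frontier model).
    return "The answer is mostly correct but terse.\nScore: 7"

def judge_reward(question: str, answer: str) -> float:
    """Parse a scalar reward from the judge's free-text reply."""
    reply = call_judge(JUDGE_PROMPT.format(question=question, answer=answer))
    match = re.search(r"Score:\s*(\d+)", reply)
    if match is None:
        return 0.0  # unparseable judgments yield zero reward
    # Normalize the 1-10 score into [0, 1] for the RL objective.
    return int(match.group(1)) / 10.0

print(judge_reward("What is 2+2?", "4"))  # → 0.7 with the stubbed judge
```

Compared with SFT on elicited outputs, only a single scalar per request leaves the judge here, which is why calling it "distillation" is a stretch.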
kartika848484 4 hours ago||
what the hell, their models were promising tho