Posted by pretext 7 hours ago

Qwen3.6-Plus: Towards real world agents (qwen.ai)
320 points | 108 comments
giancarlostoro 6 hours ago|
I hope their open source variants are just as good, having a 1 million token window for a fully offline model would be VERY interesting.
sosodev 5 hours ago|
I don't know how well it performs, but you can extend Qwen3.5 to a 1 million token context using YaRN. Also, Nemotron 3 Super was recently released and scales up to 1 million tokens of context natively.
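For reference, YaRN scaling of this kind is usually a serve-time flag rather than a retrain. A sketch of what that looks like with vLLM's `--rope-scaling` option; the model id, scaling factor, and base context length here are placeholders, so check the model card for the real values:

```shell
# Illustrative only: YaRN rope scaling at serve time with vLLM.
# factor ≈ target_context / original_context; values below are placeholders.
vllm serve Qwen/Qwen3.5 \
  --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":262144}' \
  --max-model-len 1000000
```

Note that static YaRN applies the same scaling factor to every request, so it can degrade quality on short inputs; the usual advice is to enable it only when you actually need the long context.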
Caum 6 hours ago||
The agent benchmarks here are interesting but I'd love to see how Qwen3.6-Plus handles long-horizon tasks where it needs to recover from its own mistakes. Most agent evals test the happy path. The hard part is when the model takes a wrong action at step 3 and needs to recognize and backtrack at step 15. Has anyone stress-tested this in a real dev workflow?
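The distinction drawn above (happy path vs. recovery) can be made measurable. A toy sketch of a recovery-oriented metric; everything here, including the `"undo"` action name and the trajectory shape, is invented for illustration and not from any real benchmark:

```python
# Toy metric for the failure mode described above: instead of scoring
# only final success, measure how long the agent takes to notice and
# undo its own mistake. A trajectory is a list of (action, ok) pairs.

def recovery_lag(trajectory):
    """Steps between the first failed action and the next 'undo' action,
    or None if the agent never recovers."""
    first_bad = None
    for i, (action, ok) in enumerate(trajectory):
        if not ok and first_bad is None:
            first_bad = i
        elif first_bad is not None and action == "undo":
            return i - first_bad
    return None

# Example: wrong action at step 3, recognized and undone at step 15.
traj = [("edit", True), ("edit", True), ("edit", False)] \
     + [("edit", True)] * 11 + [("undo", True)]
print(recovery_lag(traj))  # 12
```

A happy-path eval would score this trajectory purely on the final state; a recovery metric like this surfaces how many wasted steps the mistake cost.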
wg0 5 hours ago||
It hallucinates a lot more than Sonnet or even MiniMax M2.5. Especially in tool calls: it would duplicate content in code files, realise it later, and get stuck in a loop.
justinclift 11 minutes ago||
> It hallucinates a lot more than Sonnet or even MiniMax M2.5.

Ugh, that's not good.

I evaluated Kimi K2 a while back for some text understanding -> summarisation tasks, and across the 100 tasks roughly 30% of the output was hallucinated. :( :( :(

noelsusman 1 hour ago||
My initial experiments are not encouraging. I have a basic planning prompt that includes instructions not to edit any files or implement anything. Qwen-3.6-Plus will consistently ignore that completely and proceed with implementation. I expect that kind of behavior from small models I run locally, not a hosted closed model claiming to compete with the frontier models.
throwaw12 5 hours ago||
I would love to hear from people using both (Claude Code OR Codex) AND (Qwen) and their experience with Qwen models, are they on par, or how far are they?
scottcha 5 hours ago|
I switch between Claude Code (Opus/Sonnet) and Qwen (OpenCode, OpenClaw) multiple times throughout the day, and Qwen 3.5 is really nice. I also use Kimi K2.5 and GLM5 pretty often, and I'm starting to get a sense that the agent tool matters a little more than the model at this level, as long as tool calling and prompts are configured correctly by the provider.
danelliot 3 minutes ago||
[dead]
zkmon 5 hours ago||
It is no longer available on OpenRouter. They say "going away on 3-March", but it's already gone!
wolvoleo 4 hours ago||
Nice, I hope a small open version of it comes out too.
Art9681 6 hours ago||
How convenient of them to compare themselves to the last generation Opus and GPT models to make their model look better than it really is.
MarsIronPI 6 hours ago||
It's not open weights so I'm not interested.
esafak 6 hours ago||
Does anyone have experience with Alibaba's coding plan? Not that I'm very tempted at $50/month...
usagisushi 3 hours ago|
A bit off-topic but I’m on the legacy Lite plan (now discontinued), and it’s more than enough for hobby projects. The main draw is the generous request-based quota (18k requests/month) rather than a token-based one.

This means a 100k token request counts the same as a 100-token one. I’ve made about 8000 requests in the last two weeks, averaging around 80k tokens per request. It feels like they’re subsidizing this just to gather data on agentic workflows.

On the downside, the speed is mediocre (15–30 tokens/s generation for GLM-5), and I've seen the model glitch or produce broken output about 10 times out of those 8k requests.
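The figures quoted above make the request-based vs. token-based trade-off concrete; a quick back-of-the-envelope check using only the numbers from the comment:

```python
# Back-of-the-envelope check of the quota figures quoted above.
requests_two_weeks = 8_000
avg_tokens_per_request = 80_000
monthly_request_quota = 18_000

# Under a request-based quota, a 100k-token request costs the same as
# a 100-token one, so total token consumption is unbounded by the quota.
total_tokens = requests_two_weeks * avg_tokens_per_request
quota_used = requests_two_weeks / monthly_request_quota

print(total_tokens)  # 640_000_000 tokens in two weeks
print(round(quota_used, 2))  # 0.44 of the monthly request quota
```

Roughly 640M tokens in two weeks while using under half the request quota, which is why a request-based cap is so much more generous for long-context agentic workloads than a token-based one.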

eis 6 hours ago|
Quite strong results in the benchmarks, but why Gemini 3 Pro instead of 3.1? Why only for a few of the benchmarks? Why is OpenAI not in the coding benchmarks? Why Opus 4.5 and not 4.6? It just strikes me as a bit strange.

As always, we'll have to try it and see how it performs in the real world, but Qwen's open-weight models were pretty decent for some tasks, so I'm still excited to see what this brings.
