
Posted by cmitsakis 12 hours ago

Qwen3.6-35B-A3B: Agentic coding power, now open to all(qwen.ai)
884 points | 411 comments
btbr403 11 hours ago|
Planning to deploy Qwen3.6-35B-A3B on NVIDIA Spark DGX for multi-agent coding workflows. The 3B active params should help with concurrent agent density.
the__alchemist 7 hours ago||
Is this the hybrid variant of Gwent and Quen? I hope this is in The Witcher IV!
zshn25 11 hours ago||
What do all the numbers 6-35B-A3B mean?
dunb 11 hours ago||
3.6 is the release version for Qwen. This model is a mixture of experts (MoE), so while the total model size is big (35 billion parameters), each forward pass only activates a portion of the network that’s most relevant to your request (3 billion active parameters). This makes the model run faster, especially if you don’t have enough VRAM for the whole thing.

The performance/intelligence is said to be about the same as the geometric mean of the total and active parameter counts. So, this model should be equivalent to a dense model with about 10.25 billion parameters.

wongarsu 11 hours ago|||
And even if you have enough VRAM to fit the entire thing, the time per token after the first is roughly proportional to (activated parameters)/(VRAM bandwidth)

If you have the vram to spare, a model with more total params but fewer activated ones can be a very worthwhile tradeoff. Of course that's a big if
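As a back-of-the-envelope sketch (all numbers hypothetical, and this is an upper bound that ignores compute and KV-cache reads):

```python
# Bandwidth-bound decode estimate: each generated token must stream
# the active parameters from VRAM once, so
#   tokens/sec <= bandwidth / (active params * bytes per param)
def decode_tps(active_params_b, bytes_per_param, bandwidth_gbs):
    """Upper-bound tokens/sec after the first token."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gbs * 1e9 / bytes_per_token

# Hypothetical numbers: 3B active params at 4-bit (~0.5 bytes/param)
# on ~1000 GB/s of VRAM bandwidth.
print(round(decode_tps(3, 0.5, 1000)))  # -> 667
```

The same formula shows why total params barely matter for speed once everything fits in VRAM: only the activated slice is read per token.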

zshn25 11 hours ago|||
Sorry, how did you calculate the 10.25B?
darrenf 11 hours ago||
> > The performance/intelligence is said to be about the same as the geometric mean of the total and active parameter counts. So, this model should be equivalent to a dense model with about 10.25 billion parameters.

> Sorry, how did you calculate the 10.25B?

The geometric mean of two numbers is the square root of their product. Square root of 105 (35*3) is ~10.25.
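For concreteness, the rule of thumb (a folk estimate, not a law) is just:

```python
import math

# Folk estimate: a MoE with T total and A active params performs
# roughly like a dense model of sqrt(T * A) params.
def dense_equivalent(total_b, active_b):
    return math.sqrt(total_b * active_b)

print(round(dense_equivalent(35, 3), 2))  # -> 10.25
```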

cshimmin 11 hours ago|||
The 6 is part of 3.6, the model version. 35B parameters, A3B means it's a mixture of experts model with only 3B parameters active in any forward pass.
zshn25 11 hours ago||
Got it. Thanks
joaogui1 11 hours ago|||
3.6 is the model version, 35B is the total number of parameters, and A3B means that only 3B parameters are activated, which has some implications for serving (either in how you shard the model, or you can keep the total params in RAM and only load into VRAM what you need to compute the current token, which makes it slower, but at least it runs)
JLO64 11 hours ago|||
35B (35 billion) is the number of parameters this model has. It's a Mixture of Experts (MoE) model, so A3B means that 3B parameters are Active at any moment.
zshn25 11 hours ago||
~I see. What’s the 6?~

Never mind, the other reply clears it up

fred_is_fred 12 hours ago||
How does this compare to the commercial models like Sonnet 4.5 or GPT? Close enough that the price is right (free)?
vidarh 12 hours ago||
They will not measure up. Notice they're comparing it to Gemma, Google's open-weight model, not to Gemini, Sonnet, or GPT. That's fine - this is a tiny model.

If you want something closer to the frontier models, Qwen3.6-Plus (not open) is doing quite well[1] (I've not tested it extensively personally):

https://qwen.ai/blog?id=qwen3.6

pzo 10 hours ago||
on the bright side, it's also worth keeping in mind that these tiny models are better than the GPT-4.0, GPT-4.1, and GPT-4o models we used to enjoy less than 2 years ago [1]

[1] https://artificialanalysis.ai/?models=gpt-5-4%2Cgpt-oss-120b...

vidarh 7 hours ago||
They're absolutely worth using for the right tasks. It's hard to go back to GPT4 level for everything (for me at least), but there's plenty of stuff they are smart enough for.
NitpickLawyer 12 hours ago|||
> Close enough

No. These are nowhere near SotA, no matter what the benchmark numbers say. They are amazing for what they are (runnable on regular PCs), and you can find use cases for them (where privacy >> speed / accuracy) where they perform "good enough", but they are not magic. They have limitations, and you need to adapt your workflows to handle them.

julianlam 12 hours ago||
Can you share more about what adaptations you made when using smaller models?

I'm just starting my exploration of these small models for coding on my 16GB machine (yeah, puny...) and am running into issues where the solution may very well be to reduce the scope of the problem set so the smaller model can handle it.

ukuina 11 hours ago|||
You'd do most of the planning/cognition yourself, down to the module/method signature level, and then have it loop through the plan to "fill in the code". Need a strong testing harness to loop effectively.
adrian_b 11 hours ago|||
General claims about a model are unlikely to be useful; only very specific claims are, i.e. ones that state the exact number of parameters and the quantization method used by each of the compared models.

If you perform the inference locally, there is a huge space of compromise between the inference speed and the quality of the results.

Most open weights models are available in a variety of sizes. Thus you can choose anywhere from very small models with a little more than 1B parameters to very big models with over 750B parameters.

For a given model, you can choose to evaluate it in its native number size, which is normally BF16, or in a great variety of smaller quantized number sizes, in order to fit the model in less memory or just to reduce the time for accessing the memory.
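For a sense of scale, here is the approximate weight memory at a few common precisions (a sketch; real GGUF files run a bit larger because embeddings and some layers are kept at higher precision, and this ignores KV cache):

```python
# Rough weight-memory footprint for a 35B-parameter model at common
# precisions: N billion params * bytes per param = GB of weights.
PARAMS_B = 35
BYTES_PER_PARAM = {"BF16": 2.0, "Q8_0": 1.0, "Q4_0": 0.5}

footprint_gb = {name: PARAMS_B * bpp for name, bpp in BYTES_PER_PARAM.items()}
for name, gb in footprint_gb.items():
    print(f"{name}: ~{gb:.1f} GB")
# BF16: ~70.0 GB, Q8_0: ~35.0 GB, Q4_0: ~17.5 GB
```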

Therefore, if you choose big models without quantization, you may obtain results very close to SOTA proprietary models.

If you choose models so small and so quantized as to run in the memory of a consumer GPU, then it is normal to get results much worse than with a SOTA model that is run on datacenter hardware.

Choosing to run models that do not fit inside the GPU memory reduces the inference speed a lot, and choosing models that do not fit even inside the CPU memory reduces the inference speed even more.

Nevertheless, slow inference that produces better results may reduce the overall time for completing a project, so one should do a lot of experiments to determine an appropriate compromise.

When you use your own hardware, you do not have to worry about token cost or subscription limits, which may change the optimal strategy for using a coding assistant. Moreover, it is likely that in many cases it may be worthwhile to use multiple open-weights models for the same task, in order to choose the best solution.

For example, when older open-weights models were compared with Mythos, all the bugs that Mythos could find could also be found by the old models given appropriate prompts. The difference was that Mythos found all the bugs on its own, while with the free models you had to run several of them to find all the bugs, because each model had different strengths and weaknesses.

(In other HN threads there have been some bogus claims that Mythos was somehow much smarter, but that does not appear to be true. The other company has provided the precise prompts used for finding the bugs, and it would not have been too difficult to generate them automatically with a harness. Anthropic has also admitted that the bugs found by Mythos had not been found with a prompt like "find the bugs", but by running Mythos many times on each file with increasingly specific prompts, until the final run requested only a confirmation of the bug rather than searching for it. So the difference between SOTA models like Mythos and the open-weights models is real, but far smaller than Anthropic claims.)

aesthesia 10 hours ago||
> Anthropic has also admitted that the bugs found by Mythos had not been found with a prompt like "find the bugs", but by running Mythos many times on each file with increasingly specific prompts, until the final run requested only a confirmation of the bug rather than searching for it.

Unless there's been more information since their original post (https://red.anthropic.com/2026/mythos-preview/), this is a misleading description of the scaffold. The process was:

- provide a container with running software and its source code

- prompt Mythos to prioritize source files based on the likelihood they contain vulnerabilities

- use this prioritization to prompt parallel agents to look for and verify vulnerabilities, focusing on but not limited to a single seed file

- as a final validation step, have another instance evaluate the validity and interestingness of the resulting bug reports

This amounts to at most three invocations of the model for each file, once for prioritization, once for the main vulnerability run, and once for the final check. The prompts only became more specific as a result of information the model itself produced, not any external process injecting additional information.
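A minimal sketch of that three-stage flow, where `ask_model` is a hypothetical stand-in for the real model client and the request shapes are illustrative, not Anthropic's actual scaffold:

```python
# Sketch of the described scaffold: prioritize files, hunt per seed
# file, then validate each report. ask_model is a hypothetical model
# client passed in by the caller; request tuples are made up here.
def find_vulnerabilities(ask_model, source_files):
    # Stage 1: one call ranks files by likelihood of vulnerabilities
    # (assume the model returns a reordered list of filenames).
    ranked = ask_model(("prioritize", source_files))
    # Stage 2: one main vulnerability-hunting run per seed file
    # (these run as parallel agents in the real setup).
    reports = [ask_model(("hunt", seed)) for seed in ranked]
    # Stage 3: a separate instance judges each report's validity.
    return [r for r in reports if ask_model(("validate", r))]
```

Each file sees at most three model invocations, and any extra specificity in later prompts comes only from the model's own earlier output.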

yaur 12 hours ago||
I think it's worth noting that if you are paying for electricity, a local LLM is NOT free. In most cases you will find that Haiku is cheaper, faster, and better than anything that will run on your local machine.
gyrovagueGeist 11 hours ago|||
Electricity (in the continental US) is pretty cheap assuming you already have the hardware:

Running at a full load of 1000W for every second of the year, at 16 cents per kWh, comes to about $1,400 USD. For a model that produces 100 tps, that's roughly 3.15 billion tokens.

The same number of tokens would cost at least $3,150 USD at current Claude Haiku 3.5 pricing.
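Redoing the arithmetic as a quick sketch (the ~$1/M token price is an assumed blended rate, not an official figure):

```python
# Back-of-the-envelope: local electricity vs. cloud API cost for a
# year of continuous generation at 1000 W and 100 tokens/sec.
SECONDS_PER_YEAR = 365 * 24 * 3600      # 31,536,000 s

kwh = 1.0 * 365 * 24                    # 1000 W all year -> 8760 kWh
electricity = kwh * 0.16                # at $0.16/kWh
tokens_millions = 100 * SECONDS_PER_YEAR / 1e6
api_cost = tokens_millions * 1.0        # assumed ~$1 per million tokens

print(f"electricity: ~${electricity:.0f}, API: ~${api_cost:.0f}")
# electricity: ~$1402, API: ~$3154
```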

ac29 11 hours ago||
This 35B-A3B model is 4-5x cheaper than Haiku though, suggesting it would still be cheaper to outsource inference to the cloud vs running locally in your example
postalrat 11 hours ago|||
If you need the heating then it is basically free.
mrob 11 hours ago||
Only if you use resistive electric heating, which is usually the most expensive heating available.
incomingpain 12 hours ago||
Wowzers, we were worried Qwen was going to suffer after losing several high-profile people on the team, but that's a huge drop.

It's better than 27b?

segmondy 1 hour ago||
This is obviously a continuation training of 3.5; it's not a new model architecture but an incremental improvement.
adrian_b 12 hours ago||
Their previous model Qwen3.5 was available in many sizes, from very small sizes intended for smartphones, to medium sizes like 27B and big sizes like 122B and 397B.

This model is the first that is provided with open weights from their newer family of models Qwen3.6.

Judging from its medium size, Qwen/Qwen3.6-35B-A3B is intended as a superior replacement of Qwen/Qwen3.5-27B.

It remains to be seen whether they will also publish in the future replacements for the bigger 122B and 397B models.

The older Qwen3.5 models can also be found in uncensored modifications. It also remains to be seen whether it will be easy to uncensor Qwen3.6, because for some recent models, like Kimi-K2.5, the methods used to remove censoring from older LLMs no longer worked.

mft_ 11 hours ago|||
There was also Qwen3.5-35B-A3B in the previous generation: https://huggingface.co/Qwen/Qwen3.5-35B-A3B
storus 9 hours ago|||
> Qwen/Qwen3.6-35B-A3B is intended as a superior replacement of Qwen/Qwen3.5-27B

Not at all, Qwen3.5-27B was much better than Qwen3.5-35B-A3B (dense vs MoE).

rubiquity 5 hours ago|||
Not sure why you're being downvoted; I guess it's because of how your reply is worded. Anyway, Qwen3.6-35B-A3B should have intelligence on par with a 10.25B parameter model, so yes, Qwen3.5-27B is going to outperform it in terms of quality of output, especially for long-horizon tasks.
mudkipdev 9 hours ago|||
Re-read that
storus 9 hours ago||
You should. 3.5 MoE was worse than 3.5 dense, so expecting 3.6 MoE to be superior to 3.5 dense is questionable; one could argue that 3.6 dense (not yet released) will be superior to 3.5 dense.
spuz 6 hours ago||
Ok but you made a claim about the new model by stating a fact about the old model. It's easy to see how you appeared to be talking about different things. As for the claim, Qwen do indeed say that their new 3.6 MoE model is on a par with the old 3.5 dense model:

> Despite its efficiency, Qwen3.6-35B-A3B delivers outstanding agentic coding performance, surpassing its predecessor Qwen3.5-35B-A3B by a wide margin and rivaling much larger dense models such as Qwen3.5-27B.

https://qwen.ai/blog?id=qwen3.6-35b-a3b

storus 5 hours ago||
This says a slightly different thing:

https://x.com/alibaba_qwen/status/2044768734234243427?s=48&t...

If you look, on many benchmarks the old dense model is still ahead, but on a couple of benchmarks the new 35B demolishes the old 27B. "Rivaling", so YMMV.

ActorNightly 7 hours ago||
Can anyone confirm this fits on a 3090? The size is exactly 24 GB.
smcl 3 hours ago||
fuck off: https://news.ycombinator.com/item?id=47796830
yieldcrv 10 hours ago||
Anybody use these instead of codex or claude code? Thoughts in comparison?

benchmarks don't really help me much

3836293648 5 hours ago|
In my test case (a feature all models got stuck on a few months ago) it just gets stuck in a thinking loop and never gets anywhere. Not a super amazing test, but it happened a few times in a row, so...
nurettin 11 hours ago|
I tried the car wash puzzle:

You want to wash your car. Car wash is 50m away. Should you walk or go by car?

> Walk. At 50 meters, the round trip is roughly 100 meters, taking about two minutes on foot. Driving would require starting the engine, navigating, parking, and dealing with unnecessary wear for a negligible distance. Walk to the car wash, and if the bay requires the vehicle inside, have it moved there or return on foot. Walking is faster and more efficient.

Classic response. It was really hard to one shot this with Qwen3.5 Q4_K_M.

Qwen3.6 UD-IQ4_XS also failed the first time, then I added this to the system prompt:

> Double check your logic for errors

Then I created a new dialog and asked the puzzle and it responded:

> Drive it. The car needs to be present to be washed. 50 meters is roughly a 1-minute walk or a 10-second drive. Walking leaves the car behind, making the wash impossible. Driving it the short distance is the only option that achieves the goal.

Now 3.6 gets it right every time. So not as great as a super model, but definitely an improvement.

dist-epoch 9 hours ago|
Interestingly, Gemma4-26B IQ4_XS gets it correct:

> This sounds like a logic riddle! The answer is: You should go by car. Here is why: If you walk, you will arrive at the car wash, but your car will still be 50 meters away at home. You can't wash the car if the car isn't there! To accomplish your goal, you have to drive the car to the car wash.

It had the wrong answer in its thinking, and it did think longer than usual:

Direct answer: Walk.

Reasoning 1: Distance (50m is negligible).

Reasoning 2: Practicality/Efficiency (engine wear/fuel).

Reasoning 3: Time (walking is likely faster or equal when considering car prep).

...

Wait, if I'm washing the car, I need to get the car to the car wash. The question asks how I should get there.

...

Wait, let's think if there's a trick. If you "go by car," you are moving the car to the destination. If you "walk," you are just moving yourself.

Conclusion: You should drive the car.

More comments...