Qwen3-Next - Hacker News

Posted by tosh 1 day ago

548 points | 221 commentspage 2

croemer 1 day ago|

KronisLV 1 day ago||

> The Qwen3-Next-80B-A3B-Instruct performs comparably to our flagship model Qwen3-235B-A22B-Instruct-2507, and shows clear advantages in tasks requiring ultra-long context (up to 256K tokens).

This is pretty impressive and a bit like how the GPT-OSS-120B came out and scored pretty well on the benchmarks despite its somewhat limited size.

That said, using LLMs for software dev use cases, I wouldn't call 256K tokens "ultra-long" context, I regularly go over 100K when working on tasks with bigger scope, e.g.:

  Look at the existing code related to this functionality and the existing design patterns in the code as well as the guidelines.
  Then plan out the implementation in detail and ask me a few questions along the way to figure the details out better.
  Finally, based on everything so far, do the actual implementation.
  Then look it over and tell me if anything has been missed from the plan, then refactor the code in any number of ways.

It could be split up into multiple separate tasks, but I find that the context being more complete (unless the model starts looping garbage, which poisons the context) leads to better results.

My current setup of running Qwen3 Coder 480B on Cerebras bumps into the 131K token limit. If not for the inference speed there (seriously great) and good enough model quality, I'd probably look more in the direction of Gemini or Claude again.

slimebot80 1 day ago||

Complete newbie here - some questions, if I may!

This stuff can run on a local machine without internet access, correct?

And it can pretty much match Nano Banana? https://github.com/PicoTrex/Awesome-Nano-Banana-images/blob/...

Also -- what are the specs for a machine to run it (even if slowly!)

NitpickLawyer 1 day ago||

This model can be run completely offline, yes. You'll need anywhere from 60-200 gb of RAM (either VRAM for high speeds, or a combination of VRAM and RAM, or just CPU+RAM). The active params are really low (3B) so it'll likely run fine even on CPU. Should get 10-15+t/s even on old DDR4 systems. Offload some experts to a GPU (can be as low as 8-16gb) and you'll see greater speeds.

This has nothing to do with nano banana, or image generation. For that you want the qwen image edit[1] models.

1 - https://huggingface.co/Qwen/Qwen-Image-Edit

prawel 1 day ago|||

what you mean is Qwen Image and Qwen Image Edit, you can run it on local machine, using Draw Things application for example.

the model discussed here is text model, so similar to ChatGPT. You can also run it on your local machine, but not yet, as apps need to be updated with Qwen 3 Next support (llama.cpp, Ollama, etc)

dragonwriter 1 day ago|||

> This stuff can run on a local machine without internet access, correct?

Yes.

> And it can pretty much match Nano Banana?

No, Qwen3-Next is not a multimodal model, it has no image generation function.

Davidzheng 1 day ago||

Isn't this one a text model

slimebot80 1 day ago||

Ah, maybe! I am lost reading this page with all the terminology

arcanemachiner 1 day ago||

You'll get used to it.

Make sure to lurk on r/LocalLlama.

diggan 1 day ago||

> Make sure to lurk on r/LocalLlama.

Please do take everything you read there with a bit of salt though, as the "hive-mind" effect is huge there, even when compared to other subreddits.

I'm guessing the huge influx of money + reputations on the line + a high traffic community is ripe for both hive-minding + influence campaigns.

davidpfarrell 21 hours ago||

Hyped for the release, but bummed they fell for the ‘next’ naming convention.

What will the actual next advanced release be called:

* next-next

* next (2)

* actual-next-final

binary132 1 day ago||

I’ve been using gpt-oss-120B with CPU MoE offloading on a 24GB GPU and it’s very usable. Excited to see if I can get good results on this now!

kristopolous 1 day ago||

I was getting a bunch of strange hallucinations and weird dialog. It sounds like some exasperated person on the verge of a mental breakdown

techsystems 1 day ago||

How does the context length scaling at 256K tokens compare to Llama's 1M in terms of performance? How are the contexts treated differently?

jug 22 hours ago||

> The Qwen3-Next-80B-A3B-Instruct performs comparably to our flagship model Qwen3-235B-A22B-Instruct-2507

I'm skeptical about these claims. How can this be? Wouldn't there be massive loss of world knowledge? I'm particularly skeptical because a recent trend in Q2 2025 has been benchmaxxing.

dragonwriter 22 hours ago|

> I'm skeptical about these claims. How can this be?

More efficient architecture.

> Wouldn't there be massive loss of world knowledge?

If you assume equally efficient architecture and no other salient differences, yes, that’s what you’d expect from a smaller model.

jug 21 hours ago||

Hmm. Let's just say if this is true, that this is actually better with such a much lower total parameter count, it's the greatest accomplishment in over a year of LLM development. With the backdrop of bechmaxxing in 2025, I'll believe in this when I see the results on closed benchmarks and SimpleBench. My concern is this might be a hallucination machine.

KaoruAoiShiho 20 hours ago|||

In my testing this model is quite bad and far behind 235b a22b. https://fiction.live/stories/Fiction-liveBench-Sept-12-2025/...

bigyabai 21 hours ago|||

Might be. FWIW, my experience with the Qwen3 30b model basically took ChatGPT out of rotation for me. It's not hard for me to imagine an 80b model pushing that further, especially with thinking enabled.

I recommend playing with the free hosted models to draw your own conclusions: https://chat.qwen.ai/

cchance 22 hours ago||

Those rope tests are impressive AF

esafak 1 day ago|

ICYMI qwen3-max was released last week.

Alifatisk 22 hours ago|

Was Qwen3-max better than Qwen3-235B-A22B-2507 at anything? Except higher token limit?

esafak 12 hours ago||

According to benchmarks. Scroll to the bottom of https://lmarena.ai/leaderboard/

More comments...