Qwen3.7-Max: The Agent Frontier

Posted by kevinsimper 13 hours ago

Qwen3.7-Max: The Agent Frontier (qwen.ai)

585 points | 234 comments

jdw64 9 hours ago

Qwen really hits the sweet spot: it's cheap, fast, and actually good.

eleventen 4 hours ago

Checking OpenRouter (it's not available there yet) and, uh, what's up with the spike in Qwen usage from early April here? https://openrouter.ai/qwen

Is this normal humans kicking the tires on a new model, or a few whales doing serious benchmarks?

d2kx 4 hours ago

Qwen 3.6 Plus was released, and they offered it for free.

spaceman_2020 4 hours ago

I've personally seen a lot of people switch to Kimi and Qwen after Opus 4.7. Kimi 2.6 feels like Opus 4.6, which, to me, was a great model for 98% of coding tasks.

wolttam 4 hours ago

Frontier: Need it done quick and I'm willing to pay.

Open-weight: Good enough for the majority of tasks, and I'm willing to spend a bit more time and effort steering towards my desired result.

bratao 12 hours ago

It's super strange that in all of the last (3?) releases they keep comparing against older models such as Opus-4.6.

vessenes 12 hours ago

Some of it’s probably timing. Some of it is wanting to look good. That said, I just went to the claw-eval site, and neither 4.7 nor oAI’s 5.5 is listed on the benchmarks. So there’s also just the time it takes others to get benchmarking done and published.

varispeed 11 hours ago

Opus-4.6 was probably the best model so far before it got nerfed. 4.7 is nowhere near the experience I had. In fact, I stopped using it completely because more often than not its output is just dumber than that of local models.

leonidasv 8 hours ago

Same here. Can't stand 4.7.

solenoid0937 6 hours ago

Opus 4.6 was never nerfed, that's FUD. There were harness-level problems that were fixed.

4.7 is much better. But perception is a funny thing: once you think something is bad, you start looking for it everywhere.

anonyfox 3 hours ago

Still anecdotal, but the exact same coding task on the exact same repo (I clone from GitHub templates for projects) worked amazingly well in December with CC/Opus, yet couldn’t accomplish the goal anymore at the end of March with essentially identical prompts, and 4.7 was just comically useless. Even these days, I’ve tried repeatedly and 4.6 still can’t do what it could in December.

kroaton 3 hours ago

Did you even use it? It was nerfed to hell and back. It stopped following instructions, forgot what sub-agents had responded, and so on. Stop spreading this pro-Anthropic narrative. They did a rug pull due to lack of compute.

dyauspitr 9 hours ago

Because these can’t compete with the SoTA, but they’re close.

bsenftner 11 hours ago

Any reports from people using their coding agent(s)?

rayboy1995 11 hours ago

I'm running the Qwen 3.6 27B Q5_K_M GGUF on a Tesla P40 with koboldcpp, using pi.dev as the harness, and I gotta say I am impressed. It took some setup and configuring, but I already have some code it wrote committed and pushed. It can be slow on my hardware at >50k tokens, but given that I bought this P40 for like $150 back when the LLM trend started, I can't complain. (I have a second one too, but unfortunately I couldn't physically fit the card in my server.)

The setup was important: I had to compile koboldcpp with a few special params for my hardware, and I mostly just had Claude figure it out. I don't remember everything I did now, but before that it was very slow and would often stop mid-task; it seems it was mostly a parsing issue. It made the model seem broken/dumb, but once I had all that settled, I was able to use it the way I use Claude Code. Disclaimer: I am pretty explicit with requirements. I imagine this fails more when you leave it to figure things out on its own, but for my flow it's pretty rad.

I'm currently setting it up as an automated agent to pull Trello cards, create PRs for them, and move each card to review.

Command I am using to run:

  python koboldcpp.py \
    --port 61514 --quiet --multiuser --gpulayers 999 --contextsize 262144 --quantkv 2 \
    --usecublas normal --threads 4 --jinja --jinja_tools --jinja_kwargs '{"enable_thinking":true, "preserve_thinking":false}' \
    --skiplauncher --model /data/models/Qwen3.6-27B-Q5_K_M.gguf --smartcache 5
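The Trello workflow described above (pull cards, open a PR, move the card to review) is essentially a polling loop. Below is a minimal sketch of one pass. Board, Card, run_agent and FakeBoard are hypothetical names, not a real library API: a real Board would wrap the Trello REST API, and run_agent would drive the local model through the harness and return a PR URL, or None if it failed. FakeBoard only exists so the loop can be tried offline.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Protocol


@dataclass
class Card:
    """Minimal stand-in for a Trello card (hypothetical shape, not Trello's schema)."""
    id: str
    title: str


class Board(Protocol):
    """Hypothetical board client; a real one would wrap the Trello REST API."""

    def cards_in_list(self, list_id: str) -> list[Card]: ...

    def move_card(self, card_id: str, list_id: str) -> None: ...


def process_todo_cards(
    board: Board,
    run_agent: Callable[[Card], Optional[str]],
    todo_list: str,
    review_list: str,
) -> list[str]:
    """One polling pass: hand each to-do card to the agent and move the card
    to review only if the agent reports a PR URL. Returns the PR URLs."""
    pr_urls = []
    for card in board.cards_in_list(todo_list):
        pr_url = run_agent(card)  # hypothetical: agent writes code, pushes, opens a PR
        if pr_url is None:
            continue  # leave the card where it is so a human can take a look
        board.move_card(card.id, review_list)
        pr_urls.append(pr_url)
    return pr_urls


class FakeBoard:
    """In-memory board for trying the loop offline."""

    def __init__(self, lists: dict[str, list[Card]]):
        self.lists = lists

    def cards_in_list(self, list_id: str) -> list[Card]:
        return list(self.lists[list_id])  # copy, so moving cards can't disturb iteration

    def move_card(self, card_id: str, list_id: str) -> None:
        for cards in self.lists.values():
            for card in cards:
                if card.id == card_id:
                    cards.remove(card)
                    self.lists[list_id].append(card)
                    return


def fake_agent(card: Card) -> Optional[str]:
    # Pretend the agent only succeeds on the first card.
    return f"https://example.com/pr/{card.id}" if card.id == "c1" else None


board = FakeBoard({"todo": [Card("c1", "Add login"), Card("c2", "Fix flaky test")], "review": []})
urls = process_todo_cards(board, fake_agent, "todo", "review")
print(urls)
```

Leaving failed cards in the to-do list (rather than moving them somewhere else) is one design choice among several; it keeps a human in the loop when the model gets stuck.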

lostmsu 8 hours ago

Qwen recommends setting preserve_thinking to true for agentic/coding workloads.

rayboy1995 6 hours ago

Thanks!! I had disabled that previously while debugging. From what I can tell so far, it's helping accuracy (and speed, since the cache is preserved more often!).

satvikpendem 4 hours ago

Use the MTP (multi-token prediction) models, which double token generation speed, for example: https://unsloth.ai/docs/models/qwen3.6#mtp-guide
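Why MTP can speed up generation: assuming the MTP layers act as a built-in draft model for speculative decoding (the usual way MTP speedups are obtained; the linked guide has the specifics of that setup), each full forward pass verifies several drafted tokens at once. A minimal sketch of the standard back-of-the-envelope estimate, assuming each drafted token is accepted independently with a fixed probability. The parameters acceptance and draft_len are illustrative, not settings of any particular tool.

```python
def expected_tokens_per_pass(acceptance: float, draft_len: int) -> float:
    """Expected tokens emitted per full-model forward pass with speculative
    decoding, assuming each drafted token is accepted independently with
    probability `acceptance` and the verify pass always adds one token."""
    if not 0.0 <= acceptance <= 1.0:
        raise ValueError("acceptance must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if acceptance == 1.0:
        return float(draft_len + 1)  # every draft token accepted, plus one more
    # Geometric series: 1 + a + a^2 + ... + a^draft_len
    return (1 - acceptance ** (draft_len + 1)) / (1 - acceptance)


for acceptance in (0.6, 0.7, 0.8):
    for draft_len in (1, 2, 3):
        tokens = expected_tokens_per_pass(acceptance, draft_len)
        print(f"acceptance={acceptance}, draft_len={draft_len}: {tokens:.2f} tokens/pass")
```

At 80% acceptance with one drafted token this gives 1.8 tokens per pass, so a ~2x figure is plausible only when acceptance is high. Real wall-clock gains are somewhat lower, because drafting and verification are not free.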

vibe42 10 hours ago

I'm using the pi-mono coding agent (open source, free) without any extensions and very simple prompts. The 3.6 27B model (BF16, 250k context) uses 67GB VRAM on an RTX PRO 9000.

It's very capable on almost any coding task I've thrown at it, and very good for easy-to-medium-hard scripts and new code bases.

It struggles with some complex tasks in larger code bases; e.g. when I use it to debug and fix bugs in llama.cpp, it gets close to working code but often introduces errors. For such tasks it's still very useful as a search/explore tool and for drafting fixes.
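A rough way to sanity-check VRAM numbers like the 67GB above, and why a Q5_K_M quant fits on a 24 GB Tesla P40: weight memory is roughly parameters × bits per weight / 8. A minimal sketch; the ~5.7 bits/weight figure for Q5_K_M is an approximation (it is a mixed quantization, and the real average varies by model), and 27B is treated as exactly 27 billion parameters.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes).
    Billions of parameters x bits / 8 bits-per-byte = GB; the 1e9 factors cancel."""
    return params_billion * bits_per_weight / 8


# Q5_K_M is a mixed quantization; ~5.7 bits/weight is a rough average, not exact.
for label, bits in (("BF16", 16.0), ("Q5_K_M (approx.)", 5.7)):
    print(f"27B @ {label}: ~{weight_gb(27, bits):.1f} GB of weights")
```

That puts the BF16 weights at about 54 GB, so the remaining ~13 GB of the reported 67GB would presumably go to the KV cache for the 250k context, activations, and runtime buffers (an inference, not something measured here). The same arithmetic puts a Q5_K_M file at roughly 19 GB, leaving a few GB of a P40's 24 GB for everything else.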

XCSme 11 hours ago

Any info on pricing and latency?

mchusma 7 hours ago

I've looked in like a dozen places and I don't see anything. :(

aliljet 8 hours ago

Where can a user reasonably host this in an affordable way to access the local LLM revolution?

satvikpendem 4 hours ago

Unsloth Studio with its MTP support: https://unsloth.ai/docs/models/qwen3.6#mtp-guide

julianlam 6 hours ago

Try llama.cpp and Qwen3.6-35B-A3B

Good balance of intelligence and speed.

plagiarist 6 hours ago

I think their Max models are far bigger than what fits on consumer hardware. People typically use Apple, AMD Halo, or dGPUs if/when they run the smaller versions. Those are all varying degrees of "affordable."

LAC-Tech 2 hours ago

Trying to buy Qwen credits and get an API key is a challenge all in itself. So many site redirects.

hmaddipatla 9 hours ago

The tokenomics and value for capability, context and latency look like they could deliver a super competitive offer. What would it take for you to switch??

xiaoluolyg 8 hours ago

Congrats to the Qwen teams, remarkable.

cft 7 hours ago

Downloading this and cancelling Google Antigravity Pro at the same time:

I had a Google Pro account that I inherited from buying a Pixel 9 XL; it's free for a year after a flagship Pixel phone purchase. After that year they started charging for it, and I tolerated it, because Flash was usable in Antigravity for dumb auxiliary tasks that I did not want to waste GPT/Opus on. It had a generous quota separate from Gemini 3.1 Pro's. Now, with Flash 3.5, they've merged the quotas with Pro, so on a Google Pro account you can work only 4-5 hours per week in Flash. And by the way, 3.1 Pro is useless for programming compared to Codex/Opus.

bel8 6 hours ago

Same boat. The Google Pro AI quota became barely useful for anything meaningful.

I think they envision the Pro plan as "just a taste of AI, enough to lure folks into the Ultra plan," but that won't work for me when Codex is half the price and DeepSeek 4 Flash is 1/10 of their price per task.

So I'll downgrade just enough to keep my Google Drive space, and use DeepSeek 4 as the workhorse plus Codex or Copilot for advanced stuff.

cft 5 hours ago

How do you use DeepSeek 4 Flash? Via a CLI?

bel8 5 hours ago

I use the opencode VSCode extension:

https://marketplace.visualstudio.com/items?itemName=sst-dev....

It adds a button to VSCode that opens a tab with opencode loaded. It's a bit better than just opening the CLI because it has some VSCode integration.

Together with their $10/mo opencode Go plan: https://opencode.ai/go

For my usage, it's practically unlimited DS4 Flash on the high setting. I find high better than max because it's less chatty.

The best thing is the speed. So many tokens per second.

edit: This is how it looks in action https://i.imgur.com/RNDXr07.png

georgefrowny 4 hours ago

How does that extension compare to, say, DS4 via OpenRouter and the usual VSCode Copilot panel?

bel8 4 hours ago

Good question.

I haven't tested OpenRouter, but I expect it to be slightly more expensive, because it charges per token while the opencode Go plan is a $10/mo fixed-price model. Economies of scale lead me to think that for heavy use, OpenRouter will be more costly, since opencode Go can subsidize heavy users like me with money from light users (just like gyms do with members who pay but barely show up).
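The flat-fee vs per-token argument comes down to a break-even point: below some monthly token volume, paying per token is cheaper, and above it the flat plan wins (setting aside any usage caps the flat plan may have). A minimal sketch; the $0.50 per million tokens blended price is a made-up placeholder, not OpenRouter's actual DeepSeek pricing, and real providers usually price input and output tokens separately.

```python
def breakeven_mtok(flat_fee_usd: float, usd_per_mtok: float) -> float:
    """Monthly usage (millions of tokens) at which per-token billing costs
    exactly the flat monthly fee."""
    if usd_per_mtok <= 0:
        raise ValueError("usd_per_mtok must be positive")
    return flat_fee_usd / usd_per_mtok


def cheaper_plan(monthly_mtok: float, flat_fee_usd: float, usd_per_mtok: float) -> str:
    """Return 'flat' if the flat fee beats paying per token at this usage
    (a tie counts as 'per-token')."""
    per_token_cost = monthly_mtok * usd_per_mtok
    return "flat" if flat_fee_usd < per_token_cost else "per-token"


# Hypothetical blended price; check the provider's real input/output pricing.
PRICE_PER_MTOK = 0.50
FLAT_FEE = 10.0  # the $10/mo opencode Go plan mentioned in the thread
print(breakeven_mtok(FLAT_FEE, PRICE_PER_MTOK), "Mtok/month to break even")
for usage in (5, 20, 100):
    print(usage, "Mtok/month ->", cheaper_plan(usage, FLAT_FEE, PRICE_PER_MTOK))
```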

That said, I find VSCode's native Copilot chat more pleasant to use, but also laggier in large sessions.

opencode's configuration is less polished, and you'll have to dig around for some things. For example, opencode's Ctrl+P conflicts with VSCode's Ctrl+P, so I changed opencode to use Ctrl+L instead.
