Posted by mfiguiere 11 hours ago

Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving (qwen.ai)
509 points | 256 comments | page 2
atilimcetin 10 hours ago|
Nowadays, I'm working on a realtime path tracer where you need proper understanding of microfacet reflection models, PDFs, (multiple) importance sampling, ReSTIR, etc. That said, mine is a somewhat specific use case.

And I use Claude, Gemini, GLM, Qwen to double check my math, my code and to get practical information to make my path tracer more efficient. Claude and Gemini failed me more than a couple of times with wrong, misleading and unnecessary information, but on the other hand Qwen always gave me proper, practical and correct information. I’ve almost stopped using Claude and Gemini so as not to waste my time anymore.

Claude Code may shine at developing web applications, backends and simple games, but it's definitely not for me. And this is the story of my specific use case.

wg0 10 hours ago||
I have said the same about someone who had a similar experience while writing some OpenGL code (some raytracing etc.): these models have very little understanding and aren't good at anything beyond basic CRUD web apps.

In my own experience, even with a web app of medium scale (think Odoo kind of ERP), they are next to useless at understanding and modeling the domain correctly, even with very detailed written specs fed in (a whole directory with index.md and subsections, and more detailed sections/chapters in separate markdown files with pointers in index.md). And I am not talking about open-weight models here - I am talking SOTA Claude Opus 4.6 and Gemini 3.1 Pro etc.

But that narrative isn't popular. I see the parallels here with the Crypto and NFT era. That was surely the future and at least my firm pays me in crypto whereas NFTs are used for rewarding bonuses.

wg0 9 hours ago|||
Someone already said exactly this, and better, here [0].

[0]. https://news.ycombinator.com/item?id=47817982

amarcheschi 9 hours ago|||
a semester ago I was taking a machine learning exam in uni and the exam tasked us with creating a neural network using only numerical libraries (no PyTorch etc.). I'm sure that there are a huge number of examples that all look the same, but given that we were just students without a lot of prior experience we probably deviated from what it had in its training data, with more naive or weird solutions. Asking Gemini 3 to refactor things or to help with very narrow things was ok, but it was quite bad at getting the general context and spotting bugs, so much so that a few times it was easier to grab the book and get the original formula right

otoh, we spotted a wrong formula regarding learning rate on Wikipedia and it is now correct :) we did that without Gemini, just with our intuition of "mhh this formula doesn't seem right", and that definitely inflated our ego

zozbot234 10 hours ago|||
What size of Qwen is that, though? The largest sizes are admittedly difficult to run locally (though this is an issue of current capability wrt. inference engines, not just raw hardware).
atilimcetin 10 hours ago||
I'm directly using https://chat.qwen.ai (Qwen3.6-Plus) and planning to switch to Qwen Code with subscription.
jasonjmcghee 10 hours ago|||
You may be interested in "radiance cascades"
muyuu 8 hours ago|||
for Anthropic and OpenAI there is a very real danger that people invest serious time finding the strengths of alternative models, esp Chinese/open models that can to some degree be run locally as well

it puts a hard ceiling on the margins they can possibly extract from users

hedora 5 hours ago|||
What do you use instead of the Claude Code client app?
jansan 10 hours ago||
How "social" does Qwen feel? The way I am using LLMs for coding makes this actually the most important aspect by now. Claude 4.6 felt like a nice knowledgeable coworker who shared his thinking while solving problems. Claude 4.7 is the difficult anti-social guy who jumps ahead instead of actually answering your questions and does not like to talk to people in general. How are Qwen's social skills?
zozbot234 10 hours ago||
Qwen feels like wise Chinese philosopher. Talks in very short elegant sentences, but does very solid work.
Alifatisk 9 hours ago||
> Talks in very short elegant sentences

This is not my experience at all, Qwen3.6-Plus spits out multiple paragraphs of text for the prompts I give. It wasn't like this before. Now I have to explicitly tell it not to yap so much and keep it short, concise and direct.

Oras 11 hours ago||
I find it odd that none of OpenAI's models were used in the comparison, while Z's GLM 5.1 was. Is Z (GLM 5.1) really that good? It is crushing Opus 4.5 in these benchmarks. If that is true, I would have expected to read many articles on HN about how people flocked from CC and Codex to use it.
ac29 11 hours ago||
GLM 5.1 is pretty good, probably the best non-US agentic coding model currently available. But both GLM 5.0 and 5.1 have had issues with availability and performance that make them frustrating to use. Recently GLM 5.1 was also outputting garbage thinking traces for me, but that appears to be fixed now.
cmrdporcupine 10 hours ago||
Use them via DeepInfra instead of z.ai. No reliability issues.

https://deepinfra.com/zai-org/GLM-5.1

Looks like fp4 quantization now though? Last week was showing fp8. Hm..

wolttam 10 hours ago||
Deepinfra's implementation of it is not correct. Thinking is not preserved, and they're not responding to my submitted issue about it.

I also regularly experience Deepinfra slow to an absolute crawl - I've actually gotten more consistent performance from Z.ai.

I really liked Deepinfra but something doesn't seem right over there at the moment.

cmrdporcupine 9 hours ago||
Damn. Yeah, that sucks. I did play with it earlier again and it did seem to slow down.

It's frankly a bummer that there seemingly isn't a better serving option for GLM 5.1 than z.ai, which seems to have reliability and cost issues.

coder68 10 hours ago|||
In fact it is appreciated that Qwen is comparing to a peer. Several engineers I know and I are trying GLM. It's legit. Definitely not the same as Codex or Opus, but cheaper and "good enough". I basically ask GLM to solve a problem, walk away 10-15 minutes, and the problem is solved.
Oras 10 hours ago||
"Cheaper" is quite subjective. I just went to their pricing page [0], and the cost saving relative to performance does not sell it well (again, personal opinion).

CC has a limited capacity for Opus, but a fairly good one for Sonnet. With Codex, I've never had issues with hitting my limits, and I'm only a Pro user.

[0] https://z.ai/subscribe

kardianos 11 hours ago|||
Yes. GLM 5.1 is that good. I don't think it is as good as Claude was in January or February of this year, but it is similar to how Claude runs now, perhaps better because I feel like its performance is more consistent.
vidarh 10 hours ago|||
GLM 5.1 is the first model I've found good enough to spring for a subscription for other than Claude and Codex.

It's not crushing Opus 4.5 in real-life use for me, but it's close enough to be near interchangeable with Sonnet for me for a lot of tasks, though some of the "savings" are eaten up by seemingly using more tokens for similar complexity tasks (I don't have enough data yet, but I've pushed ~500M tokens through it so far).

pros 11 hours ago|||
I've been using GLM 5.1 for the last two weeks as a cheaper alternative to Sonnet, and it's great - probably somewhere between Sonnet and Opus. It's pretty slow though.
bensyverson 9 hours ago||
This is what kills it for me… The long thinking blocks can make a simple task take 30 minutes.
Alifatisk 10 hours ago|||
GLM-5 is good, like really good. Especially if you take pricing into consideration. I paid $7 for 3 months. And I get more usage than CC.

They have difficulty supplying their users with capacity, but in an email they pointed out that they are aware of it. During peak hours, I experience degraded performance. But I am on their lowest tier subscription, so I understand if my demand is not prioritized during those hours.

ekuck 10 hours ago||
Where are you getting 3 months for $7?
Alifatisk 8 hours ago||
They had a Christmas deal that ended January 31.
culi 9 hours ago|||
If you only look at open models, GLM 5.1 is the best performance you can get on the Pareto frontier

https://arena.ai/leaderboard/text?viewBy=plot&license=open-s...

c0n5pir4cy 10 hours ago|||
I've been using it through OpenCode Go and it does seem decent in my limited experience. I haven't done anything which I could directly compare to Opus yet though.

I did give it one more complex task, and I was quite impressed. I had a local setup with Tiltdev, K3S and a pnpm monorepo which was failing to run the web application dev server; GLM correctly figured out that it was a container image build cache issue after inspecting the containers etc. and corrected the Tiltfile and build setup.

cleaning 10 hours ago|||
Most HN commenters seem to be a step behind the latest developments, and sometimes miss them entirely (Kimi K2.5 is one example). Not surprising, as most people don't want to put in the effort to sift through the bullshit on Twitter to figure out the latest opinions. Many people here will still prefer the output of Opus 4.5/4.6/4.7; nowadays this mostly comes down to the aesthetic choices Anthropic has made.
Oras 10 hours ago||
Not just aesthetics though. From time to time I implement the same feature with CC and Codex just to compare results, and I have yet to find Codex making better decisions or delivering a more complete feature.

For more complicated stuff, like queries or data comparison, Codex seems always behind for me.

throwaw12 11 hours ago|||
maybe they decided OpenAI has a different market, hence comparing only with companies that focus on dev tooling: Claude, GLM
edwinjm 10 hours ago||
Haven’t you heard about Codex?
throwaw12 10 hours ago||
it's an SKU from OpenAI's perspective; their broader goal and vision is (was) different. Look at Claude and GLM: both were 95% committed to dev tooling: best coding models, coding harness, even their cowork is built on top of Claude Code
zozbot234 10 hours ago||
I'm not sure how this makes sense when Claude models aren't even coding specific: Haiku, Sonnet, Opus are the exact same models you'd use for chat or (with the recent Mythos) bleeding edge research.
throwaw12 10 hours ago||
Anthropic models and training data are optimized for coding use cases, this is the difference.

OpenAI on the other hand has different models optimized for coding, GPT-x-codex; Anthropic doesn't have this distinction

pixel_popping 9 hours ago||
But they detect it under the hood and apply a similar "variant", as API results are not the same as on Claude Code (that was documented before by someone).
__blockcipher__ 11 hours ago|||
Yeah GLM’s great for coding, code review, and tool use. Not amazing at other domains.
esafak 11 hours ago||
I use it and think its intelligence compares favorably with OpenAI and Anthropic workhorses. Its biggest weakness is its speed.
chatmasta 8 hours ago||
Is this going to be an open weights model or not? The post doesn’t make it clear. It seems the weights are not available today, but maybe that’s because it’s in preview?
zozbot234 8 hours ago|
The Max series has never been open.
marsulta 9 hours ago||
I think the benchmarks and numbers need to be easier to read. Those benchmarks are useless to the regular consumer.
o10449366 7 hours ago||
I have the M3 Max MBP with 128 GB of memory and the 40 core GPU. What's the best local model I can run today for coding?
alx-ppv 6 hours ago|
You can try https://github.com/AlexsJones/llmfit
Aeroi 5 hours ago||
why do people continue to benchmark their SOTA models against older models?
XCSme 8 hours ago||
A bit weird to be comparing it to Opus-4.5 when 4.7 was released...
xmly 7 hours ago||
Very impressive!
DeathArrow 10 hours ago||
I have been trying for a week to subscribe to the Alibaba Coding Plan (to use Qwen 3.6 Plus), but it's always out of stock.

They brag about Qwen but don't let people use it.

dakolli 7 hours ago|
ToKeN PrIcEs ArE gOiNg tO PluMmEt, InTelLigEnCe WiLl Be AfForDaBlE FoR EvErYOnE