Posted by cmitsakis 15 hours ago

Qwen3.6-35B-A3B: Agentic coding power, now open to all (qwen.ai)
957 points | 424 comments | page 7
shevy-java 15 hours ago|
I don't want "Agentic Power".

I want to reduce AI to zero. Granted, this is a fight that's impossible to win, but I feel like Don Quixote here. Rather than windmill-dragons, it is some skynet 6.0 blob.

lagniappe 14 hours ago|
Then who is Rocinante?
blazzy 8 hours ago||
A dimming IBM x40 Thinkpad missing its F key.
amazingamazing 15 hours ago|
More benchmaxxing I see. Too bad there’s no rig with 256GB unified RAM for under $1000
cpburns2009 10 hours ago||
Sir, this is 2026. You're not getting even 128GB of RAM for under $1k.
kennethops 15 hours ago|||
do you know if they did this to it?

https://research.google/blog/turboquant-redefining-ai-effici...

kgeist 14 hours ago||
Llama.cpp already uses an idea from it internally for the KV cache [0]

So a quantized KV cache should now see less degradation

[0] https://github.com/ggml-org/llama.cpp/pull/21038
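The degradation claim comes down to how small the roundtrip error of KV-cache quantization is. Neither llama.cpp's exact formats nor TurboQuant's scheme is spelled out in the thread, so here is a generic per-block absmax int8 quantization roundtrip as a simplified stand-in (function names and the block size of 32 are illustrative assumptions):

```python
import numpy as np

def quantize_blocks(x, block=32):
    """Per-block absmax int8 quantization: each block of `block`
    values stores int8 codes plus one float scale. A simplified
    stand-in for the KV-cache quant formats discussed above."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_blocks(q, scale):
    # Reconstruct floats; error per element is at most scale/2
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
kv = rng.standard_normal(4096).astype(np.float32)  # fake KV-cache slice
q, s = quantize_blocks(kv)
recon = dequantize_blocks(q, s)
err = np.abs(kv - recon).max()
print(f"max abs error: {err:.4f}")
```

For roughly unit-scale activations the worst-case error stays around a couple of hundredths, which is why block-wise schemes (and the rotations TurboQuant adds on top) can keep quality loss small at 8 bits.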

bigyabai 13 hours ago||
taps the sign

  Unified Memory Is A Marketing Gimmick. Industrial-Scale Inference Servers Do Not Use It.
wren6991 1 hour ago|||
On M5 Pro/Max the memory is actually just attached straight to the GPU die. CPU accesses memory through the die-to-die bridge. I don't see the difference between that and a pure GPU from a memory connectivity point of view.

Wrt inference servers: sure, it's not cost-effective to have such a huge CPU die and a bunch of media accelerators on the GPU die if you just care about raw compute for inference and training. Apple SoCs are not tuned for that market, nor do they sell into it. I'm not building a datacentre, I'm trying to run inference on my home hardware that I also want to use for other things.

zozbot234 13 hours ago||||
Industrial Scale Inference is moving towards LPDDR memory (alongside HBM), which is essentially what "Unified Memory" is.
0x457 10 hours ago|||
> which is essentially what "Unified Memory" is.

Unified memory is when CPU and GPU can reference the same memory address without things being copied (CUDA allows you to write code as if it was unified even if it's not, so that doesn't count, but HMM does count[1])

That is all. What technology is underneath is a hardware detail. Unified memory on Macs lets you put something into memory, then do computation on it with the CPU, the ANE, or Metal shaders, all without copying anything.

DGX Spark also has unified memory.

[1]: https://docs.nvidia.com/cuda/cuda-programming-guide/02-basic...

bigyabai 13 hours ago|||
LPDDR is LPDDR. There's nothing "unified" about it architecturally.
rcxdude 5 hours ago|||
Unified Memory is mainly how consumer hardware has enough RAM accessible by the GPU to run larger models, because otherwise the market segmentation jacks up the price substantially.
bigyabai 4 hours ago||
UMA removes the PCIe bottleneck and replaces it with a memory controller + bandwidth bottleneck. For most high-performance GPUs, that would be a direct downgrade.