Posted by cafkafk 15 hours ago

A 10 year old Xeon is all you need (point.free)
625 points | 255 comments | page 5
b65e8bee43c2ed0 4 hours ago
So how many tokens/s do you get, for prompt processing (pp) and text generation (tg)? Did I miss it in the article?
fortran77 2 hours ago
My current desktop machine is a 24-core Xeon-3345 with 256GB of RAM and an Nvidia 5090. It still feels extremely fast, even though it's about 8-year-old technology with a newer video card.
sperandeo 7 hours ago
I've been doing the same thing. I refactored an old NewTek stream machine. It's my new favorite thing to do! Adding old PCs to my "StarCraft" fleet xD
gigatexal 12 hours ago
What kind of tokens per second did the OP get? I saw nothing written about this.
urbandw311er 12 hours ago
11.94 tokens/sec (from another answer above)
hparadiz 12 hours ago
I'm now staring at a 10 year old 4U with 256 GB of DDR4 and thinking hmmmmm
christkv 13 hours ago
Makes you wonder if it's possible to squeeze more tokens/s out of a Strix Halo system by using the 16 Zen 5 cores as well as the GPU.
Havoc 13 hours ago
In general you're memory-bandwidth constrained, so CPU vs GPU often ends up similar on APUs.
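
A rough way to see why bandwidth dominates: for each generated token, the active weights have to be streamed from memory about once, so bandwidth divided by bytes read per token gives a ceiling on decode speed. Here is a minimal sketch of that arithmetic. The function name and the example figures (100 GB/s, 4B active parameters, 4-bit weights) are hypothetical and chosen only for illustration; the estimate ignores KV cache reads, activations, and compute limits.

```python
def decode_tokens_per_second_bound(
    bandwidth_gb_s: float,
    active_params_billion: float,
    bytes_per_param: float,
) -> float:
    """Upper bound on decode speed when every active weight is read once per token.

    Ignores KV cache traffic, activations, and compute, so real numbers
    will come in lower than this ceiling.
    """
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Hypothetical figures, not measurements from the article or this thread:
# 100 GB/s of memory bandwidth, 4B active parameters, 0.5 bytes per parameter.
bound = decode_tokens_per_second_bound(100.0, 4.0, 0.5)
print(f"ceiling: {bound:.1f} tokens/s")
```

Since the CPU and the integrated GPU on an APU pull from the same memory, they share the same ceiling, which is why the two tend to land close together.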
fulafel 12 hours ago
There are ways to trade off compute power for memory bandwidth (like MTP and other speculative decoding approaches). The CPU and GPU would need to be able to share the same cache for this to work. In the Strix Halo case the GPU has a private cache on the GPU die I think, which is the snag.
cafkafk 13 hours ago
If you get the inference engine to route the heavy matrix math to the GPU and the speculative drafting to the CPU without choking on latency it's probably gonna be very fast.

Would love to see the benchmarks if someone actually pulls something like that off.
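
For anyone unfamiliar with the drafting idea, here is a toy sketch of greedy speculative decoding: a cheap draft model proposes k tokens, the expensive target model checks them, and the longest agreeing prefix is kept. Everything here is an assumption for illustration: `target_toy` and `draft_toy` are made-up stand-ins for real models, and this is the simple greedy variant rather than the sampling-based acceptance rule or MTP. It is not how any particular inference engine implements it.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[int], k: int) -> List[int]:
    """Run one draft-then-verify round and return the newly accepted tokens."""
    # The cheap draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(context)
    for _ in range(k):
        token = draft(ctx)
        proposed.append(token)
        ctx.append(token)

    # The target model checks each proposal. A real engine would do these
    # checks in one batched forward pass; here they are sequential calls.
    accepted: List[int] = []
    ctx = list(context)
    for token in proposed:
        expected = target(ctx)
        if expected != token:
            accepted.append(expected)  # keep the target's token at the first mismatch
            return accepted
        accepted.append(token)
        ctx.append(token)
    accepted.append(target(ctx))  # all drafts matched: one extra token for free
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             n_tokens: int, k: int = 4) -> Tuple[List[int], int]:
    """Generate n_tokens and report how many verify rounds were needed."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(target, draft, out, k))
        rounds += 1
    return out[len(prompt):len(prompt) + n_tokens], rounds


def target_toy(ctx: List[int]) -> int:
    """Hypothetical 'big' model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def draft_toy(ctx: List[int]) -> int:
    """Hypothetical 'small' model: agrees with the target except after a 4."""
    return 0 if ctx[-1] % 5 == 4 else (ctx[-1] + 1) % 10


tokens, rounds = generate(target_toy, draft_toy, [0], n_tokens=12, k=4)
print(tokens, "in", rounds, "verify rounds")
```

The output always matches what the target alone would produce; the win is that each verify round can yield several tokens, which only pays off if drafting and the handoff between the two devices are cheap enough.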

bitwize 4 hours ago
Successfully ran Gemma4-26B-A4B on my 8-year-old first-gen Ryzen with a GeForce GTX 1070. It actually ran acceptably well; I was surprised. I even did some coding with it, but the wheels abruptly fell off when it tried several times to use a constant I told it doesn't exist. I only have 32 GiB of RAM in this old bucket, and these results are not worth the RAM consumption, so I put it aside. Maybe if I finish that build with more memory...
api 4 hours ago
Have to point out one boring thing though: this will use a lot more electricity than newer things. So it'll work, but it'll run up your electric bill.
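
To put a number on that for your own setup, the arithmetic is just average draw times hours times your tariff. A minimal sketch follows; the function name and the example figures (300 W average draw, 24 hours a day, 0.30 per kWh) are hypothetical placeholders, so plug in your own measurements.

```python
def monthly_energy_cost(avg_watts: float, hours_per_day: float,
                        price_per_kwh: float, days: int = 30) -> float:
    """Electricity cost for a machine drawing avg_watts for hours_per_day."""
    kwh = avg_watts / 1000.0 * hours_per_day * days
    return kwh * price_per_kwh


# Hypothetical example: 300 W average, running around the clock, 0.30 per kWh.
cost = monthly_energy_cost(300.0, 24.0, 0.30)
print(f"about {cost:.2f} per month")
```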
rvba 11 hours ago
As someone doing this for fun on a Windows 11 machine (96 GB RAM, 5090 24 GB), I wonder if I need any flags to keep the model in memory and avoid swapping to SSD?

I use LM Studio and Qwen3.5 35B, but never figured out whether it is swapping or not.

On an unrelated note, does anyone know a model that can help with this use case:

https://news.ycombinator.com/item?id=48301635

smw 8 hours ago
The article talks about using --mlock