A 10 year old Xeon is all you need

Posted by cafkafk 16 hours ago

A 10 year old Xeon is all you need (point.free)

645 points | 261 comments

rvba 12 hours ago

As someone doing this for fun on a Windows 11 machine (96 GB RAM, 5090 24 GB), I wonder whether I need any flags to keep the model in memory and avoid swapping to the SSD.

I use LM Studio and Qwen3.5 35B, but I never figured out whether it is swapping or not.

On an unrelated note, does anyone know a model that can help with this use case:

https://news.ycombinator.com/item?id=48301635

smw 9 hours ago

The article talks about using --mlock, which asks the OS to keep the model's memory locked in RAM so it cannot be swapped out.

nurettin 13 hours ago

I also run a Qwen 3.6 MoE A4B on old hardware. I set it up with

numactl --membind=1

so its memory is allocated only from NUMA node 1 (the RAM attached to one of the CPU sockets), which speeds up token generation a little.

ForOldHack 9 hours ago

Well, let's get started. I have four of those machines: two are dual-processor and two are single-processor. They all had 32 GB of RAM, so now I have two with 64 GB and two with none. They all had stock K5000s; now two of them have two cards each. I stripped the single-processor machines' RAM and video cards and put them into the dual-processor ones. They have 256 GB SSDs and two 1 TB hard drives. One machine has 8 GB of VRAM across two cards. The dual-processor machines have 2 x 8 cores and 32 threads. They can easily play 16 videos at once. For AI, I have not found a model that I can get above 3 tokens a second. Not a one.

ezconnect 12 hours ago

When you use the Page Up and Page Down keys while reading that blog, the first line on the screen is obscured by the floating bar, or whatever it is. It isn't even needed for reading.

shevy-java 12 hours ago

The webpage's layout is just horrible. Scrolling is also non-default, and thus rather annoying; I had to stop after two scroll events. Why do people think they need so many fancy effects and so much non-standard behaviour, if their alleged goal is to get information across to other people?

bflesch 13 hours ago

You might consider going for even older CPUs that predate Intel ME, the "ring -3" thing that is full of backdoors.

bflesch 12 hours ago

I appreciate the downvotes without any reasoning. It's a fact that newer Intel CPUs have Intel ME, which was not in older CPUs and significantly increases the attack surface if you are not living in a Five Eyes state.

adrian_b 9 hours ago

In a server, you have to worry about the ME only if you also have an Intel Ethernet interface that is connected to a potentially hostile network.

If that is not true, the ME cannot be controlled remotely.

The existence of the ME is much more worrisome in laptops, where the ME can be accessed remotely through WiFi. There, to be certain that there is no way for the ME to be accessed remotely you would have to disconnect or cut the internal antennas and use a USB dongle for WiFi.

s20n 11 hours ago

I agree with the first part. I think this article by the FSF about Intel's ME summarizes the issue: https://static.fsf.org/nosvn/blogs/Intel_ME_Carikli_article_...

As for the second part, I am not sure how living in a Five Eyes state would mitigate it. What do you mean by that?

bflesch 8 hours ago

As a Five Eyes citizen you at least have some rights on paper and can appeal to your government, but if you are a foreigner these guys can go gloves off without any fear of retribution.

Try analyzing the Epstein files and posting about it; they'll give you a proper penetration test of all your devices to see what you found out about their ex-employee.

Nowadays even EU citizens migrating away from US cloud providers are a "national security issue".

smilespray 6 hours ago

Isn't the whole Five Eyes argument moot, because member states spy on each other's citizens and trade intel with each other?

bflesch 6 hours ago

No need for that charade if you are a foreigner, even one from a NATO ally.

tryauuum 10 hours ago

How old are we talking?

bflesch 9 hours ago

IIRC it is pre-2008.

SXX 13 hours ago

Now we need someone to try running Kimi K2.6 on an old Xeon with DDR3. After all, these platforms do support up to 768 GB of RAM.

segmondy 7 hours ago

You can run these on a Turing machine. At what point is it not worth it? At some point the energy needed to generate each token matters. We often see tokens per second; I think a missing metric is tokens per kilowatt-hour. That is what really matters.
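An energy-per-token metric falls out of two numbers you can already measure: tokens per second and wall power. A minimal Python sketch, assuming steady power draw during generation; the function names and the wattage and throughput figures are hypothetical placeholders, not measurements from the article or this thread:

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 1000 W * 3600 s


def joules_per_token(tokens_per_second: float, watts: float) -> float:
    """Energy spent per generated token, assuming constant power draw."""
    if tokens_per_second <= 0 or watts <= 0:
        raise ValueError("throughput and power must be positive")
    return watts / tokens_per_second  # W / (tokens/s) = J per token


def tokens_per_kwh(tokens_per_second: float, watts: float) -> float:
    """How many tokens one kilowatt-hour buys at this throughput and power."""
    return JOULES_PER_KWH / joules_per_token(tokens_per_second, watts)


# Hypothetical machines: a slow old server and a faster but hungrier GPU box.
old_xeon = tokens_per_kwh(tokens_per_second=3.0, watts=300.0)
gpu_box = tokens_per_kwh(tokens_per_second=60.0, watts=600.0)
print(f"old Xeon: {old_xeon:,.0f} tokens/kWh")
print(f"GPU box:  {gpu_box:,.0f} tokens/kWh")
```

Measuring at the wall rather than reading the CPU's reported package power would also capture idle draw from disks, fans and PSU losses, which matters on old servers.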

SXX 2 hours ago

This is just like running Crysis via software rendering on the CPU with llvmpipe. It doesn't have to be practical to be fun to try.

Havoc 11 hours ago

It'll work, but yield about a token per minute. With ancient servers, memory throughput, not memory size, is the limiting factor.
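The reason throughput rather than size dominates is that generating each token reads roughly every active weight from RAM once, so memory bandwidth divided by the bytes of active weights gives a rough ceiling on decode speed. A back-of-the-envelope sketch in Python; the quad-channel DDR3-1600 configuration, the 70B dense model and the 8-bit weights are hypothetical inputs, and real rates land below this ceiling once NUMA effects, compute limits and KV-cache reads are counted:

```python
def ddr_bandwidth_gb_s(channels: int, megatransfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth: each 64-bit (8-byte) channel moves bus_bytes per transfer."""
    return channels * megatransfers_per_s * bus_bytes / 1000  # MB/s -> GB/s


def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, active_params_billion: float,
                                bytes_per_param: float) -> float:
    """Upper bound on tokens/s if every active weight is streamed from RAM once per token.

    For a mixture-of-experts model, pass only the parameters active per token.
    """
    gb_per_token = active_params_billion * bytes_per_param
    return bandwidth_gb_s / gb_per_token


# Hypothetical inputs: one socket with quad-channel DDR3-1600, a dense 70B model at 8 bits per weight.
bw = ddr_bandwidth_gb_s(channels=4, megatransfers_per_s=1600)
ceiling = decode_ceiling_tokens_per_s(bw, active_params_billion=70, bytes_per_param=1.0)
print(f"peak bandwidth: {bw:.1f} GB/s, decode ceiling: {ceiling:.2f} tokens/s")
```

Adding RAM raises the size of model you can load but leaves this ceiling untouched; only more bandwidth or fewer bytes per token (smaller or more aggressively quantized models, fewer active experts) moves it.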

hypfer 13 hours ago

> The argument for speculative decoding is stronger on CPU than on GPU.

Uh. Uuuh.

No?

___

Also

> While a GPU has a massive pool of ultra-fast High-Bandwidth Memory (HBM), a CPU relies on small, lightning-fast “caches” (L1, L2, L3) built directly onto the processor chip.

What purpose does putting "caches" in quotes serve there? Is this AI-generated text, written by the model running on that host?