Posted by secure 6 days ago
I've chased elusive but very annoying stability problems (some, of course, due to overclocking during my younger years, when it still had a tangible payoff) often enough on systems I had built that taking this one BIG potential cause out of the equation is worth the few dozen extra bucks I have to spend on ECC-capable gear many times over.
Trying to validate an ECC-less platform's stability is surprisingly hard, because memtest and friends just aren't very reliable at detecting the more subtle problems. Prime95, y-cruncher and Linpack (in increasing order of effectiveness) are better than specialized memory testing software in my experience, but they aren't perfect either.
Most AMD CPUs (but not their APUs with potent iGPUs - there, you will have to buy the "PRO" variants) these days have full support for ECC UDIMMs. If your mainboard vendor also plays ball - annoyingly, only a minority of them enables ECC support in their firmware, so always check for that before buying! - there's not much that can prevent you from having that stability enhancement and reassuring peace of mind.
Quoth DJB (around the very start of this millennium): https://cr.yp.to/hardware/ecc.html :)
This is the annoying part.
That AMD permits ECC is a truly fantastic situation, but whether it's supported by a given motherboard is often a gamble, and worse: it's not advertised even when it is available.
I have an ASUS PRIME TRX40 PRO, and the tech specs say that it can run ECC and non-ECC DIMMs, but not whether ECC will be available to the operating system - merely that the DIMMs will work.
It's much more hit and miss in reality than it should be, though this motherboard was a pricey one: one can't use price as a proxy for features.
EDAC MC0: Giving out device to module amd64_edac
is a pretty reliable indication that ECC is working. See my blog post about it (it was top of HN): https://sunshowers.io/posts/am5-ryzen-7000-ecc-ram/
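If you want to check a running Linux system, something along these lines should be enough (a minimal sketch; it assumes the kernel shipped with the EDAC driver for your memory controller):

# any EDAC driver claiming a memory controller shows up in the kernel log
dmesg | grep -i edac

# the same information is exposed in sysfs; a populated mc0 directory
# means a memory-controller driver is active
ls /sys/devices/system/edac/mc/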
EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (INTERRUPT)
EDAC MC1: Giving out device to module igen6_edac controller Intel_client_SoC MC#1: DEV 0000:00:00.0 (INTERRUPT)
but `dmidecode --type 16` says:
Error Correction Type: None
Error Information Handle: Not Provided
What does
find /sys/devices/system/edac/mc/mc0/csrow* -maxdepth 1 -type f -exec grep --color . {} +
report?
/sys/devices/system/edac/mc/mc0/csrow0/ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label:MC#0_Chan#0_DIMM#0
/sys/devices/system/edac/mc/mc0/csrow0/size_mb:8192
/sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ue_count:0
/sys/devices/system/edac/mc/mc0/csrow0/mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/csrow0/edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ch1_dimm_label:MC#0_Chan#1_DIMM#0
/sys/devices/system/edac/mc/mc0/csrow0/dev_type:x16
> find /sys/devices/system/edac/mc/mc0/csrow* -maxdepth 1 -type f -exec grep --color . {} +
It looks like DDR5 supports SECDED by default. :-/
I would expect your particular motherboard to operate with proper SECDED-or-better ECC if you have capable, compatible DIMMs, enable ECC mode in the firmware, and boot an OS kernel that can make sense of it all.
I am writing this message on such an ASUS MB with a Ryzen CPU and working ECC memory. You must check that you actually have a recent enough OS to recognize your Threadripper CPU and that you have installed any software packages required for this (e.g. on Linux "edac-utils" or something with a similar name).
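As a concrete check, something like the following works for me (a sketch; package names vary by distro - on Debian-likes the tools come from the "edac-utils" and "rasdaemon" packages, as far as I remember):

# summary of memory controllers, DIMM labels and error counts
edac-util -v

# or, with rasdaemon running:
sudo ras-mc-ctl --status
sudo ras-mc-ctl --errors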
Some businesses (and governments) try to unify their purchasing, but this seems to make things worse, with the purchasing department both not understanding technology and being outwitted by vendors.
Enterprise also ruins it for small/medium businesses, at least those with dedicated internal IT departments who care about both the technology and the cost. We are left with either unreliable consumer-grade hardware or prohibitively expensive enterprise hardware.
There's very little in between. This market is also underserved on the software/SaaS side, with the SSO tax and whatnot. There's a huge gap between "I'm taking the owner's CC down to Best Buy" and "Enterprise" that gets screwed over.
I've been building my own gaming and productivity rigs for 20 years and I don't think memory has ever been a problem. Maybe survivorship bias, but surely even budget parts aren't THIS bad.
Assuming you can tell, and assuming you don't end up silently corrupting your data before then.
Without knowing how to fix that error, you've lost 200 revisions of work. You can go back and find which revision introduced the problem, take the one before it, and upgrade that to the latest Blender, but all 200 revisions since were made on other versions and can't be backported.
What a silly hypothetical. There are myriad freak occurrences that could make you redo work, and you don't worry about those. Now, I'm not saying single-bit errors don't happen. They just typically don't result in the sort of cascading failure you're describing.
My point is that there are scenarios where corruption in the past puts you in a bind and can cause a lot of lost work, or an expensive diagnostic and recovery process, long after it first occurred. Blender was just one example; it can be much worse with proprietary binary formats, where you have no chance of jumping into a debugger to figure out what's going wrong with an upgrade or export. And maybe the subscription version won't even let you go back to the old release.
> There's a myriad freak occurrences that could make you have to redo work that you don't worry about.
Yes, other sources of corruption, such as software errors, are more likely. It's not that you wouldn't worry about them if you had an unlimited budget and could have people audit the code, etc., but you do have a budget, and ECC is cheap relative to it. That doesn't mean it always makes sense for everyone to pay more for ECC. But I can see why people working on gigantic CAD files for nuclear reactor design, etc. tend to have workstations with ECC.
Not really what I would call an "asset", but fine.
>It's not that you wouldn't worry about them if you had unlimited budget and could have people audit the code etc.
Hell, I was thinking something way simpler, like your cat climbing on the case and throwing up through the top vents, or you tripping and dropping your ass on your desk and sending everything flying.
>But I can see why people working on gigantic CAD files for nuclear reactor design, etc. tend to have workstations with ECC.
Yeah, because those people aren't buying their own machines. If the credit card is yours and you're not doing something super critical, you're probably better served by a faster processor than by worrying against freak accidents.
And let's say you have archived copies of it with checksums like I suggested, going back through all past revisions.
What's the issue again now, that ECC would have solved? Not to mention that ECC wouldn't help at all with corruption at the disk level anyway.
If the bit flip happened in RAM, the checksum would be computed over the already-corrupted data. ECC corrects single-bit errors while the data sits in RAM.
>Not to mention that ECC wouldn't help at all with corruption at the disk level anyway.
Yes, using ECC without ZFS, btrfs, ReFS, or checksummed file formats is pretty pointless (unless your application never touches storage).
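For completeness, the storage-side half of that looks something like this (a sketch - "tank" and "/data" are placeholder names for a ZFS pool and a btrfs mount):

# ZFS: re-read every block and verify its checksum; problems show up in status
sudo zpool scrub tank
sudo zpool status tank

# btrfs equivalent
sudo btrfs scrub start /data
sudo btrfs scrub status /data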
Also: DDR5 gets some misleading ECC marketing because the memory standard has an on-die error correction scheme built in. Don't fall for it.
A computer with 64 GB of memory is 4 times more likely to encounter memory errors than one with 16 GB of memory.
When DIMMs are new, at the usual amounts of memory for desktops, you will see at most a few errors per year, sometimes only an error after a few years. With old DIMMs, some of them will start to have frequent errors (such modules presumably had a borderline bad fabrication quality and now have become worn out, e.g. due to increased leakage leading to storing a lower amount of charge on the memory cell capacitors).
For such bad DIMMs, the error frequency will keep increasing, and it may reach several errors per day, or even per hour.
For me, a very important advantage of ECC has been the ability to detect such bad memory modules (in computers that have been used for 5 years or more) and replace them before corrupting any precious data.
I also had a case with an HP laptop with ECC where memory errors had become frequent after it was stored for a long time (more than a year) in a rather humid place, which might have caused some oxidation of the SODIMM socket contacts - removing the SODIMMs, scrubbing the sockets and reinserting the modules made the errors disappear.
No. Or well, not exactly. More bits will flip randomly, but if the only difference between the two systems is the amount of installed memory, both will see the same number of memory errors: bit flips in the additional 48 GB do not turn into errors, because that memory is never used. Memory errors scale with memory used, not with memory installed.
94 2025-08-26 01:49:40 +0200 error: Corrected error, no action required., CPU 2, bank Unified Memory Controller (bank=18), mcg mcgstatus=0, mci CECC, memory_channel=1,csrow=0, mcgcap=0x0000011c, status=0x9c2040000000011b, addr=0x36e701dc0, misc=0xd01a000101000000, walltime=0x68aea758, cpuid=0x00a50f00, bank=0x00000012
95 2025-09-01 09:41:50 +0200 error: Corrected error, no action required., CPU 2, bank Unified Memory Controller (bank=18), mcg mcgstatus=0, mci CECC, memory_channel=1,csrow=0, mcgcap=0x0000011c, status=0x9c2040000000011b, addr=0x36e701dc0, misc=0xd01a000101000000, walltime=0x68b80667, cpuid=0x00a50f00, bank=0x00000012
(this is `sudo ras-mc-ctl --errors` output) It's always the same address, and always a Corrected Error (obviously, otherwise my kernel would panic). However, operating my system's memory at this clock and latency boosts x265 encoding performance (just one of the benchmarks I picked when trying to figure out how to handle this particular tradeoff) by about 12%. That is an improvement I'm willing to stomach the extra risk of effectively overclocking the memory modules beyond their comfort zone for, given that I can fully mitigate it by virtue of properly working ECC.
Also: could you not have just bought slightly faster RAM, given the premium for ECC?
And no, because ECC UDIMMs at the speed I run mine at (3600 MT/s) simply do not exist - it is outside of what JEDEC ratified for the DDR4 spec.
DDR4-1600 (PC4-12800)
DDR4-1866 (PC4-14900)
DDR4-2133 (PC4-17000)
DDR4-2400 (PC4-19200)
DDR4-2666 (PC4-21300)
DDR4-2933 (PC4-23466)
DDR4-3200 (PC4-25600) (the highest supported in the DDR4 generation)
What's *NOT* supported are the enthusiast speeds that typically require more than 1.2 V, for example 3600 MT/s, 4000 MT/s and 4266 MT/s.
Also, could you share some relevant info about your processor, mainboard, and UEFI? I see many internet commenters question whether their ECC is working (or ask if a particular setup would work), and far fewer that report a successful ECC consumer desktop build. So it would be nice to know some specific product combinations that really work.
- ASRock B450 Pro4
- ASRock B550M-ITX/ac
- ASRock Fatal1ty B450 Gaming-ITX/ac
- Gigabyte MC12-LE0
There are probably many others with proper ECC support. Vendor spec sheets usually hint at properly working ECC in their firmware if they mention "ECC UDIMM" support specifically.

As for CPUs, that is even easier for AM4: everything that is not based on an APU die can support ECC, while APU-derived parts cannot (beware of SKUs marketed without an iGPU that are really APUs with the iGPU part disabled, such as the Ryzen 5 5500). An exception to that rule are the "PRO"-series APUs, such as the Ryzen 5 PRO 5650G et al., which have an iGPU but also support ECC. The main differences (apart from the integrated graphics) between CPU and APU SKUs are that the latter do not support PCIe 4.0 (APUs are limited to PCIe 3.0) and have a few watts lower idle power consumption.
When I originally built the desktop PC that I still use (after a number of in-place upgrades, such as swapping out the CPU/GPU combo for an APU), I blogged about it (in German) here: https://johannes.truschnigg.info/blog/2020-03-23#0033-2020-0...
If I were to build an AM5 system today, I would look into mainboards from ASUS for proper ECC support - they seem to have it pretty much universally supported on their gear. (Actual out-of-band ECC with EDAC support on Linux, not the DDR5 "on-DIE" stuff.)
This was running at like, 1866 or something. It's a pretty barebones 8th gen i3 with a beefier chipset, but ECC still came in clutch. I won't buy hardware for server purposes without it.
Edit: it's probably because I switched it to "energy efficiency mode" instead of "performance mode" because it would occasionally lock up in performance mode. Presumably with the same root cause.
Last winter I was helping someone put together a new gaming machine... it was so frustrating running into the fake ECC marketing for DDR5 that you mention. The motherboard situation - whether they support it or not, or whether a BIOS update added support, then removed it, then added it back or not - was also really sad. And even worse, IMO, is that you can't actually max out 4 slots on the top-tier mobos unless you're willing to accept a huge drop in RAM speed. That leads to ugly 48 GB sticks and limiting yourself to two of them... In the end we didn't go with ECC for that build, but I was pretty disappointed about it. I'm hoping the next gen will be better; for my own setup running ZFS and such, I'm not going to give up ECC.
Some vendors use Hamming codes with "holes" in them, and you need the CPU to also run ECC (or at least error detection) between RAM and the cache hierarchy.
Those things are optional in the spec, because we can’t have nice things.
I wish AMD would make ECC a properly advertised feature with clear motherboard support. At least DDR5 has some level of ECC.
That is mostly to assist manufacturers in selling marginal chips with a few bad bits scattered around. It's really a step backwards in reliability.
Both the 8700G and the 8700G PRO are readily available in the EU, and the PRO SKU is about 50% more expensive (EUR 120 in absolute numbers): https://geizhals.eu/?cmp=3096260&cmp=3096300&cmp=3200470&act...
Does anyone maintain a list of de-facto ECC support for AMD chips and mainboards? That part-list site only shows official support IIRC, so it won't give you any results.
However, in the past there existed a very few CPU models and motherboards that supported either kind of DIMM, while today this has become completely impossible, as the mechanical and electrical differences between them have increased.
In any case, today, like also 20 years ago, when searching for ECC DIMMs you must always search only the correct type, e.g. unbuffered ECC DIMMs for desktop CPUs.
In general, registered ECC DIMMs are easier to find, because wherever "server memory" is advertised, that is what is meant. For desktop ECC memory, you must be careful to see both "ECC" and "unbuffered" mentioned in the module description.
For out-of-band ECC, e.g. with standard ECC SODIMMs, all the embedded SBCs that I have seen used only CPUs that are very obsolete nowadays, i.e. ancient versions of Intel Xeon or old AMD industrial Ryzen CPUs (AMD's series of industrial Ryzen CPUs are typically at least one or two generations behind their laptop/desktop CPUs).
Moreover all such industrial SBCs with ECC SODIMMs were rather large, i.e. either in the 3.5" form factor or in the NanoITX form factor (120 mm x 120 mm), and it might have been necessary to replace their original coolers with bigger heatsinks for fanless operation.
In-band ECC causes a significant decrease of the performance, but for most applications of such mini-PCs the performance is completely acceptable.
something like that?
In my experience, it's generally unwise to push the platform you're on to the outermost of its spec'd limits. At work, we bought several 5950X-based Zen3 workstations with 128GB of 3200MT/s ECC UDIMM, and two of these boxes will only ever POST when you manually downclock memory to 3000MT/s. Past a certain point, it's silicon lottery deciding if you can make reality live up to the datasheets' promises.
edit: Looks like a lot of Asus motherboards work, and the thing to look for is "unbuffered" ECC. Kingston has some, I see 32GB module for $190 on Newegg.
Doesn't exactly sound like a use case for ECC memory, given that it can't correct these attacks. Interesting though, I'd have thought that virtual addresses would've largely fixed this.
I have followed his blog for years and hold him in high respect so I am surprised he has done that and expected stability at 100C regardless of what Intel claim is okay.
Not to mention that you rapidly hit diminishing returns past 200W with current-gen Intel CPUs, although he mentions caring about idle power usage. Why go from 150W to 300W for a 20% performance increase?
Given the motherboard and RAM will also generate quite some heat, if the case fan profile was conservative (he does mention he likes low noise), could be the insides got quite toasty.
Back when I got my 2080 Ti, I had this issue when gaming. The internal temps would get so hot due to the blanket effect of the padding I couldn't touch the components after a gaming session. Had to significantly tweak my fan profiles. His CPU at peak would generate about the same amount of heat as my 2080 Ti + CPU I had then, and I had the non-Compact case with two case fans.
[1]: https://michael.stapelberg.ch/posts/2025-05-15-my-2025-high-...
I also have a fractal define case with anti noise padding material and dust filters, but my temperatures are great and the computer is almost inaudible even when compiling code for hours with -j $(nproc). And my fans and cooler are much cheaper than his.
That should of course be sound padding...
Intel specifies a max operating temperature of 105°C for the 285K [1]. Also modern CPUs aren't supposed to die when run with inadequate cooling, but instead clock down to stay within their thermal envelope.
[1]: https://www.intel.com/content/www/us/en/products/sku/241060/...
Because CPUs can get much hotter at specific spots on the die, no? Just because you're reading 100 doesn't mean there aren't spots that are way hotter.
My understanding is that modern Intel CPUs have a temp sensor per core + one at package level, but which one is being reported?
Anyway, OP's cooler should be able to cool down 250W CPUs below 100C. He must have done something wrong for this to not happen. That's my point -- the motherboard likely overclocked the CPU and he failed to properly cool it down or set a power limit (PL1/PL2). He could have easily avoided all this trouble.
And yeah, having Arrow Lake running at its defaults is just a waste of energy. Even halving your TDP just loses you roughly 15% performance in highly MT scenarios...
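On Linux you can also do the power-limit part at runtime instead of in the firmware. A rough sketch via the powercap/RAPL sysfs interface (paths can vary per platform, the example values are arbitrary, and the limits reset on reboot):

# current long-term (PL1) and short-term (PL2) package limits, in microwatts
cat /sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw
cat /sys/class/powercap/intel-rapl:0/constraint_1_power_limit_uw

# cap PL1 at 125 W and PL2 at 150 W
echo 125000000 | sudo tee /sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw
echo 150000000 | sudo tee /sys/class/powercap/intel-rapl:0/constraint_1_power_limit_uw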
I did not overclock this CPU. I pay attention to what I change in the BIOS/UEFI firmware, and I never select any overclocking options.
Also, I have applied thermal paste properly: Noctua-supplied paste, following Noctua’s instructions for this CPU socket.
https://www.techpowerup.com/review/intel-core-ultra-9-285k/2... lists maximum temperature as 88.2C with the previous gen NH-D15 cooler.
When you do not have a bunch of components ready to swap out it is also really hard to debug these issues. Sometimes it’s something completely different like the PSU. After the last issues, I decided to buy a prebuilt (ThinkStation) with on-site service. The cooling is a bit worse, etc., but if issues come up, I don’t have to spend a lot of time debugging them.
Random other comment: when comparing CPUs, a sad observation was that even a passively cooled M4 is faster than a lot of desktop CPUs (typically single-threaded, sometimes also multi-threaded).
And if we are talking about a passively cooled M4 (a MacBook Air, basically), it will throttle quite heavily relatively quickly; you lose at the very least 30%.
So, let's not misrepresent things: Apple CPUs are very power efficient, but they are not magic; if you hit them hard, they still need good cooling. Plenty of people have had that experience with their M4 Max, discovering that if they actually use the laptop as a workstation, it will generate a good amount of fan noise - there is no way around it.
Apple stuff is good because most people actually have bursty workload (especially graphic design, video editing and some audio stuff) but if you hammer it for hours on end, it's not that good and the power efficiency point becomes a bit moot.
I think a lot of it boils down to load profile and power delivery. My 2500VA double conversion UPS seems to have difficulty keeping up with the volatility in load when running that console app. I can tell because its fans ramp up and my lights on the same circuit begin to flicker very perceptibly. It also creates audible PWM noise in the PC which is crazy to me because up til recently I've only ever heard that from a heavily loaded GPU.
For a long time, my Achilles' heel was my Bride's vacuum. Her Dyson pulled enough amps that the UPS would start singing and trigger the auto-shutdown sequence for the half rack. Took way too long to figure out, as I was usually not around when she did it.
You said the right words but with the wrong meaning! On Gigabyte mobo you want to increase the "CPU Vcore Loadline Calibration" and the "PWM Phase Control" settings, [see screenshot here](https://forum.level1techs.com/t/ddr4-ram-load-line-calibrati...).
When I first got my Ryzen 3900X cpu and X570 mobo in 2019, I had many issues for a long time (freezes at idle, not waking from sleep, bios loops, etc). Eventually I found that bumping up those settings to ~High (maybe even Extreme) was what was required, and things worked for 2 years or so until I got a 5950X on clearance last year.
I slotted that into the same mobo and it worked fine, but when I was looking at HWMon etc., I noticed some strange things with the power/voltage. After some mucking about and theorising with ChatGPT (it's way quicker than googling for uncommon problems), it became apparent that the ~High LLC/power settings I was still using were no good. ChatGPT explained that my 3900X was probably a bit "crude" in relative quality, and so it needed the "stronger" power settings to keep itself in order. Then when I swapped to the 5950X, it happened to be more "refined" and thus didn't need to be "manhandled" - and in fact, didn't like being manhandled at all!
But if your UPS (or just the electrical outlet you're plugged into) can't cope - dunno if I'd describe that as cratering your CPU.
Yea, but unfortunately it comes attached to a Mac.
An issue I've often encountered with motherboards is that they have brain-damaged default settings that run CPUs out of spec. You really have to go through it all with a fine-toothed comb and make sure everything is set to conservative, stock, manufacturer-recommended settings. And my stupid MSI board resets everything (every single BIOS setting) to MSI defaults when you upgrade its BIOS.
It looks completely bonkers to me. I overclocked my system to ~95% of what it is able to do with almost default voltages, using bumps of 1-3% over stock, which (AFAIK) is within acceptable tolerances, but it requires hours and hours of tinkering and stability testing.
Most users just set automatic overclocking, have their motherboards push voltages to insane levels, and then act surprised when their CPUs start bugging out within a couple of years.
Shocking!
I'd rather run everything at 90% and get very big power savings and still have pretty stellar performance. I do this with my ThinkStation with Core Ultra 265K now - I set the P-State maximum performance percentage to 90%. Under load it runs almost 20 degrees Celsius cooler. Single core is 8% slower, multicore 4.9%. Well worth the trade-off for me.
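For anyone who wants to replicate this on Linux rather than in the firmware: a minimal sketch, assuming the intel_pstate driver is active (the setting does not survive a reboot):

# confirm which cpufreq driver is in use
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver

# cap the performance ceiling at 90% of the maximum P-state
echo 90 | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct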
(Yes, I know that there are exceptions.)
You can always play with the CPU governor / disable high power states. That should be well-tested.
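For example (a sketch; cpupower ships with the kernel tools, and the no_turbo knob assumes the intel_pstate driver):

# switch all cores to the powersave governor
sudo cpupower frequency-set -g powersave

# and/or disable turbo boost entirely
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo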
I think you are confusing with undervolting.
It turned out during the shitcoin craze and then AI craze that hardcore gamers, aka boomers with a lot of time and retirement money on their hands and early millennials working in big tech building giant-ass man caves, are a sizeable demographic with very deep pockets.
The wide masses however, they gotta live with the scraps that remain after the AI bros and hardcore gamers have had their pick.
https://www.pugetsystems.com/blog/2024/08/02/puget-systems-p...
tl;dr: they heavily customize BIOS settings, since many BIOSes run CPUs out-of-spec by default. With these customizations there was not much of a difference in failure rate between AMD and Intel at that point in time (even when including Intel 13th and 14th gen).
Yeah. If Asahi worked on newer Macs and Apple Silicon Macs supported eGPU (yes I know, big ifs), the choice would be simple. I had NixOS on my Mac Studio M1 Ultra for a while and it was pretty glorious.
I had the same issue with my MSI board; the next one won't be an MSI.
My modern-CPU problems are DDR5 and the pre-boot memory training never completing. So a 9700X build of mine that WAS supposed to be located remotely has to sit in my office and have its hand held through every reboot, because you never quite know when it's going to decide it needs to retrain and randomly never come back. Recovery requires pulling the plug from the back, waiting a few minutes, powering back on, and then waiting 30 minutes for 64 GB of DDR5 to do its training thing.
My system would randomly freeze for ~5 seconds, usually while gaming and having a video running in the browser at the same time. Then it would reliably happen in Titanfall 2, and I noticed there were always AHCI errors in the Windows logs at the same time, so I switched to an NVMe drive.
The system would also shut down occasionally (~ once every few hours) in certain games only. Then, I managed to reproduce it 100% of the time by casting lightning magic in Oblivion Remastered. I had to switch out my PSU, the old one probably couldn't handle some transient load spike, even though it was a Seasonic Prime Ultra Titanium.
I have an M1 Max, a few revisions old, and the only thing I can do to spin up the fans is run local LLMs or play Minecraft with the kids on a giant ultra wide monitor at full frame rate. Giant Rust builds and similar will barely turn on the fan. Normal stuff like browsing and using apps doesn’t even get it warm.
I’ve read people here and there arguing that instruction sets don’t matter, that it’s all the same past the decoder anyway. I don’t buy it. The superior energy efficiency of ARM chips is so obvious I find it impossible to believe it’s not due to the ISA since not much else is that different and now they’re often made on the same TSMC fabs.
This isn't really true. On the same process node the difference is negligible. It's just that Intel's process in particular has efficiency problems and Apple buys out the early capacity for TSMC's new process nodes. Then when you compare e.g. the first chips to use 3nm to existing chips which are still using 4 or 5nm, the newer process has somewhat better efficiency. But even then the difference isn't very large.
And the processors made on the same node often make for inconvenient comparisons, e.g. the M4 uses TSMC N3E but the only x86 processor currently using that is Epyc. And then you're obviously not comparing like with like, but as a ballpark estimate, the M4 Pro has a TDP of ~3.2W/core whereas Epyc 9845 is ~2.4W/core. The M4 can mitigate this by having somewhat better performance per core but this is nothing like an unambiguous victory for Apple; it's basically a tie.
> I have an M1 Max, a few revisions old, and the only thing I can do to spin up the fans is run local LLMs or play Minecraft with the kids on a giant ultra wide monitor at full frame rate. Giant Rust builds and similar will barely turn on the fan. Normal stuff like browsing and using apps doesn’t even get it warm.
One of the reasons for this is that Apple has always been willing to run components right up to their temperature spec before turning on the fan. And then even though that's technically in spec, it's right on the line, which is bad for longevity.
In consumer devices it usually doesn't matter, because most people rarely put any real load on their machines anyway, but it's something to be aware of if you actually intend to. For example, there used to be a Mac Mini Server product; people would put significant load on those, and they would eat their internal hard drives because the fan controller was tuned for acoustics over operating temperature.
This anecdote perfectly describes my few-generations-old Intel laptop too. The fans turn on maybe once a month. I don't think it's as power efficient as an M-series Apple CPU, but total system power is definitely under 10W during normal usage (including screen, wifi, etc).
One of the many reasons Snapdragon Windows laptops failed was that both AMD and Intel (Lunar Lake) were able to reach the claimed efficiency of those chips. I still think modern x86 can match ARM in efficiency if someone bothers to tune the OS and scheduler for the most common activities. The M series was based on Apple's phone chips, which were designed from the ground up to run on a battery all these years. AMD/Intel just don't see an incentive to do that, nor does Microsoft.
There is one exception: If I run an idle Windows 11 ARM edition VM on the mac, then the fans run pretty much all the time. Idle Linux ARM VMs don’t cause this issue on the mac.
I’ve never used windows 11 for x86. It’s probably also an energy hog.
What metric ought I to use when buying a CPU these days? Should I care about reviews? I am fine with a mid-range CPU, for what it is worth, and I thought of the AMD Ryzen 7 5700 or AMD Ryzen 5 5600GT, or anything with a similar price tag. They might even be lower-end by now?
Intel is just bad at the moment and not even worth touching.
And it's not bad power quality on the mains, as someone suggested (it's excellent here), or something 'in the air' (whatever that means), if it happens very soon after buying.
I would guess that a lot of it comes from bad firmware/mainboards, etc., like the recent issue with ASRock mainboards destroying Ryzen 9000-series CPUs: https://www.techspot.com/news/108120-asrock-confirms-ryzen-9... Anyone who uses Linux and has dealt with bad ACPI bugs, etc. knows that a lot of these mainboards probably have crap firmware.
I should also say that I had a Ryzen 3700X and 5900X many years back and two laptops with a Ryzen CPU and they have been awesome.
My belief is that it is in the memory controllers and the XMP profiles provided with RAM. It's very easy for the XMP profiles to be overly optimistic, or for the RAM to degrade over time and fall out of spec.
Meanwhile, my Intel systems are solid. Even the 9900K hand-me-down I gave to my partner. There is an advantage to using very old tech. And they're not even slower for gaming: everything is single-core bottlenecked anyway. Only in the past year or so has AMD surpassed Intel in single-core performance, and we are talking single-digit percentage differences for gaming.
I’m glad AMD has risen, but the dialogue about AMD vs intel in the consumer segment is tainted by people who can’t disconnect their stock ownership from reality.
https://www.cpubenchmark.net/cpu_value_alltime.html
CPUs like Intel Core Ultra 7 265K are pretty close to top Ryzens
If your workload is pointer-chasing, Intel's new CPUs aren't great though, and the X3D chips are possibly a good pick (if the workload fits in cache), which is why they get a lot of hype from reviewers who benchmark games and judge the score 90% based on that performance.
The only issues are with an Intel Bluetooth chipset and BIOS auto-detection bugs. Under Linux, the hardware is bug-for-bug compatible with Windows, and I'm down to zero known issues after doing a bit of hardware debugging.
My home server is on a 5600G. I turned it on, installed Home Assistant, Jellyfin, etc., and it has not been off since. It's been chugging along completely unattended, no worries.
Yes, it's in a basement where temperature is never above 21C, and it's almost never pushed to 100%, and certainly never for extended periods of time.
But it's the stock cooler, cheap motherboard, cheap RAM and cheap SSD (with expensive NAS grade mechanical hard drives).
[1] Well, most non-servers are probably laptops today, but the same reasoning applies.
Definitely not that one if you plan to pair it with a dedicated GPU! The 5700X has twice the L3 cache. All Ryzen 5000 parts with an iGPU have only 16 MB, and the 5700 is one of those with the iGPU deactivated.
But see, this is why it is so difficult. I would have never guessed. I would have to research this A LOT, which I am fine with, but you know.
I also have this issue.
A common approach is to go into the BIOS/UEFI settings and check that c6 is disabled. To verify and/or temporarily turn c6 off, see https://github.com/r4m0n/ZenStates-Linux
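Going from memory of that project's README (double-check with --help before relying on it), usage is roughly this; the script pokes MSRs, so the msr kernel module has to be loaded:

sudo modprobe msr
sudo ./zenstates.py --list          # show current P-states and C6 state
sudo ./zenstates.py --c6-disable    # disable C6 until the next reboot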
If I enable virtualisation, the issue can be replicated within 15 minutes of boot.
But with basically half the CPU set to do nothing and all features disabled, it's once a week max.
Which sucks because I basically live in WSL.
I have always run B series because I've never needed the overclocking or additional peripherals. In my server builds I usually disable peripherals in the UEFI like Bluetooth and audio as well.
Twice the memory bandwidth, twice the CPU core count... It's really wacky how they've decided to name things
The Ultra is a pair of Max chips. While the core counts didn't increase from M3 to M4 Max, overall performance is in the neighborhood of 5-25% better. Which still puts the M3 Ultra as Apple's top end chip, and the M5 Max might not dethrone it either.
The uplift in IPC and core counts means that my M1 Max MBP has a similar amount of CPU performance as my M3 iPad Air.
Of course, each generation has some single-core improvements and eventually that could catch up, but it can take a while to catch up to… twice as much silicon.
It is cheaper and more stable, and the performance difference doesn't matter that much either.
On desktop PCs, thermal throttling is often set up as "just a safety feature" to this very day. Which means: the system does NOT expect to stay at the edge of its thermal limit. I would not trust thermal throttling with keeping a system running safely at a continuous 100C on die.
100C is already a "danger zone", with elevated error rates and faster circuit degradation - and there are only this many thermal sensors a die has. Some under-sensored hotspots may be running a few degrees higher than that. Which may not be enough to kill the die outright - but more than enough to put those hotspots into a "fuck around" zone of increased instability and massively accelerated degradation.
If you're relying on thermal throttling to balance your system's performance, as laptops and smartphones often do, then you seriously need to dial in better temperature thresholds. 100C is way too spicy.
If nothing else, it very clearly indicates that you can boost your performance significantly by sorting out your cooling because your cpu will be stuck permanently emergency throttling.
That said, there's a difference between a laptop cpu turbo boosting to 90 for a few minutes and a desktop cpu, which are usually cooler anyway, running at 100 sustained for three hours.
Maybe the pci bus is eating power, or maybe it’s the drives?
Smartphones have no active cooling and are fully dependent on thermal throttling for survival, but they can start throttling at as low as 50C easily. Laptops with underspecced cooling systems generally try their best to avoid crossing into triple digits - a lot of them max out at 85C to 95C, even under extreme loads.
I had an 8th-gen i7 sitting at the thermal limit (~100C) in a laptop for half a decade 24/7 with no problem. As sibling comments have noted, modern CPUs are designed to run "flat-out against the governor".
Voltage-dependent electromigration is the biggest problem and what led to the failures in Intel CPUs not long ago, perhaps ironically caused by cooling that was "too good" --- the CPU saw that there was still plenty of thermal headroom, so it boosted frequency and the accompanying voltage to reach the limit, and went too far with the voltage. If it had hit the thermal limit, it would've backed off on the voltage and frequency.
> I also double-checked if the CPU temperature of about 100 degrees celsius is too high, but no: [..] Intel specifies a maximum of 110 degrees. So, running at “only” 100 degrees for a few hours should be fine.
Secondly, the article reads:
> Tom’s Hardware recently reported that “Intel Raptor Lake crashes are increasing with rising temperatures in record European heat wave”, which prompted some folks to blame Europe’s general lack of Air Conditioning.
> But in this case, I actually did air-condition the room about half-way through the job (at about 16:00), when I noticed the room was getting hot. Here’s the temperature graph:
> [GRAPH]
> I would say that 25 to 28 degrees celsius are normal temperatures for computers.
So apparently a Tom's Hardware article connected a recent heat wave with crashing computers containing Intel CPUs. They brought that up to rule it out by presenting a graph showing reasonable room temperatures.
I hope this helps.
No. High performance gaming laptops will routinely do this for hours on end for years.
If it can't take it, it shouldn't allow it.
Intel's basic 285K spec's - https://www.intel.com/content/www/us/en/products/sku/241060/... - say "Max Operating Temperature 105 °C".
So, yes - running the CPU that close to its maximum really isn't doing stability, or longevity, any favors.
No reason to doubt your assertion about gaming laptops - but chip binning is a thing, and the manufacturers of those laptops have every reason to pay Intel a premium for CPUs which test to better values of X, Y, and Z.
But I just can't bring myself to upgrade this year. I dabble in local AI, where it's clear fast memory is important, but the PC approach is just not keeping up without going to "workstation" or "server" parts that cost too much.
There are glimmers of hope with MR-DIMMs, CUDIMMs, and other approaches, but really boards and CPUs need to support more memory channels. Intel has a small advantage over AMD, but it's nothing compared to the memory speed of a Mac Pro or higher. "Strix Halo" offers some hope with four-memory-channel support, but it's meant for notebooks so isn't really expandable (which would enable à la carte hybrid AI: fast GPUs with reasonably fast shared system RAM).
I wish I could fast forward to a better time, but it's likely fully integrated systems will dominate if the size and relatively weak performance for some tasks makes the parts industry pointless. It is a glaring deficiency in the x86 parts concept and will result in PC parts being more and more niche, exotic and inaccessible.
On the flip-side, though: Running GPT-OSS-120b locally is "cool", but have people found useful, productivity enhancing use-cases which justify doing this over just loading $2000 into your OpenAI API account? That, I'm less sure of.
I think we'll get to the point where running a local-first AI stack is obviously an awesome choice; I just don't think the hardware or models are there yet. Next-year's Medusa Halo, combined with another year of open source model improvements might be the inflection point.
That being said, for AI, HEDT is the obvious answer. Back in the day, it was much more affordable with my 9980XE only costing $2,000.
I just built a Threadripper 9980 system with 192GB of RAM and good lord it was expensive. I will actually benefit from it though and the company paid for it.
That being said, there is a glaring gap between "consumer" hardware meant for gaming and "workstation" hardware meant for real performance.
Have you looked into a 9960 Threadripper build? The CPU isn't TOO expensive, although the memory will be. But you'll get a significantly faster and better machine than something like a 9950X.
I also think besides the new Threadripper chips, there isn't much new out this year anyways to warrant upgrading.
Competitors to NVidia really need to figure things out, even for gaming with AI being used more I think a high end APU would be compelling with fast shared memory.
It seems like large, unchallenged organizations like Intel (or NASA or Google) collect all the top talent out of school. But changing budgets, changing business objectives, frozen product strategies make it difficult for emerging talent to really work on next-generation technology (those projects have already been assigned to mid-career people who "paid their dues").
Then someone like Apple Silicon with M-chip or SpaceX with Falcon-9 comes along and poaches the people most likely to work "hardcore" (not optimizing for work/life balance) while also giving the new product a high degree of risk tolerance and autonomy. Within a few years, the smaller upstart organization has opened up in un-closeable performance gap with behemoth incumbent.
Has anyone written about this pattern (beyond Innovator's Dilemma)? Does anyone have other good examples of this?
I gather it's very difficult and expensive to make a board that supports more channels of RAM, so that seems worth targeting at the platform level. Eight channel RAM using common RAM DIMMs would transform PCs for many tasks, however for now gamers are a main force and they don't really care about memory speed.
How do you sell your systems when their time comes?
- cheap ULV chips like N100, N150, N300
- ultrabook ULV chips (I hope Lunar Lake is not a fluke)
- workstation chips that aren't too powerful (mainstream Core CPUs)
- inexpensive GPUs (a surprising niche, but excruciatingly small)
AMD has been dominating them in all other submarkets.

Without a mainstream halo product, Intel has been forced to compete on price, which is not something they can afford. They have to make a product that leapfrogs either AMD or Nvidia and successfully (and meaningfully) iterate on it. The last time they tried something like that was in 2021 with the launch of Alder Lake, but AMD overtook them with 3D V-Cache in 2022.
I've never overclocked anything and I've never felt I've missed out in any way. I really can't imagine spending even one minute trying to squeeze 5% or whatnot tweaking voltages and dealing with plumbing and roaring fans. I want to use the machine, not hotrod it.
I would rather Intel et al. leave a few percent "on the table" and sell things that work, for years on end without failure and without a lot of care and feeding. Lately it looks like a crapshoot trying to identify components that don't kill themselves.
This is about sane, stable defaults. If you want the extra performance far beyond the CPU's sweet spot, it should be made explicit that you're forfeiting the stability headroom.
Well, that's the issue, isn't it? Both Intel and AMD (resp. their board partners) had issues in recent times stemming from the increasingly aggressive push to the limit for those last few %.
That sounds terrible.
For example, various brands of motherboards are / were known to basically blow up AMD CPUs when using AMP/XMP, with the root cause being that they jacked an uncore rail way up. Many people claimed they did this to improve stability, but overclockers know that that rail has a sweet spot for stability and they went way beyond it (so much so that the actual silicon failed and burned a hole in itself with some low-ish probability).
https://www.computerbase.de/artikel/prozessoren/amd-ryzen-79...
Actually, almost everything you wrote is untrue, and the commenter above already sent you some links.
7800X3D is the GOAT, very power efficient and cool.
And even if you could push it higher, they run very hot compared to other CPUs at the same power usage, due to a combination of AMD's very thick IHS, the compute chiplets being small and power-dense, and the 7000-series X3D cache sitting on top of the compute chiplet, unlike the 9000 series, which has it on the bottom.
The 9800X3D, limited in the same way, will be both mildly more power efficient thanks to its faster cores and run cooler because of the cache location. The only reason it's hotter is that it's allowed to use significantly more power, usually up to 150 W stock - to do that on a 7800X3D, you'd have to remove the IHS if you didn't want to see magic smoke.
I use Arch, btw ;)
https://www.theregister.com/2025/08/29/amd_ryzen_twice_fails...
A sufficient cooler with sufficient airflow is always needed.
The 13900k draws more than 200W initially and thermal throttles after a minute at most, even in an air conditioned room.
I don't think that thermal problems should be pushed to end user to this degree.
So if your CPU is drawing "more than 200W" you're pretty much at the limits of your cooler.
This causes other issues with the laptop, like severe thermal throttling in both the CPU and GPU.
A utility like ThrottleStop allows me to place maximums on power usage so I don't hit TjMax during regular use. That is around 65-70W for the CPU - which can burst to 200+W in its allowed "Performance" mode. Absolutely nuts.
But I agree this should not be a problem in the first place.