Posted by BrendanLong 9/3/2025

%CPU utilization is a lie(www.brendanlong.com)
437 points | 167 comments | page 3
tonymet 9/3/2025|
I like his empirical approach to getting at what the CPU percentage indicator really signifies. Software engineers and data analysts take discrete "data" measurements and statistics for granted.

"data" / "stats" are only a report, and that report is often incorrect.

rollcat 9/3/2025||
I'm surprised nobody has mentioned OpenBSD yet.

They've been advocating against SMT for a long while, citing security risks and inconsistent performance gains. I don't know which HW/CPU bug in the long series of rowhammer, meltdown, spectre, etc. prompted the action, but at some point they disabled SMT entirely in the default installation.

The core idea by itself is fine: keep the ALUs busy. Security-wise, maybe the present trade-off is acceptable, if you can instruct the scheduler to put threads from the same security domain on the same physical core. (How to tell when two threads are not a threat to each other is left as an exercise.)

saagarjha 9/3/2025|
The security argument might make sense, but OpenBSD is not really the place to take performance advice from.
rollcat 9/3/2025|||
My original point stands, and per TFA, performance gains from SMT are questionable for certain workloads. Whether OpenBSD prioritises absolute performance is beside the point - they benchmark against their own goals, not someone else's achievements.
whizzter 9/3/2025|||
Do people even use or mention OpenBSD out of performance concerns? We all know they prioritize security.
gbin 9/3/2025||
Yeah, and those tests don't even trigger memory or cache contention ...
smallstepforman 9/3/2025||
Read the kernel code to see how CPU utilisation is calculated. In essence: count the threads scheduled to execute and divide by the number of cores. Any latency (e.g. waiting for memory) is still counted as a busy core.
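A user-space sketch of roughly this calculation on Linux, sampling the aggregate counters in /proc/stat (this is not the kernel's actual code, just the same "not idle means busy" arithmetic that top uses):

```python
import time

def cpu_busy_fraction(interval=0.5):
    """Estimate overall CPU 'utilization' the way top does:
    time not spent idle, divided by total time. Note that a core
    stalled waiting on memory still counts as busy here."""
    def snapshot():
        with open("/proc/stat") as f:
            # First line: "cpu user nice system idle iowait irq softirq ..."
            fields = [int(x) for x in f.readline().split()[1:]]
        idle = fields[3] + fields[4]  # idle + iowait ticks
        return idle, sum(fields)

    idle1, total1 = snapshot()
    time.sleep(interval)
    idle2, total2 = snapshot()
    busy = (total2 - total1) - (idle2 - idle1)
    return busy / (total2 - total1)

print(f"{cpu_busy_fraction():.1%}")
```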
codedokode 9/3/2025||
A worse lie is memory usage reporting; I think every major OS understates and misreports it. On Linux, I wanted to know who was using memory and tried adding up the PSS values for every process, but I never got back the total memory usage. On Windows and macOS, judging by screenshots of their tools, they show unrealistically small values.

As for the article, the slowdown can be also caused by increased use of shared resources like caches, TLBs, branch predictors.

biggusdickus69 9/3/2025|
The memory usage question is interesting; the different kinds of shared memory are obviously hard to visualize, and just two values per process doesn't say enough.

Most users actually want a list of "what can I kill to make the computer faster", i.e. they want an oracle (no pun intended) that knows how fast the computer would be if various processes were killed.

HPsquared 9/3/2025||
GPU utilisation as reported in Task Manager also seems like quite a big lie; it bears little relation to watts / TDP.
aaa_2006 9/4/2025||
CPU utilization alone is misleading. Pair it with per core load average or runqueue length to see how threads are actually queuing. That view often reveals the real bottleneck, whether it is I/O, memory, or scheduling delays.
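A minimal version of the pairing suggested above, comparing the 1-minute load average to the CPU count on a Unix system (load average counts runnable and uninterruptible tasks, so it captures queuing that %CPU hides):

```python
import os

def queue_pressure():
    """Compare the 1-minute load average to available CPUs.
    A ratio above 1.0 means tasks are queuing for something --
    CPU, memory stalls, or uninterruptible I/O waits."""
    load1, _, _ = os.getloadavg()
    return load1 / os.cpu_count()

ratio = queue_pressure()
print(f"load per CPU: {ratio:.2f}", "(queuing)" if ratio > 1 else "(headroom)")
```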
steventhedev 9/3/2025||
%cpu is misleading at best, and should largely be considered harmful.

System load is well defined, matches user expectations, and covers several edge cases (auditd going crazy, broken CPU timers, etc).

pama 9/3/2025||
Wait until you encounter GPU utilization. You could have two programs reporting 100% utilization with well over a 100x performance difference between them. The name of these metrics creates natural assumptions that are just wrong. Luckily it is relatively easy to estimate the FLOP/s throughput for most GPU codes and then simply compare it to the theoretical peak performance of the hardware.
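The comparison is just arithmetic once you have the kernel's FLOP count and the datasheet peak; the matrix size, runtime, and peak figure below are made-up numbers for illustration:

```python
def flops_utilization(flops_performed, seconds, peak_flops_per_s):
    """Achieved throughput as a fraction of theoretical peak --
    a far more honest number than 'GPU utilization'."""
    return (flops_performed / seconds) / peak_flops_per_s

# Hypothetical example: multiplying two 4096x4096 matrices takes
# ~2 * 4096^3 FLOPs. Say it ran in 1.2 ms on a GPU with a claimed
# dense peak of 312 TFLOP/s.
flops = 2 * 4096**3
util = flops_utilization(flops, 1.2e-3, 312e12)
print(f"{util:.1%} of peak")  # prints "36.7% of peak"
```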
spindump8930 9/3/2025||
Don't forget that theoretical peak performance is (probably) half the performance listed on the nvidia datasheet because they used the "with sparsity" numbers! I've seen this bite folks who miss the * on the figure or aren't used to reading those spec sheets.
BrendanLong 9/3/2025|||
Yeah, the obvious thing with processors is to do something similar:

(1) Measure MIPS with perf (2) Compare that to max MIPS for your processor

Unfortunately, MIPS is too vague since the amount of work done depends on the instruction, and there's no good way to measure max MIPS for most processors. (╯°□°)╯︵ ┻━┻
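The measurement half can at least be scripted: `perf stat -x,` emits machine-readable CSV (on stderr) with the counter value in the first field. A sketch of the parsing, using a made-up sample line:

```python
def mips_from_perf_csv(csv_line, elapsed_seconds):
    """Parse one `perf stat -x, -e instructions` CSV line and
    convert the raw instruction count to MIPS."""
    count = int(csv_line.split(",")[0])
    return count / elapsed_seconds / 1e6

# Hypothetical output line from: perf stat -x, -e instructions -- ./work
sample = "48123456789,,instructions,1000000000,100.00,,"
print(f"{mips_from_perf_csv(sample, 1.0):.0f} MIPS")  # prints "48123 MIPS"
```

The max-MIPS half has no such script, which is the parent's point.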

saagarjha 9/3/2025||
If your workload is compute bound, of course. Sometimes you want to look at bandwidth instead.
pama 9/3/2025||
Of course. Lots of useful metrics exist to help tweak code performance without always needing to go into detailed profiler traces. GPU utilization is a particularly poor metric in helping much, except for making sure the code made it to the GPU somehow :-)
PathOfEclipse 9/3/2025|
I think it was always a mistake to pretend hyperthreading doubles your core count. I always assumed it was just laziness: the operating system treats a hyperthreaded core as two "virtual cores" and schedules it as two cores, so every other piece of tooling sees double the number of actual cores. There's no good reason I know of that a CPU utilization tool shouldn't use real cores when calculating percentages. But maybe that's hard to do given how the OS implements hyperthreading.
fluoridation 9/3/2025|
>There's no good reason I know of that a CPU utilization tool shouldn't use real cores when calculating percentages

On AMD, threads may as well be cores. If you take a Ryzen and disable SMT, you're basically halving its parallelism, at least for some tasks. On Intel you're just turning off an extra 10-20%.

PathOfEclipse 9/3/2025||
Can you provide some links for this? A quick web search turns this up at near the top from 2024:

https://www.techpowerup.com/review/amd-ryzen-9-9700x-perform...

The benchmarks show a 10% drop in "application" performance when SMT is disabled, but an overall 1-3% increase in performance for games.

From a hardware perspective, I can't imagine how it could be physically possible to double performance by enabling SMT.

fluoridation 9/3/2025||
I don't. It's based on my own testing - not by disabling SMT, but by running either <core_count> or <thread_count> parallel threads. It was my own code, so it's possible code that uses SIMD more heavily would see a less significant speed-up. It's also possible I just measured wrong; running Cargo on a directory with -j16 and -j32 takes 58 and 48 seconds respectively.

>From a hardware perspective, I can't imagine how it could be physically possible to double performance by enabling SMT.

It depends on which parts of the processor your code uses. SMT works by duplicating some but not all the components of each core, so a single core can work on multiple independent uops simultaneously. I don't know the specifics, but I can imagine ALU-type code (jumps, calls, movs, etc.) benefits more from SMT than very math-heavy code. That would explain why rustc saw a greater speedup than Cinebench, as compiler code is very twisty with not a lot of math.
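The experiment above can be reproduced with a sketch like this: fix the total amount of work (as the -j16 vs -j32 Cargo runs do) and vary only the worker count. It assumes 2-way SMT, i.e. that physical cores are half the logical CPU count - worth checking against lscpu:

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor

def busy_work(n):
    # Integer-heavy, branchy loop -- the "twisty" kind of work
    # that plausibly benefits from SMT more than SIMD-heavy math.
    acc = 0
    for i in range(n):
        acc = (acc * 31 + i) % 1_000_003
    return acc

def timed_run(workers, tasks=16, iters=1_000_000):
    # Total work (tasks * iters) is fixed; only parallelism varies.
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(busy_work, [iters] * tasks))
    return time.perf_counter() - start

if __name__ == "__main__":
    logical = os.cpu_count()          # logical CPUs (with SMT)
    physical = max(1, logical // 2)   # assumes 2-way SMT
    print(f"{physical} workers: {timed_run(physical):.2f}s")
    print(f"{logical} workers: {timed_run(logical):.2f}s")
```

If the second run is meaningfully faster, SMT is paying off for this workload; near-equal times would support the "marginal gains" view.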
