Posted by steeve 6 days ago

Zml-smi: universal monitoring tool for GPUs, TPUs and NPUs (zml.ai)
77 points | 11 comments
rdyro 2 days ago
Looks cool!

nvtop can actually support TPUs too via https://github.com/rdyro/libtpuinfo/ https://github.com/Syllo/nvtop/blob/76890233d759199f50ad3bdb...

serialx 1 day ago
Look into all-smi (https://github.com/lablup/all-smi). It supports every GPU imaginable, including Apple Silicon, and many AI accelerator cards.
mrflop 6 days ago
Renaming fopen64 to intercept library calls feels like a brittle hack masquerading as "sandboxing." Why not just upstream this hardware support to nvtop instead of fragmenting the ecosystem?
steeve 6 days ago
Sadly, sandboxing is something that can't be upstreamed. This way, sandboxing is kept in zml instead of patching Mesa.

As for nvtop, it's a great program, but it was missing a few features we needed (such as sandboxing).

pstuart 2 days ago
It looks cool, and I was excited to get monitoring for the NPU on my Ryzen AI 395+. Unfortunately, it does not show up. NPU support in Linux really seems to be an afterthought.
steeve 2 days ago
Weird, because we tried it. It doesn’t show anything?

We use amdsmi to get metrics. I’ll investigate.

marwanet 2 days ago
If this logic were pushed into nvtop, wouldn't the codebase become unmaintainable? Each vendor's interception method is going to be different.
imcritic 1 day ago
Is it capable of exposing metrics in Prometheus format?
steeve 1 day ago
consider it done
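
For context on the question above: the Prometheus text exposition format is plain text, with `# HELP` and `# TYPE` comment lines followed by one sample per line. Below is a minimal sketch of rendering accelerator readings that way. The metric names, label names, sample values, and the `render_prometheus` helper are all hypothetical and chosen for illustration; nothing here reflects zml-smi's actual output or API.

```python
def escape_label_value(value: str) -> str:
    # Label values must escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_prometheus(metrics):
    """Render metrics as Prometheus text exposition format.

    `metrics` is a list of (name, help_text, metric_type, samples) tuples,
    where `samples` is a list of (labels_dict, value) pairs.
    Help text is assumed to contain no backslashes or newlines.
    """
    lines = []
    for name, help_text, metric_type, samples in metrics:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        for labels, value in samples:
            # Sort labels so the output is deterministic.
            label_str = ",".join(
                f'{key}="{escape_label_value(str(val))}"'
                for key, val in sorted(labels.items())
            )
            selector = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{name}{selector} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical readings, for illustration only.
EXAMPLE_METRICS = [
    (
        "accelerator_utilization_ratio",
        "Fraction of time the device was busy.",
        "gauge",
        [({"device": "0", "vendor": "amd"}, 0.42)],
    ),
    (
        "accelerator_memory_used_bytes",
        "Device memory currently in use.",
        "gauge",
        [({"device": "0", "vendor": "amd"}, 1024)],
    ),
]

if __name__ == "__main__":
    print(render_prometheus(EXAMPLE_METRICS), end="")
```

A real exporter would serve this text over HTTP (conventionally at `/metrics`) for Prometheus to scrape; the sketch only covers the formatting step.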
synergy20 1 day ago
It would be nice to have CPU usage added so I have everything in one place.

Currently I use btop, which shows basic GPU load along with CPU, network, etc.

152334H 1 day ago
"NPU" seems to refer to Trainium only?