
Posted by mindcrime 21 hours ago

Taking on CUDA with ROCm: 'One Step After Another'(www.eetimes.com)
248 points | 185 comments
lrvick 20 hours ago|
Just spent the last week or so porting TheRock to stagex in an effort to get ROCm built with a native musl/mimalloc toolchain and get it deterministic for high security/privacy workloads that cannot trust binaries only built with a single compiler.

It has been a bit of a nightmare and had to package like 30+ deps and their heavily customized LLVM, but got the runtime to build this morning finally.

Things are looking bright for high security workloads on AMD hardware due to them working fully in the open however much of a mess it may be.

WhyNotHugo 17 hours ago||
I also attempted to package ROCM on musl. Specifically, packaging it for Alpine Linux.

It truly is a nightmare to build the whole thing. I got past the custom LLVM fork and a dozen other packages, but eventually decided it had been too much of a time sink.

I’m using llama.cpp with its Vulkan support and it’s good enough for my uses. Vulkan is already there and just works. It’s probably on your host too, since so many other things rely on it anyway.

That said, I’d be curious to look at your build recipes. Maybe it can help power through the last bits of the Alpine port.
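For anyone wanting to try the Vulkan route described above, a rough sketch of the build. The `GGML_VULKAN` flag is taken from current llama.cpp docs, and the Alpine package names are assumptions; adjust for your distro:

```shell
# Build llama.cpp with the Vulkan backend instead of ROCm.
# Package names below are assumed Alpine names; adjust per distro.
apk add cmake build-base git vulkan-loader-dev glslang shaderc
git clone https://github.com/ggml-org/llama.cpp
cmake -S llama.cpp -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
```

Older checkouts used `LLAMA_VULKAN=ON` before the GGML rename, so check the flag against the tree you actually have.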

sigmoid10 8 hours ago|||
Interesting how Vulkan and ROCm are roughly the same age (~9 years), but one is incredibly more stable (and sometimes even more performant) for AI use cases as a side gig, while the other has AI as its primary raison d'être. Tells you a lot about the development teams behind them.
Hasslequest 7 hours ago||
[dead]
lrvick 16 hours ago||||
Keep an eye out for a stable rocm PR to stagex in the next week or so if all goes well.
seemaze 4 hours ago||||
I realize it does not address the OP's security concerns, but I'm having success running rocm containers[0] on Alpine Linux specifically for llama.cpp. I also got vLLM to run in a rocm container, but I didn't have time to diagnose perf problems, and llama.cpp is working well for my needs.

[0] https://github.com/kyuz0/amd-strix-halo-toolboxes

icedchai 5 hours ago|||
I've built llama.cpp against both Vulkan and ROCm on a Strix Halo dev box. I agree Vulkan is good enough, at least for my hobbyist purposes. ROCm has improved but I would say not worth the administrative overhead.
zby 7 hours ago|||
It is sad to observe this time and time again. Last year I had the idea to run a shareholder campaign to change this; I suspended it after last year's AMD promises, but maybe this really needs to be done: https://unlockgpu.com/action-plan/
jauntywundrkind 19 hours ago|||
https://github.com/ROCm/TheRock/issues/3477 makes me quite sad for a variety of reasons. It shouldn't be like this. This work should be usable.
lrvick 19 hours ago|||
Oh, I fully abandoned TheRock in my stagex ROCm build stack. It is not worth salvaging, but it was an incredibly useful reference while rewriting it.
MrDrMcCoy 13 hours ago||||
So much about this confuses me. What do Kitty and ncurses have to do with ROCm? Why is this being built with GCC instead of clang? Why even bother building it yourself when the tarballs are so good and easy to work with?
jeroenhd 11 hours ago|||
The analysis was AI generated. This was Claude brute-forcing itself through building a library.
CamouflagedKiwi 12 hours ago|||
On the last one: OP said they were trying to get it working for a musl toolchain, so the tarballs are probably not useful to them (I assume they're built for glibc).

Agreed on the others though. Why's it even installing ncurses, surely that's just expected to be on the system?

fwip 4 hours ago||
> Hey @rektide, @apaz-cli, we bundle all sysdeps to allow us to ship self-contained packages that users can e.g. pip install. That's our basic default and it allows us to tightly control what we ship. For building, it should generally be possible to build without the bundled sysdeps, in which case it is up to the user to make sure all dependencies are properly installed. As this is not our default we seem to have missed some corner cases, and there is more work needed to get back to allowing builds with sysdeps disabled. I started #3538 but it will need more work in some other components to fully get you what you're asking for with regards to system dependencies. Please note that we do not test with the unbundled, system-provided dependencies, but of course we want to give the community the freedom to build it that way.
jauntywundrkind 2 hours ago||
I did get past that issue with ncurses & kitty! Thanks for some work there!

There is, however, quite a large list of other issues that have been blocking builds on systems with somewhat more modern toolchains/OSes than whatever the target is here (Ubuntu 24.04, I suspect). I really want to be able to engage directly with TheRock and compile and run it natively on Ubuntu 25.04, and now Ubuntu 26.04 too. The people eager to use the amazing leading-edge capabilities TheRock offers will, I suspect, also be more bleeding-edge users with more up-to-date OS choices. They are currently very blocked.

I know it's not the intent at all. There's so much good work here that seems so close & so well considered, an epic work spanning so many libraries and drivers. But this mega thread of issues gives me such vibes of the bad awful no good Linux4Tegra, where it's really one bespoke special Linux that has to be used, that nothing else works. In this case you can download the tgz and it will probably work on your system, but that means you don't have any chance to improve or iterate or contribute to TheRock, that it's a consume only relationship, and that feels bad and is a dangerous spot to be in, not having usable source.

I'd really, really like to see AMD have CI test matrices that we can see, showing the state of the build on a variety of Linux OSes. That would provide the discipline and trust that keeps situations like this one from arising. The status quo obviously cannot hold forever; Ubuntu 24.04 is not acceptable as a build machine in perpetuity, so these problems eventually have to be tackled. But what really needs to happen is a commitment to not making the build work on one blessed image only. This situation should not have developed; for TheRock to be accepted and useful, the build needs to work on a variety of systems. We need fixes right now to make that true, and AMD needs to show that their commitment to that goal is real, ideally by running a build matrix CI where we can see that it does compile.

hackernows_test 18 hours ago|||
[flagged]
999900000999 17 hours ago|||
Wait?

You don't trust Nvidia because the drivers are closed source?

I think Nvidia's pledged to work on the open source drivers to bring them closer to the proprietary ones.

I'm hoping Intel can catch up. At 32GB of VRAM for around $1000, it's very accessible.

jeroenhd 11 hours ago|||
Nvidia is opening their source code because they moved most of their source code to the binary blob they're loading. That's why they never made an open source Nvidia driver for Pascal or earlier, where the hardware wasn't set up to use their giant binary blobs.

It's like running Windows in a VM and calling it an open source Windows system. The bootstrapping code is all open, but the code that's actually being executed is hidden away.

Intel has the same problem AMD has: everything is written for CUDA or other brand-specific APIs. Everything needs wrappers and workarounds to run before you can even start to compare performance.

Asmod4n 9 hours ago||
In the Python ecosystem you can just replace CUDA with DirectML in at least one popular framework and it just runs. You are limited to Windows then, though.
lrvick 16 hours ago||||
Nvidia has been pledging that for years. If it ever actually happens, I am here for it.
shaklee3 15 hours ago||
It happened 2 years ago:

https://developer.nvidia.com/blog/nvidia-transitions-fully-t...

cyberax 13 hours ago||
Their userspace is still closed. ROCm is fully open.
pjmlp 10 hours ago||
Provided you happen to have one of those few supported GPUs.

Thus being open source isn't of much help without it.

cmxch 16 hours ago|||
> Intel

For some workloads, the Arc Pro B70 actually does reasonably well when cached.

With some reasonable bring-up, it also seems to be more usable versus the 32gb R9700.

MrDrMcCoy 13 hours ago||
I have both of those cards. Llama.cpp with SYCL has thus far refused to work for me, and Vulkan is pretty slow. Hoping that some fixes come down the pipe for SYCL, because I have plenty of power for local models (on paper).
cmxch 4 hours ago||
Hmm.

I had to rebuild llama.cpp from source with the SYCL and CPU specific backends.

Started with a barebones Ubuntu Server 24 LTS install, used the HWE kernel, pulled in the Intel dependencies for hardware support/oneapi/libze, then built llama.cpp with the Intel compiler (icx?) for the SYCL and NATIVE backends (CPU specific support).

In short, built it based mostly on the Intel instructions.
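A condensed sketch of that recipe, with paths and flags assumed from Intel's published SYCL instructions for llama.cpp rather than re-verified here:

```shell
# Load the oneAPI environment (provides icx/icpx and Level Zero libs).
source /opt/intel/oneapi/setvars.sh
# Configure llama.cpp with the SYCL backend, using Intel's compilers.
cmake -S llama.cpp -B build -DGGML_SYCL=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j
```

The CPU-native backend is selected automatically when building this way; the main failure mode is forgetting to source `setvars.sh` before configuring.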

salawat 15 hours ago||
>Just spent the last week or so porting TheRock to stagex in an effort to get ROCm built with a native musl/mimalloc toolchain and get it deterministic for high security/privacy workloads that cannot trust binaries only built with a single compiler.

...I have a feeling you might not be at liberty to answer, but... Wat? The hell kind of "I must apparently resist Reflections on Trusting Trust" kind of workloads are you working on?

And what do you mean "binaries only built using a single compiler"? Like, how would that even work? Compile the .o's with compiler-specific suffixes, then do a tortured linker invocation to mix different .o's into a combined library/ELF? Are we talking mixing two different C compilers? Same compiler, two different bootstraps? Regular/cross mix?

I'm sorry if I'm pushing for too much detail, but as someone who's actually bootstrapped compilers/userspaces from source, your use case intrigues me just by the phrasing.

lrvick 8 hours ago||
You can get a sense of what my team and I do from https://distrust.co/threatmodel.html

For information on stagex and how we do signed deterministic compiles across independently operated hardware see https://stagex.tools

Stagex is used by governments, fintech, blockchains, AI companies, and critical infrastructure all over the internet, so our threat model must assume at least one computer or maintainer is compromised at all times and not trust any third party compiled code in the entire supply chain.

androiddrew 7 hours ago||
I have been trying since February to get someone at AMD to ship tuned Tensile kernels in the rocm-libs for the gfx1201. They are used by Ollama, but no one on the developer Discord knows who is responsible for that. It has been pretty frustrating, and it shows that AMD has an organizational problem to overcome in addition to all the technical things they want ROCm to do.
FuriouslyAdrift 2 hours ago|
Have you filed anything on GitHub? https://github.com/zichguan-amd seems to be one of the main people for that...

or https://github.com/harkgill-amd

0xbadcafebee 18 hours ago||
AMD has years of catching up to do with ROCm just to get their devices to work well. They don't support all their own graphics cards that can do AI, and when it is supported, it's buggy. The AMDGPU graphics driver for Linux has had continued instability since 6.6. I don't understand why they can't hire better software engineers.
xethos 16 hours ago||
> I don't understand why they can't hire better software engineers.

Beyond the fact they're competing with the most valuable companies in the world for talent while being less than a decade past "Bet the company"-level financial distress?

shakow 9 hours ago||
I don't think that you need top-of-the-line, $1M/yr TC people to revamp a build system.
mathisfun123 3 hours ago||
lol the irony is that the person who started revamping the build system is a $1M/yr TC person.
klooney 41 minutes ago||
Sometimes the only way you can get basic engineering practices done like "have tests", "have a build system", "run the tests and the builds automatically", "insist that the above work" without management freaking out is to pay someone a lot of money.
onlyrealcuzzo 17 hours ago|||
Because they aren't willing to pay for them?
prewett 4 hours ago|||
I figure it must be a cultural problem. ATI was known for buggy graphics drivers back in The Day, if I remember correctly. I certainly remember not buying their cards for that reason. Apparently after AMD bought them, they have been unable to change the culture (or didn't care). The state of ATI drivers has always been about the same.
philipallstar 3 hours ago||
I don't think they invest nearly as large a percentage of their profits in software compared to Nvidia.
StillBored 2 hours ago||
I don't even think that is the problem. It seems more an engineering-culture one, which has sadly infected most of the software industry at this point. Instead of incremental improvement, it seems the old ATI drivers (and seemingly much of the recent history) are just rewrites, rather than having a replaceable low-level core and a reasonable amount of legacy code that just gets forward-ported to newer HW architectures. So they release the hardware and it's basically obsolete before the driver stack ever stabilizes sufficiently that any single driver can run a wide range of games well.
oofbey 16 hours ago|||
Years. They neglected ROCm for soooo long. I have friends who worked there 5+ years ago who tried desperately to convince execs to invest more in ROCm and failed. You had to have your head stuck pretty deep in the sand back then to not see that AI was becoming an important workload.

I would love AMD to be competitive. The entire industry would be better off if NVIDIA was less dominant. But AMD did this to themselves. One hundred percent.

tux1968 16 hours ago|||
It would be very helpful to deeply understand the truth behind this management failing. The actual players involved, and their thinking. Was it truly a blind spot? Or was it mistaken priorities? I mean, this situation has been so obvious and tragic, that I can't help feeling like there is some unknown story-behind-the-story. We'll probably never really know, but if we could, I wouldn't spend quite as much time wearing a tinfoil hat.
throwawayrgb 16 hours ago|||
if you asked AMD execs they'd probably say they never had the money to build out a software team like NVIDIA's. that might only be part of the answer. the rest would be things like lack of vision, "can't turn a tanker on a dime", etc.
pjc50 9 hours ago|||
Has to be lack of vision. I refuse to believe it's impossible to _do_, but it sounds like it's impossible to _specify_ within AMD. Like they're genuinely incapable of working out what the solution might look like.
KeplerBoy 11 hours ago||||
I don't buy that story. NVIDIA wasn't that huge of a company when they built CUDA, they weren't huge when the first GPT model was trained with it.
Alupis 4 hours ago||
CUDA was built during the time AMD was focusing every resource on becoming competitive in the CPU market again. Today they dominate the CPU industry - but CUDA was first to market and therefore there's a ton of inertia behind it. Even if ROCm gets very good, it'll still struggle to overcome the vast amount of support (read "moat") CUDA enjoys.
KeplerBoy 2 hours ago||
True. After all, Nvidia didn't build TensorFlow or PyTorch. That stuff was bound to be built on the first somewhat viable platform. ROCm is probably far ahead of where CUDA was back then, but the goal has moved.
imtringued 8 hours ago||||
Nobody is asking AMD to rebuild the entire NVidia ecosystem. Most people just want to run GPGPU code or ML code on AMD GPUs without the entire computer crashing on them.
throwawayrgb 7 hours ago||
yeah it's a very frustrating situation.

according to public information NVIDIA started working on CUDA in 2004, that was before AMD made the ATI acquisition.

my suspicion is that back then ATI and NVIDIA had very different orientations. neither AMD nor ATI were ever really that serious about software. so in that sense i guess it was a match made in heaven.

so you have a cultural problem, which is bad enough, then you add in the lean years AMD spent in survival mode. forget growing a software team, they had to hang on with fewer people just to get through.

now they're playing catch-up in a cutthroat market that's moving at light speed compared to 20 years ago.

we're talking about a major fumble here, so it's easy to lose context and forget that things were a little more complex than they appeared.

aurareturn 9 hours ago|||
They were doing stock buybacks before the AI boom.
oofbey 16 hours ago|||
My guess is it’s just incompetence. Imagine you’re in charge of ROCm and your boss asks you how it’s going. Do you say good things about your team and progress? Do you highlight the successes and say how you can do all the major things CUDA can? I think many people would. Or do you say to your boss “the project I’m in charge of is a total disaster and we are a joke in the industry”? That’s a hard thing to say.
Shitty-kitty 15 hours ago|||
A 10-year lead can't be closed overnight, but Intel had an even larger lead, and look how the mighty have fallen.
Alupis 4 hours ago|||
I'd argue Intel fell in large part because of Intel's own complacency and incompetence. If Intel had taken AMD seriously, they'd probably still be a serious competitor today.
pjmlp 14 hours ago|||
Intel was never famous for good GPUs, and they are basically the only ones still trying to make something out of OpenCL, with most of the tooling going beyond what Khronos offers.

oneAPI is much more than a plain old SYCL distribution, and still.

Shitty-kitty 14 hours ago|||
I meant their CPU supremacy. ;)
pjmlp 13 hours ago||
That still reigns in PCs and servers.

People like to talk about Apple CPUs, but keep forgetting they don't sell chips, and the overall desktop market is around 10% worldwide.

ARM is mostly about phones and tablets, good luck finally getting those Windows ARM or GNU/Linux desktop cases or laptops.

Servers depend pretty much on which hyperscaler we are talking about.

RISC-V is still to be seen, on the desktop, laptops and servers.

Where AMD is doing great is game consoles.

cm2187 10 hours ago||
Intel still has 60% server market share but it is in free fall https://wccftech.com/intel-server-client-cpu-market-share-hu...
wlesieutre 10 hours ago|||
Also on pace to drop below AMD on the Steam hardware survey this year
pjmlp 9 hours ago||
The same Steam hardware survey whose quality is questioned about when we talk about Linux adoption numbers?
pjmlp 10 hours ago|||
Interesting information, that leaves desktop and laptop markets, where AMD still has adoption issues especially on laptops.
wlesieutre 10 hours ago||
Between the MacBook Neo on the low end and Strix Halo on the high end, Intel is in for some tougher laptop competition.
pjmlp 9 hours ago||
Outside US, and countries with similar salary levels, people don't earn enough for Apple tax served with 8 GB.
throwaway173738 7 hours ago|||
Try not to rely on Intel too much. They cut products with promise all the time because they miss quarterly numbers.
throwawayrgb 15 hours ago|||
> My guess is it’s just incompetence.

maybe on some level but not that level you're describing. pretty much everyone at AMD understands the situation, and has for a while.

jijijijij 7 hours ago|||
Not even AI. My 5-year-old APU is completely neglected by AMD's ROCm efforts, so I also can't use it in Blender! I feel quite betrayed, to be honest. How is such a basic thing not possible, even years later?

Look where Apple Silicon managed to go in the same time frame...

Because of this, I would not consider another AMD GPU for a long time. Gaming isn't the only thing I want my GPU doing. How do they keep screwing this up? Why isn't it their top priority?

jrm4 3 hours ago||
This is the question I came to ask. Given that being "the other big GPU manufacturer" today has got to be a license for printing infinite money, what is going ON? Almost feels like there has to be something deeper than mere incompetence?
StillBored 2 hours ago||
I just wish they would make another pass at cleaning up the stack. It should be easy to `git clone --recurse-submodules rocm` followed by a configure/make that both prints out missing dependencies and configures without them, along with choices for 'build the world' vs just build some lower level opencl/HIP/SPIRV tooling without all the libraries/etc on top in a clear way.

Right now the entire source base is literally "throw a bunch of crap into the rocm brand and hope it builds together" vs some overarching architecture. Presumably the entire spend is also tied to "whatever big co's evaluation needs this week" when it comes to developing with it.

AshamedCaptain 8 hours ago||
> Last year, AMD ran a GitHub poll for ROCm complaints and received more than 1,000 responses. Many were around supporting older hardware, which is today supported either by AMD or by the community, and one year on, all 1,000 complaints have been addressed, Elangovan said.

Must have been by waiting for each of the 1000 complainers to die of old age, because I do not know what old hardware they have added support for.

throwaway173738 7 hours ago|
I guess it counts if you can find the information from one of the many conflicting wikis out there and then figure out how to hack support for your card into the specific version of ROCm.
grokcodec 8 hours ago||
The day ROCm supports EVERY AMD card on release, just like CUDA does, is the day I will actually believe this marketing hype. They really dropped the ball here, also when they abandoned recently released cards (at the time) like the 400 series. Hopefully management gets their heads out of their butts and invests more in the software stack.
greenail 3 hours ago|
I think GB10 is a bit of a counterpoint. There are tons of features that are not implemented for GB10, which was released 8/2025. It isn't all roses on the CUDA side.
mellosouls 1 hour ago||
Related from Jan 2025:

ROCm Device Support Wishlist (205 points, 107 comments)

https://news.ycombinator.com/item?id=42772170

mstaoru 13 hours ago||
I'm team "taking on CUDA with OpenVINO" (and SYCL*). Intel seems to have really upped their game on iGPU and dGPU lately, with sane prices and fairly good software support and APIs.

I'm not talking gaming CUDA, but CV and data science workloads seem to scale well on Arc and run well at the edge on Core Ultra 2/3.

rdevilla 18 hours ago||
ROCm is not supported on some very common consumer GPUs, e.g. the RX 580. Vulkan backends work just fine.
chao- 17 hours ago||
I purchased my RX 580 in early 2018 and used it through late 2024.

I am critical of AMD for not fully supporting all GPUs based on RDNA1 and RDNA2. While more backwards compatibility is always better for the consumer, the RX 580 was a lightly updated RX 480, which came out in 2016. Yes, ROCm technically came out in 2016 as well, but I don't mind acknowledging that supporting the GCN architecture is a different beast from the RDNA/CDNA generations that followed (Vega feels like it is off on an island of its own, and I don't even know what to say about it).

As cool as it would be to repurpose my RX 580, I am not at all surprised that GCN GPUs are not supported for new library versions in 2026.

I would be MUCH more annoyed if I had any RDNA1 GPU, or one of the poorly-supported RDNA2 GPUs.

daemonologist 16 hours ago|||
ROCm usually only supports two generations of consumer GPUs, and sometimes the latest generation is slow to gain support. Currently only RDNA 3 and RDNA 4 (RX 7000 and 9000) are supported: https://rocm.docs.amd.com/projects/install-on-linux/en/lates...

It's not ideal. CUDA for comparison still supports Turing (two years older than RDNA 2) and if you drop down one version to CUDA 12 it has some support for Maxwell (~2014).

0xbadcafebee 13 hours ago|||
Worse, RDNA3 and RDNA4 aren't fully supported, and probably won't be, as they only focus on chips that make them more money. If we didn't have Vulkan, every nerd in the world would demand either a Mac or an Intel with Nvidia chip. AMD keeps leaving money on the table.
lpcvoid 12 hours ago||
Up until recently they didn't even support their cash cow, the Ryzen AI MAX+ 395, properly. Idk about the argument that they only care about certain chips.
terribleperson 14 hours ago||||
It's pretty crazy that a 6900XT/6950XT aren't supported.
bavell 7 hours ago||
Eh, YMMV. I was using rocm for minor AI things as far back as 2023 on an "unsupported" 6750XT [0]. Even trained some LoRAs. Mostly the issues were how many libs were cuda only.

[0] https://news.ycombinator.com/item?id=43207015

kombine 12 hours ago||||
I have an RX 6700XT, damn. AMD is shooting themselves in the foot.
bavell 7 hours ago||
Try it before you give up, I got plenty of AI stuff working on my 6750XT years ago.
imtringued 8 hours ago|||
If you are on an unsupported AMD GPU, why would you ever consider switching to a newer AMD GPU, considering you know that it will reach the same sorry state as your current GPU?

Especially when as you say, the latest generation is slow to gain support, while they are simultaneously dropping old generations, leaving you with a 1-2 year window of support.

maxloh 17 hours ago|||
I have the same experience with my RX 5700. The supported ROCm version is too old to get Ollama running.

The Vulkan backend of Ollama works fine for me, but it took a year or two for them to officially support it.

pjmlp 10 hours ago|||
Vulkan backends work just fine, provided one wants to be constrained by the Vulkan developer experience, without first-class support for C++, Fortran, and Python JIT kernels, IDE integration, graphical debugging, or libraries.
BobbyTables2 17 hours ago|||
Did it used to be different?

A few years ago I thought I had used the ROCm drivers/libraries with hashcat on a RX580

Now it’s obsolete ?

hurricanepootis 17 hours ago||
RX 580 is a GCN 4 GPU. I'm pretty sure the bare minimum for ROCm is GCN 5 (Vega) and up.
daemonologist 16 hours ago||
Among consumer cards, latest ROCm supports only RDNA 3 and RDNA 4 (RX 7000 and RX 9000 series). Most stuff will run on a slightly older version for now, so you can get away with RDNA 2 (6000 series).
adev_ 10 hours ago|
A little feedback to AMD executives about the current status of ROCm here:

(1) - Supporting only server-grade hardware and ignoring laptop/consumer-grade GPUs/APUs for ROCm was a terrible strategic mistake.

A lot of developers experiment first and foremost on their personal laptops and scale to expensive, professional-grade hardware later. In addition, some developers simply do not have the money to buy server-grade hardware.

By locking ROCm only to server-grade GPUs, you restrict the potential list of contributors to your OSS ROCm ecosystem to a few large AI users and a few HPC centers... meaning virtually nobody.

A much more sensible strategy would be to provide degraded performance for ROCm on top of consumer GPUs, and this is exactly what Nvidia is doing with CUDA.

This is changing, but you need to send a clear message here: EVERY newly released device should be properly supported by ROCm.

(2) - Supporting only the last two generations of architectures is not what customers want to see.

https://rocm.docs.amd.com/projects/install-on-linux/en/docs-...

People with existing GPU codebases invest a significant amount of effort to support ROCm.

Telling them two years later "Sorry, you are out of support now!" when the ecosystem is still unstable is unacceptable.

CUDA excels at backward compatibility. The fact that you ignore it entirely plays against you.

(3) - Focusing exclusively on Triton and making HIP a second-class citizen is nonsensical.

AI might get all the buzz and the money right now, we get it.

It might look sensible on the surface to focus on Python-based, AI-focused tools like Triton, and supporting them is definitely necessary.

But there is a tremendous amount of code relying on C++ and C to run on GPUs (HPC, simulation, scientific, imaging, ...) and that will remain there for decades to come.

Ignoring that is losing, again, customers to CUDA.

It is pretty ironic to see such a move considering that AMD GPUs currently tend to be highly competitive on FP64, meaning good for these kinds of applications. You are throwing away one of your own competitive advantages...

(4) - Last but not least: Please focus a bit on the packaging of your software solution.

There have been complaints about this for the last 5 years and not much has changed.

Working with distribution packagers and integrating with them does not cost much... This would currently give you a competitive advantage over Nvidia.

pjmlp 10 hours ago||
Additional points, CUDA is polyglot, and some people do care about writing their kernels in something else other than C++, C or Fortran, without going through code generation.

NVidia is acknowledging Python adoption, with cuTile and MLIR support for Python, allowing the same flexibility as C++ and using Python directly even for kernels.

They seem to be supportive of having similar capabilities for Julia as well.

The IDE and graphical debuggers integration, the libraries ecosystem, which now are also having Python variants.

As someone that only follows GPGPU on the side, due to my interests in graphics programming, it is hard to understand how AMD and Intel keep failing to understand what CUDA, the whole ecosystem, is actually about.

Like, just take the schedule of a random GTC conference: how much of it can I reproduce on oneAPI or ROCm as of today?

shawnz 7 hours ago|||
> Supporting only Server grade hardware and ignoring laptop/consumer grade GPU/APU for ROCm was a terrible strategical mistake. A lot of developers experiments first and foremost on their personal laptop first and scale on expensive, professional grade hardware later.

NVIDIA is making the same mistake today by deprioritizing the release of consumer-grade GPUs with high VRAM in favour of focusing on server markets.

They already have a huge moat, so it's not as crippling for them to do so, but I think it presents an interesting opportunity for AMD to pick up the slack.

Symmetry 8 hours ago||
There actually isn't any locking involved. I can take a new version of ROCm and just use it with my 7900 XT, despite my card not being officially supported, and it works. It's just that AMD doesn't feel they need to invest the resources to run their test suite against my card and bless it as officially supported. And maybe if I was doing something other than running PyTorch I'd run into bugs. But it's just laziness, not malice.
hmry 7 hours ago|||
I used to be able to run ROCm on my officially unsupported 7840U. Bought the laptop assuming it would continue to work.

Then in a random Linux kernel update they changed the GPU driver. Trying to run ROCm now hard-crashed the GPU requiring a restart. People in the community figured out which patch introduced the problem, but years later... Still no fix or revert. You know, because it's officially unsupported.

So "Just use HSA_OVERRIDE_GFX_VERSION" is not a solution. You may buy hardware based on that today, and be left holding the bag tomorrow.
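For anyone who still wants to try it anyway, the workaround mentioned above usually looks like this. The override value here is an assumption for an RDNA3 part; check `rocminfo` for your card's actual gfx target first:

```shell
# Tell the ROCm runtime to treat the GPU as the supported gfx1100 target.
# 11.0.0 maps to gfx1100 (RDNA3); RDNA2 cards typically use 10.3.0 instead.
export HSA_OVERRIDE_GFX_VERSION=11.0.0
# Then launch the ROCm workload as usual, e.g. a PyTorch script.
```

As the parent notes, this only papers over the missing official support; a kernel or ROCm update can still break it at any time.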

machomaster 7 hours ago|||
This is a very unprofessional attitude. There is no space for laziness in business.