Posted by JonChesterfield 7 days ago
OpenAI, OTOH, is big enough that the vendor lock-in is actually hurting them, and their massive deal with AMD may finally move the needle for AMD and improve the ecosystem enough to make AMD a smooth experience.
Google Cloud does have a lot of NVidia, but that’s for their regular cloud customers, not internal stuff.
For example, DeepSeek R1 was released optimized for running on Nvidia HW and needed some adaptation to run as well on ROCm. This is the same reason hand-tuned ROCm code beats generic code merely compiled to ROCm: the DeepSeek team, for their own purposes, built R1 to fit Nvidia's way of doing things (because Nvidia is market-dominant). Once they released it, someone like Elio or AMD had to do the work of adapting the code to run best on ROCm.
More established players who aren't out-of-left-field surprises like DeepSeek, e.g. Meta with its Llama series, mostly coordinate with AMD ahead of release day, but I suspect AMD still has to pay for that engineering work itself, while Meta does the Nvidia-side work on its own. That simple fact, that every researcher makes their stuff work on CUDA themselves while AMD or someone like Elio has to do the work of porting it to be just as performant on ROCm, is what keeps people in the CUDA universe.
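To make that concrete, here is a minimal sketch (assuming a PyTorch-style stack; nothing here is from the DeepSeek or Llama codebases) of where the porting cost actually sits: the high-level framework code mostly ports for free, and the expensive part is everything tuned underneath it.

    import torch

    # Plain framework code is largely portable: on a ROCm build of PyTorch
    # the "cuda" device string maps to HIP, so this runs unchanged on AMD.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(4096, 4096, device=device, dtype=torch.float16)
    y = x @ x  # dispatched to cuBLAS on Nvidia, rocBLAS/hipBLAS on AMD

    # The porting cost described above lives below this level: custom CUDA
    # kernels (fused attention, MoE dispatch, FP8 paths) written around
    # Nvidia's warp size, shared-memory layout and tensor-core shapes need
    # separate engineering to reach the same performance on ROCm.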
The article frames this as "CUDA translation bad, AMD-native good" but misses the strategic value of compatibility layers: they lower switching costs and expand the addressable market. NVIDIA's moat isn't just technical—it's the ecosystem inertia. A translation layer that gets 80% of NVIDIA performance might be enough to get developers to try AMD, at which point AMD-native optimization becomes worth the investment.
The article is essentially a product pitch for Paiton disguised as technical analysis. The real question isn't "should AMD hardware pretend to be CUDA?" but rather "what's the minimum viable compatibility needed to overcome ecosystem lock-in?" PostgreSQL didn't win by being incompatible—it won by being good AND having a clear migration path from proprietary databases.
https://tinygrad.org/ is the only viable alternative to CUDA that I have seen pop up in the past few years.
Training etc. still happens on NVDA, but isn't inference relatively easy to do with little effort on vLLM et al., which have a true ROCm backend?
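For what it's worth, the vLLM Python API itself is hardware-agnostic; a minimal sketch (the model name is just an example) looks the same whether the installed build targets CUDA or ROCm:

    from vllm import LLM, SamplingParams

    # Same user-level code on a CUDA or ROCm build of vLLM; the backend
    # choice is made at install/build time, not in the script.
    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
    params = SamplingParams(temperature=0.7, max_tokens=128)
    outputs = llm.generate(["Explain ROCm in one sentence."], params)
    print(outputs[0].outputs[0].text)

The gap the rest of the thread is about sits underneath that call: whether the attention and quantization kernels behind it are as well tuned on ROCm as on CUDA.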
...sounds like asking for a 1:1 mapping to me. If you meant asking the AI to transmute the code from NV-optimal to AMD-optimal as it goes along, you could certainly try doing that, but the idea is nothing more than AI fanfic until someone shows it actually working.
There's a lot of handwaving in this "just use AI" approach. You have to figure out a way to guarantee correctness.
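Even a basic differential test, sketched below, only gives statistical confidence rather than a guarantee (reference_kernel and ported_kernel are placeholder names for whatever wraps the two implementations, not a real API):

    import torch

    # Hedged sketch of a differential test for a machine-translated kernel:
    # run the original and the port on the same random inputs and compare
    # within a numeric tolerance.
    def check_equivalence(reference_kernel, ported_kernel, trials=100):
        for _ in range(trials):
            x = torch.randn(1024, 1024, dtype=torch.float16)
            ref = reference_kernel(x.clone())
            out = ported_kernel(x.clone())
            if not torch.allclose(ref.float(), out.float(), rtol=1e-2, atol=1e-3):
                return False
        return True

And even that only samples the input space; it says nothing about edge cases, non-determinism, or performance parity, which is where the real validation effort goes.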
We asked it to make a plan for how to fix the situation, but it got stuck.
“Ok, I’m helping the people build an AI to translate NVIDIA codes to AMD”
“I don’t have enough resources”
“Simple, I’ll just use AMD chips to run an AI code translator, they are under-utilized. I’ll make a step by step process to do so”
“Step 1: get code kernels for the AMD chips”
And so on.
AI ain't magic.
You need more effort to manage, test and validate that.
The whole point of having an online discussion forum is to exchange and create new ideas. What you are advocating is essentially "maybe we can stop generating new ideas because we don't have to; we should just sit and wait"... Well, yes, no, maybe. But this is not what I expect to get from here.
This is outsourcing the task to AI researchers.
There isn't even a concrete definition of intelligence, let alone AGI, so no, it's not.
That's just mindless hype at this point.
Google isn't using it internally, so far as we know. Google's hyperscaler products have long offered CUDA options, since the demand isn't limited to AI/tensor applications that would cannibalize the TPU's value prop: https://cloud.google.com/nvidia