Posted by ryandrake 12/16/2025
And it's quite a bit simpler than what we have in the "modern" GPU APIs atm.
Wouldn't this mean double GPU memory usage when uploading a potentially large image? (Even if only until the copy is finished.)
Vulkan lets the user copy from CPU (host_visible) memory to GPU (device_local) memory without an intermediate GPU buffer; AFAIK there is no double VRAM usage there, but I might be wrong on that.
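A minimal sketch of that path, assuming a staging buffer bound to HOST_VISIBLE memory and a DEVICE_LOCAL destination image; buffer/image/memory creation, the layout transition, and synchronization are omitted, and names like stagingMemory, stagingBuffer, dstImage and cmd are placeholders for objects created elsewhere:

    #include <vulkan/vulkan.h>
    #include <string.h>

    void upload_image(VkDevice device, VkCommandBuffer cmd,
                      VkDeviceMemory stagingMemory, VkBuffer stagingBuffer,
                      VkImage dstImage, const void *pixels,
                      VkDeviceSize size, uint32_t width, uint32_t height)
    {
        /* The staging buffer lives in HOST_VISIBLE memory (system RAM on a
         * discrete GPU), so mapping and filling it does not consume VRAM. */
        void *mapped = NULL;
        vkMapMemory(device, stagingMemory, 0, size, 0, &mapped);
        memcpy(mapped, pixels, (size_t)size);
        vkUnmapMemory(device, stagingMemory);

        /* One copy region covering the whole image. */
        VkBufferImageCopy region = {0};
        region.imageSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
        region.imageSubresource.layerCount = 1;
        region.imageExtent.width  = width;
        region.imageExtent.height = height;
        region.imageExtent.depth  = 1;

        /* Copy from the host-visible buffer directly into the DEVICE_LOCAL
         * image; the only VRAM allocation is the destination image itself. */
        vkCmdCopyBufferToImage(cmd, stagingBuffer, dstImage,
                               VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);
    }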
Great article btw. I hope something comes out of this!
VAOs were the last feature I was missing before this.
Also, the other cores will do useful gameplay work, so dedicating one CPU core to the GPU is OK.
4 CPU cores is also enough for eternity. 1GB shared RAM/VRAM too.
Let's build something good on top of the hardware/OSes/APIs/languages we have now? 3588/linux/OpenGL/C+Java specifically!
Hardware has permanently peaked in many ways; only soft internal protocols can now evolve. I write mine inside TCP/HTTP.
In the before times, upgrading the CPU meant everything ran faster. Who didn't like that? Today, we need code that scales across arbitrarily many CPU cores for that to remain true. 16-thread CPUs have been around for a long time; I'd like my software to make the most of them.
When we have 480+Hz monitors, we will probably need more than 1 CPU core for GPU rendering to make the most of them.
Uh oh https://www.amazon.com/ASUS-Swift-Gaming-Monitor-PG27AQDP/dp...
Maybe 120Hz, if they come in 4:3/5:4 with a matte low-res panel.
But that's enough for VR, which needs 2x because of the two eyes.
So progress ends there.
16 cores can't share memory well.
Also, 15W is the peak, because anything more is hard to passively cool in a small space. So 120Hz x 2 eyes at ~1080p is the limit of what we can do anyway... with $1/kWh electricity!
The limits are physical.
Thankfully later versions have added escape hatches which bypass much of that unnecessary bureaucracy, but it was grim for a while, and all that early API cruft is still there to confuse newcomers.
UMA or not doesn't matter: desktop GPUs have MMUs and are perfectly capable of reading the CPU's memory in a unified address space (even back then).
But then game/engine devs want to pair a vertex shader that produces a UV coordinate and a normal with a pixel shader that only reads the UV coordinate (or neither, for shadow mapping), and they don't want to pay for the bandwidth of the unused vertex outputs (or the cost of calculating them).
Or they want to be able to enable any other pipeline stage at will, like tessellation or geometry, and have the same shader just work without any performance overhead.
Basically do what most engines do - have preprocessor constants and use different paths based on what attributes you need.
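A minimal sketch of that approach, assuming plain C + OpenGL 3.3. The shader text and the NEED_UV / NEED_NORMAL macro names are made up for illustration; the point is that one vertex-shader body compiles into several variants by concatenating different #define strings in glShaderSource, so a shadow pass never declares or computes the outputs it doesn't read:

    #include <GL/gl.h>   /* or a loader such as glad/GLEW */

    static const char *vs_body =
        "layout(location = 0) in vec3 a_pos;\n"
        "uniform mat4 u_mvp;\n"
        "#ifdef NEED_UV\n"
        "layout(location = 1) in vec2 a_uv;\n"
        "out vec2 v_uv;\n"
        "#endif\n"
        "#ifdef NEED_NORMAL\n"
        "layout(location = 2) in vec3 a_normal;\n"
        "out vec3 v_normal;\n"
        "#endif\n"
        "void main() {\n"
        "#ifdef NEED_UV\n"
        "    v_uv = a_uv;\n"
        "#endif\n"
        "#ifdef NEED_NORMAL\n"
        "    v_normal = a_normal;\n"
        "#endif\n"
        "    gl_Position = u_mvp * vec4(a_pos, 1.0);\n"
        "}\n";

    /* Compile one variant: the #version line must come first, then the
     * per-variant defines, then the shared body; GLSL concatenates them. */
    static GLuint compile_variant(const char *defines)
    {
        const char *sources[] = { "#version 330 core\n", defines, vs_body };
        GLuint vs = glCreateShader(GL_VERTEX_SHADER);
        glShaderSource(vs, 3, sources, NULL);
        glCompileShader(vs);
        return vs;
    }

    void build_vertex_shaders(void)
    {
        /* Main pass reads uv + normal; shadow pass reads neither, so the
         * unused attributes are never fetched and the outputs never written. */
        GLuint vs_main   = compile_variant("#define NEED_UV\n#define NEED_NORMAL\n");
        GLuint vs_shadow = compile_variant("");
        (void)vs_main; (void)vs_shadow;
    }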
I also don't see how separate pipeline stages conflict with this - you already have this functionality in existing APIs, where you can swap different stages individually. Some changes might need a fixup on the driver side, but nothing that can't be added in this proposed API's `gpuSetPipeline` implementation...