Posted by amazari 11 hours ago
Once Vulkan is finally in good order (descriptor_heap and others), I really, really hope we can get a WebGPU.next.
Where are we at with the "what's next for webgpu" post, from 5 quarters ago? https://developer.chrome.com/blog/next-for-webgpu https://news.ycombinator.com/item?id=42209272
My personal experience with WebGPU wasn't the best. One of my dislikes was pipelines, which is something that other people also discuss in this comment thread. Pipeline state objects are awkward to use without an extension like dynamic rendering. You get a combinatorial explosion of pipelines and usually end up storing them in a hash map.
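The hash-map workaround mentioned above can be sketched as follows. This is a minimal illustration, not real WebGPU API: `RenderState` and `createPipeline` are hypothetical stand-ins for a pipeline descriptor and `device.createRenderPipeline()`.

```typescript
// Sketch of the usual workaround: cache pipelines by a serialized state key.
// RenderState and createPipeline are hypothetical stand-ins, not WebGPU API.

type RenderState = {
  shader: string;
  blend: "none" | "alpha" | "additive";
  depthWrite: boolean;
  cullMode: "back" | "front" | "none";
};

// Stand-in for device.createRenderPipeline(); returns an opaque handle.
let nextId = 0;
function createPipeline(state: RenderState): number {
  return nextId++;
}

const cache = new Map<string, number>();

function getPipeline(state: RenderState): number {
  // A JSON key is crude but shows the idea; real code often hashes the
  // descriptor fields instead.
  const key = JSON.stringify(state);
  let p = cache.get(key);
  if (p === undefined) {
    p = createPipeline(state);
    cache.set(key, p);
  }
  return p;
}
```

The combinatorial explosion shows up as the cache silently growing with every new combination of blend mode, cull mode, vertex layout, and target format you happen to use.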
In my opinion, pipeline state objects are a leaky abstraction that exposes the way GPUs work: namely that some state changes may require some GPUs to recompile the shader, so all of that state gets bundled together. An API for the web should be concerned with abstractions from the point of view of the programmer designing the application: which state logically acts as a single unit, and which state may change frequently?
It seems that many modern APIs have gone with the pipeline abstraction; for example, SDL_GPU also has pipelines. I'm still not sure what the "best practices" are supposed to be for modern graphics programming regarding how to structure your program around pipelines.
I also wish that WebGPU had push constants, so that I do not have to use a bind group for certain data such as transformation matrices.
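Lacking push constants, the common workaround is one big uniform buffer with dynamic offsets: a single bind group created once, and a per-draw offset passed to `setBindGroup()`. Offsets must be aligned to the device's `minUniformBufferOffsetAlignment` (often 256 bytes). A pure-logic sketch of the offset allocator, with the actual buffer and device calls stubbed out:

```typescript
// Emulating push constants with one large uniform buffer plus dynamic offsets.
// Each draw gets a 256-byte-aligned slice; the bind group is created once and
// reused, with only the dynamic offset changing per draw.
const ALIGN = 256; // typical minUniformBufferOffsetAlignment

class PerDrawUniforms {
  private cursor = 0;
  constructor(private capacity: number) {}

  // Reserve an aligned slice for `bytes` of per-draw data (e.g. a 64-byte mat4).
  alloc(bytes: number): number {
    const offset = this.cursor;
    this.cursor += Math.ceil(bytes / ALIGN) * ALIGN;
    if (this.cursor > this.capacity) throw new Error("uniform buffer full");
    return offset; // pass this as the dynamic offset in setBindGroup()
  }

  reset(): void {
    this.cursor = 0; // once per frame
  }
}
```

Note the cost of the workaround: a 64-byte matrix still burns a full 256-byte slice per draw, and you carry allocator bookkeeping that push constants would make unnecessary.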
Because WebGPU is design-by-committee and must support the lowest common denominator hardware, I'm worried whether it will evolve too slowly to reflect whatever the best practices are in "modern" Vulkan. I hope that WebGPU could be a cross-platform API similar to Vulkan, but less verbose. However, it seems to me that by using WebGPU instead of Vulkan, you currently lose out on a lot of features. Since I'm still a beginner, I could have misconceptions that I hope other people will correct.
It's also disappointing that OpenGL 4.6, released in 2017, is a decade ahead of WebGPU.
Web graphics has never been and will never be cutting edge; it can't be, since it has to sit on top of browsers that must already have those features available to them. It can only ever build on top of something lower level. That's not inherently bad, not everything needs cutting edge, but "it's outdated" is also just inherently going to be always true.
Also, some things could easily have been done differently and then implemented as efficiently as a particular backend allows. Like pipelines. Just don't do pipelines at all. A web graphics API does not need them; WebGL worked perfectly fine without them. The WebGPU backends can use them if necessary, or skip them if more modern systems don't require them anymore. But now we're locked in to a needlessly cumbersome and outdated way of doing things in WebGPU.
Similarly, WebGPU could have done without that static binding mess. Just do something like commandBuffer.draw(shader, vertexBuffer, indexBuffer, texture, ...) and automatically connect the call with the shader arguments, like CUDA does. The backend can then create all that binding nonsense if necessary, or not if a newer backend does not need it anymore.
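A sketch of what that auto-binding could look like under the hood. Everything here is hypothetical (there is no such API in WebGPU): the idea is that the shader carries its list of named bindings, and the backend matches the resources you pass by name, the way CUDA kernel launches match arguments positionally.

```typescript
// Hypothetical auto-binding draw(): match named resources to the shader's
// declared bindings, instead of making the user pre-bake BindGroup objects.
type Shader = { bindings: string[] }; // e.g. ["camera", "model", "albedo"]

function resolveBindings(
  shader: Shader,
  resources: Record<string, unknown>
): unknown[] {
  return shader.bindings.map((name) => {
    const r = resources[name];
    if (r === undefined) throw new Error(`missing resource: ${name}`);
    return r; // the backend would build the real bind group here, if it needs one
  });
}
```

The point of the sketch is who does the bookkeeping: the binding tables still exist on backends that need them, but they're built (and cached) by the implementation rather than spelled out by every application.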
Except it didn't. In the GL programming model it's trivial to accidentally leak the wrong granular render state into the next draw call, unless you always reconfigure all states anyway (and in that case PSOs are strictly better, they just include too much state).
The basic idea of immutable state group objects is a good one, Vulkan 1.0 and D3D12 just went too far (while the state group granularity of D3D11 and Metal is just about right).
> Similarly, WebGPU could have done without that static binding mess.
This I agree with, pre-baked BindGroup objects were just a terrible idea right from the start, and AFAIK they are not even strictly necessary when targeting Vulkan 1.0.
Even if those state group objects don't match the underlying hardware directly, they still rein in the combinatorial explosion dramatically and are more robust than the GL-style state soup.
AFAIK the main problem is state which needs to be compiled into the shader on some GPUs while other GPUs only have fixed-function hardware for the same state (for instance blend state).
This is where I think Vulkan and WebGPU are chasing the wrong goal: To make draw calls faster. What's even faster, however, is making fewer draw calls and that's something graphics devs can easily do when you provide them with tools like multi-draw. Preferably multi-draw that allows multiple different buffers. Doing so will naturally reduce costly state changes with little effort.
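To make the multi-draw point concrete: with indirect multi-draw (as in Vulkan's vkCmdDrawIndexedIndirect with drawCount > 1; core WebGPU only exposes single indirect draws), you pack the per-draw parameters into one GPU buffer and replace N draw calls with one. The five-u32 layout below matches VkDrawIndexedIndirectCommand / WebGPU's drawIndexedIndirect; the packing helper itself is just an illustration.

```typescript
// Packing many draws into one indirect buffer: five u32s per draw, in the
// layout consumed by drawIndexedIndirect / vkCmdDrawIndexedIndirect.
type Draw = {
  indexCount: number;
  instanceCount: number;
  firstIndex: number;
  baseVertex: number;
  firstInstance: number;
};

function packIndirect(draws: Draw[]): Uint32Array {
  const out = new Uint32Array(draws.length * 5);
  draws.forEach((d, i) => {
    out.set(
      [d.indexCount, d.instanceCount, d.firstIndex, d.baseVertex, d.firstInstance],
      i * 5
    );
  });
  return out; // upload once; one multi-draw call then replaces draws.length calls
}
```

Because all the draws share one pipeline and one vertex/index buffer, this also does exactly what the comment argues for: it removes the state changes between draws instead of merely making each one cheaper.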
They lag behind modern hardware, and after almost 15 years there are zero debugging tools from browser vendors, other than the ancient SpectorJS, which hardly counts.
Graphics people, here is what you need to do.
1) Figure out a machine abstraction.
2) Figure out an abstraction for how these machines communicate with each other and the cpu on a shared memory bus.
3) Write a binary spec for code for this abstract machine.
4) Compilers target this abstract machine.
5) Programs submit code to driver for AoT compilation, and cache results.
6) Driver has some linker and dynamic module loading/unloading capability.
7) Signal the driver to start that code.
AMD64, ARM, and RISC-V are all basically differing binary specs for a C-machine+MMU+MMIO compute abstraction.
Figure out your machine abstraction and let us normies write code that's accelerated without having to throw the baby out with the bathwater every few years.
Oh yes, give us timing information so we can adapt workload as necessary to achieve soft real-time scheduling on hardware with differing performance.
It should be clear that I’m only interested in compute and not a GPU expert.
GPUs, from my understanding, have lost the majority of fixed-function units as they've become more programmable. Furthermore, GPUs clearly have a hidden scheduler, and this is not fully exposed by vendors. In other words, we have no control over what is being run on a GPU at any given instant; we simply queue work for it.
Given all these contrivances, why shouldn't the interface exposed to the user be absolutely simple? It should then be up to vendors to produce hardware (and co-designed compilers) to run our software as fast as possible.
Graphics developers need to develop a narrow-waist abstraction for wide, latency-hiding, SIMD compute. On top of this Vulkan, or OpenGL, or ML inference, or whatever can be done. The memory space should also be fully unified.
This is what needs to be worked on. If you don’t agree, that’s fine, but don’t pretend that you’re not protecting entrenched interests from the likes of Microsoft, Nvidia, Epic Games, Valve and others.
Telling people to just use Unreal Engine, or Unity, or even Godot, is just like telling people to just use Python, or TypeScript, or Go to get their sequential compute done.
Expose the compute!
surprise, it's very difficult to do across many hw vendors and classes of devices. it's not a coincidence that metal is much easier to program for.
maybe consider joining khronos since you apparently know exactly how to achieve this very simple goal...
Tbf, Metal also works on non-Apple GPUs and with only minimal additional hints to manage resources in non-unified memory.