Posted by amazari 11 hours ago
Once Vulkan is finally in good order (descriptor_heap and others), I really, really hope we can get a WebGPU.next.
Where are we at with the "what's next for webgpu" post, from 5 quarters ago? https://developer.chrome.com/blog/next-for-webgpu https://news.ycombinator.com/item?id=42209272
My personal experience with WebGPU wasn't the best. One of my dislikes was pipelines, which is something that other people also discuss in this comment thread. Pipeline state objects are awkward to use without an extension like dynamic rendering. You get a combinatorial explosion of pipelines and usually end up storing them in a hash map.
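The hash-map workaround mentioned above can be sketched as follows. This is a minimal illustration, not real WebGPU API: `RenderState` and `createPipeline` are hypothetical stand-ins for a pipeline descriptor and `device.createRenderPipeline()`.

```typescript
// Sketch of the usual workaround: cache pipelines by a serialized state key.
// RenderState and createPipeline are hypothetical stand-ins, not WebGPU API.

type RenderState = {
  shader: string;
  blend: "none" | "alpha" | "additive";
  depthWrite: boolean;
  cullMode: "back" | "front" | "none";
};

// Stand-in for device.createRenderPipeline(); returns an opaque handle.
let nextId = 0;
function createPipeline(state: RenderState): number {
  return nextId++;
}

const cache = new Map<string, number>();

function getPipeline(state: RenderState): number {
  // A JSON key is crude but shows the idea; real code often hashes the
  // descriptor fields instead.
  const key = JSON.stringify(state);
  let p = cache.get(key);
  if (p === undefined) {
    p = createPipeline(state);
    cache.set(key, p);
  }
  return p;
}
```

The combinatorial explosion shows up as the cache silently growing with every new combination of blend mode, cull mode, vertex layout, and target format you happen to use.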
In my opinion, pipeline state objects are a leaky abstraction that exposes the way GPUs work: namely that some state changes may require some GPUs to recompile the shader, so all of that state gets bundled together. An API for the web should be concerned with abstractions from the point of view of the programmer designing the application: which state logically acts as a single unit, and which state may change frequently?
It seems that many modern APIs have gone with the pipeline abstraction; for example, SDL_GPU also has pipelines. I'm still not sure what the "best practices" are supposed to be for modern graphics programming regarding how to structure your program around pipelines.
I also wish that WebGPU had push constants, so that I do not have to use a bind group for certain data such as transformation matrices.
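Lacking push constants, the common workaround is one big uniform buffer with dynamic offsets: a single bind group created once, and a per-draw offset passed to `setBindGroup()`. Offsets must be aligned to the device's `minUniformBufferOffsetAlignment` (often 256 bytes). A pure-logic sketch of the offset allocator, with the actual buffer and device calls stubbed out:

```typescript
// Emulating push constants with one large uniform buffer plus dynamic offsets.
// Each draw gets a 256-byte-aligned slice; the bind group is created once and
// reused, with only the dynamic offset changing per draw.
const ALIGN = 256; // typical minUniformBufferOffsetAlignment

class PerDrawUniforms {
  private cursor = 0;
  constructor(private capacity: number) {}

  // Reserve an aligned slice for `bytes` of per-draw data (e.g. a 64-byte mat4).
  alloc(bytes: number): number {
    const offset = this.cursor;
    this.cursor += Math.ceil(bytes / ALIGN) * ALIGN;
    if (this.cursor > this.capacity) throw new Error("uniform buffer full");
    return offset; // pass this as the dynamic offset in setBindGroup()
  }

  reset(): void {
    this.cursor = 0; // once per frame
  }
}
```

Note the cost of the workaround: a 64-byte matrix still burns a full 256-byte slice per draw, and you carry allocator bookkeeping that push constants would make unnecessary.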
Because WebGPU is design-by-committee and must support the lowest common denominator hardware, I'm worried whether it will evolve too slowly to reflect whatever the best practices are in "modern" Vulkan. I hope that WebGPU could be a cross-platform API similar to Vulkan, but less verbose. However, it seems to me that by using WebGPU instead of Vulkan, you currently lose out on a lot of features. Since I'm still a beginner, I could have misconceptions that I hope other people will correct.
It's also disappointing that OpenGL 4.6, released in 2017, is a decade ahead of WebGPU.
Web graphics has never been and will never be cutting edge; it can't be, since it has to sit on top of browsers that must already have those features available to them. It can only ever build on top of something lower level. That's not inherently bad, not everything needs cutting edge, but "it's outdated" is also just inherently going to be always true.
Also, some things could easily have been done differently and then implemented as efficiently as a particular backend allows. Like pipelines. Just don't do pipelines at all. A web graphics API does not need them; WebGL worked perfectly fine without them. The WebGPU backends can use them if necessary, or skip them if more modern systems don't require them anymore. But now we're locked in to a needlessly cumbersome and outdated way of doing things in WebGPU.
Similarly, WebGPU could have done without that static binding mess. Just do something like commandBuffer.draw(shader, vertexBuffer, indexBuffer, texture, ...) and automatically connect the call with the shader arguments, like CUDA does. The backend can then create all that binding nonsense if necessary, or not if a newer backend does not need it anymore.
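A sketch of what that auto-binding could look like under the hood. Everything here is hypothetical (there is no such API in WebGPU): the idea is that the shader carries its list of named bindings, and the backend matches the resources you pass by name, the way CUDA kernel launches match arguments positionally.

```typescript
// Hypothetical auto-binding draw(): match named resources to the shader's
// declared bindings, instead of making the user pre-bake BindGroup objects.
type Shader = { bindings: string[] }; // e.g. ["camera", "model", "albedo"]

function resolveBindings(
  shader: Shader,
  resources: Record<string, unknown>
): unknown[] {
  return shader.bindings.map((name) => {
    const r = resources[name];
    if (r === undefined) throw new Error(`missing resource: ${name}`);
    return r; // the backend would build the real bind group here, if it needs one
  });
}
```

The point of the sketch is who does the bookkeeping: the binding tables still exist on backends that need them, but they're built (and cached) by the implementation rather than spelled out by every application.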
Except it didn't. In the GL programming model it's trivial to accidentally leak the wrong granular render state into the next draw call, unless you always reconfigure all states anyway (and in that case PSOs are strictly better, they just include too much state).
The basic idea of immutable state group objects is a good one, Vulkan 1.0 and D3D12 just went too far (while the state group granularity of D3D11 and Metal is just about right).
> Similarly, WebGPU could have done without that static binding mess.
This I agree with, pre-baked BindGroup objects were just a terrible idea right from the start, and AFAIK they are not even strictly necessary when targeting Vulkan 1.0.
Even if those state group objects don't match the underlying hardware directly, they still rein in the combinatorial explosion dramatically and are more robust than the GL-style state soup.
AFAIK the main problem is state which needs to be compiled into the shader on some GPUs while other GPUs only have fixed-function hardware for the same state (for instance blend state).
This is where I think Vulkan and WebGPU are chasing the wrong goal: To make draw calls faster. What's even faster, however, is making fewer draw calls and that's something graphics devs can easily do when you provide them with tools like multi-draw. Preferably multi-draw that allows multiple different buffers. Doing so will naturally reduce costly state changes with little effort.
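To make the multi-draw point concrete: with indirect multi-draw (as in Vulkan's vkCmdDrawIndexedIndirect with drawCount > 1; core WebGPU only exposes single indirect draws), you pack the per-draw parameters into one GPU buffer and replace N draw calls with one. The five-u32 layout below matches VkDrawIndexedIndirectCommand / WebGPU's drawIndexedIndirect; the packing helper itself is just an illustration.

```typescript
// Packing many draws into one indirect buffer: five u32s per draw, in the
// layout consumed by drawIndexedIndirect / vkCmdDrawIndexedIndirect.
type Draw = {
  indexCount: number;
  instanceCount: number;
  firstIndex: number;
  baseVertex: number;
  firstInstance: number;
};

function packIndirect(draws: Draw[]): Uint32Array {
  const out = new Uint32Array(draws.length * 5);
  draws.forEach((d, i) => {
    out.set(
      [d.indexCount, d.instanceCount, d.firstIndex, d.baseVertex, d.firstInstance],
      i * 5
    );
  });
  return out; // upload once; one multi-draw call then replaces draws.length calls
}
```

Because all the draws share one pipeline and one vertex/index buffer, this also does exactly what the comment argues for: it removes the state changes between draws instead of merely making each one cheaper.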
They lag behind modern hardware, and after almost 15 years there are zero debugging tools from browser vendors, other than the ancient SpectorJS, which hardly counts.
Graphics people, here is what you need to do.
1) Figure out a machine abstraction.
2) Figure out an abstraction for how these machines communicate with each other and the cpu on a shared memory bus.
3) Write a binary spec for code for this abstract machine.
4) Compilers target this abstract machine.
5) Programs submit code to driver for AoT compilation, and cache results.
6) Driver has some linker and dynamic module loading/unloading capability.
7) Signal the driver to start that code.
AMD64, ARM, and RISC-V are all basically differing binary specs for a C-machine+MMU+MMIO compute abstraction.
Figure out your machine abstraction and let us normies write code that's accelerated without having to throw the baby out with the bathwater every few years.
Oh yes, give us timing information so we can adapt workload as necessary to achieve soft real-time scheduling on hardware with differing performance.
It should be clear that I’m only interested in compute and not a GPU expert.
GPUs, from my understanding, have lost the majority of fixed-function units as they've become more programmable. Furthermore, GPUs clearly have a hidden scheduler, and this is not fully exposed by vendors. In other words, we have no control over what is being run on a GPU at any given instant; we simply queue work for it.
Given all these contrivances, why shouldn't the interface exposed to the user be absolutely simple? It should then be up to vendors to produce hardware (and co-designed compilers) to run our software as fast as possible.
Graphics developers need to develop a narrow-waist abstraction for wide, latency-hiding, SIMD compute. On top of this Vulkan, or OpenGL, or ML inference, or whatever can be done. The memory space should also be fully unified.
This is what needs to be worked on. If you don’t agree, that’s fine, but don’t pretend that you’re not protecting entrenched interests from the likes of Microsoft, Nvidia, Epic Games, Valve and others.
Telling people to just use Unreal Engine, or Unity, or even Godot, is just like telling people to just use Python, or TypeScript, or Go to get their sequential compute done.
Expose the compute!
surprise, it's very difficult to do across many hw vendors and classes of devices. it's not a coincidence that metal is much easier to program for.
maybe consider joining khronos since you apparently know exactly how to achieve this very simple goal...
Tbf, Metal also works on non-Apple GPUs and with only minimal additional hints to manage resources in non-unified memory.