Posted by ryandrake 12/16/2025

No Graphics API (www.sebastianaaltonen.com)
845 points | 183 comments | page 2
Bengalilol 12/16/2025|
After reading this article, I feel like I've witnessed a historic moment.
bogwog 12/17/2025|
Most of it went over my head, but there's so much knowledge and expertise on display here that it makes me proud that this person I've never met is out there proving that software development isn't entirely full of clowns.
ehaliewicz2 12/17/2025||
Seb is incredibly passionate about games and graphics programming. You can find old posts of his on various forums talking about tricks for programming the PS2, PS3, Xbox 360, etc. He regularly posts demos he's working on, progress clips of various engines, and so on on Twitter, even after staying in the same area for three decades.

I wish I still had this level of motivation :)

aleph_minus_one 12/18/2025||
> I wish I still had this level of motivation :)

It's rather a question of: can you find a company that pays you for having and extending this arcane knowledge (and even for writing about it)?

Even if your job involves such topics, a lot of jobs that require this knowledge are rather "political", like getting the company's wishes into official standards.

jdashg 12/16/2025||
And the GPU API cycle of life and death continues!

I was an only-half-joking champion of ditching vertex attrib bindings when we were drafting WebGPU and WGSL, because it's a really nice simplification, but it was felt that would be too much of a departure from existing APIs. (Spending too many of our "Innovation Tokens" on something that would cause dev friction in the beginning)

In WGSL we tried (for a while?) to build language features as "sugar" when we could. You don't have to guess what order or scope a `for` loop uses when we just spec how it desugars into a simpler, more explicit (but more verbose) core form/dialect of the language.

That said, this powerpoint-driven-development flex knocks this back a whole seriousness and earnestness tier and a half:

> My prototype API fits in one screen: 150 lines of code. The blog post is titled “No Graphics API”. That’s obviously an impossible goal today, but we got close enough. WebGPU has a smaller feature set and features a ~2700 line API (Emscripten C header).

Try to zoom out on the API and fit those *160* lines on one screen! My browser gives up at 30%, and I am still only seeing 127. This is just dishonesty, and we do not need more of this kind of puffery in the world.

And yeah, it's shorter because it is a toy PoC, even if one I enjoyed seeing someone else's take on it. Among other things, the author pretty dishonestly elides the number of lines the enums would take up. (A texture/data format enum on one line? That's one whole additional Pinocchio right there!)

I took WebGPU.webidl and did a quick pass through it, removing some of the biggest misses of this API (queries, timers, device loss, errors in general, shader introspection, feature detection) and some of the irrelevant parts (anything touching canvas, external textures), and immediately got it down to 241 declarations.

This kind of dishonest puffery holds back an otherwise interesting article.

m-schuetz 12/17/2025||
Man, how I wish WebGPU hadn't gone all-in on the legacy Vulkan API model and had instead found a leaner approach to do the same thing. Even Vulkan stopped doing pointless boilerplate like bindings and pipelines. Ditching vertex attrib bindings and going for programmable vertex fetching would have been nice.

WebGPU could have also introduced CUDA's simple launch model to graphics APIs. Instead of all that insane binding boilerplate, just provide the bindings as launch args to the draw call, like draw(numTriangles, args), with args being something like {uniformBuffer, positions, uvs, samplers}, depending on whatever the shaders expect.
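To make that concrete, here's a rough sketch of what such a launch-style draw could look like; every type and function below is hypothetical, invented for illustration rather than taken from WebGPU or any existing API:

```cpp
// Hypothetical sketch only -- none of these types exist in WebGPU or Vulkan.
// Opaque handles standing in for real GPU objects:
struct Buffer  { unsigned id; };
struct Texture { unsigned id; };
struct Sampler { unsigned id; };

// The argument block a shader expects, passed per draw like CUDA kernel args.
struct DrawArgs {
    Buffer  uniforms;
    Buffer  positions;   // fetched programmatically in the vertex shader
    Buffer  uvs;
    Texture colorTexture;
    Sampler colorSampler;
};

// A launch-style draw: no pipeline object, no bind groups -- just
// "draw N triangles with these resources", mirroring kernel<<<...>>>(args).
void draw(unsigned numTriangles, const DrawArgs& args);

void drawMesh(unsigned numTriangles, Buffer ubo, Buffer pos, Buffer uv,
              Texture tex, Sampler smp) {
    draw(numTriangles, DrawArgs{ubo, pos, uv, tex, smp});
}
```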

CupricTea 12/17/2025|||
>Man, how I wish WebGPU didn't go all-in on legacy Vulkan API model

WebGPU doesn't talk to the GPU directly. It requires Vulkan/D3D/Metal underneath to actually implement itself.

>Even Vulkan stopped doing pointless boilerplate like bindings and pipelines.

Vulkan did no such thing. As of today (Vulkan 1.4) they added VK_KHR_dynamic_rendering to core and added the VK_EXT_shader_object extension, which are not required to be supported and must be queried for before using. The former gets rid of render pass objects and framebuffer objects in favor of vkCmdBeginRendering(), and WebGPU already abstracts those two away so you don't see or deal with them. The latter gets rid of monolithic pipeline objects.
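For readers who haven't seen it, the dynamic-rendering path looks roughly like this (a minimal sketch; cmd, colorView, width and height are assumed to already exist, and image layout transitions and error handling are omitted):

```cpp
// VK_KHR_dynamic_rendering sketch: no VkRenderPass, no VkFramebuffer objects.
VkRenderingAttachmentInfo color{};
color.sType       = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO;
color.imageView   = colorView;
color.imageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
color.loadOp      = VK_ATTACHMENT_LOAD_OP_CLEAR;
color.storeOp     = VK_ATTACHMENT_STORE_OP_STORE;
color.clearValue.color = {{0.0f, 0.0f, 0.0f, 1.0f}};

VkRenderingInfo info{};
info.sType                = VK_STRUCTURE_TYPE_RENDERING_INFO;
info.renderArea           = {{0, 0}, {width, height}};
info.layerCount           = 1;
info.colorAttachmentCount = 1;
info.pColorAttachments    = &color;

vkCmdBeginRendering(cmd, &info);
// ... bind state and record draw calls ...
vkCmdEndRendering(cmd);
```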

Many mobile GPUs still do not support VK_KHR_dynamic_rendering or VK_EXT_shader_object. Even my very own Samsung Galaxy S24 Ultra[1] doesn't support shaderObject.

Vulkan did not get rid of pipeline objects, they added extensions for modern desktop GPUs that didn't need them. Even modern mobile GPUs still need them, and WebGPU isn't going to fragment their API to wall off mobile users.

[1] https://vulkan.gpuinfo.org/displayreport.php?id=44583

m-schuetz 12/17/2025||
> WebGPU doesn't talk to the GPU directly. It requires Vulkan/D3D/Metal underneath to actually implement itself.

So does WebGL, and it's doing perfectly fine without pipelines. They were never necessary. Since WebGL can do without pipelines, WebGPU could too. Backends can implement it via pipelines, or they can go the modern route and ignore them.

They are an artificial problem that Vulkan created and WebGPU mistakenly adopted, and which are now being phased out. Some devices may refuse to implement pipeline-free drivers, which is okay. I will happily ignore them. Let's move on into the 21st century without that design mistake, and let legacy devices and companies that refuse to adapt die with dignity. But let's not let them hold back everyone else.

pjmlp 12/17/2025||||
My biggest issues with WebGPU are yet another shading language, and that after 15 years browser developers still don't care one second about debugging tools.

It is either pixel debugging, or trying to replicate things in native code to get proper tooling.

m-schuetz 12/17/2025||
Ironically, WebGPU was way more powerful about 5 years ago, before WGSL was made mandatory. Back then you could just use any SPIR-V with all sorts of extensions, including stuff like 64-bit types and atomics.

Then WGSL came and crippled WebGPU.

p_l 12/17/2025|||
My understanding is that pipelines in Vulkan still matter if you target certain GPUs though.
m-schuetz 12/17/2025||
At some point, we need to let legacy hardware go. Also, WebGL did just fine without pipelines, despite being mapped to Vulkan and DirectX code under the hood. Meaning WebGPU could have also worked without pipelines just fine as well. The backends can then map to whatever they want, using modern code paths for modern GPUs.
p_l 12/17/2025|||
I'm quoting things I've only heard about, because I don't do enough development in this area, but I recall reading that it impacted performance on pretty much every mobile chip (discounting Apple's, because there you go through a completely different API and they got to design the hardware together with the API).

Among other things, that covers everything running on non-Apple, non-Nvidia ARM devices, including freshly bought ones.

p_l 12/18/2025||
After going through a bunch of docs and making sure I had the right reference:

The "legacy" part of Vulkan that everyone on desktop is itching to drop (including popular tutorials) is renderpasses... which remain critical for performance on tiled GPUs where utilization of subpasses means major performance differences (also, major mobile GPUs have considerable differences in command submission which impact that as well)

m-schuetz 12/18/2025||
Also pipelines and bindings. BDA, shader objects and dynamic rendering are just way better than legacy Vulkan without these features.
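For context, the BDA (buffer device address) style looks roughly like this (a minimal sketch; assumes the bufferDeviceAddress feature is enabled and that device, buffer, cmd and pipelineLayout already exist):

```cpp
// Buffer device address sketch: instead of descriptor bindings, fetch a raw
// 64-bit GPU address and hand it to the shader (e.g. via push constants).
VkBufferDeviceAddressInfo addrInfo{};
addrInfo.sType  = VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO;
addrInfo.buffer = buffer;
VkDeviceAddress addr = vkGetBufferDeviceAddress(device, &addrInfo);

// The shader side declares a matching pointer type (GL_EXT_buffer_reference
// in GLSL) and simply dereferences the address.
vkCmdPushConstants(cmd, pipelineLayout, VK_SHADER_STAGE_ALL,
                   0, sizeof(addr), &addr);
```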
flohofwoe 12/17/2025|||
> Also, WebGL did just fine without pipelines, despite being mapped to Vulkan and DirectX code under the hood.

...at the cost of creating PSOs at random times, which is an expensive operation :/

m-schuetz 12/17/2025||
No longer an issue with dynamic rendering and shader objects. And never was an issue with OpenGL. Static pipelines are an artificial problem that Vulkan imposed for no good reason, and which they reverted in recent years.
MindSpunk 12/22/2025|||
That's not at all what dynamic rendering is for. Dynamic rendering avoids creating render pass objects, and does nothing to solve problems with PSOs. We should be glad for the demise of render pass objects; they were truly a failed experiment and weren't even particularly effective at their original goal.

Trying to say pipelines weren't a problem with OpenGL is monumental levels of revisionism. Vulkan (and D3D12, and Metal) didn't invent them for no reason. OpenGL and DirectX drivers spent a substantial amount of effort to hide PSO compilation stutter, because they still had to compile shader bytecode to ISA all the same. They were often not successful and developers had very limited tools to work around the stutter problems.

Often older games would issue dummy draw calls to an off-screen render target to force the driver to compile the shader during a loading screen instead of in the middle of your frame. The problem was always hard; you could just ignore it in the older APIs. Pipelines exist to make this explicit.

The mistake Vulkan made was putting too much state in the pipeline, as much of that state is dynamic in modern hardware now. As long as we need to compile shader bytecode to ISA we need some kind of state object to represent the compiled code and APIs to control when that is compiled.
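In other words, the usual shape, independent of any particular API, is a cache of compiled state objects keyed by whatever state actually forces a recompile, filled at load time. A generic sketch (all names here are placeholders, not a real API):

```cpp
#include <cstdint>
#include <unordered_map>

// Placeholder types: whatever subset of state forces an ISA recompile goes
// into the key; everything else can stay dynamic.
struct PipelineKey {
    uint64_t shaderHash;              // identity of the shader bytecode
    uint32_t renderTargetFormats;     // packed attachment formats
    uint32_t compileAffectingState;   // e.g. blend mode on some hardware
    bool operator==(const PipelineKey&) const = default;
};

struct PipelineKeyHash {
    size_t operator()(const PipelineKey& k) const {
        return k.shaderHash ^ (uint64_t(k.renderTargetFormats) << 32)
                            ^ k.compileAffectingState;
    }
};

struct Pipeline { /* compiled ISA + baked fixed-function state */ };
Pipeline compilePipeline(const PipelineKey&);   // the expensive part

std::unordered_map<PipelineKey, Pipeline, PipelineKeyHash> pipelineCache;

// Ideally every key is pre-warmed during loading; a miss here at draw time
// is exactly the stutter the explicit APIs make visible.
const Pipeline& getPipeline(const PipelineKey& key) {
    auto it = pipelineCache.find(key);
    if (it == pipelineCache.end())
        it = pipelineCache.emplace(key, compilePipeline(key)).first;
    return it->second;
}
```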

flohofwoe 12/17/2025|||
Going entirely back to the granular GL-style state soup would have significant 'usability problems'. It's too easy to accidentally leak incorrect state from a previous draw call.

IMHO a small number of immutable state objects is the best middle ground (similar to D3D11 or Metal, but reshuffled as described in Seb's post).

m-schuetz 12/17/2025||
Not using static pipelines does not imply having to use a global state machine like OpenGL. You could also make an API that uses a struct for rasterizer configs and passes it as an argument to a multi-draw call. I would have actually preferred that over all the individual setters in Vulkan's dynamic rendering approach.
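Something like this, purely as a hypothetical sketch of that idea (none of these types exist in Vulkan or WebGPU):

```cpp
#include <span>

// Hypothetical per-draw rasterizer state: a plain struct argument instead of
// a precompiled pipeline object or a pile of vkCmdSet* dynamic-state calls.
enum class CompareOp { Less, LessEqual, Always };
enum class CullMode  { None, Back, Front };
enum class BlendMode { Opaque, Alpha, Additive };

struct RasterState {
    bool      depthTest    = true;
    bool      depthWrite   = true;
    CompareOp depthCompare = CompareOp::LessEqual;
    CullMode  cullMode     = CullMode::Back;
    BlendMode blend        = BlendMode::Opaque;
};

struct DrawCall { unsigned firstTriangle, triangleCount; };
struct CommandList;   // opaque handle

// Every call carries its own state, so nothing leaks between draws, and the
// backend is free to map this onto pipelines or dynamic state internally.
void multiDraw(CommandList& cmd, std::span<const DrawCall> draws,
               const RasterState& state);
```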
xyzsparetimexyz 12/16/2025|||
Who cares about dev friction in the beginning? That was a bad choice.
ksec 12/16/2025||
I wonder why M$ stopped putting out new DirectX versions? DirectX 12 Ultimate, or 12.1 or 12.2, is largely the same as DirectX 12.

Or has the use of middleware like Unreal Engine largely made them irrelevant? Or should Epic put out a new graphics API proposal?

pjmlp 12/16/2025||
That has always been the case; it is mostly FOSS circles that argue about APIs.

Game developers create an RHI (rendering hardware interface) like the one discussed in the article, and go on with game development.

The greatest innovations thus far have been ray tracing and mesh shaders, and still they are largely ignored, so why keep pushing forward?

djmips 12/16/2025||
I disagree that ray tracing and mesh shaders are largely ignored - at least within AAA game engines they are leaned on quite a lot. Particularly ray tracing.
pjmlp 12/17/2025||
Game engines aren't games, or sales.
reactordev 12/16/2025|||
Both-ish.

Yes, the centralization of engines into Unreal, Unity, etc. means there's less interest in pushing the boundaries; they are still pushed, just on the GPU side.

From a CPU API perspective, it's very close to just plain old buffer mapping and go. We would need a hardware shift that adds something more to the pipeline than what we currently do, like when tessellation shaders came about from geometry shader practices.

djmips 12/16/2025||
The frontier of graphics APIs might be the consoles, and they don't get a bump until the hardware gets a bump; console hardware is a little bit behind.
klaussilveira 12/16/2025||
NVIDIA's NVRHI has been my favorite abstraction layer over the complexity that modern APIs bring.

In particular, this fork: https://github.com/RobertBeckebans/nvrhi which adds some niceties and quality of life improvements.

wg0 12/16/2025||
Very well written but I can't understand much of this article.

What would be one good primer to be able to comprehend all the design issues raised?

adrian17 12/16/2025||
IMO the minimum is to be able to read a “hello world / first triangle” example for any of the modern graphics APIs (OpenGL/WebGL doesn’t count, WebGPU does), and have a general understanding of each step performed (resource creation, pipeline setup, passing data to shaders, draws, synchronization). Also to understand where the pipeline explosion issue comes from.

Bonus points if you then look at CUDA “hello world” and consider that it can do nontrivial work on the same hardware (sans fixed function accelerators) with much less boilerplate (and driver overhead).

arduinomancer 12/16/2025|||
To be honest, there isn't really one; a lot of these concepts are advanced even for graphics programmers.
cmovq 12/17/2025|||
https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-...
jplusequalt 12/17/2025||
A working understanding of legacy graphics APIs, GPU hardware, and some knowledge of Vulkan/DirectX 12/CUDA.

I have all of that except DX12 knowledge, and 50% of this article still went over my head.

modeless 12/16/2025||
I don't understand this part:

> Meshlet has no clear 1:1 lane to vertex mapping, there’s no straightforward way to run a partial mesh shader wave for selected triangles. This is the main reason mobile GPU vendors haven’t been keen to adapt the desktop centric mesh shader API designed by Nvidia and AMD. Vertex shaders are still important for mobile.

I get that there's no mapping from vertex/triangle to tile until after the mesh shader runs. But even with vertex shaders there's also no mapping from vertex/triangle to tile until after the vertex shader runs. The binning of triangles to tiles has to happen after the vertex/mesh shader stage. So I don't understand why mesh shaders would be worse for mobile TBDR.

I guess this is suggesting that TBDR implementations split the vertex shader into two parts, one that runs before binning and only calculates positions, and one that runs after and computes everything else. I guess this could be done but it sounds crazy to me, probably duplicating most of the work. And if that's the case why isn't there an extension allowing applications to explicitly separate position and attribute calculations for better efficiency? (Maybe there is?)

Edit: I found docs on Intel's site about this. I think I understand now. https://www.intel.com/content/www/us/en/developer/articles/g...

Yes, you have to execute the vertex shader twice, which is extra work. But if your main constraint is memory bandwidth, not FLOPS, then I guess it can be better to throw away the entire output of the vertex shader except the position, rather than save all the output in memory and read it back later during rasterization. At rasterization time when the vertex shader is executed again, you only shade the triangles that actually went into your tile, and the vertex shader outputs stay in local cache and never hit main memory. And this doesn't work with mesh shaders because you can't pick a subset of the mesh's triangles to shade.

It does seem like there ought to be an extension to add separate position-only and attribute-only vertex shaders. But it wouldn't help the mesh shader situation.

yuriks 12/17/2025|
I thought the implication was that the shader compiler produces a second shader from the same source, run through a dead code elimination pass that keeps only the code necessary to calculate the position, ignoring the other attributes.
modeless 12/17/2025||
Sure, but that only goes so far, especially when users aren't writing their shaders with the knowledge that this transform is going to be applied, or with any tools to verify that it's able to eliminate anything.
hrydgard 12/17/2025|||
Well, it is what is done on several tiler architectures, and it generally works just fine. Normally your computations of the position aren't really intertwined with the computation of the other outputs, so dead code elimination does a good job.
kasool 12/17/2025|||
Why would it be difficult? There are explicit shader semantics to specify output position.

In fact, Qualcomm's documentation spells this out: https://docs.qualcomm.com/nav/home/overview.html?product=160...

blakepelton 12/16/2025||
Great post; it brings back a lot of memories. Two additional factors that designers of these APIs consider are:

* GPU virtualization (e.g., the D3D residency APIs), to allow many applications to share GPU resources (e.g., HBM); a minimal residency sketch follows after this list.

* Undefined behavior: how easy is it for applications to accidentally or intentionally take a dependency on undefined behavior? This can make it harder to translate this new API to an even newer API in the future.
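On the residency point above, in D3D12 terms (a sketch; device and heap are assumed to already exist, and budget querying via IDXGIAdapter3::QueryVideoMemoryInfo plus error handling are omitted):

```cpp
// D3D12 residency sketch: when many apps share limited VRAM, the app can
// demote resources it can live without and bring them back before use.
ID3D12Pageable* pageables[] = { heap };   // heaps, committed resources, ...

// Demote when over budget or before an idle period...
device->Evict(_countof(pageables), pageables);

// ...and bring them back (potentially paging from system memory) before
// any command list that references them is executed.
device->MakeResident(_countof(pageables), pageables);
```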

qingcharles 12/17/2025||
I started my career writing software 3D renderers before switching to Direct3D in the late 90s. What I wonder is: is all of this going to just get completely washed away and made totally redundant by the incoming flood of hallucinated game rendering?

Will it be possible to hallucinate the frame of a game at a similar speed to rendering it with a mesh and textures?

We're already seeing the hybrid version of this, where you render a lower-res mesh and hallucinate the upscaled, more detailed, more realistic-looking skin over the top.

I wouldn't want to be in the game engine business right now :/

cubefox 12/18/2025||
It is more likely that machine learning models will be used by the game artists for asset generation, but not for rendering those assets at the client side, which would be extremely expensive.

But another upcoming use case of ML on the client side is neural texture compression, which somehow needs not just less storage but also less RAM. It does come at a computational (frame time) cost on the client side, though not as bad as generative AI.

Neural mesh compression could be another potential thing we get in the future. (All lossy compression seems to go in the ML direction: currently there is a lot of work going on with next generation neural audio and video codecs. E.g. https://arxiv.org/abs/2502.20762)

jsheard 12/17/2025|||
You can't really do a whole lot of inference in 16ms on consumer hardware. Not to say that inference isn't useful in realtime graphics (DLSS has proven itself well enough), but that's a very small model laser-targeted at one specific problem, and even that takes a few milliseconds to do its thing. Fitting behemoth generative models into those time constraints seems like an uphill battle.
webdevver 12/17/2025|||
Reminds me of this remark made by Carmack on hidden surface removal:

https://www.youtube.com/watch?v=P6UKhR0T6cs&t=2315s

> "research from the 70s especially, there was tons of work going on on hidden surface removal, these clever different algorithmic ways - today we just kill it with a depth buffer. We just throw megabytes and megabytes of memory and the problem gets solved much much easier."

ofcourse "megabytes" of memory was unthinkiable in the 70s. but for us, its unthinkable to have real-time frame inferencing. I cant help but draw the parallels between our current-day "clever algorithmic ways" of drawing pixels to the screen.

I definitely agree with the take that, in the grand scheme of things, all this pixel rasterizing business will be a transient moment that gets washed away by a much simpler petaflop/exaflop local TPU that runs at 60W under load and simply 'dreams' frames and textures for you.

qingcharles 12/17/2025|||
Agree. If you look at the GPU in an iPhone 17 and compare to the desktop GPU I had in 1998, the difference is startling.

Voodoo in 1998 could render about 3M poly/sec on a Utah teapot, which was an absurd number at the time, when I was coming from software renderers that were considered amazing at 100K/sec.

The A19 Pro GPU can do about 5bn/sec at about 4x the resolution. And it fits in your pocket. And runs off a tiny battery. Which also powers the screen.

25 years from now a 5090 GPU will be laughably bad. I have no idea how fast we'll be able to hallucinate entire scenes, but my guess is that it'll be above 60fps.

aj_hackman 12/17/2025|||
What happens when you want to do something very new, or very specific?
8n4vidtmkvmk 12/17/2025||
I just assumed hallucinated rendering was a stepping stone to training AGIs or something. No one is actually seriously trying to build games that way, are they? Seems horribly inefficient at best, and incoherent at worst.
henning 12/16/2025||
At first glance, this looks very similar to the SDL3 GPU API and other RHI libraries that have been created.
cyber_kinetist 12/17/2025|
If you look at the details you can clearly see that SDL3_GPU is wildly different from this proposal. For example:

- It's not exposing raw GPU addresses; SDL3_GPU has buffer objects instead. Also, you're much more limited in how you use buffers in SDL3 (e.g. no coherent buffers, and you're forced to use a transfer buffer if you want to do a CPU -> GPU upload; see the upload sketch after this list).

- In SDL3_GPU, synchronization is done automatically, without the user specifying barriers (helped by a technique called cycling: https://moonside.games/posts/sdl-gpu-concepts-cycling/).

- More modern features such as mesh shading are not exposed in SDL3_GPU, and it keeps the traditional rendering pipeline as the main way to draw stuff. Also, bindless is a first-class citizen in Aaltonen's proposal (and the main reason for the simplification of the API), while SDL3_GPU doesn't support it at all and instead opts for a traditional descriptor binding system.
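For reference, the transfer-buffer upload path from the first point looks roughly like this (a from-memory sketch; device, vertexBuffer, vertexData and dataSize are assumed to exist, return values are unchecked, and SDL_gpu.h is the authority on exact struct fields):

```cpp
#include <SDL3/SDL_gpu.h>

// SDL3_GPU sketch: no coherent/persistently mapped buffers; CPU data goes
// through a transfer buffer and an explicit copy pass.
SDL_GPUTransferBufferCreateInfo tbInfo{};
tbInfo.usage = SDL_GPU_TRANSFERBUFFERUSAGE_UPLOAD;
tbInfo.size  = dataSize;
SDL_GPUTransferBuffer* staging = SDL_CreateGPUTransferBuffer(device, &tbInfo);

void* mapped = SDL_MapGPUTransferBuffer(device, staging, false);
SDL_memcpy(mapped, vertexData, dataSize);
SDL_UnmapGPUTransferBuffer(device, staging);

SDL_GPUCommandBuffer* cmd  = SDL_AcquireGPUCommandBuffer(device);
SDL_GPUCopyPass*      copy = SDL_BeginGPUCopyPass(cmd);
SDL_GPUTransferBufferLocation src{ staging, 0 };
SDL_GPUBufferRegion           dst{ vertexBuffer, 0, dataSize };
SDL_UploadToGPUBuffer(copy, &src, &dst, false);   // barriers handled for you
SDL_EndGPUCopyPass(copy);
SDL_SubmitGPUCommandBuffer(cmd);
```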

Scaevolus 12/17/2025||
SDL3 is kind of the intersection of features found in DX12/Vulkan 1.0/Metal: if it's not easily supported in all of them, it's not in SDL3-- hence the lack of bindless support. That means you can run on nearly every device in the last 10-15 years.

This "no api" proposal requires hardware from the last 5-10 years :)

cyber_kinetist 12/17/2025||
Yup you've actually pointed out the most important difference: SDL3 is designed to be compatible with the APIs and devices of the past (2010s), whereas this proposal is designed to be compatible with the newer 2020s batch of consumer devices.
overgard 12/17/2025|
I'm kind of curious about something: most of my graphics experience has been OpenGL or WebGL (a tiny bit of Vulkan) or big engines like Unreal or Unity. I've noticed over the years the uptake of DX12 always seemed marginal though (a lot of things stayed on D3D11 for a really long time). Is Direct3D 12 super awful to work with or something? I know it requires more resource management than 11, but so does Vulkan, which doesn't seem to have the same issue.
canyp 12/17/2025||
Most AAA titles are on DX12 now. id is on Vulkan. E-sports titles remain largely in the DX11 camp.

What the modern APIs give you is less CPU driver overhead and new functionality like ray tracing. If you're not CPU-bound to begin with and don't need those new features, then there's not much of a reason to switch. The modern APIs require way more management than the prior ones: memory management, CPU-GPU synchronization, avoiding resource hazards, etc.
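As a small example of that extra management, the basic CPU-GPU sync every D3D12 app ends up writing looks roughly like this (a sketch; queue, fence, fenceValue and fenceEvent are assumed to already exist, error handling omitted):

```cpp
// D3D12 frame sync sketch: the application, not the driver, decides when the
// CPU must wait for the GPU before reusing per-frame resources.
const UINT64 signalValue = ++fenceValue;
queue->Signal(fence, signalValue);               // GPU writes value when done

if (fence->GetCompletedValue() < signalValue) {  // CPU has run too far ahead
    fence->SetEventOnCompletion(signalValue, fenceEvent);
    WaitForSingleObject(fenceEvent, INFINITE);
}
// Now it is safe to reset the command allocator and reuse upload memory.
```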

Also, many of those AAA games are moving to UE5, which is basically DX12 under the hood (presumably it should have a Vulkan backend too, but I don't see it used much?).

kasool 12/17/2025||
UE5 has a fairly mature Vulkan backend but as you might guess is second class to DX12.
flohofwoe 12/17/2025||
> but so does Vulkan which doesn't seem to have the same issue

Vulkan has the same issues (and more) as D3D12; you just don't hear much about it because there are hardly any games built directly on top of Vulkan. Vulkan is mainly useful as a Proton backend on Linux.
