Posted by ZenithExtreme 3 days ago
https://www.tomsguide.com/gaming/playstation/sonys-mark-cern...
Next is UDNA1, a converged architecture with its older sibling, CDNA (formerly GCN).
Like, the article actually states this, but runs an RDNA 5 headline anyways.
"Big chunks of RDNA 5, or whatever AMD ends up calling it, are coming out of engineering I am doing on the project"
What's to stop Sony from saying "we don't want UDNA 1, we want an iteration of RDNA 4"?
For all we know, it IS RDNA 5... it just wont be available to the public.
CDNA was for HPC / Supercomputers and Data center. GCN always was a better architecture than RDNA for that.
RDNA itself was trying to be more Nvidia-like: fewer FLOPs but better latency.
Someone is getting the axe. Only one of these architectures will win out in the long run, and the teams will also converge, allowing AMD to consolidate its engineers on improving the same architecture.
We won't know what the consolidated team will release yet. But it's a big organizational shift that surely will affect AMD's architectural decisions.
https://gpuopen.com/download/RDNA_Architecture_public.pdf
I've been showing this one to people for a few years as a good introduction on how RDNA diverged from GCN->CDNA.
The main thing they did was change where wavefront steps (essentially, quasi-VLIW packets) execute. Instead of executing at the head of the pipeline (which owns 4x SIMD16 ALUs = 64 items), which requires running 64 threads concurrently (thus 64x the registers/LDS/etc. space), RDNA issues non-blocking segments of the packet into per-ALU sub-pipelines. That requires far fewer concurrent threads to maintain peak performance (and, in many cases, far fewer concurrent registers for intermediates that don't leave the packet).
GCN is optimized for workloads with low instruction-level parallelism but high data parallelism. Nvidia, since the dawn of its current architecture family tree, has been optimized for high instruction-level parallelism, but not for simple, massively parallel workloads. RDNA is optimized to handle both the GCN-optimal and Nvidia-optimal cases.
Since this document was written, RDNA has also been removing the roadblocks to improving performance on this fundamental difference. RDNA4, the one that just came out, enlarged the packet processing queue so it can schedule more packets in parallel and more packet segments into their per-ALU slots, and that is probably the most influential change: in software that performed badly on all GPUs (GCN, previous RDNA, anything Nvidia), a 9070XT can perform like a 7900XTX at two-thirds the watts and two-thirds the dollars.
While CDNA has been trading blows with Nvidia's offerings since its name change, RDNA has eradicated the gap in gaming performance. Nvidia functionally doesn't have a desktop product below a 5090 now, and early 60-series rumors aren't spicy enough to make me think Nvidia has an answer in the future, either.
CDNA runs 64-wide wavefronts (64 work items per wave). And CDNA1, I believe, even executed them on 16 lanes over 4 clock ticks (i.e. the minimum latency of every operation, even add or xor, was 4 clock ticks). It looks like CDNA3 might not do that anymore, but that's still a lot of differences...
RDNA actually executes 32-at-a-time and per clock tick. It's a grossly different architecture.
That doesn't even get to Infinity Cache, 64-bit support, AI instructions, Raytracing, or any of the other differences....
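The issue-cadence difference described above can be put into rough numbers. A minimal Python sketch, assuming a simplified model in which one vector instruction for a wavefront occupies a SIMD unit for wave_size / simd_lanes cycles; it ignores pipeline depth, dual issue and RDNA's optional wave64 mode, and all function names are hypothetical:

```python
import math


def cycles_per_instruction(wave_size: int, simd_lanes: int) -> int:
    """Cycles one SIMD unit spends issuing a single instruction for a whole wave."""
    return math.ceil(wave_size / simd_lanes)


def dependent_chain_cycles(n_ops: int, wave_size: int, simd_lanes: int) -> int:
    """Lower bound for n dependent ops from one wave if no other wave hides latency."""
    return n_ops * cycles_per_instruction(wave_size, simd_lanes)


def vgpr_bytes_per_wave(wave_size: int, regs_per_lane: int) -> int:
    """Register-file space one wave occupies with 32-bit (4-byte) registers per lane."""
    return wave_size * regs_per_lane * 4


# (wave size, SIMD lanes) under the toy model above.
CONFIGS = {
    "GCN/CDNA1-style wave64 on SIMD16": (64, 16),
    "RDNA-style wave32 on SIMD32": (32, 32),
}

for name, (wave, lanes) in CONFIGS.items():
    print(f"{name}: {cycles_per_instruction(wave, lanes)} cycle(s) per instruction, "
          f">= {dependent_chain_cycles(10, wave, lanes)} cycles for 10 dependent adds, "
          f"{vgpr_bytes_per_wave(wave, 32)} bytes of registers at 32 VGPRs per lane")
```

Under this toy model, a chain of dependent adds from one wave finishes four times sooner on the RDNA-style layout, and each wave needs half the register space.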
It seems that we are at the stage where incremental improvements in graphics will require exponentially more computing capability.
Or the game engines have become super bloated.
Edit: I stand corrected. In previous cycles we had orders-of-magnitude improvements in FLOPS.
Don't forget that one reason studios tend to favour consoles has been uniform hardware, and that is no longer the case.
When middleware becomes the default option, it is relatively hard to have hardware-specific game features.
This doesn’t affect me too much since my backlog is long and by the time I play games, they’re old enough that current hardware trivializes them, but it’s disappointing nonetheless. It almost makes me wish for a good decade or so of performance stagnation to curb this behavior. Graphical fidelity is well past the point of diminishing returns at this point anyway.
Compare PS1 with PS3 (just over 10 years apart).
PS1: 0.03 GFLOPS (approx given it didn't really do FLOPS per se) PS3: 230 GFLOPS
Over 7,000x faster.
Now compare PS4 with PS5 pro (also just over 10 years apart):
PS4: ~2TFLOPS PS5 Pro: ~33.5TFLOPS
Roughly 17x faster. So the speed of improvement has fallen dramatically.
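Taking the quoted figures at face value, the gap is easier to see as a compound yearly growth rate. A small Python sketch; the FLOPS values are the ones quoted above, and the spans (PS1 1994 to PS3 2006, PS4 2013 to PS5 Pro 2024) are approximate launch years:

```python
def speedup(old_gflops: float, new_gflops: float) -> float:
    """Ratio of new to old peak throughput."""
    return new_gflops / old_gflops


def annual_growth(ratio: float, years: float) -> float:
    """Compound yearly growth factor that produces `ratio` over `years`."""
    return ratio ** (1.0 / years)


# Figures in GFLOPS, as quoted in the comment.
ps1_to_ps3 = speedup(0.03, 230.0)
ps4_to_ps5pro = speedup(2000.0, 33500.0)

print(f"PS1 -> PS3:     {ps1_to_ps3:,.0f}x, ~{annual_growth(ps1_to_ps3, 12):.2f}x per year")
print(f"PS4 -> PS5 Pro: {ps4_to_ps5pro:.2f}x, ~{annual_growth(ps4_to_ps5pro, 11):.2f}x per year")
```

With these inputs the first jump is roughly 7,700x (about 2.1x per year) and the second roughly 17x (about 1.3x per year).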
Arguably you could say the real drop in optimization happened in that PS1 -> PS3 era: everything went from hand-optimized assembly code to running (generally) higher-level languages and using abstracted graphics frameworks like DirectX and OpenGL. It's just that no one noticed, because we had 1000x the compute to make up for it :)
Consoles/games got hit hard by first crypto and now AI needing GPUs. I suspect if it wasn't for that we'd have vastly cheaper and vastly faster gaming GPUs, but when you were making boatloads of cash off crypto miners and then AI, I suspect the rate of progress fell dramatically for gaming at least (most of the innovation, I suspect, went more into high VRAM/memory controllers and datacentre-scale interconnects).
Like all this path tracing/ray tracing stuff, yes it is very cool and can add to a scene but most people can barely tell it is there unless you show it side by side. And that takes a lot of compute to do.
We are polishing an already very polished rock.
I agree that 10x doesn't move much, but that's sort of my point - what could be done with 1000x?
One potential forcing factor may be the rise of iGPUs, which have become powerful enough to play many titles well while remaining dramatically more affordable than their discrete counterparts (and sometimes not carrying crippling VRAM limits to boot), as well as the growing sector of PC handhelds like the Steam Deck. It’s not difficult to imagine that iGPUs will come to dominate the PC gaming sphere, and if that happens it’ll be financial suicide to not make sure your game plays reasonably well on such hardware.
And given most of these assets are human-made (well, until very recently), this requires more and more artists. So I wonder if games studios are more just art studios with a bit of programming bolted on, vs before, with lower-res graphics, where you maybe had one artist for 10 programmers; now it is more flipped the other way. I feel that at some point over the past ~decade we hit an "organisational" wall with this, and very, very few studios can successfully manage teams of hundreds (thousands?) of artists effectively.
Many AAA engines' number-one focus isn't "performance at all costs", it's "how do we most efficiently let artists build their vision". And efficiency isn't runtime performance; efficiency is how much time it takes for an artist to create something. Performance is only a goal insofar as it frees artists from being limited by it.
> So I wonder if games studios are more just art studios with a bit of programming bolted on.
Not quite, but the ratio is very much in favor of artists compared to 'the old days'. Programming is still a huge part of what we do. It's still a deeply technical field, but "programming workflows" are often a lower priority than "artist workflows" in AAA engines, because with far more artists than programmers on any one project, artist time is the more expensive resource.
Just go look at the credits for any recent AAA game. Look at how many artist positions there are compared to programmer positions and it becomes pretty clear.
It used to be that the technology tended to drive the art. Nowadays the art drives the tech. We only need to look at all the advertised features of UE5 to see that. Nanite allows artists to spend less time tweaking LODs and optimizing meshes, as well as flattening the cost of small-triangle rendering. Lumen gives us realtime global illumination everywhere, so artists don't have to spend a million hours baking multiple light maps. MegaLights lifts restrictions on the number of dynamic lights and shadows a lighting artist can place in the scene. The new Nanite foliage shown off in The Witcher 4 demo allows foliage artists to go ham with modeling their trees.
If “realistic” graphics are the objective though, then yes, better displays pose serious problems. Personally I think it’s probably better to avoid art styles that age like milk, though, or to go for a pseudo-realistic direction that is reasonably true to life while mixing in just enough stylization to scale well and not look dated at record speeds. Japanese studios seem pretty good at this.
It's no wonder nothing comes out in a playable state.
This feels very out of touch since AMD's latest GPU series is specialized in gaming only, to the point where they sell variants with 8GB, which is becoming a bit tight if you want to play modern games.
Maybe / kind of. Consoles in the PS1/N64 era were not running optimised assembly code. The 8-bit and 16-bit machines were.
As for DirectX / OpenGL / Glide, they actually massively improved performance over running stuff on the CPU. You only used software rendering if you had a really low-performance GPU. Just look at Quake running in software vs Glide: performance easily doubles on a Pentium-based system.
> Consoles/games got hit hard by first crypto and now AI needing GPUs. I suspect if it wasn't for that we'd have vastly cheaper and vastly faster gaming GPUs, but when you were making boatloads of cash off crypto miners and then AI I suspect the rate of progress fell dramatically for gaming at least (most of the the innovation I suspect went more into high VRAM/memory controllers and datacentre scale interconnects).
The PC graphics card market got hit hard by those. Console markets were largely unaffected. There are many reasons why performance has stagnated. One of them, I would argue, is the use of the Unreal 4/5 engine. Every game that runs on either of these engines has significant performance issues. Just look at Star Wars Jedi: Survivor and the previous game, Star Wars Jedi: Fallen Order. Both games run poorly even on a well-specced PC, and even on my PS5. It doesn't really matter though, as Jedi: Survivor sold well, and I think Fallen Order also sold well.
The PS5 is basically a fixed PS4 (I've owned both). They put a lot of effort into reducing loading times on the PS5. Loading times on the PS4 were painful and far longer than on the PS3 (even for games loading from Blu-ray). This was something Sony was focusing on: every presentation about the PS5 talked about the new NVMe drive, the external drive and the requirements for it.
The other reason is that the level of graphical fidelity achieved from the mid-2000s to the early 2010s is good enough. A lot of the reason some games age worse than others is their art style rather than their graphical fidelity. Many of the highest-earning games don't have state-of-the-art graphics, e.g. Fortnite prints cash and the graphics are pretty bad IMO.
Performance and Graphics just isn't the focus anymore. It doesn't really sell games like it used to.
Even Naughty Dog went with their own LISP engine for optimization versus ASM.
And this thread comes full circle: Mark Cerny actually significantly improved the performance of my original version of the Crash collision detection R3000 code. His work on this code finally made it fast enough, so it’s a really good thing he was around to help out. Getting the collision detection code correct and fast enough took over 9 months; it was very difficult on the PS1 hardware, and ended up requiring use of the weird 2K static RAM scratchpad Sony included in place of the (removed) floating point unit.
GOOL was mainly used for creature control logic and other stuff that didn’t have to be optimized so much to be feasible. Being able to use a lisp dialect for a bunch of the code in the game saved us a ton of time. The modern analogue would be writing most of the code in Python but incorporating C extensions when necessary for performance.
Andy made GOAL (the successor lisp to GOOL) much more low-level, and it indeed allowed coding essentially at the assembly level (albeit with lispy syntax). But GOOL wasn’t like this.
Additionally, I have written my own PSX software as well as reviewed plenty of contemporaneous PSX software. While many have some bit of assembler, it's usually specifically around the graphics pipeline. About 90+% of all code is C. This is in line with interviews from developers at the time, as well.
The point wasn't that ASM wasn't used at all (in fact, I specifically acknowledged it in my original post); it was that the PSX was in an era past the time when entire codebases were hand-massaged/tuned assembler (e.g. "the 16-bit era" and before).
My understanding is that the mental model of programming in the PS2 era was originally still very assembly-like outside of a few places (like Naughty Dog), and that GTA3 on PS2 made possibly its biggest impact by showing it's not necessary.
The vast majority of PSX games were done completely in C, period. Some had small bits of asm here and there, but so do the occasional modern C/C++ apps.
To your last point, before there was GOAL there was GOOL (from the horse's mouth itself):
https://all-things-andy-gavin.com/tag/lisp-programming/
And it was used in all of Naughty Dog's PSX library.
Because, outside of ports from PC, a large number of console game developers at the time had a lot of experience programming earlier consoles, which involved a lot more assembly-level coding. GTA3 proved that a "PC-style" engine was good enough despite the Emotion Engine's design.
It didn't help that the PS2 was very much oriented towards assembly coding at a pretty low level, because getting the most out of the hardware involved writing code for the multiple coprocessors to work somewhat in sync. For GOAL, at least, this was done by implementing special support for writing the assembly code inline with the rest of the code (because, IIRC, not all of the assembly involved was executed from the same instruction stream).
As for GOOL, it was the much more classic approach (used by ND on PS3 and newer consoles too): a core engine in C and a "scripting" language on top to drive gameplay.
You could read that in pretty much any book about C until the mid-'00s. C was called a "portable assembler" for the longest time because it went against the grain of ALGOL, Fortran, Pascal, etc. by encouraging the use of pointers and being close to the machine. That's why it only remains viable in embedded development these days.
I've written C on the PSX, using contemporaneous SDKs and tooling, and I've reviewed source code from games at the time. There's nothing assembler about it, at least not more so than any systems development done then or today. If you don't believe me, there are plenty of retail PSX games that accidentally released their own source code that you can review yourself:
https://www.retroreversing.com/source-code/retail-console-so...
You're just arguing for the sake of arguing at this point and, I feel, being intellectually dishonest. Believe what you'd like to believe, or massage the facts how you like; I'm not interested in chasing goal (heh) posts.
I am a bit uncomfortable with the performance/quality stuff that people have set up, but I personally feel that the quality floor for perf is way higher than it used to be. Though there seem to be fewer people parking themselves at "60fps locked", which felt like a thing for a while.
That said, you should read that I did not say Moore’s Law was entirely dead. It is dead for SRAM and IO logic, but is still around for compute logic. However, pricing is shooting upward with each die shrink far faster than it did in the past.
Your loss.
As brownie points, keep the GPU busy as well, rather than twiddling its thumbs while keeping the GUI desktop going.
Even more points if the CPU happens to have an NPU or integrated FPGA, and you manage to also keep them going alongside those 32 cores and the GPU.
Switching an iter to par_iter does this. So long as there are enough iterations to work through, it'll exhaust 1024 cores or more.
> all the time, not only at peak execution.
What are you doing that keeps a desktop or phone at 100% utilization? That kind of workload exists in datacenters, but end user devices are inherently bursty. Idle when not in use, race to idle while in use.
> As brownie points, keep the GPU busy as well... Even more points if the CPU happens to have a NPU or integrated FPGA
In a recent project I serve a WASM binary from an ESP32 via Wifi / HTTP, which makes use of the GPU via WebGL to draw the GUI, perform CSG, calculate toolpaths, and drip feed motion control commands back to the ESP. This took about 12k lines of Rust including the multithreaded CAD library I wrote for the project, only a couple hundred lines of which are gated behind the "parallel" feature flag. It was way less work than the inferior C++ version I wrote as part of the RepRap project 20 years ago. Hence my stance that software has become increasingly sophisticated.
https://github.com/timschmidt/alumina-firmware
https://github.com/timschmidt/alumina-ui
https://github.com/timschmidt/csgrs
What's your point?
Most consumer software even less, which is why you'll hardly see a computer at the shopping mall with more than 16 cores; on average, most shops will have something between 4 and 8.
Also a reason why systems with built-in FPGAs failed in the consumer market, specialised tools without consumer software to help sell them.
If your workload demands 24/7 100% CPU usage, Epyc and Xeon are for you. There you can have multiple sockets with 256 or more cores each.
> Most consumer software even less
And yet, even in consumer gear which is built to a minimum spec budget, core counts, memory capacity, pcie lanes, bus bandwidth, IPC, cache sizes, GPU shaders, NPU TOPS, all increasing year over year.
> systems with built-in FPGAs failed in the consumer market
Talk about niche. I've never met an end user with a use for an FPGA or the willingness to learn what one is. I'd say that has more to do with it. Write a killer app that regular folks want to use that requires one, and they'll become popular. Rooting for you.
Cyberpunk is a good example of a game that straddled the in-between; many of its performance problems on the PS4 were due to constrained serialization speed.
Nanite and games like FF16 and Death Stranding 2 do a good job of drawing complex geometry and textures that wouldn't have been possible on the previous generation.
It’s also completely optional in Unreal 5. You use it if it’s better. Many published UE5 games don’t use it.
With a small RAM space and a hard CPU/GPU split (so no reallocation), all fed by a slow HDD which is itself fed by an even slower Blu-ray disc, you are sitting around for a while.
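A slow disc feeding a slow HDD feeding RAM is a streaming-pipeline bottleneck: the chain is only as fast as its slowest stage. A back-of-the-envelope Python sketch; every bandwidth and size below is a hypothetical, illustrative figure rather than a measured spec, and seeks and decompression are ignored:

```python
def effective_throughput(stage_mb_s: list[float]) -> float:
    """Sustained rate of a chain of streaming stages: the slowest stage wins."""
    return min(stage_mb_s)


def fill_seconds(size_mb: float, stage_mb_s: list[float]) -> float:
    """Time to stream size_mb through the chain (ignores seeks and decompression)."""
    return size_mb / effective_throughput(stage_mb_s)


# Hypothetical figures for illustration only.
RAM_POOL_MB = 256.0  # assumed size of one memory pool in a split design
DISC_MB_S = 9.0      # assumed optical drive rate
HDD_MB_S = 60.0      # assumed hard-drive rate

print(f"streamed from disc via HDD: {fill_seconds(RAM_POOL_MB, [DISC_MB_S, HDD_MB_S]):.1f} s")
print(f"already installed on HDD:   {fill_seconds(RAM_POOL_MB, [HDD_MB_S]):.1f} s")
```

Installing to the HDD first removes the slowest stage from the chain, which is one reason games of that era leaned on mandatory installs.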
So there literally were no "loading" times for these assets. This might not even be realistically possible with NAND flash based SSDs, e.g. because of considerations like latency.
Though directly accessing ROM memory would also prevent things like texture block compression I believe.
The only thing I value is a consistent stream of frames on a console.
From PS5 Pro reveal https://youtu.be/X24BzyzQQ-8?t=172
I wonder if players of single player action/adventure games make the same choice. Those games are played less (can be finished in 10-30 hours instead of endlessly) so the statistics might be skewed to favor performance mode.
Anecdotally, I do. Because modern displays are horrible blurry messes at lower framerates. I don't care about my input latency, I care about my image not being a smear every time the camera viewport moves.
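That smear is usually explained by sample-and-hold persistence: the eye tracks motion smoothly while each frame stays static on screen for the whole frame time, so the image smears across the retina. A rough Python sketch, assuming full persistence (no strobing or black-frame insertion) and a hypothetical pan speed; the function name is illustrative:

```python
def hold_blur_px(pan_speed_px_s: float, fps: float, persistence: float = 1.0) -> float:
    """Approximate smear width when the eye tracks motion on a sample-and-hold
    display: each frame stays lit for persistence / fps seconds while the eye
    keeps moving at pan_speed_px_s."""
    return pan_speed_px_s * persistence / fps


PAN_SPEED = 1920.0  # hypothetical: one screen width per second on a 1080p display

for fps in (30, 60, 120):
    print(f"{fps:>3} fps: ~{hold_blur_px(PAN_SPEED, fps):.0f} px of smear")
```

Halving the frame time halves the smear, and so does halving persistence at a fixed frame rate, which is what strobed backlights aim for.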
I'm sure it would have been even more successful with modern 60 FPS, but that difference couldn't have been very large, because other 60 FPS games did exist back then as well, mostly without being nearly as popular.
Excessively high detail models require extra artist time too.
The path nowadays is to use all kinds of upscaling and temporal detail junk that is actively recreating late 90s LCD blur. Cool. :(
There have been a few decent sized games, but nothing at grand scale I can think of, until GTA6 next year.
For graphics, I agree it looks like diminishing returns.
https://www.gamespot.com/gallery/console-gpu-power-compared-...
"Bloated" might be the wrong word to describe it, but there's some reason to believe that the dominance of Unreal is holding performance back. I've seen several discussions about Unreal's default rendering pipeline being optimized for dynamic realtime photorealistic-ish lighting with complex moving scenes, since that's much of what Epic needs for Fortnite. But most games are not that and don't make remotely effective use of the compute available to them because Unreal hasn't been designed around those goals.
TAA (temporal anti-aliasing) is an example of the kind of postprocessing effect that gamedevs are relying on to recover performance lost in unoptimized rendering pipelines, at the cost of introducing ghosting and loss of visual fidelity.
Your other options for AA are
* Supersampling. Rendering the game at a higher resolution than the display and downscaling it. This is incredibly expensive.
* MSAA. This samples surfaces more than once per pixel, smoothing over jaggies. This worked really well back before we started covering every surface with pixel shaders. Nowadays it just makes pushing triangles more expensive with very little visual benefit, because the pixel shaders still run at 1x scale and are thus still aliased.
* Post-process AA (FXAA, SMAA, etc.). These are post-process shaders applied to the whole screen after the scene has been fully rendered. They often just use a cheap edge-detection algorithm and try to blur the edges they find. I've never seen one that was actually effective at producing a clean image, as they rarely catch all the edges and do almost nothing to alleviate shimmering.
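The "detect edges, then blur them" idea behind post-process AA fits in a few lines. A deliberately crude Python sketch on a grayscale image; it is not FXAA or SMAA (which use directional edge searches and much smarter blending), and the function name and threshold are hypothetical:

```python
Image = list[list[float]]  # grayscale luma values in [0, 1]


def postprocess_aa(img: Image, threshold: float = 0.25) -> Image:
    """Blend high-contrast pixels with their 4-neighbour average.

    It only sees the finished image, so it can soften stair-steps but cannot
    recover sub-pixel detail that was never rendered (hence shimmering)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            neighbours = [
                img[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < h and 0 <= nx < w
            ]
            local = neighbours + [img[y][x]]
            if max(local) - min(local) > threshold:  # crude edge detection
                out[y][x] = sum(local) / len(local)   # blur across the edge
    return out


# A hard vertical edge: dark on the left, bright on the right.
aliased = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
smoothed = postprocess_aa(aliased)
for row in smoothed:
    print(row)
```

The two columns touching the edge get pulled toward each other while flat regions are left alone; a thin feature that falls between pixels in one frame and onto them in the next would still flicker, since nothing here looks across frames.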
I've seen a lot of "tech" YouTubers try to claim TAA is a product of lazy developers, but not one of them has been able to demonstrate a viable alternative antialiasing solution that solves the same problem set with the same or better performance. Meanwhile TAA and its various derivatives like DLAA have only gotten better in the last 5 years, alleviating many of the problems TAA became notorious for in the late 2010s.
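TAA's core trick is temporal accumulation: jitter the sample position every frame and blend each new sample into a history buffer. A deliberately tiny Python sketch for a single pixel in 1D, assuming a fixed blend factor and no motion-vector reprojection or history clamping (real implementations depend on both to limit ghosting); all names are hypothetical:

```python
def jitter_offsets(n: int) -> list[float]:
    """n sub-pixel positions in [0, 1), visited in bit-reversed order so that
    consecutive frames spread across the pixel (n must be a power of two)."""
    bits = n.bit_length() - 1
    order = [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]
    return [(k + 0.5) / n for k in order]


def taa_resolve(coverage_per_frame: list[float], alpha: float = 0.1,
                n_jitter: int = 8) -> list[float]:
    """Blend one jittered point sample per frame into a history value:
    history = (1 - alpha) * history + alpha * sample.
    `coverage` is the fraction of the pixel's width covered by a bright
    surface (a 1D toy model); there is no reprojection or history rejection."""
    offsets = jitter_offsets(n_jitter)
    history = 0.0
    resolved = []
    for frame, coverage in enumerate(coverage_per_frame):
        sample = 1.0 if offsets[frame % n_jitter] < coverage else 0.0
        history = (1.0 - alpha) * history + alpha * sample
        resolved.append(history)
    return resolved


# A static edge covering 3/8 of the pixel for 400 frames, then it moves away.
frames = [0.375] * 400 + [0.0] * 60
resolved = taa_resolve(frames)
steady = resolved[392:400]  # one full jitter cycle
print(f"average over last cycle: {sum(steady) / len(steady):.3f} (true coverage 0.375)")
fade = next(i for i, v in enumerate(resolved[400:]) if v < 0.05)
print(f"frames of ghosting after the edge leaves: {fade}")
```

With a static edge, the history settles around the true sub-pixel coverage, which is the anti-aliasing. When the edge leaves, the stale history takes more than a dozen frames to fade below 5%, a toy version of the ghosting the comments describe.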
It's more similar to supersampling, but without the higher pixel shader cost (the pixel shader still only runs once per "display pixel", not once per "sample" like in supersampling).
A pixel shader's output is written to multiple (typically 2, 4 or 8) samples, with a coverage mask deciding which samples are written (this coverage mask is all 1s inside a triangle and a combo of 1s and 0s along triangle edges). After rendering to the MSAA render target is complete, an MSAA resolve operation is performed which merges samples into pixels (and this gives you the smoothed triangle edges).
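The coverage-mask-and-resolve flow just described can be modelled for a single pixel. A minimal Python sketch, assuming a 4x ordered-grid sample pattern, a flat-shaded triangle and a simple box-filter resolve; the names are hypothetical, and real GPUs do this in fixed-function hardware:

```python
from typing import Callable

Inside = Callable[[float, float], bool]

# 4 sample positions inside a unit pixel. A 2x2 ordered grid is an assumption;
# real hardware uses rotated or standard sample patterns.
SAMPLES_4X = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]


def coverage_mask(inside: Inside,
                  samples: list[tuple[float, float]] = SAMPLES_4X) -> list[bool]:
    """Which of the pixel's samples the triangle covers."""
    return [inside(x, y) for x, y in samples]


def msaa_pixel(background: float, tri_color: float, inside: Inside) -> tuple[float, int]:
    """Shade once, write the color to covered samples, then resolve by averaging.

    Returns (resolved value, number of pixel-shader invocations)."""
    mask = coverage_mask(inside)
    samples = [background] * len(mask)
    invocations = 0
    if any(mask):
        color = tri_color  # stands in for a single pixel-shader invocation
        invocations = 1
        samples = [color if covered else s for covered, s in zip(mask, samples)]
    return sum(samples) / len(samples), invocations


def left_half(x: float, y: float) -> bool:
    """A triangle edge running vertically through the middle of the pixel."""
    return x < 0.5


value, calls = msaa_pixel(background=0.0, tri_color=1.0, inside=left_half)
print(f"resolved edge pixel: {value}, shader invocations: {calls}")

# Full-screen cost at 4x: SSAA shades every sample, MSAA roughly every pixel
# (more along edges, where several triangles each shade the same pixel once).
w, h, n = 1920, 1080, 4
print(f"4x SSAA: {w * h * n:,} shader runs; 4x MSAA: ~{w * h:,}")
```

The edge pixel resolves to a 50/50 blend while the shader runs once, whereas supersampling at the same sample count would run it four times. The flip side is that anything the shader itself aliases (specular highlights, high-frequency textures) stays aliased.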
In the past, MSAA worked reasonably well, but it was relatively expensive, didn't address all forms of high-frequency aliasing, and it doesn't work anymore with the modern rendering paradigm anyway.
The games industry has spent the last decade adopting techniques that misleadingly inflate the simple, easily-quantified metrics of FPS and resolution, by sacrificing quality in ways that are harder to quantify. Until you have good metrics for quantifying the motion artifacts and blurring introduced by post-processing AA, upscaling, and temporal AA or frame generation, it's dishonest to claim that those techniques solve the same problem with better performance. They're giving you a worse image, and pointing to the FPS numbers as evidence that they're adequate is focusing on entirely the wrong side of the problem.
That's not to say those techniques aren't sometimes the best available tradeoff, but it's wrong to straight-up ignore the downsides because they're hard to measure.
Fully dynamic interactive environments are liberating. Pursuing them is the right thing to do.
> Fully dynamic interactive environments are liberating. Pursuing them is the right thing to do.
Great video from Digital Foundry that goes into that (for Doom: The Dark Ages).
I mean, look at Uncharted, Tomb Raider, Spider-Man, God of War, TLOU, HZD, Ghost of Tsushima, Control, Assassin's Creed, Jedi: Fallen Order / Survivor. Many of those games were not made in Unreal, but they're all stylistically well suited to what Unreal is doing.
So, Mark Cerny is contributing to the next Xbox? In the end, all consoles today are basically PCs with different frontends and storefronts (and that is also opening up, starting with Xbox, but PS will probably follow eventually).
I feel like finally they are turning the corner on software and drivers.