Posted by allenleee 19 hours ago

RTX 5090 and M4 MacBook Air: Can It Game? (scottjg.com)
613 points | 145 comments
matthewfcarlson 18 hours ago|
I have been bothering the VM team for years about VM GPU passthrough. I worked on the Apple Silicon Mac Pro, and it would have made way more sense if you could run a Linux VM and pass through the GPU that goes inside the case!

Sadly, as you can tell, they have not taken me up on my requests. Awesome that other people got it working!

m132 18 hours ago||
It looks like the pass through part here was implemented using standard DriverKit interfaces, if I'm not mistaken. That is, the PCIe BAR can already be mapped from user space, without any extra modifications to macOS. It's just a matter of VMMs, such as QEMU, adopting this interface in addition to Linux VFIO and the like (unless you're talking about Virtualization.framework, which is kind of a VMM of its own).

What exactly do you feel macOS is missing?

anp 17 hours ago|||
I’m not very familiar with the specifics of pass through but IIUC only being able to map 1.5GB of active DMA buffers at a time is pretty limiting.
monocasa 11 hours ago||||
Isn't driverkit essentially a separate user space stack compared to regular code? I remember seeing the driverkit-specific dyld caches in macOS root partition images that included their own copies of everything down to libsystem. Getting driverkit code to run in the same process as normal user code seems like it'd be quite an uphill battle.

Presumably with the right entitlements you can just hit the same (presumably IOKit) syscalls that driverkit does. But that's an extra layer of reverse engineering, and you're not really using driverkit anymore.

scottjg 10 hours ago||
it is a separate stack, but that probably doesn't matter much. a user process (in my case, qemu) can communicate with a driverkit driver. the user process can also map memory through the driver, which is how this pci passthrough system works.

i don't think the issues with the project really are specific to driverkit.
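
roughly, the flow from the user-process side looks like this. a minimal sketch with made-up names (the driver class name and memory type index are illustrative, not the project's actual interface):

    // illustrative sketch: map a dext-exported PCIe BAR into a user process.
    // "VfioDext" and memoryType 0 are placeholders, not the real project API.
    #include <stdio.h>
    #include <stdint.h>
    #include <mach/mach.h>
    #include <IOKit/IOKitLib.h>

    int main(void) {
        io_service_t svc = IOServiceGetMatchingService(
            kIOMainPortDefault, IOServiceMatching("VfioDext"));
        if (svc == IO_OBJECT_NULL) { fprintf(stderr, "dext not found\n"); return 1; }

        io_connect_t conn;
        if (IOServiceOpen(svc, mach_task_self(), 0, &conn) != KERN_SUCCESS) return 1;

        // the dext decides which IOMemoryDescriptor each memoryType refers to;
        // pretend type 0 is BAR0 here.
        mach_vm_address_t addr = 0;
        mach_vm_size_t size = 0;
        if (IOConnectMapMemory64(conn, 0, mach_task_self(),
                                 &addr, &size, kIOMapAnywhere) == KERN_SUCCESS) {
            volatile uint32_t *bar0 = (volatile uint32_t *)addr;
            printf("BAR0 at %p, %llu bytes, first dword 0x%08x\n",
                   (void *)addr, (unsigned long long)size, bar0[0]);
        }
        IOServiceClose(conn);
        return 0;
    }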

mikae1 15 hours ago|||
>> This project requires a special entitlement from Apple. I’ve requested it, and heard they may be open to granting it, but I have not yet heard back, and I’m told that the wait time could be months.

> I have been bothering the VM team for years for VM GPU pass through.

Good luck. I'm sure they're keen on giving people access to this so that people can spend their money on NVIDIA GPUs instead of buying more expensive Macs. :)

Would of course be awesome, but I'd be very surprised if it happened.

codebje 12 hours ago|||
There isn't a more expensive Mac option to buy if what you're after is a gaming GPU. It's more likely that the VM team sees this as a very low-benefit ticket to pursue, given the tiny segment of Mac gamers hoping to improve their options with a Linux VM for gaming.

(Meanwhile, I'm recompiling Wine to see if I can patch it to address an issue that was hotfixed in Proton two weeks ago but isn't in a CrossOver build yet, so yeah, there are maybe some arguments to be made here that I'd be a potential beneficiary. If I weren't too cheap to spring for an eGPU in today's market, anyway.)

m132 11 hours ago|||
The entitlement in question is the standard `com.apple.developer.driverkit.transport.pci` [0], required for anything that touches the PCIe bus [1]. Apple is generally restrictive with how much third-party applications can do on machines with SIP/"full security", so I'm not exactly surprised. It's not an Apple-private entitlement, however.
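
For reference, the entitlement's value is an array of PCI matching dictionaries in the dext's entitlements plist, along these lines (the NVIDIA vendor match shown here is purely illustrative):

    <key>com.apple.developer.driverkit.transport.pci</key>
    <array>
        <dict>
            <!-- match any NVIDIA device: vendor 0x10de, any device ID -->
            <key>IOPCIPrimaryMatch</key>
            <string>0x000010de&amp;0x0000ffff</string>
        </dict>
    </array>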

The VFIO-style driver made by the author of this also appears generic enough to support all kinds of PCIe devices, not just GPUs. Apple might find a way to weasel out of this ("hey, this is for hardware companies and you don't seem to be affiliated with one", "your driver requests too broad access", etc.) if there really is a conflict of interest, but so far, there's a chance it will just get rubber-stamped.

I can see them rejecting it for legitimate reasons, though, at least as far as "legitimate" with Apple goes. This driver is essentially a thin layer over PCIDriverKit, exposing all functionality that's supposed to be behind the entitlement to arbitrary applications, in similar fashion to WinRing0. They probably didn't come up with all this bureaucracy only to sign something like that in the end. We'll see what happens.

[0] https://github.com/scottjg/qemu-vfio-apple/blob/84ecdcf5db6b...

[1] https://developer.apple.com/documentation/pcidriverkit/creat...

scottjg 16 hours ago|||
two semi-interesting things to note around this:

1. Virtualization.framework seems to support some form of GPU passthrough from the host (granted, not eGPU - it's for the integrated GPU). I think the primary use case is having macOS guests get acceleration, while still sharing GPU time with the host. There is also a patch that recently hit QEMU mainline that supports using the "venus server" with virtio-gpu to provide similar functionality for Linux guests under Hypervisor.framework (rough invocation sketched after this list).

2. Apple internally has some kind of PCI Passthrough support available in Virtualization.framework. It seems like the code is shipped to customers in the framework, but it relies on some kind of kext or kernel component that isn't shipped in retail macOS. I can't say if that's intended to ever be released to customers, but clearly someone at Apple has thought about this feature.
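
for point 1, the venus path in a recent QEMU looks roughly like this (a sketch going by the QEMU docs; exact option availability depends on your build):

    qemu-system-aarch64 -machine virt,accel=hvf \
      -device virtio-gpu-gl,hostmem=8G,blob=true,venus=true \
      ... # plus the usual memory-backend/display options for a Linux guest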

m132 11 hours ago||
I experimented with booting Arm macOS 14-26 in QEMU a while back, building on the work of Alexander Graf for macOS 12-13, and reverse-engineered substantial parts of Hypervisor.framework, the in-kernel hypervisor, and a bit of Virtualization.framework. Got newer versions of Sequoia to boot past the login screen, with GPU acceleration too.

Unless there's another method I missed, the internal GPU "pass through" of Virtualization.framework you're thinking of might actually just be paravirtualization, at least that's what the name suggests. It's implemented in the public ParavirtualizedGraphics framework [0], though on Arm macOS the relevant interfaces are private [1]. I haven't looked that deeply into it per se, but while fixing bugs around it, I've run into a few clues suggesting that it's just a command stream + shared memory being passed around. It also uses its own generic driver on the guest side.

Great job, by the way! Love how authors of pieces like this casually come here to comment :)

[0] https://developer.apple.com/documentation/paravirtualizedgra...

[1] https://github.com/qemu/qemu/blob/edcc429e9e41a8e0e415dcdab6...

my123 1 hour ago|||
FYI: https://patchew.org/QEMU/20260324204855.29759-1-mohamed@unpr...

There's some randomness around Tahoe with FileVault: it can crash because the Data volume is detected as not encrypted (and that's not OK on bare metal). If you hit that case, you might need to enable FileVault inside the VM (and remember to sync the aux storage afterwards if that's not done).

scottjg 10 hours ago|||
thanks!

there also appears to be a generic pci passthrough path. we were discussing it on the qemu-devel list: https://lore.kernel.org/qemu-devel/C35B5E97-73F2-4A60-951B-B...

m132 10 hours ago||
Oh, thanks for letting me know, and for the upstreaming work too! I might join the party once I find some more time :)
caycep 17 hours ago|||
What are the chances there will be another Mac Pro in the future?

Will Apple ever make a computer that makes Siracusa happy? (and do you have the "Believe" shirt?)

pjmlp 17 hours ago|||
Never. A couple of years ago Apple gave up on the server market, which is why having Swift on Linux is so relevant for app developers.

Now they've given up on the workstation market, which really values having slots for a myriad of cards.

A Thunderbolt cable salad is only for those who miss the external expansions of the 8- and 16-bit home computer days.

That is clearly where Apple is focused nowadays, if you look back at the vertical integration before the PC clone market took off.

So now if you really need a workstation, it is either Windows, or one of those systems sold with Red Hat Enterprise/Ubuntu from IBM, Dell, or HP.

hedora 16 hours ago||
If you want a workstation, you are probably better off building it yourself, or having your local computer store do it. The primary exceptions are the AMD Strix Halo machines or the NVIDIA DGX Spark.

I haven’t seen a non-laughable workstation config from the big vendors since the dot-com bubble. Presumably they exist, I guess?

binarycrusader 15 hours ago|||
DISCLAIMER: Only speaking for myself, not employers or affiliates.

I've been pretty darn happy with the Puget Systems custom workstation I ordered last year before the memory craze started (especially since it has 192GiB of DDR5).

I also ordered another family member a custom "Tiki" system from Falcon Northwest and that has also been quite excellent from what I've seen and they've told me.

Now is obviously not the most economical time to order a new system, but when it is appropriate (and for what it's worth) I think those are two great system builders.

hedora 14 hours ago||
I wouldn’t count them as a big vendor, but I’ve only heard good things. Local shops around here charge like $99 to put a machine together, install an OS, and run burn-in testing. You get more choice than with an outfit like Puget, but less carefully tested part/cooling selection, etc.

The last I checked, the really big players tended to add value-add gimmicks (water cooling is a common one, custom PSU form factors are another) with reliability/compatibility issues. That’s the tier to avoid, not the Puget Systems of the world.

binarycrusader 6 hours ago|||
I picked both Puget Systems and Falcon Northwest because for the most part, both focus on pre-tested off-the-shelf parts with good reliability data from their own servicing.

My Puget Systems workstation for example has a simple AIO for cooling with some Noctua fans and a Fractal Design 7 XL full tower case.

The Tiki system I ordered for a family member from Falcon Northwest does have a custom case, but almost everything else is fairly standard inside. The super small form factor was important to them.

Could I have built either of these systems myself? Absolutely -- I've done that for at least the prior 20 years or so, and I've built dozens for employers, but it sure was nice to buy one that just worked this time instead of having to fiddle with memory sticks or find exactly the right BIOS settings for stability, etc.

I'm well aware of the premium I paid, but I can honestly say it has been incredibly nice to have a workstation that just works, without having to fiddle with BIOS updates or hardware. I also don't really have the time to spare, so I was entirely willing to trade funds for time.

fluoridation 13 hours ago|||
Non-standard parts are not about value-adding, they're about cost-cutting if you're feeling charitable, and about forcing vendor lock-in if you're not.
pjmlp 15 hours ago||||
Yes they exist, and businesses aren't building PCs from parts themselves.
esseph 15 hours ago||||
They get features that us plebs buying retail don't get, at prices the vast majority of us wouldn't pay if it were our own cash.
fragmede 15 hours ago|||
Just because you're cheap and don't value your time, doesn't mean they don't exist.
dwaite 14 hours ago||||
IMHO - extremely little.

It is too inefficient to design a machine which _might_ have two GPUs and a flock of additional drives installed into it. It just makes sense to instead design around having independent hardware in its own case, which can meet its own power/cooling needs. This has been a design goal since the trashcan Mac.

Having a PCIe bus increases bandwidth and reduces latency, but once you account for eGPU and for people who would be happy building custom solutions on platforms other than macOS, there's likely not enough identified market for a modular design.

crdrost 18 hours ago|||
It feels like half the problem in this blog post is dealing with memory access issues induced by QEMU and the VM boundary... it's probably something dumb I'm missing, but if you boot up Ubuntu in Docker, wouldn't the NVIDIA drivers still load? And then you wouldn't have to fight Apple about the memory management because OSX would still own the memory?
swiftcoder 18 hours ago|||
> but if you boot up Ubuntu in Docker, wouldn't the NVIDIA drivers still load?

Even if the drivers loaded, they can't talk to the GPU from within docker (unless one implements PCI passthrough). MacOS owns the PCI bus in this scenario.

smw 17 hours ago||||
docker on macos runs in a linux vm
jmalicki 18 hours ago|||
The problem is that the driver wants to own the memory.
brcmthrowaway 18 hours ago|||
I still believe the lack of NVIDIA GPU support in the Mac Pro will go down as one of the greatest missed opportunities in tech.

Anyway, the Mac Pro is dead now. There are only so many sales audio and video professionals can provide.

runjake 14 hours ago|||
There was some bad history between Apple and Nvidia. Perhaps with a new generation of leadership at Apple things might change.

https://www.reddit.com/r/hardware/comments/1hmgmuf/apples_hi...

mercutio2 10 hours ago|||
I wasn't in the room when it happened, but this is very different from the story told internally about why Apple became allergic to Nvidia.

Arguably more petty. SJ has been dead for almost 15 years now; I imagine the C-suite might get over it at some point.

kalleboo 2 hours ago||
> Arguably more petty

I can believe it. IIRC Jobs also snubbed ATI once after they leaked the GPUs going in the next PowerMac model.

firecall 10 hours ago|||
Maybe with Tim and Jensen going on holiday together in China, the relationship might be healed somewhat.

Things have moved on since the days where GPUs in Macs were a priority.

But then the AI race has changed things. So who knows - maybe we will one day see official eGPU support from Apple and new drivers from NVIDIA. Wouldn't put money on it, though...

Aurornis 18 hours ago||||
> I still believe the lack of NVIDIA GPU support in the Mac Pro will go down as one of the greatest missed opportunities in tech.

I don’t know about that. Apple supported some full-size GPUs in past product lines and the number of users was very small. Granted, LLMs change that demand, but the audience of Mac Pro buyers who would use a full-size GPU that is impossible to obtain is almost nothing compared to their laptop sales.

bigyabai 18 hours ago||
The audience for Mac Pro buyers is almost nothing, full stop. It failed to find a niche, and now Apple is getting rid of it: https://www.macrumors.com/2026/03/26/apple-discontinues-mac-...

Part of the reason the new Mac Pro failed to find an audience can definitely be blamed on macOS' hostility to third party hardware. Who knows what Apple would be worth if they beat Nvidia's Grace CPU to the datacenter market. It was certainly their opportunity.

pjmlp 16 hours ago|||
Yes, because they already moved on to workstations powered by either Windows or Red-Hat Linux/Ubuntu.

The only ones left were people like John Siracusa, who still hoped, to the very last minute, that Apple would change their mind.

brcmthrowaway 17 hours ago|||
True, they could do any number of things. But a datacenter play would appear quite random to investors and their core audience. Broadcom + Nvidia however...
trollbridge 17 hours ago||
Apple seems to be content to sell shovels in the AI gold rush.

Admittedly… what’s on my desk? A MacBook M4 Air, a Mac Studio, and there’s an x86 iMac in the corner.

What goes in the travel bag? A MacBook Pro or the Air.

Every time I look at buying something else the math doesn’t add up.

The 5090 sits in a commodity PC chassis. It’s not like I need a model running on my own computer.

pjmlp 17 hours ago||||
The missed opportunity is like with the server market: they're now giving the workstation market to Windows and Linux.

It isn't only audio and video.

jbverschoor 18 hours ago||||
I guess that little problem with the Nvidia chips overheating in the MacBook Pro didn’t give Apple a lot of confidence
bigyabai 17 hours ago||
The Mac Pro isn't a MacBook Pro. It has socketed PCI slots and should be able to support the user's hardware in macOS' software, regardless of how Apple feels.
xp84 14 hours ago||
Seriously, the decades-long grudge against Nvidia that we always hear about seems like the most ridiculous and immature business move. I expect that kind of thing from an individual, you know, “I NEVER fly American Airlines!!!”, but in business, a permanent ban on one of the two players in a market, the leader no less? I don’t get it.

Maybe it doesn’t matter that much now because they’ve literally exited all the businesses where an external GPU is going to matter. But sticking with AMD all that time out of spite is just wild.

Melatonic 14 hours ago|||
Audio and video professionals jumped ship around the time Apple canned all the pro software.
SilentM68 12 hours ago||
In your view why have they refused to implement a "Linux VM and pass through the GPU that goes inside the case?"
Aurornis 18 hours ago||
Excellent article.

The game benchmarks are fun but the LLM improvements are where this gets really interesting for practical use. I love Apple platforms as an approachable way to run local models with a lot of RAM, but their relatively slow prompt processing speed is often overlooked.

> Here you can see the big issue with Macs: the prompt processing (aka “prefill”) speed. It just gets worse and worse, the longer the prompt gets. At a 4K-token prompt, which doesn’t seem very long, it takes 17 seconds for the M4 MacBook Air to parse before we even start generating a response. Meanwhile, if you strap the eGPU to it, it’ll only take 150ms. It’s 120x faster.

The prefill problem goes unnoticed when you’re playing around with the LLM in small chats. When you start trying to use it for bigger pieces of work, the compute limit becomes the bottleneck.

The time to first token (TTFT) charts don’t look bad until you notice that they had to be shown on a logarithmic scale because the Mac platforms were so much slower than full GPU compute.

superlopuh 18 hours ago||
I'm curious and not an expert here, do you know why the TTFT is so much worse on Mac? To elaborate, the article just says that this step is compute bound, but I'm wondering whether it is just that simple or if it might also be less optimised in MLX?
Aurornis 17 hours ago|||
Prefill (prompt processing) is compute bound doing large matrix operations. Token generation (aka tokens/s) is memory bandwidth bound.

The RTX 5090 has an incredible amount of compute performance for matrix operations and a lot of memory bandwidth. The Apple Silicon parts have unusually high memory bandwidth for general purpose compute chips, which is why they can generate tokens so fast. Their raw matrix compute performance is amazing for their power envelope but not nearly as fast as a dedicated GPU consuming 400-500W.
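
Some napkin math shows the split (all numbers below are rough assumptions for an 8B model with 4-bit weights, not measurements):

    // rough roofline sketch: decode is bandwidth-bound, prefill is compute-bound.
    // every number here is an illustrative assumption, not a benchmark.
    #include <stdio.h>

    int main(void) {
        double params          = 8e9;          // assumed 8B-parameter model
        double bytes_per_param = 0.5;          // 4-bit quantized weights
        double flops_per_token = 2.0 * params; // ~2 FLOPs per weight per token

        double mem_bw  = 120e9; // ~M4 (non-Pro) unified memory bandwidth, B/s
        double compute = 4e12;  // rough usable GPU matrix FLOP/s on an M4

        // decode: every new token streams all the weights from memory once
        double decode_tps = mem_bw / (params * bytes_per_param);   // ~30 tok/s
        // prefill: weights are reused across the whole prompt, so the
        // ceiling is matrix throughput instead
        double prefill_tps = compute / flops_per_token;            // ~250 tok/s

        printf("decode  ~ %.0f tok/s (bandwidth limit)\n", decode_tps);
        printf("prefill ~ %.0f tok/s (compute limit)\n", prefill_tps);
        printf("4K prompt ~ %.0f s of prefill\n", 4096.0 / prefill_tps);
        return 0;
    }

Under those assumptions a 4K-token prompt needs roughly 16 seconds of prefill, right around the article's 17-second figure, while the 5090's tensor throughput raises the compute ceiling by a couple of orders of magnitude.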

Apple added tensor cores on the M5 generation which help with those matrix operations, which is why the M5 performs so much better than the M4 Max in that article.

Dedicated GPUs like the RTX 5090 are in another league, though.

You can see the divergence in the high resolution gaming benchmarks, too. Once he starts benchmarking at 4K or 6K where the CPU emulation stops being a bottleneck, the raw compute of the 5090 completely crushes any of the Apple Silicon GPUs.

ademeure 17 hours ago||||
Apple GPUs didn’t have tensor cores until the M5 (aka “a neural accelerator in each core”), and in the article’s charts an M5 Pro significantly beats an M4 Max (while in other workloads the gap would be much smaller, since a Pro is ~1/2 a Max).

EDIT: since Aurornis beat me by 3 minutes, I’ll add another interesting tidbit instead :)

NVIDIA tensor cores on consumer GPUs are massively less powerful per SM than on their datacenter counterparts (which also makes it easier to reach peak efficiency on consumer GPUs, because the rest of the pipeline becomes the bottleneck much more quickly, as per Amdahl’s Law).

This is potentially changing with Vera Rubin CPX, which looks an awful lot like an RTX 5090 replacement but with the full-blown datacenter tensor cores (which otherwise won’t be available unless you pay for the datacenter SKU) - so it will have very high TFLOPS relative to its bandwidth.

The target market for the CPX is exactly this: prefill and Time To First Token. You can basically just throw compute at the problem for (parts of) prefill performance (but it won’t help anything else past a certain point) and the 5090/M5 are nowhere near that limit.

So the design choice for NVIDIA/Apple/etc of how much silicon to spend for this on consumer GPUs is mostly dictated by economics and how much they can reuse the same chips for the different markets.

Melatonic 14 hours ago||
Does that include stuff like the Pro Blackwell 6000? Or are its tensor cores comparably good per SM? It performs quite well in many tests
aviinuo 13 hours ago||
Pro Blackwell 6000 is just a 5090 with more VRAM. It does not have the tcgen05 (5th gen tensor core) instructions despite the "5th gen tensor core" branding, and thus does not support any optimized Blackwell (sm100) kernels.

Every Blackwell card other than the (G)B100, (G)B200, (G)B300, and Jetson Thor uses the Ampere tensor core instruction (mma.sync), but with fp4/6/8 added on. Beyond that, the DGX Spark (which is advertised as having the same architecture as the B200) has especially weak (not tcgen05) tensor cores that have a very narrow operating window and low utilization.

mathisfun123 17 hours ago|||
> I'm curious and not an expert here, do you know why the TTFT is so much worse on Mac?

because the GPUs aren't as fantastic as everyone assumes?

> might also be less optimised in MLX?

prefill has gotta be one of the most optimized paths in MLX...

bigyabai 15 hours ago||
No you don't understand, on Apple Silicon my CPU has comparable memory bandwidth to a $400 Pascal-era GPU. With the unified memory architecture, that means my iGPU gets 2016-levels of DDR transfer speed with none of the upsides of CUDA. It's the most cutting-edge hardware ever put in a personal computer, without a doubt.
fgfarben 3 hours ago||
Please show me on the 2016-era $400 Pascal GPU where you can install the 256 GB of VRAM.
Moosdijk 17 hours ago|||
It feels pedantic to point it out, but it’s actually 113x faster.

Seeing the author present their results like this gives off the impression that they’re biased, which I am sure they aren’t.

scottjg 16 hours ago||
the exact numbers in the graph are 17019ms vs 142ms. so you're right, it's not 120x, it's 119.85x.
Moosdijk 15 hours ago||
That explains it. Thanks!
brcmthrowaway 14 hours ago||
Use oMLX. Qwen3.6 - 300tok/s PP, 30tok/s TG.
mercutio2 10 hours ago||
This is The Way.
djmips 17 hours ago||
> Because OpenGL is not well-supported anymore on macOS, the game is completely unplayable there, even with CrossOver. Ironically, it plays totally fine on a Windows PC, but this is a game you literally can’t play on Mac without this eGPU setup.

I understand that this is true, but it seems that Doom does support Vulkan; you would just need to add VK_NV_glsl_shader to MoltenVK. Probably much less work than what went into hanging an RTX 5090 off of an M4. Still, kudos to Scott, and the local AI inference speeds are pretty cool. What a crazy project! <applause>
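
A quick way to probe what MoltenVK reports today, as a rough sketch (the portability bits are needed when going through the Khronos loader on macOS):

    // sketch: check whether the Vulkan device (e.g. MoltenVK) advertises
    // VK_NV_glsl_shader, the extension Doom's Vulkan path wants.
    #include <stdio.h>
    #include <string.h>
    #include <vulkan/vulkan.h>

    int main(void) {
        VkApplicationInfo app = { .sType = VK_STRUCTURE_TYPE_APPLICATION_INFO,
                                  .apiVersion = VK_API_VERSION_1_1 };
        const char *iexts[] = { "VK_KHR_portability_enumeration" };
        VkInstanceCreateInfo ici = {
            .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
            .flags = VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR,
            .pApplicationInfo = &app,
            .enabledExtensionCount = 1,
            .ppEnabledExtensionNames = iexts,
        };
        VkInstance inst;
        if (vkCreateInstance(&ici, NULL, &inst) != VK_SUCCESS) return 1;

        uint32_t ndev = 1; // first device is enough for this check
        VkPhysicalDevice dev;
        vkEnumeratePhysicalDevices(inst, &ndev, &dev);
        if (ndev == 0) return 1;

        uint32_t n = 0;
        vkEnumerateDeviceExtensionProperties(dev, NULL, &n, NULL);
        VkExtensionProperties props[512];
        if (n > 512) n = 512;
        vkEnumerateDeviceExtensionProperties(dev, NULL, &n, props);

        for (uint32_t i = 0; i < n; i++)
            if (!strcmp(props[i].extensionName, "VK_NV_glsl_shader")) {
                puts("VK_NV_glsl_shader: supported");
                return 0;
            }
        puts("VK_NV_glsl_shader: not supported");
        return 0;
    }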

scottjg 10 hours ago||
interesting. that might be a fun intro project to MoltenVK. I hadn't dug into what was missing for Doom. I thought maybe the issue was that the intro/menu always ran in opengl mode or something. If it's just one missing op, that's way easier.
divbzero 18 hours ago||
This is pretty impressive. My impression was that eGPUs simply do not work with Apple Silicon.

(EDIT: Apple agrees with my impression. “To use an eGPU, a Mac with an Intel processor is required.” And, on top of that, the officially supported eGPUs were all AMD not NVIDIA. https://support.apple.com/en-us/102363)

steelbrain 14 hours ago|
This is not using an eGPU with macOS, i.e., you can't run Chrome on macOS with its GPU acceleration coming from this eGPU. This is tunneling that eGPU to a Linux VM.
geerlingguy 15 hours ago||
I came into the post thinking it would be running a VM through the slow tinygrad driver... but this is much, much better.

It'd be amazing if Apple would provide better support, and allow more than that 1.5 GB window to make this easier. Arm overall has some quirks with PCIe devices, but at least in Linux, it's gotten so much easier since most modern drivers treat arm64 as a first class citizen.

scottjg 10 hours ago|
i don't know for sure, but i suspect what makes the tinygrad stuff slow isn't the macos host driver itself. i think they're doing something very similar to what i'm doing, which is just mapping the PCI BARs to userspace, then they have a bunch of python code that drives the GPU.

this is only speculation, but i think the big thing that makes tinygrad slow is that the tinygrad inference engine has not really been optimized much for all these open LLM models. probably most of the work has gone towards optimizing the stack for george's self-driving hardware company. since you can't just run the existing CUDA kernels on their engine, that makes things a lot tougher, engineering-wise.

i am actually curious if my project could share a macos host driver with them. i think it would need some changes, but it seems like there's a lot of overlap

swiftcoder 19 hours ago||
This is proper mad science, love it
delbronski 19 hours ago||
Nicely done! Glad to see real hacking is still alive in the age of AI.
Riany 4 hours ago||
The gaming part is fun, and so are the local AI numbers. Fast prefill changes the whole experience; it makes local inference feel practical.
bilekas 13 hours ago||
I love how it's listed as "RTX 5090 Discrete". Sir, that is anything but discrete!
scottjg 10 hours ago|
i admit, you got me chuckling with that one.
arjie 17 hours ago|
Wait, this is incredible. I have a spare 5090 lying around and run a claw-like on my M4 Mini. Just mounting it in some sort of 3D-printed frame for stability and plugging it into the TB port might get me a pretty viable tool for local inference. Would need something neat to ensure the power etc. is well fed.

The problem is `max-num-seqs` and `max-model-len` fight each other, and unless you're in pure single-client mode you'll need multiple slots, so to speak.
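
(In vLLM terms, the tradeoff looks like this; the model name and numbers are placeholders, and both knobs carve up the same KV-cache pool, so more concurrent slots means less max context per slot:)

    # placeholders throughout; shrink --max-model-len to afford more --max-num-seqs
    vllm serve Qwen/Qwen2.5-7B-Instruct \
      --max-model-len 32768 \
      --max-num-seqs 4 \
      --gpu-memory-utilization 0.90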

pat_space 15 hours ago||
If you get too busy to take advantage, I'll take that spare 5090 off your hands, free of charge :)