Real-time LLM Inference on Standard GPUs: 3k tokens/s per request

mikdan 4 hours ago

[dead]

nryoo 7 hours ago

[dead]

Jimmymenk2 6 hours ago

[flagged]

Hfuffzehn 4 hours ago

That's really nice of them.

That means Jensen can claim another "30 times faster" when comparing Rubin to Blackwell without actually having to do anything.

Hopefully that means he won't have any problem making another 150 billion in profit next year.

Sorry for the sarcasm. Looks like interesting work.