Posted by NicoConstant 7 hours ago

Real-time LLM Inference on Standard GPUs: 3k tokens/s per request (blog.kog.ai)
127 points | 65 comments | page 3
Hfuffzehn 4 hours ago|
That's really nice of them.

That means Jensen can claim another "30 times faster" when comparing Rubin to Blackwell without actually having to do anything.

Hopefully that means he won't have any problem making another 150 billion in profit next year.

Sorry for the sarcasm. Looks like interesting work.