Posted by sidnarsipur 1 day ago

The path to ubiquitous AI (17k tokens/sec) (taalas.com)
727 points | 409 comments
impossiblefork 23 hours ago|
So I'm guessing this is some kind of weights-as-ROM type of thing? At least that's how I interpret the product page. Or maybe even a sort of ROM-type thing that you can only access by doing matrix multiplies.
readitalready 23 hours ago|
You shouldn't need any ROM. It's likely the architecture is just fixed hardware with weights loaded in via scan flip-flops. If it were me making it, I'd just design a systolic array: multipliers feeding into multipliers, without ever going through RAM.
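Purely a toy illustration of the weight-stationary idea (the real thing would be fixed-function silicon, not software): each processing element holds one hardwired weight and forwards an accumulated partial sum to its neighbor, so nothing round-trips through memory.

    # Toy model of one weight-stationary systolic row (Python, illustrative only)
    def systolic_dot(weights, activations):
        acc = 0
        for w, a in zip(weights, activations):
            acc = acc + w * a  # each PE multiplies by its fixed weight and passes the sum along
        return acc

    # One output of a hardwired layer:
    print(systolic_dot([0.5, -1.0, 2.0], [1.0, 3.0, 0.25]))  # -> -2.0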
Havoc 23 hours ago||
That seems promising for applications that require raw speed. Wonder how much they can scale it up - a quantized 8B model is very usable but still quite small compared to even bottom-end cloud models.
gozucito 23 hours ago||
Can it scale to an 800 billion param model? 8B parameter models are too far behind the frontier to be useful to me for SWE work.

Or is that the catch? Either way I am sure there will be some niche uses for it.
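Back of the envelope, assuming 4-bit quantized weights (my assumption, not anything the company has stated):

    # Raw weight storage at 4 bits (0.5 bytes) per parameter
    for params in (8e9, 800e9):
        gib = params * 0.5 / 2**30
        print(f"{params/1e9:.0f}B params -> {gib:.1f} GiB of weights")
    # 8B params -> 3.7 GiB of weights
    # 800B params -> 372.5 GiB of weights

So a 100x bigger model means roughly 100x the weight storage to etch into silicon, which may well be the catch: hardwired weights buy speed at the cost of capacity.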

taneq 23 hours ago|
Spam. :P
Lionga 22 hours ago||
so 90% of the AI market?
mlboss 16 hours ago||
Inference is crazy fast! I can see a lot of potential for this kind of chip in IoT devices and robotics.
Dave3of5 23 hours ago||
Fast, but the output is shit due to the constrained model they used. Doubt we'll ever get something like this for the decent large-parameter models.
maelito 16 hours ago||
Talks about ubiquitous AI but can't make a blog post readable for humans :/
b0rbb 15 hours ago|
That animated background is terrible.

Incredibly distracting. No way to turn it off (at least not without resorting to something like devtools).

ramshanker 21 hours ago||
I was all praise for Cerebras, and now this! $30M for a PCIe card in hand really makes it approachable for many startups.
8cvor6j844qw_d6 22 hours ago||
Amazing speed. Imagine if it's standardised like the GPU card equivalent in the future.

New models come out, time to upgrade your AI card, etc.

hkt 23 hours ago||
Reminds me of when bitcoin started running on ASICs. This will always lag behind the state of the art, but incredibly fast, (presumably) power-efficient LLMs will be great to see. I sincerely hope they opt for a path of selling products rather than cloud services in the long run, though.
retrac98 22 hours ago|
Wow. I’m finding it hard to even conceive of what it’d be like to have one of the frontier models on hardware at this speed.