Posted by sidnarsipur 1 day ago

The path to ubiquitous AI (17k tokens/sec) (taalas.com)
795 points | 431 comments
OrvalWintermute 1 day ago|
wow that is fast!
heliumtera 1 day ago||
Yep, this is the most exciting demo for me yet. Holy cow this is unbelievably fast.

The most impressive demo since gpt 3, honestly.

Since we already have open-source models that are plenty good, like the new Kimi K2.5, all I need is the ability to run them at moderate speed.

Honestly I am not bullish on capabilities that models do not yet have; it seems we have seen it all, and the only advancement has been context size.

And honestly I would claim this is the market sentiment as well: Anthropic showed Opus 4.6 first, but the big release was actually Sonnet, the model people would use routinely. Nobody gave a shit about Gemini 3.1 Pro, while 3.0 Flash was very successful...

Given all the developments of the last 12 months, no new use cases have opened up for me. But given this insane speed, even with a limited model/context size, we would approach AI very differently.

oofbey 15 hours ago||
I have a hard time reading past flat-out false claims like:

> On the cost front, deploying modern models demands massive engineering and capital: room-sized supercomputers consuming hundreds of kilowatts…

This is just wrong. The largest models are probably 1-2 trillion parameters. Say 2 trillion, and let's pretend it's only quantized to 8-bit (even though it could easily be half that). So we need 2TB of VRAM. Not even using the latest hardware, let's say H100 chips with 80GB of VRAM each, with 8 of them in, say, an 8U. (Although you can certainly fit these in 6U still air-cooled, or even 4U water-cooled.) Three of these servers would almost do, but let's call it four to include plenty of room for context. The largest physical size would be 32U - most of a single rack. Which is hardly the size of a room, even in Manhattan. Total power maybe 40kW. And you could easily drop these numbers to a half or quarter of that with reasonable modifications or upgrades.
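
For the record, here is the same back-of-the-envelope math as a quick Python sketch. The ~0.7 kW per GPU and ~2 kW per host figures are my own ballpark assumptions, not vendor numbers:

    # Rough sizing sketch: 2T params quantized to 8-bit, served on 80GB H100s,
    # 8 GPUs per 8U server. Power figures below are ballpark assumptions.
    params = 2e12
    bytes_per_param = 1                            # 8-bit quantization
    weights_tb = params * bytes_per_param / 1e12   # 2.0 TB of weights

    server_vram_tb = 8 * 80 / 1000                 # 0.64 TB per 8-GPU server
    servers = 4                                    # 3 barely hold the weights; 4 leaves room for context

    total_vram_tb = servers * server_vram_tb       # 2.56 TB
    rack_units = servers * 8                       # 32U, i.e. most of one rack
    power_kw = servers * (8 * 0.7 + 2.0)           # ~30 kW; call it 40 kW with headroom

    print(weights_tb, total_vram_tb, rack_units, round(power_kw, 1))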

If you want to sell your hardware, start by being honest about the problem you’re addressing.

hagbard_c 1 day ago||
Holy Hallucinations, Batman...

You: What is the airspeed of an unladen swallow?

Jimmy:

A reference to the classic line from Monty Python and the Holy Grail!

Thus far it is correct but here it comes:

In the movie, a group of medieval scholars, the Knights who say "Ni!" (also known as the Bridgekeepers), ask a king's servant, a blacksmith, if an unladen swallow makes good time. The blacksmith, played by John Cleese, responds with the now-famous line:

"And what's the airspeed velocity of an unladen swallow?"

To which the blacksmith responds:

"What do you mean? An unladen swallow?"

The blacksmith then explains that an unladen swallow is when a swallow (a type of bird) is flying in the air without carrying any burden, implying that it is moving at a speed that is not affected by carrying any weight or load.

Nope, that is not at all how it went. No blacksmiths, and nothing to do with the Knights who say 'Ni!' - who have nothing scholarly about them; they wanted a shrubbery and demanded another one when their wish was fulfilled. It was the Bridgekeeper - not related to the aforementioned knights - who asked three questions and would throw anyone who could not answer all three into the chasm. When he asked Arthur - played by Graham Chapman - the question about the airspeed of the swallow, Arthur asked him whether he meant an African or European swallow. The Bridgekeeper did not have an answer and was himself thrown into the chasm; problem solved.

However, in reality, swallows are not typically "unladen" in flight. They are small birds that fly at relatively fast speeds, usually around 10-15 km/h (6-9 mph), but they are not usually carrying any burdens!

Needless LLM-blabber.

The "airspeed velocity of an unladen swallow" has become a meme and a cultural reference point, often used humorously or ironically to refer to situations where someone is trying to make an absurd or non-sensical argument or ask an absurd question.

Somewhat correct but not necessary in this context.

The correct answer to the question would have been 'Do you mean an African or European swallow?' followed by a short reference to the movie.

Of course this demo is not about the accuracy of the model - 'an old Llama' as mentioned elsewhere in this thread - but it does show that speed isn't everything. For generating LLM-slop this hardware implementation probably offers an unbeatable price/performance ratio but it remains to be seen if it can be combined with larger and less hallucination-prone models.

cheema33 1 day ago|
> Holy Hallucinations, Batman...

Congratulations! You figured out that this is a demo of a very small 8B model from 2022.

pelasaco 1 day ago||
Is it already available to buy, or is this a “pay now, get it later” kind of new ASIC miner? Sorry for being skeptical, but AI is the new "crypto coin", and the crypto bros are still around.
zamadatix 1 day ago|
There doesn't seem to be any way to buy the HC1 hardware at the moment. There is a free chatbot demo and then a form to request access to the API. They seem to intend HC1 to be for demonstration and HC2 for "real" use, but they don't seem to be taking payment for either at the moment.
small_model 1 day ago||
Scale this, then close the loop and have fabs spit out new chips with the latest weights every week, placed into servers by robots. How long before AGI?
fragkakis 1 day ago||
The article doesn't say anything about the price (it will be expensive), but it doesn't look like something that the average developer would purchase.

An LLM's effective lifespan is a few months (i.e. the amount of time it is considered top-tier), so it wouldn't make sense for a user to purchase something that would be superseded in a couple of months.

An LLM hosting service however, where it would operate 24/7, would be able to make up for the investment.

viftodi 1 day ago|
I tried the trick question I saw here before: make 1000 with nine 8s and addition only.

I know it's not a reasoning model, but I kept pushing it and eventually it gave me this as part of its output:

888 + 88 + 88 + 8 + 8 = 1060, too high... 8888 + 8 = 10000, too high... 888 + 8 + 8 +ประก 8 = 1000,ประก

I googled the strange symbol, and it seems to mean 'set' in Thai?
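
For what it's worth, the question really is a trick. Assuming the terms have to be whole numbers made only of 8s (8, 88, 888; anything longer already exceeds 1000), a quick brute force finds no combination of exactly nine 8s that sums to 1000, because any such sum is a multiple of 9 while 1000 is not. A rough sketch:

    # Brute-force check: can terms drawn from {8, 88, 888}, using exactly
    # nine 8s in total and only addition, sum to 1000?
    solutions = []
    for c in range(4):           # count of 888 terms (3 eights each)
        for b in range(5):       # count of 88 terms (2 eights each)
            for a in range(10):  # count of 8 terms (1 eight each)
                eights_used = 3 * c + 2 * b + a
                total = 888 * c + 88 * b + 8 * a
                if eights_used == 9 and total == 1000:
                    solutions.append((a, b, c))
    print(solutions)  # []: with nine 8s the sum is always divisible by 9, 1000 is not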

danpalmer 1 day ago|
I don't think it's very valuable to talk about the model here, the model is just an old Llama. It's the hardware that matters.