
Posted by campers 10/25/2024

Cerebras Inference now 3x faster: Llama3.1-70B breaks 2,100 tokens/s (cerebras.ai)
147 points | 84 comments
neals 10/25/2024|
So what is inference?
jonplackett 10/25/2024|
Inference just means using the model, rather than training it.

As far as I know Nvidia still has a monopoly on the training part.
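The distinction above can be sketched in a few lines; this is a toy illustration (the model and names here are made up, not Cerebras's or Nvidia's stack): inference only reads the weights in a forward pass, while training also computes gradients and writes updated weights back.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 2))  # toy "model" weights

def infer(x):
    # Inference: forward pass only; weights are read, never written.
    return x @ W

def train_step(x, y, lr=0.01):
    # Training: forward pass, gradient of squared error, weight update.
    global W
    pred = x @ W
    grad = x.T @ (pred - y) / len(x)
    W = W - lr * grad
    return ((pred - y) ** 2).mean()

x = rng.standard_normal((8, 4))
y = rng.standard_normal((8, 2))
loss_before = train_step(x, y)          # updates W
loss_after = ((infer(x) - y) ** 2).mean()  # loss with updated W is lower
```

The extra gradient and update steps are why training has different (and heavier) hardware requirements than serving a finished model.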

AIFounder 10/25/2024||
[dead]
anonzzzies 10/25/2024||
Demo, API?
selcuka 10/25/2024|
Demo: https://inference.cerebras.ai/

API: https://cloud.cerebras.ai/

aliljet 10/25/2024|||
That's odd, attempting a prompt fails because auth isn't working.
bestest 10/25/2024|||
I filled out a lengthy prompt in the demo and submitted it, and then an auth window popped up. I don't want to log in; I want the demo. Such a repulsive approach.
swyx 10/25/2024||
Chill with the emotionally charged words. Their hardware, their rules. If this upsets you, you will not have a good time on the modern internet.
okwhateverdude 10/25/2024||
You're not wrong, but how it is currently implemented is pretty deceptive. I would have appreciated knowing the login prompt before interacting with the page. I am curious how many bounces they have because of this one dark pattern.
andrewstuart 10/25/2024|
Could someone please bring Microsoft's Bitnet into the discussion and explain how its performance relates to this announcement, if at all?

https://github.com/microsoft/BitNet

"bitnet.cpp achieves speedups of 1.37x to 5.07x on ARM CPUs, with larger models experiencing greater performance gains. Additionally, it reduces energy consumption by 55.4% to 70.0%, further boosting overall efficiency. On x86 CPUs, speedups range from 2.37x to 6.17x with energy reductions between 71.9% to 82.2%. Furthermore, bitnet.cpp can run a 100B BitNet b1.58 model on a single CPU, achieving speeds comparable to human reading (5-7 tokens per second), significantly enhancing the potential for running LLMs on local devices. "
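The b1.58 scheme the quote refers to stores weights as ternary values in {-1, 0, +1}. A minimal sketch of that quantization step (illustrative only, not code from bitnet.cpp; the absmean scaling follows the BitNet b1.58 paper's description):

```python
import numpy as np

def absmean_quantize(w, eps=1e-8):
    # Scale by the mean absolute weight, then round and clip to {-1, 0, +1}.
    scale = np.mean(np.abs(w)) + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4))
q, scale = absmean_quantize(w)
w_hat = q * scale  # dequantized approximation of the original weights
```

With weights restricted to three values, matrix multiplies reduce to additions and subtractions, which is where the CPU speedups and energy savings come from.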

eptcyka 10/25/2024||
It is an inference engine for 1-bit LLMs, so it's not really comparable.
BoorishBears 10/25/2024||
The novelty of the inexplicable bitnet obsession has worn off I think.
qwertox 10/25/2024|||
IDK, they remind me of Sigma-Delta ADCs [0], which are single-bit ADCs used in high-resolution scenarios.

I believe we'll get to hear more interesting things about Bitnet in the future.

[0] https://en.wikipedia.org/wiki/Delta-sigma_modulation
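The analogy can be made concrete with a first-order delta-sigma modulator (a minimal sketch, not production DSP code): a 1-bit quantizer sits inside an integrating feedback loop, and averaging the resulting +/-1 bitstream recovers the input to far more than one bit of resolution.

```python
import numpy as np

def delta_sigma(x):
    """First-order delta-sigma modulator: samples in [-1, 1] -> +/-1 bits."""
    acc, y = 0.0, 0.0
    bits = []
    for s in x:
        acc += s - y                   # integrate the quantization error
        y = 1.0 if acc >= 0 else -1.0  # 1-bit quantizer with feedback
        bits.append(y)
    return bits

# A constant 0.25 input encoded as a 1-bit stream of 1000 samples:
bits = delta_sigma(np.full(1000, 0.25))
recovered = float(np.mean(bits))  # close to 0.25 despite 1-bit samples
```

The single-bit stream carries high-resolution information in aggregate, which is the same intuition people apply to ternary BitNet weights.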

Tepix 10/25/2024|||
We have yet to see a large model trained using it, have we?
BoorishBears 10/25/2024||
Bitnet models are just another piece in the ocean of techniques where there may be alpha at large parameter counts, but no one will know until a massive investment is made, and that investment hasn't happened because the people with resources have much surer things to invest in.

There's this insufferable crowd of people who keep going on and on about it like it's some magic bullet that will let them run 405B on their home PC. But if it were that simple, it's not like the five or so companies in the world putting out frontier models need little Timmy 3090 to tell them about the technique: we don't need it shoehorned into every single release.