
Posted by campers 10/25/2024

Cerebras Inference now 3x faster: Llama3.1-70B breaks 2,100 tokens/s (cerebras.ai)
147 points | 84 comments
neals 10/25/2024|
So what is inference?
jonplackett 10/25/2024|
Inference just means using the model, rather than training it.

As far as I know Nvidia still has a monopoly on the training part.
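The distinction above can be sketched in a few lines; this is a toy illustration (the model and names here are made up, not Cerebras's or Nvidia's stack): inference only reads the weights in a forward pass, while training also computes gradients and writes updated weights back.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 2))  # toy "model" weights

def infer(x):
    # Inference: forward pass only; weights are read, never written.
    return x @ W

def train_step(x, y, lr=0.01):
    # Training: forward pass, gradient of squared error, weight update.
    global W
    pred = x @ W
    grad = x.T @ (pred - y) / len(x)
    W = W - lr * grad
    return ((pred - y) ** 2).mean()

x = rng.standard_normal((8, 4))
y = rng.standard_normal((8, 2))
loss_before = train_step(x, y)          # updates W
loss_after = ((infer(x) - y) ** 2).mean()  # loss with updated W is lower
```

The extra gradient and update steps are why training has different (and heavier) hardware requirements than serving a finished model.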

AIFounder 10/25/2024||
[dead]
anonzzzies 10/25/2024||
Demo, API?
selcuka 10/25/2024|
Demo: https://inference.cerebras.ai/

API: https://cloud.cerebras.ai/

aliljet 10/25/2024|||
That's odd, attempting a prompt fails because auth isn't working.
bestest 10/25/2024|||
I filled out a lengthy prompt in the demo and submitted it, and then an auth window popped up. I don't want to log in; I want the demo. Such a repulsive approach.
swyx 10/25/2024||
Chill with the emotionally charged words. Their hardware, their rules. If this upsets you, you will not have a good time on the modern internet.
okwhateverdude 10/25/2024||
You're not wrong, but how it is currently implemented is pretty deceptive. I would have appreciated knowing the login prompt before interacting with the page. I am curious how many bounces they have because of this one dark pattern.
andrewstuart 10/25/2024|
Could someone please bring Microsoft's Bitnet into the discussion and explain how its performance relates to this announcement, if at all?

https://github.com/microsoft/BitNet

"bitnet.cpp achieves speedups of 1.37x to 5.07x on ARM CPUs, with larger models experiencing greater performance gains. Additionally, it reduces energy consumption by 55.4% to 70.0%, further boosting overall efficiency. On x86 CPUs, speedups range from 2.37x to 6.17x with energy reductions between 71.9% to 82.2%. Furthermore, bitnet.cpp can run a 100B BitNet b1.58 model on a single CPU, achieving speeds comparable to human reading (5-7 tokens per second), significantly enhancing the potential for running LLMs on local devices. "
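The b1.58 scheme the quote refers to stores weights as ternary values in {-1, 0, +1}. A minimal sketch of that quantization step (illustrative only, not code from bitnet.cpp; the absmean scaling follows the BitNet b1.58 paper's description):

```python
import numpy as np

def absmean_quantize(w, eps=1e-8):
    # Scale by the mean absolute weight, then round and clip to {-1, 0, +1}.
    scale = np.mean(np.abs(w)) + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4))
q, scale = absmean_quantize(w)
w_hat = q * scale  # dequantized approximation of the original weights
```

With weights restricted to three values, matrix multiplies reduce to additions and subtractions, which is where the CPU speedups and energy savings come from.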

eptcyka 10/25/2024||
It is an inference engine for 1-bit LLMs, so it's not really comparable.
BoorishBears 10/25/2024||
The novelty of the inexplicable bitnet obsession has worn off I think.
qwertox 10/25/2024|||
IDK, they remind me of Sigma-Delta ADCs [0], which are single-bit ADCs used in high-resolution scenarios.

I believe we'll get to hear more interesting things about Bitnet in the future.

[0] https://en.wikipedia.org/wiki/Delta-sigma_modulation
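The analogy can be made concrete with a first-order delta-sigma modulator (a minimal sketch, not production DSP code): a 1-bit quantizer sits inside an integrating feedback loop, and averaging the resulting +/-1 bitstream recovers the input to far more than one bit of resolution.

```python
import numpy as np

def delta_sigma(x):
    """First-order delta-sigma modulator: samples in [-1, 1] -> +/-1 bits."""
    acc, y = 0.0, 0.0
    bits = []
    for s in x:
        acc += s - y                   # integrate the quantization error
        y = 1.0 if acc >= 0 else -1.0  # 1-bit quantizer with feedback
        bits.append(y)
    return bits

# A constant 0.25 input encoded as a 1-bit stream of 1000 samples:
bits = delta_sigma(np.full(1000, 0.25))
recovered = float(np.mean(bits))  # close to 0.25 despite 1-bit samples
```

The single-bit stream carries high-resolution information in aggregate, which is the same intuition people apply to ternary BitNet weights.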

Tepix 10/25/2024|||
We have yet to see a large model trained using it, have we?
BoorishBears 10/25/2024||
Bitnet models are just another piece in the ocean of techniques where there may be alpha at large parameter counts, but no one will know until a massive investment is made, and that investment hasn't happened because the people with resources have much surer things to invest in.

There's this insufferable crowd of people who keep going on and on about it like it's some magic bullet that will let them run 405B on their home PC. But if it were that simple, it's not like the five or so companies in the world putting out frontier models need little Timmy 3090 to tell them about the technique: we don't need it shoehorned into every single release.