Posted by sidnarsipur 20 hours ago
Also, "10k tokens per second would be fantastic" might not be sufficient (even remotely) if you want to "process millions of log lines per minute".
Assuming a single log line is just 100 tokens and a rate of 2 million lines per minute, you need (100 * 2,000,000 / 60) ~ 3.3 million tokens per second of processing speed :)
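The napkin math above can be sketched out directly (the 100 tokens/line and 2M lines/minute figures are the assumptions stated in the comment, not measured values):

```python
# Napkin math: required throughput to keep up with a log stream.
# Assumptions (from the comment above, not measurements):
tokens_per_line = 100          # rough size of one log line in tokens
lines_per_minute = 2_000_000   # "millions of log lines per minute"

tokens_per_second = tokens_per_line * lines_per_minute / 60
print(f"{tokens_per_second / 1e6:.1f}M tokens/s needed")  # ~3.3M tokens/s
```

So even a 10k tok/s endpoint would be short by more than two orders of magnitude for that workload, unless the lines are batched or heavily filtered first.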
Also, what if Cerebras decided to make a wafer-sized FPGA array and turned large language models into lots and lots of logical gates?
I just wanted some toast, but here I am installing an app, dismissing 10 popups, and maybe now arguing with a chat bot about how I don’t in fact want to turn on notifications.
Show me something at a model size of 80 GB+ or this feels like "positive results in mice".
This is great even if it can't ever run Opus. Many people will be extremely happy about something like Phi accessible at lightning speed.
What does that mean for 8b models 24mo from now?
Aside from the obvious concern that this is a tiny 8B model, I'm also a bit skeptical of the power draw. 2.4 kW feels a little high; someone should do the napkin math on the throughput-to-power ratio compared to the H200 and other chips.
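A starting point for that napkin math, pairing the 2.4 kW figure from this thread with the 10k tok/s number quoted earlier (treating them as describing the same system is itself an assumption); the H200 throughput is left as a placeholder since it depends on the model, batch size, and serving stack:

```python
# Napkin math: throughput per watt.
# Assumption: the quoted 10k tok/s and 2.4 kW describe the same system.
cerebras_tps = 10_000    # tokens/s (quoted in the thread)
cerebras_watts = 2_400   # power draw (quoted in the thread)
print(f"Cerebras: {cerebras_tps / cerebras_watts:.2f} tok/s/W")  # ~4.17

h200_watts = 700   # H200 SXM TDP per NVIDIA's spec sheet
h200_tps = None    # placeholder: needs a measured 8B-model throughput
if h200_tps is not None:
    print(f"H200: {h200_tps / h200_watts:.2f} tok/s/W")
```

Whoever has real H200 serving numbers for an 8B model can drop them into `h200_tps` for an apples-to-apples tok/s/W comparison.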