The path to ubiquitous AI (17k tokens/sec)

Posted by sidnarsipur 18 hours ago

The path to ubiquitous AI (17k tokens/sec)(taalas.com)

676 points | 393 commentspage 5

soleveloper 16 hours ago|

There are so many use cases for small and super fast models that are already in size capacity -

* Many top quality tts and stt models

* Image recognition, object tracking

* speculative decoding, attached to a much bigger model (big/small architecture?)

* agentic loop trying 20 different approaches / algorithms, and then picking the best one

* edited to add! Put 50 such small models to create a SOTA super fast model

xnorswap 15 hours ago||

Asking it what its knowledge cut-off is interesting, it doesn't seem to be consistent even within a single response. Sometimes it responds to say it cuts off 2020 too.

    You
    What is your knowledge cut-off?

    Jimmy
    My knowledge cut-off is 2022, which means that my training data is current up to 2021, but 
    I have been trained on a   dataset that is updated periodically. If you have any specific 
    questions about events or topics that occurred after 2021, I may not have information on 
    those topics. However, I can still provide general information and context on those topics 
    to help guide further research.

The instantaneous response is impressive though. I'm sure there will be applications for this, I just lack the imagination to know what they'll be.

wat10000 15 hours ago|

Sounds like they aren’t providing a system prompt which tells it what its knowledge cutoff is. It doesn’t know unless it’s told. Given the limited context, it makes sense they wouldn’t use up valuable space for that.

mike_hearn 10 hours ago||

The system prompt is configurable in the sidebar and minimal. It doesn't give a knowledge cutoff. This is a tech demo of the fact it works at all, it's not meant to be a good chatbot.

piker 15 hours ago||

The company slogan is great: "The Model is The Computer"

It's an homage to Jensen: "The display is the computer"

https://www.wired.com/2002/07/nvidia/

mips_avatar 17 hours ago||

I think the thing that makes 8b sized models interesting is the ability to train unique custom domain knowledge intelligence and this is the opposite of that. Like if you could deploy any 8b sized model on it and be this fast that would be super interesting, but being stuck with llama3 8b isn't that interesting.

ACCount37 17 hours ago|

The "small model with unique custom domain knowledge" approach has a very low capability ceiling.

Model intelligence is, in many ways, a function of model size. A small model tuned for a given domain is still crippled by being small.

Some things don't benefit from general intelligence much. Sometimes a dumb narrow specialist really is all you need for your tasks. But building that small specialized model isn't easy or cheap.

Engineering isn't free, models tend to grow obsolete as the price/capability frontier advances, and AI specialists are less of a commodity than AI inference is. I'm inclined to bet against approaches like this on a principle.

matu3ba 6 hours ago|||

> Engineering isn't free, models tend to grow obsolete as the price/capability frontier advances, and AI specialists are less of a commodity than AI inference is. I'm inclined to bet against approaches like this on a principle.

This does not sound like it will simplify the training and data side, unless their or subsequent models can somehow be efficiently utilized for that. However, this development may lead to (open source) hardware and distributed system compilation, EDA tooling, bus system design, etc getting more deserved attention and funding. In turn, new hardware may lead to more training and data competition instead of the current NVIDIA model training monopoly market. So I think you're correct for ~5 years.

mips_avatar 5 hours ago|||

A fine tuned 1.7B model probably is still too crippled to do anything useful. But around 8b the capabilities really start to change. I’m also extremely unemployed right now so I can provide the engineering.

armishra 11 hours ago||

I am extremely impressed by their inference speed!

big-chungus4 14 hours ago||

The number six seven

> It seems like "six seven" is likely being used to represent the number 17. Is that correct? If so, I'd be happy to discuss the significance or meaning of the number 17 with you.

dagi3d 17 hours ago||

wonder if at some point you could swap the model as if you were replacing a cpu in your pc or inserting a game cartridge

garganzol 10 hours ago||

Imagine a mass-produced AI chips with all human knowledge packed in chinesium epoxy blobs running from CR2032 batteries in toys for children. Given the progress in density and power consumption, it's not that far away.

tgsovlerkhgsel 13 hours ago||

Their "chat jimmy" demo sure is fast, but it's not useful at all.

Test prompt: ```

Please classify the sentiment of this post as "positive", "neutral" or "negative":

Given the price, I expected very little from this case, and I was 100% right.

``` Jimmy: Neutral.

I tried various other examples that I had successfully "solved" with very early LLMs and the results were similarly bad.

weli 13 hours ago|

Maybe its the tism but I also read that sentence as neutral. You expected very little and you got very little. Why would that be positive or negative? Maybe it should be positive because you got what you were expecting? But I would call getting what you expect something neutral, if you expected little and got a lot then that would be positive. If you expected a lot and got little then its negative. But if you expected little and got little the most clear outcome is that its a neutral statement. Am I missing something?

ilc 11 hours ago|

Minor note to anyone from taalas:

The background on your site genuinely made me wonder what was wrong with my monitor.

More comments...