
Posted by armanified 20 hours ago

Show HN: I built a tiny LLM to demystify how language models work (github.com)
Built a ~9M param LLM from scratch to understand how they actually work. Vanilla transformer, 60K synthetic conversations, ~130 lines of PyTorch. Trains in 5 min on a free Colab T4. The fish thinks the meaning of life is food.

Fork it and swap the personality for your own character.
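For a rough sense of where ~9M parameters come from in a vanilla transformer, here's a back-of-the-envelope sketch. The dimensions below are illustrative guesses chosen to land in the right ballpark, not the repo's actual config:

```python
# Rough parameter count for a small decoder-only transformer.
# All dimensions here are hypothetical, picked only to land near ~9M params.
def transformer_params(vocab, d_model, n_layers, d_ff):
    embed = vocab * d_model        # token embedding (often tied with the output head)
    per_layer = (
        4 * d_model * d_model      # attention: Q, K, V, and output projections
        + 2 * d_model * d_ff       # feed-forward up- and down-projections
    )
    return embed + n_layers * per_layer

total = transformer_params(vocab=8192, d_model=256, n_layers=8, d_ff=1024)
print(f"{total / 1e6:.1f}M parameters")
```

This ignores layer norms and biases, which add comparatively few parameters; the point is just that embedding tables and the per-layer matrices dominate the count at this scale.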

796 points | 122 comments
bharat1010 3 hours ago|
This is such a smart way to demystify LLMs. I really like that GuppyLM makes the whole pipeline feel approachable. Great work!
zwaps 15 hours ago||
I like the idea; it's just that the examples are reproduced from the training data set.

How does it handle unknown queries?

armanified 9 hours ago||
It mostly doesn't; at 9M parameters it has very limited capacity. The whole point of this project is to demonstrate how language models work.
bblb 12 hours ago||
Would it be possible to train an LLM only through chat messages, without any other data or input?

If Guppy doesn't know regular expressions yet, could I teach them to it just through conversation? It's a fish, so it probably wouldn't understand much of my blabbing, but it would be interesting to try.

Or is there some hard architectural limit in current LLMs that requires training to be done offline with a fairly large training set?

roetlich 9 hours ago||
What does "done offline" mean? Otherwise you are limited by the context window.
tatrions 7 hours ago||
[flagged]
Leomuck 6 hours ago||
Wow, that is such a cool idea! And honestly very much needed. LLMs seem to be this black box nobody understands, so I love every effort to make the whole thing less mysterious. I will definitely have a go at dabbling with this, even if it is a goldfish LLM :)
CaseFlatline 6 hours ago||
I was trying to find how the synthetic data was created (looking through the repo) but didn't find it. Maybe I am missing it; I would love to see the prompts and the process behind the training data generation!
vunderba 4 hours ago|
It's here:

https://github.com/arman-bd/guppylm/blob/main/guppylm/genera...

It uses a sort of mad-libs, templatized style to generate all the permutations.
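A mad-libs style generator like the one linked above might look something like this sketch. The templates, slot values, and the prompt/reply separator are all made up for illustration; see the linked file for the real ones:

```python
import random

# Hypothetical mad-libs templates: slots in {braces} are filled from word
# lists, and "|" separates the user prompt from the fish's reply.
TEMPLATES = [
    "what do you think about {topic}?|i think {topic} is {opinion}. but food is better.",
    "do you like {topic}?|{topic} is {opinion}! almost as good as food.",
]
SLOTS = {
    "topic": ["rocks", "bubbles", "the big shape"],
    "opinion": ["wonderful", "confusing", "shiny"],
}

def generate(rng=random):
    filled = rng.choice(TEMPLATES)
    for slot, words in SLOTS.items():
        # One choice per slot, so repeated slots stay consistent in a line.
        filled = filled.replace("{" + slot + "}", rng.choice(words))
    prompt, reply = filled.split("|")
    return {"user": prompt, "fish": reply}

print(generate())
```

A handful of templates and word lists multiply out into thousands of distinct conversations, which is how a small script can produce a 60K-example training set.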

cbdevidal 17 hours ago||
> you're my favorite big shape. my mouth are happy when you're here.

Laughed loudly :-D

vunderba 16 hours ago|
This is a direct output from the synthetic training data, though. I wonder if there is a bit of overfitting going on, or if it's just a natural limitation of a much smaller model.
jzer0cool 6 hours ago||
Does this work by just training once with next-token prediction? I'd like to better understand how it creates fluent sentences, if anyone can offer insights.
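For what next-token prediction means concretely: the training targets are simply the input tokens shifted left by one position, so every position in every sentence contributes a training signal. A minimal illustration of that shift, with made-up token IDs and no framework:

```python
# A tokenized training example (IDs are illustrative).
tokens = [5, 12, 7, 9, 3]

# The model reads tokens[:-1] and must predict tokens[1:] at each position.
inputs = tokens[:-1]    # what the model sees
targets = tokens[1:]    # what it is graded against

# One (context-so-far -> next token) supervision pair per position:
pairs = [(inputs[: i + 1], targets[i]) for i in range(len(inputs))]
for ctx, nxt in pairs:
    print(ctx, "->", nxt)
```

Fluency comes from minimizing the prediction error over many such shifted pairs; a causal attention mask lets all positions be trained in parallel in one forward pass.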
EmilioOldenziel 4 hours ago||
Building it yourself is always the best test if you really understand how it works.
jbethune 4 hours ago||
Forked. Very cool. I appreciate the simplicity and documentation.
kaipereira 15 hours ago|
This is so cool! I'd love to see a write-up on how you made it and what you referenced, because designing neural networks always feels like a maze ;)