
Posted by armanified 20 hours ago

Show HN: I built a tiny LLM to demystify how language models work (github.com)
Built a ~9M param LLM from scratch to understand how they actually work. Vanilla transformer, 60K synthetic conversations, ~130 lines of PyTorch. Trains in 5 min on a free Colab T4. The fish thinks the meaning of life is food.

Fork it and swap the personality for your own character.
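For a rough sense of where ~9M parameters come from in a vanilla transformer, here's a back-of-the-envelope sketch. The dimensions below are illustrative guesses chosen to land in the right ballpark, not the repo's actual config:

```python
# Rough parameter count for a small decoder-only transformer.
# All dimensions here are hypothetical, picked only to land near ~9M params.
def transformer_params(vocab, d_model, n_layers, d_ff):
    embed = vocab * d_model        # token embedding (often tied with the output head)
    per_layer = (
        4 * d_model * d_model      # attention: Q, K, V, and output projections
        + 2 * d_model * d_ff       # feed-forward up- and down-projections
    )
    return embed + n_layers * per_layer

total = transformer_params(vocab=8192, d_model=256, n_layers=8, d_ff=1024)
print(f"{total / 1e6:.1f}M parameters")
```

This ignores layer norms and biases, which add comparatively few parameters; the point is just that embedding tables and the per-layer matrices dominate the count at this scale.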

796 points | 122 comments
bharat1010 3 hours ago|
This is such a smart way to demystify LLMs. I really like that GuppyLM makes the whole pipeline feel approachable. Great work!
zwaps 15 hours ago||
I like the idea; it's just that the examples are reproduced from the training data set.

How does it handle unknown queries?

armanified 9 hours ago||
It mostly doesn't; at 9M parameters it has very limited capacity. The whole point of this project is to demonstrate how language models work.
bblb 12 hours ago||
Would it be possible to train an LLM only through chat messages, without any other data or input?

If Guppy doesn't know regular expressions yet, could I teach them to it just through conversation? It's a fish, so it probably wouldn't understand much of my blabbing, but it would be interesting to try.

Or is there some hard architectural limit in current LLMs that requires training to be done offline with a fairly large training set?

roetlich 9 hours ago||
What does "done offline" mean? Otherwise you are limited by the context window.
tatrions 7 hours ago||
[flagged]
Leomuck 6 hours ago||
Wow, that is such a cool idea! And honestly very much needed. LLMs seem to be this black box nobody understands, so I love every effort to make the whole thing less mysterious. I will definitely have a go at dabbling with this, even if it is a goldfish LLM :)
CaseFlatline 6 hours ago||
I was trying to find how the synthetic data was created (looking through the repo) but didn't find it. Maybe I am missing it; I would love to see the prompts and the process behind the training data generation!
vunderba 4 hours ago|
It's here:

https://github.com/arman-bd/guppylm/blob/main/guppylm/genera...

It uses a sort of mad-libs, templatized style to generate all the permutations.
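A mad-libs style generator like the one linked above might look something like this sketch. The templates, slot values, and the prompt/reply separator are all made up for illustration; see the linked file for the real ones:

```python
import random

# Hypothetical mad-libs templates: slots in {braces} are filled from word
# lists, and "|" separates the user prompt from the fish's reply.
TEMPLATES = [
    "what do you think about {topic}?|i think {topic} is {opinion}. but food is better.",
    "do you like {topic}?|{topic} is {opinion}! almost as good as food.",
]
SLOTS = {
    "topic": ["rocks", "bubbles", "the big shape"],
    "opinion": ["wonderful", "confusing", "shiny"],
}

def generate(rng=random):
    filled = rng.choice(TEMPLATES)
    for slot, words in SLOTS.items():
        # One choice per slot, so repeated slots stay consistent in a line.
        filled = filled.replace("{" + slot + "}", rng.choice(words))
    prompt, reply = filled.split("|")
    return {"user": prompt, "fish": reply}

print(generate())
```

A handful of templates and word lists multiply out into thousands of distinct conversations, which is how a small script can produce a 60K-example training set.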

cbdevidal 17 hours ago||
> you're my favorite big shape. my mouth are happy when you're here.

Laughed loudly :-D

vunderba 16 hours ago|
This is a direct output from the synthetic training data, though. I wonder if there is a bit of overfitting going on, or if it's just a natural limitation of a much smaller model.
jzer0cool 6 hours ago||
Does this work by just training once with next-token prediction? I'd like to better understand how it creates fluent sentences, if anyone can offer insights.
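For what next-token prediction means concretely: the training targets are simply the input tokens shifted left by one position, so every position in every sentence contributes a training signal. A minimal illustration of that shift, with made-up token IDs and no framework:

```python
# A tokenized training example (IDs are illustrative).
tokens = [5, 12, 7, 9, 3]

# The model reads tokens[:-1] and must predict tokens[1:] at each position.
inputs = tokens[:-1]    # what the model sees
targets = tokens[1:]    # what it is graded against

# One (context-so-far -> next token) supervision pair per position:
pairs = [(inputs[: i + 1], targets[i]) for i in range(len(inputs))]
for ctx, nxt in pairs:
    print(ctx, "->", nxt)
```

Fluency comes from minimizing the prediction error over many such shifted pairs; a causal attention mask lets all positions be trained in parallel in one forward pass.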
EmilioOldenziel 4 hours ago||
Building it yourself is always the best test if you really understand how it works.
jbethune 4 hours ago||
Forked. Very cool. I appreciate the simplicity and documentation.
kaipereira 15 hours ago|
This is so cool! I'd love to see a write-up on how you made it and what you referenced, because designing neural networks always feels like a maze ;)