Posted by armanified 1 day ago

Show HN: I built a tiny LLM to demystify how language models work (github.com)
Built a ~9M param LLM from scratch to understand how they actually work. Vanilla transformer, 60K synthetic conversations, ~130 lines of PyTorch. Trains in 5 min on a free Colab T4. The fish thinks the meaning of life is food.

Fork it and swap the personality for your own character.
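For readers wondering how a "~9M param" model breaks down, here is a back-of-the-envelope parameter count for a decoder-only transformer. The config values below are illustrative guesses, not the project's actual hyperparameters:

```python
def count_params(vocab_size=8192, d_model=256, n_layers=8):
    # Hypothetical config, NOT GuppyLM's real settings.
    embed = vocab_size * d_model        # token embeddings (tied with output head)
    # Per block: attention Q,K,V,O projections = 4*d^2, MLP (4x width) = 8*d^2;
    # biases and layer-norm params are ignored as negligible.
    per_block = 12 * d_model * d_model
    return embed + n_layers * per_block

print(f"{count_params() / 1e6:.1f}M params")  # lands in the single-digit-millions ballpark
```

Swapping in the project's real vocab size, width, and depth would give the exact figure; the point is that a model this small is dominated by 12*d^2 per block plus the embedding table.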

851 points | 126 comments
aditya7303011 1 day ago
I did something similar last year: https://github.com/aditya699/EduMOE
secabeen 1 day ago
Training data is here:

https://huggingface.co/datasets/arman-bd/guppylm-60k-generic

jiusanzhou 23 hours ago
[flagged]
ngruhn 23 hours ago
Comment smells AI-written.
3m 23 hours ago
AI account
areys 18 hours ago
[flagged]
moonu 17 hours ago
This comment seems AI-written.
dinkumthinkum 1 day ago
I think this is a nice project because it is end-to-end and serves its goal well. Good job! It's a good example of how someone might do something similar for a specific purpose. There are other visualizers that explain different aspects of LLMs, but this is a good applied example.
martmulx 1 day ago
How much training data did you end up needing for the fish personality to feel coherent? Curious what the minimum viable dataset looks like for something like this.
Propelloni 21 hours ago
Great work! I still think that [1] does a better job of helping us understand how GPTs and LLMs work, but yours is funnier.

Then, some criticism. I probably don't get it, but I think the HN headline does your project a disservice. Your project does not demystify anything (see below), and it diverges from your project's own claim, too. Furthermore, I think you claim too much on your GitHub: "This project exists to show that training your own language model is not magic." — and then you just post a few command-line statements to execute. Yeah, running a mail server is not magic either, just apt-get install exim4. So, the code. Looking at train_guppylm.ipynb and, oh, it's PyTorch again. I'm better off reading [2] if I'm looking into that (I know, it is a published book, but I maintain my point).

So, in short, it helps neither the initiated nor the uninitiated. For the initiated it needs more detail to be useful; for the uninitiated, more context to be understood. Still a fun project, even if oversold.

[1] https://spreadsheets-are-all-you-need.ai/ [2] https://github.com/rasbt/LLMs-from-scratch

jadengeller 19 hours ago
This comment seems to be astroturfing to sell a course.