
Posted by armanified 22 hours ago

Show HN: I built a tiny LLM to demystify how language models work (github.com)
Built a ~9M param LLM from scratch to understand how they actually work. Vanilla transformer, 60K synthetic conversations, ~130 lines of PyTorch. Trains in 5 min on a free Colab T4. The fish thinks the meaning of life is food.

Fork it and swap the personality for your own character.
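A back-of-the-envelope count shows how a model in this ballpark reaches roughly 9M parameters. The dimensions below are illustrative guesses for a vanilla decoder-only transformer, not the actual config from the repo:

```python
# Rough parameter count for a small decoder-only transformer.
# All numbers are hypothetical, chosen only to land near ~9M.
vocab_size = 8192   # assumed tokenizer vocabulary
d_model    = 256    # assumed embedding width
n_layers   = 8      # assumed number of transformer blocks

embed = vocab_size * d_model         # token embedding table
attn  = 4 * d_model * d_model        # Q, K, V and output projections
mlp   = 2 * d_model * (4 * d_model)  # up/down projections, 4x expansion
total = embed + n_layers * (attn + mlp)

print(f"{total / 1e6:.1f}M parameters")  # prints "8.4M parameters"
```

Biases, layer norms, and the (often weight-tied) output head add a little more, which is how a config like this rounds up toward 9M.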

816 points | 124 comments
brcmthrowaway 16 hours ago|
Why are there so many dead comments from new accounts?
59nadir 13 hours ago||
Because despite what HN users seem to think, HN is an LLM-infested hellscape to the same degree as Reddit, if not more so.
wiseowise 12 hours ago|||
You’re absolutely right! HN isn’t just an LLM-infested hellscape, it’s a completely new paradigm of machine-assisted chocolate-infused information generation.
toyg 12 hours ago||
Just let me know which type of information goo you'd like me to generate, and I'll tailor the perfect one for you.
siva7 10 hours ago|||
But what should we do? The parent company isn't transparent about the seriousness of this problem.
loveparade 15 hours ago|||
It really does seem like it's mostly AI comments on this one. Maybe the topic is attractive to bots.
armanified 9 hours ago||
This title might have triggered something in those bots; most of them have sneaky AI SaaS links in their bios.

Honestly, I never expected this post to become so popular. It was just the outcome of a weekend practice session.

AlecSchueler 16 hours ago||
They all seem to be slop comments.
Duplicake 11 hours ago||
I love this! Seems like it can't understand uppercase letters though
armanified 10 hours ago|
Uppercase letters were intentionally ignored.
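One plausible way to "intentionally ignore" uppercase is to lowercase everything before tokenizing, so uppercase characters simply never exist in the vocabulary. A minimal character-level sketch, purely illustrative and not the repo's actual tokenizer:

```python
# Hypothetical lowercase-only character tokenizer. Uppercase input is
# folded to lowercase; anything still outside the vocab is dropped.
chars = "abcdefghijklmnopqrstuvwxyz .,!?'"
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(text):
    return [stoi[ch] for ch in text.lower() if ch in stoi]

def decode(ids):
    return "".join(itos[i] for i in ids)

print(decode(encode("Hello, World!")))  # prints "hello, world!"
```

Halving the alphabet shrinks the embedding table and gives the tiny model fewer symbols to learn, at the cost of the behavior noted above.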
jbethune 6 hours ago||
Forked. Very cool. I appreciate the simplicity and documentation.
ankitsanghi 17 hours ago||
Love it! I think it's important to understand how the tools we use (and will only increasingly use) work under the hood.
drincanngao 11 hours ago||
I was going to suggest implementing RoPE to fix the context limit, but realized that would make it anatomically incorrect.
armanified 11 hours ago|
I intentionally removed all optimizations to keep it vanilla.
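For readers curious what the RoPE tweak mentioned above actually does: instead of adding learned absolute position vectors, each pair of query/key dimensions is rotated by an angle proportional to the token's position. A minimal pure-Python sketch (illustrative, not the repo's code):

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive (x, y) pairs of a query/key vector by a
    position-dependent angle. Position 0 leaves the vector unchanged."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # lower frequency for later pairs
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out
```

Because rotations preserve dot-product structure between relative positions, RoPE tends to extrapolate to longer contexts better than learned absolute embeddings, which is why it keeps coming up in threads like this.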
nobodyandproud 8 hours ago||
Thanks. Tinkering is how I learn and this is what I’ve been looking for.
fawabc 12 hours ago||
how did you generate the synthetic data?
amelius 11 hours ago||
> A 9M model can't conditionally follow instructions

How many parameters would you need for that?

armanified 10 hours ago|
My initial idea was to train a navigation decision model with 25M parameters for a Raspberry Pi, which, in testing, was getting about 60% of tool calls correct. IMO, it seems like around 20M parameters would be a good size for following some narrow & basic language instructions.
amelius 10 hours ago||
Ok. This makes me wonder about a broader question. Is there a scientific approach showing a pyramid of cognitive functions, and how many parameters are (minimally) required for each layer in this pyramid?
winter_blue 6 hours ago||
This is amazing work. Thank you.
SilentM68 20 hours ago|
Would have been funny if it were called "DORY" due to memory recall issues of the fish vs LLMs similar recall issues :)
armanified 10 hours ago|
OMG! Why didn't I think of this first :P