
Posted by armanified 22 hours ago

Show HN: I built a tiny LLM to demystify how language models work (github.com)
Built a ~9M param LLM from scratch to understand how they actually work. Vanilla transformer, 60K synthetic conversations, ~130 lines of PyTorch. Trains in 5 min on a free Colab T4. The fish thinks the meaning of life is food.

Fork it and swap the personality for your own character.
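A back-of-the-envelope count shows how a model in this ballpark reaches roughly 9M parameters. The dimensions below are illustrative guesses for a vanilla decoder-only transformer, not the actual config from the repo:

```python
# Rough parameter count for a small decoder-only transformer.
# All numbers are hypothetical, chosen only to land near ~9M.
vocab_size = 8192   # assumed tokenizer vocabulary
d_model    = 256    # assumed embedding width
n_layers   = 8      # assumed number of transformer blocks

embed = vocab_size * d_model         # token embedding table
attn  = 4 * d_model * d_model        # Q, K, V and output projections
mlp   = 2 * d_model * (4 * d_model)  # up/down projections, 4x expansion
total = embed + n_layers * (attn + mlp)

print(f"{total / 1e6:.1f}M parameters")  # prints "8.4M parameters"
```

Biases, layer norms, and the (often weight-tied) output head add a little more, which is how a config like this rounds up toward 9M.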

816 points | 124 comments
brcmthrowaway 16 hours ago|
Why are there so many dead comments from new accounts?
59nadir 13 hours ago||
Because despite what HN users seem to think, HN is an LLM-infested hellscape to the same degree as Reddit, if not more so.
wiseowise 12 hours ago|||
You’re absolutely right! HN isn’t just an LLM-infested hellscape, it’s a completely new paradigm of machine-assisted chocolate-infused information generation.
toyg 12 hours ago||
Just let me know which type of information goo you'd like me to generate, and I'll tailor the perfect one for you.
siva7 10 hours ago|||
But what should we do? The parent company isn't transparent about the seriousness of this problem.
loveparade 15 hours ago|||
It really does seem like it's mostly AI comments on this one. Maybe the topic is attractive to bots.
armanified 9 hours ago||
This title might have triggered something in those bots; most of them have sneaky AI SaaS links in their bios.

Honestly, I never expected this post to become so popular. It was just the outcome of a weekend practice session.

AlecSchueler 16 hours ago||
They all seem to be slop comments.
Duplicake 11 hours ago||
I love this! Seems like it can't understand uppercase letters though
armanified 10 hours ago|
Uppercase letters were intentionally ignored.
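One plausible way to "intentionally ignore" uppercase is to lowercase everything before tokenizing, so uppercase characters simply never exist in the vocabulary. A minimal character-level sketch, purely illustrative and not the repo's actual tokenizer:

```python
# Hypothetical lowercase-only character tokenizer. Uppercase input is
# folded to lowercase; anything still outside the vocab is dropped.
chars = "abcdefghijklmnopqrstuvwxyz .,!?'"
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(text):
    return [stoi[ch] for ch in text.lower() if ch in stoi]

def decode(ids):
    return "".join(itos[i] for i in ids)

print(decode(encode("Hello, World!")))  # prints "hello, world!"
```

Halving the alphabet shrinks the embedding table and gives the tiny model fewer symbols to learn, at the cost of the behavior noted above.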
jbethune 6 hours ago||
Forked. Very cool. I appreciate the simplicity and documentation.
ankitsanghi 17 hours ago||
Love it! I think it's important to understand how the tools we use (and will only increasingly use) work under the hood.
drincanngao 11 hours ago||
I was going to suggest implementing RoPE to fix the context limit, but realized that would make it anatomically incorrect.
armanified 11 hours ago|
I intentionally removed all optimizations to keep it vanilla.
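For readers curious what the RoPE tweak mentioned above actually does: instead of adding learned absolute position vectors, each pair of query/key dimensions is rotated by an angle proportional to the token's position. A minimal pure-Python sketch (illustrative, not the repo's code):

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive (x, y) pairs of a query/key vector by a
    position-dependent angle. Position 0 leaves the vector unchanged."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # lower frequency for later pairs
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out
```

Because rotations preserve dot-product structure between relative positions, RoPE tends to extrapolate to longer contexts better than learned absolute embeddings, which is why it keeps coming up in threads like this.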
nobodyandproud 8 hours ago||
Thanks. Tinkering is how I learn and this is what I’ve been looking for.
fawabc 12 hours ago||
how did you generate the synthetic data?
amelius 11 hours ago||
> A 9M model can't conditionally follow instructions

How many parameters would you need for that?

armanified 10 hours ago|
My initial idea was to train a navigation decision model with 25M parameters for a Raspberry Pi, which, in testing, was getting about 60% of tool calls correct. IMO, it seems like around 20M parameters would be a good size for following some narrow & basic language instructions.
amelius 10 hours ago||
Ok. This makes me wonder about a broader question. Is there a scientific approach showing a pyramid of cognitive functions, and how many parameters are (minimally) required for each layer in this pyramid?
winter_blue 6 hours ago||
This is amazing work. Thank you.
SilentM68 20 hours ago|
Would have been funny if it were called "DORY" due to memory recall issues of the fish vs LLMs similar recall issues :)
armanified 10 hours ago|
OMG! Why didn't I think of this first :P