Posted by armanified 19 hours ago

Show HN: I built a tiny LLM to demystify how language models work (github.com)
Built a ~9M param LLM from scratch to understand how they actually work. Vanilla transformer, 60K synthetic conversations, ~130 lines of PyTorch. Trains in 5 min on a free Colab T4. The fish thinks the meaning of life is food.

Fork it and swap the personality for your own character.

796 points | 122 comments
thomasfl 5 hours ago|
Is there some documentation for this? The code is probably the simplest (Not So) Large Language Model implementation possible, but it is not straightforward to understand for developers not familiar with multi-head attention, ReLU FFNs, LayerNorm, and learned positional embeddings.

This project shares similarities with Minix. Minix is still used at universities as an educational tool for teaching operating system design. Minix is the operating system that taught Linus Torvalds how to design (monolithic) operating systems. Similarly, having students add capabilities to GuppyLM is a good way to learn LLM design.
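For readers unfamiliar with those pieces, here is a minimal sketch of how they fit together in PyTorch. This is not the GuppyLM source; the class name, dimensions, and pre-norm arrangement are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    """One pre-norm transformer block: LayerNorm -> multi-head
    attention -> residual, then LayerNorm -> ReLU FFN -> residual."""

    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual + attention
        x = x + self.ffn(self.ln2(x))                      # residual + FFN
        return x

# Learned positional embeddings: one trainable vector per position,
# added to the token embeddings before the first block.
seq_len, d_model = 8, 64
tok = torch.randn(1, seq_len, d_model)       # stand-in token embeddings
pos = nn.Embedding(seq_len, d_model)         # learned position table
x = tok + pos(torch.arange(seq_len))[None]   # add a position vector per token
print(TinyBlock()(x).shape)                  # torch.Size([1, 8, 64])
```

The block preserves the (batch, sequence, d_model) shape, so blocks like this can be stacked to whatever depth the parameter budget allows.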

achenatx 5 hours ago|
give the code to an LLM and have a discussion about it.
dominotw 3 hours ago||
does this work? there is no more need for writing high level docs?
arcanemachiner 3 hours ago|||
> does this work?

Absolutely. If you loaded this into an agentic coding harness with a decent model, I can practically guarantee it would be able to help you figure out what's going on.

> there is no more need for writing high level docs?

Absolutely not. That would be like exploring a cave without a flashlight, knowing that you could just feel your way around in the dark instead.

Code is not always self-documenting, and can often tell you how it was written, but not why.

stronglikedan 2 hours ago||
> If you loaded this into an agentic coding harness with a decent model, I can practically guarantee it would be able to help you figure out what's going on.

My non-coder but technically savvy boss has been doing this lately to great success. It's nice because I spend less time on it since the model has taken my place for the most part.

libria 3 minutes ago||
> since the model has taken my place for the most part

Hah, you realize the same thing is going on in your boss's head, right? The pie chart of Things-I-Need-stronglikedan-For just shrank a tiny bit...

sigmoid10 3 hours ago||||
There are so many blogs and tutorials about this stuff in particular, I wouldn't worry about it being outside the training data distribution for modern LLMs. If you have a scarce topic in some obscure language I'd be more careful when learning from LLMs.
bigmadshoe 3 hours ago|||
LLMs can tell you what the code does but not why the developer chose to do it that way.

Also, large codebases are harder to understand. But projects like these are simple to discuss with an LLM.

stronglikedan 2 hours ago||
> LLMs can tell you what the code does but not why the developer chose to do it that way.

Do LLMs not take comments into consideration? (Serious question - I'm just getting into this stuff)

dr_hooo 41 minutes ago||
They do (it's just text), if they are there...
fg137 9 hours ago||
How does this compare to Andrej Karpathy's microgpt (https://karpathy.github.io/2026/02/12/microgpt/) or minGPT (https://github.com/karpathy/minGPT)?
armanified 8 hours ago||
I haven't compared it with anything yet. Thanks for the suggestion; I'll look into these.
BrokenCogs 6 hours ago||
Who cares how it compares? It's not a product, it's a cool project
tantalor 6 hours ago|||
Even cool projects can learn from others. Maybe they missed something that could benefit the project, or made some interesting technical choice that gives a different result.

For the readers/learners, it's useful to understand the differences so we know what details matter, and which are just stylistic choices.

This isn't art; it's science & engineering.

BrokenCogs 5 hours ago||
But it isn't the OP's responsibility to compare their project to all other projects. The GP could themselves perform the comparison and post their thoughts instead of asking an open ended question.
philipallstar 5 hours ago|||
> it isn't the OP's responsibility to compare their project to all other projects

No one, including the GP, said it was.

fg137 4 hours ago||||
It isn't, but such information will be immensely helpful to anyone who wants to learn from such projects. Some tutorials are objectively better than others, and learners can benefit from such information.
tantalor 5 hours ago|||
100% agree, I didn't mean to imply that OP is responsible for that, or that the (lack of) comparison detracts in any way from the work.
stronglikedan 1 hour ago||||
> Who cares how it compares

Well, the person who asked the question, for one. I'm sure they're not the only one. Best not to assume why people are asking though, so you can save time by not writing irrelevant comments.

layer8 3 hours ago||||
Microgpt isn’t a product either. Are you saying that differences between cool projects aren’t worth thinking and conversing about?
totetsu 10 hours ago||
https://bbycroft.net/llm has 3d Visualization of tiny example LLM layers that do a very good job at showing what is going on (https://news.ycombinator.com/item?id=38505211)
armanified 7 hours ago||
Pretty neat! I'll definitely take a deeper look into this.
maverickxone 8 hours ago|||
This has little to do with the post, but I have to say your project is indeed pretty cool! Consider adding some more UI?
skramzy 6 hours ago||
Neat!
ordinarily 16 hours ago||
It's genuinely a great introduction to LLMs. I built my own a while ago based off Milton's Paradise Lost: https://www.wvrk.org/works/milton
algoth1 7 hours ago||
This really makes me wonder whether it would be feasible to train an LLM exclusively on toki pona (https://en.wikipedia.org/wiki/Toki_Pona)
MarkusQ 4 hours ago|
There isn't enough training data though, is there? The "secret sauce" of LLMs is the vast amount of training data available + the compute to process it all.
algoth1 2 hours ago||
I think you could probably feed a copy of a toki pona grammar book to a big model, and have it produce ‘infinite’ training data
eden-u4 1 hour ago||
There are not enough samples in that book to generate new "infinite" data.
neurworlds 6 hours ago||
Cool project. I'm working on something where multiple LLM agents share a world and interact with each other autonomously. One thing that surprised me is how much the "world" matters — same model, same prompt, but put it in a system with resource constraints, other agents, and persistent memory, the behavior changes dramatically. Made me realize we spend too much time optimizing the model and not enough thinking about the environment it operates in.
mudkipdev 13 hours ago||
This is probably a consequence of the training data being fully lowercase:

You> hello
Guppy> hi. did you bring micro pellets.

You> HELLO
Guppy> i don't know what it means but it's mine.

functional_dev 12 hours ago|
Great find! It appears uppercase tokens are completely unknown to the tokenizer.

But the character still comes through in response :)
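A sketch of why that happens, assuming a word-level vocabulary built only from lowercase text (the helper names build_vocab/encode and the unknown-token convention are illustrative, not taken from the repo):

```python
UNK = 0  # id reserved for out-of-vocabulary tokens

def build_vocab(corpus):
    """Assign an id to every whitespace-separated token seen in training."""
    vocab = {"<unk>": UNK}
    for line in corpus:
        for tok in line.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(text, vocab):
    """Map tokens to ids; anything unseen (e.g. uppercase) becomes UNK."""
    return [vocab.get(tok, UNK) for tok in text.split()]

vocab = build_vocab(["hello guppy", "did you bring micro pellets"])
print(encode("hello guppy", vocab))   # [1, 2] -- known lowercase tokens
print(encode("HELLO GUPPY", vocab))   # [0, 0] -- every token falls back to UNK
```

Since all-lowercase training data never puts "HELLO" in the vocabulary, the model only ever sees the unknown-token id, which is consistent with the "i don't know what it means" reply above.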

hackerman70000 11 hours ago||
Finally an LLM that's honest about its world model. "The meaning of life is food" is arguably less wrong than what you get from models 10,000x larger
amelius 8 hours ago||
It's arguably even better than the most famous answer to that question.
siva7 7 hours ago||
which is?
amelius 7 hours ago||
https://medium.com/change-your-mind/the-meaning-of-life-is-4...
zkmon 7 hours ago||
The meaning/goal of life is to reproduce. Food (and everything else) is only a means to it. Reproduction is the only root goal given by nature to any life form. All resources and qualities are provided only to help mating.
tantalor 6 hours ago|||
Reproduction is the goal of genes.

Food (not dying) is the goal of organisms.

philote 5 hours ago||
I'd argue that neither genes nor life have a "goal". They are what they are because they've been successful at continuing their existence. Would you say a rock's goal is not to get broken?
tantalor 5 hours ago||
Only because genes/organisms can make choices (changes to their programming, or decisions) to optimize their path towards their goal.

A rock is maybe not a good counterexample, but a crystal is, because it can grow over time. So in some sense, it tries not to break. However, a crystal cannot make any choices; its behavior is locked into the chemistry it starts with.

amelius 7 hours ago||||
Then why are reproductive rates so low in western countries?

https://en.wikipedia.org/wiki/List_of_countries_by_total_fer...

darepublic 6 hours ago||
The western lifestyle is an evolutionary dead end?
vixen99 6 hours ago||
It seems that some in the West want it to be and are working hard to make it so.
hca 4 hours ago|||
No, evolution has encoded lust. It has not yet allowed for condoms. But it's a process.
BiraIgnacio 1 hour ago||
Nice work and thanks for sharing it!

Now, I ask, have LLMs been demystified to you? :D

I am still impressed by how much (for the most part) trivial statistics and a lot of compute can do.

rpdaiml 4 hours ago|
This is a nice idea. A tiny implementation can be way more useful for learning than yet another wrapper around a big model, especially if it keeps the training loop and inference path small enough to read end to end.