Posted by armanified 1 day ago

Show HN: I built a tiny LLM to demystify how language models work (github.com)
Built a ~9M param LLM from scratch to understand how they actually work. Vanilla transformer, 60K synthetic conversations, ~130 lines of PyTorch. Trains in 5 min on a free Colab T4. The fish thinks the meaning of life is food.

Fork it and swap the personality for your own character.
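For reference, a minimal decoder-only transformer sketch in the same spirit. The hyperparameters, class names (`TinyLM`, `TinyBlock`), and sizes below are illustrative guesses, not the repo's actual config:

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters for a single-digit-millions-of-params toy model;
# the real GuppyLM config may differ.
VOCAB, DIM, HEADS, LAYERS, CTX = 4096, 256, 4, 6, 128

class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(DIM, HEADS, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(DIM, 4 * DIM), nn.GELU(), nn.Linear(4 * DIM, DIM))
        self.ln1, self.ln2 = nn.LayerNorm(DIM), nn.LayerNorm(DIM)

    def forward(self, x):
        # Causal mask: True entries are disallowed, so each position
        # only attends to itself and earlier tokens.
        mask = torch.triu(torch.ones(x.size(1), x.size(1), dtype=torch.bool), 1)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, attn_mask=mask, need_weights=False)[0]
        return x + self.mlp(self.ln2(x))

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, DIM)
        self.pos = nn.Embedding(CTX, DIM)
        self.blocks = nn.Sequential(*[TinyBlock() for _ in range(LAYERS)])
        self.head = nn.Linear(DIM, VOCAB, bias=False)

    def forward(self, idx):  # idx: (batch, seq) token ids
        x = self.tok(idx) + self.pos(torch.arange(idx.size(1)))
        return self.head(self.blocks(x))  # (batch, seq, vocab) logits

model = TinyLM()
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
logits = model(torch.randint(0, VOCAB, (1, 16)))
print(logits.shape)
```

Train it with the usual next-token cross-entropy loss (shift the targets one position left) and you have the whole recipe.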

839 points | 126 comments
Vektorceraptor 11 hours ago|
Haha, funny name :)
AndrewKemendo 23 hours ago||
I love these kinds of educational implementations.

I want to really praise the (unintentional?) nod to Nagel: by limiting capabilities to the representation of a fish, the user is immediately able to understand the constraints. It can only talk like a fish because it's very simple.

Especially compared to public models, that's a really simple correspondence to grok intuitively (small LLM > only as verbose as a fish, larger LLM > more verbose), so kudos to the author for making that simple and fun.

dvt 23 hours ago|
> the user is immediately able to understand the constraints

Nagel's point was quite literally the opposite[1] of this, though. We can't understand what it must "be like to be a bat" because their mental model is so fundamentally different than ours. So using all the human language tokens in the world can't get us to truly understand what it's like to be a bat, or a guppy, or whatever. In fact, Nagel's point is arguably even stronger: there's no possible mental mapping between the experience of a bat and the experience of a human.

[1] https://www.sas.upenn.edu/~cavitch/pdf-library/Nagel_Bat.pdf

Terr_ 19 hours ago|||
IMO we're a step before that: We don't even have a real fish involved, we have a character that is fictionally a fish.

In LLM-discussions, obviously-fictional characters can be useful for this, like if someone builds a "Chat with Count Dracula" app. To truly believe that a typical "AI" is some entity that "wants to be helpful" is just as mistaken as believing the same architecture creates an entity that "feels the dark thirst for the blood of the living."

Or, in this case, that it really enjoys food-pellets.

andoando 19 hours ago||||
I'd highly disagree with that. We're all living in the same shared universe, and underlying every intelligence must be precisely an understanding of events happening in this space-time.
vixen99 12 hours ago||
What does 'precisely' mean? Everyone has the same understanding of events - a precise one?
andoando 9 hours ago||
No I am saying the basis of intelligence must be shared, not that we have the same exact mental model.

I might for example say a human entered a building, a bat might on the other hand think "some big block with two sticks moved through a hole", but both are experiencing a shared physical observation, and there is some mapping between the two.

It's like when people say: if there are aliens, they would find the same mathematical constants that we do.

AndrewKemendo 23 hours ago|||
Different argument

I’m not going to argue other than to say that you need to view the point from a third party perspective evaluating “fish” vs “more verbose thing,” such that the composition is the determinant of the complexity of interaction (which has a unique qualia per nagel)

Hence why it's an "unintentional nod," not an instantiation.

gdzie-jest-sol 16 hours ago||
* How do you create the dataset? I downloaded it, but it is compressed in a binary format.

* How do you train it? In the cloud or on my own dev machine?

* How do you create a GGUF?

freetonik 16 hours ago||
You sound like Guppy. Nice touch.
gdzie-jest-sol 16 hours ago||
```
uv run python -m guppylm chat

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/user/gupik/guppylm/guppylm/__main__.py", line 48, in <module>
    main()
  File "/home/user/gupik/guppylm/guppylm/__main__.py", line 29, in main
    engine = GuppyInference("checkpoints/best_model.pt", "data/tokenizer.json")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/gupik/guppylm/guppylm/inference.py", line 17, in __init__
    self.tokenizer = Tokenizer.from_file(tokenizer_path)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: No such file or directory (os error 2)
```
gdzie-jest-sol 16 hours ago||
Maybe add resuming training (load the best or final checkpoint) and train again:

```
# after the config/device setup
checkpoint_path = "checkpoints/best_model.pt"
ckpt = torch.load(checkpoint_path, map_location=device, weights_only=False)

model = GuppyLM(mc).to(device)
if "model_state_dict" in ckpt:
    model.load_state_dict(ckpt["model_state_dict"])
else:
    model.load_state_dict(ckpt)

start_step = ckpt.get("step", 0)
print(f"Encore {start_step}")
```
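The resume snippet above assumes the checkpoint is a dict with `model_state_dict` and `step` keys. Here is a small, self-contained sketch of the matching save/load pair, with a stand-in `nn.Linear` model and a hypothetical `/tmp` path; GuppyLM's real checkpoint layout may differ:

```python
import torch
import torch.nn as nn

def save_checkpoint(model, step, path):
    # Store weights plus the training step so resuming can pick up where it left off.
    torch.save({"model_state_dict": model.state_dict(), "step": step}, path)

def load_checkpoint(model, path, device="cpu"):
    ckpt = torch.load(path, map_location=device, weights_only=False)
    # Handle both dict-style and bare state_dict checkpoints.
    model.load_state_dict(ckpt.get("model_state_dict", ckpt))
    return ckpt.get("step", 0)

# Demo with a stand-in model and a throwaway path.
m = nn.Linear(4, 4)
save_checkpoint(m, 123, "/tmp/demo_ckpt.pt")
step = load_checkpoint(nn.Linear(4, 4), "/tmp/demo_ckpt.pt")
print(step)
```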

hughw 13 hours ago||
Tiny LLM is an oxymoron, just sayin.
uxcolumbo 12 hours ago||
How about: LLMs are on a spectrum and this one is on the tiny side?
armanified 13 hours ago||
True, but most would ignore LM if it weren't LLM.
oyebenny 20 hours ago||
Neat!
Elengal 15 hours ago||
Cool