Posted by tambourine_man 23 hours ago

Microgpt (karpathy.github.io)
1654 points | 288 comments
jimbokun 20 hours ago|
It’s pretty staggering that a core algorithm simple enough to be expressed in 200 lines of Python can apparently be scaled up to achieve AGI.

Yes, with some extra tricks and tweaks. But the core ideas are all here.
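
For a sense of how little machinery is involved, here is a minimal sketch of the central operation, causal self-attention, in PyTorch (illustrative only, not Karpathy's actual code):

  import torch
  import torch.nn.functional as F

  def causal_self_attention(x, w_q, w_k, w_v):
      # x: (seq_len, dim); w_q, w_k, w_v: (dim, dim) projection matrices
      q, k, v = x @ w_q, x @ w_k, x @ w_v
      scores = q @ k.T / (k.shape[-1] ** 0.5)  # scaled dot-product
      # causal mask: each position attends only to itself and the past
      n = x.shape[0]
      mask = torch.tril(torch.ones(n, n, dtype=torch.bool))
      scores = scores.masked_fill(~mask, float("-inf"))
      return F.softmax(scores, dim=-1) @ v  # weighted sum of values

Stack that with an MLP, layer norms, and residual connections, repeat a few times, and you have essentially the whole architecture; the rest of the 200 lines is roughly tokenization, the training loop, and sampling.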

darkpicnic 20 hours ago||
LLMs won’t lead to AGI. Almost by definition, they can’t. The thought experiment I use constantly to explain this:

Train an LLM on all human knowledge up to 1905 and see if it comes up with General Relativity. It won’t.

We’ll need additional breakthroughs in AI.

canjobear 8 hours ago|||
It's not obvious why it wouldn't, especially if it gets to read Poincaré and Riemann.
johnmaguire 20 hours ago||||
I'm not sure - with tool calling, AI can both fetch and create new context.
0xbadcafebee 20 hours ago||
It still can't learn. It would need to create content, experiment with it, make observations, then re-train its model on that observation, and repeat that indefinitely at full speed. That won't work on a timescale useful to a human. Reinforcement learning, on the other hand, can do that, on a human timescale. But you can't make money quickly from it. So we're hyper-tweaking LLMs to make them more useful faster, in the hopes that that will make us more money. Which it does. But it doesn't make you an AGI.
charcircuit 19 hours ago||
It can learn. When my agents make mistakes they update their memories and will avoid making the same mistakes in the future.

>Reinforcement learning, on the other hand, can do that, on a human timescale. But you can't make money quickly from it.

Tools like Claude Code and Codex have used RL to train the model how to use the harness and make a ton of money.

kelnos 17 hours ago|||
That's not learning, though. That's just taking new information and stacking it on top of the trained model. And that new information consumes space in the context window. So sure, it can "learn" a limited number of things, but once you wipe context, that new information is gone. You can keep loading that "memory" back in, but before too long you'll have too little context left to do anything useful.

That kind of capability is not going to lead to AGI, not even close.

regularfry 12 hours ago|||
Two things:

1. It's still memory, of a sort, which is learning, of a sort.

2. It's a very short hop from "I have a stack of documents" to "I have some LoRA weights." You can already see that happening (see the sketch below).
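
To make the hop concrete, here is a toy LoRA-style layer in PyTorch (a sketch under the usual LoRA assumptions; names and hyperparameters are illustrative):

  import torch
  import torch.nn as nn

  class LoRALinear(nn.Module):
      """A frozen base weight plus a small trainable low-rank update."""
      def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
          super().__init__()
          self.base = base
          for p in self.base.parameters():
              p.requires_grad = False  # the base model stays frozen
          self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
          self.B = nn.Parameter(torch.zeros(rank, base.out_features))
          self.scale = alpha / rank

      def forward(self, x):
          # base output plus a low-rank correction learned from new experience
          return self.base(x) + (x @ self.A @ self.B) * self.scale

Instead of re-reading a stack of documents every session, the "lessons" live in A and B, a tiny set of weights you can train cheaply and swap in and out.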

charcircuit 10 hours ago||
Also keep in mind that the models are already trained to remember things by putting them in files, as part of the post-training they do. The idea that it needs to remember or recall something is already part of the weights and is not something that is just bolted on after the fact.
charcircuit 16 hours ago|||
>but before too long you'll have too little context left to do anything useful.

One of the biggest boosts in LLM utility and knowledge was hooking them up to search engines. Giving them the ability to query a gigantic bank of information already has made them much more useful. The idea that it can't similarly maintain its own set of information is shortsighted in my opinion.
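
A toy version of that idea: an external store the model queries by embedding similarity (a sketch; `embed` stands in for any sentence-embedding function and is an assumption, not a specific API):

  import numpy as np

  class MemoryStore:
      """Tiny external memory: store text, retrieve by cosine similarity."""
      def __init__(self, embed):
          self.embed = embed  # any function: str -> np.ndarray
          self.texts, self.vecs = [], []

      def add(self, text):
          self.texts.append(text)
          self.vecs.append(self.embed(text))

      def query(self, question, k=3):
          q = self.embed(question)
          sims = [v @ q / (np.linalg.norm(v) * np.linalg.norm(q))
                  for v in self.vecs]
          top = np.argsort(sims)[::-1][:k]
          return [self.texts[i] for i in top]  # prepend these to the prompt

The store can grow without bound; only the top-k retrieved snippets ever consume context.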

0xbadcafebee 8 hours ago||
It's simply a fact that LLMs cannot learn. RAG is not learning; it's a hack. Go listen to any AI researcher interviewed on this subject: they all say the same thing, and it's a fundamental part of the design.
Dansvidania 17 hours ago||||
That’s not learning. That’s carrying over context that you’re trusting was correctly summarised from one conversation to the next.
regularfry 12 hours ago||
Which sounds uncomfortably like human memory, which gets rewritten from one recollection to the next. Somehow, we cope.
Dansvidania 7 hours ago|||
I disagree. Human memory is literally changing the weights in your neural network. Like, exactly the same.

So in the machine learning world, it would need to be continuous re-training (I think it's called fine-tuning now?). Context is not "like human memory". It's more like writing yourself a post-it note that you put in a binder and hand over to a new person to continue the task at a later date.

It's just words that you write to the next person, who in LLM world happens to be a copy of the same you that started; no learning happens.

It might guide you, yes, but that's a different story.

0xbadcafebee 8 hours ago|||
Ever seen the movie Memento? That's LLM memory.
otabdeveloper4 17 hours ago|||
> they update their memories

Their contexts, not their memories. An LLM context is like 100k tokens. That's a fruit fly, not AGI.

charcircuit 16 hours ago||
A human can't keep 100k tokens active in their mind at the same time. We just need a place to store them and tools to query it. You could have exabytes of memories that the AI could use.
otabdeveloper4 13 hours ago||
> A human can't keep 100k tokens active in their mind at the same time.

Well, that's just, like, your opinion, man.

joefourier 12 hours ago||||
When did AGI start meaning ASI?

LLMs are artificial general intelligence, as per the Wikipedia definition:

> generalise knowledge, transfer skills between domains, and solve novel problems without task‑specific reprogramming

Even GPT-3 could meet that bar.

bornfreddy 8 hours ago||
Wtf? Once it was AI. Then the models started passing the Turing test and calling themselves AI, so we started using AGI to say "truly intelligent machines". Now, as per the definition you quoted, apparently even GPT-3 is AGI, so we now have to use "ASI" to mean "intelligent, but artificial"?

I think I'll just keep using AI and then explain to anyone who uses that term that there is no "I" in today's LLMs, and they shouldn't use this term for some years at least. And that when they can, we will have a big problem.

joefourier 5 hours ago||
What's your definition of intelligence? If you exclude LLMs, you might have to exclude quite a few humans as well.
foxglacier 15 hours ago||||
That's an assertion, not a thought experiment. You can't logically reach the conclusion ("It won't") by thinking about it. But it doesn't sound so grand if you say "The assertion I use constantly to explain this".
mold_aid 13 hours ago||
To be fair, the post being replied to is arguing by assertion as well. "The core ideas are all there" is pure if-you-say-so stuff.
TiredOfLife 15 hours ago||||
> Train an LLM on all human knowledge up to 1905 and see if it comes up with General Relativity. It won’t.

Same thing is true for humans.

tomrod 7 hours ago||
We did?
tehjoker 20 hours ago||||
Part of the issue there is that the data quantity prior to 1905 is a small drop in the bucket compared to the internet era even though the logical rigor is up to par.
jerf 20 hours ago|||
Yet the humans of the time, a small number of the smartest ones, did it, and on much less training data than we throw at LLMs today.

If LLMs have shown us anything it is that AGI or super-human AI isn't on some line, where you either reach it or don't. It's a much higher dimensional concept. LLMs are still, at their core, language models, the term is no lie. Humans have language models in their brains, too. We even know what happens if they end up disconnected from the rest of the brain because there are some unfortunate people who have experienced that for various reasons. There's a few things that can happen, the most interesting of which is when they emit grammatically-correct sentences with no meaning in them. Like, "My green carpet is eating on the corner."

If we consider LLMs as a hypertrophied language model, they are blatantly, grotesquely superhuman on that dimension. LLMs are way better at not just emitting grammatically-correct content but content with facts in them, related to other facts.

On the other hand, a human language model doesn't require the entire freaking Internet to be poured through it, multiple times (!), in order to start functioning. It works on multiple orders of magnitude less input.

The "is this AGI" argument is going to continue swirling in circles for the forseeable future because "is this AGI" is not on a line. In some dimensions, current LLMs are astonishingly superhuman. Find me a polyglot who is truly fluent in 20 languages and I'll show you someone who isn't also conversant with PhD-level topics in a dozen fields. And yet at the same time, they are clearly sub-human in that we do hugely more with our input data then they do, and they have certain characteristic holes in their cognition that are stubbornly refusing to go away, and I don't expect they will.

I expect there to be some sort of AI breakthrough at some point that will allow them to both fix some of those cognitive holes, and also, train with vastly less data. No idea what it is, no idea when it will be, but really, is the proposition "LLMs will not be the final manifestation of AI capability for all time" really all that bizarre a claim? I will go out on a limb and say I suspect it's either only one more step the size of "Attention is All You Need", or at most two. It's just hard to know when they'll occur.

antupis 20 hours ago|||
Humans need way less data. Just compare Waymo to an average 16-year-old with a car.
cellis 20 hours ago||
A 16 year old has been training for almost 16 years to drive a car. I would argue the opposite: Waymo’s / Specific AIs need far less data than humans. Humans can generalize their training, but they definitely need a LOT of training!
noduerme 19 hours ago|||
When humans, or dogs or cats for that matter, react to novel situations they encounter, when they appear to generalize or synthesize prior diverse experience into a novel reaction, that new experience and new reaction feeds directly back into their mental model and alters it on the fly. It doesn't just tack on a new memory. New experience and new information back-propagate constantly, adjusting the weights and meanings of prior memories. This is a more multi-dimensional alteration than simply re-training a model to come up with a new right answer... it also exposes to the human mental model all the potential flaws in all the previous answers which may have been sufficiently correct before.

This is why, for example, a 30 year old can lose control of a car on an icy road and then suddenly, in the span of half a second before crashing, remember a time they intentionally drifted a car on the street when they were 16 and reflect on how stupid they were. In the human or animal mental model, all events are recalled by other things, and all are constantly adapting, even adapting past things.

The tokens we take in and process are not words, nor spatial artifacts. We read a whole model as a token, and our output is a vector of weighted models that we somewhat trust and somewhat discard. Meeting a new person, you will compare all their apparent models to the ones you know: Facial models, audio models, language models, political models. You ingest their vector of models as tokens and attempt to compare them to your own existing ones, while updating yours at the same time. Only once our thoughts have arranged those competing models we hold in some kind of hierarchy do we poll those models for which ones are appropriate to synthesize words or actions from.

tomrod 7 hours ago||
In a word, JEPA?
jimbokun 19 hours ago|||
No 16 year old has practiced driving a car for 16 years.
krisoft 12 hours ago|||
They were practicing object recognition, movement tracking and prediction, self-localisation, visual odometry fused with proprioception and the vestibular system, and movement control for 16 years before they ever sat behind a steering wheel, though.
Dansvidania 17 hours ago|||
If you see gaining fine motor control, understanding pictographic language […] as prerequisites to driving a car, then yes, all of them have.
krige 14 hours ago||
That's an exaggeration. Nobody is trained to read STOP signs for 16 years; a few months, tops. And Waymo doesn't need to coordinate a four-limbed, 20-digited, one-headed body to operate a car.
danielEM 12 hours ago|||
Well, I also think that there is a lot that we process 'in the background' and learn beforehand in order to learn how to drive, and then drive. I think the 'fairest' test would be to figure out the absolute lowest age at which kids could perform well on the streets behind a steering wheel.
Dansvidania 7 hours ago|||
I'm not making the point that it is; I'm rather expanding on a possible perspective in which 16 years of training produce a human driver.

That being said, you don't really need training to understand a STOP sign by the time you are required to; it's pretty damn clear, being one of the simpler signs.

But you do get a lot of "cultural training" so to speak.

xdennis 17 hours ago||||
> Train an LLM on all human knowledge up to 1905 and see if it comes up with General Relativity. It won’t.

AGI just means human level intelligence. I couldn't come up with General Relativity. That doesn't mean I don't have general intelligence.

I don't understand why people are moving the goalposts.

0xbadcafebee 8 hours ago|||
A 4-year-old is currently more capable than LLMs (I'm not making this up; ask Yann LeCun). You're going to need it to reach at least "adult" level to be general intelligence.
tomrod 7 hours ago||||
I'd argue they are clarifying the goalposts with aplomb.
nurettin 15 hours ago|||
> AGI just means human level intelligence.

It seems more like people haven't decided on what the goal post is. If AGI is just another human, that's pretty underwhelming. That's why people are imagining something that surpasses humans by leaps and bounds in terms of reasoning, leading to wondrous new discoveries.

regularfry 12 hours ago||
"Just another human" would be outright astonishing if it landed.
nurettin 12 hours ago||
Yeah as a dad I can tell you it gets old quickly.
regularfry 10 hours ago||
"Just another human, but one you can switch off at the wall" is both better and terrifyingly worse.
crazy5sheep 20 hours ago|||
The 1905 thought experiment actually cuts both ways. Did humans "invent" the airplane? We watched birds fly for thousands of years — that's training data. The Wright brothers didn't conjure flight from pure reasoning, they synthesized patterns from nature, prior failed attempts, and physics they'd absorbed. Show me any human invention and I'll show you the training data behind it.

Take the wheel. Even that wasn't invented from nothing — rolling logs, round stones, the shape of the sun. The "invention" was recognizing a pattern already present in the physical world and abstracting it. Still training data, just physical and sensory rather than textual.

And that's actually the most honest critique of current LLMs — not that they're architecturally incapable, but that they're missing a data modality. Humans have embodied training data. You don't just read about gravity, you've felt it your whole life. You don't just know fire is hot, you've been near one. That physical grounding gives human cognition a richness that pure text can't fully capture — yet.

Einstein is the same story. He stood on Faraday, Maxwell, Lorentz, and Riemann. General Relativity was an extraordinary synthesis — not a creation from void. If that's the bar for "real" intelligence, most humans don't clear it either. The uncomfortable truth is that human cognition and LLMs aren't categorically different. Everything you've ever "thought" comes from what you've seen, heard, and experienced. That's training data. The brain is a pattern-recognition and synthesis machine, and the attention mechanism in transformers is arguably our best computational model of how associative reasoning actually works.

So the question isn't whether LLMs can invent from nothing — nothing does that, not even us.

Are there still gaps? Sure. Data quality, training methods, physical grounding — these are real problems. But they're engineering problems, not fundamental walls. And we're already moving in that direction — robots learning from physical interaction, multimodal models connecting vision and language, reinforcement learning from real-world feedback. The brain didn't get smart because it has some magic ingredient. It got smart because it had millions of years of rich, embodied, high-stakes training data. We're just earlier in that journey with AI. The foundation is already there — AGI isn't a question of if anymore, it's a question of execution.

drw85 19 hours ago|||
Nice ChatGPT answer. Put some real thought and data in it too.
crazy5sheep 8 hours ago||
The whole point is that LLMs, especially the attention mechanism in transformers, have already paved the road to AGI. The main gap is the training data and its quality. Humans have generations of distilled knowledge: books, language, culture passed down over centuries. And on top of that we have the physical world: we watched birds fly, saw apples drop, touched hot things. Maybe we should train the base model with physical-world data first, and then fine-tune with the distilled knowledge.
lanstin 7 hours ago||
Human life includes a lot of adversarial training (lying relatives) and training in temporal logics, which would seem to be a somewhat different domain than purely linguistic computations (e.g. staying up late, feeling bad; working hard at a task for months, getting better at it; physical skills, even editing Go with emacs, move from the conscious layer into the cerebellar layer). I think attention is a poor man's "OODA" loop; cognitive science is learning that a primary function of the brain is predicting what will be going on with the body in the immediate future, and prepping for it; that's not a thing that LLMs are architecturally positioned to do. Maybe swarms of agents (although in my mind that's more of a way to deal with LLM poor performance with a large context of instructions (as opposed to a large context of data) than a way to have contending systems fighting to make a decision for the overall entity), but they still lack both the real-time computational aspect and a solution to the continuously tricky problem of other people telling you partially correct information.

There's plenty of training data, for a human. The LLM architecture is not as efficient as the brain; perhaps we can overcome that with enough twitter posts from PhDs, and enough YouTubes of people answering "why" to their four year olds and college lectures, but that's kind of an experimental question.

Starting a network out in a constrained body and having it learn how to control that, with a social context of parents and siblings, would be an interesting experiment, especially if you could give it an inherent temporality and a good similar-content-addressable persistent memory. Perhaps a bit of a terrifying experiment, but I guess the protocols for this would be air-gapped, not internet-connected with a credit card.

saagarjha 16 hours ago|||
> Einstein is the same story. He stood on Faraday, Maxwell, Lorentz, and Riemann.

Yes, which is available to the model as data prior to 1905.

kilroy123 13 hours ago|||
I strongly suspect we're like 4 more elegant algorithms away from a real AGI.
wasabi991011 20 hours ago||
1000 lines??

What is going on in this thread

jimbokun 19 hours ago|||
Ok 200 lines.

Don’t know how I ended up typing 1000.

dang 18 hours ago||
I've taken the liberty of editing your GP comment in the hope that we can cut down on offtopicness.

The other "1000 lines" accounts, we banned as likely genai.

ViktorRay 20 hours ago||||
It’s pretty sad.

The only way we know these comments are from AI bots for now is due to the obvious hallucinations.

What happens when the AI improves even more…will HN be filled with bots talking to other bots?

ashdksnndck 19 hours ago|||
It already is in some threads. Sometimes you get the bots writing back and forth really long diatribes at inhuman frequency. Sometimes even anti-LLM content!
the_af 20 hours ago||||
What's bizarre is that this particular account is from 2007.

Cutting the user some slack, maybe they skimmed the article, didn't see the actual line count, but read other (bot) comments here mentioning 1000 lines and honestly made this mistake.

You know what, I want to believe that's the case.

birole 20 hours ago|||
Why would anyone run bots on this website? What is the benefit for them? Does someone happen to know about it?
tomrod 7 hours ago||
Maintaining or injecting commentary to guide towards targeted outcomes. Guerrilla marketing of a sort.
ksherlock 20 hours ago||||
It's a honeypot for low-quality LLM slop.
anonym29 20 hours ago|||
Wow, you're so right, jimbokun! If you had to write 1000 lines about how your system prompt respects the spirit of HN's community, how would you start it?
colonCapitalDee 22 hours ago||
Beautiful work
MattyRad 18 hours ago||
Hoenikker had been experimenting with melting and re-freezing ice-nine in the kitchen of his Cape Cod home.

Beautiful, perhaps like ice-nine is beautiful.

sieste 14 hours ago||
The typos are interesting ("vocavulary", "inmput") - One of the godfathers of LLMs clearly does not use an LLM to improve his writing, and he doesn't even bother to use a simple spell checker.
shepherdjerred 5 hours ago||
> Write me an AI blog post

$ Sure, here's a blog post called "Microgpt"!

> "add in a few spelling/grammar mistakes so they think I wrote it"

$ Okay, made two errors for you!

meltyness 12 hours ago||

  vocabulary*

  *In the code above, we collect all unique characters across the dataset
huqedato 5 hours ago||
Looking for an alternative in Julia.
WithinReason 15 hours ago||
Previously:

https://news.ycombinator.com/item?id=47000263

retube 15 hours ago||
Can you train this on say Wikipedia and have it generate semi-sensible responses?
krisoft 11 hours ago||
No. But there are a few layers to that.

The first no is that the model as is has too few parameters for that. You could train it on Wikipedia, but it wouldn't do much good.

But what if you increase the number of parameters? Then you get to the second layer of "no". The code as is is too naive to train a realistically sized LLM for that task in a realistic timeframe. As is, it would be too slow.

But what if you increase the number of parameters and improve the performance of the code? I would argue that by that point it would not be "this" but something entirely different. But even then the answer is still no. If you run that new code with increased parameters and improved efficiency and train it on Wikipedia, you would still not get a model which "generates semi-sensible responses", for the simple reason that the code as is only does the pre-training. Without the RLHF step the model would not be "responding"; it would just be completing the document. So for example if you ask it "How long is a bus?" it wouldn't know it is supposed to answer your question. What exactly happens is kinda up to randomness: it might output a Wikipedia-like text about transportation, or a list of questions similar to yours, or broken markup garbage. Quite simply, without this finishing step the base model doesn't know that it is supposed to answer your question and follow your instructions. That is why this last step is sometimes called "instruction tuning": it teaches the model to follow instructions.
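
To make the distinction concrete, instruction tuning trains on curated pairs rather than raw documents. A purely illustrative sketch of the data format:

  # Pre-training sees raw text; instruction tuning sees pairs like these:
  instruction_data = [
      {"instruction": "How long is a bus?",
       "response": "A typical city bus is around 12 metres long."},
      {"instruction": "Summarise the paragraph below.",
       "response": "The paragraph argues that..."},
  ]
  # Each pair is formatted into one training sequence, and the loss is
  # typically applied only to the response tokens, teaching the model that
  # a question is something to answer, not a document to continue.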

But if you would increase the parameter count, improve the efficiency, train it on Wikipedia, then do the instruction tuning (which involves curating a database of instruction-response pairs), then yes. After that it would generate semi-sensible responses. But as you can see it would take quite a lot more work and would stretch the definition of "this".

It is a bit like asking if my car could compete in Formula 1. The answer is yes, but first we need to replace all its parts with different parts, and also add a few new parts. To the point where you might question if it is the same car at all.

nebben64 9 hours ago||
Very useful breakdown; thank you!
OJFord 13 hours ago|||
If you increase all the numbers (including, as a result, the time to train).
geon 12 hours ago||
That’s exactly what ChatGPT etc. are.
rramadass 21 hours ago||
C++ version - https://github.com/Charbel199/microgpt.cpp?tab=readme-ov-fil...

Rust version - https://github.com/mplekh/rust-microgpt

ThrowawayTestr 22 hours ago||
This is like those websites that implement an entire retro console in the browser.
geon 12 hours ago|
Is there a similarly simple implementation in TensorFlow?

I tried building a tiny model last weekend, but it was very difficult to find any articles that weren’t broken AI slop.

joefourier 12 hours ago|
TensorFlow is largely dead; it’s been years since I’ve seen a new repo use it. Go with Jax if you want a PyTorch alternative that can have better performance in certain scenarios.
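
For a taste of the style, a whole toy training step in Jax is just composed function transformations (a minimal sketch, assuming only that `jax` is installed):

  import jax
  import jax.numpy as jnp

  def loss(w, x, y):
      return jnp.mean((x @ w - y) ** 2)  # toy linear-regression loss

  grad_fn = jax.jit(jax.grad(loss))  # compiled gradient of loss w.r.t. w

  w = jnp.zeros(3)
  x = jnp.ones((8, 3))
  y = jnp.ones(8)
  for _ in range(100):
      w = w - 0.1 * grad_fn(w, x, y)  # plain SGD step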
nickpsecurity 3 hours ago|||
Also, TPU support. Hardware diversity.
geon 11 hours ago|||
Any recommendations for Typescript?