Microgpt - Hacker News

Posted by tambourine_man 1 day ago

Microgpt (karpathy.github.io)

1723 points | 296 comments | page 6

tithos 1 day ago

What is the prime use case?

keyle 1 day ago

it's a great learning tool and it shows it can be done concisely.

geerlingguy 1 day ago

Looks like it's for learning how a GPT operates, with a real example.

foodevl 1 day ago

Yeah, everyone learns differently, but for me this is a perfect way to better understand how GPTs work.

inerte 1 day ago

For Karpathy to tell you that things you thought were hard in fact fit on one screen.

antonvs 1 day ago

To confuse people who only think in terms of use cases.

Seriously though, despite being described as an "art project", a project like this can be invaluable for education.

hrmtst93837 19 hours ago

Education often hinges on breaking down complex ideas into digestible chunks, and projects like this can spark creativity and critical thinking. What may seem whimsical can lead to deeper discussions about AI's role and limitations.

jackblemming 1 day ago

A case study for whenever a new edition of Programming Pearls is released.

aaronblohowiak 1 day ago

“Art project”

pixelatedindex 1 day ago

If writing is art, then I’ve been amazed at the source code written by this legend.

with 22 hours ago

"everything else is just efficiency" is a nice line but the efficiency is the hard part. the core of a search engine is also trivial, rank documents by relevance. google's moat was making it work at scale. same applies here.
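
To make the "trivial core" concrete, here is a minimal sketch of ranking documents by relevance. It assumes relevance is nothing more than how often the query terms appear in a document; the `rank` function and the sample documents are hypothetical, and none of the work needed to do this at scale is shown.

```python
from collections import Counter


def rank(query, documents):
    """Return documents sorted by a naive relevance score (query-term counts)."""
    terms = query.lower().split()
    scored = []
    for doc in documents:
        # Whitespace tokenization only; punctuation stays attached to words.
        counts = Counter(doc.lower().split())
        score = sum(counts[t] for t in terms)
        scored.append((score, doc))
    # Highest score first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


# Hypothetical toy documents, for illustration only.
docs = [
    "tiny gpt in pure python",
    "how search engines rank documents",
    "rank rank rank: a search story",
]
ranked = rank("search rank", docs)
for doc in ranked:
    print(doc)
```

The scoring loop is the whole idea; indexing, freshness, spam resistance, and latency are where the difficulty the comment describes comes in.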

lukan 22 hours ago

Sure, but understanding the core concepts is essential to making things efficient, and as far as I understand, this mainly serves educational purposes (it does not even run on a GPU).

with 22 hours ago

yep, agreed. wasn’t knocking the project at all, it’s great for exactly that purpose

geon 17 hours ago

I think the hard part is improving on the basic concept.

The current top-of-the-line models are extremely overfitted and produce so much nonsense that they are useless for anything but the simplest tasks.

This architecture was an interesting experiment, but is not the future.

profsummergig 1 day ago

If anyone knows of a way to use this code on a consumer-grade laptop to train on a small corpus (in less than a week), and then demonstrate inference (hallucinations are okay), please share how.

simsla 1 day ago

The blog post literally explains how to do so.

hrmtst93837 16 hours ago

It's true, the post lays out the details clearly, but a hands-on example can often make the concepts more tangible. Seeing it in action helps solidify understanding.

hrmtst93837 17 hours ago

The post lays out the steps clearly, but implementing them often reveals unexpected challenges. It's usually more complicated in practice than it appears on paper.

profsummergig 10 hours ago

This. I literally am asking for a step-by-step guide outlining every step (including an existing corpus that can be used on a consumer-grade laptop to train the model in under a week).
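
As a rough illustration of the train-then-sample workflow being asked about, here is a minimal sketch that runs in well under a second on any laptop. It is not the code from the blog post and not a GPT: it swaps in a character-level bigram count model as a stand-in, and the corpus, `train`, and `sample` names are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def train(corpus):
    """Count character bigrams; a newline marks the start and end of each line."""
    counts = defaultdict(Counter)
    for line in corpus:
        chars = ["\n"] + list(line) + ["\n"]
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    return counts


def sample(counts, rng, max_len=20):
    """Generate one string by repeatedly sampling the next character."""
    out, ch = [], "\n"
    for _ in range(max_len):
        nxt = counts[ch]
        # Pick the next character in proportion to how often it followed ch.
        ch = rng.choices(list(nxt.keys()), weights=list(nxt.values()))[0]
        if ch == "\n":  # end-of-line marker: stop generating
            break
        out.append(ch)
    return "".join(out)


corpus = ["anna", "hannah", "nina", "ian"]  # hypothetical toy corpus
model = train(corpus)             # "training": one pass of counting
rng = random.Random(0)            # seeded so runs are repeatable
samples = [sample(model, rng) for _ in range(5)]  # "inference"
print(samples)
```

The same two phases (fit parameters on a corpus, then sample one token at a time) apply to a GPT; only the model between them is far larger, which is what the blog post covers.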

hrmtst93837 20 hours ago

If the implementation details are clear, replicating the setup can be worthwhile. Sometimes seeing it in action helps to better understand the nuances.