Posted by huseyinkeles 10/13/2025

NanoChat – The best ChatGPT that $100 can buy (github.com)
https://x.com/karpathy/status/1977755427569111362
1523 points | 308 comments
chipsrafferty 10/13/2025|
Would love to hear some metrics on training it on your personal computer rather than a "cloud GPU box". I don't care if it takes 3 months to train if I end up with something good, offline, and free(ish, aside from the electricity bill).
ComputerGuru 10/14/2025||
Each H100 can do 60 TFLOPS of fp32 operations, while a single RTX 3080 can do roughly half that (just under 30). So a complete back-of-the-envelope answer would be 16x as long (since nanochat is targeting four hours on an 8xH100 node).

64 hours isn’t too bad at all!

(An RTX 2080 can only do 10 TFLOPS of fp32, so that would be roughly 3x as long again.)
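
As a very rough sketch of that scaling (taking the TFLOPS numbers above at face value and ignoring memory limits, precision, and interconnect, all of which matter in practice):

    # Back-of-the-envelope training-time scaling by raw fp32 throughput.
    # Assumes compute scales linearly; a single consumer card would also
    # hit memory limits long before this, so treat it as a lower bound.
    baseline_hours = 4                       # nanochat speedrun on 8x H100
    cluster_tflops = 8 * 60                  # 480 TFLOPS total (fp32)

    for name, tflops in [("RTX 3080", 30), ("RTX 2080", 10)]:
        est_hours = baseline_hours * cluster_tflops / tflops
        print(f"{name}: ~{est_hours:.0f} hours (~{est_hours / 24:.1f} days)")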

zoba 10/14/2025||
I’d also be interested in this. Especially for Macs
TheAceOfHearts 10/13/2025||
Here's the announcement post [0] from Karpathy, which provides a bit of additional context.

[0] https://x.com/karpathy/status/1977755427569111362

dang 10/13/2025|
Thanks - we'll put that in the toptext as well
kragen 10/13/2025||
This is really inspiring! Does anyone have examples of how well or poorly it performs on some example prompts?
kragen 10/13/2025|
Simon.incutio.com points out that there are screenshots on https://xcancel.com/karpathy/status/1977755430093980034.
dabockster 10/13/2025||
The title is extremely misleading - you have to rent time on an H100 cluster to get it to work. It is not on-device, and thus not truly $100.

I was really excited, too, until I looked through the readme files and the code.

rpdillon 10/14/2025||
The title is saying you can train your own model for $100. That part is true: the $100 goes to the cloud provider to rent you $250k of hardware for four hours. Then you can run that model on whatever hardware you have lying around, because it's really small.
mynameisjoseph 10/13/2025|||
I feel the same. The title makes it sound like I could have an on-device ChatGPT for $100, forever. I didn't imagine it was about training the model myself.
simonw 10/13/2025||
Since the resulting model is only ~561M parameters you could run it on a Raspberry Pi that costs less than $100.
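
Rough math on why it fits (a sketch assuming weight-only storage; the actual nanochat checkpoint format and runtime overhead will differ):

    # Memory needed just for the weights of a ~561M-parameter model.
    params = 561e6
    for label, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
        gb = params * bytes_per_param / 1e9
        print(f"{label}: ~{gb:.1f} GB")
    # fp16 is ~1.1 GB and int8 ~0.6 GB, comfortably inside the RAM of a
    # sub-$100 Raspberry Pi (KV cache and runtime add some overhead on top).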
simonw 10/13/2025|||
It's about training a model from scratch for $100.
arkmm 10/13/2025||
What's misleading about that? You rent $100 of time on an H100 to train the model.
JKCalhoun 10/13/2025||
"The fastest way to feel the magic is to run the speedrun script speedrun.sh, which trains and inferences the $100 tier of nanochat. On an 8XH100 node at $24/hr, this gives a total run time of about 4 hours."

I am clueless and don't understand this. Where is the $100 being spent? Some sort of API you have to pay to access? Some sort of virtual hardware you have to rent access to?

simonw 10/13/2025||
H100s are expensive NVIDIA GPUs, each costing about $30,000. 8XH100 means you have 8 of those wired together in a big server in a data center somewhere, so around a quarter of a million dollars worth of hardware in a single box.

You need that much hardware because each H100 provides 80GB of GPU-accessible RAM, but to train this model you need to hold a LOT of model weights and training data in memory at once. 80*8 = 640GB.

~$24/hour is how much it costs to rent that machine from various providers.
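
The headline number is just that rate times the speedrun's wall-clock time:

    # Where the "$100" comes from: rental cost, not hardware cost.
    rate_per_hour = 24      # 8x H100 node rental
    hours = 4               # speedrun.sh wall-clock time
    print(f"~${rate_per_hour * hours}")   # ~$96, i.e. the "$100 tier"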

calmoo 10/13/2025|||
Perfectly explained, thanks!
JKCalhoun 10/13/2025|||
Thank you.
llleeeooo 10/13/2025||
Renting 8 H100s costs about $24/h.
alex000kim 10/14/2025||
I created this PR to make it easier for folks to train and serve it on any cloud (or their own K8s): https://github.com/karpathy/nanochat/pull/18
sbassi 10/13/2025||
Which data does it use for training?
simonw 10/13/2025||
karpathy/fineweb-edu-100b-shuffle: https://huggingface.co/datasets/karpathy/fineweb-edu-100b-sh...

Which is derived from HuggingFaceFW/fineweb-edu: https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu

HuggingFaceTB/smol-smoltalk: https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk

And extra fine-tuning on portions of:

cais/mmlu: https://huggingface.co/datasets/cais/mmlu

openai/gsm8k: https://huggingface.co/datasets/openai/gsm8k

allenai/ai2_arc: https://huggingface.co/datasets/allenai/ai2_arc
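
If you want to poke at the pretraining data yourself, something like this should work (a sketch using the Hugging Face datasets library; streaming avoids downloading the full dump, and the "text" column name is assumed from fineweb-edu):

    from datasets import load_dataset

    # Stream the shuffled fineweb-edu shards rather than downloading everything.
    ds = load_dataset("karpathy/fineweb-edu-100b-shuffle",
                      split="train", streaming=True)

    for i, row in enumerate(ds):
        print(row["text"][:200])   # "text" column assumed, as in fineweb-edu
        if i >= 2:
            break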

eranation 10/13/2025||
I think he mentioned somewhere he used fineweb (I assume this one https://huggingface.co/datasets/HuggingFaceFW/fineweb)
megadragon9 10/17/2025||
Love the educational value of this "nano-sized" project. It reminded me of the from-scratch project I created to learn about deep learning libraries and neural networks, all the way up to LLMs like GPT-2, using just NumPy and Python [1]. Learning is done by "re-inventing the wheel" yourself, one step at a time :)

[1] https://github.com/workofart/ml-by-hand

cat_plus_plus 10/13/2025||
End-to-end training is a different beast, but finetuning and inference of impressive LLMs like Qwen3 can be done on pretty run-of-the-mill hardware like Apple Silicon Macs and gaming PCs, if anyone wants a personalized assistant with character. Just ask an AI how to finetune an AI using Unsloth (for NVIDIA) or MLX (for Apple) and it will give you ready-to-run Python scripts.
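
For the Unsloth (NVIDIA) route, the skeleton looks roughly like this; an untested sketch, the model name is a placeholder, and the API shifts between releases:

    from unsloth import FastLanguageModel

    # Load a 4-bit quantized base model so it fits in consumer VRAM.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/Qwen3-4B",   # placeholder: any supported base model
        max_seq_length=2048,
        load_in_4bit=True,
    )

    # Attach LoRA adapters so only a small fraction of weights is trained.
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    )

    # From here, hand model/tokenizer to a standard TRL SFTTrainer run on
    # your own chat data, then save the LoRA adapters for local inference.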
lostmsu 10/13/2025|
This is going to be the single most powerful boost to my indie research efforts in years. Thank you, Andrej!