Posted by huseyinkeles 10/13/2025

NanoChat – The best ChatGPT that $100 can buy (github.com)
https://x.com/karpathy/status/1977755427569111362
1523 points | 308 comments
mhitza 10/13/2025|
Should be "that you can train for $100"

Curious to try it someday on a set of specialized documents. Though as I understand it, the cost of running this is whatever GPU with 80GB of VRAM you can rent, which kind of leaves hobbyists and students out, unless some cloud is donating GPU compute capacity.

Onavo 10/13/2025||
A GPU with 80GB of VRAM costs around $1-3 USD an hour on commodity clouds (i.e. the non-Big-3 bare-metal providers, e.g. https://getdeploying.com/reference/cloud-gpu/nvidia-h100). I think it's accessible to most middle-class users in first-world countries.
antinomicus 10/13/2025||
Isn’t the whole point to run your model locally?
theptip 10/13/2025|||
No, that’s clearly not a goal of this project.

This is a learning tool. If you want a local model you are almost certainly better off using something trained on far more compute (DeepSeek, Qwen, etc.).

yorwba 10/13/2025||||
The 80 GB is for training with a batch size of 32 sequences of 2048 tokens each. Since the model has only about 560M parameters, you could probably run it on a CPU, if a bit slowly.
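For a rough sense of why training needs so much more memory than inference, here's a back-of-the-envelope sketch in Python. The precision, optimizer state, and model shape below are illustrative assumptions, not nanochat's actual configuration.

```python
# Back-of-the-envelope memory estimate for a ~560M-parameter model.
# All numbers below are illustrative assumptions, not nanochat's real config:
# bf16 weights/grads, fp32 AdamW-style moments, and a hypothetical model shape.
params = 560e6
GB = 1024 ** 3

inference_weights = params * 2              # bf16 weights only: ~1 GB
train_static = params * (2 + 2 + 4 + 4)     # weights + grads + two fp32 moments: ~6 GB

batch, seq, layers, hidden = 32, 2048, 20, 1280        # hypothetical shape
activations = batch * seq * layers * hidden * 16       # ~16 bytes per token/layer/dim, very rough

print(f"inference (weights only):        {inference_weights / GB:.1f} GB")
print(f"training (weights + optimizer):  {train_static / GB:.1f} GB")
print(f"training activations, batch 32:  {activations / GB:.1f} GB")
```

Activations are the part that scales with batch size, which is why reducing the batch size is the main knob on smaller GPUs; the weights themselves are tiny.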
simonw 10/13/2025||||
You can run a model locally on much less expensive hardware. It's training that requires the really big GPUs.
_ea1k 10/13/2025|||
I'd guess that this will output tokens faster than the average reader can read, even with CPU-only inference on a modern-ish CPU.

The param count is small enough that even cheap (<$500) GPUs would work too.

portaouflop 10/13/2025||
If I have, let's say, 40GB of RAM, does it not work at all, or does it just take twice as long to train?
typpilol 10/13/2025||
Won't work at all. Or if it does, it'll have to go to disk for every single calculation, so it'll be so slow it won't ever finish.
karpathy 10/13/2025||
It will work great with a 40GB GPU, probably a bit less than twice as slow. These are micro models of a few B params at most and fit easily during both training and inference.
utopcell 10/14/2025||
How low can this go? Can this run on a 5090 card (32GiB)?
JonathanFly 10/14/2025||
Set nproc_per_node=1 instead of 8 (or run the training script directly instead of using torchrun) and set device_batch_size=4 instead of 32. You may be able to use 8 with a 5090, but it didn't work on my 4090. However, it's way slower than expected; one H100 shouldn't be 250x a 4090, so I'm not sure it's training correctly. I'll let it run overnight and see if the outputs make any sense; maybe the metrics are not accurate in this config.
jmspring 10/14/2025||
8xH100 nodes start at ~$450/day. Not sure about the $100 part; I need to dig into the post.
simonw 10/14/2025|
The quoted $100 price is for about 4 hours at $24/hour. $450/day / 24 hours = $18.75/hour, so your numbers roughly match that.
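For concreteness, the arithmetic behind both numbers (figures taken from the comments above):

```python
# The quoted speedrun cost: ~4 hours on an 8xH100 node at ~$24/hour.
speedrun_cost = 4 * 24           # ≈ $96, i.e. "the $100 ChatGPT"

# jmspring's figure: 8xH100 nodes starting around $450/day.
hourly_from_daily = 450 / 24     # ≈ $18.75/hour, roughly in line with $24/hour

print(speedrun_cost, round(hourly_from_daily, 2))
```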
jmspring 10/14/2025||
Thanks. Working on platforms - days are more interesting than hours.
samus 10/13/2025||
Andrej Karpathy slays again by spreading knowledge about this important subject to the people!
KnowledgeWeaver 10/13/2025||
Ah, but this is a nice project. I'll start hacking once it's easier to fine-tune it with my own documents for specific questions. What plagues me, though, is this: how do you prevent the model from answering questions it was not trained for?
spacecadet 10/14/2025||
I've built so many nano AIs over the last several years. I have played with nanoGPT; it's OK. Just hype for Karpathy... So many tiny LLMs out there now run on cheap SoCs. Try SmolVLM512; it runs fine on a sub-$100 Pi.
simonw 10/14/2025|
You're misunderstanding the project. This isn't about an LLM that runs on $100 hardware. It's about a usable LLM that costs $100 to train from scratch.
spacecadet 10/19/2025||
No, I get that. I've trained my own small LLMs, and for much less than $100.
wyldfire 10/13/2025||
I would love to take an existing open-weight model and fine-tune it with specific training data along these lines. Can I do that with Qwen or GLM? Is there a ~simple recipe for doing that?
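Not nanochat-specific, but as a sketch of what a "simple recipe" might look like: LoRA fine-tuning a small Qwen checkpoint with Hugging Face transformers + peft. The model name, data file, and hyperparameters here are placeholders, not a vetted setup.

```python
# Minimal LoRA fine-tuning sketch (illustrative; model, data path, and
# hyperparameters are placeholders, not a tuned recipe).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model

model_name = "Qwen/Qwen2.5-0.5B"            # small open-weight base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Attach low-rank adapters so only a small fraction of weights are trained.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Your own documents, one {"text": ...} per line in a JSONL file (hypothetical path).
data = load_dataset("json", data_files="my_documents.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="qwen-lora-out", per_device_train_batch_size=2,
                           num_train_epochs=1, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("qwen-lora-out")
```

Because only the adapter weights get gradients and optimizer state, this kind of run tends to fit on a single consumer GPU, unlike full pretraining.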
Havoc 10/13/2025||
>If your GPU(s) have less than 80GB, you'll have to tune some of the hyperparameters or you will OOM / run out of VRAM. Look for --device_batch_size in the scripts and reduce it until things fit. E.g. from 32 (default) to 16, 8, 4, 2, or even 1.

That sounds like it could run on a 24GB GPU. A batch size of 8 would imply 20GB of memory, no?

...presumably just takes forever

zipy124 10/13/2025||
Yes, you can always stream data when training or doing inference if VRAM is lacking, but the slowdown is extremely noticeable. This is the case for CPU code too, and it is why optimising for bandwidth is so critical in high-performance computing: your ability to compute is almost always substantially larger than your bandwidth. An AVX-512-capable CPU with a suitable number of cores can easily sustain several teraflops of fp64 compute, but is typically limited by memory bandwidth; GPUs running LLMs have just broadened this knowledge to more people.

A fun consequence of CPUs getting faster more quickly than memory: lookup tables of pre-computed values used to be a common optimisation, but for many common use cases it is now quicker to re-compute a value than to retrieve the pre-computed one from memory.
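If you want to see this on your own machine, here's a rough numpy sketch comparing a gather from a precomputed sine table against simply recomputing sin(); which side wins depends heavily on the table size relative to your caches, so treat it as an experiment rather than a proof.

```python
# Compare reading sin(x) from a precomputed lookup table vs. recomputing it.
# Illustrative microbenchmark; results depend on table size vs. CPU cache.
import time
import numpy as np

N = 10_000_000
table_size = 1 << 22                        # 4M entries (~32 MB): larger than most caches
grid = np.linspace(0, 2 * np.pi, table_size)
table = np.sin(grid)                        # precomputed values
idx = np.random.randint(0, table_size, size=N)   # random access pattern
x = grid[idx]                               # the inputs we want sin() of

t0 = time.perf_counter(); y_lut = table[idx]; t1 = time.perf_counter()
t2 = time.perf_counter(); y_calc = np.sin(x);  t3 = time.perf_counter()

print(f"lookup table: {t1 - t0:.3f}s   recompute: {t3 - t2:.3f}s")
print("max difference:", np.abs(y_lut - y_calc).max())
```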

JonathanFly 10/14/2025||
> A batch size of 8 would imply 20GB of memory, no?

I'm running it now and I had to go down to 4 instead of 8, and that 4 is using around 22-23GB of GPU memory. Not sure if something is wrong or if batch size only scales part of the memory requirements. (Edit: I restarted, running the training script directly instead of torchrun, and 8 still doesn't fit, but 4 is now using 16-17GB instead.)

On my 4090 the throughput is 523 tok/sec, which is about 1/2000 of the 1,000,000 tok/sec of the 8x 80GB H100s. That feels too slow, so maybe something is wrong. The 4090 has about 1/3 of the raw compute. I'm sure there are other losses from less batching, but even if it were only 1/10th as fast, I'd have expected something more like 1,000,000 / 10 / 8, so at least 10,000 tok/sec.
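Spelling that expectation out (the 1/10th factor is just the pessimistic guess above, not a measurement):

```python
# Expected single-4090 throughput if it were ~1/10th as fast as one H100
# on this workload (a rough guess, not a measured ratio).
h100_node_tok_s = 1_000_000          # reported for 8x H100
per_h100 = h100_node_tok_s / 8       # 125,000 tok/s per H100
expected_4090 = per_h100 / 10        # ≈ 12,500 tok/s
observed_4090 = 523
print(expected_4090, observed_4090, round(expected_4090 / observed_4090))  # ~24x gap
```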

Havoc 10/14/2025||
Thanks for investigating. Sounds like throwing some dollars at a cloud GPU makes more sense, then.
RobGR 10/13/2025|
This is an LLM trained using a $100 budget to RENT access to graphics cards. It's not about what you could do BUYING hardware for $100.
danielmarkbruce 10/13/2025||
Nowhere does he suggest he is buying hardware.
HelloMcFly 10/14/2025||
Once the LLM is trained you don't need the rented hardware anymore.