Posted by huseyinkeles 1 day ago
64 hours isn’t too bad at all!
(An RTX 2080 only does about 10 TFLOPS in fp32, so that would take roughly 3x as long again.)
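If anyone wants to sanity-check that scaling, here's a tiny Python sketch: training time is roughly inversely proportional to sustained throughput, so comparing two GPUs only needs their TFLOPS. The 64h baseline and the ~30 TFLOPS I assign to the reference GPU are my assumptions, not numbers from the repo:

    # Back-of-envelope scaling: wall-clock time ~ 1 / sustained FLOP/s,
    # so a known runtime on one GPU can be rescaled to another.
    def scaled_hours(baseline_hours: float,
                     baseline_tflops: float,
                     target_tflops: float) -> float:
        """Scale a known wall-clock time by relative throughput."""
        return baseline_hours * baseline_tflops / target_tflops

    # Assuming the 64h estimate was for a ~30 TFLOPS GPU, a 10 TFLOPS
    # fp32 RTX 2080 lands around 3x longer:
    print(scaled_hours(64, 30, 10))  # -> 192.0 hours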
I guess it’s still a work in progress? Couldn’t find any other information elsewhere.
I was really excited, too, until I looked through the readme files and the code.
I'm clueless here and don't understand this. Where is the $100 being spent? Some sort of API you have to pay to access? Virtual hardware you have to rent?
You need that much hardware because each H100 provides 80GB of GPU-accessible RAM, and training keeps a LOT resident at once: the model weights, gradients, optimizer state, and the activations for each batch. 8 * 80GB = 640GB in total.
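For a feel of where the memory goes, here's a rough per-parameter budget for mixed-precision Adam training. The byte counts are the usual textbook layout and the parameter count is my assumption; at this model size, most of the 640GB is really headroom for activations:

    # Rough GPU memory budget per parameter for mixed-precision Adam.
    # Byte counts are typical, not measured from this repo.
    def bytes_per_param() -> int:
        weights_bf16 = 2   # bf16 copy of the weights
        grads_bf16 = 2     # bf16 gradients
        master_fp32 = 4    # fp32 master weights
        adam_m = 4         # Adam first moment (fp32)
        adam_v = 4         # Adam second moment (fp32)
        return weights_bf16 + grads_bf16 + master_fp32 + adam_m + adam_v

    n_params = 560e6  # hypothetical nanochat-scale parameter count
    print(f"~{n_params * bytes_per_param() / 1e9:.0f} GB of persistent state")
    # Activations for large batches and long sequences come on top;
    # that's where most of the 640GB actually gets used.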
~$24/hour is how much it costs to rent that machine from various providers, and the speedrun takes roughly four hours, so the bill comes out just under $100.
Which is derived from HuggingFaceFW/fineweb-edu: https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu
HuggingFaceTB/smol-smoltalk: https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk
And extra fine-tuning on portions of:
cais/mmlu: https://huggingface.co/datasets/cais/mmlu
openai/gsm8k: https://huggingface.co/datasets/openai/gsm8k
allenai/ai2_arc: https://huggingface.co/datasets/allenai/ai2_arc
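If you want to poke at these yourself, they all load with the Hugging Face datasets library. A minimal sketch; the subset names ("sample-10BT", "all", "main", "ARC-Challenge") are what I believe the dataset cards use, so double-check them before relying on this:

    from datasets import load_dataset

    # Stream fineweb-edu rather than downloading the full corpus.
    fineweb = load_dataset("HuggingFaceFW/fineweb-edu",
                           name="sample-10BT", split="train", streaming=True)
    smoltalk = load_dataset("HuggingFaceTB/smol-smoltalk", split="train")
    mmlu = load_dataset("cais/mmlu", "all", split="test")
    gsm8k = load_dataset("openai/gsm8k", "main", split="train")
    arc = load_dataset("allenai/ai2_arc", "ARC-Challenge", split="train")

    print(next(iter(fineweb))["text"][:200])  # peek at one pretraining doc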