Posted by growingswe 12 hours ago
"How wrong was the prediction? We need a single number that captures "the model thought the correct answer was unlikely." If the model assigns probability 0.9 to the correct next token, the loss is low (0.1). If it assigns probability 0.01, the loss is high (4.6). The formula is − log ( � ) −log(p) where � p is the probability the model assigned to the correct token. This is called cross-entropy loss."
Hey, I can see kamon, karai, anna, and anton in the dataset, so it'd be worth using some other names: https://raw.githubusercontent.com/karpathy/makemore/988aa59/...
In 3 days they've covered machine learning, geometry, cryptography, file formats and directory services.
The "TRAINING" visualization does seem synthetic though: the graph is a bit too "perfect", and it's odd that the generated names don't update at every step.
For a long time, it seemed the answer was that it doesn't. But now, using Claude Code daily, it seems it does.