Posted by swatson741 3 days ago
def clipped_error(x):
    return tf.select(tf.abs(x) < 1.0,
                     0.5 * tf.square(x),
                     tf.abs(x) - 0.5)  # condition, true, false
Following the same principles that he outlines in the post, the "- 0.5" part is unnecessary: the gradient of a constant is 0, so subtracting 0.5 doesn't change the backpropagated gradient. In addition, a nicer formula that achieves the same goal as the above is √(x² + 1).
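A quick way to check that claim, as a minimal sketch in TF 2 (using tf.where and GradientTape in place of the legacy tf.select; the function and variable names here are illustrative only):

import tensorflow as tf

def huber_with_offset(x):
    return tf.where(tf.abs(x) < 1.0, 0.5 * tf.square(x), tf.abs(x) - 0.5)

def huber_no_offset(x):
    return tf.where(tf.abs(x) < 1.0, 0.5 * tf.square(x), tf.abs(x))

x = tf.Variable([-3.0, -0.5, 0.5, 3.0])
with tf.GradientTape(persistent=True) as tape:
    y1 = huber_with_offset(x)
    y2 = huber_no_offset(x)

# The constant -0.5 vanishes under differentiation, so both gradients agree:
# [-1.0, -0.5, 0.5, 1.0] (x inside the quadratic region, sign(x) outside it).
print(tape.gradient(y1, x).numpy())
print(tape.gradient(y2, x).numpy())

# The smooth alternative sqrt(x^2 + 1) has gradient x / sqrt(x^2 + 1), which
# interpolates between the same two regimes (~x near 0, ~sign(x) far out).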
1) Learn backprop, etc, basic math
2) Learn more advanced things, CNNs, LMM, NMF, PCA, etc
3) Publish a paper or poster
4) Forget basics
5) Relearn that backprop is a thing
repeat.
Some day I need to get my education together.
Backpropagation is reverse-mode automatic differentiation. They are the same thing.
And for those who don't know what backpropagation is: it is just an efficient method to compute the gradient of the loss with respect to all parameters in a single backward pass.
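For a concrete picture of what that means, here is a scalar sketch of reverse-mode autodiff (a hypothetical toy Value class, not any real framework's API): each node records how to push gradients back to its inputs, and one backward pass applies the chain rule through the recorded graph, yielding the gradient with respect to every parameter at once.

import math

class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # nodes this value depends on
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, g in zip(v._parents, v._local_grads):
                p.grad += g * v.grad

# Example: y = tanh(w*x + b); one backward pass gives dy/dx, dy/dw, dy/db at once.
x, w, b = Value(0.5), Value(-3.0), Value(1.0)
y = (w * x + b).tanh()
y.backward()
print(x.grad, w.grad, b.grad)

Real frameworks do exactly this, just over tensors instead of scalars and with hand-optimized kernels for each local gradient.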
https://youtu.be/lXUZvyajciY?si=vbqKDOOY7l-491Ka&t=7028
Not too many details on the timeline - just that he's working on it.
Then again, it might have been the corporate stuff that burned him out rather than the engineering.
> nanochat will become the capstone project of the course LLM101n being developed by Eureka Labs.