TL;DR: Train an "energy" model that checks whether an output is correct (rather than just generating one), then use gradient descent to find good outputs. The model itself is a transformer.
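A minimal sketch of the inference half of that idea, assuming a toy hand-written energy function in place of the learned transformer. The names `energy`, `energy_grad_y`, and `infer` are hypothetical and this is not the paper's method. The point is only that the "model" scores a candidate output and gradient descent updates the output itself, not the model weights.

```python
# Toy sketch of energy-based inference: energy(x, y) scores how compatible
# output y is with input x (lower = better), and we find an output by
# running gradient descent on y while the "model" stays fixed.
# ASSUMPTION: the quadratic energy below is hand-written and stands in for
# a learned transformer; it is NOT the Energy-Based Transformers model.


def energy(x: float, y: float) -> float:
    """Hypothetical energy: lowest (zero) when y equals 2*x + 1."""
    return (y - (2.0 * x + 1.0)) ** 2


def energy_grad_y(x: float, y: float) -> float:
    """Analytic derivative dE/dy of the toy energy above."""
    return 2.0 * (y - (2.0 * x + 1.0))


def infer(x: float, y_init: float = 0.0, lr: float = 0.1, steps: int = 100) -> float:
    """Refine a candidate output by repeatedly stepping downhill in energy."""
    y = y_init
    for _ in range(steps):
        y -= lr * energy_grad_y(x, y)
    return y


if __name__ == "__main__":
    x = 3.0
    # "Checking" a candidate: a good answer gets lower energy than a bad one.
    print("energy(3, 7) =", energy(x, 7.0), " energy(3, 0) =", energy(x, 0.0))
    y = infer(x)
    print(f"y = {y:.4f}, energy = {energy(x, y):.6f}")
```

With a real model you would replace the hand-coded `energy` with a trained network and get dE/dy from automatic differentiation (e.g. in PyTorch or JAX) instead of a hand-derived gradient.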
tripplyons 15 hours ago
I've seen some of that channel's videos before, and many of them contain errors. I haven't read the Energy-Based Transformers paper yet, so I can't say for sure whether this video has any, but be careful.