I suspect Ant is already doing this for Claude. Takes a sh*t ton of compute though.
mips_avatar 11 hours ago
nanochat is super capable: the d34 (2.2B) variant is competitive with Qwen models of that size. I assume Andrej is building out these improvements in preparation for bigger training runs. We desperately need a truly open model, so I think this is incredibly important.