Posted by rippeltippel 4 days ago
The results for Nemotron 3 Nano are hard to parse, and I think actually incorrect: https://hfviewer.com/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-B... I'm guessing this is because the implementation uses layers that are all instances of the same class, with forward passes that branch on the layer type specified at construction time.