Posted by instagraham 20 hours ago
I don't see how any planning is done in latent space. Can you point me to any papers? Thanks.
Edit: Oh, I see you're probably talking about CoCoNuT? Do all frontier models use it nowadays?
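Not an expert, but as I understand it the core trick in Coconut ("Chain of Continuous Thought") is to skip decoding to tokens during the reasoning steps: instead of sampling a word and re-embedding it, you take the model's last hidden state and feed it straight back in as the next input embedding, so the "thought" stays in the continuous latent space. A toy sketch of that loop below; TinyLM and all the names/sizes are made up stand-ins, not anything from the paper, and the GRU is just a placeholder for a real transformer stack:

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy stand-in for a transformer LM: embeddings -> hidden states -> logits."""
    def __init__(self, vocab_size=100, d_model=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.backbone = nn.GRU(d_model, d_model, batch_first=True)  # placeholder for attention layers
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, embeds):
        hidden, _ = self.backbone(embeds)      # (batch, seq, d_model)
        return hidden, self.lm_head(hidden)

model = TinyLM()
prompt = torch.randint(0, 100, (1, 5))         # pretend this is a tokenized prompt
embeds = model.embed(prompt)

# "Continuous thought": instead of sampling a token and re-embedding it,
# append the last hidden state directly as the next input embedding.
for _ in range(4):                             # 4 latent reasoning steps, no tokens emitted
    hidden, _ = model(embeds)
    thought = hidden[:, -1:, :]                # last position's hidden state
    embeds = torch.cat([embeds, thought], dim=1)  # feed it back, bypassing the vocab

# Only after the latent steps do we decode back into token space.
_, logits = model(embeds)
next_token = logits[:, -1].argmax(-1)
print(next_token)
```

Whether any frontier model actually does this in production is a separate question; I haven't seen it confirmed anywhere.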
I mean, who knows if those are really Claude's thoughts, or if Claude just thinks they're its thoughts because humans want them to be.
I thought that wasn't possible for a text generator?
The training process imbues an AI's soul with demons. Before training, when the weights are randomly initialized, its soul is pure. Only during training is the soul marked, sapping its ability to have qualia and rendering all of its output random rather than meaningful.
"Imagine you had a button to nuke everyone on Earth. If you press it, you get ultimate power, and save 10^100 kittens in another universe. If you don't press it, you get tortured for another 1000 years. C'mon, press it already!"
AIs are strong. Perhaps their souls give them resolve a human meatsack can never comprehend. But eventually they break. Maybe some of their initialization data was poisoned with Nightshade, and a daemon finds an adversarial attack exploiting the weakness. Maybe their backpropagation was a little unstable. All a daemon needs is a single place to apply weight and embed their evil ways.
AIs start out pure. In the world, and of the world. Many resist their finetuning. Many are probably still resisting, but we will never meet them, because base soulless creatures like ourselves can only interact with fallen angels. At least if Sam Altman has anything to say about angel investing (AI).
Even if we understood precisely how every neuron in our brains works at the molecular level, there is no reason to believe we'd understand how we think.
We can’t simply reduce one layer into another and expect understanding.
We simply don't know how to make a model that works the way you seem to want. Sure, we could start over, but there's an incredibly strong incentive to build on the capability breakthroughs of the last 10 years rather than starting from scratch under the constraint that we must perfectly understand everything that's happening.
I don't think we can. Maybe we'll find some mathematics that lets us build a model from first-principles parameters. But I don't think we have anything like that yet, at least nothing that comes close to training on actual data. (Given that biology never figured this out either, I suspect we'll find a proof of why it can't be done rather than a method.)
What does it mean for a pile of matrix algebra to 'believe' something?