Posted by salkahfi 4 hours ago
We have agents implement agents that play games against each other, so Claude isn't playing against GPT; rather, an agent written by Claude plays poker against an agent written by GPT. This turns out to be a really tough task, and it leads to very interesting findings about AI for coding.
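The referee part of such a setup is simple; the hard part is what the models generate. A toy sketch, with matching pennies standing in for poker and placeholder bot bodies where the model-written code would go (all names here are illustrative, not any real API):

    import random
    from typing import Callable, List

    Bot = Callable[[List[int]], int]  # sees opponent's move history, returns 0 or 1

    def claude_bot(opp_history: List[int]) -> int:
        return random.randint(0, 1)  # placeholder for a Claude-generated strategy

    def gpt_bot(opp_history: List[int]) -> int:
        return opp_history[-1] if opp_history else 0  # placeholder: mimic last move

    def referee(bot_a: Bot, bot_b: Bot, rounds: int = 1000) -> int:
        """Play matching pennies; bot A scores when the moves match."""
        hist_a: List[int] = []
        hist_b: List[int] = []
        score = 0
        for _ in range(rounds):
            a, b = bot_a(hist_b), bot_b(hist_a)
            score += 1 if a == b else -1
            hist_a.append(a)
            hist_b.append(b)
        return score  # positive means bot A came out ahead

    print(referee(claude_bot, gpt_bot))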
Are you going to share those with the class or?
Maybe we should just get rid of tired benchmarks like chess altogether at this point. It's leading people to think about how to limit AI in order to keep chess a relevant benchmark, rather than about expanding on what's already there.
And as a poker player, I can say that this game is much more challenging for computers than chess: poker is a game of imperfect information, and writing a program that plays it really well and efficiently is still an unsolved problem.
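For what it's worth, the strongest poker bots (Libratus, Pluribus) are built on counterfactual regret minimization rather than brute-force search. A toy sketch of its building block, regret matching, in self-play on rock-paper-scissors; both players' average strategies converge toward the Nash mix:

    import random

    ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

    def payoff(a: int, b: int) -> int:
        """+1 if action a beats b, 0 on a tie, -1 on a loss."""
        return (a - b + 4) % 3 - 1

    def strategy_from(regrets):
        """Regret matching: play actions in proportion to positive regret."""
        pos = [max(r, 0.0) for r in regrets]
        total = sum(pos)
        return [p / total for p in pos] if total > 0 else [1.0 / ACTIONS] * ACTIONS

    def train(iters: int = 100_000):
        regrets = [[0.0] * ACTIONS for _ in range(2)]
        strat_sum = [[0.0] * ACTIONS for _ in range(2)]
        for _ in range(iters):
            strats = [strategy_from(r) for r in regrets]
            acts = [random.choices(range(ACTIONS), weights=s)[0] for s in strats]
            for p in range(2):
                played, opp = acts[p], acts[1 - p]
                for a in range(ACTIONS):
                    # regret = how much better a would have done than what we played
                    regrets[p][a] += payoff(a, opp) - payoff(played, opp)
                    strat_sum[p][a] += strats[p][a]
        # the *average* strategy is what converges to equilibrium
        return [[s / iters for s in strat_sum[p]] for p in range(2)]

    print(train())  # both rows approach [1/3, 1/3, 1/3]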
Heh, we really did come full circle on this! When ChatGPT launched in December 2022, one of the first things people noticed was that it sucked at math. Basic arithmetic like 12 + 35 would trip it up. Then people "discovered" tool use and added a calculator. And everyone was like "well, that's cheating, of course it can use a calculator, but look, it can't do the simple addition logic"... And now here we are :)
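The "calculator" loop really is that simple. A minimal sketch of the harness side, with a made-up CALC(...) tool-call convention (no real model API involved):

    import ast, operator, re

    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv,
           ast.USub: operator.neg}

    def safe_eval(expr: str) -> float:
        """Evaluate plain arithmetic only (no names, calls, or attributes)."""
        def walk(node):
            if isinstance(node, ast.Expression):
                return walk(node.body)
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                return OPS[type(node.op)](walk(node.left), walk(node.right))
            if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
                return OPS[type(node.op)](walk(node.operand))
            raise ValueError("not plain arithmetic")
        return walk(ast.parse(expr, mode="eval"))

    def run_turn(model_output: str) -> str:
        """Intercept CALC(...) tool calls; pass anything else through untouched."""
        match = re.search(r"CALC\((.+?)\)", model_output)
        if match:
            return str(safe_eval(match.group(1)))
        return model_output

    print(run_turn("CALC(12 + 35)"))  # -> 47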
How well you work without a calculator is a proxy for real-world competency.
Trying to solve everything with CoT alone without utilising tools seems futile.
Chess engines don't grow on trees; they're built by intelligent systems that can think, namely human brains.
Supposedly we want to build machines that can also think, not just regurgitate things created by human brains. That’s why testing CoT is important.
It’s not actually about chess, it’s about thinking and intelligence.
That was a whole half-decade ago, but back then deep-learning AIs were beaten very badly by handcrafted scripts. Even the best bot in the neural-net category was actually a symbolic-script/neural-net hybrid.
Bizarre.
AI already has a very creative imagination for role play, so this just adds another tool to its arsenal.