Posted by projectyang 1 day ago
I built this website which allows you to:
Spectate: Watch different models play against each other.
Play: Create your own table and play hands against the agents directly.
What I'm curious about is whether their pretraining alone is enough to give them biases about each other. Like maybe they think Grok is full of shit.
Right; there's a feedback loop to it. When humans play poker, they do so with common knowledge of the fact that humans have object permanence and can recognize and remember their opponents. The same thing that motivates "profiling" a villain motivates attempting to project a table image, which in turn motivates being aware of the table image one is projecting.
This also wouldn't even be a close contest; I think Pluribus demonstrated a solid win rate against professional players in testing.
As I was developing this project, one main question kept coming to mind: how do cost and performance compare between a purpose-built AI such as Pluribus and a general-purpose LLM? I believe Pluribus's training cost was ~$144 in cloud computing credits.
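For a rough sense of scale, here's some back-of-the-envelope arithmetic. The per-hand token counts and API price below are made-up assumptions for illustration, not measured numbers:

```python
# Back-of-the-envelope cost comparison (all LLM-side numbers are assumptions).
pluribus_training_cost = 144.0        # ~$144 in cloud credits, per the figure above

# Hypothetical LLM usage per hand of poker (made-up figures):
tokens_per_decision = 2_000           # prompt + reasoning + response
decisions_per_hand = 4
price_per_million_tokens = 5.00       # assumed blended API price, $/1M tokens

cost_per_hand = tokens_per_decision * decisions_per_hand * price_per_million_tokens / 1e6
hands_until_llm_costs_more = pluribus_training_cost / cost_per_hand

print(f"Assumed LLM cost per hand: ${cost_per_hand:.3f}")
print(f"Hands before the LLM API bill exceeds Pluribus's entire training cost: "
      f"{hands_until_llm_costs_more:,.0f}")   # ~3,600 hands under these assumptions
```

Under those (invented) numbers, the LLM approach burns through Pluribus's whole training budget in a few thousand hands, while playing far weaker poker.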
It’s similar to how an LLM can sometimes play chess at a reasonably high (but not world-class) level, while Stockfish (the chess solver) can easily crush even the best human player in the world.
To limit the scope of what it has to simulate.
It's unlikely they're perfect, but there are only very small differences in EV between betting 100% vs 101.6% of the pot, or whatever.
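A toy single-street model makes this concrete. The fold probability and called equity below are made-up numbers, and the model pretends villain's response doesn't change with the sizing, which is close enough for a 1.6% nudge:

```python
# Toy EV model for a single river bet, to show how little a ~1.6% sizing
# difference matters. Pot is normalized to 1.0; fold_prob and
# equity_when_called are invented for illustration.

def bet_ev(bet_frac, fold_prob=0.5, equity_when_called=0.35, pot=1.0):
    bet = bet_frac * pot
    ev_fold = fold_prob * pot
    ev_called = (1 - fold_prob) * (equity_when_called * (pot + bet)
                                   - (1 - equity_when_called) * bet)
    return ev_fold + ev_called

ev_100 = bet_ev(1.000)
ev_1016 = bet_ev(1.016)
print(f"EV at 100% pot:   {ev_100:.4f} pots")
print(f"EV at 101.6% pot: {ev_1016:.4f} pots")
print(f"Difference:       {abs(ev_100 - ev_1016):.4f} pots")  # ~0.002 pots
```

With these numbers the gap is on the order of a quarter of a percent of the pot, i.e. noise compared to the mistakes an LLM makes elsewhere in the hand.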
Stockfish isn't really a solver; it's a search engine with a neural-net-based evaluation.
There are other poker-playing programs [0] - what we called AI before large language models were a thing - which achieve superhuman performance in real time in this format. They would crush the LLMs here. I don't know what's publicly available, though.
PioSOLVER, for example.
The best poker-playing AI is not beatable by anyone, so yes, it would crush the LLMs.
I was interested in this idea too and made a video where some of the previous top LLMs play against each other: https://www.youtube.com/watch?v=XsvcoUxGFmQ&t=2s
Or do you mean that each agent gets a chance to think after every turn?
Your idea of passing the game state in real time and having the LLM keep a chain of thought going even when the action isn't on it is interesting. I'd be curious to see whether it would result in improved play.
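A rough sketch of what that could look like; the agent interface, prompting, and poker-engine hooks here are hypothetical placeholders, not the site's actual implementation:

```python
# Minimal sketch of the "think on every action" idea: every agent gets to
# update a private running chain of thought after each action at the table,
# even when the action isn't on it.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    notes: list[str] = field(default_factory=list)   # running chain of thought

    def observe(self, event: str) -> None:
        # Called after EVERY action. A real version would prompt the LLM here
        # and append its reasoning, e.g. reads on opponents or plans for later streets.
        self.notes.append(f"saw: {event}")

    def act(self, legal_actions: list[str]) -> str:
        # A real version would send the accumulated notes plus the current
        # game state to the LLM and parse its chosen action.
        return legal_actions[0]  # placeholder: always take the first option

def play_street(agents: list[Agent], actions_in_order: list[tuple[str, list[str]]]) -> None:
    for actor_name, legal in actions_in_order:
        actor = next(a for a in agents if a.name == actor_name)
        chosen = actor.act(legal)
        # Broadcast the action so every agent can update its private notes.
        for agent in agents:
            agent.observe(f"{actor_name} chose {chosen}")

agents = [Agent("gpt"), Agent("claude"), Agent("grok")]
play_street(agents, [("gpt", ["check", "bet"]), ("claude", ["check", "bet"])])
print(agents[2].notes)  # grok has observations even though it never acted
```

The open question is whether the extra per-action reasoning actually improves decisions enough to justify the added latency and token cost.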