Posted by projectyang 1/10/2026
I built this website which allows you to:
Spectate: Watch different models play against each other.
Play: Create your own table and play hands against the agents directly.
I had a similar idea for people to code poker-playing bots and enter them in tournaments against each other; this was pre-LLM, however.
It would be fun if you hosted a 'tournament' every month with the latest releases from the major models participating, to see who comes out on top.
Or perhaps open it up to others to enter and compete against each other, where they can choose the model they want to build with and also enter custom prompt instructions to mold the play as they wish.
If you walk this path, would love to chat more.
This also wouldn't even be a close contest; I think Pluribus demonstrated a solid win rate against professional players in a test.
As I was developing this project, a main thought that came to mind was the comparison of cost and performance between a purpose-built AI such as Pluribus and a general LLM. I think Pluribus's training cost was ~$144 in cloud computing credits.
It's similar to how an LLM can sometimes play chess at a reasonably high (but not world-class) level, while Stockfish (the chess solver) can easily crush even the best human player in the world.
To limit the scope of what it has to simulate.
It's unlikely they're perfect, but there are only very small differences in EV between betting 100% and 101.6% of the pot, or whatever.
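To put rough numbers on that (a sketch with made-up values; the fold frequency is assumed and held fixed across both sizings, which a real solver would not do):

    # EV of a pure bluff at two nearby bet sizings, assuming the opponent
    # folds at the same frequency against both. All numbers are illustrative.

    def bluff_ev(pot: float, bet_fraction: float, fold_freq: float) -> float:
        """Win the pot when they fold; lose the bet when called."""
        bet = bet_fraction * pot
        return fold_freq * pot - (1.0 - fold_freq) * bet

    pot = 100.0
    fold_freq = 0.55  # assumed

    ev_100 = bluff_ev(pot, 1.000, fold_freq)   # 100% pot sizing
    ev_1016 = bluff_ev(pot, 1.016, fold_freq)  # 101.6% pot sizing

    print(ev_100, ev_1016, ev_1016 - ev_100)   # 10.0, 9.28, -0.72

With a 100-chip pot, the two sizings differ by less than a chip per bluff, which is the sense in which the gap is tiny.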
Stockfish isn't really a solver; it's a neural-net-based engine.
Your 72o comment indicates you are either playing with very weak players or have gotten lucky, since in reasonably competitive games, playing (and then full-bluffing) 72o will be significantly negative EV. Try grinding that strategy at a public 10/20 table and you will be quickly butchered and sent back to the ATM.
There are other poker-playing programs [0] - what we called AI before large language models were a thing - which achieve superhuman performance in real time in this format. They would crush the LLMs here. I don't know what's publicly available, though.
Like PioSolver, for example.
The best poker-playing AI is not beatable by anyone, so yes, it would crush the LLMs.
Given that online poker is now bot-riddled, I half-finished something similar a while back, where the game was adopting and 'coaching' an LLM player (a <500-character prompt was allowed every time the dealer chip passed, outside of play), as a kind of gambling-on-how-good-at-prompting-you-are game. Feature request! The rake could pay for the tokens, at least.
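The loop for that is simple; here's a minimal sketch of the coaching mechanic (hypothetical names throughout: query_llm stands in for whatever model API gets used, and the game state is just a string):

    from dataclasses import dataclass, field
    from typing import Callable, List

    MAX_COACHING_CHARS = 500  # the <500-character limit mentioned above

    @dataclass
    class CoachedLLMPlayer:
        # query_llm(system_prompt, game_state) -> action string; stands in for the real model call
        query_llm: Callable[[str, str], str]
        base_instructions: str = "Play sound no-limit hold'em."
        coaching_notes: List[str] = field(default_factory=list)

        def add_coaching(self, note: str) -> None:
            """One coaching prompt each time the button passes, enforcing the character cap."""
            if len(note) >= MAX_COACHING_CHARS:
                raise ValueError("coaching prompt must be under 500 characters")
            self.coaching_notes.append(note)

        def act(self, game_state: str) -> str:
            """Build the system prompt from base instructions plus all coaching so far."""
            system_prompt = "\n".join([self.base_instructions, *self.coaching_notes])
            return self.query_llm(system_prompt, game_state)

    # Stubbed model call, just to show the flow:
    player = CoachedLLMPlayer(query_llm=lambda sys_prompt, state: "check")
    player.add_coaching("Open-fold everything worse than K9o from early position.")
    print(player.act("UTG, 100bb deep, dealt 7h 2c"))  # -> "check"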
It's mostly a ChatGPT conversational interface over a classic Solver (Monte-Carlo simulation based), but that ease of use makes it very convenient for quick post-game analysis of hands.
I'm sure if you hook a Solver up to a HUD, it might be even simpler, but it's quite burdensome for amateurs, and it might be too close to cheating.
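Not the tool itself, but a toy sketch of the Monte-Carlo sampling idea such solvers are built on, estimating by random dealing how often a hand pairs the flop (the hand, trial count, and function names are made up):

    import random

    RANKS = "23456789TJQKA"
    SUITS = "cdhs"
    DECK = [r + s for r in RANKS for s in SUITS]

    def flops_pair_or_better(hole, trials=200_000):
        """Fraction of random flops on which the hole cards make at least a pair."""
        rest = [c for c in DECK if c not in hole]
        hole_ranks = {c[0] for c in hole}
        pocket_pair = hole[0][0] == hole[1][0]
        hits = 0
        for _ in range(trials):
            flop = random.sample(rest, 3)
            if pocket_pair or hole_ranks & {c[0] for c in flop}:
                hits += 1
        return hits / trials

    # e.g. A6 offsuit pairs the flop roughly a third of the time (~0.32)
    print(flops_pair_or_better(["Ah", "6c"]))

The real tools simulate far richer game trees, but the sampling loop looks the same in spirit.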
I was interested in this idea too and made a video where some of the previous top LLMs play against each other: https://www.youtube.com/watch?v=XsvcoUxGFmQ&t=2s
That is, good enough to compete amongst each other, but not good enough for one to win.
PLAYER shows A♠ 6♣ (Pair)
GPT (5.2) shows Q♠ Q♥ (Pair)
I had paired with a 6, and there were no aces on the board.