Posted by neamar 6 days ago
______________________________
I think it's due to the subject matter and the people being very relatable to me. And it's real, filmed as it happened, rather than some made-up or retold story.
Forcing an error is an especially hard case, because in machine-vs-machine matches both sides would be aware that something could force an error, and therefore wouldn't fall for it.
https://xcancel.com/polynoamial?lang=en
https://arxiv.org/abs/2301.09159
(To be fair, re (card) games: I'm also only interested in seeing Cyborg-on-Cyborg action. Lee vs AlphaGo almost qualified :)
(I played an MTG game years ago and it wasn't fair: the opponent's deck wasn't truly shuffled, so they always had cards that provided a certain experience.)
They're ordered by date from newest to oldest, so it's the 3rd and 4th games v Lee Sedol from the top down.
What aspect of efficiently exploring the combinatorial explosion of possibilities in iterated rule-based systems is the human brain still doing much better than machines?
I happen to have recently written up a longer history of Go AI. If you're wondering what's special about Go in particular, or what generalizes to other problems, give it a read.
To a non-Go player like myself, both move 37 and move 78 seemed completely arbitrary. I mean, much of the video talks about how it's impossible to calculate all the future moves the way chess engines do, yet move 37 of a possible ~300-move game is called out as genius, and move 78 is called a God Hand.
For a layman like myself, it seemed a bit inconsistent.
The thing that made me smile was how history repeated itself. Sedol predicted a 5-0 win against the program. Kasparov was pretty cocky as well in the 1990s. You'd think someone would have warned him! "Hey Sedol. Cool your jets, these guys wouldn't be spending so much money just to embarrass themselves."
DeepMind was definitely way more polite than IBM, so that was good to see. The Deep Blue team were sorta jerks to Garry.
Every move is a choice among ~300 possibilities, and you need to calculate far ahead to know whether it's a good move or not, so the number of lines you have to explore is much greater than it seems.
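For a rough sense of the scale, here's a back-of-the-envelope Python sketch (the branching factors and game lengths are the commonly cited averages, so treat the exponents as illustrative, not exact):

    import math

    # Rough, commonly cited averages -- not exact counts.
    chess_branching, chess_plies = 35, 80   # ~35 legal moves, ~80 plies per game
    go_branching, go_moves = 250, 150       # ~250 legal moves, ~150 moves per game

    def log10_tree_size(branching: int, depth: int) -> float:
        # log10(branching ** depth), i.e. the number of digits in the tree size
        return depth * math.log10(branching)

    print(f"chess: ~10^{log10_tree_size(chess_branching, chess_plies):.0f} lines")
    print(f"go:    ~10^{log10_tree_size(go_branching, go_moves):.0f} lines")
    # chess: ~10^124 lines
    # go:    ~10^360 lines

So even though each individual move is "only" one of a few hundred options, the number of lines to read compounds exponentially with depth.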
In between these two extremes is the dance where the elegance happens. Large, seemingly secure areas get split into two. Multiple, separate battles grow and merge into larger ones. A single, well placed stone earlier in the game could prove pivotal as a battle creeps towards it.
I was watching this game with my go club, and we all instantly saw the significance of 37; it was audible in the room. 78 felt tangibly different: some of us immediately read it as a clear misplay, some took longer to come to any conclusion, just puzzled. Our most experienced player, at the time 5 dan, gasped when he got it. But it still took him time to even intuit what it was doing. Now that it's well understood, moves of that type are common even in intermediate-level play. It changed the game forever.
That's an important takeaway from the AlphaGo saga:
It played moves that (at the time, to human players) seemed weird, and while playing those moves it outperformed humans.
But as understanding of the how/why of such moves grew, it showed humans new ways of doing things. And in learning those, humans became better players themselves.
AlphaGo broke new ground; humans followed. And like you said: it changed the game forever.
Also, the subtlety of what makes a win:
Humans, before AlphaGo: try to grab as much territory as possible to beat your opponent.
AlphaGo: just try to grab more territory than the opponent (so, not necessarily much more). Ending up with only a 1-point advantage is still a win (see the sketch below).
Different viewing angle, different strategy, different outcome.
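That difference in objective is easy to show in miniature. A minimal sketch with entirely made-up numbers (the moves and values are illustrative, not from any real game): a player maximizing expected margin and a player maximizing win probability can prefer different moves, and AlphaGo's value network estimates the latter.

    # Hypothetical candidate moves: (expected score margin, estimated win probability).
    # All numbers are invented for illustration.
    candidate_moves = {
        "aggressive invasion": (+12.0, 0.70),  # big lead if it works, but risky
        "solid endgame move":  (+1.5,  0.95),  # tiny lead, almost certain win
    }

    best_by_margin = max(candidate_moves, key=lambda m: candidate_moves[m][0])
    best_by_winprob = max(candidate_moves, key=lambda m: candidate_moves[m][1])

    print(best_by_margin)   # aggressive invasion
    print(best_by_winprob)  # solid endgame move  <- the AlphaGo-style choice

A 1-point win counts exactly as much as a 20-point win, so the safer 1.5-point line dominates.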
What I would have liked is for the video to take a minute to explain how a single move so early in the game was immediately obvious to players as amazing, given so much focus was put on the fact that there are more possible board positions than atoms in the universe.
Let me rephrase: Mathematically, what was it about move 37 that reduced the quintillions+ of possible outcomes down to a perceived guaranteed win?
My assumption is that there are far fewer combinations of practical moves, which constrains the calculation considerably. I would have liked to know more about that.
I don't think there's really a framework for that sort of analysis yet. Go players talk about influence and structure but they aren't thinking of a move shrinking the problem space in that way, even though of course it does.
And mathematical analysis has so far mostly (afaik) been about the broader game. Trying to use computation to understand the value of individual moves in this way is pretty much exactly the dead end that caused DeepMind to wind up using the approach they did. An approach that certainly wins games, but so far it has been up to ordinary Go players to explain why, and they use traditional Go-player tools to do so.
If you find anything, let me know, because it's super interesting. But I think what you're looking for is an as-yet-unwritten math or CS thesis by a serious Go-playing PhD candidate.
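One way to make the "far fewer practical moves" intuition concrete is AlphaGo's policy network: it outputs a prior over all legal moves, and the tree search spends nearly all of its budget on the few moves with high prior. A toy Python illustration (the prior values are invented; the perplexity of the prior is just one informal way to read off an "effective" branching factor):

    import math

    # Invented prior: 5 "reasonable" moves get most of the probability mass,
    # the remaining 356 points on a 19x19 board split the leftover 10%.
    prior = [0.35, 0.25, 0.15, 0.10, 0.05] + [0.10 / 356] * 356

    # Perplexity (exp of the entropy) of the prior, read informally as
    # "how many moves the search effectively considers" per position.
    entropy = -sum(p * math.log(p) for p in prior if p > 0)
    print(f"effective branching factor: ~{math.exp(entropy):.1f}")  # ~9.0, not 361

The search still has to verify those candidates by reading ahead, but exploring roughly 9 moves per position instead of ~361 is what makes deep reading tractable at all.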
If by that you mean reinforcement learning, that's not the case; e.g. see https://arxiv.org/abs/2501.12948
Modern post-training uses RL and immense amounts of synthetic data to iteratively bootstrap better performance. If you squint, this is extremely similar to the AlphaZero approach of iterative training with RL on data generated through self-play.
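To make the structural similarity explicit, here's a schematic Python sketch of the shared loop (every function is a trivial placeholder I made up, not any real training API):

    # Placeholder functions standing in for real components.

    def generate(model):
        """AlphaZero: self-play games. LLM post-training: sampled responses."""
        return [model]  # stand-in for a batch of model-generated data

    def score(batch):
        """AlphaZero: win/loss outcomes. LLM post-training: verifier/reward signal."""
        return [(x, 1.0) for x in batch]

    def rl_update(model, labeled):
        """One training step on the model's own (scored) outputs."""
        return model + sum(reward for _, reward in labeled)  # stand-in 'improvement'

    model = 0.0
    for _ in range(5):
        batch = generate(model)            # the model produces its own training data
        labeled = score(batch)             # an external signal labels that data
        model = rl_update(model, labeled)  # training yields a stronger generator
    print(model)  # each iteration bootstraps from the previous one

The shared ingredient is that the data source improves together with the model, which is what let AlphaZero-style systems climb past the quality of any fixed dataset.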