Friday, January 09, 2015

Heads-up limit hold ‘em poker is solved

Researchers at the University of Alberta have just announced that “heads-up limit hold ‘em”, the simplest form of poker commonly played for cash stakes, has been “solved” (science paper, summary on nature.com). This means that the researchers have found a complete strategy, which describes what plays to make in every possible situation of the game, and that is essentially unbeatable. This strategy will not lose against any other strategy, a defensive strategy that therefore makes it almost certain to win over the long run. This is a landmark in artificial intelligence, as another game is dominated by computers and not humans at the highest level (think chess). But what does this mean for the psychology of how humans play poker, and does it mean that downloading the perfect strategy will make you rich?

Game theory is the study of optimal decisions involving two or more competing interests (players). Games such as noughts-and-crosses, or indeed poker, were the initial inspiration for game theory, but now the theory of games covers any situation where the strategies and payoffs for two or more players can be formally modeled. The current research marks a landmark in at least one way: this is the first time that a significant game of “incomplete information” has been solved. Previous games to be solved (e.g. checkers) are games of “complete information”, where both players know the exact state of the game. In poker, each player’s private cards mean that neither player knows the exact game state (until the hand is over), and this leads to the phenomenon of bluffing where players pretend to be strong when they’re actually weak. Games of incomplete information are more realistic as models of real-life games, where the assumption of complete information can be far-fetched.

Computers solve games via brute-force, trialing different strategies until they find one that cannot be beaten. The complexity of most games is what makes this difficult. To make progress with complex games, researchers generally need to combine a massive amount of computing power with some clever programming tricks to design strong strategies. These were the main factors also underlying this breakthrough: a new solution algorithm and over two months of calculations over 4,000 CPUs were enough to do the job. This was enough computing power to surpass previous heads-up limit hold ‘em agents (which have been involved in annual competitions for some time).

This means that the latest breakthrough is excellent work but is not a real game-changer. In terms of playing style, the latest agent plays pretty much the same strategy to a neutral observer as previous agents. The only difference is now that the new agent is so close to the optimal game theoretic strategy, that researchers have called this form of poker “solved” in a bid to move research to more complex games. There are many more complex forms of poker that have not been solved, and will become a larger object of focus after this result. Merely adding one extra player to the game (the term “heads-up” refers to when only two players are involved in a game), or changing the betting rules slightly (in “limit” poker a player can only bet a set amount at each situation, in “no-limit” players can choose their bet size) is sufficient to create a much more complex game. These more complex forms of poker are unlikely to be solved any time soon.

As a game played for cash stakes, heads-up limit hold ‘em poker has been declining in popularity for some time. One reason is the relative simplicity of this form of poker; in fact poker agents have been (illegally) playing this game online for some time against unsuspecting punters across various poker networks. You can even play this form of poker against computers on modern betting terminals in Las Vegas. These games are free of charge, but the owners are so confident their strategy is better than any human that all-comers are accepted.  So this breakthrough is unlikely to make you a fortune if you get hands on a copy: the low-hanging fruit have been harvested some time ago in this form of poker.

Of course, as humans we have no way of remembering the terabyte-or-so of information (in heavily compressed form) required to describe this unbeatable strategy. So how can humans possibly hope to compete? Well, humans will never be as good as the computer’s strategy in a straight battle, but the approximations and heuristics that skilled humans use have proven remarkably effective. Many professional players use two key heuristics on the first round of heads-up limit hold ‘em: raise-or-fold as the first player (never calling), and always call as the first player if the second player re-raises. It turns out that the computer agrees with these simple heuristics well over 99% of the time. A boundedly-rational human plays almost exactly the same way as a computer with complete knowledge! (Without requiring the million hours of CPU time.) These heuristics have been an open secret among top players for more than five years.

Humans have further advantages over computers. Tweak a few parameters of the game and most expert humans can easily adjust, but an optimising agent has to completely start from scratch. Secondly, humans can often win a lot more money against imperfect opponents than this agent (using what are called exploitive strategies), an important criterion in the real-world where winning money and not prestige are the main goals of a professional gambler. To sum up, solving heads-up limit hold ‘em is a great achievement in artificial intelligence, but one that just goes to underscore how amazing human intelligence can be.

(In a past life I was a professional heads-up limit hold ‘em player and I have written two books on how humans can approximate optimal strategies in poker, specifically focusing on heads-up limit hold ‘em poker as a model for more complex forms of poker.)

No comments: