This applies Counterfactual Regret Minimization (CFR) to Kuhn poker.
Kuhn Poker is a two-player, three-card betting game. Each player is dealt one card out of Ace, King and Queen (no suits). There are only three cards in the pack, so one card is left out. Ace beats King and Queen, and King beats Queen, just as in the normal ranking of cards.
Both players ante 1 chip (blindly bet 1 chip). After looking at the cards, the first player can either pass or bet 1 chip. If the first player passes, the player with the higher card wins the pot. If the first player bets, the second player can bet (i.e. call) 1 chip or pass (i.e. fold). If the second player bets, the player with the higher card wins the pot. If the second player passes (i.e. folds), the first player gets the pot. This game is played repeatedly, and a good strategy will optimize for the long-term utility (or winnings).
Here are some example games:
- KAp: Player 1 gets K. Player 2 gets A. Player 1 passes. Player 2 doesn't get a betting chance and wins the pot of 2 chips.
- QKbp: Player 1 gets Q. Player 2 gets K. Player 1 bets a chip. Player 2 passes (folds). Player 1 gets the pot of 3 chips because Player 2 folded.
- QAbb: Player 1 gets Q. Player 2 gets A. Player 1 bets a chip. Player 2 also bets (calls). Player 2 wins the pot of 4 chips.
Here we extend the InfoSet class and History class defined in __init__.py with Kuhn Poker specifics.
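The rules above leave only a handful of possible games. As a rough illustration, the sketch below enumerates every complete game as a history string and counts the chips in the pot. It assumes every ante and every bet is exactly one chip; `terminal_histories` and `pot_size` are hypothetical helper names that are not part of `labml_nn`.

```python
from itertools import permutations

CARDS = ['A', 'K', 'Q']
# Betting sequences that end a game under the rules described above
BETTING = ['p', 'bp', 'bb']


def terminal_histories():
    """List every complete game as a string: two cards followed by the betting."""
    return [c1 + c2 + bets
            for c1, c2 in permutations(CARDS, 2)
            for bets in BETTING]


def pot_size(history: str) -> int:
    """Chips in the pot: two one-chip antes plus one chip per bet (assumed sizes)."""
    return 2 + history[2:].count('b')


games = terminal_histories()
print(len(games))  # 6 ordered deals x 3 betting sequences
print([(g, pot_size(g)) for g in ('KAp', 'QKbp', 'QAbb')])
```

With six ordered deals and three ways for the betting to end, the game tree is small enough to inspect by hand.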
from typing import List, cast, Dict

import numpy as np

from labml import experiment
from labml.configs import option
from labml_nn.cfr import History as _History, InfoSet as _InfoSet, Action, Player, CFRConfigs
from labml_nn.cfr.infoset_saver import InfoSetSaver
Kuhn poker actions are pass (p) or bet (b)
ACTIONS = cast(List[Action], ['p', 'b'])
The three cards in play are Ace, King and Queen
CHANCES = cast(List[Action], ['A', 'K', 'Q'])
There are two players
PLAYERS = cast(List[Player], [0, 1])
class InfoSet(_InfoSet):
Does not support save/load
    @staticmethod
    def from_dict(data: Dict[str, any]) -> 'InfoSet':
        pass
Return the list of actions. Terminal states are handled by the History class.
    def actions(self) -> List[Action]:
        return ACTIONS
Human-readable string representation: it gives the betting probability
    def __repr__(self):
        total = sum(self.cumulative_strategy.values())
        # Guard against division by zero when nothing has been accumulated yet
        total = max(total, 1e-6)
        bet = self.cumulative_strategy[cast(Action, 'b')] / total
        return f'{bet * 100: .1f}%'
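To make the normalization in `__repr__` concrete, here is a minimal standalone sketch. It uses a plain dictionary as a stand-in for the `cumulative_strategy` attribute, and the numbers are made up for illustration; `bet_probability` is a hypothetical helper, not a `labml_nn` function.

```python
def bet_probability(cumulative_strategy: dict) -> float:
    """Share of the accumulated strategy weight that went to betting."""
    # Clamp the total so an empty accumulator does not divide by zero
    total = max(sum(cumulative_strategy.values()), 1e-6)
    return cumulative_strategy['b'] / total


# Illustrative weights only: 30 units on pass, 10 units on bet
example = {'p': 30.0, 'b': 10.0}
print(f'{bet_probability(example) * 100: .1f}%')  # prints " 25.0%"
```

The space in the `: .1f` format specification reserves a leading space for the sign, which is why the printed value starts with a blank.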
This defines when a game ends, calculates the utility, and samples chance events (dealing cards).
The history is stored in a string: the first two characters are the cards dealt to Player 1 and Player 2, and the remaining characters are the betting actions in the order they were taken.
class History(_History):
History
    history: str
Initialize with a given history string
    def __init__(self, history: str = ''):
        self.history = history
Whether the history is terminal (game over).
    def is_terminal(self):
Players are yet to take actions
        if len(self.history) <= 2:
            return False
Last player to play passed (game over)
        elif self.history[-1] == 'p':
            return True
Both players called (bet) (game over)
        elif self.history[-2:] == 'bb':
            return True
Any other combination
        else:
            return False
Calculate the terminal utility for Player 1
    def _terminal_utility_p1(self) -> float:
+1 if Player 1 has a better card and -1 otherwise
        # 'A' < 'K' < 'Q' as characters, so the smaller character is the better card
        winner = -1 + 2 * (self.history[0] < self.history[1])
Second player passed, so Player 1 wins 1 chip
        if self.history[-2:] == 'bp':
            return 1
Both players called, the player with the better card wins 2 chips
        elif self.history[-2:] == 'bb':
            return winner * 2
First player passed, the player with the better card wins 1 chip
        elif self.history[-1] == 'p':
            return winner
History is non-terminal
        else:
            raise RuntimeError()
Get the terminal utility for player i
    def terminal_utility(self, i: Player) -> float:
If i is Player 1
        if i == PLAYERS[0]:
            return self._terminal_utility_p1()
Otherwise, the game is zero-sum, so Player 2's utility is the negative of Player 1's
        else:
            return -1 * self._terminal_utility_p1()
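The payoff rules can be checked against the example games with a small standalone function. This is only a sketch that mirrors the logic above outside the class; `utility_p1` is a hypothetical name, and it assumes the history string holds two cards followed by the betting actions.

```python
def utility_p1(history: str) -> int:
    """Net chips won by Player 1 at a terminal history such as 'QAbb'."""
    # 'A' < 'K' < 'Q' as characters, so a smaller character is a better card
    winner = 1 if history[0] < history[1] else -1
    betting = history[2:]
    if betting == 'bp':   # Player 2 folded: Player 1 takes Player 2's ante
        return 1
    if betting == 'bb':   # Bet and call: ante plus bet change hands
        return 2 * winner
    if betting == 'p':    # Player 1 passed: only the ante changes hands
        return winner
    raise ValueError(f'non-terminal history: {history!r}')


for h in ('KAp', 'QKbp', 'QAbb'):
    # Second column is Player 1's utility, third is Player 2's (zero-sum)
    print(h, utility_p1(h), -utility_p1(h))
```

Utilities are net winnings rather than pot sizes: in QAbb the pot holds 4 chips, but Player 2 only gains the 2 chips that Player 1 put in.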
The first two events are card dealing; i.e. chance events
    def is_chance(self) -> bool:
        return len(self.history) < 2
Add an action to the history and return a new history
    def __add__(self, other: Action):
        return History(self.history + other)
Current player
    def player(self) -> Player:
        return cast(Player, len(self.history) % 2)
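To see how the chance, player, and terminal checks fit together, the sketch below walks through the prefixes of one game and labels each node. `node_type` is a hypothetical standalone helper that combines the three checks; it reports players by index, so index 0 is the Player 1 of the prose.

```python
def node_type(history: str) -> str:
    """Classify a history prefix as a chance, decision, or terminal node."""
    if len(history) < 2:
        return 'chance'  # a card still has to be dealt
    if len(history) > 2 and (history[-1] == 'p' or history[-2:] == 'bb'):
        return 'terminal'
    # After the two cards the history length alternates even and odd,
    # so the parity of the length identifies who acts next
    return f'player {len(history) % 2}'


game = 'QAbb'
trace = [(game[:n], node_type(game[:n])) for n in range(len(game) + 1)]
for prefix, kind in trace:
    print(repr(prefix), kind)
```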
Sample a chance action
    def sample_chance(self) -> Action:
        while True:
Randomly pick a card
            r = np.random.randint(len(CHANCES))
            chance = CHANCES[r]
See if the card was dealt before
            for c in self.history:
                if c == chance:
                    chance = None
                    break
Return the card if it was not dealt before
            if chance is not None:
                return cast(Action, chance)
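The loop above is a form of rejection sampling: it redraws until it finds a card that has not been dealt. For comparison only, the same deal could be drawn in one step by sampling without replacement. The sketch below is a different technique from the one in the source, uses the standard library `random` module instead of NumPy, and `deal` is a hypothetical helper.

```python
import random

CARDS = ['A', 'K', 'Q']


def deal(rng: random.Random) -> str:
    """Deal one card to each player without replacement."""
    c1, c2 = rng.sample(CARDS, 2)  # two distinct cards, in random order
    return c1 + c2


rng = random.Random(0)  # fixed seed so the run is repeatable
deals = [deal(rng) for _ in range(1000)]
print(sorted(set(deals)))
```

Both approaches give each of the six ordered deals the same probability. The rejection loop has the advantage of fitting an interface that samples one chance action at a time.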
Human-readable representation
    def __repr__(self):
        return repr(self.history)
Information set key for the current history. This is a string of only the events visible to the current player: their own card and the betting actions.
    def info_set_key(self) -> str:
Get current player
        i = self.player()
Current player sees her card and the betting actions
        return self.history[i] + self.history[2:]
Create a new information set object
    def new_info_set(self) -> InfoSet:
        return InfoSet(self.info_set_key())
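Because each player sees only their own card and the betting, many histories collapse into the same information set. The sketch below re-implements the key computation as a hypothetical standalone function and enumerates the keys over all decision points, assuming the betting rules described at the top.

```python
from itertools import permutations

CARDS = ['A', 'K', 'Q']


def info_set_key(history: str) -> str:
    """Own card plus the betting so far, from the acting player's view."""
    player = len(history) % 2
    return history[player] + history[2:]


# Decision points: Player 1 acts right after the deal, Player 2 acts after a bet
decision_histories = [c1 + c2 + bets
                      for c1, c2 in permutations(CARDS, 2)
                      for bets in ('', 'b')]
keys = sorted({info_set_key(h) for h in decision_histories})
print(keys)  # ['A', 'Ab', 'K', 'Kb', 'Q', 'Qb']
```

Twelve decision histories map to six information sets, since the opponent's card never appears in the key.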
A function to create an empty history object
def create_new_history():
    return History()
Configurations extend the CFR configurations class
class Configs(CFRConfigs):
    pass
Set the create_new_history method for Kuhn Poker
@option(Configs.create_new_history)
def _cnh():
    return create_new_history
def main():
Create an experiment. We only write tracking information to SQLite to speed things up: the algorithm iterates fast and we track data on each iteration, so writing to other destinations such as TensorBoard can be relatively time-consuming. SQLite is enough for our analytics.
    experiment.create(name='kuhn_poker', writers={'sqlite'})
Initialize configuration
    conf = Configs()
Load configuration
    experiment.configs(conf)
Register the information sets so they are saved with the experiment
    experiment.add_model_savers({'info_sets': InfoSetSaver(conf.cfr.info_sets)})
Start the experiment
    with experiment.start():
Start iterating
        conf.cfr.iterate()

if __name__ == '__main__':
    main()