league.payoff

shared_payoff

BattleRecordDict

class ding.league.shared_payoff.BattleRecordDict

Overview: A dict used to record battle game results. It is initialized with four fixed keys (wins, draws, losses, games), each with value 0.
Interfaces: __mul__

__mul__(decay: float) → dict

Overview:

Multiply each key’s value by the input multiplier decay.

Arguments:

decay (float): The multiplier.

Returns:

obj (dict): A deep copy of the record dict, with each value multiplied by decay.
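As an illustration of the behaviour described above, here is a minimal sketch of a record dict with the four fixed keys and a decaying `__mul__`. The class name `RecordDictSketch` is hypothetical and this is not the library's source; it only assumes that multiplication works on a deep copy and leaves the original untouched.

```python
import copy


class RecordDictSketch(dict):
    """Illustrative stand-in for BattleRecordDict (hypothetical name)."""

    def __init__(self) -> None:
        super().__init__()
        # The four fixed keys named in the overview, each starting at 0.
        for key in ("wins", "draws", "losses", "games"):
            self[key] = 0

    def __mul__(self, decay: float) -> dict:
        # Work on a deep copy so the original record is not modified.
        scaled = copy.deepcopy(self)
        for key in scaled:
            scaled[key] *= decay
        return scaled


record = RecordDictSketch()
record["wins"] = 4
record["games"] = 10
decayed = record * 0.5  # the original `record` keeps its counts
```

Returning a copy rather than scaling in place lets a caller compute a decayed view of the history while the stored record stays intact.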

BattleSharedPayoff

class ding.league.shared_payoff.BattleSharedPayoff(cfg: easydict.EasyDict)

Overview: Payoff data structure that records historical match results. It is shared among all players and uses LockContext to ensure thread safety, since players from all threads can access and modify it.
Interfaces: __getitem__, add_player, update, get_key
Property: players
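The need for locking can be illustrated with a small sketch. It uses the standard library's `threading.Lock` as a stand-in for LockContext, whose exact interface is not shown here; `SharedGamesSketch` and its methods are hypothetical names.

```python
import threading


class SharedGamesSketch:
    """Hypothetical shared counter guarded by a lock (stand-in for LockContext)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._games = {}

    def record_game(self, key: str) -> None:
        # The read-modify-write below must not interleave across threads.
        with self._lock:
            self._games[key] = self._games.get(key, 0) + 1

    def games(self, key: str) -> int:
        with self._lock:
            return self._games.get(key, 0)


def worker(shared: SharedGamesSketch, repeats: int) -> None:
    for _ in range(repeats):
        shared.record_game("a-b")


shared = SharedGamesSketch()
threads = [threading.Thread(target=worker, args=(shared, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = shared.games("a-b")
```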

__getitem__(players: tuple) → numpy.ndarray

Overview:

Get the win rate of each home player against each away player.

Arguments:

players (tuple): A tuple of (home, away), each of which is a player or a list of players.

Returns:

win_rates (np.ndarray): Win rate (squeezed, see Shape for more details) between each player from home and each player from away.

Shape:

win_rates: Assume there are m home players and n away players (m, n > 0).
- m != 1 and n != 1: shape is (m, n)
- m == 1: shape is (n,)
- n == 1: shape is (m,)
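The shape rules above can be sketched with NumPy as follows. `squeeze_win_rates` is a hypothetical helper, not part of the library. The rules do not say which case applies when both m and n are 1; this sketch checks m first, which is an assumption.

```python
import numpy as np


def squeeze_win_rates(matrix: np.ndarray) -> np.ndarray:
    """Apply the documented shape rules to an (m, n) win-rate matrix."""
    m, n = matrix.shape
    if m == 1:
        return matrix.reshape(n)  # single home player: shape (n,)
    if n == 1:
        return matrix.reshape(m)  # single away player: shape (m,)
    return matrix                 # general case keeps shape (m, n)


# Placeholder win rates of 0.5, used only to show the resulting shapes.
full = squeeze_win_rates(np.full((2, 3), 0.5))
one_home = squeeze_win_rates(np.full((1, 3), 0.5))
one_away = squeeze_win_rates(np.full((2, 1), 0.5))
```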

add_player(player: ding.league.player.Player) → None

Overview:

Add a player to the shared payoff.

Arguments:

player (Player): The player to be added. Usually one that is also new to the league.

get_key(home: str, away: str) → Tuple[str, bool]

Overview:

Join the home player id and the away player id in alphabetical order.

Arguments:

home (str): Home player id
away (str): Away player id

Returns:

key (str): Two ids sorted in alphabetical order and joined by ‘-’.
reverse (bool): Whether the two player ids are reordered.
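A minimal sketch of the key construction described above. `get_key_sketch` is a hypothetical name, and the sketch assumes that "alphabetical order" means Python's default string comparison.

```python
from typing import Tuple


def get_key_sketch(home: str, away: str) -> Tuple[str, bool]:
    # reverse is True when the ids had to be swapped to reach sorted order.
    reverse = home > away
    first, second = (away, home) if reverse else (home, away)
    return f"{first}-{second}", reverse


key_ab, reversed_ab = get_key_sketch("alpha", "beta")
key_ba, reversed_ba = get_key_sketch("beta", "alpha")
```

Both calls produce the same key, so a match between two players maps to one record regardless of which side was home; the reverse flag tells the caller whether wins and losses need to be swapped when reading that record.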

update(job_info: dict) → bool

Overview:

Update the payoff with job_info when a job finishes. Return True if the update succeeds; if an exception is raised during the update, handle it and return False.

Arguments:

job_info (dict): A dict containing job result information.

Returns:

result (bool): Whether update is successful.

Note

job_info has at least 5 keys: [‘launch_player’, ‘player_id’, ‘env_num’, ‘episode_num’, ‘result’]. The value of player_id is a tuple of (home_id, away_id). The value of result is a two-layer (nested) list of shape (episode_num, env_num).
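To make the structure of job_info concrete, here is a hedged sketch of a payload and an update routine that returns True or False as described. The player ids, the outcome strings ("wins", "draws", "losses") and the helper `update_sketch` are assumptions for illustration, not the library's implementation.

```python
from collections import Counter

job_info = {
    "launch_player": "main_player_0",  # hypothetical id
    "player_id": ("main_player_0", "historical_player_1"),  # (home_id, away_id)
    "env_num": 2,
    "episode_num": 3,
    # Outer list has episode_num entries, each inner list has env_num entries.
    "result": [["wins", "losses"], ["draws", "wins"], ["wins", "wins"]],
}


def update_sketch(records: dict, job_info: dict) -> bool:
    """Tally outcomes into records; return False if the payload is malformed."""
    try:
        home, away = job_info["player_id"]
        tally = Counter()
        for episode in job_info["result"]:
            for outcome in episode:
                if outcome not in ("wins", "draws", "losses"):
                    raise ValueError(f"unknown outcome: {outcome!r}")
                tally[outcome] += 1
                tally["games"] += 1
    except Exception:
        # Handle the failure here and report it through the return value.
        return False
    records.setdefault((home, away), Counter()).update(tally)
    return True


records = {}
ok = update_sketch(records, job_info)
bad = update_sketch(records, {"player_id": ("a", "b")})  # no 'result' key
```

The tally is built before anything is written to `records`, so a malformed payload leaves the stored records unchanged.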

create_payoff

Overview:

Given the key (payoff type) in cfg, create a new payoff instance of the matching class in payoff_mapping, or raise a KeyError if the type is not supported. Currently supported keys are [‘solo’, ‘battle’].

Arguments:

cfg (EasyDict): Payoff config containing at least the key ‘type’.

Returns:

payoff (BattleSharedPayoff or SoloSharedPayoff): the created new payoff, should be an instance of one of payoff_mapping’s values
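The factory pattern described above can be sketched as follows. The two placeholder classes and `create_payoff_sketch` are hypothetical, and a plain dict stands in for the EasyDict config to keep the example self-contained.

```python
class BattlePayoffSketch:
    """Placeholder for a battle payoff class (hypothetical)."""

    def __init__(self, cfg: dict) -> None:
        self.cfg = cfg


class SoloPayoffSketch:
    """Placeholder for a solo payoff class (hypothetical)."""

    def __init__(self, cfg: dict) -> None:
        self.cfg = cfg


# Maps each supported payoff type to the class that implements it.
payoff_mapping = {"solo": SoloPayoffSketch, "battle": BattlePayoffSketch}


def create_payoff_sketch(cfg: dict):
    payoff_type = cfg["type"]
    if payoff_type not in payoff_mapping:
        raise KeyError(f"unsupported payoff type: {payoff_type!r}")
    return payoff_mapping[payoff_type](cfg)


battle = create_payoff_sketch({"type": "battle"})
solo = create_payoff_sketch({"type": "solo"})
```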