A2C¶
A2CPolicy¶
- class ding.policy.a2c.A2CPolicy(cfg: dict, model: Optional[Union[type, torch.nn.modules.module.Module]] = None, enable_field: Optional[List[str]] = None)[source]¶
- Overview:
Policy class of A2C algorithm.
- _forward_collect(data: dict) dict [source]¶
- Overview:
Forward function of collect mode.
- Arguments:
  - data (Dict[str, Any]): Dict type data, stacked env data for predicting policy_output (action); values are torch.Tensor, np.ndarray, or dict/list combinations, and keys are env ids indicated by integers.
- Returns:
  - output (Dict[int, Any]): Dict type data, including at least the inferred action according to the input obs.
- ReturnsKeys:
  - necessary: action
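To make the input/output contract concrete, here is a minimal sketch in plain Python. `fake_forward_collect` is a hypothetical stand-in, not part of DI-engine; the real method samples actions from the model's policy distribution over batched tensors.

```python
# Input: dict keyed by env_id (integers); values are per-env observations.
obs_batch = {0: [0.1, 0.2], 1: [0.3, 0.4]}

def fake_forward_collect(data):
    # Stand-in for A2CPolicy._forward_collect, showing only the data contract:
    # output is keyed by the same env ids, each entry containing at least 'action'.
    return {env_id: {'action': 0} for env_id in data}

output = fake_forward_collect(obs_batch)
```

The key point is that both input and output are dicts indexed by `env_id`, so results can be routed back to the right environment.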
- _forward_eval(data: dict) dict [source]¶
- Overview:
Forward function of eval mode, similar to self._forward_collect.
- Arguments:
  - data (Dict[str, Any]): Dict type data, stacked env data for predicting policy_output (action); values are torch.Tensor, np.ndarray, or dict/list combinations, and keys are env ids indicated by integers.
- Returns:
  - output (Dict[int, Any]): The dict of predicted actions for the interaction with the env.
- ReturnsKeys:
  - necessary: action
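Eval mode selects actions greedily rather than sampling. A minimal sketch of the argmax strategy in plain Python (the real eval model applies argmax over the policy logits tensor):

```python
def argmax_action(logits):
    # Deterministic (argmax) action selection used in eval mode:
    # return the index of the largest logit.
    return max(range(len(logits)), key=lambda i: logits[i])

greedy = argmax_action([0.1, 2.0, -1.0])  # -> 1
```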
- _forward_learn(data: dict) Dict[str, Any] [source]¶
- Overview:
Forward and backward function of learn mode.
- Arguments:
  - data (dict): Dict type data, including at least ['obs', 'action', 'reward', 'next_obs', 'adv'].
- Returns:
  - info_dict (Dict[str, Any]): Including current lr and loss.
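The learn step optimizes the standard A2C objective: a policy-gradient term weighted by the advantage, a value-regression term, and an entropy bonus. A per-sample sketch in plain Python; the names `value_weight` and `entropy_weight` are illustrative, not necessarily the config keys DI-engine uses:

```python
def a2c_loss_terms(logp, value, adv, ret, entropy,
                   value_weight=0.5, entropy_weight=0.01):
    # Standard A2C loss terms (scalar, per-sample):
    #   policy loss  : -log pi(a|s) * advantage
    #   value loss   : squared error between return target and value estimate
    #   entropy bonus: subtracted to encourage exploration
    policy_loss = -logp * adv
    value_loss = (ret - value) ** 2
    total_loss = policy_loss + value_weight * value_loss - entropy_weight * entropy
    return {'total_loss': total_loss,
            'policy_loss': policy_loss,
            'value_loss': value_loss}

info = a2c_loss_terms(logp=-0.5, value=1.0, adv=2.0, ret=1.5, entropy=1.0)
```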
- _get_train_sample(data: list) Union[None, List[Any]] [source]¶
- Overview:
Get the trajectory and the n-step return data, then sample from the n-step return data.
- Arguments:
  - data (list): The trajectory's buffer list.
- Returns:
  - samples (dict): The generated training samples.
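As a concrete example of turning a trajectory into return targets, here is a minimal discounted-return computation in plain Python. This is a sketch of the general idea only; the actual sampling may use n-step returns or GAE depending on the config.

```python
def discounted_returns(rewards, gamma=0.99):
    # Walk the trajectory backwards, accumulating R_t = r_t + gamma * R_{t+1}
    # so every step gets its discounted return target.
    ret = 0.0
    out = []
    for r in reversed(rewards):
        ret = r + gamma * ret
        out.append(ret)
    return list(reversed(out))

returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)  # -> [1.75, 1.5, 1.0]
```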
- _init_collect() None [source]¶
- Overview:
Collect mode init method. Called by self.__init__. Init the trajectory and unroll length, and the collect model.
- _init_eval() None [source]¶
- Overview:
Evaluate mode init method. Called by self.__init__. Init the eval model with the argmax strategy.
- _init_learn() None [source]¶
- Overview:
Learn mode init method. Called by self.__init__. Init the optimizer, algorithm config, main and target models.
- _process_transition(obs: Any, model_output: dict, timestep: collections.namedtuple) dict [source]¶
- Overview:
Generate dict type transition data from inputs.
- Arguments:
  - obs (Any): Env observation.
  - model_output (dict): Output of the collect model, including at least ['action'].
  - timestep (namedtuple): Output after env step, including at least ['obs', 'reward', 'done'] (here 'obs' indicates the obs after the env step).
- Returns:
  - transition (dict): Dict type transition data.
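A minimal mirror of this method's contract in plain Python. The `Timestep` namedtuple here is a stand-in for the env's actual timestep type, and the field set is only the documented minimum:

```python
from collections import namedtuple

# Hypothetical timestep type with the documented minimum fields.
Timestep = namedtuple('Timestep', ['obs', 'reward', 'done'])

def process_transition(obs, model_output, timestep):
    # Package one interaction step into a dict-type transition;
    # timestep.obs is the observation *after* the env step, stored as next_obs.
    return {
        'obs': obs,
        'next_obs': timestep.obs,
        'action': model_output['action'],
        'reward': timestep.reward,
        'done': timestep.done,
    }

step = Timestep(obs=[0.2], reward=1.0, done=False)
transition = process_transition(obs=[0.1], model_output={'action': 2}, timestep=step)
```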