A2C

A2CPolicy

class ding.policy.a2c.A2CPolicy(cfg: dict, model: Optional[Union[type, torch.nn.modules.module.Module]] = None, enable_field: Optional[List[str]] = None)
Overview:

Policy class of A2C algorithm.
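
Examples:

A minimal construction sketch. VAC is DI-engine's actor-critic network and default_config() comes from the Policy base class; the obs/action shapes below are placeholders for a CartPole-like env.

    from ding.model import VAC
    from ding.policy import A2CPolicy

    # Build the policy from its default config and a small actor-critic network.
    # The shapes below are hypothetical, not part of this API.
    cfg = A2CPolicy.default_config()
    model = VAC(obs_shape=4, action_shape=2)
    policy = A2CPolicy(cfg, model=model)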

_forward_collect(data: dict) -> dict
Overview:

Forward function of collect mode.

Arguments:
  • data (Dict[str, Any]): Dict type data, stacked env data for predicting the policy output (action). Values are torch.Tensor, np.ndarray, or dict/list combinations of them; keys are integer env_ids.

Returns:
  • output (Dict[int, Any]): Dict type data, including at least the inferred action according to the input obs.

ReturnsKeys:
  • necessary: action
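
Examples:

A minimal usage sketch, assuming the policy constructed above; in DI-engine the collect_mode.forward wrapper dispatches to this method. The env ids and observation shape are hypothetical.

    import torch

    # Two envs (ids 0 and 7), each contributing a 4-dim observation.
    data = {0: torch.randn(4), 7: torch.randn(4)}
    output = policy.collect_mode.forward(data)
    action_for_env0 = output[0]['action']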

_forward_eval(data: dict) -> dict
Overview:

Forward function of eval mode, similar to self._forward_collect.

Arguments:
  • data (Dict[str, Any]): Dict type data, stacked env data for predicting the policy output (action). Values are torch.Tensor, np.ndarray, or dict/list combinations of them; keys are integer env_ids.

Returns:
  • output (Dict[int, Any]): The dict of predicted actions for interaction with the env.

ReturnsKeys:
  • necessary: action
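
Examples:

A sketch mirroring the collect-mode call, assuming the same policy; the eval_mode.forward wrapper dispatches to this method, which picks actions greedily (argmax) instead of sampling.

    import torch

    output = policy.eval_mode.forward({0: torch.randn(4)})
    greedy_action = output[0]['action']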

_forward_learn(data: dict) -> Dict[str, Any]
Overview:

Forward and backward function of learn mode.

Arguments:
  • data (dict): Dict type data, including at least ['obs', 'action', 'reward', 'next_obs', 'adv']

Returns:
  • info_dict (Dict[str, Any]): Including the current learning rate and loss.
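
Examples:

A minimal training-step sketch following the documented interface, assuming the policy constructed above; in DI-engine the learn_mode.forward wrapper dispatches to this method. The batch shapes and the extra 'value'/'done' entries (which the advantage-to-return conversion would typically need) are assumptions.

    import torch

    # Hypothetical mini-batch of 8 transitions with the documented keys.
    batch = {
        'obs': torch.randn(8, 4),
        'next_obs': torch.randn(8, 4),
        'action': torch.randint(0, 2, (8,)),
        'reward': torch.randn(8),
        'adv': torch.randn(8),     # advantages computed in _get_train_sample
        'value': torch.randn(8),   # assumed: critic values stored at collect time
        'done': torch.zeros(8),    # assumed: episode-termination flags
    }
    info = policy.learn_mode.forward(batch)
    # info carries entries such as the current learning rate and loss values.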

_get_train_sample(data: list) -> Union[None, List[Any]]
Overview:

Get the trajectory and the n-step return data, then sample from the n-step return data.

Arguments:
  • data (list): The trajectory’s buffer list

Returns:
  • samples (dict): The generated training samples.
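
Examples:

A sketch of converting a collected trajectory into training samples, assuming the policy above; collect_mode.get_train_sample is DI-engine's wrapper around this method. The transition field names follow the _process_transition description below, plus a 'value' entry the advantage computation would need (an assumption).

    import torch

    # Hypothetical 3-step trajectory of transition dicts gathered during rollout.
    trajectory = [
        {
            'obs': torch.randn(4),
            'next_obs': torch.randn(4),
            'action': torch.tensor(0),
            'value': torch.randn(1),     # assumed: critic value at collect time
            'reward': torch.tensor(1.0),
            'done': False,
        }
        for _ in range(3)
    ]
    samples = policy.collect_mode.get_train_sample(trajectory)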

_init_collect() -> None
Overview:

Collect mode init method. Called by self.__init__. Initialize the trajectory and unroll length, and the collect model.

_init_eval() -> None
Overview:

Evaluate mode init method. Called by self.__init__. Initialize the eval model with the argmax sampling strategy.

_init_learn() -> None
Overview:

Learn mode init method. Called by self.__init__. Initialize the optimizer, the algorithm config, and the learn model.

_process_transition(obs: Any, model_output: dict, timestep: collections.namedtuple) -> dict
Overview:

Generate dict type transition data from inputs.

Arguments:
  • obs (Any): Env observation

  • model_output (dict): Output of the collect model, including at least ['action']

  • timestep (namedtuple): Output after env step, including at least ['obs', 'reward', 'done'] (here 'obs' indicates the obs after the env step).

Returns:
  • transition (dict): Dict type transition data.
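
Examples:

A single-step sketch, assuming the policy above; collect_mode.process_transition is DI-engine's wrapper around this method, and the namedtuple here is a stand-in for the env's real timestep type.

    import torch
    from collections import namedtuple

    Timestep = namedtuple('Timestep', ['obs', 'reward', 'done'])

    obs = torch.randn(4)                                     # obs before the step
    model_output = policy.collect_mode.forward({0: obs})[0]  # includes 'action'
    timestep = Timestep(obs=torch.randn(4), reward=1.0, done=False)  # obs after the step
    transition = policy.collect_mode.process_transition(obs, model_output, timestep)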