template.VAC

Please refer to ding/model/template/vac.py for usage.

VAC

class ding.model.template.VAC(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], share_encoder: bool = True, continuous: bool = False, encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], actor_head_hidden_size: int = 64, actor_head_layer_num: int = 1, critic_head_hidden_size: int = 64, critic_head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None, sigma_type: Optional[str] = 'independent', bound_type: Optional[str] = None)[source]
Overview:

The VAC (Value Actor-Critic) model.

Interfaces:

__init__, forward, compute_actor, compute_critic, compute_actor_critic

__init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], share_encoder: bool = True, continuous: bool = False, encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], actor_head_hidden_size: int = 64, actor_head_layer_num: int = 1, critic_head_hidden_size: int = 64, critic_head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None, sigma_type: Optional[str] = 'independent', bound_type: Optional[str] = None) None[source]
Overview:

Initialize the VAC model according to the given arguments.

Arguments:
  • obs_shape (Union[int, SequenceType]): Observation space shape.

  • action_shape (Union[int, SequenceType]): Action space shape.

  • share_encoder (bool): Whether the actor and critic share the same encoder.

  • continuous (bool): Whether the action space is continuous.

  • encoder_hidden_size_list (SequenceType): The collection of hidden_size values to pass to the Encoder.

  • actor_head_hidden_size (Optional[int]): The hidden_size to pass to the actor head.

  • actor_head_layer_num (int):

    The number of layers used in the actor head to compute the action logit output.

  • critic_head_hidden_size (Optional[int]): The hidden_size to pass to the critic head.

  • critic_head_layer_num (int):

    The number of layers used in the critic head to compute the value output.

  • activation (Optional[nn.Module]):

    The type of activation function to use in the MLP after each layer_fn; if None, defaults to nn.ReLU().

  • norm_type (Optional[str]):

    The type of normalization to use; see ding.torch_utils.fc_block for more details.

  • sigma_type (Optional[str]): The sigma type used by the continuous-action head (default 'independent'); only relevant when continuous is True.

  • bound_type (Optional[str]): The action bound type used by the continuous-action head; only relevant when continuous is True.
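
A minimal instantiation sketch; the concrete shapes (8 and 4) are illustrative, and the import path follows the module path shown in the class signature above.

Examples:
>>> from ding.model.template import VAC
>>> # discrete action space (default)
>>> model = VAC(obs_shape=8, action_shape=4)
>>> # continuous action space, switching the actor to a continuous-action head
>>> continuous_model = VAC(obs_shape=8, action_shape=4, continuous=True)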

compute_actor(x: torch.Tensor) Dict[source]
Overview:

Execute the forward pass in 'compute_actor' mode: use the encoded embedding tensor to predict the actor output.

Arguments:
  • x (torch.Tensor):

    The encoded embedding tensor, determined by the given hidden_size, i.e. (B, N=hidden_size), where hidden_size = actor_head_hidden_size.

Returns:
  • outputs (Dict):

    The output dict produced by running the encoder and actor head.

ReturnsKeys:
  • logit (torch.Tensor): The action logit tensor output by the actor head.

Shapes:
  • logit (torch.FloatTensor): \((B, N)\), where B is batch size and N is action_shape

Examples:
>>> model = VAC(64,64)
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs,'compute_actor')
>>> assert actor_outputs['logit'].shape == torch.Size([4, 64])
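
compute_actor can also be called directly rather than dispatched through forward; a minimal sketch consistent with the interface documented above:
>>> model = VAC(64, 64)
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model.compute_actor(inputs)
>>> assert actor_outputs['logit'].shape == torch.Size([4, 64])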
compute_actor_critic(x: torch.Tensor) Dict[source]
Overview:

Execute the forward pass in 'compute_actor_critic' mode: use the encoded embedding tensor to predict both the actor and critic outputs.

Arguments:
  • x (torch.Tensor): The encoded embedding tensor.

Returns:
  • outputs (Dict):

    The output dict produced by running the encoder and both heads.

ReturnsKeys:
  • logit (torch.Tensor): The action logit tensor output by the actor head.

  • value (torch.Tensor): The predicted state value tensor, whose size matches the batch size.

Shapes:
  • logit (torch.FloatTensor): \((B, N)\), where B is batch size and N is action_shape

  • value (torch.FloatTensor): \((B, )\), where B is batch size.

Examples:
>>> model = VAC(64,64)
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs,'compute_actor_critic')
>>> outputs['value']
tensor([0.0252, 0.0235, 0.0201, 0.0072], grad_fn=<SqueezeBackward1>)
>>> assert outputs['logit'].shape == torch.Size([4, 64])

Note

The compute_actor_critic interface aims to save computation when the encoder is shared, returning the combined output dictionary in a single forward pass.
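
As an illustrative sketch (not part of the class itself), the combined output can feed both the policy distribution and the value estimate in one pass, assuming a discrete action space:
>>> model = VAC(64, 6)
>>> obs = torch.randn(4, 64)
>>> outputs = model(obs, 'compute_actor_critic')
>>> dist = torch.distributions.Categorical(logits=outputs['logit'])
>>> action = dist.sample()            # shape (4,)
>>> log_prob = dist.log_prob(action)  # shape (4,)
>>> value = outputs['value']          # shape (4,)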

compute_critic(x: torch.Tensor) Dict[source]
Overview:

Execute the forward pass in 'compute_critic' mode: use the encoded embedding tensor to predict the critic output.

Arguments:
  • x (torch.Tensor):

    The encoded embedding tensor, determined by the given hidden_size, i.e. (B, N=hidden_size), where hidden_size = critic_head_hidden_size.

Returns:
  • outputs (Dict):

    The output dict produced by running the encoder and critic head.

    Necessary Keys:
    • value (torch.Tensor): The predicted state value tensor, whose size matches the batch size.

Shapes:
  • value (torch.FloatTensor): \((B, )\), where B is batch size.

Examples:
>>> model = VAC(64,64)
>>> inputs = torch.randn(4, 64)
>>> critic_outputs = model(inputs,'compute_critic')
>>> critic_outputs['value']
tensor([0.0252, 0.0235, 0.0201, 0.0072], grad_fn=<SqueezeBackward1>)
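
compute_critic can likewise be called directly; a minimal sketch consistent with the interface documented above:
>>> model = VAC(64, 64)
>>> inputs = torch.randn(4, 64)
>>> critic_outputs = model.compute_critic(inputs)
>>> assert critic_outputs['value'].shape == torch.Size([4])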
forward(inputs: Union[torch.Tensor, Dict], mode: str) Dict[source]
Overview:

Use the encoded embedding tensor to predict the output. The forward computation is dispatched to compute_actor, compute_critic, or compute_actor_critic according to the mode argument.

Arguments:
Forward with 'compute_actor' or 'compute_critic':
  • inputs (torch.Tensor):

    The encoded embedding tensor, determined by the given hidden_size, i.e. (B, N=hidden_size). Whether hidden_size refers to actor_head_hidden_size or critic_head_hidden_size depends on mode.

  • mode (str): The forward mode, one of 'compute_actor', 'compute_critic', or 'compute_actor_critic'.

Returns:
  • outputs (Dict):

    Run with encoder and head.

    Forward with 'compute_actor', Necessary Keys:
    • logit (torch.Tensor): The action logit tensor output by the actor head.

    Forward with 'compute_critic', Necessary Keys:
    • value (torch.Tensor): The predicted state value tensor, whose size matches the batch size.

Shapes:
  • inputs (torch.Tensor): \((B, N)\), where B is batch size and N is the corresponding hidden_size

  • logit (torch.FloatTensor): \((B, N)\), where B is batch size and N is action_shape

  • value (torch.FloatTensor): \((B, )\), where B is batch size.

Actor Examples:
>>> model = VAC(64,128)
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs,'compute_actor')
>>> assert actor_outputs['logit'].shape == torch.Size([4, 128])
Critic Examples:
>>> model = VAC(64,64)
>>> inputs = torch.randn(4, 64)
>>> critic_outputs = model(inputs,'compute_critic')
>>> critic_outputs['value']
tensor([0.0252, 0.0235, 0.0201, 0.0072], grad_fn=<SqueezeBackward1>)
Actor-Critic Examples:
>>> model = VAC(64,64)
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs,'compute_actor_critic')
>>> outputs['value']
tensor([0.0252, 0.0235, 0.0201, 0.0072], grad_fn=<SqueezeBackward1>)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
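
The following is a minimal, illustrative actor-critic update sketch built on the forward interface above; the returns tensor is a hypothetical placeholder for targets computed from collected trajectories, and the 0.5 loss weight is arbitrary:
>>> import torch
>>> model = VAC(obs_shape=64, action_shape=6)
>>> obs = torch.randn(4, 64)
>>> out = model(obs, 'compute_actor_critic')
>>> dist = torch.distributions.Categorical(logits=out['logit'])
>>> action = dist.sample()
>>> returns = torch.randn(4)                     # placeholder return targets
>>> advantage = returns - out['value'].detach()  # simple advantage estimate
>>> policy_loss = -(dist.log_prob(action) * advantage).mean()
>>> value_loss = torch.nn.functional.mse_loss(out['value'], returns)
>>> loss = policy_loss + 0.5 * value_loss
>>> loss.backward()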