envs.env_manager

base_env_manager

BaseEnvManager

class ding.envs.env_manager.base_env_manager.BaseEnvManager(env_fn: List[Callable], cfg: easydict.EasyDict = {})
Overview:

Create a BaseEnvManager to manage multiple environments.

Interfaces:

reset, step, seed, close, enable_save_replay, launch, env_info, default_config

Properties:

env_num, ready_obs, done, method_name_list, active_env

close() -> None
Overview:

Release the environment resources.

enable_save_replay(replay_path: Union[List[str], str]) -> None
Overview:

Set each env’s replay save path.

Arguments:
  • replay_path (Union[List[str], str]): A list of paths, one per environment, or a single path shared by all environments.

env_info() -> collections.namedtuple
Overview:

Get one env’s info, for example, action space, observation space, reward space, etc.

Returns:
  • info (namedtuple): Usually a namedtuple BaseEnvInfo, each element is EnvElementInfo.

launch(reset_param: Optional[Dict] = None) -> None
Overview:

Set up the environments and their parameters.

Arguments:
  • reset_param (Optional[Dict]): Dict of reset parameters for each environment; the key is the env_id and the value is the corresponding reset parameters.

property ready_obs: Dict[int, Any]
Overview:

Get the next observations (as np.ndarray) and their corresponding env ids.

Returns:

A dictionary with observations and their environment IDs.

Example:
>>>     obs_dict = env_manager.ready_obs
>>>     actions_dict = {env_id: model.forward(obs) for env_id, obs in obs_dict.items()}

reset(reset_param: Optional[Dict] = None) -> None
Overview:

Reset the environments and their parameters.

Arguments:
  • reset_param (Optional[Dict]): Dict of reset parameters for each environment; the key is the env_id and the value is the corresponding reset parameters.

seed(seed: Union[Dict[int, int], List[int], int], dynamic_seed: Optional[bool] = None) -> None
Overview:

Set the seed for each environment.

Arguments:
  • seed (Union[Dict[int, int], List[int], int]): A list of seeds, one per environment, or a single seed for the first environment, from which the other seeds are generated automatically.
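The three accepted seed forms can be normalized into one per-environment mapping. The following is a minimal sketch, not the library's implementation: `expand_seed` is a hypothetical helper, and deriving the remaining seeds as `seed + i` is an assumption made only for illustration.

```python
from typing import Dict, List, Union


def expand_seed(seed: Union[Dict[int, int], List[int], int], env_num: int) -> Dict[int, int]:
    """Hypothetical helper: normalize a seed spec to {env_id: seed}."""
    if isinstance(seed, int):
        # Assumption: env i receives seed + i, so env 0 gets the given seed.
        return {env_id: seed + env_id for env_id in range(env_num)}
    if isinstance(seed, list):
        if len(seed) != env_num:
            raise ValueError(f"expected {env_num} seeds, got {len(seed)}")
        # The position in the list is taken as the env id.
        return dict(enumerate(seed))
    if isinstance(seed, dict):
        # Already keyed by env id; return a copy so the caller's dict is untouched.
        return dict(seed)
    raise TypeError(f"unsupported seed type: {type(seed).__name__}")
```

For example, `expand_seed(10, 3)` yields one distinct seed per environment, starting at 10 for environment 0.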

step(actions: Dict[int, Any]) -> Dict[int, collections.namedtuple]
Overview:

Step all environments. Reset an env if done.

Arguments:
  • actions (Dict[int, Any]): {env_id: action}

Returns:
  • timesteps (Dict[int, namedtuple]): {env_id: timestep}. Timestep is a BaseEnvTimestep tuple with observation, reward, done, env_info.

Example:
>>>     actions_dict = {env_id: model.forward(obs) for env_id, obs in obs_dict.items()}
>>>     timesteps = env_manager.step(actions_dict)
>>>     for env_id, timestep in timesteps.items():
>>>         pass
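
The interplay between ready_obs and step can be tried without the library. The sketch below uses hypothetical stand-ins (`ToyManager`, `CountdownEnv`, and a `Timestep` namedtuple in place of BaseEnvTimestep) that only mimic the documented contract: observations and actions are keyed by env_id, and a finished env is reset before its next observation is served.

```python
from collections import namedtuple
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for BaseEnvTimestep; the field names are assumptions.
Timestep = namedtuple("Timestep", ["obs", "reward", "done", "info"])


class CountdownEnv:
    """Toy env whose episode ends after `length` steps."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: Any) -> Timestep:
        self.t += 1
        return Timestep(self.t, float(action), self.t >= self.length, {})


class ToyManager:
    """Toy manager exposing launch, ready_obs and step."""

    def __init__(self, env_fn: List[Callable[[], CountdownEnv]]) -> None:
        self._envs = [fn() for fn in env_fn]
        self._ready_obs: Dict[int, Any] = {}

    def launch(self) -> None:
        self._ready_obs = {i: env.reset() for i, env in enumerate(self._envs)}

    @property
    def ready_obs(self) -> Dict[int, Any]:
        return dict(self._ready_obs)

    def step(self, actions: Dict[int, Any]) -> Dict[int, Timestep]:
        timesteps = {}
        for env_id, action in actions.items():
            ts = self._envs[env_id].step(action)
            timesteps[env_id] = ts
            # A finished env is reset right away, so ready_obs stays valid.
            self._ready_obs[env_id] = self._envs[env_id].reset() if ts.done else ts.obs
        return timesteps


manager = ToyManager([lambda: CountdownEnv(2), lambda: CountdownEnv(3)])
manager.launch()
finished = {0: 0, 1: 0}
for _ in range(6):
    obs_dict = manager.ready_obs
    actions_dict = {env_id: 1 for env_id in obs_dict}  # constant dummy action
    for env_id, timestep in manager.step(actions_dict).items():
        finished[env_id] += int(timestep.done)
```

After six steps the two-step env has finished three episodes and the three-step env two, which shows that the loop never needs to call reset itself.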

create_env_manager

ding.envs.env_manager.base_env_manager.create_env_manager(manager_cfg: dict, env_fn: List[Callable]) -> ding.envs.env_manager.base_env_manager.BaseEnvManager
Overview:

Create an env manager according to manager cfg and env function.

Arguments:
  • manager_cfg (EasyDict): Env manager config.

  • env_fn (List[Callable]): A list of functions, each of which creates an env.

ArgumentsKeys:
  • necessary keys of manager_cfg: type

get_env_manager_cls

ding.envs.env_manager.base_env_manager.get_env_manager_cls(cfg: easydict.EasyDict) -> type
Overview:

Get an env manager class according to cfg.

Arguments:
  • cfg (EasyDict): Env manager config.

ArgumentsKeys:
  • necessary: type
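
Both helpers above select a manager class from the `type` key of the config. A common way to implement this kind of dispatch is a name-to-class registry. The sketch below illustrates that pattern only: `MANAGER_REGISTRY`, `register_manager`, `ToyBaseManager`, `get_manager_cls` and `create_manager` are hypothetical names, not the library's API.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry mapping a config "type" string to a manager class.
MANAGER_REGISTRY: Dict[str, type] = {}


def register_manager(name: str) -> Callable[[type], type]:
    """Class decorator that records a manager class under `name`."""
    def decorator(cls: type) -> type:
        MANAGER_REGISTRY[name] = cls
        return cls
    return decorator


@register_manager("toy_base")
class ToyBaseManager:
    """Toy manager that only stores its constructor arguments."""

    def __init__(self, env_fn: List[Callable[[], Any]], cfg: Dict[str, Any]) -> None:
        self.env_fn = env_fn
        self.cfg = cfg
        self.env_num = len(env_fn)


def get_manager_cls(cfg: Dict[str, Any]) -> type:
    """Look up the class named by the required `type` key."""
    if "type" not in cfg:
        raise KeyError("cfg must contain the key 'type'")
    return MANAGER_REGISTRY[cfg["type"]]


def create_manager(manager_cfg: Dict[str, Any], env_fn: List[Callable[[], Any]]) -> Any:
    """Instantiate the selected class with the env functions and config."""
    return get_manager_cls(manager_cfg)(env_fn=env_fn, cfg=manager_cfg)
```

With this layout, `create_manager({"type": "toy_base"}, env_fn)` returns a `ToyBaseManager`, and a new manager type only needs a decorated class.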

subprocess_env_manager

ShmBuffer

class ding.envs.env_manager.subprocess_env_manager.ShmBuffer(dtype: numpy.generic, shape: Tuple[int])
Overview:

Shared memory buffer to store numpy array.

fill(src_arr: numpy.ndarray) -> None
Overview:

Fill the shared memory buffer with a numpy array. (Replace the original one.)

Arguments:
  • src_arr (np.ndarray): array to fill the buffer.

get() -> numpy.ndarray
Overview:

Get the array stored in the buffer.

Returns:
  • copy_data (np.ndarray): A copy of the data stored in the buffer.
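
The fill/get contract (overwrite the shared storage, hand back a copy) can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the ShmBuffer implementation: `TinyShmBuffer` is a hypothetical class, and it stores a flat sequence of C doubles in a `multiprocessing.Array` instead of a numpy array so that it runs without extra dependencies.

```python
import ctypes
from multiprocessing import Array
from typing import List, Sequence


class TinyShmBuffer:
    """Hypothetical fixed-size shared buffer of C doubles."""

    def __init__(self, size: int) -> None:
        # lock=False returns the raw shared ctypes array without a lock wrapper.
        self._buf = Array(ctypes.c_double, size, lock=False)
        self._size = size

    def fill(self, src: Sequence[float]) -> None:
        """Overwrite the whole buffer with `src`."""
        if len(src) != self._size:
            raise ValueError(f"expected {self._size} values, got {len(src)}")
        self._buf[:] = list(src)

    def get(self) -> List[float]:
        """Return a copy, so callers cannot mutate the shared storage."""
        return list(self._buf)


buf = TinyShmBuffer(3)
buf.fill([1.0, 2.0, 3.0])
snapshot = buf.get()
snapshot[0] = 99.0  # changes only the copy, not the buffer
```

Returning a copy from `get` keeps the shared storage safe from accidental in-place edits, at the cost of one extra copy per read.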

ShmBufferContainer

class ding.envs.env_manager.subprocess_env_manager.ShmBufferContainer(dtype: numpy.generic, shape: Union[Dict[Any, tuple], tuple])
Overview:

A container that supports multiple shared memory buffers. Each key-value pair maps a name to a buffer.

fill(src_arr: Union[Dict[Any, numpy.ndarray], numpy.ndarray]) -> None
Overview:

Fill one or more shared memory buffers.

Arguments:
  • src_arr (Union[Dict[Any, np.ndarray], np.ndarray]): array to fill the buffer.

get() -> Union[Dict[Any, numpy.ndarray], numpy.ndarray]
Overview:

Get the array or arrays stored in the buffer(s).

Returns:
  • data (Union[Dict[Any, np.ndarray], np.ndarray]): The array(s) stored in the buffer(s).
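
The signatures above suggest that the container holds one buffer when given a single shape and one buffer per key when given a dict. The sketch below shows that dispatch with hypothetical stand-ins: `ListBuffer` is an in-process placeholder for a shared buffer, `TinyContainer` takes sizes instead of numpy shapes, and nested dicts are not handled.

```python
from typing import Any, Dict, List, Union


class ListBuffer:
    """Hypothetical in-process placeholder for one shared buffer."""

    def __init__(self, size: int) -> None:
        self._data = [0.0] * size

    def fill(self, src: List[float]) -> None:
        self._data[:] = src

    def get(self) -> List[float]:
        return list(self._data)


class TinyContainer:
    """Holds a single buffer, or one buffer per key when given a dict."""

    def __init__(self, size: Union[Dict[Any, int], int]) -> None:
        if isinstance(size, dict):
            self._data: Any = {key: ListBuffer(n) for key, n in size.items()}
        else:
            self._data = ListBuffer(size)

    def fill(self, src: Union[Dict[Any, List[float]], List[float]]) -> None:
        if isinstance(self._data, dict):
            # Each key in `src` is written to the buffer of the same name.
            for key, values in src.items():
                self._data[key].fill(values)
        else:
            self._data.fill(src)

    def get(self) -> Union[Dict[Any, List[float]], List[float]]:
        if isinstance(self._data, dict):
            return {key: buffer.get() for key, buffer in self._data.items()}
        return self._data.get()


single = TinyContainer(2)
single.fill([1.0, 2.0])
multi = TinyContainer({"image": 2, "vector": 1})
multi.fill({"image": [3.0, 4.0], "vector": [5.0]})
```

The return type of `get` mirrors the constructor argument: a dict in, a dict out.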

SyncSubprocessEnvManager

class ding.envs.env_manager.subprocess_env_manager.SyncSubprocessEnvManager(env_fn: List[Callable], cfg: easydict.EasyDict = {})
close() -> None
Overview:

Close the env manager and release all related resources.

enable_save_replay(replay_path: Union[List[str], str]) -> None
Overview:

Set each env’s replay save path.

Arguments:
  • replay_path (Union[List[str], str]): A list of paths, one per environment, or a single path shared by all environments.

env_info() -> collections.namedtuple
Overview:

Get one env’s info, for example, action space, observation space, reward space, etc.

Returns:
  • info (namedtuple): Usually a namedtuple BaseEnvInfo, each element is EnvElementInfo.

launch(reset_param: Optional[Dict] = None) -> None
Overview:

Set up the environments and their parameters.

Arguments:
  • reset_param (Optional[Dict]): Dict of reset parameters for each environment; the key is the env_id and the value is the corresponding reset parameters.

property ready_obs: Dict[int, Any]
Overview:

Get the next observations.

Returns:

A dictionary with observations and their environment IDs.

Note:

The observations are returned in np.ndarray.

Example:
>>>     obs_dict = env_manager.ready_obs
>>>     actions_dict = {env_id: model.forward(obs) for env_id, obs in obs_dict.items()}

reset(reset_param: Optional[Dict] = None) -> None
Overview:

Reset the environments and their parameters.

Arguments:
  • reset_param (Optional[Dict]): Dict of reset parameters for each environment; the key is the env_id and the value is the corresponding reset parameters.

seed(seed: Union[Dict[int, int], List[int], int], dynamic_seed: Optional[bool] = None) -> None
Overview:

Set the seed for each environment.

Arguments:
  • seed (Union[Dict[int, int], List[int], int]): A list of seeds, one per environment, or a single seed for the first environment, from which the other seeds are generated automatically.

step(actions: Dict[int, Any]) -> Dict[int, collections.namedtuple]
Overview:

Step all environments. Reset an env if done.

Arguments:
  • actions (Dict[int, Any]): {env_id: action}

Returns:
  • timesteps (Dict[int, namedtuple]): {env_id: timestep}. Timestep is a BaseEnvTimestep tuple with observation, reward, done, env_info.

Example:
>>>     actions_dict = {env_id: model.forward(obs) for env_id, obs in obs_dict.items()}
>>>     timesteps = env_manager.step(actions_dict)
>>>     for env_id, timestep in timesteps.items():
>>>         pass

Note:

  • The env_id that appears in actions will also be returned in timesteps.

  • Each environment is run by a subprocess separately. Once an environment is done, it is reset immediately.

AsyncSubprocessEnvManager

class ding.envs.env_manager.subprocess_env_manager.AsyncSubprocessEnvManager(env_fn: List[Callable], cfg: easydict.EasyDict = {})
Overview:

Create an AsyncSubprocessEnvManager to manage multiple environments. Each environment runs in its own subprocess.

Interfaces:

seed, launch, ready_obs, step, reset, env_info, active_env

close() -> None
Overview:

Close the env manager and release all related resources.

enable_save_replay(replay_path: Union[List[str], str]) -> None
Overview:

Set each env’s replay save path.

Arguments:
  • replay_path (Union[List[str], str]): A list of paths, one per environment, or a single path shared by all environments.

env_info() -> collections.namedtuple
Overview:

Get one env’s info, for example, action space, observation space, reward space, etc.

Returns:
  • info (namedtuple): Usually a namedtuple BaseEnvInfo, each element is EnvElementInfo.

launch(reset_param: Optional[Dict] = None) -> None
Overview:

Set up the environments and their parameters.

Arguments:
  • reset_param (Optional[Dict]): Dict of reset parameters for each environment; the key is the env_id and the value is the corresponding reset parameters.

property ready_obs: Dict[int, Any]
Overview:

Get the next observations.

Returns:

A dictionary with observations and their environment IDs.

Note:

The observations are returned in np.ndarray.

Example:
>>>     obs_dict = env_manager.ready_obs
>>>     actions_dict = {env_id: model.forward(obs) for env_id, obs in obs_dict.items()}

reset(reset_param: Optional[Dict] = None) -> None
Overview:

Reset the environments and their parameters.

Arguments:
  • reset_param (Optional[Dict]): Dict of reset parameters for each environment; the key is the env_id and the value is the corresponding reset parameters.

seed(seed: Union[Dict[int, int], List[int], int], dynamic_seed: Optional[bool] = None) -> None
Overview:

Set the seed for each environment.

Arguments:
  • seed (Union[Dict[int, int], List[int], int]): A list of seeds, one per environment, or a single seed for the first environment, from which the other seeds are generated automatically.

step(actions: Dict[int, Any]) -> Dict[int, collections.namedtuple]
Overview:

Step all environments. Reset an env if done.

Arguments:
  • actions (Dict[int, Any]): {env_id: action}

Returns:
  • timesteps (Dict[int, namedtuple]): {env_id: timestep}. Timestep is a BaseEnvTimestep tuple with observation, reward, done, env_info.

Example:
>>>     actions_dict = {env_id: model.forward(obs) for env_id, obs in obs_dict.items()}
>>>     timesteps = env_manager.step(actions_dict)
>>>     for env_id, timestep in timesteps.items():
>>>         pass