envs.env_manager¶

base_env_manager¶

BaseEnvManager¶

class ding.envs.env_manager.base_env_manager.BaseEnvManager(env_fn: List[Callable], cfg: easydict.EasyDict = {})[source]¶

Overview: Create a BaseEnvManager to manage multiple environments.
Interfaces: reset, step, seed, close, enable_save_replay, launch, env_info, default_config
Properties: env_num, ready_obs, done, method_name_list, active_env

close() → None[source]¶

Overview: Release the environment resources.

enable_save_replay(replay_path: Union[List[str], str]) → None[source]¶

Overview:

Set each env’s replay save path.

Arguments:

replay_path (Union[List[str], str]): List of paths for each environment; Or one path for all environments.

env_info() → collections.namedtuple[source]¶

Overview:

Get one env’s info, for example, action space, observation space, reward space, etc.

Returns:

info (namedtuple): Usually a namedtuple BaseEnvInfo, each element is EnvElementInfo.

launch(reset_param: Optional[Dict] = None) → None[source]¶

Overview:

Set up the environments and their parameters.

Arguments:

reset_param (Optional[Dict]): Dict of reset parameters for each environment; the key is the env_id and the value is the corresponding reset parameters.

property ready_obs: Dict[int, Any]¶

Overview:

Get the next observations (as np.ndarray) together with their corresponding env ids.

Returns:

A dictionary with observations and their environment IDs.

Example:

>>>     obs_dict = env_manager.ready_obs
>>>     actions_dict = {env_id: model.forward(obs) for env_id, obs in obs_dict.items()}

reset(reset_param: Optional[Dict] = None) → None[source]¶

Overview:

Reset the environments and their parameters.

Arguments:

reset_param (Optional[Dict]): Dict of reset parameters for each environment; the key is the env_id and the value is the corresponding reset parameters.

seed(seed: Union[Dict[int, int], List[int], int], dynamic_seed: Optional[bool] = None) → None[source]¶

Overview:

Set the seed for each environment.

Arguments:

seed (Union[Dict[int, int], List[int], int]): List of seeds, one for each environment; or a single seed for the first environment, with the other seeds generated automatically.
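
The accepted forms of seed can be pictured with a small normalizing helper. This is only a sketch: expand_seeds is a hypothetical name, the rule that a single int expands to seed + i for the i-th environment is an assumption about how the other seeds are generated, and treating dict keys as env ids is likewise an assumption based on the type signature.

```python
from typing import Dict, List, Union


def expand_seeds(seed: Union[Dict[int, int], List[int], int], env_num: int) -> Dict[int, int]:
    """Hypothetical helper: normalize a seed argument into {env_id: seed}."""
    if isinstance(seed, int):
        # Assumption: the i-th environment receives seed + i.
        return {i: seed + i for i in range(env_num)}
    if isinstance(seed, list):
        if len(seed) != env_num:
            raise ValueError("expected one seed per environment")
        return dict(enumerate(seed))
    if isinstance(seed, dict):
        # Assumption: keys are env ids, values are seeds.
        return dict(seed)
    raise TypeError("seed must be an int, a list of ints, or a dict")


print(expand_seeds(0, 3))
print(expand_seeds([7, 8, 9], 3))
```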

step(actions: Dict[int, Any]) → Dict[int, collections.namedtuple][source]¶

Overview:

Step all environments. Reset an env if done.

Arguments:

actions (Dict[int, Any]): {env_id: action}

Returns:

timesteps (Dict[int, namedtuple]): {env_id: timestep}. Timestep is a BaseEnvTimestep tuple with observation, reward, done, env_info.

Example:

>>>     actions_dict = {env_id: model.forward(obs) for env_id, obs in obs_dict.items()}
>>>     timesteps = env_manager.step(actions_dict)
>>>     for env_id, timestep in timesteps.items():
>>>         pass
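
To show how ready_obs and step fit together in a collection loop, here is a self-contained sketch. ToyEnvManager, ToyTimestep and policy are hypothetical stand-ins written for this example and are not the DI-engine classes. The timestep field names and the counter-based observations are assumptions chosen to keep the loop runnable without any environment installed.

```python
from collections import namedtuple
from typing import Any, Dict

# Stand-in for BaseEnvTimestep; the field names are assumptions for this sketch.
ToyTimestep = namedtuple("ToyTimestep", ["obs", "reward", "done", "info"])


class ToyEnvManager:
    """Hypothetical manager exposing only ready_obs and step."""

    def __init__(self, env_num: int, episode_len: int) -> None:
        self._episode_len = episode_len
        self._t = {i: 0 for i in range(env_num)}

    @property
    def ready_obs(self) -> Dict[int, Any]:
        # Each observation is simply the env's step counter.
        return dict(self._t)

    def step(self, actions: Dict[int, Any]) -> Dict[int, ToyTimestep]:
        timesteps = {}
        for env_id, action in actions.items():
            self._t[env_id] += 1
            done = self._t[env_id] >= self._episode_len
            timesteps[env_id] = ToyTimestep(self._t[env_id], float(action), done, {})
            if done:
                self._t[env_id] = 0  # reset a finished env right away
        return timesteps


def policy(obs: int) -> int:
    # Hypothetical policy: act on the parity of the observation.
    return obs % 2


env_manager = ToyEnvManager(env_num=2, episode_len=3)
finished_episodes = 0
total_reward = 0.0
while finished_episodes < 4:
    obs_dict = env_manager.ready_obs
    actions_dict = {env_id: policy(obs) for env_id, obs in obs_dict.items()}
    timesteps = env_manager.step(actions_dict)
    for env_id, timestep in timesteps.items():
        total_reward += timestep.reward
        finished_episodes += int(timestep.done)
```

Keying actions and timesteps by env_id lets the loop work unchanged when only a subset of environments has an observation ready.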

create_env_manager¶

ding.envs.env_manager.base_env_manager.create_env_manager(manager_cfg: dict, env_fn: List[Callable]) → ding.envs.env_manager.base_env_manager.BaseEnvManager[source]¶

Overview:

Create an env manager according to manager cfg and env function.

Arguments:

manager_cfg (EasyDict): Env manager config.
env_fn (List[Callable]): A list of functions, each of which creates an env.

ArgumentsKeys:

Necessary keys in manager_cfg: type

get_env_manager_cls¶

ding.envs.env_manager.base_env_manager.get_env_manager_cls(cfg: easydict.EasyDict) → type[source]¶

Overview:

Get an env manager class according to cfg.

Arguments:

cfg (EasyDict): Env manager config.

ArgumentsKeys:

Necessary keys in cfg: type
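
Both create_env_manager and get_env_manager_cls depend on the type key of the config. One common way to implement this kind of lookup is a name-to-class registry. The sketch below illustrates that pattern under assumptions: TOY_REGISTRY, register, ToyManager and the toy_ functions are hypothetical names, and nothing here is taken from the DI-engine source.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a `type` string to a manager class.
TOY_REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records a manager class under the given name."""
    def decorator(cls: type) -> type:
        TOY_REGISTRY[name] = cls
        return cls
    return decorator


@register("toy_base")
class ToyManager:
    def __init__(self, env_fn: List[Callable], cfg: dict) -> None:
        self.env_num = len(env_fn)
        self.cfg = cfg


def toy_get_env_manager_cls(cfg: dict) -> type:
    # `type` is the one key the lookup cannot do without.
    return TOY_REGISTRY[cfg["type"]]


def toy_create_env_manager(manager_cfg: dict, env_fn: List[Callable]) -> ToyManager:
    manager_cls = toy_get_env_manager_cls(manager_cfg)
    return manager_cls(env_fn=env_fn, cfg=manager_cfg)


manager = toy_create_env_manager({"type": "toy_base"}, [lambda: object() for _ in range(4)])
```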

subprocess_env_manager¶

ShmBuffer¶

class ding.envs.env_manager.subprocess_env_manager.ShmBuffer(dtype: numpy.generic, shape: Tuple[int])[source]¶

Overview: Shared memory buffer that stores a numpy array.

fill(src_arr: numpy.ndarray) → None[source]¶

Overview:

Fill the shared memory buffer with a numpy array, replacing the original contents.

Arguments:

src_arr (np.ndarray): array to fill the buffer.

get() → numpy.ndarray[source]¶

Overview:

Get the array stored in the buffer.

Returns:

copy_data (np.ndarray): A copy of the data stored in the buffer.
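
The fill/get contract, where get returns a copy rather than a view, can be illustrated with the standard library alone. This is a sketch under assumptions, not the ShmBuffer implementation: ToyShmBuffer is a hypothetical class, a flat list of floats stands in for np.ndarray to keep the example dependency-free, and RawArray provides no locking.

```python
import ctypes
from multiprocessing.sharedctypes import RawArray
from typing import List


class ToyShmBuffer:
    """Hypothetical stdlib-only analogue of a shared buffer for a flat float array."""

    def __init__(self, size: int) -> None:
        # RawArray allocates memory that child processes can share; it has no lock.
        self._buffer = RawArray(ctypes.c_double, size)
        self._size = size

    def fill(self, src: List[float]) -> None:
        if len(src) != self._size:
            raise ValueError("source length must match the buffer size")
        # Overwrite the previous contents in place.
        self._buffer[:] = src

    def get(self) -> List[float]:
        # Return a copy, so later fills do not change what the caller holds.
        return list(self._buffer)


buf = ToyShmBuffer(3)
buf.fill([1.0, 2.0, 3.0])
snapshot = buf.get()
buf.fill([4.0, 5.0, 6.0])
```

Because get copies, snapshot keeps the first contents after the second fill.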

ShmBufferContainer¶

class ding.envs.env_manager.subprocess_env_manager.ShmBufferContainer(dtype: numpy.generic, shape: Union[Dict[Any, tuple], tuple])[source]¶

Overview: Support multiple shared memory buffers. Each key-value pair maps a name to a buffer.

fill(src_arr: Union[Dict[Any, numpy.ndarray], numpy.ndarray]) → None[source]¶

Overview:

Fill one or many shared memory buffers.

Arguments:

src_arr (Union[Dict[Any, np.ndarray], np.ndarray]): array to fill the buffer.

get() → Union[Dict[Any, numpy.ndarray], numpy.ndarray][source]¶

Overview:

Get the one or many arrays stored in the buffer(s).

Returns:

data (Union[Dict[Any, np.ndarray], np.ndarray]): The array(s) stored in the buffer(s).

SyncSubprocessEnvManager¶

class ding.envs.env_manager.subprocess_env_manager.SyncSubprocessEnvManager(env_fn: List[Callable], cfg: easydict.EasyDict = {})[source]¶

close() → None¶

Overview: Close the env manager and release all related resources.

enable_save_replay(replay_path: Union[List[str], str]) → None¶

Overview:

Set each env’s replay save path.

Arguments:

replay_path (Union[List[str], str]): List of paths for each environment; Or one path for all environments.

env_info() → collections.namedtuple¶

Overview:

Get one env’s info, for example, action space, observation space, reward space, etc.

Returns:

info (namedtuple): Usually a namedtuple BaseEnvInfo, each element is EnvElementInfo.

launch(reset_param: Optional[Dict] = None) → None¶

Overview:

Set up the environments and their parameters.

Arguments:

reset_param (Optional[Dict]): Dict of reset parameters for each environment; the key is the env_id and the value is the corresponding reset parameters.

property ready_obs: Dict[int, Any]¶

Overview:

Get the next observations.

Returns:

A dictionary with observations and their environment IDs.

Note:

The observations are returned in np.ndarray.

Example:

>>>     obs_dict = env_manager.ready_obs
>>>     actions_dict = {env_id: model.forward(obs) for env_id, obs in obs_dict.items()}

reset(reset_param: Optional[Dict] = None) → None¶

Overview:

Reset the environments and their parameters.

Arguments:

reset_param (Optional[Dict]): Dict of reset parameters for each environment; the key is the env_id and the value is the corresponding reset parameters.

seed(seed: Union[Dict[int, int], List[int], int], dynamic_seed: Optional[bool] = None) → None¶

Overview:

Set the seed for each environment.

Arguments:

seed (Union[Dict[int, int], List[int], int]): List of seeds, one for each environment; or a single seed for the first environment, with the other seeds generated automatically.

step(actions: Dict[int, Any]) → Dict[int, collections.namedtuple][source]¶

Overview:

Step all environments. Reset an env if done.

Arguments:

actions (Dict[int, Any]): {env_id: action}

Returns:

timesteps (Dict[int, namedtuple]): {env_id: timestep}. Timestep is a BaseEnvTimestep tuple with observation, reward, done, env_info.

Example:

>>>     actions_dict = {env_id: model.forward(obs) for env_id, obs in obs_dict.items()}
>>>     timesteps = env_manager.step(actions_dict)
>>>     for env_id, timestep in timesteps.items():
>>>         pass

Note:

The env_id that appears in actions will also be returned in timesteps.
Each environment is run by a subprocess separately. Once an environment is done, it is reset immediately.

AsyncSubprocessEnvManager¶

class ding.envs.env_manager.subprocess_env_manager.AsyncSubprocessEnvManager(env_fn: List[Callable], cfg: easydict.EasyDict = {})[source]¶

Overview: Create an AsyncSubprocessEnvManager to manage multiple environments. Each environment runs in its own subprocess.
Interfaces: seed, launch, ready_obs, step, reset, env_info, active_env

close() → None[source]¶

Overview: Close the env manager and release all related resources.

enable_save_replay(replay_path: Union[List[str], str]) → None[source]¶

Overview:

Set each env’s replay save path.

Arguments:

replay_path (Union[List[str], str]): List of paths for each environment; Or one path for all environments.

env_info() → collections.namedtuple¶

Overview:

Get one env’s info, for example, action space, observation space, reward space, etc.

Returns:

info (namedtuple): Usually a namedtuple BaseEnvInfo, each element is EnvElementInfo.

launch(reset_param: Optional[Dict] = None) → None[source]¶

Overview:

Set up the environments and their parameters.

Arguments:

reset_param (Optional[Dict]): Dict of reset parameters for each environment; the key is the env_id and the value is the corresponding reset parameters.

property ready_obs: Dict[int, Any]¶

Overview:

Get the next observations.

Returns:

A dictionary with observations and their environment IDs.

Note:

The observations are returned in np.ndarray.

Example:

>>>     obs_dict = env_manager.ready_obs
>>>     actions_dict = {env_id: model.forward(obs) for env_id, obs in obs_dict.items()}

reset(reset_param: Optional[Dict] = None) → None[source]¶

Overview:

Reset the environments and their parameters.

Arguments:

reset_param (Optional[Dict]): Dict of reset parameters for each environment; the key is the env_id and the value is the corresponding reset parameters.

seed(seed: Union[Dict[int, int], List[int], int], dynamic_seed: Optional[bool] = None) → None¶

Overview:

Set the seed for each environment.

Arguments:

seed (Union[Dict[int, int], List[int], int]): List of seeds, one for each environment; or a single seed for the first environment, with the other seeds generated automatically.

step(actions: Dict[int, Any]) → Dict[int, collections.namedtuple][source]¶

Overview:

Step all environments. Reset an env if done.

Arguments:

actions (Dict[int, Any]): {env_id: action}

Returns:

timesteps (Dict[int, namedtuple]): {env_id: timestep}. Timestep is a BaseEnvTimestep tuple with observation, reward, done, env_info.

Example:

>>>     actions_dict = {env_id: model.forward(obs) for env_id, obs in obs_dict.items()}
>>>     timesteps = env_manager.step(actions_dict)
>>>     for env_id, timestep in timesteps.items():
>>>         pass