Env Manager Overview
Env Manager
- Overview:
The env manager manages multiple environments of the same type, each of which may have a different configuration. It runs several envs at once, gathers their results together, and exposes an interface similar to that of a single env, which simplifies user code and speeds up execution. Two modes are currently supported: single-process serial and multi-process parallel. BaseEnvManager maintains multiple environment instances by stepping through them serially in a loop (pseudo-parallel). Async(Sync)SubprocessEnvManager uses subprocess vectorization: through multiprocessing, it runs each env in a child process and manages it by inter-process communication. DI-engine's env manager requires envs defined in the DI-engine format (or Gym envs wrapped by EnvWrapper). It takes the instantiation interface of each env at initialization, and its runtime behavior is set through config.
Generally speaking, BaseEnvManager is suited to simple environments and to debugging, while SyncSubprocessEnvManager and AsyncSubprocessEnvManager are recommended for acceleration in complex environments or when running a large number of environments. If you are not yet familiar with the env module, it is recommended to consult DI-engine's Env Overview.
- Usage:
- init
Initializing the env manager requires the instantiation interface of each env and a config dictionary. A lambda function or a partial function (functools.partial) can be used to wrap the env constructor and fix its arguments.

    config = dict(
        env=dict(
            manager=dict(...),
            ...
        ),
        ...
    )
    # config.env.manager below assumes attribute access (e.g. an EasyDict)

    # lambda function way
    # DIEngineEnv stands for any env class in DI-engine format
    env_fn = lambda: DIEngineEnv(*args, **kwargs)
    env_manager = BaseEnvManager(env_fn=[env_fn for _ in range(4)], cfg=config.env.manager)

    # partial function way
    from functools import partial

    def env_fn(*args, **kwargs):
        return DIEngineEnv(*args, **kwargs)

    env_manager = BaseEnvManager(env_fn=[partial(env_fn, *args, **kwargs) for _ in range(4)], cfg=config.env.manager)
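One practical reason to reach for `functools.partial` when each environment needs its own settings is that a lambda created in a loop reads the loop variable only when it is called. The sketch below illustrates this with a hypothetical `ToyEnv` class, which is not part of DI-engine; only the idea of passing a list of zero-argument factories mirrors what the env manager expects.

```python
from functools import partial


class ToyEnv:
    """Hypothetical stand-in for an env class; not a DI-engine API."""

    def __init__(self, seed: int) -> None:
        self.seed = seed


# Each lambda reads `seed` when it is called, so all four see the last value.
late_bound = [lambda: ToyEnv(seed) for seed in range(4)]

# partial stores the argument at the moment the factory is created.
env_fns = [partial(ToyEnv, seed=seed) for seed in range(4)]

# Nothing is instantiated until the factories are called.
late_seeds = [fn().seed for fn in late_bound]
seeds = [fn().seed for fn in env_fns]
```

If every environment uses identical arguments, as in the snippet above, the lambda form and the partial form behave the same.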
- launch/reset
After initialization, the env manager does not instantiate the environments immediately and is marked as closed. To initialize the environments for the first time, call the launch method. It constructs each env instance from the instantiation interface passed in (for SubprocessEnvManager, it starts a subprocess for each environment and establishes a communication channel), creates the state variables used to track the running environments, and calls the reset method of every sub-environment to start them.
Warning
Calling the step and reset methods of the env manager in the closed state will raise an exception.
After calling the launch method, you can manually reset sub-environments by calling the reset method of the env manager. When no parameter is passed in, all sub-environments are reset by default. When the reset_param parameter is passed in, the sub-environments corresponding to the keys in reset_param are reset, and each value is used as the argument of that sub-environment's reset method. Because the time a sub-environment needs to reset is uncertain, the env manager does not return the observation from reset directly. Instead, it saves the return value when the reset finishes, and the observations of the sub-environments that have completed a step or reset can be obtained through the ready_obs property. This improves the efficiency of SubprocessEnvManager.
Note
When SubprocessEnvManager is asked to reset sub-environments that are still being reset, this method waits for the previous reset of those sub-environments to complete before running the new one.
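The closed state, launch, and reset_param behavior described above can be pictured with a small serial toy. Everything in this sketch (`CounterEnv`, `ToySerialManager`, the `start` argument) is hypothetical and only illustrates the text; it is not DI-engine's implementation, and it assumes each reset_param value is a dict of keyword arguments.

```python
from typing import Any, Callable, Dict, List, Optional


class CounterEnv:
    """Hypothetical toy env whose observation is a counter value."""

    def __init__(self) -> None:
        self.count = 0

    def reset(self, start: int = 0) -> int:
        self.count = start
        return self.count


class ToySerialManager:
    """Toy serial manager; illustrates the text, not DI-engine's code."""

    def __init__(self, env_fn: List[Callable[[], CounterEnv]]) -> None:
        self._env_fn = env_fn
        self._envs: List[CounterEnv] = []
        self._closed = True  # no env has been instantiated yet
        self._ready_obs: Dict[int, int] = {}

    def launch(self) -> None:
        self._envs = [fn() for fn in self._env_fn]
        self._closed = False
        self.reset()

    def reset(self, reset_param: Optional[Dict[int, Dict[str, Any]]] = None) -> None:
        if self._closed:
            raise RuntimeError("env manager is closed, call launch() first")
        if reset_param is None:
            reset_param = {i: {} for i in range(len(self._envs))}
        for env_id, kwargs in reset_param.items():
            # reset returns nothing; the observation is stored for ready_obs
            self._ready_obs[env_id] = self._envs[env_id].reset(**kwargs)

    @property
    def ready_obs(self) -> Dict[int, int]:
        return dict(self._ready_obs)


manager = ToySerialManager([CounterEnv for _ in range(3)])
try:
    manager.reset()  # still closed: launch() has not been called
    closed_error = False
except RuntimeError:
    closed_error = True

manager.launch()
manager.reset(reset_param={1: {"start": 10}})  # reset only env 1
obs_after_reset = manager.ready_obs
```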
- step
The step method calls the step method of the sub-environments serially (BaseEnvManager) or in parallel (SubprocessEnvManager), returns the step results, and stores the observations in the ready_obs attribute. It takes an actions dictionary whose keys are the env_id of the sub-environments to step and whose values are the actions those sub-environments execute. Depending on the env manager type and the config settings, once a certain number of sub-environments have returned step results, the method checks the results, updates the running state of the sub-environments accordingly, and then returns the results or raises an exception.
Warning
When actions contains the id of a sub-environment that is running another command or has already finished its episode, an exception is raised.
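As a rough illustration of the actions dictionary and the warning above, the following serial sketch steps only the environments named in `actions` and rejects ids whose episode has ended. `Timestep`, `CountdownEnv`, and `serial_step` are hypothetical names chosen for this example, and the timestep fields are an assumption.

```python
from collections import namedtuple
from typing import Dict

# Hypothetical timestep container; the field names are assumptions.
Timestep = namedtuple("Timestep", ["obs", "reward", "done"])


class CountdownEnv:
    """Toy env: the episode ends when the counter reaches zero."""

    def __init__(self, start: int) -> None:
        self.remaining = start

    def step(self, action: int) -> Timestep:
        self.remaining -= action
        done = self.remaining <= 0
        return Timestep(obs=self.remaining, reward=float(action), done=done)


def serial_step(envs: Dict[int, CountdownEnv], done_ids: set, actions: Dict[int, int]) -> Dict[int, Timestep]:
    """Step only the envs named in `actions`, one after another."""
    for env_id in actions:
        if env_id in done_ids:
            raise RuntimeError(f"env {env_id} has finished its episode")
    timesteps = {}
    for env_id, action in actions.items():
        timesteps[env_id] = envs[env_id].step(action)
        if timesteps[env_id].done:
            done_ids.add(env_id)
    return timesteps


envs = {0: CountdownEnv(2), 1: CountdownEnv(5)}
done_ids: set = set()
first = serial_step(envs, done_ids, {0: 2, 1: 1})  # env 0 finishes here
second = serial_step(envs, done_ids, {1: 1})       # only env 1 is stepped
try:
    serial_step(envs, done_ids, {0: 1})            # env 0 is already done
    rejected = False
except RuntimeError:
    rejected = True
```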
- ready_obs
The ready_obs attribute returns a dictionary that maps each env_id to the latest observation returned by that environment. For SubprocessEnvManager, every environment id returned by ready_obs belongs to a sub-environment that has completed its reset or step method and is waiting for a new command, so it is safe to call reset and step on these sub-environments. When none of the sub-environments that are still running (not yet done) has completed its reset or step method, accessing ready_obs waits until at least one sub-environment finishes and then returns its observation.
When using SubprocessEnvManager, as long as the env_id passed to the step and reset methods come from the ready_obs property, no sub-environment receives a command while it is still executing a previous one.
- done
This attribute reports whether all sub-environments have finished (run to done). It returns True if so, and False otherwise.
- close
Like the close method of a Gym env, this method safely closes all sub-environments, destroys the processes created for them, and releases all resources. After it is called, the env manager is marked as closed and cannot be used again until launch is called.
- Examples:
The following is an example of an env manager running multiple environments.
    my_env_manager.launch()
    while not finished:  # finished: user-defined stop condition
        obs = my_env_manager.ready_obs
        actions = ...  # get actions from policy or else.
        timesteps = my_env_manager.step(actions)
        for env_id, timestep in timesteps.items():
            if timestep.done:
                # without auto_reset
                my_env_manager.reset(reset_param={env_id: ...})
        ...
    my_env_manager.close()
- Advanced features:
- auto_reset
By default, the env manager of DI-engine resets environments automatically: when an environment runs to done, it is reset and keeps running. The reset uses the parameters given to that sub-environment in the last manual reset. Automatic reset stops once the accumulated number of episodes reaches the episode_num specified in config. To turn this feature off, specify auto_reset=False in config.
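The rule above can be written as a single predicate. This is a sketch of the described behavior, not DI-engine's code: `should_auto_reset` is a hypothetical helper, and it assumes `finished_episodes` already counts the episode that just ended.

```python
def should_auto_reset(done: bool, auto_reset: bool, finished_episodes: int, episode_num: float) -> bool:
    """Return True when a sub-environment that reported done should be
    reset automatically, following the rule described in the text."""
    return done and auto_reset and finished_episodes < episode_num


# An unlimited budget keeps resetting; a budget of 2 stops after the second episode.
unlimited = should_auto_reset(True, True, finished_episodes=100, episode_num=float("inf"))
budget_left = should_auto_reset(True, True, finished_episodes=1, episode_num=2)
budget_spent = should_auto_reset(True, True, finished_episodes=2, episode_num=2)
disabled = should_auto_reset(True, False, finished_episodes=0, episode_num=2)
```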
- env state
To make it easier to manage and debug the sub-environments, the env manager of DI-engine provides an enumerated type that tracks the running state of every sub-environment in real time. The states have the following meanings:
VOID: The env manager has been initialized, but the sub-environment has not yet been instantiated
INIT: The sub-environment has been instantiated but has not yet been launched or reset
RUN: The sub-environment has completed a reset or step and is running
RESET: The sub-environment is resetting
DONE: The sub-environment has run to done
ERROR: An exception has occurred in the sub-environment
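A state list like the one above maps naturally onto a Python `Enum`. In this sketch the member names come from the list, while the numeric values and the `accepts_step` helper are assumptions made for illustration; the helper assumes that RUN means the sub-environment is idle and ready for a command.

```python
from enum import Enum, unique


@unique
class EnvState(Enum):
    # Numeric values are arbitrary here; only the names come from the list above.
    VOID = 0
    INIT = 1
    RUN = 2
    RESET = 3
    DONE = 4
    ERROR = 5


def accepts_step(state: EnvState) -> bool:
    """Hypothetical check: only a running, idle sub-environment takes a step."""
    return state is EnvState.RUN


states = {0: EnvState.RUN, 1: EnvState.RESET, 2: EnvState.DONE}
steppable_ids = [env_id for env_id, state in states.items() if accepts_step(state)]
```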
The transitions between the states are shown in the figure:
- max_retry and timeout
To keep a temporary error in a sub-environment (for example, a connection problem) from aborting the run, and to keep the program from hanging when a subprocess gets stuck, the env manager of DI-engine provides retry protection and timeout detection. The user can specify in config the maximum number of retries and the maximum waiting time for reset, for step, and for communication with the subprocesses. When a waiting time is exceeded, an exception is raised so that the run terminates early. The settings and default values of these parameters in config are as follows:
    manager_config = dict(
        max_retry=1,  # max retry times for step and reset, default to 1
        reset_timeout=60,  # max waiting time for reset, default to 60s
        retry_waiting_time=0.1,  # retry interval time for reset, default to 0.1s
        step_timeout=60,  # max waiting time for step, default to 60s
        step_wait_timeout=0.01,  # retry interval time for step, default to 0.01s
        connect_timeout=60,  # max waiting time for communication between child processes, default to 60s
    )
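The retry protection can be pictured as a small wrapper around a reset or step call. The sketch below is a generic illustration, not DI-engine's implementation; `call_with_retry` and `flaky_reset` are hypothetical, and it assumes `max_retry` counts the total number of attempts.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], max_retry: int = 1, retry_waiting_time: float = 0.1) -> T:
    """Call `fn` up to `max_retry` times, sleeping between failed attempts.
    When every attempt fails, raise a RuntimeError chained to the last error."""
    last_error = None
    for attempt in range(max_retry):
        try:
            return fn()
        except Exception as error:  # a real manager would catch narrower errors
            last_error = error
            if attempt + 1 < max_retry:
                time.sleep(retry_waiting_time)
    raise RuntimeError(f"failed after {max_retry} attempt(s)") from last_error


attempts = {"count": 0}


def flaky_reset() -> str:
    """Hypothetical reset that fails twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "obs"


result = call_with_retry(flaky_reset, max_retry=3, retry_waiting_time=0.01)
```

With `max_retry=1`, the same flaky call would fail on its first error, which is why the value is worth raising for environments with unreliable connections.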
- difference between Sync and Async SubprocessEnvManager
Pending
- shared_memory
shared_memory speeds up the transfer of large vector data returned by the environment. When variables returned by the environment, such as obs, exceed 100 kB in size, it is recommended to set shared_memory to True. When using shared_memory, the environment's info function must use the BaseEnvInfo and EnvElementInfo templates to specify the shape and the dtype of the values for obs, act, and rew.
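The reason shape and dtype must be declared up front is that a shared buffer has to be allocated before any observation arrives. The sketch below shows this with the standard library's `multiprocessing.RawArray`, in a single process for brevity; it is an assumption-laden illustration rather than DI-engine's mechanism, and `allocate_obs_buffer`, `write_obs`, and the `(2, 3)` shape are hypothetical.

```python
import math
from multiprocessing import RawArray
from typing import List, Tuple


def allocate_obs_buffer(shape: Tuple[int, ...], typecode: str = "f"):
    """Preallocate a flat shared buffer for one observation.
    The size can be computed only because shape and dtype are known."""
    return RawArray(typecode, math.prod(shape))


def write_obs(buffer, flat_obs: List[float]) -> None:
    """Copy an observation into the buffer instead of pickling it."""
    if len(flat_obs) != len(buffer):
        raise ValueError("observation does not match the declared shape")
    buffer[:] = flat_obs


obs_shape = (2, 3)  # hypothetical observation shape
buffer = allocate_obs_buffer(obs_shape)
write_obs(buffer, [0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
stored = list(buffer)
```

A child process that receives such a buffer at start-up can write into it in place, so only a small notification has to travel through the pipe.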
- get_attribute
Pending
BaseEnvManager (ding/envs/env_manager/base_env_manager.py)
- Overview:
Runs multiple environments in a cyclic serial manner.
- Interfaces:
__init__: Initialization
launch: Initialize all sub-environments and the resources required for their state management
reset: Reset all environments by default. When reset_param is passed in, only the sub-environments specified by env_id are reset. It returns all running results
step: Execute the input actions and run one time step. As with reset, you can pass an action dict to operate on specific environments. It returns all running results
seed: Set the random seed of the environments. You can pass an env_id list to set specific seeds for certain environments
close: Close all environments and release resources
- Properties:
env_num: The number of sub-environments
active_env: List of all unfinished environments
ready_obs: The latest observation of every env_id that is not currently running a command
done: Whether all environments have finished
SubprocessEnvManager (ding/envs/env_manager/subprocess_env_manager.py)
- Overview:
Inherits from BaseEnvManager and uses multiprocessing to create a subprocess for each environment, so that multiple environments run in parallel.
- Interfaces:
Only the methods that differ from BaseEnvManager or are new are listed below
launch: Start the process that runs each sub-environment, and initialize the resources required for the state management of the sub-environments
reset: Send the reset command to the environment processes. When reset_param is passed in, the reset command is sent only to the subprocesses specified by env_id. It returns right after sending.
step: Send action commands to the environment processes. As with reset, you can pass an action dict to operate on specific environments. It returns all running results.
close: Destroy all subprocesses and release resources
- Properties:
Only the attributes that differ from BaseEnvManager or are new are listed below
ready_obs: The latest observation of every env_id that has finished running step or reset. If all environments are still running a previous command, it waits until at least one of them finishes
active_env: List of all running environments