envs.env_wrappers¶
Please refer to ding/ding/envs/env_wrappers/env_wrappers.py for usage.
Some descriptions are adapted from the OpenAI Atari wrappers.
NoopResetEnv¶
- class ding.envs.env_wrappers.NoopResetEnv(env, noop_max=30)[source]¶
- Overview:
Sample initial states by taking a random number of no-ops on reset. The no-op is assumed to be action 0.
- Interface:
__init__, reset, new_shape
- Properties:
env (gym.Env): the environment to wrap.
noop_max (int): the maximum number of no-ops to run.
- __init__(env, noop_max=30)[source]¶
- Overview:
Initialize self. See help(type(self)) for accurate signature.
- Arguments:
env (gym.Env): the environment to wrap.
noop_max (int): the maximum number of no-ops to run.
- static new_shape(obs_shape, act_shape, rew_shape)[source]¶
- Overview:
Get the new shapes of observation, action, and reward; in this case unchanged.
- Arguments:
obs_shape (Any), act_shape (Any), rew_shape (Any)
- Returns:
obs_shape (Any), act_shape (Any), rew_shape (Any)
Note
A shape, such as the observation space shape, can be a nested structure, but for some simple environments, like Pong in Atari, the observation space shape is a tuple: (4, 84, 84). The same applies to the new_shape interface in the following wrapper classes.
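A minimal usage sketch (the environment id, and the assumption that gym with the Atari extras is installed, are illustrative rather than part of the ding API):

```python
import gym
from ding.envs.env_wrappers import NoopResetEnv

# Any environment whose action 0 is a no-op works here.
env = NoopResetEnv(gym.make('PongNoFrameskip-v4'), noop_max=30)
obs = env.reset()  # reset now performs 1..noop_max no-op steps first
```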
MaxAndSkipEnv¶
- class ding.envs.env_wrappers.MaxAndSkipEnv(env, skip=4)[source]¶
- Overview:
Return only every skip-th frame (frame skipping), using the most recent raw observations (max pooling across time steps).
- Interface:
__init__, step, new_shape
- Properties:
env (gym.Env): the environment to wrap.
skip (int): the number of frames to skip.
- __init__(env, skip=4)[source]¶
- Overview:
Initialize self. See help(type(self)) for accurate signature.
- Arguments:
env (gym.Env): the environment to wrap.
skip (int): the number of frames to skip.
- static new_shape(obs_shape, act_shape, rew_shape)[source]¶
- Overview:
Get the new shapes of observation, action, and reward; in this case unchanged.
- Arguments:
obs_shape (Any), act_shape (Any), rew_shape (Any)
- Returns:
obs_shape (Any), act_shape (Any), rew_shape (Any)
- step(action)[source]¶
- Overview:
Step the environment with the given action. Repeat action, sum reward, and max over last observations.
- Arguments:
action (Any): the given action to step with.
- Returns:
max_frame (np.array): the max over the last observations.
total_reward (Any): amount of reward returned after the previous action.
done (Bool): whether the episode has ended, in which case further step() calls will return undefined results.
info (Dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning).
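The step logic follows the original OpenAI Atari wrapper; the sketch below (not necessarily the exact ding implementation) shows the skip-and-max behavior:

```python
import numpy as np

def skip_and_max_step(env, action, skip=4):
    """Repeat `action` for `skip` frames, sum the rewards, and max-pool
    the last two raw frames to suppress Atari sprite flickering."""
    obs_buffer = np.zeros((2, ) + env.observation_space.shape, dtype=np.float32)
    total_reward, done, info = 0., False, {}
    for i in range(skip):
        obs, reward, done, info = env.step(action)
        if i == skip - 2:
            obs_buffer[0] = obs
        if i == skip - 1:
            obs_buffer[1] = obs
        total_reward += reward
        if done:
            break
    max_frame = obs_buffer.max(axis=0)  # element-wise max of the last two frames
    return max_frame, total_reward, done, info
```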
WarpFrame¶
- class ding.envs.env_wrappers.WarpFrame(env)[source]¶
- Overview:
Warp frames to 84x84 as done in the Nature paper and later work.
- Interface:
__init__, observation, new_shape
- Properties:
env (gym.Env): the environment to wrap.
size (84 by default), obs_space, observation_space
- __init__(env)[source]¶
- Overview:
Initialize
self.
Seehelp(type(self))
for accurate signature.- Arguments:
env (
gym.Env
): the environment to wrap.
- static new_shape(obs_shape, act_shape, rew_shape)[source]¶
- Overview:
Get the new shapes of observation, action, and reward; in this case only the observation shape is changed (to (4, 84, 84)); the others are unchanged.
- Arguments:
obs_shape (Any), act_shape (Any), rew_shape (Any)
- Returns:
obs_shape (Any), act_shape (Any), rew_shape (Any)
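The warp itself is the standard Nature-DQN preprocessing; a sketch assuming OpenCV (cv2) is used for the color conversion and resize:

```python
import cv2
import numpy as np

def warp_frame(frame: np.ndarray, size: int = 84) -> np.ndarray:
    # Convert the RGB frame to grayscale, then resize to size x size.
    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    return cv2.resize(frame, (size, size), interpolation=cv2.INTER_AREA)
```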
ScaledFloatFrame¶
- class ding.envs.env_wrappers.ScaledFloatFrame(env)[source]¶
- Overview:
Normalize observations to the range [0, 1].
- Interface:
__init__, observation, new_shape
- __init__(env)[source]¶
- Overview:
Initialize
self.
Seehelp(type(self))
for accurate signature; setup the properties.- Arguments:
env (
gym.Env
): the environment to wrap.
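The transformation is a plain rescaling of uint8 pixels; a one-line sketch:

```python
import numpy as np

def scale_frame(obs: np.ndarray) -> np.ndarray:
    # Map uint8 pixel values in [0, 255] to float32 values in [0, 1].
    return np.asarray(obs, dtype=np.float32) / 255.
```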
ClipRewardEnv¶
- class ding.envs.env_wrappers.ClipRewardEnv(env)[source]¶
- Overview:
Clip the reward to {+1, 0, -1} according to its sign.
- Interface:
__init__, reward, new_shape
- Properties:
env (gym.Env): the environment to wrap.
reward_range
- __init__(env)[source]¶
- Overview:
Initialize
self.
Seehelp(type(self))
for accurate signature; setup the properties.- Arguments:
env (
gym.Env
): the environment to wrap.
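The reward method reduces to taking the sign of the raw reward; a sketch:

```python
import numpy as np

def clip_reward(reward: float) -> float:
    # np.sign maps positive rewards to +1, negative to -1, and zero to 0.
    return float(np.sign(reward))
```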
FrameStack¶
- class ding.envs.env_wrappers.FrameStack(env, n_frames)[source]¶
- Overview:
Stack the last n_frames frames.
- Interface:
__init__, reset, step, _get_ob, new_shape
- Properties:
env (gym.Env): the environment to wrap.
n_frames (int): the number of frames to stack.
observation_space, frames
- __init__(env, n_frames)[source]¶
- Overview:
Initialize
self.
Seehelp(type(self))
for accurate signature; setup the properties.- Arguments:
env (
gym.Env
): the environment to wrap.n_frame (
int
): the number of frames to stack.
- _get_ob()[source]¶
- Overview:
The original wrapper uses LazyFrames, but since we use a np buffer, it has no effect here.
- static new_shape(obs_shape, act_shape, rew_shape)[source]¶
- Overview:
Get the new shapes of observation, action, and reward; in this case unchanged.
- Arguments:
obs_shape (Any), act_shape (Any), rew_shape (Any)
- Returns:
obs_shape (Any), act_shape (Any), rew_shape (Any)
- reset()[source]¶
- Overview:
Reset the state of the environment and append the new observation to frames.
- Returns:
self._get_ob(): observation
- step(action)[source]¶
- Overview:
Step the environment with the given action and append the new observation to frames.
- Arguments:
action (Any): the given action to step with.
- Returns:
self._get_ob(): observation
reward (Any): amount of reward returned after the previous action.
done (Bool): whether the episode has ended, in which case further step() calls will return undefined results.
info (Dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning).
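A self-contained sketch of the stacking mechanism with a fixed-length deque (class and method names here are illustrative, not the ding internals):

```python
from collections import deque
import numpy as np

class FrameStackSketch:
    """Keep the last n_frames observations and return them stacked."""

    def __init__(self, env, n_frames=4):
        self.env = env
        self.frames = deque(maxlen=n_frames)

    def reset(self):
        obs = self.env.reset()
        for _ in range(self.frames.maxlen):
            self.frames.append(obs)  # fill the buffer with the first frame
        return np.stack(self.frames, axis=0)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)  # the oldest frame is evicted automatically
        return np.stack(self.frames, axis=0), reward, done, info
```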
ObsTransposeWrapper¶
- class ding.envs.env_wrappers.ObsTransposeWrapper(env)[source]¶
- Overview:
Wrapper to transpose observations; usually used in Atari environments.
- Interface:
__init__, observation, new_shape
- Properties:
env (gym.Env): the environment to wrap.
observation_space
- __init__(env)[source]¶
- Overview:
Initialize
self.
Seehelp(type(self))
for accurate signature; setup the properties.- Arguments:
env (
gym.Env
): the environment to wrap.
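The core of the wrapper is a single axis permutation; a sketch assuming (H, W, C) input:

```python
import numpy as np

def transpose_obs(obs: np.ndarray) -> np.ndarray:
    # (H, W, C) -> (C, H, W): the channel-first layout PyTorch expects.
    return np.transpose(obs, (2, 0, 1))
```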
RunningMeanStd¶
- class ding.envs.env_wrappers.RunningMeanStd(epsilon=0.0001, shape=())[source]¶
- Overview:
Helper that maintains running estimates of the mean, variance, and count of observed data.
- Interface:
__init__, update, reset, new_shape
- Properties:
mean, std, _epsilon, _shape, _mean, _var, _count
- __init__(epsilon=0.0001, shape=())[source]¶
- Overview:
Initialize
self.
Seehelp(type(self))
for accurate signature; setup the properties.- Arguments:
env (
gym.Env
): the environment to wrap.epsilon (
Float
): the epsilon used for self for the std outputshape (:obj: np.array): the np array shape used for the expression of this wrapper on attibutes of mean and variance
- property mean: numpy.ndarray¶
- Overview:
Property mean, obtained from self._mean.
- static new_shape(obs_shape, act_shape, rew_shape)[source]¶
- Overview:
Get the new shapes of observation, action, and reward; in this case unchanged.
- Arguments:
obs_shape (Any), act_shape (Any), rew_shape (Any)
- Returns:
obs_shape (Any), act_shape (Any), rew_shape (Any)
- reset()[source]¶
- Overview:
Reset the running statistics: _mean, _var, and _count.
- property std: numpy.ndarray¶
- Overview:
Property std, calculated from self._var and the epsilon value self._epsilon.
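The update interface typically merges batch moments into the running moments with the parallel-variance formula (as in the OpenAI baselines helper); a sketch, not necessarily the exact ding code:

```python
import numpy as np

def update_moments(mean, var, count, batch):
    """Merge running (mean, var, count) with a new batch of samples."""
    batch_mean, batch_var = np.mean(batch, axis=0), np.var(batch, axis=0)
    batch_count = batch.shape[0]
    delta = batch_mean - mean
    tot_count = count + batch_count
    new_mean = mean + delta * batch_count / tot_count
    # Combine the two second moments (Chan et al. parallel variance).
    m2 = var * count + batch_var * batch_count \
        + delta ** 2 * count * batch_count / tot_count
    return new_mean, m2 / tot_count, tot_count
```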
ObsNormEnv¶
- class ding.envs.env_wrappers.ObsNormEnv(env)[source]¶
- Overview:
Normalize observations according to running mean and std.
- Interface:
__init__, step, reset, observation, new_shape
- Properties:
env (gym.Env): the environment to wrap.
data_count, clip_range, rms
- __init__(env)[source]¶
- Overview:
Initialize
self.
Seehelp(type(self))
for accurate signature; setup the properties according to running mean and std.- Arguments:
env (
gym.Env
): the environment to wrap.
- static new_shape(obs_shape, act_shape, rew_shape)[source]¶
- Overview:
Get the new shapes of observation, action, and reward; in this case unchanged.
- Arguments:
obs_shape (Any), act_shape (Any), rew_shape (Any)
- Returns:
obs_shape (Any), act_shape (Any), rew_shape (Any)
- observation(observation)[source]¶
- Overview:
Normalize the observation.
- Arguments:
observation (Any): the original observation.
- Returns:
observation (Any): the normalized observation.
- reset(**kwargs)[source]¶
- Overview:
Reset the state of the environment and reset the properties.
- Arguments:
kwargs (Dict): keyword arguments to reset with.
- Returns:
observation (Any): the new observation after reset.
- step(action)[source]¶
- Overview:
Step the environment with the given action; update data_count, and update the self.rms property with the new observation.
- Arguments:
action (Any): the given action to step with.
- Returns:
self.observation(observation): the normalized observation after the input action and the self.rms update.
reward (Any): amount of reward returned after the previous action.
done (Bool): whether the episode has ended, in which case further step() calls will return undefined results.
info (Dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning).
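The normalization is a standardization with the running statistics followed by clipping; a sketch (the clip bounds are illustrative; check clip_range in the source for the actual default):

```python
import numpy as np

def normalize_obs(obs, rms, clip_lo=-3., clip_hi=3.):
    # Standardize with the running mean/std, then clip extreme values.
    return np.clip((obs - rms.mean) / rms.std, clip_lo, clip_hi)
```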
RewardNormEnv¶
- class ding.envs.env_wrappers.RewardNormEnv(env, reward_discount)[source]¶
- Overview:
Normalize reward according to running std.
- Interface:
__init__, step, reward, reset, new_shape
- Properties:
env (gym.Env): the environment to wrap.
cum_reward, reward_discount, data_count, rms
- __init__(env, reward_discount)[source]¶
- Overview:
Initialize
self.
Seehelp(type(self))
for accurate signature; setup the properties according to running mean and std.- Arguments:
env (
gym.Env
): the environment to wrap.
- static new_shape(obs_shape, act_shape, rew_shape)[source]¶
- Overview:
Get the new shapes of observation, action, and reward; in this case unchanged.
- Arguments:
obs_shape (Any), act_shape (Any), rew_shape (Any)
- Returns:
obs_shape (Any), act_shape (Any), rew_shape (Any)
- reset(**kwargs)[source]¶
- Overview:
Reset the state of the environment and reset the properties (numeric ones to 0, and self.rms via its own reset).
- Arguments:
kwargs (Dict): keyword arguments to reset with.
- reward(reward)[source]¶
- Overview:
Normalize the reward if data_count is more than 30.
- Arguments:
reward (Float): the raw reward.
- Returns:
reward (Float): the normalized reward.
- step(action)[source]¶
- Overview:
Step the environment with the given action; update data_count, and update the self.rms and self.cum_reward properties with the new reward.
- Arguments:
action (Any): the given action to step with.
- Returns:
observation: the observation after the input action.
self.reward(reward): the (normalized) amount of reward returned after the previous action, also updating self.cum_reward.
done (Bool): whether the episode has ended, in which case further step() calls will return undefined results.
info (Dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning).
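The reward path scales by the running std of the discounted return, with the 30-sample warm-up described under reward above; a sketch (the epsilon guard is illustrative):

```python
import numpy as np

def normalize_reward(reward, rms, data_count, min_count=30):
    # Pass the raw reward through until enough samples have been seen.
    if data_count > min_count:
        return float(reward / (rms.std + 1e-8))
    return float(reward)
```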
RamWrapper¶
- class ding.envs.env_wrappers.RamWrapper(env, render=False)[source]¶
- Overview:
Wrap a RAM-based env into an image-like env.
- Interface:
__init__, reset, step, new_shape
- Properties:
env (gym.Env): the environment to wrap.
observation_space
- __init__(env, render=False)[source]¶
- Overview:
Initialize
self.
Seehelp(type(self))
for accurate signature;- Arguments:
env (
gym.Env
): the environment to wrap.
- static new_shape(obs_shape, act_shape, rew_shape)[source]¶
- Overview:
Get the new shapes of observation, action, and reward; in this case only the observation shape is changed (to (128, 1, 1)); the others are unchanged.
- Arguments:
obs_shape (Any), act_shape (Any), rew_shape (Any)
- Returns:
obs_shape (Any), act_shape (Any), rew_shape (Any)
- reset()[source]¶
- Overview:
Reset the state of the environment and reset the properties.
- Returns:
observation (Any): the new, reshaped observation after reset.
- step(action)[source]¶
- Overview:
Step the environment with the given action and reshape the observation.
- Arguments:
action (Any): the given action to step with.
- Returns:
obs.reshape(128, 1, 1).astype(np.float32): the reshaped observation after the step, cast to float32.
reward (Any): amount of reward returned after the previous action.
done (Bool): whether the episode has ended, in which case further step() calls will return undefined results.
info (Dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning).
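The observation path is a single reshape-and-cast of the 128-byte Atari RAM vector; a sketch:

```python
import numpy as np

def ram_to_image(ram: np.ndarray) -> np.ndarray:
    # Turn the flat 128-byte RAM state into an image-like tensor so
    # that convolutional pipelines can consume it unchanged.
    return np.asarray(ram).reshape(128, 1, 1).astype(np.float32)
```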
EpisodicLifeEnv¶
- class ding.envs.env_wrappers.EpisodicLifeEnv(env)[source]¶
- Overview:
Make end-of-life equal end-of-episode, but only reset on true game over. This helps value estimation.
- Interface:
__init__, step, reset, observation, new_shape
- Properties:
env (gym.Env): the environment to wrap.
lives, was_real_done
- __init__(env)[source]¶
- Overview:
Initialize
self.
Seehelp(type(self))
for accurate signature; set lives to 0 at set done.- Arguments:
env (
gym.Env
): the environment to wrap.
- static new_shape(obs_shape, act_shape, rew_shape)[source]¶
- Overview:
Get the new shapes of observation, action, and reward; in this case unchanged.
- Arguments:
obs_shape (Any), act_shape (Any), rew_shape (Any)
- Returns:
obs_shape (Any), act_shape (Any), rew_shape (Any)
- reset()[source]¶
- Overview:
Call the Gym environment's reset only when lives are exhausted. This way all states are still reachable even though lives are episodic, and the learner need not know about any of this behind the scenes.
- Returns:
obs (Any): the new observation after reset; if self.was_real_done is False, a no-op step is taken instead to advance from the terminal/lost-life state.
- step(action)[source]¶
- Overview:
Step the environment with the given action; record done in self.was_real_done, then check the current lives, make loss of life terminal, and update lives to handle bonus lives.
- Arguments:
action (Any): the given action to step with.
- Returns:
obs (Any): the observation after the input action.
reward (Any): amount of reward returned after the previous action.
done (Bool): whether the episode has ended, in which case further step() calls will return undefined results.
info (Dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning).
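A sketch of the life-tracking logic, following the original OpenAI wrapper (the ale.lives() call assumes an Atari env; this is not necessarily the exact ding code):

```python
def episodic_life_step(env, action, prev_lives):
    obs, reward, done, info = env.step(action)
    was_real_done = done  # remember whether the game actually ended
    lives = env.unwrapped.ale.lives()
    # Treat a lost life as terminal for the learner, but keep the
    # underlying game running so a later reset can continue it.
    if 0 < lives < prev_lives:
        done = True
    return obs, reward, done, info, lives, was_real_done
```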
FireResetEnv¶
- class ding.envs.env_wrappers.FireResetEnv(env)[source]¶
- Overview:
Take action on reset for environments that are fixed until firing. Related discussion: https://github.com/openai/baselines/issues/240
- Interface:
__init__, reset, new_shape
- Properties:
env (gym.Env): the environment to wrap.
- __init__(env)[source]¶
- Overview:
Initialize
self.
Seehelp(type(self))
for accurate signature.- Arguments:
env (
gym.Env
): the environment to wrap.
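A sketch of the reset behavior, following the original OpenAI wrapper (it assumes action 1 means FIRE in the env's action meanings):

```python
def fire_reset(env):
    env.reset()
    obs, _, done, _ = env.step(1)  # press FIRE to start the game
    if done:
        env.reset()
    obs, _, done, _ = env.step(2)  # extra step, as in the original wrapper
    if done:
        env.reset()
    return obs
```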
update_shape¶
- Overview:
Get the new shapes of observation, action, and reward given the wrappers.
- Arguments:
obs_shape (Any), act_shape (Any), rew_shape (Any), wrapper_names (Any)
- Returns:
obs_shape (Any), act_shape (Any), rew_shape (Any)
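Conceptually, update_shape folds each wrapper's static new_shape over the initial shapes; a hypothetical sketch (resolving wrapper_names to classes via getattr is an assumption, not the documented API):

```python
import ding.envs.env_wrappers as env_wrappers

def update_shape_sketch(obs_shape, act_shape, rew_shape, wrapper_names):
    # Apply each named wrapper's static new_shape in order.
    for name in wrapper_names:
        wrapper_cls = getattr(env_wrappers, name)
        obs_shape, act_shape, rew_shape = wrapper_cls.new_shape(
            obs_shape, act_shape, rew_shape
        )
    return obs_shape, act_shape, rew_shape
```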