envs.env_wrappers

Please refer to ding/ding/envs/env_wrappers/env_wrappers.py for usage.

Some descriptions refer to the OpenAI Atari wrappers.

NoopResetEnv

class ding.envs.env_wrappers.NoopResetEnv(env, noop_max=30)[source]
Overview:

Sample initial states by taking a random number of no-op actions on reset. The no-op is assumed to be action 0.

Interface:

__init__, reset, new_shape

Properties:
  • env (gym.Env): the environment to wrap.

  • noop_max (int): the maximum number of no-ops to run.

__init__(env, noop_max=30)[source]
Overview:

Initialize self. See help(type(self)) for accurate signature.

Arguments:
  • env (gym.Env): the environment to wrap.

  • noop_max (int): the maximum number of no-ops to run.

static new_shape(obs_shape, act_shape, rew_shape)[source]
Overview:

Get the new shapes of observation, action, and reward; in this case unchanged.

Arguments:

obs_shape (Any), act_shape (Any), rew_shape (Any)

Returns:

obs_shape (Any), act_shape (Any), rew_shape (Any)

Note

Shapes, such as the observation space shape, can be nested structures; but for some simple environments, such as Pong in Atari, the observation space shape is a plain tuple, e.g. (4, 84, 84). The same applies to the new_shape interface in the following wrapper classes.

reset()[source]
Overview:

Resets the state of the environment and returns an initial observation.

Returns:
  • observation (Any): the initial observation.
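
Example (a minimal usage sketch; the env id 'PongNoFrameskip-v4' is illustrative and assumes an Atari-enabled gym install):

    import gym
    from ding.envs.env_wrappers import NoopResetEnv

    env = NoopResetEnv(gym.make('PongNoFrameskip-v4'), noop_max=30)
    obs = env.reset()  # up to 30 random no-op actions are executed before this observation is returned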

MaxAndSkipEnv

class ding.envs.env_wrappers.MaxAndSkipEnv(env, skip=4)[source]
Overview:

Return only every skip-th frame (frame skipping), taking the max over the most recent raw observations (max pooling across time steps).

Interface:

__init__, step, new_shape

Properties:
  • env (gym.Env): the environment to wrap.

  • skip (int): the number of frames to skip (the action is repeated for this many frames).

__init__(env, skip=4)[source]
Overview:

Initialize self. See help(type(self)) for accurate signature.

Arguments:
  • env (gym.Env): the environment to wrap.

  • skip (int): the number of frames to skip (the action is repeated for this many frames).

static new_shape(obs_shape, act_shape, rew_shape)[source]
Overview:

Get the new shapes of observation, action, and reward; in this case unchanged.

Arguments:

obs_shape (Any), act_shape (Any), rew_shape (Any)

Returns:

obs_shape (Any), act_shape (Any), rew_shape (Any)

step(action)[source]
Overview:

Step the environment with the given action. Repeat action, sum reward, and max over last observations.

Arguments:
  • action (Any): the given action to step with.

Returns:
  • max_frame (np.array) : max over last observations

  • total_reward (Any) : amount of reward returned after previous action

  • done (Bool) : whether the episode has ended, in which case further step() calls will return undefined results

  • info (Dict) : contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
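
Example (a minimal sketch; the env id is illustrative):

    import gym
    from ding.envs.env_wrappers import MaxAndSkipEnv

    env = MaxAndSkipEnv(gym.make('PongNoFrameskip-v4'), skip=4)
    env.reset()
    # the action is repeated 4 times; the reward is summed and the observation is the max over the last frames
    max_frame, total_reward, done, info = env.step(0)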

WarpFrame

class ding.envs.env_wrappers.WarpFrame(env)[source]
Overview:

Warp frames to 84x84 as done in the Nature paper and later work.

Interface:

__init__, observation, new_shape

Properties:
  • env (gym.Env): the environment to wrap.

  • size (int): the frame size to warp to, 84 by default.

  • observation_space (gym.Space): the wrapped observation space.

__init__(env)[source]
Overview:

Initialize self. See help(type(self)) for accurate signature.

Arguments:
  • env (gym.Env): the environment to wrap.

static new_shape(obs_shape, act_shape, rew_shape)[source]
Overview:

Get the new shapes of observation, action, and reward; in this case the observation shape is changed to (4, 84, 84) and the others are unchanged.

Arguments:

obs_shape (Any), act_shape (Any), rew_shape (Any)

Returns:

obs_shape (Any), act_shape (Any), rew_shape (Any)

observation(frame)[source]
Overview:

Returns the current observation from a frame

Arguments:
  • frame (Any): the frame to get observation from

Returns:
  • observation (Any): the warped observation.
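
Example (a minimal sketch; the env id is illustrative):

    import gym
    from ding.envs.env_wrappers import WarpFrame

    env = WarpFrame(gym.make('PongNoFrameskip-v4'))
    obs = env.reset()  # the raw RGB frame is warped to an 84x84 grayscale frame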

ScaledFloatFrame

class ding.envs.env_wrappers.ScaledFloatFrame(env)[source]
Overview:

Normalize observations to the range [0, 1].

Interface:

__init__, observation, new_shape

__init__(env)[source]
Overview:

Initialize self. See help(type(self)) for accurate signature; set up the properties.

Arguments:
  • env (gym.Env): the environment to wrap.

static new_shape(obs_shape, act_shape, rew_shape)[source]
Overview:

Get the new shapes of observation, action, and reward; in this case unchanged.

Arguments:

obs_shape (Any), act_shape (Any), rew_shape (Any)

Returns:

obs_shape (Any), act_shape (Any), rew_shape (Any)

observation(observation)[source]
Overview:

Returns the scaled observation

Arguments:
  • observation(Float): The original observation

Returns:
  • observation (Float): The Scaled Float observation
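
Example (a minimal sketch; the env id is illustrative):

    import gym
    from ding.envs.env_wrappers import ScaledFloatFrame

    env = ScaledFloatFrame(gym.make('PongNoFrameskip-v4'))
    obs = env.reset()  # uint8 pixel values in [0, 255] are rescaled to floats in [0, 1]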

ClipRewardEnv

class ding.envs.env_wrappers.ClipRewardEnv(env)[source]
Overview:

Clip the reward to {+1, 0, -1} by its sign.

Interface:

__init__, reward, new_shape

Properties:
  • env (gym.Env): the environment to wrap.

  • reward_range

__init__(env)[source]
Overview:

Initialize self. See help(type(self)) for accurate signature; set up the properties.

Arguments:
  • env (gym.Env): the environment to wrap.

static new_shape(obs_shape, act_shape, rew_shape)[source]
Overview:

Get the new shapes of observation, action, and reward; in this case unchanged.

Arguments:

obs_shape (Any), act_shape (Any), rew_shape (Any)

Returns:

obs_shape (Any), act_shape (Any), rew_shape (Any)

reward(reward)[source]
Overview:

Bin reward to {+1, 0, -1} by its sign. Note: np.sign(0) == 0.

Arguments:
  • reward(Float): Raw Reward

Returns:
  • reward(Float): Clipped Reward
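
Example (a minimal sketch of the sign clipping; the env id is illustrative):

    import gym
    from ding.envs.env_wrappers import ClipRewardEnv

    env = ClipRewardEnv(gym.make('PongNoFrameskip-v4'))
    print(env.reward(2.5), env.reward(0.0), env.reward(-0.3))  # 1.0 0.0 -1.0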

FrameStack

class ding.envs.env_wrappers.FrameStack(env, n_frames)[source]
Overview:

Stack n_frames last frames.

Interface:

__init__, reset, step, _get_ob, new_shape

Properties:
  • env (gym.Env): the environment to wrap.

  • n_frames (int): the number of frames to stack.

  • observation_space, frames

__init__(env, n_frames)[source]
Overview:

Initialize self. See help(type(self)) for accurate signature; set up the properties.

Arguments:
  • env (gym.Env): the environment to wrap.

  • n_frames (int): the number of frames to stack.

_get_ob()[source]
Overview:

The original wrapper uses LazyFrames, but since we use a numpy buffer, it has no extra effect here.

static new_shape(obs_shape, act_shape, rew_shape)[source]
Overview:

Get the new shapes of observation, action, and reward; in this case unchanged.

Arguments:

obs_shape (Any), act_shape (Any), rew_shape (Any)

Returns:

obs_shape (Any), act_shape (Any), rew_shape (Any)

reset()[source]
Overview:

Resets the state of the environment and appends the initial observation to the frame buffer.

Returns:
  • observation (Any): the stacked frames returned by self._get_ob().

step(action)[source]
Overview:

Step the environment with the given action and append the new observation to the frame buffer.

Arguments:
  • action (Any): the given action to step with.

Returns:
  • observation (Any): the stacked frames returned by self._get_ob().

  • reward (Any) : amount of reward returned after previous action

  • done (Bool) : whether the episode has ended, in which case further step() calls will return undefined results

  • info (Dict) : contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
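
Example (a minimal sketch; the env id is illustrative):

    import gym
    from ding.envs.env_wrappers import WarpFrame, FrameStack

    env = FrameStack(WarpFrame(gym.make('PongNoFrameskip-v4')), n_frames=4)
    obs = env.reset()  # the 4 most recent warped frames, stacked into a single observation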

ObsTransposeWrapper

class ding.envs.env_wrappers.ObsTransposeWrapper(env)[source]
Overview:

Wrapper to transpose the observation (e.g. from (H, W, C) to (C, H, W)), usually used in Atari environments.

Interface:

__init__, observation, new_shape

Properties:
  • env (gym.Env): the environment to wrap.

  • observation_space

__init__(env)[source]
Overview:

Initialize self. See help(type(self)) for accurate signature; set up the properties.

Arguments:
  • env (gym.Env): the environment to wrap.

static new_shape(obs_shape, act_shape, rew_shape)[source]
Overview:

Get the new shapes of observation, action, and reward; in this case the observation shape is transposed and the others are unchanged.

Arguments:

obs_shape (Any), act_shape (Any), rew_shape (Any)

Returns:

obs_shape (Any), act_shape (Any), rew_shape (Any)

observation(obs: Union[tuple, numpy.ndarray])[source]
Overview:

Returns the transposed observation

Arguments:
  • obs (Union[tuple, np.ndarray]): the original observation

Returns:
  • observation (Union[tuple, np.ndarray]): The transposed observation
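
Example (a minimal sketch; the env id and the (H, W, C) -> (C, H, W) layout are the typical Atari case):

    import gym
    from ding.envs.env_wrappers import ObsTransposeWrapper

    env = ObsTransposeWrapper(gym.make('PongNoFrameskip-v4'))
    obs = env.reset()  # e.g. a (210, 160, 3) frame becomes a (3, 210, 160) array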

RunningMeanStd

class ding.envs.env_wrappers.RunningMeanStd(epsilon=0.0001, shape=())[source]
Overview:

Utility that maintains a running mean, variance, and count of the data it is updated with.

Interface:

__init__, update, reset, new_shape

Properties:
  • mean, std, _epsilon, _shape, _mean, _var, _count

__init__(epsilon=0.0001, shape=())[source]
Overview:

Initialize self. See help(type(self)) for accurate signature; set up the properties.

Arguments:
  • epsilon (Float): a small constant added to the variance when computing the std output.

  • shape (tuple): the shape of the tracked mean and variance arrays.

property mean: numpy.ndarray
Overview:

The running mean, read from self._mean.

static new_shape(obs_shape, act_shape, rew_shape)[source]
Overview:

Get the new shapes of observation, action, and reward; in this case unchanged.

Arguments:

obs_shape (Any), act_shape (Any), rew_shape (Any)

Returns:

obs_shape (Any), act_shape (Any), rew_shape (Any)

reset()[source]
Overview:

Resets the running statistics: _mean, _var, and _count.

property std: numpy.ndarray
Overview:

The running standard deviation, computed from self._var and self._epsilon.

update(x)[source]
Overview:

Update the running mean, variance, and count with a batch of data.

Arguments:
  • x (np.ndarray): the batch of data used to update the statistics.
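
Example (a minimal sketch; note that RunningMeanStd is a standalone statistics tracker, not a gym wrapper):

    import numpy as np
    from ding.envs.env_wrappers import RunningMeanStd

    rms = RunningMeanStd(epsilon=1e-4, shape=(3, ))
    rms.update(np.random.randn(16, 3))  # update with a batch of 16 samples
    print(rms.mean, rms.std)            # per-dimension running mean and std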

ObsNormEnv

class ding.envs.env_wrappers.ObsNormEnv(env)[source]
Overview:

Normalize observations according to running mean and std.

Interface:

__init__, step, reset, observation, new_shape

Properties:
  • env (gym.Env): the environment to wrap.

  • data_count, clip_range, rms

__init__(env)[source]
Overview:

Initialize self. See help(type(self)) for accurate signature; set up the properties for normalization by running mean and std.

Arguments:
  • env (gym.Env): the environment to wrap.

static new_shape(obs_shape, act_shape, rew_shape)[source]
Overview:

Get the new shapes of observation, action, and reward; in this case unchanged.

Arguments:

obs_shape (Any), act_shape (Any), rew_shape (Any)

Returns:

obs_shape (Any), act_shape (Any), rew_shape (Any)

observation(observation)[source]
Overview:

Normalize the given observation using the running mean and std and return it.

Arguments:
  • observation (Any): Original observation

Returns:
  • observation (Any): Normalized new observation

reset(**kwargs)[source]
Overview:

Resets the state of the environment and reset properties.

Arguments:
  • kwargs (Dict): the keyword arguments to reset with.

Returns:
  • observation (Any): New observation after reset

step(action)[source]
Overview:

Step the environment with the given action, update data_count, and update the self.rms property with the new observation.

Arguments:
  • action (Any): the given action to step with.

Returns:
  • observation (Any): the normalized observation (via self.observation) after the input action and the rms update

  • reward (Any) : amount of reward returned after previous action

  • done (Bool) : whether the episode has ended, in which case further step() calls will return undefined results

  • info (Dict) : contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
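
Example (a minimal sketch; 'CartPole-v0' is illustrative):

    import gym
    from ding.envs.env_wrappers import ObsNormEnv

    env = ObsNormEnv(gym.make('CartPole-v0'))
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())  # obs is normalized by the running mean and std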

RewardNormEnv

class ding.envs.env_wrappers.RewardNormEnv(env, reward_discount)[source]
Overview:

Normalize reward according to running std.

Interface:

__init__, step, reward, reset, new_shape

Properties:
  • env (gym.Env): the environment to wrap.

  • cum_reward, reward_discount, data_count, rms

__init__(env, reward_discount)[source]
Overview:

Initialize self. See help(type(self)) for accurate signature; set up the properties for normalization by running std.

Arguments:
  • env (gym.Env): the environment to wrap.

  • reward_discount (Float): the discount factor used to accumulate self.cum_reward.

static new_shape(obs_shape, act_shape, rew_shape)[source]
Overview:

Get the new shapes of observation, action, and reward; in this case unchanged.

Arguments:

obs_shape (Any), act_shape (Any), rew_shape (Any)

Returns:

obs_shape (Any), act_shape (Any), rew_shape (Any)

reset(**kwargs)[source]
Overview:

Resets the state of the environment and resets the properties (numeric ones to 0, and self.rms via its own reset).

Arguments:
  • kwargs (Dict): the keyword arguments to reset with.

reward(reward)[source]
Overview:

Normalize the reward if data_count is greater than 30.

Arguments:
  • reward(Float): Raw Reward

Returns:
  • reward(Float): Normalized Reward

step(action)[source]
Overview:

Step the environment with the given action, update data_count, and update the self.rms and self.cum_reward properties with the new reward.

Arguments:
  • action (Any): the given action to step with.

Returns:
  • observation (Any): the observation returned after the input action

  • reward (Float): the normalized reward (via self.reward); self.cum_reward is updated as well

  • done (Bool) : whether the episode has ended, in which case further step() calls will return undefined results

  • info (Dict) : contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
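
Example (a minimal sketch; the env id and discount value are illustrative):

    import gym
    from ding.envs.env_wrappers import RewardNormEnv

    env = RewardNormEnv(gym.make('CartPole-v0'), reward_discount=0.99)
    env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())  # reward is normalized by the running std once data_count exceeds 30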

RamWrapper

class ding.envs.env_wrappers.RamWrapper(env, render=False)[source]
Overview:

Wrap a RAM-observation env into an image-like env.

Interface:

__init__, reset, step, new_shape

Properties:
  • env (gym.Env): the environment to wrap.

  • observation_space

__init__(env, render=False)[source]
Overview:

Initialize self. See help(type(self)) for accurate signature.

Arguments:
  • env (gym.Env): the environment to wrap.

static new_shape(obs_shape, act_shape, rew_shape)[source]
Overview:

Get the new shapes of observation, action, and reward; in this case the observation shape is changed to (128, 1, 1) and the others are unchanged.

Arguments:

obs_shape (Any), act_shape (Any), rew_shape (Any)

Returns:

obs_shape (Any), act_shape (Any), rew_shape (Any)

reset()[source]
Overview:

Resets the state of the environment and reset properties.

Returns:
  • observation (Any): New observation after reset and reshaped

step(action)[source]
Overview:

Step the environment with the given action. Repeat action, sum reward and reshape the observation.

Arguments:
  • action (Any): the given action to step with.

Returns:
  • observation (np.ndarray): the observation after the step, reshaped to (128, 1, 1) and cast to np.float32

  • reward (Any) : amount of reward returned after previous action

  • done (Bool) : whether the episode has ended, in which case further step() calls will return undefined results

  • info (Dict) : contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
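
Example (a minimal sketch; the RAM env id 'Pong-ram-v4' is illustrative):

    import gym
    from ding.envs.env_wrappers import RamWrapper

    env = RamWrapper(gym.make('Pong-ram-v4'))
    obs = env.reset()  # the 128-byte RAM state, reshaped to an image-like (128, 1, 1) float32 array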

EpisodicLifeEnv

class ding.envs.env_wrappers.EpisodicLifeEnv(env)[source]
Overview:

Make end-of-life == end-of-episode, but only reset on true game over. This helps value estimation.

Interface:

__init__, step, reset, observation, new_shape

Properties:
  • env (gym.Env): the environment to wrap.

  • lives, was_real_done

__init__(env)[source]
Overview:

Initialize self. See help(type(self)) for accurate signature; initialize lives to 0 and was_real_done to True.

Arguments:
  • env (gym.Env): the environment to wrap.

static new_shape(obs_shape, act_shape, rew_shape)[source]
Overview:

Get the new shapes of observation, action, and reward; in this case unchanged.

Arguments:

obs_shape (Any), act_shape (Any), rew_shape (Any)

Returns:

obs_shape (Any), act_shape (Any), rew_shape (Any)

reset()[source]
Overview:

Calls the Gym environment reset, only when lives are exhausted. This way all states are still reachable even though lives are episodic, and the learner need not know about any of this behind-the-scenes.

Returns:
  • obs (Any): the new observation; if self.was_real_done is False, a no-op step is taken instead of a full reset to advance from the lost-life state.

step(action)[source]
Overview:

Step the environment with the given action; set self.was_real_done to the environment's done flag, then check the current lives, treat a loss of life as terminal, and update lives to handle bonus lives.

Arguments:
  • action (Any): the given action to step with.

Returns:
  • obs (Any): the observation after the input action

  • reward (Any) : amount of reward returned after previous action

  • done (Bool) : whether the episode has ended, in which case further step() calls will return undefined results

  • info (Dict) : contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
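
Example (a minimal sketch; the env id is illustrative):

    import gym
    from ding.envs.env_wrappers import EpisodicLifeEnv

    env = EpisodicLifeEnv(gym.make('BreakoutNoFrameskip-v4'))
    env.reset()
    obs, reward, done, info = env.step(0)  # done becomes True whenever a life is lost, even if the game is not over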

FireResetEnv

class ding.envs.env_wrappers.FireResetEnv(env)[source]
Overview:

Take action on reset for environments that are fixed until firing. Related discussion: https://github.com/openai/baselines/issues/240

Interface:

__init__, reset, new_shape

Properties:
  • env (gym.Env): the environment to wrap.

__init__(env)[source]
Overview:

Initialize self. See help(type(self)) for accurate signature.

Arguments:
  • env (gym.Env): the environment to wrap.

static new_shape(obs_shape, act_shape, rew_shape)[source]
Overview:

Get the new shapes of observation, action, and reward; in this case unchanged.

Arguments:

obs_shape (Any), act_shape (Any), rew_shape (Any)

Returns:

obs_shape (Any), act_shape (Any), rew_shape (Any)

reset()[source]
Overview:

Resets the state of the environment and then takes the FIRE action (action 1).
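
Example (a minimal sketch; the env id is illustrative):

    import gym
    from ding.envs.env_wrappers import FireResetEnv

    env = FireResetEnv(gym.make('BreakoutNoFrameskip-v4'))
    obs = env.reset()  # the FIRE action is taken automatically so that the game actually starts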

update_shape

Overview:

Get the new shapes of observation, action, and reward given the wrappers (by name).

Arguments:

obs_shape (Any), act_shape (Any), rew_shape (Any), wrapper_names (Any)

Returns:

obs_shape (Any), act_shape (Any), rew_shape (Any)
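
Example (a minimal sketch; the shapes are illustrative, and passing wrapper class names as strings in wrapper_names is an assumption based on the wrappers documented above):

    from ding.envs.env_wrappers import update_shape

    obs_shape, act_shape, rew_shape = update_shape(
        (210, 160, 3), (6, ), (1, ), ['WarpFrame', 'FrameStack']
    )
    # each named wrapper's new_shape is applied in turn to produce the final shapes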