Slime Volleyball is a two-player, match-based environment with two types of observation spaces: vector and pixel. The action space is usually simplified to a discrete action space, and the environment serves as a basic testbed for ``self-play``-related algorithms. It is a collection of environments (there are 3 sub-environments, namely ``SlimeVolley-v0``, ``SlimeVolleyPixel-v0``, and ``SlimeVolleyNoFrameskip-v0``), of which the ``SlimeVolley-v0`` game is shown in the figure below.
.. image:: ./images/slime_volleyball.gif
:align: center
Installation
===============
Installation Methods
------------------------
Install ``slimevolleygym``. You can install it via ``pip`` or together with **DI-engine**.
.. code:: shell
# Method1: Install Directly
pip install slimevolleygym
Installation Check
------------------------
After completing the installation, you can check whether it was successful with the following commands:
.. code:: python
import gym
import slimevolleygym
env = gym.make("SlimeVolley-v0")
obs = env.reset()
print(obs.shape) # (12, )
DI-engine Images
------------------------
Since Slime Volleyball is easy to install, DI-engine does not provide a Docker image specifically for it. You can customize your build based on the base image ``opendilab/ding:nightly``, or visit the `docker
hub <https://hub.docker.com/repository/docker/opendilab/ding>`__ for more images.
.. _Original Environment:
Original Environment
========================
Note: ``SlimeVolley-v0`` is used as the example here, because benchmarking ``self-play`` algorithms naturally favors the simplest environment. If you want to use the other two environments, you can check the original repository and adapt them according to the `DI-engine API <https://di-engine-docs.readthedocs.io/en/main-zh/feature/env_overview.html>`_.
.. _Observation Space-1:
Observation Space
--------------------------
- The observation space is a vector of size ``(12, )`` containing the absolute coordinates of the agent, the opponent, and the ball, with two consecutive frames stitched together. The data type is ``float64``.
.. _Action Space-1:

Action Space
--------------------------

- The original action space of ``SlimeVolley-v0`` is defined as ``MultiBinary(3)``: there are three sub-actions, and more than one sub-action can be executed at the same time. Each sub-action corresponds to two cases: 0 (not executed) and 1 (executed),
  e.g. ``(1, 0, 1)`` represents executing the first and third sub-actions at the same time. The data type is ``int``, and the action needs to be passed in as a python list (or a 1-dimensional np array of size 3, e.g. ``np.array([0, 1, 0])``).
- The actual implementation does not strictly limit the values to 0 and 1: values greater than 0 are treated as 1, while values less than or equal to 0 are treated as 0.
- In the ``SlimeVolley-v0`` environment, the basic actions are:

  - 0: forward
  - 1: backward
  - 2: jump

- In the ``SlimeVolley-v0`` environment, the combined actions are (see the sketch after this list):

  - [0, 0, 0], NOOP
  - [1, 0, 0], LEFT (forward)
  - [1, 0, 1], UPLEFT (forward jump)
  - [0, 0, 1], UP (jump)
  - [0, 1, 1], UPRIGHT (backward jump)
  - [0, 1, 0], RIGHT (backward)
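As a quick illustration, the raw environment can be stepped with such a ``MultiBinary`` action directly. This is a minimal sketch against the original ``gym`` API; by default the built-in bot controls the other side.

.. code:: python

    import gym
    import slimevolleygym

    env = gym.make("SlimeVolley-v0")
    obs = env.reset()
    # [1, 0, 1] executes "forward" and "jump" at the same time (UPLEFT)
    obs, reward, done, info = env.step([1, 0, 1])
    print(obs.shape, reward, done)  # (12,) 0.0 False on a typical first step
    env.close()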
Reward Space
-----------------
- The reward is the score of the game: -1 if the ball lands on the ground on your side of the court, +1 if it lands on the ground on the opponent's side, and 0 while the rally is still in progress.
.. _Other-1:
Other
--------
- The end of a game is represented as the end of an episode. There are two ending conditions:

  - 1. One side's life points drop to 0 (each side has 5 life points by default).
  - 2. The maximum number of environment steps is reached (3000 by default).

- The game supports two kinds of matches: agent vs. built-in bot (the bot on the left, the agent on the right), and agent vs. agent (see the sketch after this list).
- The built-in bot is a very simple agent trained with an RNN `bot_link <https://blog.otoro.net/2015/03/28/neural-slime-volleyball/>`_.
- Only one side's observation is returned by default; the other side's observation and related information can be found in the ``info`` field.
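For agent-vs-agent play, the original environment accepts a second action argument and exposes the other side's observation through ``info``. The sketch below follows the pattern from the upstream ``slimevolleygym`` README, with random placeholder policies standing in for trained agents.

.. code:: python

    import gym
    import numpy as np
    import slimevolleygym

    env = gym.make("SlimeVolley-v0")
    obs1 = env.reset()
    obs2 = obs1  # both sides see the same initial observation
    done = False
    while not done:
        # random placeholder policies; substitute your own agents here
        action1 = np.random.randint(2, size=3)
        action2 = np.random.randint(2, size=3)
        # passing a second action replaces the built-in bot on the left side
        obs1, reward, done, info = env.step(action1, action2)
        obs2 = info['otherObs']  # the opponent's own observation
    env.close()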
Key Facts
==========
1. 1-dimensional vector observation space (of size ``(12, )``) with information in absolute coordinates
2. ``MultiBinary`` action space
3. Sparse rewards (maximum life value of 5, maximum of 3000 steps, and a non-zero reward is given only when a life point is lost)
.. _RL Environment Space:
RL Environment Space
======================
.. _Observation Space-2:
Observation Space
--------------------------
- Transform the observation vector into a 1-dimensional np array of size ``(12, )``. The data type is ``np.float32``.
.. _Action Space-2:

Action Space
--------------------------
- Transform the ``MultiBinary`` action space into a discrete action space of size 6 by enumerating the six meaningful combinations listed above. The final result is a 1-dimensional np array of size ``(1, )``. The data type is ``np.int64``. A sketch of this discretization is shown below.
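A minimal sketch of the conversion from the discrete action back to the original ``MultiBinary`` form. The lookup table simply enumerates the six combinations listed earlier; the names here are illustrative, not DI-engine's actual implementation.

.. code:: python

    import numpy as np

    # six meaningful button combinations: NOOP, LEFT, UPLEFT, UP, UPRIGHT, RIGHT
    ACTION_TABLE = [
        [0, 0, 0],  # NOOP
        [1, 0, 0],  # LEFT (forward)
        [1, 0, 1],  # UPLEFT (forward jump)
        [0, 0, 1],  # UP (jump)
        [0, 1, 1],  # UPRIGHT (backward jump)
        [0, 1, 0],  # RIGHT (backward)
    ]

    def discrete_to_multibinary(action: np.ndarray) -> list:
        # action is a 1-dimensional np array of size (1, ), dtype np.int64
        return ACTION_TABLE[int(action[0])]

    print(discrete_to_multibinary(np.array([2], dtype=np.int64)))  # [1, 0, 1]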
.. _Reward Space-2:
Reward Space
--------------------------
- Transform the reward into a 1-dimensional np array of size ``(1, )``. The data type is ``np.float32``, with values in ``[-1, 0, 1]``.
- The ``info`` returned from the environment's ``step`` must contain the ``final_eval_reward`` key-value pair, which represents the evaluation metric for the entire episode: the cumulative reward of the episode (the life-point difference between the two players).
- The above space definitions all describe the single-agent case. In the multi-agent (agent vs. agent) case, the corresponding obs/action/reward information of both sides is stacked,
  e.g. the observation space changes from ``(12, )`` to ``(2, 12)``, which represents the observation information of both sides (see the sketch below).
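For intuition, the shapes and dtypes produced by these conventions look like the following. The concrete values are hypothetical and only illustrate the layout.

.. code:: python

    import numpy as np

    # single-agent mode: data produced by the transformed (RL) environment
    obs = np.zeros((12, ), dtype=np.float32)    # observation
    action = np.array([3], dtype=np.int64)      # discrete action (UP)
    reward = np.array([0.], dtype=np.float32)   # one of -1 / 0 / +1

    # agent-vs-agent mode stacks both sides' information along a new first axis
    obs_both = np.stack([obs, obs])
    print(obs_both.shape)  # (2, 12)

    # at episode end, info carries the whole-episode metric (hypothetical value)
    info = {'final_eval_reward': 3.0}  # life-point difference between the two players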
.. _Other-3:
Other
========
Lazy initialization
------------------------
In order to support environment vectorization, an environment instance is often initialized lazily. In this way, the ``__init__`` method does not actually initialize the real original environment, but only sets the corresponding parameters and configurations. The real original environment is initialized the first time the ``reset`` method is called, as in the sketch below.
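A minimal sketch of this pattern; the class and config keys are illustrative, not DI-engine's actual implementation.

.. code:: python

    import gym
    import slimevolleygym

    class LazySlimeVolleyEnv:
        def __init__(self, cfg: dict) -> None:
            # only store parameters and configurations; do not create the gym env yet
            self._cfg = cfg
            self._env = None
            self._init_flag = False

        def reset(self):
            # the real original environment is created on the first reset
            if not self._init_flag:
                self._env = gym.make(self._cfg.get('env_id', 'SlimeVolley-v0'))
                self._init_flag = True
            return self._env.reset()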
Random Seed
------------------
- There are two kinds of random seeds in the environment. One is the original environment's random seed; the other is the random seed used by the various environment-space transformations (e.g. via ``random`` or ``np.random``).
- As a user, you only need to set these two seeds by calling the ``seed`` method, and do not need to care about the implementation details.
- Implementation details: the original environment's random seed is set inside the RL env's ``reset`` method, right before the original env's ``reset`` is called.
- Implementation details: the seed for ``random`` / ``np.random`` is set inside the env's ``seed`` method.
Difference between training env and evaluation env
------------------------------------------------------
- Training env uses a dynamic random seed, i.e. every episode has a different random seed, generated by one random generator. The seed of this generator is fixed by the env's ``seed`` method and stays the same throughout an experiment. Evaluation env uses a static random seed, i.e. every episode has the same random seed, set directly by the ``seed`` method. A sketch of both conventions is given after this list.
- Training env and evaluation env use different pre-process wrappers. ``episode_life`` and ``clip_reward`` are not used in evaluation env.
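A sketch of the dynamic/static seed convention described above, following the usual DI-engine pattern; the class name and details are illustrative.

.. code:: python

    import random
    import numpy as np
    import gym
    import slimevolleygym

    class SeededSlimeVolleyEnv:
        def __init__(self, env_id: str = 'SlimeVolley-v0') -> None:
            self._env = gym.make(env_id)
            self._seed, self._dynamic_seed = None, True

        def seed(self, seed: int, dynamic_seed: bool = True) -> None:
            # training env uses dynamic_seed=True, evaluation env uses dynamic_seed=False
            self._seed, self._dynamic_seed = seed, dynamic_seed
            # the seed for random / np.random is set within the seed method
            random.seed(seed)
            np.random.seed(seed)

        def reset(self):
            # the original env's seed is set within reset, before the original reset
            if self._seed is not None:
                seed = self._seed + np.random.randint(100) if self._dynamic_seed else self._seed
                self._env.seed(seed)
            return self._env.reset()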
Save the replay video
----------------------------
After the env is initialized and before it is reset, call the ``enable_save_replay`` method to specify where the replay video will be saved. The environment will automatically save the replay video after each episode is completed. (The default implementation calls ``gym.wrappers.Monitor``, which depends on ``ffmpeg``.) The code shown below runs one episode of the environment and saves the replay video in a file like ``./video/xxx.mp4``.
.. code:: python
from easydict import EasyDict
from dizoo.slime_volley.envs.slime_volley_env import SlimeVolleyEnv
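    import numpy as np

    # NOTE: the config keys below are illustrative assumptions; check
    # dizoo.slime_volley.envs.slime_volley_env for the exact fields.
    env = SlimeVolleyEnv(EasyDict(env_id='SlimeVolley-v0', agent_vs_agent=False))
    env.enable_save_replay(replay_path='./video')
    obs = env.reset()
    done = False
    while not done:
        # a random discrete action in [0, 6); substitute a trained policy here
        timestep = env.step(np.array([np.random.randint(6)]))
        done = timestep.done
    env.close()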