PARL Documentation
.. toctree::
:maxdepth: 1
.. automodule:: parl.framework.model_base
Agent (*Generate Data Flow*)
1. __init__(self, algorithm, gpu_id=None)
Call build_program here and run initialization for default_startup_program.
2. build_program(self)
Use define_predict and define_learn from Algorithm to build the training program and the prediction program. This method is called
by the __init__ method of class Agent.
3. predict(self, obs)
Predict the action given the current observation of the environment. Note that this function only performs prediction and does not do any exploration.
To explore the action space, implement that logic in the `sample` function below.
This function is typically used during testing.
4. sample(self, obs)
Predict the action given the current observation of the environment.
Additionally, noise is added to the action here to explore new trajectories.
This function is typically used during training.
5. learn(self, obs, action, reward, next_obs, terminal)
Pass data to the training program to update model. This method is the training interface for Agent.
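
The difference between `predict` and `sample` can be sketched in plain Python. The ``ToyAgent`` class below is a hypothetical stand-in that does not use PARL's ``Agent`` base class; the tabular scores and the epsilon-greedy noise are assumptions chosen for illustration, not an exploration strategy that PARL prescribes.

.. code-block:: python

    import random


    class ToyAgent(object):
        """Hypothetical stand-in for an agent; not PARL's Agent class."""

        def __init__(self, scores, epsilon=0.1, seed=0):
            self.scores = scores      # obs -> list of action scores (assumed)
            self.epsilon = epsilon    # exploration rate (assumed)
            self.rng = random.Random(seed)

        def predict(self, obs):
            # Greedy choice only: no exploration, as used at test time.
            action_scores = self.scores[obs]
            return max(range(len(action_scores)), key=lambda a: action_scores[a])

        def sample(self, obs):
            # Same prediction, plus noise to explore, as used during training.
            if self.rng.random() < self.epsilon:
                return self.rng.randrange(len(self.scores[obs]))
            return self.predict(obs)


    agent = ToyAgent({'s0': [0.1, 0.7, 0.2]}, epsilon=0.3)
    greedy_action = agent.predict('s0')
    explored_actions = [agent.sample('s0') for _ in range(100)]

Here `predict` always returns the highest-scoring action, while `sample` occasionally returns a random one, so only `sample` can visit actions the current policy would not choose.
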
Algorithm (*Backward Part*)
1. define_predict(self, obs)
Use the policy() method from Model to predict the probabilities of actions.
2. define_learn(self, obs, action, reward, next_obs, terminal)
Define loss function and optimizer here to update the policy model.
An Example
.. code-block:: python

    # From https://github.com/PaddlePaddle/PARL/blob/develop/parl/algorithms/policy_gradient.py
    import paddle.fluid as fluid
    import parl.layers as layers
    from parl.framework.algorithm_base import Algorithm

    class PolicyGradient(Algorithm):
        def __init__(self, model, hyperparas):
            Algorithm.__init__(self, model, hyperparas)
            self.model = model
            self.lr = hyperparas['lr']

        def define_predict(self, obs):
            """Use the policy model self.model to predict the action probability."""
            return self.model.policy(obs)

        def define_learn(self, obs, action, reward):
            """Update the policy model self.model with the policy gradient algorithm."""
            act_prob = self.model.policy(obs)
            # cross_entropy yields the negative log-probability of the taken action
            log_prob = layers.cross_entropy(act_prob, action)
            cost = log_prob * reward
            cost = layers.reduce_mean(cost)
            optimizer = fluid.optimizer.Adam(self.lr)
            # register the update; without this call the optimizer is never applied
            optimizer.minimize(cost)
            return cost

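
The cost computed in `define_learn` can be checked by hand with a small plain-Python sketch. It assumes that the cross entropy with a hard label equals the negative log-probability of the taken action; the helper name ``policy_gradient_cost`` and the sample batch are hypothetical and not part of PARL.

.. code-block:: python

    import math


    def policy_gradient_cost(act_probs, actions, rewards):
        """Mean of -log(p[action]) * reward over a batch (illustrative only)."""
        costs = []
        for probs, action, reward in zip(act_probs, actions, rewards):
            neg_log_prob = -math.log(probs[action])  # cross entropy, hard label
            costs.append(neg_log_prob * reward)
        return sum(costs) / len(costs)


    batch_cost = policy_gradient_cost(
        act_probs=[[0.5, 0.5], [0.25, 0.75]],
        actions=[0, 1],
        rewards=[2.0, 1.0],
    )

Minimizing this quantity raises the probability of actions that received a high reward, which is the effect the optimizer step in `define_learn` is meant to have.
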
Model (*Forward Part*)
1. policy(self, *args)
Define the structure of networks here. Algorithm will call this method to predict probabilities of actions.
It is optional.
2. value(self, *args)
Returns a dict of estimated values for the current observations and states,
with keys such as "q_value" and "v_value".
3. sync_params_to(self, target_net, gpu_id, decay=0.0, share_vars_parallel_executor=None)
This method copies the parameters from the current network to the target network; the two networks have the same structure.
An example
.. code-block:: python

    from copy import deepcopy

    import parl.layers as layers
    from parl.framework.model_base import Model

    class MLPModel(Model):
        def __init__(self):
            self.fc = layers.fc(size=64)

        def policy(self, obs):
            out = self.fc(obs)
            return out

    model = MLPModel()
    # deepcopy automatically creates new unique parameter names for target_model.fc
    target_model = deepcopy(model)

    # build program
    x = layers.data(name='x', shape=[100], dtype="float32")
    y1 = model.policy(x)
    y2 = target_model.policy(x)

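
The `decay` argument of `sync_params_to` is not explained above. The sketch below assumes it blends parameters as ``target = decay * target + (1 - decay) * source``, so that the default ``decay=0.0`` is a full copy. The ``sync_params`` helper and the dict-based parameters are hypothetical and only illustrate the arithmetic; they are not PARL's implementation.

.. code-block:: python

    def sync_params(source, target, decay=0.0):
        """Blend source parameters into target in place (hypothetical helper)."""
        for name, value in source.items():
            # decay=0.0 overwrites the target; larger values move it more slowly
            target[name] = decay * target[name] + (1.0 - decay) * value
        return target


    source_params = {'fc.w': 1.0, 'fc.b': 3.0}
    hard_copy = sync_params(source_params, {'fc.w': 0.0, 'fc.b': 0.0})
    soft_copy = sync_params(source_params, {'fc.w': 0.0, 'fc.b': 1.0}, decay=0.5)

Under this assumption, a decay close to 1.0 gives a slowly moving target network, while 0.0 reproduces the periodic hard reset described for the method.
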
Three Components
PARL is made up of three components: **Model, Algorithm, Agent**. They are constructed layer-by-layer to build the main body.
A Model is owned by an Algorithm. The Model is responsible for the entire network (**forward part**) for a specific problem.
The Algorithm defines how to update the parameters in the Model (**backward part**). We have already implemented some commonly
used algorithms__, such as DQN, DDPG, PPO, and A3C, which you can import and use directly.
.. __: https://github.com/PaddlePaddle/PARL/tree/develop/parl/algorithms
The Agent interacts with the environment and **generates the data flow** outside the Algorithm.
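
The layering of the three components can be sketched without PARL. The classes below are hypothetical stand-ins (a one-parameter linear model trained with a squared-error gradient step) and their signatures are simplified assumptions; they only show which object owns which, not PARL's actual base classes.

.. code-block:: python

    class SketchModel(object):
        """Forward part: maps an observation to a value."""

        def __init__(self, weight=0.0):
            self.weight = weight

        def value(self, obs):
            return self.weight * obs


    class SketchAlgorithm(object):
        """Backward part: owns a model and updates its parameter."""

        def __init__(self, model, lr=0.1):
            self.model = model
            self.lr = lr

        def define_predict(self, obs):
            return self.model.value(obs)

        def define_learn(self, obs, target):
            error = self.model.value(obs) - target
            # gradient step on the loss 0.5 * error ** 2
            self.model.weight -= self.lr * error * obs
            return 0.5 * error ** 2


    class SketchAgent(object):
        """Data flow: converts raw data and passes it to the algorithm."""

        def __init__(self, algorithm):
            self.alg = algorithm

        def predict(self, obs):
            return self.alg.define_predict(float(obs))

        def learn(self, obs, target):
            return self.alg.define_learn(float(obs), float(target))


    agent = SketchAgent(SketchAlgorithm(SketchModel()))
    losses = [agent.learn(obs=1, target=2) for _ in range(50)]

The calling code only talks to the agent; the algorithm and the model are reached through it, which mirrors the layer-by-layer construction described above.
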
**1. Reproducible**
| We provide algorithms that stably reproduce the results of many influential reinforcement learning algorithms.
**2. Large Scale**
| Support for high-performance parallel training with thousands of CPUs and multiple GPUs.
**3. Reusable**
| Algorithms provided in the repository can be adapted directly to new tasks by defining a forward network; the training mechanism is then built automatically.
**4. Extensible**
| Build new algorithms quickly by inheriting the abstract class in the framework.
Implemented Algorithms
*PARL is a flexible, distributed and eager mode oriented reinforcement learning framework.*
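
+------------------------------------------------+----------------------------------------------------+
| **Eager Mode**                                 | **Distributed Training**                           |
+------------------------------------------------+----------------------------------------------------+
|.. code-block:: python                          |.. code-block:: python                              |
|                                                |                                                    |
|    # Target Network in DQN                     |    # Real multi-thread programming                 |
|                                                |    # without the GIL limitation                    |
|                                                |                                                    |
|    target_network = copy.deepcopy(Q_network)   |    @parl.remote_class                              |
|    ...                                         |    class HelloWorld(object):                       |
|    # reset parameters periodically             |        def sum(self, a, b):                        |
|    target_network.load(Q_network)              |            return a + b                            |
|                                                |                                                    |
|                                                |    parl.init()                                     |
|                                                |    obj = HelloWorld()                              |
|                                                |    # does NOT consume local computation resources  |
|                                                |    ans = obj.sum(a, b)                             |
|                                                |                                                    |
+------------------------------------------------+----------------------------------------------------+
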
| PARL is distributed on PyPI and can be installed with pip:
.. centered:: ``pip install parl``
.. toctree::
:maxdepth: 1
:caption: Installation
.. toctree::
:maxdepth: 1
:caption: Features
.. toctree::
:maxdepth: 1
:caption: Basic_structure
.. toctree::
:maxdepth: 1
:caption: Tutorial
.. toctree::
:maxdepth: 1
:caption: High-quality Implementations
.. toctree::
:maxdepth: 1
:caption: APIs
.. image:: ../.github/abstractions.png
:align: center
:width: 400px
| PARL aims to build an **agent** for training algorithms to perform complex tasks.
| The main abstractions introduced by PARL that are used to build an agent recursively are the following:
* **Model** is abstracted to construct the forward network which defines a policy network or critic network given state as input.
* **Algorithm** describes the mechanism to update parameters in the *model* and often contains at least one model.
* **Agent**, a data bridge between the *environment* and the *algorithm*, is responsible for data I/O with the outside environment and describes data preprocessing before feeding data into the training process.
- Python 2.7 or 3.5+.
- PaddlePaddle >=1.2.1 (**Optional**, if you only want to use the APIs related to parallelization)
PARL is distributed on PyPI and can be installed with pip:
pip install parl
