diff --git a/README.cn.md b/README.cn.md
index 2b5d1846405cb77f22d56b8b7f49a3a341192515..8a6c9fb4fe423ed5f12bd58a264a66f776ca30f1 100644
--- a/README.cn.md
+++ b/README.cn.md
@@ -2,7 +2,8 @@ PARL

-[English](./README.md) | 简体中文
+[English](./README.md) | 简体中文
+[**文档**](https://parl.readthedocs.io)
 
 > PARL 是一个高性能、灵活的强化学习框架。
 # 特点
@@ -28,46 +29,11 @@ PARL的目标是构建一个可以完成复杂任务的智能体。以下是用
 
 ### Agent
 `Agent` 负责算法与环境的交互,在交互过程中把生成的数据提供给`Algorithm`来更新模型(`Model`),数据的预处理流程也一般定义在这里。
-以下是构建一个包含DQN算法的智能体(`Agent`)用来玩雅达利游戏(`Atari Games`)的示例:
-
-```python
-import parl
-from parl.algorithms import DQN
-
-class AtariModel(parl.Model):
-    """AtariModel
-    This class defines the forward part for an algorithm,
-    its input is state observed on environment.
-    """
-    def __init__(self, img_shape, action_dim):
-        # define your layers
-        self.cnn1 = layers.conv_2d(num_filters=32, filter_size=5,
-                                   stride=1, padding=2, act='relu')
-        ...
-        self.fc1 = layers.fc(action_dim)
-
-    def value(self, img):
-        # define how to estimate the Q value based on the image of atari games.
-        img = img / 255.0
-        l = self.cnn1(img)
-        ...
-        Q = self.fc1(l)
-        return Q
-"""
-三步定义一个智能体:
-    1. 定义前向模型,就是上面的值函数网络(Value),定义了如何针对输入的游戏图像评估Q值。
-    2. 通过DQN算法来更新模型(Model),在这里我们直接import仓库中实现好的DQN算法即可。
-    3. 在AtariAgent中定义数据交互部分,把交互过程中得到的数据用来传给DQN算法以更新模型。
-"""
-
-model = AtariModel(img_shape=(32, 32), action_dim=4)
-algorithm = DQN(model)
-agent = AtariAgent(algorithm)
-```
+提示: 请访问[教程](https://parl.readthedocs.io/en/latest/getting_started.html)和[API 文档](https://parl.readthedocs.io/en/latest/model.html)以获取更多关于基础类的信息。
 
 # 简易高效的并行接口
 在PARL中,一个**修饰符**(parl.remote_class)就可以帮助用户实现自己的并行算法。
-以下我们通过`Hello World`的例子来说明如何简单地通过PARL来调度外部的计算资源实现并行计算。
+以下我们通过`Hello World`的例子来说明如何简单地通过PARL来调度外部的计算资源实现并行计算。请访问我们的[教程文档](https://parl.readthedocs.io/en/latest/parallel_training/setup.html)以获取更多的并行训练信息。
 ```python
 #============Agent.py=================
 @parl.remote_class
@@ -79,21 +45,14 @@ class Agent(object):
     def sum(self, a, b):
         return a+b
 
-# launch `Agent.py` at any computation platforms such as a CPU cluster.
-if __main__ == '__main__':
-    agent = Agent()
-    agent.as_remote(server_address)
-
-
-#============Server.py=================
-remote_manager = parl.RemoteManager()
-agent = remote_manager.get_remote()
+parl.connect('localhost:8037')
+agent = Agent()
 agent.say_hello()
 ans = agent.sum(1,5) # run remotely and not comsume any local computation resources
 ```
 两步调度外部的计算资源:
 1. 使用`parl.remote_class`修饰一个类,之后这个类就被转化为可以运行在其他CPU或者机器上的类。
-2. 通过`RemoteManager`获取远端的类实例,通过这种方式获取到的实例和原来的类是有同样的函数的。由于这些类是在别的计算资源上运行的,执行这些函数**不再消耗当前线程计算资源**。
+2. 调用`parl.connect`函数来初始化并行通讯,通过这种方式获取到的实例和原来的类是有同样的函数的。由于这些类是在别的计算资源上运行的,执行这些函数**不再消耗当前线程计算资源**。
 
 PARL
 
diff --git a/README.md b/README.md
index 52586e02fa7f70076f147ef9755d46ec0c5f5a42..29c87fa1fb228446e6145aea40da32b1c34efd6b 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,8 @@ PARL

-English | [简体中文](./README.cn.md)
+English | [简体中文](./README.cn.md)
+[**Documentation**](https://parl.readthedocs.io)
 
 > PARL is a flexible and high-efficient reinforcement learning framework.
 
@@ -28,47 +29,12 @@ The main abstractions introduced by PARL that are used to build an agent recursi
 `Algorithm` describes the mechanism to update parameters in `Model` and often contains at least one model.
 
 ### Agent
-`Agent`, a data bridge between the environment and the algorithm, is responsible for data I/O with the outside environment and describes data preprocessing before feeding data into the training process.
+`Agent`, a data bridge between the environment and the algorithm, is responsible for data I/O with the outside environment and describes data preprocessing before feeding data into the training process.
 
-Here is an example of building an agent with DQN algorithm for Atari games.
-```python
-import parl
-from parl.algorithms import DQN, DDQN
-
-class AtariModel(parl.Model):
-    """AtariModel
-    This class defines the forward part for an algorithm,
-    its input is state observed on the environment.
-    """
-    def __init__(self, img_shape, action_dim):
-        # define your layers
-        self.cnn1 = layers.conv_2d(num_filters=32, filter_size=5,
-                                   stride=1, padding=2, act='relu')
-        ...
-        self.fc1 = layers.fc(action_dim)
-
-    def value(self, img):
-        # define how to estimate the Q value based on the image of atari games.
-        img = img / 255.0
-        l = self.cnn1(img)
-        ...
-        Q = self.fc1(l)
-        return Q
-"""
-three steps to build an agent
-    1. define a forward model which is critic_model in this example
-    2. a. to build a DQN algorithm, just pass the critic_model to `DQN`
-       b. to build a DDQN algorithm, just replace DQN in the following line with DDQN
-    3. define the I/O part in AtariAgent so that it could update the algorithm based on the interactive data
-"""
-
-model = AtariModel(img_shape=(32, 32), action_dim=4)
-algorithm = DQN(model)
-agent = AtariAgent(algorithm)
-```
+Note: For more information about base classes, please visit our [tutorial](https://parl.readthedocs.io/en/latest/getting_started.html) and [API documentation](https://parl.readthedocs.io/en/latest/model.html).
 
 # Parallelization
-PARL provides a compact API for distributed training, allowing users to transfer the code into a parallelized version by simply adding a decorator.
+PARL provides a compact API for distributed training, allowing users to transfer the code into a parallelized version by simply adding a decorator. For more information about our APIs for parallel training, please visit our [documentation](https://parl.readthedocs.io/en/latest/parallel_training/setup.html).
 
 Here is a `Hello World` example to demonstrate how easy it is to leverage outer computation resources.
 ```python
 #============Agent.py=================
@@ -81,21 +47,14 @@ class Agent(object):
     def sum(self, a, b):
         return a+b
 
-# launch `Agent.py` at any computation platforms such as a CPU cluster.
-if __main__ == '__main__':
-    agent = Agent()
-    agent.as_remote(server_address)
-
-
-#============Server.py=================
-remote_manager = parl.RemoteManager()
-agent = remote_manager.get_remote()
+parl.connect('localhost:8037')
+agent = Agent()
 agent.say_hello()
 ans = agent.sum(1,5) # run remotely and not consume any local computation resources
 ```
 Two steps to use outer computation resources:
 1. use the `parl.remote_class` to decorate a class at first, after which it is transferred to be a new class that can run in other CPUs or machines.
-2. Get remote objects from the `RemoteManager`, and these objects have the same functions as the real ones. However, calling any function of these objects **does not** consume local computation resources since they are executed elsewhere.
+2. call `parl.connect` to initialize parallel communication before creating an object. Calling any function of the objects **does not** consume local computation resources since they are executed elsewhere.
 
 PARL
 As shown in the above figure, real actors(orange circle) are running at the cpu cluster, while the learner(blue circle) is running at the local gpu with several remote actors(yellow circle with dotted edge).
diff --git a/examples/A2C/README.md b/examples/A2C/README.md
index 2f5eec60aa9333047f1525a675677e3af5b27f15..469b3535e000970ebcef31cdd8a7200f6425db44 100755
--- a/examples/A2C/README.md
+++ b/examples/A2C/README.md
@@ -34,10 +34,10 @@
 Note that if you have started a master before, you don't have to run the above
 command. For more information about the cluster, please refer to our
 [documentation](https://parl.readthedocs.io/en/latest/parallel_training/setup.html)
 
-Then we can start the distributed training by running `learner.py`.
+Then we can start the distributed training by running:
 
 ```bash
-python learner.py
+python train.py
 ```
 
 ### Reference
diff --git a/examples/A2C/learner.py b/examples/A2C/train.py
similarity index 100%
rename from examples/A2C/learner.py
rename to examples/A2C/train.py
diff --git a/examples/ES/README.md b/examples/ES/README.md
index f25a449f1ac17a4d5190d08ff1986708d72cdcb0..e1d44d24d949eca8d08b7adfd4e8cc81337d2dbc 100644
--- a/examples/ES/README.md
+++ b/examples/ES/README.md
@@ -1,5 +1,5 @@
 ## Reproduce ES with PARL
-Based on PARL, the Evolution Strategies (ES) algorithm has been reproduced, reaching the same level of indicators as the paper in Mujoco benchmarks.
+Based on PARL, we have implemented the Evolution Strategies (ES) algorithm and evaluated it in Mujoco environments. Its performance reaches the same level as reported in the paper.
 
 + ES in [Evolution Strategies as a Scalable Alternative to Reinforcement Learning](https://arxiv.org/abs/1703.03864)
 
@@ -8,7 +8,7 @@ Based on PARL, the Evolution Strategies (ES) algorithm has been reproduced, reac
 Please see [here](https://github.com/openai/mujoco-py) to know more about Mujoco games.
 
 ### Benchmark result
-TODO
+![learning_curve](learning_curve.png)
 
 ## How to use
 ### Dependencies
@@ -20,18 +20,21 @@ TODO
 ### Distributed Training
 
-#### Learner
-```sh
-python learner.py
+To replicate the performance reported above, we encourage you to train with 96 CPUs.
+If you haven't created a cluster before, run the following command to create one. For more information about the cluster, please refer to our [documentation](https://parl.readthedocs.io/en/latest/parallel_training/setup.html).
+
+```bash
+xparl start --port 8037 --cpu_num 96
 ```
 
-#### Actors
-```sh
-sh run_actors.sh
+Then we can start the distributed training by running:
+
+
+```bash
+python train.py
 ```
 
-You can change training settings (e.g. `env_name`, `server_ip`) in `es_config.py`. If you want to use different number of actors, please modify `actor_num` in both `es_config.py` and `run_actors.sh`.
-Training result will be saved in `log_dir/train/result.csv`.
+Training results will be saved in `train_log`, and the training curve can be visualized with tensorboard.
 
 ### Reference
 + [Ray](https://github.com/ray-project/ray)
diff --git a/examples/ES/es_config.py b/examples/ES/es_config.py
index ad4c9c402ffd379b3b676a509f244110b4e7f347..854a3666611f770141c8f6375c84abc108feffee 100644
--- a/examples/ES/es_config.py
+++ b/examples/ES/es_config.py
@@ -14,8 +14,7 @@ config = {
 
     #========== remote config ==========
-    'server_ip': 'localhost',
-    'server_port': 8037,
+    'master_address': 'localhost:8037',
 
     #========== env config ==========
     'env_name': 'Humanoid-v1',
diff --git a/examples/ES/learning_curve.png b/examples/ES/learning_curve.png
new file mode 100644
index 0000000000000000000000000000000000000000..806ee1c40d0fefea0a00e7d2d8462021e0654436
Binary files /dev/null and b/examples/ES/learning_curve.png differ
diff --git a/examples/ES/run_actors.sh b/examples/ES/run_actors.sh
deleted file mode 100644
index 7df4f4bba18be6ce174b78278b93eeee50361dfb..0000000000000000000000000000000000000000
--- a/examples/ES/run_actors.sh
+++ /dev/null
@@ -1,10 +0,0 @@
-#!/bin/bash
-
-export CPU_NUM=1
-
-actor_num=96
-
-for i in $(seq 1 $actor_num); do
-    python actor.py &
-done;
-wait
diff --git a/examples/ES/learner.py b/examples/ES/train.py
similarity index 92%
rename from examples/ES/learner.py
rename to examples/ES/train.py
index 478dfaf23570ebac65c59061a4b0a5ffb5fe452f..be2c7d703eeba39931312491274f554ee9a76562 100644
--- a/examples/ES/learner.py
+++ b/examples/ES/train.py
@@ -23,10 +23,10 @@ from obs_filter import MeanStdFilter
 from mujoco_agent import MujocoAgent
 from mujoco_model import MujocoModel
 from noise import SharedNoiseTable
-from parl import RemoteManager
 from parl.utils import logger, tensorboard
 from parl.utils.window_stat import WindowStat
 from six.moves import queue
+from actor import Actor
 
 
 class Learner(object):
@@ -53,40 +53,37 @@ class Learner(object):
         self.actors_signal_input_queues = []
         self.actors_output_queues = []
 
-        self.run_remote_manager()
+        self.create_actors()
 
         self.eval_rewards_stat = WindowStat(self.config['report_window_size'])
         self.eval_lengths_stat = WindowStat(self.config['report_window_size'])
 
-    def run_remote_manager(self):
-        """ Accept connection of new remote actor and start sampling of the remote actor.
+    def create_actors(self):
+        """ create actors for parallel training.
         """
-        remote_manager = RemoteManager(port=self.config['server_port'])
-        logger.info('Waiting for {} remote actors to connect.'.format(
-            self.config['actor_num']))
+        parl.connect(self.config['master_address'])
 
         self.remote_count = 0
         for i in range(self.config['actor_num']):
-            remote_actor = remote_manager.get_remote()
             signal_queue = queue.Queue()
             output_queue = queue.Queue()
             self.actors_signal_input_queues.append(signal_queue)
             self.actors_output_queues.append(output_queue)
 
             self.remote_count += 1
-            logger.info('Remote actor count: {}'.format(self.remote_count))
 
             remote_thread = threading.Thread(
                 target=self.run_remote_sample,
-                args=(remote_actor, signal_queue, output_queue))
+                args=(signal_queue, output_queue))
             remote_thread.setDaemon(True)
             remote_thread.start()
 
         logger.info('All remote actors are ready, begin to learn.')
 
-    def run_remote_sample(self, remote_actor, signal_queue, output_queue):
+    def run_remote_sample(self, signal_queue, output_queue):
         """ Sample data from remote actor or get filters of remote actor.
""" + remote_actor = Actor(self.config) while True: info = signal_queue.get() if info['signal'] == 'sample': @@ -211,6 +208,9 @@ class Learner(object): if __name__ == '__main__': from es_config import config + logger.info( + "Before training, it takes a few mimutes to initialize a noise table for exploration" + ) learner = Learner(config) while True: diff --git a/examples/GA3C/README.md b/examples/GA3C/README.md index 7c2ce7eb19fce1a3c13ec1dd1ea539fbb9a3c377..5b892fdb6c6108f1dd12b923d5057ddfeaadae3e 100755 --- a/examples/GA3C/README.md +++ b/examples/GA3C/README.md @@ -33,10 +33,10 @@ Note that if you have started a master before, you don't have to run the above command. For more information about the cluster, please refer to our [documentation](https://parl.readthedocs.io/en/latest/parallel_training/setup.html) -Then we can start the distributed training by running `learner.py`. +Then we can start the distributed training by running: ```bash -python learner.py +python train.py ``` [Tips] The performance can be influenced dramatically in a slower computational diff --git a/examples/GA3C/learner.py b/examples/GA3C/train.py similarity index 100% rename from examples/GA3C/learner.py rename to examples/GA3C/train.py diff --git a/examples/IMPALA/README.md b/examples/IMPALA/README.md index 5a65f3115aa3efcfa99e64fd34af0776c32a78c3..cd361daf894ea0cfaa5e056204cc938076ff4541 100755 --- a/examples/IMPALA/README.md +++ b/examples/IMPALA/README.md @@ -37,10 +37,10 @@ Note that if you have started a master before, you don't have to run the above command. For more information about the cluster, please refer to our [documentation](https://parl.readthedocs.io/en/latest/parallel_training/setup.html) -Then we can start the distributed training by running `learner.py`. +Then we can start the distributed training by running: ```bash -python learner.py +python train.py ``` ### Reference diff --git a/examples/IMPALA/learner.py b/examples/IMPALA/train.py similarity index 100% rename from examples/IMPALA/learner.py rename to examples/IMPALA/train.py