Unverified commit d8449b74 authored by Bo Zhou, committed by GitHub

update documents (#58)

* Update README.md

* Update train.py

* Update README.md

* Update agent_base.py

* Update train.py

* Update train.py

* Update train.py
Parent 348db1fb
......@@ -71,7 +71,7 @@ agent = AtariAgent(algorithm)
```
pip install --upgrade git+https://github.com/PaddlePaddle/PARL.git
pip install parl
```
# Examples
......
......@@ -137,14 +137,15 @@ def main():
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--rom', help='atari rom', required=True)
parser.add_argument(
'--rom', help='path of the rom of the atari game', required=True)
parser.add_argument(
'--batch_size', type=int, default=64, help='batch size for training')
parser.add_argument(
'--train_total_steps',
type=int,
default=int(1e8),
help='maximum training steps')
        help='maximum number of environment steps to run')
parser.add_argument(
'--test_every_steps',
type=int,
......
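
To make the renamed arguments concrete, here is a rough, self-contained sketch of how they could drive a training loop. The stubbed episode function and the small demo defaults are illustrative only and are not the repository's actual train.py (which also requires `--rom` and defaults `train_total_steps` to 1e8).

```python
import argparse

# Hypothetical stand-in for one episode of environment interaction:
# pretend each episode consumes 1000 environment steps.
def run_train_episode(batch_size):
    return 1000

parser = argparse.ArgumentParser()
parser.add_argument('--batch_size', type=int, default=64)
parser.add_argument('--train_total_steps', type=int, default=5000)
parser.add_argument('--test_every_steps', type=int, default=2000)
args = parser.parse_args()

total_steps, next_test_at = 0, args.test_every_steps
while total_steps < args.train_total_steps:
    total_steps += run_train_episode(args.batch_size)
    if total_steps >= next_test_at:  # periodic evaluation, as test_every_steps suggests
        print('evaluate at environment step', total_steps)
        next_test_at += args.test_every_steps
```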
......@@ -128,7 +128,8 @@ python test.py --restore_model_path [MODEL_PATH] --ensemble_num [ENSEMBLE_NUM]
<img src="image/velocity_distribution.png" alt="PARL" width="800"/>
</p>
Following the above steps correctly, you can get an agent that scores around 9960 in round 2. Its performance is slightly poorer than our final submitted model. The score gap results from the multi-stage paradigm. As shown in the above figure, the distribution of possible target velocities keeps changing throughout the entire episode. Thus we actually trained 4 models that aim to perform well under different velocity distributions. These four models are trained successively; that is, we first train a model that specializes in the start stage (the first 60 frames), then fix this start model for the first 60 frames and train another model for the remaining 940 frames. We do not provide this part of the code, since it would reduce the readability of the code. Feel free to post an issue if you have any problems :)
Following the above steps correctly, you can get an agent that scores around 9960, slightly poorer than our final submitted model. The score gap results from the lack of the multi-stage training paradigm. As shown in the above figure, the distribution of possible target velocities keeps changing throughout the entire episode, which degrades the performance of a single model, since it is hard to fit one model under several different data distributions. Thus we actually trained 4 models that aim to perform well under different velocity distributions. These four models are trained successively; that is, we first train a model that specializes in the start stage (the first 60 frames), then fix this start model for the first 60 frames and train another model for the remaining 940 frames. We do not provide this part of the code, since it would reduce the readability of the code. Feel free to post an issue if you have any problems :)
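
The multi-stage idea described above is not part of the released code; the following hypothetical sketch (the model objects and their `predict` method are placeholders, not PARL classes) shows one way two stage-specific models could be switched by frame index at inference time.

```python
# Hypothetical sketch of the stage-switching idea: pick a policy by frame index.
# `start_model` and `rest_model` stand in for two separately trained policies.
START_STAGE_FRAMES = 60

def select_action(frame_idx, obs, start_model, rest_model):
    if frame_idx < START_STAGE_FRAMES:
        # the model specialized for the start stage handles the first 60 frames
        return start_model.predict(obs)
    # the second model handles the remaining 940 frames of the episode
    return rest_model.predict(obs)
```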
## Acknowledgments
We would like to thank Zhihua Wu, Jingzhou He, Kai Zeng for providing stable computation resources and other colleagues on the Online Learning team for insightful discussions. We are grateful to Tingru Hong, Wenxia Zheng and others for creating a vivid and popular demonstration video.
......@@ -57,6 +57,9 @@ class Agent(object):
"""build your training program and prediction program here,
using the functions define_learn and define_predict in algorithm.
Note that it's unnecessary to call this function explicitly since
it will be called automatically in the initialization function.
To build the program, you may need to do the following:
a. create a new program in fluid with program guard
b. define your data layer
......
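
As a rough illustration of steps (a) and (b) in the docstring above, a subclass's build_program might look like the sketch below; the import path, observation shape, layer name, and the `self.alg.define_predict` call are assumptions based on this commit's file layout, not PARL's actual implementation.

```python
import paddle.fluid as fluid
# Import path assumed from the file layout in this commit; adjust to your PARL version.
from parl.framework.agent_base import Agent

class AtariAgentSketch(Agent):
    def build_program(self):
        self.predict_program = fluid.Program()
        # (a) create a new program in fluid with a program guard
        with fluid.program_guard(self.predict_program):
            # (b) define your data layer (the observation input)
            obs = fluid.layers.data(
                name='obs', shape=[4, 84, 84], dtype='float32')
            # hand the input to the algorithm's prediction graph
            self.predict_action = self.alg.define_predict(obs)
```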