Commit 46188cd4 (unverified), authored by Bo Zhou, committed via GitHub

Update some docs. (#51)

* Update model_base.py

* Update README.md

* Update README.md
Parent bbde58fb
......@@ -26,7 +26,7 @@ The main abstractions introduced by PARL that are used to build an agent recursi
`Algorithm` describes the mechanism to update parameters in `Model` and often contains at least one model.
### Agent
`Agent` is a data bridge between environment and algorithm. It is responsible for data I/O with outside and describes data preprocessing before feeding into the training process.
`Agent` is a data bridge between environment and algorithm. It is responsible for data I/O with outside and describes data preprocessing before feeding data into the training process.
Here is an example of building an agent with DQN algorithm for atari games.
```python
......
......@@ -128,7 +128,7 @@ python test.py --restore_model_path [MODEL_PATH] --ensemble_num [ENSEMBLE_NUM]
<img src="image/velocity_distribution.png" alt="PARL" width="800"/>
</p>
Following the above steps correctly, you can get an agent that scores around 9960 in round2. Its performance is slightly poorer than our final submitted model. The score gap results from the multi-stage paradigm. As shown in the above Figure, the target velocity distribution varies after each change of the target velocity. Thus we actually have 4 models, one for each stage: they are trained for launch stage, first change stage, second change stage, third change stage respectively. These four models are trained successively, that is, the first stage model is trained while the parameters of launch stage are fixed. We do not provide this part of the code, since it reduces the readability of the code. Feel free to post issue if you have any problem :)
Following the above steps correctly, you can get an agent that scores around 9960 in round 2. Its performance is slightly poorer than our final submitted model. The score gap results from the multi-stage paradigm. As shown in the figure above, the distribution of possible target velocities keeps changing throughout the episode. Thus we actually trained 4 models, each aiming to perform well under a different velocity distribution. These four models are trained successively: we first train a model that specializes in the start stage (the first 60 frames), then fix this start model for the first 60 frames and train another model for the remaining 940 frames. We do not provide this part of the code, since it reduces the readability of the code. Feel free to post an issue if you have any problems :)
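The multi-stage code is not part of this repository, so the following is only a minimal sketch of how stage-specific models might be dispatched at inference time. `StagedPolicy`, its arguments, and the lambda policies are hypothetical names introduced for illustration; only the 60-frame boundary comes from the text above, and the boundaries of any later stages are not stated there.

```python
import bisect


class StagedPolicy:
    """Dispatch to a stage-specific policy based on the current frame index.

    Hypothetical helper, not part of the repository: `stage_starts` holds the
    first frame of each stage and `policies` holds one callable per stage.
    """

    def __init__(self, stage_starts, policies):
        if not stage_starts or stage_starts[0] != 0:
            raise ValueError("the first stage must start at frame 0")
        if len(stage_starts) != len(policies):
            raise ValueError("need exactly one policy per stage")
        if list(stage_starts) != sorted(set(stage_starts)):
            raise ValueError("stage starts must be strictly increasing")
        self.stage_starts = list(stage_starts)
        self.policies = list(policies)

    def stage_index(self, frame):
        """Return the index of the stage that contains `frame`."""
        if frame < 0:
            raise ValueError("frame must be non-negative")
        # bisect_right counts how many stage starts are <= frame.
        return bisect.bisect_right(self.stage_starts, frame) - 1

    def act(self, frame, obs):
        """Query the policy that owns `frame` with the observation."""
        return self.policies[self.stage_index(frame)](obs)


# Two-stage example: a start model for frames 0-59, another for the rest.
start_policy = lambda obs: "start"  # stand-in for the frozen start-stage model
rest_policy = lambda obs: "rest"    # stand-in for the model trained afterwards
staged = StagedPolicy([0, 60], [start_policy, rest_policy])
```

During successive training under this sketch, only the policy of the stage being trained would receive updates, while earlier-stage policies are used as-is to roll the episode forward.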
## Acknowledgments
We would like to thank Zhihua Wu, Jingzhou He, Kai Zeng for providing stable computation resources and other colleagues on the Online Learning team for insightful discussions. We are grateful to Tingru Hong, Wenxia Zheng and others for creating a vivid and popular demonstration video.
......@@ -121,7 +121,7 @@ class Model(Network):
"""
A Model is owned by an Algorithm.
It implements the entire network model(forward part) to solve a specific problem.
In conclusion, Model is responsible for forward and
In general, Model is responsible for forward and
Algorithm is responsible for backward.
Model can also use deepcopy way to construct target model, which has the same structure as initial model.
......
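The docstring above mentions building a target model by deep copy. As a rough illustration of that idea only, the sketch below uses a plain Python class rather than the PARL `Model` API; `TinyModel` and `sync_target` are hypothetical names, and the real class may handle copying and parameter syncing differently.

```python
import copy


class TinyModel:
    """Hypothetical stand-in for a Model: a single linear unit."""

    def __init__(self, weights, bias):
        self.weights = list(weights)
        self.bias = bias

    def forward(self, x):
        # Forward pass only; updating parameters is the algorithm's job.
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias


def sync_target(model, target_model):
    """Copy the current parameters of `model` into `target_model`."""
    target_model.weights = list(model.weights)
    target_model.bias = model.bias


model = TinyModel([1, -2], 3)
# Same structure and initial parameters, but fully independent storage.
target_model = copy.deepcopy(model)

# Simulate a parameter update; the target model is unaffected until synced.
model.weights[0] = 5
```

Because the copy is deep, later updates to `model` leave `target_model` unchanged until `sync_target` is called explicitly.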