Update some docs. (#51)

* Update model_base.py
* Update README.md
* Update README.md

46188cd4 · Bo Zhou · GitHub · bbde58fb
Showing 3 changed files with 3 additions and 3 deletions

README.md +1 -1

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/README.md +1 -1

parl/framework/model_base.py +1 -1

--- a/README.md
+++ b/README.md
@@ -26,7 +26,7 @@ The main abstractions introduced by PARL that are used to build an agent recursi
 `Algorithm` describes the mechanism to update parameters in `Model` and often contains at least one model.

 ### Agent
-`Agent` is a data bridge between environment and algorithm. It is responsible for data I/O with outside and describes data preprocessing before feeding into the training process.
+`Agent` is a data bridge between environment and algorithm. It is responsible for data I/O with outside and describes data preprocessing before feeding data into the training process.

 Here is an example of building an agent with DQN algorithm for atari games.
 ```python

--- a/examples/NeurIPS2018-AI-for-Prosthetics-Challenge/README.md
+++ b/examples/NeurIPS2018-AI-for-Prosthetics-Challenge/README.md
@@ -128,7 +128,7 @@ python test.py --restore_model_path [MODEL_PATH] --ensemble_num [ENSEMBLE_NUM]
 <img src="image/velocity_distribution.png" alt="PARL" width="800"/>
 </p>

-Following the above steps correctly, you can get an agent that scores around 9960 in round2. Its performance is slightly poorer than our final submitted model. The score gap results from the multi-stage paradigm. As shown in the above Firgure, the target velocity distribution varies after each change of the target velocity. Thus we actually have 4 models for each stage, they are trained for launch stage, first change stage, second change stage, third change stage respectively. These four models are trained successively, this is, the first stage model is trained while the parameters of launch stage are fixed.We do not provide this part of the code, since it reduces the readability of the code. Feel free to post issue if you have any problem:)
+Following the above steps correctly, you can get an agent that scores around 9960 in round2. Its performance is slightly below that of our final submitted model. The score gap results from the multi-stage paradigm. As shown in the figure above, the distribution of possible target velocities keeps changing throughout the episode. We therefore trained 4 models, each aiming to perform well under a different velocity distribution. These four models are trained successively; that is, we train a model that specializes in the start stage (the first 60 frames), then fix this start model for the first 60 frames and train another model for the remaining 940 frames. We do not provide this part of the code, since it reduces the readability of the code. Feel free to post an issue if you have any problems :)

 ## Acknowledgments
 We would like to thank Zhihua Wu, Jingzhou He, Kai Zeng for providing stable computation resources and other colleagues on the Online Learning team for insightful discussions. We are grateful to Tingru Hong, Wenxia Zheng and others for creating a vivid and popular demonstration video.
--- a/parl/framework/model_base.py
+++ b/parl/framework/model_base.py
@@ -121,7 +121,7 @@ class Model(Network):
    """
    A Model is owned by an Algorithm. 
    It implements the entire network model(forward part) to solve a specific problem.
-    In conclusion, Model is responsible for forward and 
+    In general, Model is responsible for forward and 
    Algorithm is responsible for backward.

    Model can also use deepcopy way to construct target model, which has the same structure as initial model.
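
As a side note to the docstring above, the deepcopy remark can be illustrated with a minimal sketch. It uses only the Python standard library; `TinyModel` and its `weights` attribute are hypothetical stand-ins and not part of PARL's API. The point is only that a deep copy has the same structure as the original but owns its parameters, so later updates to the original do not leak into the target.

```python
import copy


class TinyModel:
    """Hypothetical stand-in for a Model: it only implements the forward pass."""

    def __init__(self, weights):
        self.weights = list(weights)

    def forward(self, inputs):
        # Weighted sum of the inputs; no parameter update happens here.
        return sum(w * x for w, x in zip(self.weights, inputs))


model = TinyModel([0.5, -1.0])

# deepcopy yields a target model with the same structure and its own parameters.
target_model = copy.deepcopy(model)

# Changing the original (conceptually the Algorithm's job) leaves the target untouched.
model.weights[0] = 2.0
```

A shallow copy would share the `weights` list between the two objects, which is why the docstring's "deepcopy way" matters for a target model.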