diff --git a/README.md b/README.md
index a85a171874ed210ffbed2f3185261d5ac47aa298..6eae711131a651f618fece57981af1d003f281f7 100644
--- a/README.md
+++ b/README.md
@@ -26,7 +26,7 @@ The main abstractions introduced by PARL that are used to build an agent recursi
 `Algorithm` describes the mechanism to update parameters in `Model` and often contains at least one model.
 
 ### Agent
-`Agent` is a data bridge between environment and algorithm. It is responsible for data I/O with outside and describes data preprocessing before feeding into the training process.
+`Agent` is a data bridge between the environment and the algorithm. It is responsible for data I/O with the outside world and describes how data is preprocessed before being fed into the training process.
 
 Here is an example of building an agent with DQN algorithm for atari games.
 ```python
diff --git a/examples/NeurIPS2018-AI-for-Prosthetics-Challenge/README.md b/examples/NeurIPS2018-AI-for-Prosthetics-Challenge/README.md
index ad4387533e5eefe9509c31e0c812dd29d361f935..bb41fd295b76171ad007858098c7ce10ce8a6d5a 100644
--- a/examples/NeurIPS2018-AI-for-Prosthetics-Challenge/README.md
+++ b/examples/NeurIPS2018-AI-for-Prosthetics-Challenge/README.md
@@ -128,7 +128,7 @@ python test.py --restore_model_path [MODEL_PATH] --ensemble_num [ENSEMBLE_NUM]
 PARL

-Following the above steps correctly, you can get an agent that scores around 9960 in round2. Its performance is slightly poorer than our final submitted model. The score gap results from the multi-stage paradigm. As shown in the above Firgure, the target velocity distribution varies after each change of the target velocity. Thus we actually have 4 models for each stage, they are trained for launch stage, first change stage, second change stage, third change stage respectively. These four models are trained successively, this is, the first stage model is trained while the parameters of launch stage are fixed.We do not provide this part of the code, since it reduces the readability of the code. Feel free to post issue if you have any problem:)
+Following the above steps correctly, you can get an agent that scores around 9960 in round 2. Its performance is slightly below that of our final submitted model. The score gap results from the multi-stage paradigm. As shown in the figure above, the distribution of possible target velocities keeps changing throughout the episode. We therefore trained 4 models, each aiming to perform well under a different velocity distribution. These four models are trained successively; that is, we first train a model that specializes in the start stage (the first 60 frames), then fix that start model for the first 60 frames and train another model for the remaining 940 frames. We do not provide this part of the code, since it would reduce the readability of the code. Feel free to post an issue if you have any problem :)
 
 ## Acknowledgments
 We would like to thank Zhihua Wu, Jingzhou He, Kai Zeng for providing stable computation resources and other colleagues on the Online Learning team for insightful discussions. We are grateful to Tingru Hong, Wenxia Zheng and others for creating a vivid and popular demonstration video.
diff --git a/parl/framework/model_base.py b/parl/framework/model_base.py
index cba0df8bfc13525206c905947366c156b483727a..b7a896d4cfacf7e9da0712f5a902c68046ad93ea 100644
--- a/parl/framework/model_base.py
+++ b/parl/framework/model_base.py
@@ -121,7 +121,7 @@ class Model(Network):
     """
     A Model is owned by an Algorithm. It implements the entire
     network model(forward part) to solve a specific problem.
-    In conclusion, Model is responsible for forward and
+    In general, Model is responsible for forward and
     Algorithm is responsible for backward.
     Model can also use deepcopy way to construct target model,
     which has the same structure as initial model.
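For context beyond this diff: the `Agent` hunk above describes the abstraction in prose, and the README's actual DQN example is cut off at the opening code fence. Below is a minimal, framework-agnostic sketch of the data-bridge role only; the class name, constructor arguments, and the `predict`/`learn` interface of the algorithm object are illustrative assumptions, not PARL's API.

```python
import numpy as np


class IllustrativeAgent:
    """Illustrative agent: a data bridge between environment and algorithm."""

    def __init__(self, algorithm, obs_dim):
        # `algorithm` is assumed to expose predict(obs) and learn(**batch).
        self.alg = algorithm
        self.obs_dim = obs_dim

    def predict(self, obs):
        # Data preprocessing: cast and reshape the raw observation
        # before it reaches the forward pass.
        obs = np.asarray(obs, dtype=np.float32).reshape(1, self.obs_dim)
        q_values = self.alg.predict(obs)
        return int(np.argmax(q_values))

    def learn(self, obs, act, reward, next_obs, terminal):
        # Batch and cast one transition, then hand it to the algorithm,
        # which owns the backward (parameter-update) pass.
        batch = dict(
            obs=np.asarray(obs, dtype=np.float32),
            act=np.asarray(act, dtype=np.int64),
            reward=np.asarray(reward, dtype=np.float32),
            next_obs=np.asarray(next_obs, dtype=np.float32),
            terminal=np.asarray(terminal, dtype=np.float32),
        )
        return self.alg.learn(**batch)
```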
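The NeurIPS2018 hunk above describes training one model specialized for the start stage (the first 60 frames) and another for the remaining 940 frames, with the multi-stage code deliberately left out of the repository. The sketch below only illustrates how such stage-specific policies could be switched during a rollout; the gym-style `env.step` interface, the function name, and the two-policy simplification (the actual submission used four stage models) are assumptions made for illustration.

```python
def run_episode(env, start_policy, rest_policy, start_frames=60, max_frames=1000):
    """Roll out one episode, switching policies at the stage boundary."""
    obs = env.reset()
    total_reward = 0.0
    for frame in range(max_frames):
        # The launch-stage model acts for the first `start_frames` frames,
        # the second model acts for the rest of the episode.
        policy = start_policy if frame < start_frames else rest_policy
        action = policy(obs)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```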
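The `model_base.py` docstring notes that a Model can use deepcopy to construct a target model with the same structure as the initial model. The toy sketch below shows that pattern together with a soft parameter update; the `QModel` class, its plain-list weights, and the `sync_weights_to` method are invented for the example and are not PARL's implementation.

```python
import copy


class QModel:
    """Toy model whose parameters are a plain list of floats."""

    def __init__(self, weights):
        self.weights = list(weights)

    def sync_weights_to(self, target, decay=0.0):
        # Soft update: target = decay * target + (1 - decay) * source.
        target.weights = [
            decay * t + (1.0 - decay) * s
            for t, s in zip(target.weights, self.weights)
        ]


model = QModel([0.1, 0.2, 0.3])
# deepcopy yields a target model with the same structure and initial values.
target_model = copy.deepcopy(model)
model.weights = [0.5, 0.6, 0.7]
model.sync_weights_to(target_model, decay=0.9)
```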