@@ -26,7 +26,7 @@ The main abstractions introduced by PARL that are used to build an agent recursi
`Algorithm` describes the mechanism to update parameters in `Model` and often contains at least one model.
### Agent
`Agent` is a data bridge between environment and algorithm. It is responsible for data I/O with outside and describes data preprocessing before feeding into the training process.
`Agent` is a data bridge between environment and algorithm. It is responsible for data I/O with outside and describes data preprocessing before feeding data into the training process.
Here is an example of building an agent with DQN algorithm for atari games.
Following the above steps correctly, you can get an agent that scores around 9960 in round2. Its performance is slightly poorer than our final submitted model. The score gap results from the multi-stage paradigm. As shown in the above Firgure, the target velocity distribution varies after each change of the target velocity. Thus we actually have 4 models for each stage, they are trained for launch stage, first change stage, second change stage, third change stage respectively. These four models are trained successively, this is, the first stage model is trained while the parameters of launch stage are fixed.We do not provide this part of the code, since it reduces the readability of the code. Feel free to post issue if you have any problem:)
Following the above steps correctly, you can get an agent that scores around 9960 in round2. Its performance is slightly poorer than our final submitted model. The score gap results from the multi-stage paradigm. As shown in the above Firgure, the distribution of possible target velocity keeps changing throughout the entire episode. Thus we actually have trained 4 models that amis to perform well in different velocity disstribution. These four models are trained successively, this is, we train a model that specializes in start stage(first 60 frames), then we fix this start model at first 60 frames, and train another model for rest 940 frames. We do not provide this part of the code, since it reduces the readability of the code. Feel free to post issue if you have any problem :)
## Acknowledgments
We would like to thank Zhihua Wu, Jingzhou He, Kai Zeng for providing stable computation resources and other colleagues on the Online Learning team for insightful discussions. We are grateful to Tingru Hong, Wenxia Zheng and others for creating a vivid and popular demonstration video.