Commit 4163d732 authored by Bo Zhou, committed by Hongsheng Zeng

update readme for competition folder (#42)

* Update README.md

* add experimental results
Parent: cdb50056
# The Winning Solution for the NeurIPS 2018: AI for Prosthetics Challenge
<p align="center">
<img src="image/competition.png" alt="PARL" width="800"/>
</p>
This folder contains the winning solution of our team `Firework` in the NeurIPS 2018: AI for Prosthetics Challenge. It consists of three parts. The first part is our final submitted model, a sensible controller that can follow a random target velocity. The second part covers curriculum learning, which learns a natural and efficient gait for low-speed walking. The last part trains the final agent in the random-velocity environment for the round2 evaluation.
For more technical details about our solution, we provide:
1. [[Link]](https://youtu.be/RT4JdMsZaTE) An interesting video demonstrating the training process visually.
2. [[Link]](https://docs.google.com/presentation/d/1n9nTfn3EAuw2Z7JichqMMHB1VzNKMgExLJHtS4VwMJg/edit?usp=sharing) A presentation briefly introducing our solution at the NeurIPS 2018 competition workshop.
3. [[Link]](https://drive.google.com/file/d/1W-FmbJu4_8KmwMIzH0GwaFKZ0z1jg_u0/view?usp=sharing) A poster briefly introducing our solution at the NeurIPS 2018 competition workshop.
4. (coming soon) A full academic paper detailing our solution, including the entire training pipeline, related work, and experiments that analyze the importance of each key ingredient.
**Note**: Reproducibility is a long-standing issue in the reinforcement learning field. We have tried to guarantee that our code is reproducible, testing each training sub-task three times. However, some factors still prevent you from achieving exactly the same performance. One problem is deciding when to pick a converged model during curriculum learning: choosing a gait that looks sensible and natural is crucial for subsequent training, but what counts as a good gait varies from person to person.
<p align="center">
<img src="image/demo.gif" alt="PARL" width="500"/>
</p>
This folder contains the code used to train the winning models for the [NeurIPS 2018: AI for Prosthetics Challenge](https://www.crowdai.org/challenges/neurips-2018-ai-for-prosthetics-challenge), along with the resulting models.
## Dependencies
- python3.6
- [paddlepaddle>=1.2.1](https://github.com/PaddlePaddle/Paddle)
- tqdm
- tensorflow (to use tensorboard)
## Part1: Final submitted model
### Result
For the final submission, we tested our model on 500 CPUs, running 10 episodes per CPU with different random seeds.
| Avg reward of all episodes | Avg reward of complete episodes | Falldown rate | Evaluate episodes |
|----------------------------|---------------------------------|---------------|-------------------|
| 9968.5404                  | 9980.3952                       | 0.0026        | 5000              |
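As a reference for how these columns relate, here is a minimal aggregation sketch (the per-episode record format is hypothetical, not part of the released code):

```python
# Hypothetical per-episode records: total reward plus a flag for whether the
# agent fell before the episode finished (format assumed for illustration).
episodes = [
    {"reward": 9981.2, "fell": False},
    {"reward": 4732.5, "fell": True},
    # ... 5000 records in total (500 CPUs * 10 episodes each)
]

avg_all = sum(e["reward"] for e in episodes) / len(episodes)
complete = [e["reward"] for e in episodes if not e["fell"]]
avg_complete = sum(complete) / len(complete)
falldown_rate = 1.0 - len(complete) / len(episodes)

print(f"avg(all)={avg_all:.4f} avg(complete)={avg_complete:.4f} falldown={falldown_rate:.4f}")
```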
### Test
- How to Run
  1. Enter the sub-folder `final_submit`
  2. Download the model file from an online storage service: [Baidu Pan](https://pan.baidu.com/s/1NN1auY2eDblGzUiqR8Bfqw) or [Google Drive](https://drive.google.com/open?id=1DQHrwtXzgFbl9dE7jGOe9ZbY0G9-qfq3)
  3. Unpack the file: `tar zxvf saved_model.tar.gz`
  4. Launch the test script: `python test.py`
## Part2: Curriculum learning
<p align="center">
<img src="image/curriculum-learning.png" alt="PARL" width="500"/>
</p>
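The four stages below can be summarized compactly; the following overview collects the reward types and flag values from the commands in this section (plain data for orientation, no released API assumed):

```python
# Overview of the curriculum: each stage fine-tunes from a model selected at
# the end of the previous stage (values collected from the commands below).
CURRICULUM = [
    {"reward_type": "RunFastest"},
    {"reward_type": "FixedTargetSpeed", "target_v": 3.0,  "act_penalty_lowerbound": 1.5},
    {"reward_type": "FixedTargetSpeed", "target_v": 2.0,  "act_penalty_lowerbound": 0.75},
    {"reward_type": "FixedTargetSpeed", "target_v": 1.25, "act_penalty_lowerbound": 0.6},
]
```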
#### 1. Target: run as fast as possible
<p align="center">
<img src="image/fastest.png" alt="PARL" width="800"/>
</p>
```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1

# client
python simulator_client.py --port [PORT] --ip [IP] --reward_type RunFastest
```
#### 2. Target: run at 3.0 m/s
```bash
# server (fine-tune from the model selected in the previous stage)
python simulator_server.py --port [PORT] --ensemble_num 1

# client
python simulator_client.py --port [PORT] --ip [IP] --reward_type FixedTargetSpeed --target_v 3.0 \
    --act_penalty_lowerbound 1.5
```
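The actual reward shaping lives in the client code, which is not shown here. As a rough sketch of how a FixedTargetSpeed reward with an action-penalty lower bound might look — this is our reading of the flags, not the released implementation:

```python
import numpy as np

def fixed_target_speed_reward(vel_x, action, target_v, act_penalty_lowerbound):
    """Illustrative only (our reading of the flags, NOT the released code):
    penalize deviation from the target velocity and penalize the squared
    action norm, but clip the action penalty from below so the agent gains
    nothing by driving muscle excitations toward zero (unnatural gaits)."""
    vel_penalty = (vel_x - target_v) ** 2
    act_penalty = max(float(np.square(action).sum()), act_penalty_lowerbound)
    return -vel_penalty - act_penalty
```

Under this reading, lowering `--act_penalty_lowerbound` as the target speed drops (1.5 → 0.75 → 0.6) lets the action penalty bind at smaller action norms, pushing toward gentler, smoother gaits.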
#### 3. Target: walk at 2.0 m/s
```bash
# server (fine-tune from the model selected in the previous stage)
python simulator_server.py --port [PORT] --ensemble_num 1

# client
python simulator_client.py --port [PORT] --ip [IP] --reward_type FixedTargetSpeed --target_v 2.0 \
    --act_penalty_lowerbound 0.75
```
#### 4. Target: walk slowly at 1.25 m/s
<p align="center">
<img src="image/last course.png" alt="PARL" width="800"/>
</p>
```bash
# server (fine-tune from the model selected in the previous stage)
python simulator_server.py --port [PORT] --ensemble_num 1

# client
python simulator_client.py --port [PORT] --ip [IP] --reward_type FixedTargetSpeed --target_v 1.25 \
    --act_penalty_lowerbound 0.6
```
## Part3: Training in random velocity environment for round2 evaluation
As mentioned above, the choice of the model used for fine-tuning influences later training. For those who cannot obtain the expected performance through the former steps, we provide a pre-trained model that walks naturally at 1.25 m/s: [Baidu Pan](https://pan.baidu.com/s/1PVDgIe3NuLB-4qI5iSxtKA) or [Google Drive](https://drive.google.com/open?id=1jWzs3wvq7_ierIwGZXc-M92bv1X5eqs7)
```bash
# server (fine-tune from the 1.25 m/s model above)
python simulator_server.py --port [PORT] --ensemble_num [ENSEMBLE_NUM]

# client
python simulator_client.py --port [PORT] --ip [IP] --reward_type Round2 --act_penalty_lowerbound [LOWERBOUND] \
    --act_penalty_coeff 7.0 --vel_penalty_coeff 20.0 --discrete_data --stage 3
```
> To get a higher score, you need to train a separate model for every stage (that is, for each change of the target velocity), keeping the trained models of the previous stages fixed. This part is omitted here; see "Other implementation details" below.
### Test trained model
```bash
python test.py --restore_model_path [MODEL_PATH] --ensemble_num [ENSEMBLE_NUM]
```
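The `--ensemble_num` flag suggests the restored policy combines several heads at evaluation time. A minimal sketch of action averaging across an ensemble (the head callables are hypothetical stand-ins, not the released API):

```python
import numpy as np

def ensemble_action(heads, obs):
    """Average the actions proposed by each head of the ensemble.
    `heads` is a list of callables mapping observation -> action
    (hypothetical stand-ins for the restored model heads)."""
    return np.mean([head(obs) for head in heads], axis=0)

# Toy usage with three fake heads and a 19-dim action
# (the prosthetics environment has 19 muscles to excite).
heads = [lambda obs, i=i: np.tanh(obs[:19] + 0.01 * i) for i in range(3)]
print(ensemble_action(heads, np.zeros(32)).shape)  # (19,)
```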
### Other implementation details
<p align="center">
<img src="image/velocity_distribution.png" alt="PARL" width="800"/>
</p>
Following the above steps correctly, you can get an agent that scores around 9960 in round2, slightly below our final submitted model. The score gap results from the multi-stage paradigm: as shown in the figure above, the target-velocity distribution differs after each change of the target velocity. We therefore actually have four models, one per stage, trained for the launch stage and the first, second, and third change stages respectively. These four models are trained successively; that is, the model for the first change stage is trained while the parameters of the launch-stage model are kept fixed. We do not provide this part of the code, since it would reduce the readability of the code. Feel free to post an issue if you have any problem. :)
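To make the successive-training idea concrete, evaluation with such a setup has to switch models whenever the target velocity changes. A minimal dispatch sketch (all names hypothetical; as noted above, we do not provide this code):

```python
class StageDispatcher:
    """Route observations to one of four stage models: the launch stage plus
    the first, second, and third target-velocity change stages.
    Hypothetical illustration, not the released code."""

    def __init__(self, stage_models):
        assert len(stage_models) == 4
        self.stage_models = stage_models
        self.num_changes = 0
        self.last_target_v = None

    def act(self, obs, target_v):
        # Count how many times the target velocity has changed so far,
        # then use the model trained for that stage.
        if self.last_target_v is not None and target_v != self.last_target_v:
            self.num_changes = min(self.num_changes + 1, 3)
        self.last_target_v = target_v
        return self.stage_models[self.num_changes](obs)

# Toy usage: four stand-in models that just scale the observation.
models = [lambda obs, k=k: k * obs for k in range(4)]
dispatcher = StageDispatcher(models)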
## Acknowledgments
We would like to thank Zhihua Wu, Jingzhou He, and Kai Zeng for providing stable computation resources, and other colleagues on the Online Learning team for insightful discussions. We are grateful to Tingru Hong, Wenxia Zheng, and others for creating a vivid and popular demonstration video.