# The Winning Solution for the NeurIPS 2018: AI for Prosthetics Challenge
<p align="center">
<img src="image/competition.png" alt="PARL" width="800"/>
</p>

This folder contains the winning solution of our team `Firework` in the NeurIPS 2018: AI for Prosthetics Challenge. It consists of three parts. The first part is our final submitted model, a sensible controller that can follow random target velocities. The second part is the curriculum-learning pipeline, which learns a natural and efficient gait for low-speed walking. The last part trains the final agent in the random velocity environment for the round 2 evaluation.

For more technical details about our solution, we provide:
1. [[Link]](https://youtu.be/RT4JdMsZaTE) A video demonstrating the training process visually.
2. [[Link]](https://docs.google.com/presentation/d/1n9nTfn3EAuw2Z7JichqMMHB1VzNKMgExLJHtS4VwMJg/edit?usp=sharing) A slide deck briefly introducing our solution at the NeurIPS 2018 competition workshop.
3. [[Link]](https://drive.google.com/file/d/1W-FmbJu4_8KmwMIzH0GwaFKZ0z1jg_u0/view?usp=sharing) A poster briefly introducing our solution at the NeurIPS 2018 competition workshop.
4. (Coming soon) A full academic paper detailing our solution, including the entire training pipeline, related work, and experiments that analyze the importance of each key ingredient.

**Note**: Reproducibility is a long-standing issue in the reinforcement learning field. We have tried to guarantee that our code is reproducible, testing each training sub-task three times. However, some factors may still prevent you from achieving the same performance. One issue is deciding which converged model to keep during curriculum learning: selecting a model with a sensible and natural gait (judged visually) is crucial for subsequent training, but what counts as a good gait varies from person to person.

<p align="center">
<img src="image/demo.gif" alt="PARL" width="500"/>
</p>

## Dependencies
- python3.6
- [paddlepaddle>=1.2.1](https://github.com/PaddlePaddle/Paddle)
- [osim-rl](https://github.com/stanfordnmbl/osim-rl)
- [grpcio==1.12.1](https://grpc.io/docs/quickstart/python.html)
- tqdm
- tensorflow (for TensorBoard visualization)
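
For reference, a typical environment setup might look like the sketch below. It follows the osim-rl installation guide; the conda channel and exact versions are assumptions, so adjust them to your platform.

```bash
# A possible setup (channel and versions are assumptions; adjust as needed).
conda create -n opensim-rl -c kidzik opensim python=3.6
source activate opensim-rl
pip install "paddlepaddle>=1.2.1" osim-rl grpcio==1.12.1 tqdm tensorflow
```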

## Part1: Final submitted model
### Result
For the final submission, we evaluated our model on 500 CPUs, running 10 episodes per CPU with different random seeds.


| Avg reward of all episodes | Avg reward of complete episodes | Falldown ratio | Evaluated episodes |
|----------------------------|---------------------------------|----------------|--------------------|
| 9968.5404                  | 9980.3952                       | 0.0026         | 5000               |

### Test
- How to Run (the steps are also collected into a single shell session below)

  1. Enter the sub-folder `final_submit`
  2. Download the model file from an online storage service: [Baidu Pan](https://pan.baidu.com/s/1NN1auY2eDblGzUiqR8Bfqw) or [Google Drive](https://drive.google.com/open?id=1DQHrwtXzgFbl9dE7jGOe9ZbY0G9-qfq3)
  3. Unpack the file: `tar zxvf saved_model.tar.gz`
  4. Launch the test script: `python test.py`
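
The same steps as a single shell session (assuming `saved_model.tar.gz` has already been downloaded into `final_submit`):

```bash
cd final_submit
tar zxvf saved_model.tar.gz   # unpack the downloaded model
python test.py                # launch the evaluation
```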

## Part2: Curriculum learning

<p align="center">
<img src="image/curriculum-learning.png" alt="PARL" width="500"/>
</p>

#### 1. Target: run as fast as possible

<p align="center">
<img src="image/fastest.png" alt="PARL" width="800"/>
</p>

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1

# client (suggested: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type RunFastest
```

#### 2. Target: run at 3.0 m/s

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
           --restore_model_path [RunFastest model]

# client (suggested: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 3.0 \
           --act_penalty_lowerbound 1.5
```

#### 3. Target: walk at 2.0 m/s

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
           --restore_model_path [FixedTargetSpeed 3.0m/s model]

# client (suggested: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 2.0 \
           --act_penalty_lowerbound 0.75
```

#### 4. Target: walk slowly at 1.25 m/s

<p align="center">
<img src="image/last course.png" alt="PARL" width="800"/>
</p>

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
           --restore_model_path [FixedTargetSpeed 2.0m/s model]

# client (suggested: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 1.25 \
           --act_penalty_lowerbound 0.6
```

## Part3: Training in the random velocity environment for round 2 evaluation
As mentioned above, the choice of the model used for fine-tuning influences later training. For those who cannot reach the expected performance with the previous steps, we provide a pre-trained model that walks naturally at 1.25 m/s ([Baidu Pan](https://pan.baidu.com/s/1PVDgIe3NuLB-4qI5iSxtKA) or [Google Drive](https://drive.google.com/open?id=1jWzs3wvq7_ierIwGZXc-M92bv1X5eqs7)).

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 12 --warm_start_batchs 1000 \
           --restore_model_path [FixedTargetSpeed 1.25m/s model] --restore_from_one_head

# client (suggested: 100+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type Round2 --act_penalty_lowerbound 0.75 \
           --act_penalty_coeff 7.0 --vel_penalty_coeff 20.0 --discrete_data --stage 3
```

### Test trained model

```bash
python test.py --restore_model_path [MODEL_PATH] --ensemble_num [ENSEMBLE_NUM]
```

### Other implementation details

<p align="center">
<img src="image/velocity_distribution.png" alt="PARL" width="800"/>
</p>

Following the above steps correctly, you can obtain an agent that scores around 9960 in round 2, slightly below our final submitted model. The score gap results from a multi-stage paradigm. As shown in the figure above, the distribution of possible target velocities keeps changing throughout the episode. We therefore trained four models, each aiming to perform well under a different velocity distribution. These four models are trained successively; that is, we first train a model that specializes in the start stage (the first 60 frames), then fix this start model for the first 60 frames and train another model for the remaining 940 frames. We do not provide this part of the code, since it would reduce the readability of the codebase. Feel free to open an issue if you have any problem :)
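
To make the schedule concrete, here is a hypothetical sketch of the stage-wise training. This part of the code is not released, so the commands below are assumptions: in particular, reusing `--stage` to select the specialist being trained is our guess based on the Part3 command above, and the restore paths are illustrative.

```bash
# Hypothetical sketch only -- not the released code.
# Stage 0: train the start-stage specialist (first 60 frames).
python simulator_server.py --port [PORT] --ensemble_num 12 --warm_start_batchs 1000 \
           --restore_model_path [FixedTargetSpeed 1.25m/s model] --restore_from_one_head
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type Round2 \
           --act_penalty_lowerbound 0.75 --act_penalty_coeff 7.0 \
           --vel_penalty_coeff 20.0 --discrete_data --stage 0

# Stage 1: fix the start model and train the next specialist on the remaining
# 940 frames, warm-starting from the stage-0 result; stages 2 and 3 would
# repeat the same pattern with later boundaries.
python simulator_server.py --port [PORT] --ensemble_num 12 --warm_start_batchs 1000 \
           --restore_model_path [stage-0 model] --restore_from_one_head
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type Round2 \
           --act_penalty_lowerbound 0.75 --act_penalty_coeff 7.0 \
           --vel_penalty_coeff 20.0 --discrete_data --stage 1
```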

## Acknowledgments
We would like to thank Zhihua Wu, Jingzhou He, and Kai Zeng for providing stable computation resources, and other colleagues on the Online Learning team for insightful discussions. We are grateful to Tingru Hong, Wenxia Zheng, and others for creating a vivid and popular demonstration video.