Created by: qingqing01
Fix https://github.com/PaddlePaddle/models/issues/1017
Before optimization:
Experiment | GPUs | Batch size per GPU | Load data? | Data processing + copy | Run time | Time for 10 batches |
---|---|---|---|---|---|---|
Executor | 1 | 5 | Yes | 7.45 | 18.94 | 27.12 |
Executor | 1 | 5 | No | 0 | 18.77 | 19.60 |
ParallelExe | 2 | 5 | Yes | 16.03 | 25.96 | 41.99 |
ParallelExe | 2 | 5 | No | 0 | 25.56 | 25.57 |
ParallelExe | 4 | 5 | Yes | 31.15 | 32.43 | 63.58 |
ParallelExe | 4 | 5 | No | 0 | 30.48 | 30.47 |
After optimization:
Experiment | GPUs | Batch size per GPU | Load data? | Data processing + copy | Run time | Time for 10 batches |
---|---|---|---|---|---|---|
Executor | 1 | 5 | Yes | 0.59 | 18.43 | 20.03 |
Executor | 1 | 5 | No | 0 | 18.77 | 19.60 |
ParallelExe | 2 | 5 | Yes | 1.76 | 26.06 | 27.82 |
ParallelExe | 2 | 5 | No | 0 | 25.10 | 25.11 |
ParallelExe | 4 | 5 | Yes | 3.34 | 34.45 | 37.81 |
ParallelExe | 4 | 5 | No | 0 | 31.28 | 31.29 |
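- Single machine, 1 GPU, total time with data loading: 27.12 -> 20.03, about 35% faster (27.12 / 20.03 ≈ 1.35x)
- Single machine, 4 GPUs, total time with data loading: 63.58 -> 37.81, about 68% faster (63.58 / 37.81 ≈ 1.68x)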
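
For reference, the percentages above appear to be throughput gains (old time divided by new time, minus one), not reductions in wall-clock time. The sketch below recomputes both readings from the totals in the tables; the helper names `speedup_percent` and `time_reduction_percent` are made up for illustration and are not part of the PR.

```python
def speedup_percent(before, after):
    """Throughput gain in percent when total time drops from `before` to `after`."""
    return (before / after - 1.0) * 100.0


def time_reduction_percent(before, after):
    """Share of the original wall-clock time that was saved, in percent."""
    return (1.0 - after / before) * 100.0


# Totals for 10 batches with data loading enabled, copied from the tables above.
totals = {
    "1 GPU": (27.12, 20.03),
    "4 GPUs": (63.58, 37.81),
}

for name, (before, after) in totals.items():
    print(
        f"{name}: {speedup_percent(before, after):.1f}% faster, "
        f"{time_reduction_percent(before, after):.1f}% less time"
    )
```

Read this way, the 4-GPU case spends roughly 40% less time per 10 batches, which corresponds to the 68% figure quoted as a speedup.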