Created by: chenwhql
The dygraph DataLoader has recently received two optimizations:
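- Optimization 1: https://github.com/PaddlePaddle/Paddle/pull/21634
  - Removed some unreasonable parts of the original DataLoader implementation. In my own test, overall ResNet training was 6.2% faster (compared with the DataLoader before optimization).
- Optimization 2: https://github.com/PaddlePaddle/Paddle/pull/21762
  - Uses child processes to speed up data loading. In my own test, overall ResNet training was 32.2% faster cumulatively (compared with the DataLoader before optimization).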
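To illustrate the general idea behind Optimization 2, here is a minimal, generic producer/consumer sketch in plain Python: a child process does the (simulated) loading and hands batches to the training process through a bounded queue, so loading can overlap with training. This is not Paddle's implementation. `load_batches`, `multiprocess_batches`, `capacity` and `timeout` are hypothetical names chosen for this example, and the explicit `fork` start method limits it to POSIX systems.

```python
import multiprocessing as mp


def load_batches(num_batches, batch_size):
    """Hypothetical stand-in for reading and decoding samples from disk."""
    for b in range(num_batches):
        yield [b * batch_size + i for i in range(batch_size)]


def _producer(queue, num_batches, batch_size):
    # Runs in the child process: load every batch and hand it to the parent.
    for batch in load_batches(num_batches, batch_size):
        queue.put(batch)
    queue.put(None)  # sentinel: no more data


def multiprocess_batches(num_batches, batch_size, capacity=4, timeout=60):
    """Yield batches produced by a child process (hypothetical helper).

    capacity bounds the queue, acting as a small prefetch buffer.
    """
    # "fork" keeps the example self-contained on POSIX; other start methods
    # would need an `if __name__ == "__main__":` guard around the caller.
    ctx = mp.get_context("fork")
    queue = ctx.Queue(maxsize=capacity)
    worker = ctx.Process(target=_producer,
                         args=(queue, num_batches, batch_size), daemon=True)
    worker.start()
    try:
        while True:
            # Raises queue.Empty instead of hanging forever if the child dies.
            batch = queue.get(timeout=timeout)
            if batch is None:
                break
            yield batch
    finally:
        worker.join(timeout=5)
        if worker.is_alive():  # consumer stopped early; child may be blocked
            worker.terminate()


for batch in multiprocess_batches(num_batches=3, batch_size=2):
    print(batch)  # the training step would consume the batch here
```

The bounded queue is the key design choice: the child can run ahead of training by at most `capacity` batches, which hides loading latency without letting memory grow unboundedly.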
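Both PRs have now been merged into develop. This post re-measures the combined effect of the two optimizations on the latest code (the results of this test supersede the numbers above).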
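Test method:
- Pull the models repo, then pull the branch of this PR into your local models repo (I tested against models commit 109a3c75; if there are conflicts, consider checking out that commit or resolve them manually).
- Go back to the dygraph directory, run dataloader_test.sh, and wait for the tests to finish.
- Run parse_dataloader_test_result.py to print the results to the terminal, then compare them.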
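parse_dataloader_test_result.py itself is not shown in this post. As a rough sketch of what such a script could do (an assumption based only on the output format in the appendix), the following walks a log directory and extracts the last `total train time: ... s` value from each file. `TIME_RE`, `parse_train_time` and `collect` are hypothetical names.

```python
import re
from pathlib import Path

# Matches e.g. "total train time: 12.000037908554077 s"
TIME_RE = re.compile(r"total train time: ([0-9]+(?:\.[0-9]+)?) s")


def parse_train_time(log_text):
    """Return the last 'total train time' value found in a log, or None."""
    matches = TIME_RE.findall(log_text)
    return float(matches[-1]) if matches else None


def collect(log_dir):
    """Map every file under log_dir that contains a timing line to its time.

    Handles both single-GPU logs (log_dir/mnist) and multi-GPU logs
    (log_dir/mnist_8/workerlog.0), assuming that layout.
    """
    results = {}
    for path in sorted(Path(log_dir).rglob("*")):
        if path.is_file():
            t = parse_train_time(path.read_text(errors="ignore"))
            if t is not None:
                results[str(path)] = t
    return results


if __name__ == "__main__":
    for name, seconds in collect("./dataloader_test_log").items():
        print("{} - total train time: {} s".format(name, seconds))
```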
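Overview of my test procedure:
- Tested 4 models under models/dygraph: mnist, resnet, se_resnet and transformer.
- Changed the number of epochs to 1 in all of these models to shorten the test; all other parameters were left unchanged.
- Added code to each model that records the time spent in the training part and prints the accumulated total at the end.
- For each model, copied the original train.py into train_sp.py and train_mp.py.
- Both train_sp.py and train_mp.py were changed to load data with DataLoader.
- train_sp.py uses the DataLoader after Optimization 1.
- train_mp.py uses the DataLoader after Optimization 2 (use_multiprocess=True).
- Ran every model for a single epoch, on 1 GPU and on 8 GPUs, and saved the logs locally.
- Read the results from the local logs.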
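The timing code added to each model is not included in this post. A minimal sketch of the idea, assuming the timer wraps the whole train loop so that the time spent fetching batches (from the reader or the DataLoader) is counted together with the training step; otherwise the three loading methods could not be compared. `run_training`, `make_loader` and `train_step` are hypothetical names.

```python
import time


def run_training(make_loader, train_step, num_epochs=1):
    """Accumulate wall-clock time of the train loop, data loading included.

    make_loader: callable returning a fresh iterable of batches per epoch
                 (stands in for reader / DataLoader; hypothetical).
    train_step:  callable doing forward, backward and update for one batch.
    """
    total_train_time = 0.0
    for _ in range(num_epochs):
        start = time.perf_counter()
        for batch in make_loader():  # time spent waiting for data is counted
            train_step(batch)
        total_train_time += time.perf_counter() - start
    # Same format as the lines in the appendix.
    print("total train time: {} s".format(total_train_time))
    return total_train_time
```

Usage: `run_training(lambda: range(100), my_step)` prints one `total train time` line after all epochs, which is what the parse step then picks up.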
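Notes on my test setup and data:
- Dev machine: yq01-gpu-255-137-12-00, 8× P40 GPUs
- CMake command: cmake .. -DPY_VERSION=3.5 -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=ON -DCUDA_ARCH_NAME=Auto -DWITH_TESTING=ON -DWITH_DISTRIBUTE=ON
- GCC version: 5.4.0
- Mapping between table columns and scripts
  - reader column: train.py
  - Single-process DataLoader column: train_sp.py
  - Multiprocess DataLoader column: train_mp.py
- Additional notes
  - The main change is moving the loading of data from disk into separate processes, but creating threads and processes also adds overhead.
  - This optimization only noticeably speeds up training of CV models. NLP models have a light data-loading workload, so they show no clear improvement; for now, NLP models should not use multiprocess mode. Further optimization is planned.
  - Table values are rounded to 3 decimal places.
  - Multi-GPU numbers are taken from card 0.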
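The overhead argument in the notes above can be made concrete with a deliberately simplified cost model (an assumption for illustration, not something measured in this test): without overlap an epoch costs loading plus compute; with a loader process the two stages overlap, but a fixed process/thread overhead is added. When loading is cheap relative to compute, as with transformer, the overhead can outweigh the gain. The input numbers below are made up.

```python
def estimated_epoch_time(load_s, compute_s, overhead_s=0.0, overlapped=False):
    """Toy model: serial loading adds up, overlapped loading hides behind compute.

    All inputs are hypothetical seconds; this does not model Paddle internals.
    """
    if overlapped:
        # Loading runs in another process, so only the longer stage dominates,
        # plus a fixed cost for creating and talking to that process.
        return max(load_s, compute_s) + overhead_s
    return load_s + compute_s


# Made-up inputs: a loading-heavy (CV-like) job vs. a compute-bound (NLP-like) job.
cv_serial = estimated_epoch_time(30.0, 60.0)
cv_overlapped = estimated_epoch_time(30.0, 60.0, overhead_s=2.0, overlapped=True)
nlp_serial = estimated_epoch_time(1.0, 90.0)
nlp_overlapped = estimated_epoch_time(1.0, 90.0, overhead_s=2.0, overlapped=True)
print(cv_serial, cv_overlapped)    # 90.0 62.0
print(nlp_serial, nlp_overlapped)  # 91.0 92.0
```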
1. Single-GPU results (unit: s)
Model | reader | Single-process DataLoader | Multiprocess DataLoader (change vs. reader) |
---|---|---|---|
mnist | 12.000 | 11.432 | 9.448 (-21.3%) |
resnet | 94.408 | 83.520 | 63.586 (-32.6%) |
se_resnext | 180.853 | 134.581 | 128.874 (-28.7%) |
transformer | 93.559 | 93.331 | 93.000 (-0.6%) |
2. 8-GPU results (unit: s)
Model | reader | Single-process DataLoader | Multiprocess DataLoader (change vs. reader) |
---|---|---|---|
mnist | 6.367 | 7.272 | 5.971 (-6.2%) |
resnet | 57.158 | 55.931 | 51.899 (-9.2%) |
se_resnext | 67.845 | 62.330 | 54.590 (-19.5%) |
transformer | 22.807 | 24.197 | 23.726 (+4.0%) |
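The percentage in the last column of both tables is the change in training time of the multiprocess DataLoader relative to the reader baseline, (t_mp - t_reader) / t_reader, where a negative value means faster. A small helper (hypothetical, only for checking the tables) that reproduces the column from the 3-decimal values:

```python
def relative_change(reader_s, loader_s):
    """Percentage change of loader_s relative to reader_s (negative = faster)."""
    return (loader_s - reader_s) / reader_s * 100


def format_change(reader_s, loader_s):
    # One decimal place with an explicit sign, as in the tables.
    return "{:+.1f}%".format(relative_change(reader_s, loader_s))


print(format_change(12.000, 9.448))   # -21.3%
print(format_change(22.807, 23.726))  # +4.0%
```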
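Notes
- Check the GPU environment variable; exposing all 8 cards is recommended: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
- Make sure the GPUs are not being used by other jobs.
- Make sure the CPU is not busy with other heavy jobs either.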
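A minimal pre-flight sketch for the first and third checks, assuming a POSIX machine: it reads CUDA_VISIBLE_DEVICES and compares the 1-minute load average from `os.getloadavg()` with the number of CPU cores. The function names and the 0.5 threshold are hypothetical; checking GPU occupancy (for example with nvidia-smi) is left out.

```python
import os


def visible_gpus(env=None):
    """GPU ids listed in CUDA_VISIBLE_DEVICES (empty list if unset)."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES", "")
    return [d.strip() for d in value.split(",") if d.strip()]


def cpu_looks_busy(threshold=0.5):
    """True if the 1-minute load average exceeds threshold * CPU count (POSIX only)."""
    load1, _, _ = os.getloadavg()
    return load1 > threshold * (os.cpu_count() or 1)


def preflight(expected_gpus=8, env=None):
    """Raise if the visible GPU count is wrong; warn if the CPU is loaded."""
    gpus = visible_gpus(env)
    if len(gpus) != expected_gpus:
        raise RuntimeError(
            "expected {} visible GPUs, got {}; e.g. set "
            "CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7".format(expected_gpus, gpus))
    if cpu_looks_busy():
        print("warning: CPU load is high; timings may be unreliable")
    return gpus
```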
Appendix - Raw test results
λ yq01-gpu-255-137-12-00 /work/models/dygraph {develop} python parse_dataloader_test_result.py
./dataloader_test_log/mnist - total train time: 12.000037908554077 s
./dataloader_test_log/resnet - total train time: 94.40757536888123 s
./dataloader_test_log/se_resnet - total train time: 180.8534700870514 s
./dataloader_test_log/transformer - total train time: 93.55937242507935 s
./dataloader_test_log/mnist_sp - total train time: 11.432420492172241 s
./dataloader_test_log/resnet_sp - total train time: 83.5201735496521 s
./dataloader_test_log/se_resnet_sp - total train time: 134.5808563232422 s
./dataloader_test_log/transformer_sp - total train time: 93.33124494552612 s
./dataloader_test_log/mnist_mp - total train time: 9.447554111480713 s
./dataloader_test_log/resnet_mp - total train time: 63.58643341064453 s
./dataloader_test_log/se_resnet_mp - total train time: 128.87386989593506 s
./dataloader_test_log/transformer_mp - total train time: 93.00028419494629 s
./dataloader_test_log/mnist_8/workerlog.0 - total train time: 6.367103576660156 s
./dataloader_test_log/mnist_8/workerlog.1 - total train time: 6.469170331954956 s
./dataloader_test_log/mnist_8/workerlog.2 - total train time: 6.326692581176758 s
./dataloader_test_log/mnist_8/workerlog.3 - total train time: 6.332724571228027 s
./dataloader_test_log/mnist_8/workerlog.4 - total train time: 6.324522018432617 s
./dataloader_test_log/mnist_8/workerlog.5 - total train time: 6.312472343444824 s
./dataloader_test_log/mnist_8/workerlog.6 - total train time: 6.3679540157318115 s
./dataloader_test_log/mnist_8/workerlog.7 - total train time: 6.320514440536499 s
./dataloader_test_log/resnet_8/workerlog.0 - total train time: 57.158374071121216 s
./dataloader_test_log/resnet_8/workerlog.1 - total train time: 57.0883584022522 s
./dataloader_test_log/resnet_8/workerlog.2 - total train time: 57.085835218429565 s
./dataloader_test_log/resnet_8/workerlog.3 - total train time: 57.082090854644775 s
./dataloader_test_log/resnet_8/workerlog.4 - total train time: 57.22903871536255 s
./dataloader_test_log/resnet_8/workerlog.5 - total train time: 57.0818076133728 s
./dataloader_test_log/resnet_8/workerlog.6 - total train time: 56.99318337440491 s
./dataloader_test_log/resnet_8/workerlog.7 - total train time: 57.07746934890747 s
./dataloader_test_log/se_resnet_8/workerlog.0 - total train time: 67.84486436843872 s
./dataloader_test_log/se_resnet_8/workerlog.1 - total train time: 67.85854125022888 s
./dataloader_test_log/se_resnet_8/workerlog.2 - total train time: 67.86838150024414 s
./dataloader_test_log/se_resnet_8/workerlog.3 - total train time: 67.84253692626953 s
./dataloader_test_log/se_resnet_8/workerlog.4 - total train time: 67.87079882621765 s
./dataloader_test_log/se_resnet_8/workerlog.5 - total train time: 67.87530589103699 s
./dataloader_test_log/se_resnet_8/workerlog.6 - total train time: 67.89710235595703 s
./dataloader_test_log/se_resnet_8/workerlog.7 - total train time: 67.49056339263916 s
./dataloader_test_log/transformer_8/workerlog.0 - total train time: 22.80716848373413 s
./dataloader_test_log/transformer_8/workerlog.1 - total train time: 22.8052077293396 s
./dataloader_test_log/transformer_8/workerlog.2 - total train time: 22.810211420059204 s
./dataloader_test_log/transformer_8/workerlog.3 - total train time: 23.049880027770996 s
./dataloader_test_log/transformer_8/workerlog.4 - total train time: 22.928219318389893 s
./dataloader_test_log/transformer_8/workerlog.5 - total train time: 22.99123525619507 s
./dataloader_test_log/transformer_8/workerlog.6 - total train time: 22.811274528503418 s
./dataloader_test_log/transformer_8/workerlog.7 - total train time: 22.81301498413086 s
./dataloader_test_log/mnist_8_sp/workerlog.0 - total train time: 7.271549224853516 s
./dataloader_test_log/mnist_8_sp/workerlog.1 - total train time: 7.24362587928772 s
./dataloader_test_log/mnist_8_sp/workerlog.2 - total train time: 7.29660964012146 s
./dataloader_test_log/mnist_8_sp/workerlog.3 - total train time: 7.2771148681640625 s
./dataloader_test_log/mnist_8_sp/workerlog.4 - total train time: 7.337551116943359 s
./dataloader_test_log/mnist_8_sp/workerlog.5 - total train time: 7.274338245391846 s
./dataloader_test_log/mnist_8_sp/workerlog.6 - total train time: 7.2760045528411865 s
./dataloader_test_log/mnist_8_sp/workerlog.7 - total train time: 7.299149990081787 s
./dataloader_test_log/resnet_8_sp/workerlog.0 - total train time: 55.93130087852478 s
./dataloader_test_log/resnet_8_sp/workerlog.1 - total train time: 56.52779817581177 s
./dataloader_test_log/resnet_8_sp/workerlog.2 - total train time: 56.52779531478882 s
./dataloader_test_log/resnet_8_sp/workerlog.3 - total train time: 56.63466262817383 s
./dataloader_test_log/resnet_8_sp/workerlog.4 - total train time: 56.64422035217285 s
./dataloader_test_log/resnet_8_sp/workerlog.5 - total train time: 56.527549743652344 s
./dataloader_test_log/resnet_8_sp/workerlog.6 - total train time: 56.52781629562378 s
./dataloader_test_log/resnet_8_sp/workerlog.7 - total train time: 56.52778148651123 s
./dataloader_test_log/se_resnet_8_sp/workerlog.0 - total train time: 62.33003783226013 s
./dataloader_test_log/se_resnet_8_sp/workerlog.1 - total train time: 62.401503801345825 s
./dataloader_test_log/se_resnet_8_sp/workerlog.2 - total train time: 62.092822790145874 s
./dataloader_test_log/se_resnet_8_sp/workerlog.3 - total train time: 62.20855903625488 s
./dataloader_test_log/se_resnet_8_sp/workerlog.4 - total train time: 62.29902935028076 s
./dataloader_test_log/se_resnet_8_sp/workerlog.5 - total train time: 62.26711320877075 s
./dataloader_test_log/se_resnet_8_sp/workerlog.6 - total train time: 62.40175724029541 s
./dataloader_test_log/se_resnet_8_sp/workerlog.7 - total train time: 62.07155179977417 s
./dataloader_test_log/transformer_8_sp/workerlog.0 - total train time: 24.19667410850525 s
./dataloader_test_log/transformer_8_sp/workerlog.1 - total train time: 24.166663885116577 s
./dataloader_test_log/transformer_8_sp/workerlog.2 - total train time: 24.18856978416443 s
./dataloader_test_log/transformer_8_sp/workerlog.3 - total train time: 24.231191873550415 s
./dataloader_test_log/transformer_8_sp/workerlog.4 - total train time: 24.256746292114258 s
./dataloader_test_log/transformer_8_sp/workerlog.5 - total train time: 24.22942304611206 s
./dataloader_test_log/transformer_8_sp/workerlog.6 - total train time: 24.18540906906128 s
./dataloader_test_log/transformer_8_sp/workerlog.7 - total train time: 24.51620316505432 s
./dataloader_test_log/mnist_8_mp/workerlog.0 - total train time: 5.971302270889282 s
./dataloader_test_log/mnist_8_mp/workerlog.1 - total train time: 6.012018203735352 s
./dataloader_test_log/mnist_8_mp/workerlog.2 - total train time: 5.971257209777832 s
./dataloader_test_log/mnist_8_mp/workerlog.3 - total train time: 6.194690227508545 s
./dataloader_test_log/mnist_8_mp/workerlog.4 - total train time: 5.955884218215942 s
./dataloader_test_log/mnist_8_mp/workerlog.5 - total train time: 6.020132303237915 s
./dataloader_test_log/mnist_8_mp/workerlog.6 - total train time: 5.928117513656616 s
./dataloader_test_log/mnist_8_mp/workerlog.7 - total train time: 5.965324878692627 s
./dataloader_test_log/resnet_8_mp/workerlog.0 - total train time: 51.89928698539734 s
./dataloader_test_log/resnet_8_mp/workerlog.1 - total train time: 51.88896584510803 s
./dataloader_test_log/resnet_8_mp/workerlog.2 - total train time: 51.88880944252014 s
./dataloader_test_log/resnet_8_mp/workerlog.3 - total train time: 52.01701879501343 s
./dataloader_test_log/resnet_8_mp/workerlog.4 - total train time: 51.87888717651367 s
./dataloader_test_log/resnet_8_mp/workerlog.5 - total train time: 51.88829278945923 s
./dataloader_test_log/resnet_8_mp/workerlog.6 - total train time: 51.988335609436035 s
./dataloader_test_log/resnet_8_mp/workerlog.7 - total train time: 51.889004707336426 s
./dataloader_test_log/se_resnet_8_mp/workerlog.0 - total train time: 54.58965301513672 s
./dataloader_test_log/se_resnet_8_mp/workerlog.1 - total train time: 54.455613136291504 s
./dataloader_test_log/se_resnet_8_mp/workerlog.2 - total train time: 54.44919466972351 s
./dataloader_test_log/se_resnet_8_mp/workerlog.3 - total train time: 54.455557107925415 s
./dataloader_test_log/se_resnet_8_mp/workerlog.4 - total train time: 54.609676122665405 s
./dataloader_test_log/se_resnet_8_mp/workerlog.5 - total train time: 54.4555549621582 s
./dataloader_test_log/se_resnet_8_mp/workerlog.6 - total train time: 54.413665771484375 s
./dataloader_test_log/se_resnet_8_mp/workerlog.7 - total train time: 54.44967770576477 s
./dataloader_test_log/transformer_8_mp/workerlog.0 - total train time: 23.726247310638428 s
./dataloader_test_log/transformer_8_mp/workerlog.1 - total train time: 24.126054048538208 s
./dataloader_test_log/transformer_8_mp/workerlog.2 - total train time: 23.720133304595947 s
./dataloader_test_log/transformer_8_mp/workerlog.3 - total train time: 23.724708795547485 s
./dataloader_test_log/transformer_8_mp/workerlog.4 - total train time: 23.7517249584198 s
./dataloader_test_log/transformer_8_mp/workerlog.5 - total train time: 23.745847463607788 s
./dataloader_test_log/transformer_8_mp/workerlog.6 - total train time: 23.73387098312378 s
./dataloader_test_log/transformer_8_mp/workerlog.7 - total train time: 23.724374055862427 s
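The 8-GPU tables use card 0. As a quick sanity check on that choice, here is a small sketch (a hypothetical helper, based only on the output format above) that groups these lines by run and reports card 0 next to the slowest card and the spread between cards. In the log above the spread is under one second for every run, although card 0 is not always the slowest (e.g. resnet_8_sp).

```python
import re
from collections import defaultdict

# Matches lines such as:
# ./dataloader_test_log/resnet_8_mp/workerlog.0 - total train time: 51.89928698539734 s
LINE_RE = re.compile(r"^(\S+?)/workerlog\.(\d+) - total train time: ([0-9.]+) s$")


def summarize_cards(lines):
    """Return {run: (card0_time, slowest_time, spread)} for multi-GPU runs."""
    runs = defaultdict(dict)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # single-GPU lines have no workerlog part and are skipped
            runs[m.group(1)][int(m.group(2))] = float(m.group(3))
    summary = {}
    for run, cards in runs.items():
        if 0 in cards:
            slowest = max(cards.values())
            summary[run] = (cards[0], slowest, slowest - min(cards.values()))
    return summary
```

Usage: feed it the output of parse_dataloader_test_result.py line by line, e.g. `summarize_cards(open("result.txt"))`, where `result.txt` is a hypothetical file holding the lines above.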