[Test] Dygraph DataLoader Phased Optimization !4207

!4207 Open. Created Jan 16, 2020 by saxon_zh (@saxon_zh)

Created by: chenwhql

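The dygraph (dynamic graph) DataLoader has recently been optimized in two rounds:

  • Optimization 1: https://github.com/PaddlePaddle/Paddle/pull/21634
    • Removed several unreasonable parts of the original DataLoader implementation. In my own test, overall ResNet training was 6.2% faster than with the pre-optimization DataLoader.
  • Optimization 2: https://github.com/PaddlePaddle/Paddle/pull/21762
    • Uses a child process to speed up data loading. In my own test, overall ResNet training was cumulatively 32.2% faster than with the pre-optimization DataLoader.

Both PRs have been merged into develop. This PR tests the combined effect of the two optimizations on the latest code. Where the numbers differ from those above, the results of this test take precedence.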

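Test method:

  1. Pull the models repo, then pull this PR's branch into it. (I tested against models version 109a3c75. If there are conflicts, consider switching to that version or resolving them manually.)
  2. Go back to the dygraph directory, run dataloader_test.sh, and wait for the results.
  3. Run parse_dataloader_test_result.py to print the results to the terminal for comparison.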
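The result-parsing step above can be sketched as follows. This is a minimal illustration, not the actual parse_dataloader_test_result.py from this PR: it assumes each training log contains a line of the form `total train time: <seconds> s`, as suggested by the raw output in the appendix. The names `LINE_RE` and `parse_train_time` are hypothetical.

```python
import re

# Assumed log line format, inferred from the raw results in the appendix.
LINE_RE = re.compile(r"total train time: ([0-9.]+) s")


def parse_train_time(log_text):
    """Return the last reported total train time in seconds, or None if absent."""
    matches = LINE_RE.findall(log_text)
    return float(matches[-1]) if matches else None


# Hypothetical log content standing in for a file under ./dataloader_test_log/.
sample_log = "epoch 0 finished\ntotal train time: 12.000037908554077 s\n"
print("mnist - total train time: {} s".format(parse_train_time(sample_log)))
```

Taking the last match rather than the first is a design choice that tolerates logs where the line is printed more than once.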

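Overview of my test procedure:

  1. Tested four models under models/dygraph: mnist, resnet, se_resnet, and transformer.
  2. Set the number of epochs to 1 in each model's code to shorten the test. All other parameters are unchanged.
  3. Added timing code to each model that accumulates the time spent in the train section and prints the total at the end.
  4. For each model, copied the original train.py to create train_sp.py and train_mp.py:
    • Both train_sp.py and train_mp.py load data through DataLoader.
    • train_sp.py uses the DataLoader as of Optimization 1.
    • train_mp.py uses the DataLoader as of Optimization 2 (use_multiprocess=True).
  5. Ran every model for a single epoch, on one card and on 8 cards, and saved the logs locally.
  6. Read the test results from the local logs.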
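The timing code mentioned in step 3 could look roughly like the sketch below. It is an assumption about the approach, not the code added in this PR: `TrainTimer` is a hypothetical helper, and `time.sleep` stands in for a real training step.

```python
import time


class TrainTimer:
    """Accumulates wall-clock time spent inside the timed sections."""

    def __init__(self):
        self.total = 0.0
        self._start = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Add the elapsed time of this section to the running total.
        self.total += time.perf_counter() - self._start
        return False  # do not swallow exceptions from the timed section


timer = TrainTimer()
for batch_id in range(3):
    with timer:
        time.sleep(0.01)  # stand-in for one training step

# Print the accumulated time once, at the end of training.
print("total train time: {} s".format(timer.total))
```

Timing only the train section keeps model construction and evaluation out of the comparison between the three data-loading variants.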

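Notes on my test data:

  • Dev machine: yq01-gpu-255-137-12-00, 8 × P40
  • CMake command: cmake .. -DPY_VERSION=3.5 -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=ON -DCUDA_ARCH_NAME=Auto -DWITH_TESTING=ON -DWITH_DISTRIBUTE=ON
  • GCC version: 5.4.0
  • Column-to-script mapping
    • reader column: train.py
    • single-process DataLoader column: train_sp.py
    • multi-process DataLoader column: train_mp.py
  • Additional notes
    • The main optimization so far moves loading data from disk into a child process, but creating threads and processes also adds overhead.
    • The speedup is noticeable mainly for CV models. NLP models have a light data-loading workload, so they see no clear benefit. For now, multi-process mode is not recommended for NLP models. Further optimization will follow.
    • Values in the tables are rounded to 3 decimal places.
    • Multi-card results report the time of card 0.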
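The idea of loading data in a child process can be illustrated with the standard library alone. The sketch below is not Paddle's implementation. `sample_reader`, `iterate_in_subprocess`, and `capacity` are hypothetical names, and the code assumes a POSIX platform where the `fork` start method is available.

```python
import multiprocessing as mp


def sample_reader():
    """Hypothetical reader; each yield stands in for one sample read from disk."""
    for i in range(8):
        yield i, i * i


def _worker(reader, queue):
    # Runs in the child process: push every sample, then a sentinel.
    for sample in reader():
        queue.put(sample)
    queue.put(None)


def iterate_in_subprocess(reader, capacity=4):
    """Yield the reader's samples while a child process produces them."""
    ctx = mp.get_context("fork")  # assumption: POSIX platform
    queue = ctx.Queue(maxsize=capacity)
    proc = ctx.Process(target=_worker, args=(reader, queue))
    proc.daemon = True
    proc.start()
    try:
        while True:
            sample = queue.get()
            if sample is None:  # sentinel: the reader is exhausted
                break
            yield sample
    finally:
        # Stop the child if the consumer exits early, then reap it.
        if proc.is_alive():
            proc.terminate()
        proc.join()


for sample in iterate_in_subprocess(sample_reader):
    print(sample)
```

Every sample is pickled and sent through the queue, and a process must be started, which is the kind of overhead noted above. When loading is cheap, that cost can outweigh the benefit.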

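1. Single-card results (seconds)

| Model | reader | Single-process DataLoader | Multi-process DataLoader (vs. reader) |
|---|---|---|---|
| mnist | 12.000 | 11.432 | 9.448 (-21.3%) |
| resnet | 94.407 | 83.520 | 63.586 (-32.6%) |
| se_resnet | 180.853 | 134.581 | 128.874 (-28.7%) |
| transformer | 93.559 | 93.331 | 93.000 (-0.6%) |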

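2. 8-card results (seconds)

| Model | reader | Single-process DataLoader | Multi-process DataLoader (vs. reader) |
|---|---|---|---|
| mnist | 6.367 | 7.272 | 5.971 (-6.2%) |
| resnet | 57.158 | 55.931 | 51.899 (-9.2%) |
| se_resnet | 67.845 | 62.330 | 54.590 (-19.5%) |
| transformer | 22.807 | 24.197 | 23.726 (+4.0%) |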
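The percentages in parentheses can be recomputed from the raw times in the appendix. The sketch below assumes they are the relative change of the multi-process time against the reader time, rounded to one decimal place. `relative_change` is a hypothetical helper name.

```python
def relative_change(baseline_s, candidate_s):
    """Percentage change of candidate vs. baseline, rounded to one decimal place."""
    return round((candidate_s - baseline_s) / baseline_s * 100.0, 1)


# Single-card resnet: reader vs. multi-process DataLoader (raw values from the appendix).
print(relative_change(94.40757536888123, 63.58643341064453))

# 8-card transformer, card 0: reader vs. multi-process DataLoader.
print(relative_change(22.80716848373413, 23.726247310638428))
```

With the appendix values, the two calls give -32.6 and 4.0. A negative value means the multi-process DataLoader run was faster than the reader baseline.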

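Notes

  • Check the GPU environment variable. Setting it to all 8 cards is recommended: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
  • Make sure the GPUs are not being used by other jobs.
  • Make sure the CPU is not busy with other heavy jobs either.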
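The first check above can be automated before launching the test. This is a minimal sketch, not part of dataloader_test.sh: `visible_gpu_count` is a hypothetical helper, and it only inspects the environment variable, not actual GPU utilization.

```python
import os


def visible_gpu_count(env=None):
    """Number of device ids listed in CUDA_VISIBLE_DEVICES, or None if it is unset."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return len([d for d in value.split(",") if d.strip()])


count = visible_gpu_count()
if count != 8:
    print("CUDA_VISIBLE_DEVICES lists {} devices; 8 are recommended".format(count))
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real environment.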

Appendix - raw test results

λ yq01-gpu-255-137-12-00 /work/models/dygraph {develop} python parse_dataloader_test_result.py
./dataloader_test_log/mnist - total train time: 12.000037908554077 s
./dataloader_test_log/resnet - total train time: 94.40757536888123 s
./dataloader_test_log/se_resnet - total train time: 180.8534700870514 s
./dataloader_test_log/transformer - total train time: 93.55937242507935 s
./dataloader_test_log/mnist_sp - total train time: 11.432420492172241 s
./dataloader_test_log/resnet_sp - total train time: 83.5201735496521 s
./dataloader_test_log/se_resnet_sp - total train time: 134.5808563232422 s
./dataloader_test_log/transformer_sp - total train time: 93.33124494552612 s
./dataloader_test_log/mnist_mp - total train time: 9.447554111480713 s
./dataloader_test_log/resnet_mp - total train time: 63.58643341064453 s
./dataloader_test_log/se_resnet_mp - total train time: 128.87386989593506 s
./dataloader_test_log/transformer_mp - total train time: 93.00028419494629 s
./dataloader_test_log/mnist_8/workerlog.0 - total train time: 6.367103576660156 s
./dataloader_test_log/mnist_8/workerlog.1 - total train time: 6.469170331954956 s
./dataloader_test_log/mnist_8/workerlog.2 - total train time: 6.326692581176758 s
./dataloader_test_log/mnist_8/workerlog.3 - total train time: 6.332724571228027 s
./dataloader_test_log/mnist_8/workerlog.4 - total train time: 6.324522018432617 s
./dataloader_test_log/mnist_8/workerlog.5 - total train time: 6.312472343444824 s
./dataloader_test_log/mnist_8/workerlog.6 - total train time: 6.3679540157318115 s
./dataloader_test_log/mnist_8/workerlog.7 - total train time: 6.320514440536499 s
./dataloader_test_log/resnet_8/workerlog.0 - total train time: 57.158374071121216 s
./dataloader_test_log/resnet_8/workerlog.1 - total train time: 57.0883584022522 s
./dataloader_test_log/resnet_8/workerlog.2 - total train time: 57.085835218429565 s
./dataloader_test_log/resnet_8/workerlog.3 - total train time: 57.082090854644775 s
./dataloader_test_log/resnet_8/workerlog.4 - total train time: 57.22903871536255 s
./dataloader_test_log/resnet_8/workerlog.5 - total train time: 57.0818076133728 s
./dataloader_test_log/resnet_8/workerlog.6 - total train time: 56.99318337440491 s
./dataloader_test_log/resnet_8/workerlog.7 - total train time: 57.07746934890747 s
./dataloader_test_log/se_resnet_8/workerlog.0 - total train time: 67.84486436843872 s
./dataloader_test_log/se_resnet_8/workerlog.1 - total train time: 67.85854125022888 s
./dataloader_test_log/se_resnet_8/workerlog.2 - total train time: 67.86838150024414 s
./dataloader_test_log/se_resnet_8/workerlog.3 - total train time: 67.84253692626953 s
./dataloader_test_log/se_resnet_8/workerlog.4 - total train time: 67.87079882621765 s
./dataloader_test_log/se_resnet_8/workerlog.5 - total train time: 67.87530589103699 s
./dataloader_test_log/se_resnet_8/workerlog.6 - total train time: 67.89710235595703 s
./dataloader_test_log/se_resnet_8/workerlog.7 - total train time: 67.49056339263916 s
./dataloader_test_log/transformer_8/workerlog.0 - total train time: 22.80716848373413 s
./dataloader_test_log/transformer_8/workerlog.1 - total train time: 22.8052077293396 s
./dataloader_test_log/transformer_8/workerlog.2 - total train time: 22.810211420059204 s
./dataloader_test_log/transformer_8/workerlog.3 - total train time: 23.049880027770996 s
./dataloader_test_log/transformer_8/workerlog.4 - total train time: 22.928219318389893 s
./dataloader_test_log/transformer_8/workerlog.5 - total train time: 22.99123525619507 s
./dataloader_test_log/transformer_8/workerlog.6 - total train time: 22.811274528503418 s
./dataloader_test_log/transformer_8/workerlog.7 - total train time: 22.81301498413086 s
./dataloader_test_log/mnist_8_sp/workerlog.0 - total train time: 7.271549224853516 s
./dataloader_test_log/mnist_8_sp/workerlog.1 - total train time: 7.24362587928772 s
./dataloader_test_log/mnist_8_sp/workerlog.2 - total train time: 7.29660964012146 s
./dataloader_test_log/mnist_8_sp/workerlog.3 - total train time: 7.2771148681640625 s
./dataloader_test_log/mnist_8_sp/workerlog.4 - total train time: 7.337551116943359 s
./dataloader_test_log/mnist_8_sp/workerlog.5 - total train time: 7.274338245391846 s
./dataloader_test_log/mnist_8_sp/workerlog.6 - total train time: 7.2760045528411865 s
./dataloader_test_log/mnist_8_sp/workerlog.7 - total train time: 7.299149990081787 s
./dataloader_test_log/resnet_8_sp/workerlog.0 - total train time: 55.93130087852478 s
./dataloader_test_log/resnet_8_sp/workerlog.1 - total train time: 56.52779817581177 s
./dataloader_test_log/resnet_8_sp/workerlog.2 - total train time: 56.52779531478882 s
./dataloader_test_log/resnet_8_sp/workerlog.3 - total train time: 56.63466262817383 s
./dataloader_test_log/resnet_8_sp/workerlog.4 - total train time: 56.64422035217285 s
./dataloader_test_log/resnet_8_sp/workerlog.5 - total train time: 56.527549743652344 s
./dataloader_test_log/resnet_8_sp/workerlog.6 - total train time: 56.52781629562378 s
./dataloader_test_log/resnet_8_sp/workerlog.7 - total train time: 56.52778148651123 s
./dataloader_test_log/se_resnet_8_sp/workerlog.0 - total train time: 62.33003783226013 s
./dataloader_test_log/se_resnet_8_sp/workerlog.1 - total train time: 62.401503801345825 s
./dataloader_test_log/se_resnet_8_sp/workerlog.2 - total train time: 62.092822790145874 s
./dataloader_test_log/se_resnet_8_sp/workerlog.3 - total train time: 62.20855903625488 s
./dataloader_test_log/se_resnet_8_sp/workerlog.4 - total train time: 62.29902935028076 s
./dataloader_test_log/se_resnet_8_sp/workerlog.5 - total train time: 62.26711320877075 s
./dataloader_test_log/se_resnet_8_sp/workerlog.6 - total train time: 62.40175724029541 s
./dataloader_test_log/se_resnet_8_sp/workerlog.7 - total train time: 62.07155179977417 s
./dataloader_test_log/transformer_8_sp/workerlog.0 - total train time: 24.19667410850525 s
./dataloader_test_log/transformer_8_sp/workerlog.1 - total train time: 24.166663885116577 s
./dataloader_test_log/transformer_8_sp/workerlog.2 - total train time: 24.18856978416443 s
./dataloader_test_log/transformer_8_sp/workerlog.3 - total train time: 24.231191873550415 s
./dataloader_test_log/transformer_8_sp/workerlog.4 - total train time: 24.256746292114258 s
./dataloader_test_log/transformer_8_sp/workerlog.5 - total train time: 24.22942304611206 s
./dataloader_test_log/transformer_8_sp/workerlog.6 - total train time: 24.18540906906128 s
./dataloader_test_log/transformer_8_sp/workerlog.7 - total train time: 24.51620316505432 s
./dataloader_test_log/mnist_8_mp/workerlog.0 - total train time: 5.971302270889282 s
./dataloader_test_log/mnist_8_mp/workerlog.1 - total train time: 6.012018203735352 s
./dataloader_test_log/mnist_8_mp/workerlog.2 - total train time: 5.971257209777832 s
./dataloader_test_log/mnist_8_mp/workerlog.3 - total train time: 6.194690227508545 s
./dataloader_test_log/mnist_8_mp/workerlog.4 - total train time: 5.955884218215942 s
./dataloader_test_log/mnist_8_mp/workerlog.5 - total train time: 6.020132303237915 s
./dataloader_test_log/mnist_8_mp/workerlog.6 - total train time: 5.928117513656616 s
./dataloader_test_log/mnist_8_mp/workerlog.7 - total train time: 5.965324878692627 s
./dataloader_test_log/resnet_8_mp/workerlog.0 - total train time: 51.89928698539734 s
./dataloader_test_log/resnet_8_mp/workerlog.1 - total train time: 51.88896584510803 s
./dataloader_test_log/resnet_8_mp/workerlog.2 - total train time: 51.88880944252014 s
./dataloader_test_log/resnet_8_mp/workerlog.3 - total train time: 52.01701879501343 s
./dataloader_test_log/resnet_8_mp/workerlog.4 - total train time: 51.87888717651367 s
./dataloader_test_log/resnet_8_mp/workerlog.5 - total train time: 51.88829278945923 s
./dataloader_test_log/resnet_8_mp/workerlog.6 - total train time: 51.988335609436035 s
./dataloader_test_log/resnet_8_mp/workerlog.7 - total train time: 51.889004707336426 s
./dataloader_test_log/se_resnet_8_mp/workerlog.0 - total train time: 54.58965301513672 s
./dataloader_test_log/se_resnet_8_mp/workerlog.1 - total train time: 54.455613136291504 s
./dataloader_test_log/se_resnet_8_mp/workerlog.2 - total train time: 54.44919466972351 s
./dataloader_test_log/se_resnet_8_mp/workerlog.3 - total train time: 54.455557107925415 s
./dataloader_test_log/se_resnet_8_mp/workerlog.4 - total train time: 54.609676122665405 s
./dataloader_test_log/se_resnet_8_mp/workerlog.5 - total train time: 54.4555549621582 s
./dataloader_test_log/se_resnet_8_mp/workerlog.6 - total train time: 54.413665771484375 s
./dataloader_test_log/se_resnet_8_mp/workerlog.7 - total train time: 54.44967770576477 s
./dataloader_test_log/transformer_8_mp/workerlog.0 - total train time: 23.726247310638428 s
./dataloader_test_log/transformer_8_mp/workerlog.1 - total train time: 24.126054048538208 s
./dataloader_test_log/transformer_8_mp/workerlog.2 - total train time: 23.720133304595947 s
./dataloader_test_log/transformer_8_mp/workerlog.3 - total train time: 23.724708795547485 s
./dataloader_test_log/transformer_8_mp/workerlog.4 - total train time: 23.7517249584198 s
./dataloader_test_log/transformer_8_mp/workerlog.5 - total train time: 23.745847463607788 s
./dataloader_test_log/transformer_8_mp/workerlog.6 - total train time: 23.73387098312378 s
./dataloader_test_log/transformer_8_mp/workerlog.7 - total train time: 23.724374055862427 s
Reference: paddlepaddle/models!4207
Source branch: github/fork/chenwhql/dygraph/dataloader_test