Add the use python multi-processes to read data in deeplabv3+ !2447

!2447 · Merged · Created Jun 19, 2019 by saxon_zh (@saxon_zh) · 2 of 2 tasks completed

Created by: Xreki

Two things are done in this PR:

  • Set default values for some Paddle flags, such as the GC flags. These settings have been tested in the benchmark for more than a month.

  • Data reading and preprocessing are slow in deeplabv3+, so we use Python multiprocessing to speed them up. Training logs are shown below:

  • develop, 1 GPU

Model is saved to /work/models/PaddleCV/deeplabv3+/output/model
step 0, loss: 2.931588, step_time_cost: 90.530
step 1, loss: 2.992406, step_time_cost: 0.155
step 2, loss: 2.889170, step_time_cost: 0.151
step 3, loss: 2.928668, step_time_cost: 0.149
step 4, loss: 2.867778, step_time_cost: 0.146
step 5, loss: 2.917643, step_time_cost: 0.146
step 6, loss: 2.862031, step_time_cost: 0.146
step 7, loss: 2.900028, step_time_cost: 0.146
step 8, loss: 2.788363, step_time_cost: 0.146
step 9, loss: 2.714418, step_time_cost: 0.146
.
.
.
step 80, loss: 1.859994, step_time_cost: 0.155
step 81, loss: 1.939524, step_time_cost: 0.175
step 82, loss: 1.974417, step_time_cost: 0.171
step 83, loss: 2.802828, step_time_cost: 0.173
step 84, loss: 2.613242, step_time_cost: 0.183
step 85, loss: 1.916699, step_time_cost: 0.166
step 86, loss: 2.218951, step_time_cost: 0.182
step 87, loss: 2.365781, step_time_cost: 0.181
step 88, loss: 1.862683, step_time_cost: 0.172
step 89, loss: 2.137310, step_time_cost: 0.179
step 90, loss: 1.812432, step_time_cost: 0.182
step 91, loss: 2.779344, step_time_cost: 0.176
step 92, loss: 1.489150, step_time_cost: 0.168
step 93, loss: 3.127990, step_time_cost: 0.173
step 94, loss: 1.663515, step_time_cost: 0.250
step 95, loss: 2.504054, step_time_cost: 0.245
step 96, loss: 2.406195, step_time_cost: 0.332
step 97, loss: 1.640830, step_time_cost: 0.301
step 98, loss: 2.027133, step_time_cost: 0.313
step 99, loss: 1.782999, step_time_cost: 0.303
  • develop, 4 GPUs:
Model is saved to /work/models/PaddleCV/deeplabv3+/output/model
step 0, loss: 2.983996, step_time_cost: 99.687
step 1, loss: 2.971399, step_time_cost: 0.306
step 2, loss: 2.944457, step_time_cost: 0.285
step 3, loss: 2.921980, step_time_cost: 0.314
step 4, loss: 2.887958, step_time_cost: 0.264
step 5, loss: 2.875481, step_time_cost: 0.297
step 6, loss: 2.846965, step_time_cost: 0.270
step 7, loss: 2.817877, step_time_cost: 0.268
step 8, loss: 2.750695, step_time_cost: 0.275
step 9, loss: 2.748191, step_time_cost: 0.227
.
.
.
step 80, loss: 1.955167, step_time_cost: 0.684
step 81, loss: 1.801220, step_time_cost: 0.730
step 82, loss: 1.815337, step_time_cost: 0.984
step 83, loss: 2.285317, step_time_cost: 0.716
step 84, loss: 1.681211, step_time_cost: 0.769
step 85, loss: 1.846096, step_time_cost: 0.703
step 86, loss: 1.768831, step_time_cost: 0.700
step 87, loss: 1.848565, step_time_cost: 0.700
step 88, loss: 2.082924, step_time_cost: 0.665
step 89, loss: 2.023702, step_time_cost: 0.710
step 90, loss: 1.891955, step_time_cost: 0.684
step 91, loss: 2.388388, step_time_cost: 0.675
step 92, loss: 1.955183, step_time_cost: 0.661
step 93, loss: 1.663321, step_time_cost: 0.688
step 94, loss: 1.549457, step_time_cost: 0.677
step 95, loss: 1.801949, step_time_cost: 0.690
step 96, loss: 2.006641, step_time_cost: 0.713
step 97, loss: 2.062234, step_time_cost: 0.679
step 98, loss: 1.865428, step_time_cost: 0.806
step 99, loss: 1.550068, step_time_cost: 0.735
  • develop, 8 GPUs (the run crashed):
You can install 'prefetch_generator' for acceleration of data reading.
W0619 07:44:57.616894  2866 parallel_executor.cc:328] The number of CUDAPlace, which is used in ParallelExecutor, is 8. And the Program will be copied 8 copies
---  detected 63 subgraphs
I0619 07:46:31.033058  2866 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1

/work/models/PaddleCV/deeplabv3+/reader.py:39: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  a = a[slices]
Traceback (most recent call last):
  File "train.py", line 218, in <module>
    train_loss, = exe.run(binary, fetch_list=[loss_mean])
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 665, in run
    return_numpy=return_numpy)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 527, in _run_parallel
    exe.run(fetch_var_names, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: Invoke operator conv2d error.
Python Callstacks: 
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 1699, in append_op
    attrs=kwargs.get("attrs", None))
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/layers/nn.py", line 2172, in conv2d
    'fuse_relu_before_depthwise_conv': False
  File "/work/models/PaddleCV/deeplabv3+/models.py", line 87, in conv
    return append_op_result(fluid.layers.conv2d(*args, **kargs), 'conv')
  File "/work/models/PaddleCV/deeplabv3+/models.py", line 223, in entry_flow
    data = conv(data, 32, 3, stride=2, padding=1)
  File "/work/models/PaddleCV/deeplabv3+/models.py", line 320, in deeplabv3p
    data, decode_shortcut = entry_flow(img)
  File "train.py", line 139, in <module>
    logit = deeplabv3p(img)
C++ Callstacks: 
CUDNN_STATUS_INTERNAL_ERROR at [/paddle/paddle/fluid/platform/device_context.cc:217]
PaddlePaddle Call Stacks: 
0       0x7fb4f3b54b90p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 352
1       0x7fb4f3b54f09p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 137
2       0x7fb4f593bc94p paddle::platform::CudnnHolder::CudnnHolder(CUstream_st* const*, paddle::platform::CUDAPlace const&) + 996
  • This PR, 1 GPU:
Model is saved to /work/models/PaddleCV/deeplabv3+/output/model
step 0, loss: 2.912637, step_time_cost: 84.644 s
step 1, loss: 2.953294, step_time_cost: 0.152 s
step 2, loss: 2.960867, step_time_cost: 0.145 s
step 3, loss: 2.903931, step_time_cost: 0.145 s
step 4, loss: 2.878676, step_time_cost: 0.145 s
step 5, loss: 2.867251, step_time_cost: 0.145 s
step 6, loss: 2.771031, step_time_cost: 0.145 s
step 7, loss: 2.716584, step_time_cost: 0.146 s
step 8, loss: 2.858055, step_time_cost: 0.151 s
step 9, loss: 2.667201, step_time_cost: 0.147 s
.
.
.
step 80, loss: 2.292397, step_time_cost: 0.146 s
step 81, loss: 1.690406, step_time_cost: 0.146 s
step 82, loss: 2.052620, step_time_cost: 0.146 s
step 83, loss: 2.411162, step_time_cost: 0.147 s
step 84, loss: 2.740360, step_time_cost: 0.146 s
step 85, loss: 1.674617, step_time_cost: 0.146 s
step 86, loss: 1.910911, step_time_cost: 0.147 s
step 87, loss: 1.513423, step_time_cost: 0.146 s
step 88, loss: 1.196872, step_time_cost: 0.147 s
step 89, loss: 1.315766, step_time_cost: 0.147 s
step 90, loss: 1.827139, step_time_cost: 0.146 s
step 91, loss: 1.918458, step_time_cost: 0.146 s
step 92, loss: 2.161615, step_time_cost: 0.146 s
step 93, loss: 2.564105, step_time_cost: 0.146 s
step 94, loss: 1.961227, step_time_cost: 0.145 s
step 95, loss: 2.553188, step_time_cost: 0.146 s
step 96, loss: 1.538951, step_time_cost: 0.146 s
step 97, loss: 1.376045, step_time_cost: 0.147 s
step 98, loss: 1.565625, step_time_cost: 0.146 s
step 99, loss: 2.470505, step_time_cost: 0.147 s
  • This PR, 8 GPUs:
Model is saved to /work/models/PaddleCV/deeplabv3+/output/model
step 0, loss: 2.956059, step_time_cost: 102.699 s
step 1, loss: 2.947663, step_time_cost: 0.298 s
step 2, loss: 2.929053, step_time_cost: 0.275 s
step 3, loss: 2.918531, step_time_cost: 0.274 s
step 4, loss: 2.901184, step_time_cost: 0.273 s
step 5, loss: 2.887325, step_time_cost: 0.268 s
step 6, loss: 2.835522, step_time_cost: 0.275 s
step 7, loss: 2.834700, step_time_cost: 0.269 s
step 8, loss: 2.760000, step_time_cost: 0.263 s
step 9, loss: 2.746251, step_time_cost: 0.294 s
.
.
.
step 80, loss: 2.015818, step_time_cost: 0.256 s
step 81, loss: 1.977413, step_time_cost: 0.252 s
step 82, loss: 1.945783, step_time_cost: 0.276 s
step 83, loss: 1.794533, step_time_cost: 0.266 s
step 84, loss: 1.996547, step_time_cost: 0.268 s
step 85, loss: 1.904252, step_time_cost: 0.266 s
step 86, loss: 2.275229, step_time_cost: 0.282 s
step 87, loss: 1.786942, step_time_cost: 0.266 s
step 88, loss: 2.130292, step_time_cost: 0.262 s
step 89, loss: 1.796950, step_time_cost: 0.291 s
step 90, loss: 1.877793, step_time_cost: 0.304 s
step 91, loss: 2.187347, step_time_cost: 0.305 s
step 92, loss: 1.790633, step_time_cost: 0.287 s
step 93, loss: 2.159842, step_time_cost: 0.270 s
step 94, loss: 1.959462, step_time_cost: 0.276 s
step 95, loss: 1.524466, step_time_cost: 0.294 s
step 96, loss: 1.591714, step_time_cost: 0.310 s
step 97, loss: 2.138076, step_time_cost: 0.298 s
step 98, loss: 2.048481, step_time_cost: 0.317 s
step 99, loss: 1.692513, step_time_cost: 0.321 s
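
The two changes described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code in this PR: the flag values, the `preprocess` stand-in, and the helper names `set_default_flags` and `multiprocess_reader` are all assumptions made for the example. It also assumes a POSIX system where the `fork` start method is available.

```python
import multiprocessing as mp
import os

# Illustrative defaults only. The flag names are Paddle GC-related flags,
# but the values here are assumptions, not the ones chosen in this PR.
DEFAULT_FLAGS = {
    "FLAGS_eager_delete_tensor_gb": "0.0",
    "FLAGS_fast_eager_deletion_mode": "1",
}


def set_default_flags(defaults=DEFAULT_FLAGS, environ=os.environ):
    """Set each flag only if the user has not already exported it."""
    for name, value in defaults.items():
        environ.setdefault(name, value)


def preprocess(index):
    """Hypothetical stand-in for slow per-sample work (decode, crop, ...)."""
    return index, index * index


def _worker(indices, queue):
    """Preprocess one shard of sample indices and push results to the queue."""
    for i in indices:
        queue.put(preprocess(i))
    queue.put(None)  # sentinel: this worker has finished its shard


def multiprocess_reader(num_samples, num_workers=4, queue_size=64):
    """Yield preprocessed samples produced by `num_workers` processes.

    Samples arrive in whatever order the workers finish them.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX-only start method
    queue = ctx.Queue(maxsize=queue_size)
    workers = [
        ctx.Process(
            target=_worker,
            # Worker w handles indices w, w + num_workers, w + 2 * num_workers, ...
            args=(range(w, num_samples, num_workers), queue),
        )
        for w in range(num_workers)
    ]
    for proc in workers:
        proc.daemon = True  # do not outlive the training process
        proc.start()

    finished = 0
    while finished < num_workers:
        item = queue.get()
        if item is None:
            finished += 1
        else:
            yield item

    for proc in workers:
        proc.join()


if __name__ == "__main__":
    set_default_flags()
    samples = list(multiprocess_reader(num_samples=10, num_workers=3))
    print("read", len(samples), "samples")
```

Using `setdefault` keeps any value the user has already exported, so the defaults only apply when a flag is unset. The bounded queue lets the workers run ahead of the training loop without buffering the whole dataset in memory.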
Reference: paddlepaddle/models!2447
Source branch: github/fork/Xreki/update_deeplab