Created by: Xreki
Two things are done in this PR:
- Set the default values of some Paddle flags, such as the GC flags, which have been tested in the benchmark for more than a month.
- Reading and preprocessing data is slow in deeplabv3+, so we use Python multiprocessing to speed it up. Training logs are shown as follows:
- develop, 1 GPU:
Model is saved to /work/models/PaddleCV/deeplabv3+/output/model
step 0, loss: 2.931588, step_time_cost: 90.530
step 1, loss: 2.992406, step_time_cost: 0.155
step 2, loss: 2.889170, step_time_cost: 0.151
step 3, loss: 2.928668, step_time_cost: 0.149
step 4, loss: 2.867778, step_time_cost: 0.146
step 5, loss: 2.917643, step_time_cost: 0.146
step 6, loss: 2.862031, step_time_cost: 0.146
step 7, loss: 2.900028, step_time_cost: 0.146
step 8, loss: 2.788363, step_time_cost: 0.146
step 9, loss: 2.714418, step_time_cost: 0.146
.
.
.
step 80, loss: 1.859994, step_time_cost: 0.155
step 81, loss: 1.939524, step_time_cost: 0.175
step 82, loss: 1.974417, step_time_cost: 0.171
step 83, loss: 2.802828, step_time_cost: 0.173
step 84, loss: 2.613242, step_time_cost: 0.183
step 85, loss: 1.916699, step_time_cost: 0.166
step 86, loss: 2.218951, step_time_cost: 0.182
step 87, loss: 2.365781, step_time_cost: 0.181
step 88, loss: 1.862683, step_time_cost: 0.172
step 89, loss: 2.137310, step_time_cost: 0.179
step 90, loss: 1.812432, step_time_cost: 0.182
step 91, loss: 2.779344, step_time_cost: 0.176
step 92, loss: 1.489150, step_time_cost: 0.168
step 93, loss: 3.127990, step_time_cost: 0.173
step 94, loss: 1.663515, step_time_cost: 0.250
step 95, loss: 2.504054, step_time_cost: 0.245
step 96, loss: 2.406195, step_time_cost: 0.332
step 97, loss: 1.640830, step_time_cost: 0.301
step 98, loss: 2.027133, step_time_cost: 0.313
step 99, loss: 1.782999, step_time_cost: 0.303
- develop, 4 GPUs:
Model is saved to /work/models/PaddleCV/deeplabv3+/output/model
step 0, loss: 2.983996, step_time_cost: 99.687
step 1, loss: 2.971399, step_time_cost: 0.306
step 2, loss: 2.944457, step_time_cost: 0.285
step 3, loss: 2.921980, step_time_cost: 0.314
step 4, loss: 2.887958, step_time_cost: 0.264
step 5, loss: 2.875481, step_time_cost: 0.297
step 6, loss: 2.846965, step_time_cost: 0.270
step 7, loss: 2.817877, step_time_cost: 0.268
step 8, loss: 2.750695, step_time_cost: 0.275
step 9, loss: 2.748191, step_time_cost: 0.227
.
.
.
step 80, loss: 1.955167, step_time_cost: 0.684
step 81, loss: 1.801220, step_time_cost: 0.730
step 82, loss: 1.815337, step_time_cost: 0.984
step 83, loss: 2.285317, step_time_cost: 0.716
step 84, loss: 1.681211, step_time_cost: 0.769
step 85, loss: 1.846096, step_time_cost: 0.703
step 86, loss: 1.768831, step_time_cost: 0.700
step 87, loss: 1.848565, step_time_cost: 0.700
step 88, loss: 2.082924, step_time_cost: 0.665
step 89, loss: 2.023702, step_time_cost: 0.710
step 90, loss: 1.891955, step_time_cost: 0.684
step 91, loss: 2.388388, step_time_cost: 0.675
step 92, loss: 1.955183, step_time_cost: 0.661
step 93, loss: 1.663321, step_time_cost: 0.688
step 94, loss: 1.549457, step_time_cost: 0.677
step 95, loss: 1.801949, step_time_cost: 0.690
step 96, loss: 2.006641, step_time_cost: 0.713
step 97, loss: 2.062234, step_time_cost: 0.679
step 98, loss: 1.865428, step_time_cost: 0.806
step 99, loss: 1.550068, step_time_cost: 0.735
- develop, 8 GPUs, which crashed:
You can install 'prefetch_generator' for acceleration of data reading.
W0619 07:44:57.616894 2866 parallel_executor.cc:328] The number of CUDAPlace, which is used in ParallelExecutor, is 8. And the Program will be copied 8 copies
--- detected 63 subgraphs
I0619 07:46:31.033058 2866 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
/work/models/PaddleCV/deeplabv3+/reader.py:39: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
a = a[slices]
Traceback (most recent call last):
File "train.py", line 218, in <module>
train_loss, = exe.run(binary, fetch_list=[loss_mean])
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 665, in run
return_numpy=return_numpy)
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 527, in _run_parallel
exe.run(fetch_var_names, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: Invoke operator conv2d error.
Python Callstacks:
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 1699, in append_op
attrs=kwargs.get("attrs", None))
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/layer_helper.py", line 43, in append_op
return self.main_program.current_block().append_op(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/layers/nn.py", line 2172, in conv2d
'fuse_relu_before_depthwise_conv': False
File "/work/models/PaddleCV/deeplabv3+/models.py", line 87, in conv
return append_op_result(fluid.layers.conv2d(*args, **kargs), 'conv')
File "/work/models/PaddleCV/deeplabv3+/models.py", line 223, in entry_flow
data = conv(data, 32, 3, stride=2, padding=1)
File "/work/models/PaddleCV/deeplabv3+/models.py", line 320, in deeplabv3p
data, decode_shortcut = entry_flow(img)
File "train.py", line 139, in <module>
logit = deeplabv3p(img)
C++ Callstacks:
CUDNN_STATUS_INTERNAL_ERROR at [/paddle/paddle/fluid/platform/device_context.cc:217]
PaddlePaddle Call Stacks:
0 0x7fb4f3b54b90p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 352
1 0x7fb4f3b54f09p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 137
2 0x7fb4f593bc94p paddle::platform::CudnnHolder::CudnnHolder(CUstream_st* const*, paddle::platform::CUDAPlace const&) + 996
- This PR, 1 GPU:
Model is saved to /work/models/PaddleCV/deeplabv3+/output/model
step 0, loss: 2.912637, step_time_cost: 84.644 s
step 1, loss: 2.953294, step_time_cost: 0.152 s
step 2, loss: 2.960867, step_time_cost: 0.145 s
step 3, loss: 2.903931, step_time_cost: 0.145 s
step 4, loss: 2.878676, step_time_cost: 0.145 s
step 5, loss: 2.867251, step_time_cost: 0.145 s
step 6, loss: 2.771031, step_time_cost: 0.145 s
step 7, loss: 2.716584, step_time_cost: 0.146 s
step 8, loss: 2.858055, step_time_cost: 0.151 s
step 9, loss: 2.667201, step_time_cost: 0.147 s
.
.
.
step 80, loss: 2.292397, step_time_cost: 0.146 s
step 81, loss: 1.690406, step_time_cost: 0.146 s
step 82, loss: 2.052620, step_time_cost: 0.146 s
step 83, loss: 2.411162, step_time_cost: 0.147 s
step 84, loss: 2.740360, step_time_cost: 0.146 s
step 85, loss: 1.674617, step_time_cost: 0.146 s
step 86, loss: 1.910911, step_time_cost: 0.147 s
step 87, loss: 1.513423, step_time_cost: 0.146 s
step 88, loss: 1.196872, step_time_cost: 0.147 s
step 89, loss: 1.315766, step_time_cost: 0.147 s
step 90, loss: 1.827139, step_time_cost: 0.146 s
step 91, loss: 1.918458, step_time_cost: 0.146 s
step 92, loss: 2.161615, step_time_cost: 0.146 s
step 93, loss: 2.564105, step_time_cost: 0.146 s
step 94, loss: 1.961227, step_time_cost: 0.145 s
step 95, loss: 2.553188, step_time_cost: 0.146 s
step 96, loss: 1.538951, step_time_cost: 0.146 s
step 97, loss: 1.376045, step_time_cost: 0.147 s
step 98, loss: 1.565625, step_time_cost: 0.146 s
step 99, loss: 2.470505, step_time_cost: 0.147 s
- This PR, 8 GPUs:
Model is saved to /work/models/PaddleCV/deeplabv3+/output/model
step 0, loss: 2.956059, step_time_cost: 102.699 s
step 1, loss: 2.947663, step_time_cost: 0.298 s
step 2, loss: 2.929053, step_time_cost: 0.275 s
step 3, loss: 2.918531, step_time_cost: 0.274 s
step 4, loss: 2.901184, step_time_cost: 0.273 s
step 5, loss: 2.887325, step_time_cost: 0.268 s
step 6, loss: 2.835522, step_time_cost: 0.275 s
step 7, loss: 2.834700, step_time_cost: 0.269 s
step 8, loss: 2.760000, step_time_cost: 0.263 s
step 9, loss: 2.746251, step_time_cost: 0.294 s
.
.
.
step 80, loss: 2.015818, step_time_cost: 0.256 s
step 81, loss: 1.977413, step_time_cost: 0.252 s
step 82, loss: 1.945783, step_time_cost: 0.276 s
step 83, loss: 1.794533, step_time_cost: 0.266 s
step 84, loss: 1.996547, step_time_cost: 0.268 s
step 85, loss: 1.904252, step_time_cost: 0.266 s
step 86, loss: 2.275229, step_time_cost: 0.282 s
step 87, loss: 1.786942, step_time_cost: 0.266 s
step 88, loss: 2.130292, step_time_cost: 0.262 s
step 89, loss: 1.796950, step_time_cost: 0.291 s
step 90, loss: 1.877793, step_time_cost: 0.304 s
step 91, loss: 2.187347, step_time_cost: 0.305 s
step 92, loss: 1.790633, step_time_cost: 0.287 s
step 93, loss: 2.159842, step_time_cost: 0.270 s
step 94, loss: 1.959462, step_time_cost: 0.276 s
step 95, loss: 1.524466, step_time_cost: 0.294 s
step 96, loss: 1.591714, step_time_cost: 0.310 s
step 97, loss: 2.138076, step_time_cost: 0.298 s
step 98, loss: 2.048481, step_time_cost: 0.317 s
step 99, loss: 1.692513, step_time_cost: 0.321 s
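
For reference, the multi-process data reading idea mentioned at the top can be sketched roughly as follows. This is only an illustration built on the standard library, not the code in this PR: `preprocess`, `_worker`, and `multiprocess_reader` are hypothetical names, the per-sample work is a dummy computation standing in for image decoding and augmentation, and the "fork" start method assumes a Linux environment like the one in the logs above.

```python
import multiprocessing as mp


def preprocess(index):
    """Hypothetical stand-in for decoding and augmenting one sample."""
    value = sum(k * k for k in range(index + 1))  # dummy CPU-bound work
    return index, value


def _worker(indices, queue):
    """Preprocess one shard of sample indices and push results to the queue."""
    for i in indices:
        queue.put(preprocess(i))
    queue.put(None)  # sentinel: this worker has finished its shard


def multiprocess_reader(num_samples, num_workers=4, queue_size=64):
    """Return a reader that yields samples preprocessed by worker processes."""
    ctx = mp.get_context("fork")  # assumption: POSIX system (e.g. Linux)

    def reader():
        queue = ctx.Queue(queue_size)
        workers = []
        for w in range(num_workers):
            # Worker w handles samples w, w + num_workers, w + 2 * num_workers, ...
            shard = range(w, num_samples, num_workers)
            p = ctx.Process(target=_worker, args=(shard, queue))
            p.daemon = True  # do not keep the trainer alive if it exits early
            p.start()
            workers.append(p)

        finished = 0
        while finished < num_workers:
            item = queue.get()
            if item is None:
                finished += 1
            else:
                yield item  # samples arrive in completion order, not index order

        for p in workers:
            p.join()

    return reader


if __name__ == "__main__":
    reader = multiprocess_reader(num_samples=16, num_workers=4)
    print(len(list(reader())))
```

The bounded queue keeps the workers from running arbitrarily far ahead of the training loop, and because samples are yielded in completion order, the sample order differs from run to run.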