There is a bug in the backward transpiler when using parallel_op.
Created by: qingqing01
I use the parallel_do op in MobileNet-SSD (https://github.com/PaddlePaddle/models/pull/746/files). The main code is as follows:
```python
import paddle.fluid as fluid

image_shape = [3, 300, 300]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
gt_box = fluid.layers.data(
    name='gt_box', shape=[4], dtype='float32', lod_level=1)
gt_label = fluid.layers.data(
    name='gt_label', shape=[1], dtype='int32', lod_level=1)

places = fluid.layers.get_places()
pd = fluid.layers.ParallelDo(places)
with pd.do():
    image_ = pd.read_input(image)
    gt_box_ = pd.read_input(gt_box)
    gt_label_ = pd.read_input(gt_label)
    # 'difficult' is another data layer defined in the PR's train.py
    difficult_ = pd.read_input(difficult)
    # mobile_net is defined in the PR linked above
    locs, confs, box, box_var = mobile_net(image_, image_shape)
    loss = fluid.layers.ssd_loss(locs, confs, gt_box_, gt_label_,
                                 box, box_var)
    pd.write_output(loss)
    pd.write_output(locs)
    pd.write_output(confs)
    pd.write_output(box)
    pd.write_output(box_var)

loss, locs, confs, box, box_var = pd()
loss = fluid.layers.reduce_sum(loss)
```
There is an error:

```
paddle.fluid.core.EnforceNotMet: grad_op_maker_ should not be null
Operator GradOpMaker has not been registered. at [/home/dangqingqing/github/myfork/Paddle/paddle/fluid/framework/op_info.h:61]
```
This error is raised because the grad op of one operator is not registered. When I debug by adding `print(op)` before https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/backward.py#L326, I find that this op is `target_assign`:
```
# omitting some code
type: "target_assign"
attrs {
  name: "mismatch_value"
  type: INT
  i: 0
}
```
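For reference, the `print(op)` was placed roughly like this (a sketch of the loop in `_append_backward_ops_`; the exact surrounding code in backward.py may differ):

```python
# sketch: inside _append_backward_ops_ in python/paddle/fluid/backward.py,
# print each forward op right before its grad op desc is requested
for op in reversed(ops):
    print(op)  # the last op printed before the crash is target_assign
    grad_op_desc, op_grad_to_var = core.get_grad_op_desc(
        op.desc, no_grad_dict[block.idx], grad_sub_block_list)
```

The full traceback is: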
```
Traceback (most recent call last):
  File "train.py", line 146, in <module>
    num_passes=300)
  File "train.py", line 89, in train
    optimizer.minimize(loss)
  File "/home/dangqingqing/github/myfork/pyenv/lib/python2.7/site-packages/paddle/fluid/optimizer.py", line 226, in minimize
    [error_clip_callback])
  File "/home/dangqingqing/github/myfork/pyenv/lib/python2.7/site-packages/paddle/fluid/backward.py", line 485, in append_backward
    grad_to_var, callbacks)
  File "/home/dangqingqing/github/myfork/pyenv/lib/python2.7/site-packages/paddle/fluid/backward.py", line 332, in _append_backward_ops_
    no_grad_dict, grad_to_var, callbacks)
  File "/home/dangqingqing/github/myfork/pyenv/lib/python2.7/site-packages/paddle/fluid/backward.py", line 340, in _append_backward_ops_
    op.desc, no_grad_dict[block.idx], grad_sub_block_list)
paddle.fluid.core.EnforceNotMet: grad_op_maker_ should not be null
Operator GradOpMaker has not been registered. at [/home/dangqingqing/github/myfork/Paddle/paddle/fluid/framework/op_info.h:61]
PaddlePaddle Call Stacks:
0 0x7f843bed1b86p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 486
```
But in fact, we have set the flag `stop_gradient = True` for the outputs of the `target_assign` op:
https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/layers/detection.py#L538
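For reference, the linked lines in ssd_loss do roughly the following (the variable names below are illustrative, not necessarily the ones used in detection.py):

```python
# sketch of the linked code in detection.py: the targets produced by the
# target_assign layers inside ssd_loss are marked as non-differentiable,
# so append_backward should never need a grad op for target_assign
target_loc.stop_gradient = True
target_loc_weight.stop_gradient = True
target_label.stop_gradient = True
target_conf_weight.stop_gradient = True
```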
And there is no problem without the parallel_do op (see the sketch below). This bug occurs when appending the backward ops for the sub-block of the parallel_do op.
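For comparison, a sketch of the non-parallel version that works (reusing the layers and the mobile_net/ssd_loss calls from the snippet above):

```python
# same network without the ParallelDo sub-block; according to the report,
# optimizer.minimize(loss) appends the backward ops without error here
locs, confs, box, box_var = mobile_net(image, image_shape)
loss = fluid.layers.ssd_loss(locs, confs, gt_box, gt_label, box, box_var)
loss = fluid.layers.reduce_sum(loss)
```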