There is a bug in the backward transpiler when using parallel_op.
Created by: qingqing01
I use the parallel_do op in MobileNet-SSD (https://github.com/PaddlePaddle/models/pull/746/files). The main code is as follows:
```python
import paddle.fluid as fluid

image_shape = [3, 300, 300]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
gt_box = fluid.layers.data(
    name='gt_box', shape=[4], dtype='float32', lod_level=1)
gt_label = fluid.layers.data(
    name='gt_label', shape=[1], dtype='int32', lod_level=1)

places = fluid.layers.get_places()
pd = fluid.layers.ParallelDo(places)
with pd.do():
    image_ = pd.read_input(image)
    gt_box_ = pd.read_input(gt_box)
    gt_label_ = pd.read_input(gt_label)
    # 'difficult' is another data layer defined in the PR's train.py
    difficult_ = pd.read_input(difficult)
    # mobile_net is defined in the PR linked above
    locs, confs, box, box_var = mobile_net(image_, image_shape)
    loss = fluid.layers.ssd_loss(locs, confs, gt_box_, gt_label_,
                                 box, box_var)
    pd.write_output(loss)
    pd.write_output(locs)
    pd.write_output(confs)
    pd.write_output(box)
    pd.write_output(box_var)

loss, locs, confs, box, box_var = pd()
loss = fluid.layers.reduce_sum(loss)
```
There is an error:

```
paddle.fluid.core.EnforceNotMet: grad_op_maker_ should not be null
Operator GradOpMaker has not been registered. at [/home/dangqingqing/github/myfork/Paddle/paddle/fluid/framework/op_info.h:61]
```
This error is raised because the grad op of one operator is not registered. When I debug by adding `print(op)` before https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/backward.py#L326, I find that this op is `target_assign`:
```
# omitting some code
type: "target_assign"
attrs {
  name: "mismatch_value"
  type: INT
  i: 0
}
```
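For reference, the `print(op)` was placed roughly like this (a sketch of the loop in `_append_backward_ops_`; the exact surrounding code in backward.py may differ):

```python
# sketch: inside _append_backward_ops_ in python/paddle/fluid/backward.py,
# print each forward op right before its grad op desc is requested
for op in reversed(ops):
    print(op)  # the last op printed before the crash is target_assign
    grad_op_desc, op_grad_to_var = core.get_grad_op_desc(
        op.desc, no_grad_dict[block.idx], grad_sub_block_list)
```

The full traceback is: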
```
Traceback (most recent call last):
  File "train.py", line 146, in <module>
    num_passes=300)
  File "train.py", line 89, in train
    optimizer.minimize(loss)
  File "/home/dangqingqing/github/myfork/pyenv/lib/python2.7/site-packages/paddle/fluid/optimizer.py", line 226, in minimize
    [error_clip_callback])
  File "/home/dangqingqing/github/myfork/pyenv/lib/python2.7/site-packages/paddle/fluid/backward.py", line 485, in append_backward
    grad_to_var, callbacks)
  File "/home/dangqingqing/github/myfork/pyenv/lib/python2.7/site-packages/paddle/fluid/backward.py", line 332, in _append_backward_ops_
    no_grad_dict, grad_to_var, callbacks)
  File "/home/dangqingqing/github/myfork/pyenv/lib/python2.7/site-packages/paddle/fluid/backward.py", line 340, in _append_backward_ops_
    op.desc, no_grad_dict[block.idx], grad_sub_block_list)
paddle.fluid.core.EnforceNotMet: grad_op_maker_ should not be null
Operator GradOpMaker has not been registered. at [/home/dangqingqing/github/myfork/Paddle/paddle/fluid/framework/op_info.h:61]
PaddlePaddle Call Stacks:
0 0x7f843bed1b86p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 486
```
But in fact, we have set the flag `stop_gradient = True` for the outputs of the `target_assign` op:
https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/layers/detection.py#L538
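For reference, the linked lines in ssd_loss do roughly the following (the variable names below are illustrative, not necessarily the ones used in detection.py):

```python
# sketch of the linked code in detection.py: the targets produced by the
# target_assign layers inside ssd_loss are marked as non-differentiable,
# so append_backward should never need a grad op for target_assign
target_loc.stop_gradient = True
target_loc_weight.stop_gradient = True
target_label.stop_gradient = True
target_conf_weight.stop_gradient = True
```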
And there is no problem without the parallel_do op (see the sketch below). This bug occurs when appending the backward ops for the sub-block of the parallel_do op.
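For comparison, a sketch of the non-parallel version that works (reusing the layers and the mobile_net/ssd_loss calls from the snippet above):

```python
# same network without the ParallelDo sub-block; according to the report,
# optimizer.minimize(loss) appends the backward ops without error here
locs, confs, box, box_var = mobile_net(image, image_shape)
loss = fluid.layers.ssd_loss(locs, confs, gt_box, gt_label, box, box_var)
loss = fluid.layers.reduce_sum(loss)
```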