Bug in multiplication ops inside DynamicRNN and While loop blocks
Created by: zhengkangjie
PaddlePaddle version: 1.5.2; OS: macOS 10.14.4; Python version: 3.7.3
A multiplication op inside a DynamicRNN or While loop block triggers a bug during the backward pass: an error about the tensor dims is raised. Reproduction code follows. DynamicRNN version:

```python
import paddle.fluid as fluid

data_dim = 10
rnn_input = fluid.layers.data(name='rnn_input', shape=[data_dim], dtype='float32')
rnn_mem = fluid.layers.data(name='rnn_mem', shape=[data_dim], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
rnn = fluid.layers.DynamicRNN()
with rnn.block():
    x = rnn.step_input(rnn_input)
    pre_state = rnn.memory(init=rnn_mem, need_reorder=True)
    state = pre_state * pre_state
    rnn_out = fluid.layers.fc([x, pre_state], size=data_dim)
    rnn.update_memory(pre_state, state)
    rnn.output(rnn_out)
rnn_out = rnn()
cost = fluid.layers.cross_entropy(input=rnn_out, label=label)
avg_cost = fluid.layers.reduce_mean(cost)
optimizer = fluid.optimizer.SGDOptimizer(learning_rate=1e-5)
optimizer.minimize(avg_cost)
print("model build successful!")
```
Running this code, the backward pass fails at the line `state = pre_state * pre_state` with the following error:

```
Enforce failed. Expected x_dims.size() >= y_dims.size(), but received x_dims.size():1 < y_dims.size():2.
Rank of first input must >= rank of second input.
```

If `state = pre_state * pre_state` is replaced with `state = pre_state + pre_state`, the model builds successfully. Alternatively, replacing `rnn_out = fluid.layers.fc([x, pre_state], size=data_dim)` with `rnn_out = fluid.layers.fc([x, state], size=data_dim)` also lets the model build normally.
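For context, the check that fails can be mimicked in plain Python. This is an illustrative sketch, not Paddle's actual C++ code: it only reproduces the invariant stated in the error message (rank of the first input must be >= rank of the second), and the shapes used in the failing case are our guess at what the backward pass produces.

```python
def check_elementwise_ranks(x_dims, y_dims):
    """Mimics the Enforce in the error message (names are illustrative)."""
    if len(x_dims) < len(y_dims):
        raise ValueError(
            "Enforce failed. Expected x_dims.size() >= y_dims.size(), "
            "but received x_dims.size():%d < y_dims.size():%d."
            % (len(x_dims), len(y_dims)))
    return True

# Forward: both operands of pre_state * pre_state share the same shape,
# so the check passes.
check_elementwise_ranks((10,), (10,))

# Backward (our hypothesis): one side ends up with an extra dim inside the
# RNN block, giving rank 1 vs rank 2 and tripping the check.
try:
    check_elementwise_ranks((10,), (-1, 10))
except ValueError as e:
    print(e)
```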
The same bug can be reproduced inside a While loop block:

```python
import paddle.fluid as fluid

data_dim = 10
rnn_mem = fluid.layers.data(name='rnn_mem', shape=[data_dim], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
state_array = fluid.layers.create_array(dtype='float32')
rnn_out_array = fluid.layers.create_array(dtype='float32')
step_idx = fluid.layers.fill_constant(shape=[1], dtype='int64', value=0, force_cpu=True)
step_idx.stop_gradient = False
max_seq_len = fluid.layers.fill_constant(shape=[1], dtype='int64', value=9, force_cpu=True)
max_seq_len.stop_gradient = False
cond = fluid.layers.less_than(x=step_idx, y=max_seq_len, force_cpu=True)
cond.stop_gradient = False
fluid.layers.array_write(x=rnn_mem, i=step_idx, array=state_array)
fluid.layers.array_write(x=rnn_mem, i=step_idx, array=rnn_out_array)
while_op = fluid.layers.While(cond)
with while_op.block():
    pre_state = fluid.layers.array_read(state_array, step_idx)
    fluid.layers.increment(x=step_idx, value=1.0, in_place=True)
    state = pre_state * pre_state
    rnn_out = fluid.layers.fc(pre_state, size=data_dim, bias_attr=False)
    fluid.layers.array_write(x=state, i=step_idx, array=state_array)
    fluid.layers.array_write(x=rnn_out, i=step_idx, array=rnn_out_array)
    fluid.layers.less_than(x=step_idx, y=max_seq_len, force_cpu=True, cond=cond)
rnn_out = fluid.layers.array_read(rnn_out_array, step_idx)
cost = fluid.layers.cross_entropy(input=rnn_out, label=label)
avg_cost = fluid.layers.reduce_mean(cost)
optimizer = fluid.optimizer.SGDOptimizer(learning_rate=1e-5)
optimizer.minimize(avg_cost)
print("model build successful!")
```
As with the DynamicRNN code above, the backward pass of this code fails at `state = pre_state * pre_state` with the same error. Likewise, replacing `state = pre_state * pre_state` with `state = pre_state + pre_state`, or replacing `rnn_out = fluid.layers.fc(pre_state, size=data_dim, bias_attr=False)` with `rnn_out = fluid.layers.fc(state, size=data_dim, bias_attr=False)`, lets the model build normally.
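To confirm the model itself is shape-consistent, the recurrence the While-loop version intends can be written in plain NumPy (the batch size and random weights here are illustrative). Every intermediate keeps the same shape, which suggests the rank mismatch originates in the framework's backward pass rather than in the model definition:

```python
import numpy as np

data_dim = 10
batch = 4  # illustrative batch size
rng = np.random.default_rng(0)
state = rng.standard_normal((batch, data_dim))
w = rng.standard_normal((data_dim, data_dim))  # fc weight, bias_attr=False

outputs = []
for step in range(9):  # max_seq_len = 9
    rnn_out = state @ w    # fc(pre_state, size=data_dim, bias_attr=False)
    state = state * state  # state = pre_state * pre_state
    outputs.append(rnn_out)

# Every step's output and state keep the same rank and shape, so no rank
# mismatch can arise in the forward math itself.
assert all(o.shape == (batch, data_dim) for o in outputs)
```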
After some analysis, we suspect the cause of this bug is a flaw in the op's design: during backpropagation, the computation of the gradient tensor's dims goes wrong, which in turn triggers this error.