GPU error with models on version 1.5 and later (#19628) · Issue · PaddlePaddle / Paddle


Created by: kahitomi

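The model is an LSTM generation model with a copy mechanism. Under versions 1.3 and 1.4 it runs without problems on both GPU and CPU. With Paddle 1.5 and the latest version, the GPU build raises an error while the CPU build runs normally. The full GPU traceback is given at the end of this report.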

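The message does not say where the error occurs. Going by the message, I searched for sum-related OPs but found nothing. I eventually narrowed the problem down to the following code inside the block() of a DynamicRNN: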

......
......
with drnn.block():
......
......
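    # current_h: [batch_size, decoder_size] -> [batch_size, 1, decoder_size]
    current_h_expand_seq = pd.reshape(
        current_h, [-1, 1, self.decoder_size])
    # Tile along axis 1 -> [batch_size, max_length, decoder_size]
    current_h_expand_seq = pd.expand(
        current_h_expand_seq, [1, self.max_length, 1])

    # copy_score_weight: [batch_size, max_length, decoder_size]
    copy_score_sub = pd.elementwise_mul(
        copy_score_weight, current_h_expand_seq, axis=0)
    # Sum over the feature axis -> [batch_size, max_length]
    copy_score = pd.reduce_sum(copy_score_sub, dim=2)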
......
......

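Removing this code makes the error go away. Here self.decoder_size is the size of the decoder LSTM and self.max_length is the maximum number of generation steps. current_h is the LSTM output at the current time step, with shape [batch_size, self.decoder_size]. copy_score_weight comes from all words on the encoder side, with shape [batch_size, self.max_length, self.decoder_size].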
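To make the shapes above concrete, here is a minimal NumPy sketch of the same computation. It is a stand-in for the Paddle ops, not the code from the model: np.tile plays the role of pd.expand, and the function name copy_score_numpy and the toy sizes are made up for illustration.

```python
import numpy as np


def copy_score_numpy(current_h, copy_score_weight):
    """Score each encoder position against the current decoder state.

    current_h:         [batch_size, decoder_size]
    copy_score_weight: [batch_size, max_length, decoder_size]
    returns:           [batch_size, max_length]
    """
    _, max_length, decoder_size = copy_score_weight.shape
    # reshape: [batch_size, decoder_size] -> [batch_size, 1, decoder_size]
    h = current_h.reshape(-1, 1, decoder_size)
    # expand (tile) along axis 1 -> [batch_size, max_length, decoder_size]
    h = np.tile(h, (1, max_length, 1))
    # elementwise product followed by a sum over the feature axis
    return (copy_score_weight * h).sum(axis=2)


# Toy sizes (hypothetical): batch_size=2, max_length=3, decoder_size=4
rng = np.random.default_rng(0)
current_h = rng.standard_normal((2, 4))
copy_score_weight = rng.standard_normal((2, 3, 4))

score = copy_score_numpy(current_h, copy_score_weight)
# The same quantity written as a batched dot product
reference = np.einsum("bld,bd->bl", copy_score_weight, current_h)
assert np.allclose(score, reference)
```

In other words, the snippet computes a per-position dot product between the decoder state and each encoder vector, so the result has one score per encoder position.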

I also tried a variant that removes the reduce_sum op. The code is:

......
......
with drnn.block():
......
......
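    # current_h: [batch_size, decoder_size] -> [batch_size, 1, decoder_size]
    current_h_expand_seq = pd.reshape(
        current_h, [-1, 1, self.decoder_size])
    # Tile along axis 1 -> [batch_size, max_length, decoder_size]
    current_h_expand_seq = pd.expand(
        current_h_expand_seq, [1, self.max_length, 1])

    # Concatenate on the feature axis -> [batch_size, max_length, decoder_size * 2]
    copy_score_in = pd.concat([copy_score_weight, current_h_expand_seq], axis=2)
    # Flatten to 2-D for the fc layer -> [batch_size * max_length, decoder_size * 2]
    copy_score_in = pd.reshape(
        copy_score_in, [-1, self.decoder_size * 2])
    # One tanh-activated score per row -> [batch_size * max_length, 1]
    copy_score = fluid.layers.fc(
        input=copy_score_in,
        act='tanh',
        size=1,
        bias_attr=False,
        param_attr=fluid.ParamAttr(name="copy_score_combine_weight_w"))
    # Back to one score per encoder position -> [batch_size, max_length]
    copy_score = pd.reshape(
        copy_score, [-1, self.max_length])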
......
......

The error is the same as before: it is still raised in sum to LoD tensor (SumToLoDTensor).
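For reference, the shape flow of this second variant can be sketched in NumPy as below. This is only an illustration under assumptions: the weight matrix w stands in for the fc parameter copy_score_combine_weight_w, and the function name and toy sizes are hypothetical.

```python
import numpy as np


def copy_score_fc_numpy(current_h, copy_score_weight, w):
    """Concat + bias-free tanh fc variant of the copy score.

    current_h:         [batch_size, decoder_size]
    copy_score_weight: [batch_size, max_length, decoder_size]
    w:                 [decoder_size * 2, 1]  (stand-in for the fc weight)
    returns:           [batch_size, max_length]
    """
    _, max_length, decoder_size = copy_score_weight.shape
    # reshape + expand, as in the first variant
    h = np.tile(current_h.reshape(-1, 1, decoder_size), (1, max_length, 1))
    # concat on the feature axis -> [batch_size, max_length, decoder_size * 2]
    x = np.concatenate([copy_score_weight, h], axis=2)
    # flatten to 2-D -> [batch_size * max_length, decoder_size * 2]
    x = x.reshape(-1, decoder_size * 2)
    # fc with size=1, no bias, tanh activation -> [batch_size * max_length, 1]
    y = np.tanh(x @ w)
    # back to one score per encoder position
    return y.reshape(-1, max_length)


# Toy sizes (hypothetical): batch_size=2, max_length=3, decoder_size=4
rng = np.random.default_rng(0)
fc_score = copy_score_fc_numpy(
    rng.standard_normal((2, 4)),
    rng.standard_normal((2, 3, 4)),
    rng.standard_normal((8, 1)),
)
assert fc_score.shape == (2, 3)
```

Neither variant contains an explicit sum over LoD tensors in the forward code, which matches the observation that replacing reduce_sum does not change the error.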

The full error is as follows:

Traceback (most recent call last):
File "train.py", line 528, in <module>
train()
File "train.py", line 467, in train
return_numpy=False)
File "/home/slurm/job/tmp/job-128277/python27-gcc482/lib/python2.7/site-packages/paddle/fluid/executor.py", line 651, in run
use_program_cache=use_program_cache)
File "/home/slurm/job/tmp/job-128277/python27-gcc482/lib/python2.7/site-packages/paddle/fluid/executor.py", line 749, in _run
exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: holder_ should not be null
Tensor holds no memory. Call Tensor::mutable_data first. at [/paddle/paddle/fluid/framework/tensor.cc:23]
PaddlePaddle Call Stacks: 
0 0x7fc4e021aff8p void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 360
1 0x7fc4e021b347p paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int) + 87
2 0x7fc4e21d8f09p paddle::framework::Tensor::check_memory_size() const + 185
3 0x7fc4e0221c59p float const* paddle::framework::Tensor::data<float>() const + 25
4 0x7fc4e06c38cbp void paddle::operators::SumToLoDTensor<float>(paddle::framework::ExecutionContext const&) + 763
5 0x7fc4e06cca38p std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::SumKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::SumKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::SumKernel<paddle::platform::CUDADeviceContext, int>, paddle::operators::SumKernel<paddle::platform::CUDADeviceContext, long>, paddle::operators::SumKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&) + 248
6 0x7fc4e2183037p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 375
7 0x7fc4e2183411p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 529
8 0x7fc4e2180a0cp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332
9 0x7fc4e03a746ep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 382
10 0x7fc4e1c2139dp paddle::operators::WhileGradOp::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 1869
11 0x7fc4e2180a0cp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332
12 0x7fc4e03a746ep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 382
13 0x7fc4e03aa50fp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool) + 143
14 0x7fc4e020bf8dp
15 0x7fc4e024d936p
16 0x7fc58a29fcc8p PyEval_EvalFrameEx + 28264
17 0x7fc58a2a235dp PyEval_EvalCodeEx + 2061
18 0x7fc58a29fd50p PyEval_EvalFrameEx + 28400
19 0x7fc58a2a235dp PyEval_EvalCodeEx + 2061
20 0x7fc58a29fd50p PyEval_EvalFrameEx + 28400
21 0x7fc58a2a235dp PyEval_EvalCodeEx + 2061
22 0x7fc58a29fd50p PyEval_EvalFrameEx + 28400
23 0x7fc58a2a235dp PyEval_EvalCodeEx + 2061
24 0x7fc58a2a2492p PyEval_EvalCode + 50
25 0x7fc58a2cc1a2p PyRun_FileExFlags + 146
26 0x7fc58a2cd539p PyRun_SimpleFileExFlags + 217
27 0x7fc58a2e31bdp Py_Main + 3149
28 0x7fc5894e0bd5p __libc_start_main + 245
29 0x4007a1p
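In the call stack above, the sum kernel (frames 4 and 5) is reached through WhileGradOp::RunImpl (frame 10), so the failing sum op appears to run inside the backward sub-block of the while loop rather than in the forward code shown earlier. That may be why a search of the model code for sum-related OPs found nothing. A minimal sketch for listing ops of a given type across every block of a program follows. It relies only on the blocks, ops, type, input_arg_names and output_arg_names attributes of fluid programs, and it is exercised here with stand-in objects so that it runs without Paddle; the helper name find_ops and the variable names in the fake program are hypothetical.

```python
from collections import namedtuple


def find_ops(program, op_type):
    """Return (block_index, op_index, inputs, outputs) for each op of op_type."""
    hits = []
    for block_idx, block in enumerate(program.blocks):
        for op_idx, op in enumerate(block.ops):
            if op.type == op_type:
                hits.append((block_idx, op_idx,
                             list(op.input_arg_names),
                             list(op.output_arg_names)))
    return hits


# Stand-in objects that mimic the attributes used above (not Paddle classes).
FakeOp = namedtuple("FakeOp", ["type", "input_arg_names", "output_arg_names"])
FakeBlock = namedtuple("FakeBlock", ["ops"])
FakeProgram = namedtuple("FakeProgram", ["blocks"])

fake_program = FakeProgram(blocks=[
    FakeBlock(ops=[FakeOp("while", ["x"], ["y"])]),
    FakeBlock(ops=[
        FakeOp("elementwise_mul_grad", ["a"], ["b@GRAD"]),
        FakeOp("sum", ["h@GRAD@RENAME@0", "h@GRAD@RENAME@1"], ["h@GRAD"]),
    ]),
])

for hit in find_ops(fake_program, "sum"):
    print(hit)
```

With Paddle installed, the same helper could be called as find_ops(fluid.default_main_program(), "sum") after the backward pass has been appended, to see which gradient variables feed each sum op. Whether one of those inputs is the uninitialized tensor reported here is an assumption that would still need checking.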
