PaddlePaddle / Paddle · Issue #19628

Closed
Opened September 04, 2019 by saxon_zh (Guest)

Model fails on GPU under Paddle 1.5 and later versions

Created by: kahitomi

The model is an LSTM-based generation model with a copy mechanism. Under Paddle 1.3 and 1.4 it runs correctly on both GPU and CPU. With Paddle 1.5 and the latest version, the GPU build raises an error while the CPU build still works. [screenshot of the GPU error]

The message does not point to the exact location of the error. Following its hint, I searched for sum-related OPs but found nothing. I finally narrowed it down to the following code inside DynamicRNN's block():

......
......
with drnn.block():
......
......
    current_h_expand_seq = pd.reshape(
        current_h, [-1, 1, self.decoder_size])
    current_h_expand_seq = pd.expand(
        current_h_expand_seq, [1, self.max_length, 1])

    copy_score_sub = pd.elementwise_mul(
        copy_score_weight, current_h_expand_seq, axis=0)
    copy_score = pd.reduce_sum(copy_score_sub, dim=2)
......
......

Removing this block makes the error go away. Here self.decoder_size is the decoder LSTM size and self.max_length is the maximum generation length. current_h is the LSTM output at the current time step, with shape [batch_size, self.decoder_size]; copy_score_weight comes from all the encoder-side tokens, with shape [batch_size, self.max_length, self.decoder_size].
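For reference, the shape arithmetic the snippet above intends can be sketched in plain NumPy (this is not Paddle code; the sizes `batch_size`, `decoder_size`, and `max_length` are hypothetical stand-ins for `self.decoder_size` / `self.max_length`):

```python
import numpy as np

# Hypothetical sizes standing in for self.decoder_size / self.max_length
batch_size, decoder_size, max_length = 4, 8, 10

# current_h: LSTM output at this step, [batch_size, decoder_size]
current_h = np.random.rand(batch_size, decoder_size).astype("float32")
# copy_score_weight: encoder-side features, [batch_size, max_length, decoder_size]
copy_score_weight = np.random.rand(batch_size, max_length, decoder_size).astype("float32")

# pd.reshape(current_h, [-1, 1, self.decoder_size])
h = current_h.reshape(-1, 1, decoder_size)
# pd.expand(..., [1, self.max_length, 1]) tiles along the time axis
h = np.tile(h, (1, max_length, 1))

# pd.elementwise_mul followed by pd.reduce_sum(dim=2):
# a dot product between current_h and each encoder position
copy_score = (copy_score_weight * h).sum(axis=2)
assert copy_score.shape == (batch_size, max_length)
```

On CPU this computation is well-defined for the stated shapes, which suggests the GPU failure comes from the framework's handling of these ops inside DynamicRNN rather than from the math itself.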

I also experimented with a version that removes the reduce_sum op; the code is:

......
......
with drnn.block():
......
......
    current_h_expand_seq = pd.reshape(
        current_h, [-1, 1, self.decoder_size])
    current_h_expand_seq = pd.expand(
        current_h_expand_seq, [1, self.max_length, 1])

    copy_score_in = pd.concat([copy_score_weight, current_h_expand_seq], axis=2)
    copy_score_in = pd.reshape(
        copy_score_in, [-1, self.decoder_size * 2])
    copy_score = fluid.layers.fc(input=copy_score_in,
                                 act='tanh',
                                 size=1,
                                 bias_attr=False,
                                 param_attr=fluid.ParamAttr(name="copy_score_combine_weight_w"))
    copy_score = pd.reshape(
        copy_score, [-1, self.max_length])
......
......

The resulting error is identical to the first one: it still reports sum to lod tensor.
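The concat + fc variant above computes an equivalent score per encoder position; its shape flow can likewise be sketched in NumPy (again not Paddle code; the weight `w` is a hypothetical stand-in for the `copy_score_combine_weight_w` parameter):

```python
import numpy as np

# Hypothetical sizes standing in for self.decoder_size / self.max_length
batch_size, decoder_size, max_length = 4, 8, 10
current_h = np.random.rand(batch_size, decoder_size).astype("float32")
copy_score_weight = np.random.rand(batch_size, max_length, decoder_size).astype("float32")

# reshape + expand, as in the first snippet
h = np.tile(current_h.reshape(-1, 1, decoder_size), (1, max_length, 1))

# pd.concat(..., axis=2): [batch_size, max_length, 2 * decoder_size]
x = np.concatenate([copy_score_weight, h], axis=2)
# flatten to [batch_size * max_length, 2 * decoder_size] for the fc layer
x = x.reshape(-1, decoder_size * 2)

# fc with size=1, act='tanh', no bias; w stands in for copy_score_combine_weight_w
w = np.random.rand(decoder_size * 2, 1).astype("float32")
copy_score = np.tanh(x @ w).reshape(-1, max_length)
assert copy_score.shape == (batch_size, max_length)
```

Since both variants fail with the same message even though reduce_sum is gone in the second, the offending `sum` op in the stack trace is likely one inserted by the framework (e.g. for gradient accumulation in the while/DynamicRNN backward pass), not one written in user code.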

The full error is:

Traceback (most recent call last):
File "train.py", line 528, in <module>
train()
File "train.py", line 467, in train
return_numpy=False)
File "/home/slurm/job/tmp/job-128277/python27-gcc482/lib/python2.7/site-packages/paddle/fluid/executor.py", line 651, in run
use_program_cache=use_program_cache)
File "/home/slurm/job/tmp/job-128277/python27-gcc482/lib/python2.7/site-packages/paddle/fluid/executor.py", line 749, in _run
exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: holder_ should not be null
Tensor holds no memory. Call Tensor::mutable_data first. at [/paddle/paddle/fluid/framework/tensor.cc:23]
PaddlePaddle Call Stacks: 
0 0x7fc4e021aff8p void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 360
1 0x7fc4e021b347p paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int) + 87
2 0x7fc4e21d8f09p paddle::framework::Tensor::check_memory_size() const + 185
3 0x7fc4e0221c59p float const* paddle::framework::Tensor::data<float>() const + 25
4 0x7fc4e06c38cbp void paddle::operators::SumToLoDTensor<float>(paddle::framework::ExecutionContext const&) + 763
5 0x7fc4e06cca38p std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::SumKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::SumKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::SumKernel<paddle::platform::CUDADeviceContext, int>, paddle::operators::SumKernel<paddle::platform::CUDADeviceContext, long>, paddle::operators::SumKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&) + 248
6 0x7fc4e2183037p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 375
7 0x7fc4e2183411p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 529
8 0x7fc4e2180a0cp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332
9 0x7fc4e03a746ep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 382
10 0x7fc4e1c2139dp paddle::operators::WhileGradOp::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 1869
11 0x7fc4e2180a0cp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332
12 0x7fc4e03a746ep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 382
13 0x7fc4e03aa50fp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool) + 143
14 0x7fc4e020bf8dp
15 0x7fc4e024d936p
16 0x7fc58a29fcc8p PyEval_EvalFrameEx + 28264
17 0x7fc58a2a235dp PyEval_EvalCodeEx + 2061
18 0x7fc58a29fd50p PyEval_EvalFrameEx + 28400
19 0x7fc58a2a235dp PyEval_EvalCodeEx + 2061
20 0x7fc58a29fd50p PyEval_EvalFrameEx + 28400
21 0x7fc58a2a235dp PyEval_EvalCodeEx + 2061
22 0x7fc58a29fd50p PyEval_EvalFrameEx + 28400
23 0x7fc58a2a235dp PyEval_EvalCodeEx + 2061
24 0x7fc58a2a2492p PyEval_EvalCode + 50
25 0x7fc58a2cc1a2p PyRun_FileExFlags + 146
26 0x7fc58a2cd539p PyRun_SimpleFileExFlags + 217
27 0x7fc58a2e31bdp Py_Main + 3149
28 0x7fc5894e0bd5p __libc_start_main + 245
29 0x4007a1p
Reference: paddlepaddle/Paddle#19628