Opened May 28, 2020 by saxon_zh (@saxon_zh, Guest)

Cublas error, CUBLAS_STATUS_EXECUTION_FAILED

Created by: MrChengmo

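To get your problem resolved quickly, please check for similar issues before opening a new one: [search issue keywords] [filter by labels] [official documentation]

If you find no similar issue, please include the following details when opening one:

  • Title: a concise, accurate summary of the problem, e.g. "Insufficient Memory xxx"
  • Version and environment:
      1) PaddlePaddle version: 1.8.1
      2) CPU: /
      3) GPU: V100, Driver Version: 418.39, CUDA Version: 10.1
  • Training information:
      1) Single machine, single GPU
      3) Operator information: operator < mul > error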
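
For readers unfamiliar with the mul operator named above: it flattens both inputs to 2-D matrices and multiplies them, and the fc layer in the traceback below calls it with "y_num_col_dims": 1. The following is a minimal pure-Python sketch of that flattening rule, not Paddle's actual implementation. The function names and the example shapes are hypothetical; the issue does not state the real tensor shapes. A plain shape mismatch is normally rejected before cuBLAS is ever called, so this illustrates what the kernel computes rather than diagnosing the crash.

```python
from functools import reduce
import operator


def flatten_to_2d(shape, num_col_dims):
    """Collapse `shape` to (rows, cols).

    The first `num_col_dims` axes are multiplied together to form the rows;
    the remaining axes are multiplied together to form the columns.
    """
    if not 0 < num_col_dims < len(shape):
        raise ValueError("num_col_dims must split the shape into two non-empty parts")
    rows = reduce(operator.mul, shape[:num_col_dims], 1)
    cols = reduce(operator.mul, shape[num_col_dims:], 1)
    return rows, cols


def mul_output_shape(x_shape, y_shape, x_num_col_dims=1, y_num_col_dims=1):
    """Return (x_2d, y_2d, gemm_result_2d) for a mul-style matrix product."""
    x2d = flatten_to_2d(x_shape, x_num_col_dims)
    y2d = flatten_to_2d(y_shape, y_num_col_dims)
    if x2d[1] != y2d[0]:
        raise ValueError(
            "inner dimensions differ: x flattens to %r, y flattens to %r" % (x2d, y2d)
        )
    return x2d, y2d, (x2d[0], y2d[1])


if __name__ == "__main__":
    # Hypothetical shapes: a [batch, seq, hidden] input and a [hidden, out] weight.
    print(mul_output_shape((32, 10, 64), (64, 128), x_num_col_dims=2))
```

Printing the flattened shapes of the tensors that feed the failing fc layer is a cheap first check that the matrix product is the size you expect.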

fluid.install_check() reports no problems, but training crashes with a core dump.
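
One possible debugging step, offered as a suggestion rather than a confirmed fix: CUDA kernel launches are asynchronous, so a failure reported inside the mul operator may have been triggered by an earlier kernel. Setting the CUDA environment variable CUDA_LAUNCH_BLOCKING=1 before the framework initializes the GPU makes launches synchronous, so the error is more likely to surface at the operator that caused it. The helper name below is hypothetical.

```python
import os


def enable_sync_cuda_debug(env=None):
    """Set CUDA_LAUNCH_BLOCKING=1 in `env` (defaults to os.environ).

    This must run before the process creates its CUDA context, i.e. before
    the training script imports and initializes the GPU framework.
    """
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__":
    enable_sync_cuda_debug()
    # Import the framework only after the variable is set, for example:
    # import paddle.fluid as fluid
    print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Synchronous launches slow training considerably, so this is only meant for a short reproduction run.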

terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
  what():

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2   void paddle::operators::math::Blas<paddle::platform::CUDADeviceContext>::MatMul<float>(paddle::framework::Tensor const&, bool, paddle::framework::Tensor const&, bool, float, paddle::framework::Tensor*, float) const
3   paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
4   std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
5   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
6   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
7   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
8   paddle::framework::HogwildWorker::TrainFilesWithProfiler()

------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
  File "/home/users/wangjiawei04/paddle_release_home/python/lib64/python2.7/site-packages/paddle/fluid/framework.py", line 2610, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/users/wangjiawei04/paddle_release_home/python/lib64/python2.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/home/users/wangjiawei04/paddle_release_home/python/lib64/python2.7/site-packages/paddle/fluid/layers/nn.py", line 1719, in fc
    "y_num_col_dims": 1})
  File "/home/users/wangjiawei04/chengmo/paddle_attention/paddle/train_net.py", line 639, in fusion_semantic_word
    bias_attr=fluid.ParamAttr(name="tdm.cls_fc.bias"))
  File "/home/users/wangjiawei04/chengmo/paddle_attention/paddle/train_net.py", line 237, in train_net
    semantic_states, word_states)
  File "/home/users/wangjiawei04/chengmo/paddle_attention/paddle/local_train.py", line 96, in run_train
    avg_cost, auc = tdm_model.train_net(inputs)
  File "/home/users/wangjiawei04/chengmo/paddle_attention/paddle/local_train.py", line 209, in main
    run_train(args)
  File "/home/users/wangjiawei04/chengmo/paddle_attention/paddle/local_train.py", line 216, in <module>
    main(args)

----------------------
Error Message Summary:
----------------------
ExternalError:  Cublas error, CUBLAS_STATUS_EXECUTION_FAILED  at (/paddle/paddle/fluid/operators/math/blas_impl.cu.h:34)
  [operator < mul > error]

W0528 13:54:15.679564 213705 init.cc:216] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0528 13:54:15.679577 213705 init.cc:218] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0528 13:54:15.679581 213705 init.cc:221] The detail failure signal is:

W0528 13:54:15.679586 213705 init.cc:224] *** Aborted at 1590645255 (unix time) try "date -d @1590645255" if you are using GNU date ***
W0528 13:54:15.684528 213705 init.cc:224] PC: @                0x0 (unknown)
W0528 13:54:15.684717 213705 init.cc:224] *** SIGABRT (@0x520520002dd8f) received by PID 187791 (TID 0x7f609d7cc700) from PID 187791; stack trace: ***
W0528 13:54:15.685890 213705 init.cc:224]     @     0x7f60ca1c8160 (unknown)
W0528 13:54:15.687609 213705 init.cc:224]     @     0x7f60c97363f7 __GI_raise
W0528 13:54:15.688812 213705 init.cc:224]     @     0x7f60c97377d8 __GI_abort
W0528 13:54:15.690255 213705 init.cc:224]     @     0x7f5feba34c65 __gnu_cxx::__verbose_terminate_handler()
W0528 13:54:15.690800 213705 init.cc:224]     @     0x7f5feba32e06 __cxxabiv1::__terminate()
W0528 13:54:15.691521 213705 init.cc:224]     @     0x7f5feba32e33 std::terminate()
W0528 13:54:15.692054 213705 init.cc:224]     @     0x7f5feba85935 execute_native_thread_routine
W0528 13:54:15.693186 213705 init.cc:224]     @     0x7f60ca1c01c3 start_thread
W0528 13:54:15.694511 213705 init.cc:224]     @     0x7f60c97e812d __clone
W0528 13:54:15.695627 213705 init.cc:224]     @                0x0 (unknown)
I0528 13:54:15.863662 213708 mmap_allocator.cc:124] PID: 213708, MemoryMapFdSet: set size - 0
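
When triaging many logs like the one above, it can help to pull out just the "Error Message Summary" fields. The sketch below is a small parser whose patterns are inferred only from this single log; the function name and the assumption that other Paddle versions use the same layout are mine, not something the issue confirms.

```python
import re

# Matches e.g. "ExternalError:  <message>  at (<file>:<line>)"
SUMMARY_RE = re.compile(
    r"^(?P<kind>\w+Error):\s+(?P<message>.*?)\s+at \((?P<file>[^:()]+):(?P<line>\d+)\)\s*$"
)
# Matches e.g. "[operator < mul > error]"
OP_RE = re.compile(r"\[operator < (?P<op>\w+) > error\]")


def parse_error_summary(log_text):
    """Extract error kind, message, source location and operator from a log."""
    result = {}
    for raw in log_text.splitlines():
        line = raw.strip()
        m = SUMMARY_RE.match(line)
        if m:
            result["kind"] = m.group("kind")
            result["message"] = m.group("message")
            result["file"] = m.group("file")
            result["line"] = int(m.group("line"))
        m = OP_RE.search(line)
        if m:
            result["operator"] = m.group("op")
    return result


if __name__ == "__main__":
    sample = (
        "ExternalError:  Cublas error, CUBLAS_STATUS_EXECUTION_FAILED  at "
        "(/paddle/paddle/fluid/operators/math/blas_impl.cu.h:34)\n"
        "  [operator < mul > error]\n"
    )
    print(parse_error_summary(sample))
```

Lines that match neither pattern are ignored, so the whole log can be passed in unchanged.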
Reference: paddlepaddle/Paddle#24788