Opened Aug 19, 2020 by saxon_zh (@saxon_zh, Guest)

Core dump when training PPyolo on a custom dataset

Created by: nihuizhidao

CentOS7, CUDA 10.2, training on two GPUs, paddlepaddle 1.8.4.post107, paddleDetection release 0.4

While fine-tuning on my own dataset (already converted to COCO format), the following error appears when training reaches eval (at iter 1000):

2020-08-19 16:59:40,701-INFO: Save model to output/ppyolo/1000.
/home/xxx/anaconda3/envs/pp184/lib/python3.7/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "tools/train.py", line 368, in <module>
    main()
  File "tools/train.py", line 286, in main
    resolution=resolution)
  File "/home/xxx/Users/xxx/xxx/PaddleDetection-release-0.4/ppdet/utils/eval_utils.py", line 129, in eval_run
    return_numpy=False)
  File "/home/xxx/anaconda3/envs/pp184/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1071, in run
    six.reraise(*sys.exc_info())
  File "/home/xxx/anaconda3/envs/pp184/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/xxx/anaconda3/envs/pp184/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1066, in run
    return_merged=return_merged)
  File "/home/xxx/anaconda3/envs/pp184/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1167, in _run_impl
    return_merged=return_merged)
  File "/home/xxx/anaconda3/envs/pp184/lib/python3.7/site-packages/paddle/fluid/executor.py", line 879, in _run_parallel
    tensors = exe.run(fetch_var_names, return_merged)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:


C++ Call Stacks (More useful to developers):

0   std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2   paddle::platform::CublasHandleHolder::CublasHandleHolder(CUstream_st*, cublasMath_t)
3   paddle::platform::CUDAContext::CUDAContext(paddle::platform::CUDAPlace const&, paddle::platform::stream::Priority const&)
4   paddle::platform::CUDADeviceContext::CUDADeviceContext(paddle::platform::CUDAPlace)
5   std::_Function_handler<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > (), std::reference_wrapper<std::_Bind_simple<paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >, paddle::platform::Place)::{lambda()#1} ()> > >::_M_invoke(std::_Any_data const&)
6   std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::__future_base::_Result_base::_Deleter>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > >::_M_invoke(std::_Any_data const&)
7   std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
8   std::__future_base::_Deferred_state<std::_Bind_simple<paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >, paddle::platform::Place)::{lambda()#1} ()>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >::_M_run_deferred()
9   paddle::platform::DeviceContextPool::Get(paddle::platform::Place const&)
10  paddle::framework::details::FastThreadedSSAGraphExecutor::InsertFetchOps(std::vector<std::string, std::allocator<std::string> > const&, ..., bool)
11  paddle::framework::details::FastThreadedSSAGraphExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&, bool)
12  paddle::framework::details::ScopeBufferedMonitor::Apply(std::function<void ()> const&, bool)
13  paddle::framework::details::ScopeBufferedSSAGraphExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&, bool)
14  paddle::framework::ParallelExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&, bool)


Error Message Summary:

ExternalError: Cublas error, CUBLAS_STATUS_ALLOC_FAILED at (/paddle/paddle/fluid/platform/cuda_helper.h:81)

terminate called without an active exception
W0819 16:59:47.753170 32004 init.cc:226] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0819 16:59:47.753242 32004 init.cc:228] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0819 16:59:47.753255 32004 init.cc:231] The detail failure signal is:

W0819 16:59:47.753284 32004 init.cc:234] *** Aborted at 1597827587 (unix time) try "date -d @1597827587" if you are using GNU date ***
W0819 16:59:47.757735 32004 init.cc:234] PC: @ 0x0 (unknown)
W0819 16:59:47.757876 32004 init.cc:234] *** SIGABRT (@0x3e8000059a7) received by PID 22951 (TID 0x7f6071fff700) from PID 22951; stack trace: ***
W0819 16:59:47.760473 32004 init.cc:234] @ 0x7f6577ecb630 (unknown)
W0819 16:59:47.763413 32004 init.cc:234] @ 0x7f6577b24387 __GI_raise
W0819 16:59:47.766125 32004 init.cc:234] @ 0x7f6577b25a78 __GI_abort
W0819 16:59:47.785084 32004 init.cc:234] @ 0x7f65450ce84a __gnu_cxx::__verbose_terminate_handler()
W0819 16:59:47.787011 32004 init.cc:234] @ 0x7f65450ccf47 __cxxabiv1::__terminate()
W0819 16:59:47.789022 32004 init.cc:234] @ 0x7f65450ccf7d std::terminate()
W0819 16:59:47.790805 32004 init.cc:234] @ 0x7f65450ccc5a __gxx_personality_v0
W0819 16:59:47.797116 32004 init.cc:234] @ 0x7f657812db97 _Unwind_ForcedUnwind_Phase2
W0819 16:59:47.800135 32004 init.cc:234] @ 0x7f657812de7d _Unwind_ForcedUnwind
W0819 16:59:47.802764 32004 init.cc:234] @ 0x7f6577eca362 __GI___pthread_unwind
W0819 16:59:47.805341 32004 init.cc:234] @ 0x7f6577ec4ef7 __pthread_exit
W0819 16:59:47.850818 32004 init.cc:234] @ 0x561b9fe731c9 PyThread_exit_thread
W0819 16:59:47.851243 32004 init.cc:234] @ 0x561b9fd05cb1 PyEval_RestoreThread.cold.787
W0819 16:59:47.851969 32004 init.cc:234] @ 0x7f650e8435d5 (unknown)
W0819 16:59:47.853279 32004 init.cc:234] @ 0x561b9fdfd114 _PyMethodDef_RawFastCallKeywords
W0819 16:59:47.854166 32004 init.cc:234] @ 0x561b9fdfd231 _PyCFunction_FastCallKeywords
W0819 16:59:47.854974 32004 init.cc:234] @ 0x561b9fe61a5d _PyEval_EvalFrameDefault
W0819 16:59:47.855684 32004 init.cc:234] @ 0x561b9fdb66f9 _PyEval_EvalCodeWithName
W0819 16:59:47.856374 32004 init.cc:234] @ 0x561b9fdb7805 _PyFunction_FastCallDict
W0819 16:59:47.857045 32004 init.cc:234] @ 0x561b9fdd2943 _PyObject_Call_Prepend
W0819 16:59:47.857429 32004 init.cc:234] @ 0x561b9fe1112a slot_tp_call
W0819 16:59:47.858115 32004 init.cc:234] @ 0x561b9fe1218b _PyObject_FastCallKeywords
W0819 16:59:47.858860 32004 init.cc:234] @ 0x561b9fe61626 _PyEval_EvalFrameDefault
W0819 16:59:47.859539 32004 init.cc:234] @ 0x561b9fdb773b _PyFunction_FastCallDict
W0819 16:59:47.860206 32004 init.cc:234] @ 0x561b9fdd2943 _PyObject_Call_Prepend
W0819 16:59:47.860591 32004 init.cc:234] @ 0x561b9fe1112a slot_tp_call
W0819 16:59:47.861285 32004 init.cc:234] @ 0x561b9fe1218b _PyObject_FastCallKeywords
W0819 16:59:47.862025 32004 init.cc:234] @ 0x561b9fe61e8f _PyEval_EvalFrameDefault
W0819 16:59:47.862701 32004 init.cc:234] @ 0x561b9fdb66f9 _PyEval_EvalCodeWithName
W0819 16:59:47.863380 32004 init.cc:234] @ 0x561b9fdb7805 _PyFunction_FastCallDict
W0819 16:59:47.864060 32004 init.cc:234] @ 0x561b9fdd2943 _PyObject_Call_Prepend
W0819 16:59:47.864805 32004 init.cc:234] @ 0x561b9fdc5b9e PyObject_Call
Aborted (core dumped)

What causes this error? Is the GPU running out of memory? Thanks.
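
CUBLAS_STATUS_ALLOC_FAILED is the status cuBLAS returns when a resource allocation inside the library fails, and the C++ stack above shows it raised while a cuBLAS handle was being created for a new CUDA device context. One way to test the memory hypothesis is to record per-GPU memory usage as training approaches the eval step. The following is only a rough sketch, assuming `nvidia-smi` is on the PATH; the helper names `parse_gpu_memory` and `query_gpu_memory` are made up for illustration and are not part of PaddlePaddle or PaddleDetection.

```python
import subprocess

# Standard nvidia-smi query flags; values are reported in MiB with "nounits".
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV output into a list of per-GPU dicts (MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = (int(field.strip()) for field in line.split(","))
        gpus.append({
            "index": index,
            "used_mib": used,
            "total_mib": total,
            "free_mib": total - used,
        })
    return gpus


def query_gpu_memory():
    """Run nvidia-smi once and return the parsed per-GPU memory figures."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` periodically from a second terminal while training runs would show whether free memory on either GPU is close to zero just before iter 1000. If it is, that would be consistent with the allocation failure, though it would not by itself prove the cause.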

Reference: paddlepaddle/PaddleDetection#1257