PaddlePaddle / PaddleDetection
Opened Jul 17, 2020 by saxon_zh (@saxon_zh, Guest)

Core dump at training iteration 23060

Created by: nihuizhidao

After several hours of training, the process crashed with a core dump. The error output:

```
/home/scc/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "tools/train.py", line 326, in <module>
    main()
  File "tools/train.py", line 236, in main
    outs = exe.run(compiled_train_prog, fetch_list=train_values)
  File "/home/scc/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1071, in run
    six.reraise(*sys.exc_info())
  File "/home/scc/anaconda3/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/scc/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1066, in run
    return_merged=return_merged)
  File "/home/scc/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1167, in _run_impl
    return_merged=return_merged)
  File "/home/scc/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 879, in _run_parallel
    tensors = exe.run(fetch_var_names, return_merged)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:
```


C++ Call Stacks (More useful to developers):

```
0   std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2   void paddle::operators::GPUGather<float, int>(paddle::platform::DeviceContext const&, paddle::framework::Tensor const&, paddle::framework::Tensor const&, paddle::framework::Tensor*)
3   paddle::operators::CUDAGenerateProposalsKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
4   std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::CUDAGenerateProposalsKernel<paddle::platform::CUDADeviceContext, float> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
5   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
6   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
7   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
8   paddle::framework::details::ComputationOpHandle::RunImpl()
9   paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
10  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue > const&, unsigned long*)
11  std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
12  std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
13  ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const
```


Python Call Stacks (More useful to users):

```
  File "/home/scc/anaconda3/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2610, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/scc/anaconda3/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/home/scc/anaconda3/lib/python3.7/site-packages/paddle/fluid/layers/detection.py", line 2846, in generate_proposals
    'RpnRoisLod': rpn_rois_lod
  File "/home/scc/Projects/AIDetectionProjects/Jinghuajian/ppdet/core/workspace.py", line 150, in partial_apply
    return op(*args, **kwargs_)
  File "/home/scc/Projects/AIDetectionProjects/Jinghuajian/ppdet/modeling/anchor_heads/rpn_head.py", line 438, in _get_single_proposals
    variances=self.anchor_var)
  File "/home/scc/Projects/AIDetectionProjects/Jinghuajian/ppdet/modeling/anchor_heads/rpn_head.py", line 462, in get_proposals
    fpn_feat, im_info, lvl, mode)
  File "/home/scc/Projects/AIDetectionProjects/Jinghuajian/ppdet/modeling/architectures/faster_rcnn.py", line 100, in build
    rois = self.rpn_head.get_proposals(body_feats, im_info, mode=mode)
  File "/home/scc/Projects/AIDetectionProjects/Jinghuajian/ppdet/modeling/architectures/faster_rcnn.py", line 240, in train
    return self.build(feed_vars, 'train')
  File "tools/train.py", line 117, in main
    train_fetches = model.train(feed_vars)
  File "tools/train.py", line 326, in <module>
    main()
```


Error Message Summary:

```
InvalidArgumentError: The index of gather_op should not be emptywhen the index's rank is 1.
  [Hint: Expected index.dims()[0] > 0, but received index.dims()[0]:0 <= 0:0.] at (/paddle/paddle/fluid/operators/gather.cu.h:83)
  [operator < generate_proposals > error]
terminate called without an active exception
W0716 19:09:20.089466 3925 init.cc:216] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0716 19:09:20.089524 3925 init.cc:218] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0716 19:09:20.089537 3925 init.cc:221] The detail failure signal is:

W0716 19:09:20.089586 3925 init.cc:224] *** Aborted at 1594897760 (unix time) try "date -d @1594897760" if you are using GNU date ***
W0716 19:09:20.095199 3925 init.cc:224] PC: @ 0x0 (unknown)
W0716 19:09:20.095360 3925 init.cc:224] *** SIGABRT (@0x3e800000dfc) received by PID 3580 (TID 0x7ff221b03700) from PID 3580; stack trace: ***
W0716 19:09:20.099287 3925 init.cc:224] @ 0x7ff2764f0630 (unknown)
W0716 19:09:20.103327 3925 init.cc:224] @ 0x7ff276149387 __GI_raise
W0716 19:09:20.107179 3925 init.cc:224] @ 0x7ff27614aa78 __GI_abort
W0716 19:09:20.133680 3925 init.cc:224] @ 0x7ff20281184a __gnu_cxx::__verbose_terminate_handler()
W0716 19:09:20.136224 3925 init.cc:224] @ 0x7ff20280ff47 __cxxabiv1::__terminate()
W0716 19:09:20.142647 3925 init.cc:224] @ 0x7ff20280ff7d std::terminate()
W0716 19:09:20.145336 3925 init.cc:224] @ 0x7ff20280fc5a __gxx_personality_v0
W0716 19:09:20.162426 3925 init.cc:224] @ 0x7ff26f238b97 _Unwind_ForcedUnwind_Phase2
W0716 19:09:20.168728 3925 init.cc:224] @ 0x7ff26f238e7d _Unwind_ForcedUnwind
W0716 19:09:20.172485 3925 init.cc:224] @ 0x7ff2764ef362 __GI___pthread_unwind
W0716 19:09:20.175529 3925 init.cc:224] @ 0x7ff2764e9ef7 __pthread_exit
W0716 19:09:20.176205 3925 init.cc:224] @ 0x5595c2d871c9 PyThread_exit_thread
W0716 19:09:20.176414 3925 init.cc:224] @ 0x5595c2c19cb1 PyEval_RestoreThread.cold.787
W0716 19:09:20.202080 3925 init.cc:224] @ 0x7ff1bffa2669 pybind11::gil_scoped_release::~gil_scoped_release()
W0716 19:09:20.211416 3925 init.cc:224] @ 0x7ff1c008ab75 ZZN8pybind1112cpp_function10initializeIZN6paddle6pybind10BindReaderEPNS_6moduleEEUlRNS2_9operators6reader40OrderedMultiDeviceLoDTensorBlockingQueueERKSt6vectorINS2_9framework9LoDTensorESaISC_EEE2_bIS9_SG_EINS_4nameENS_9is_methodENS_7siblingENS_10call_guardIINS_18gil_scoped_releaseEEEEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNES11
W0716 19:09:20.218363 3925 init.cc:224] @ 0x7ff1bffbfe49 pybind11::cpp_function::dispatcher()
W0716 19:09:20.219130 3925 init.cc:224] @ 0x5595c2d11114 _PyMethodDef_RawFastCallKeywords
W0716 19:09:20.219820 3925 init.cc:224] @ 0x5595c2d11231 _PyCFunction_FastCallKeywords
W0716 19:09:20.220517 3925 init.cc:224] @ 0x5595c2d75e8f _PyEval_EvalFrameDefault
W0716 19:09:20.221141 3925 init.cc:224] @ 0x5595c2cca9da _PyEval_EvalCodeWithName
W0716 19:09:20.221791 3925 init.cc:224] @ 0x5595c2ccb805 _PyFunction_FastCallDict
W0716 19:09:20.222491 3925 init.cc:224] @ 0x5595c2d729f5 _PyEval_EvalFrameDefault
W0716 19:09:20.223073 3925 init.cc:224] @ 0x5595c2d1068b _PyFunction_FastCallKeywords
W0716 19:09:20.223747 3925 init.cc:224] @ 0x5595c2d71260 _PyEval_EvalFrameDefault
W0716 19:09:20.224359 3925 init.cc:224] @ 0x5595c2d1068b _PyFunction_FastCallKeywords
W0716 19:09:20.225055 3925 init.cc:224] @ 0x5595c2d71260 _PyEval_EvalFrameDefault
W0716 19:09:20.225721 3925 init.cc:224] @ 0x5595c2ccb73b _PyFunction_FastCallDict
W0716 19:09:20.226342 3925 init.cc:224] @ 0x5595c2ce6943 _PyObject_Call_Prepend
W0716 19:09:20.227012 3925 init.cc:224] @ 0x5595c2cd9b9e PyObject_Call
W0716 19:09:20.227315 3925 init.cc:224] @ 0x5595c2dc5af7 t_bootstrap
W0716 19:09:20.227457 3925 init.cc:224] @ 0x5595c2d82e18 pythread_wrapper
W0716 19:09:20.231935 3925 init.cc:224] @ 0x7ff2764e8ea5 start_thread
Aborted (core dumped)
```
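The summary says the `generate_proposals` operator called `GPUGather` with an empty 1-D index (`index.dims()[0]` was 0), meaning no candidate boxes were left to gather at that step. One plausible way this happens, offered as a hypothesis rather than a diagnosis, is a diverged run: if the RPN outputs become NaN, every decoded box fails the minimum-size filter because any comparison with NaN is false. The sketch below is a simplified pure-Python model of "filter by size, then gather", not Paddle's CUDA kernel; `keep_indices`, `gather` and `all_finite` are hypothetical helpers.

```python
import math


def keep_indices(boxes, min_size):
    """Indices of (x1, y1, x2, y2) boxes whose width and height are >= min_size.

    Any comparison involving NaN is False (IEEE 754), so boxes with NaN
    coordinates are always dropped by this filter.
    """
    keep = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        w, h = x2 - x1, y2 - y1
        if w >= min_size and h >= min_size:
            keep.append(i)
    return keep


def gather(rows, index):
    """Select rows by index, rejecting an empty 1-D index like the failed check."""
    if len(index) == 0:
        raise ValueError("gather index must not be empty")
    return [rows[i] for i in index]


def all_finite(values):
    """True if every value is a finite number (no NaN or +/-inf)."""
    return all(math.isfinite(v) for v in values)


if __name__ == "__main__":
    nan = float("nan")
    boxes = [(0.0, 0.0, nan, nan), (5.0, 5.0, nan, 9.0)]
    keep = keep_indices(boxes, min_size=1.0)
    print(keep)  # [] because every box contains NaN
    try:
        gather(boxes, keep)
    except ValueError as err:
        print("gather failed:", err)
```

If this is what happened, the training losses would likely turn non-finite some iterations before the crash. Checking the fetched losses with something like `all_finite` after each `exe.run(...)` call in `tools/train.py` (a hypothetical addition, after reducing the fetched arrays to Python floats) would confirm it; lowering the learning rate is a common first step against divergence.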

Part of the configuration:

```yaml
architecture: FasterRCNN
max_iters: 180000
snapshot_iter: 800
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_64x4d_pretrained.tar
weights: output/faster_rcnn_x101_vd_64x4d_fpn_1x/model_final
metric: COCO
num_classes: 5
```
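With `snapshot_iter: 800`, and assuming the training script writes a checkpoint under `save_dir` whenever the iteration count is a multiple of `snapshot_iter` (which the key name suggests, but is not confirmed here), the crash at iteration 23060 leaves the iteration-22400 checkpoint as the latest one, so about 660 iterations would need to be redone. A minimal sketch of that arithmetic, with `last_snapshot` as a hypothetical helper:

```python
def last_snapshot(failed_iter: int, snapshot_iter: int) -> int:
    """Largest multiple of snapshot_iter that does not exceed failed_iter.

    Assumes checkpoints are written at every multiple of snapshot_iter;
    a result of 0 means no checkpoint was written before the failure.
    """
    if snapshot_iter <= 0:
        raise ValueError("snapshot_iter must be positive")
    return (failed_iter // snapshot_iter) * snapshot_iter


if __name__ == "__main__":
    failed, every = 23060, 800
    last = last_snapshot(failed, every)
    print(last, failed - last)  # 22400 660
```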

I am fine-tuning (transfer learning) on my own dataset.
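Because this is fine-tuning on a custom dataset, and `metric: COCO` suggests COCO-format annotations, bad annotations are worth ruling out. A zero-width or zero-height ground-truth box makes the standard Faster R-CNN box encoding take the log of zero, which is one possible route to a NaN loss. The sketch below is a hypothetical checker (`audit_coco` is not part of PaddleDetection). It assumes `num_classes` counts the background class, as in the static-graph PaddleDetection Faster R-CNN configs, so `num_classes: 5` would mean 4 annotated categories.

```python
import json
from collections import Counter


def audit_coco(coco, num_classes):
    """Basic sanity checks on a COCO-style annotation dict.

    Assumes num_classes includes background, so num_classes - 1
    categories are expected; adjust if your config differs.
    """
    sizes = {img["id"]: (img.get("width"), img.get("height"))
             for img in coco.get("images", [])}
    boxes_per_image = Counter()
    degenerate, out_of_bounds = [], []
    for ann in coco.get("annotations", []):
        x, y, w, h = ann["bbox"]
        boxes_per_image[ann["image_id"]] += 1
        # `not (w > 0 and h > 0)` also catches NaN widths and heights.
        if not (w > 0 and h > 0):
            degenerate.append(ann.get("id"))
            continue
        img_w, img_h = sizes.get(ann["image_id"], (None, None))
        if img_w is not None and img_h is not None:
            if x < 0 or y < 0 or x + w > img_w or y + h > img_h:
                out_of_bounds.append(ann.get("id"))
    return {
        "degenerate_boxes": degenerate,
        "out_of_bounds_boxes": out_of_bounds,
        "images_without_boxes": sorted(set(sizes) - set(boxes_per_image)),
        "category_count_ok": len(coco.get("categories", [])) == num_classes - 1,
    }


def audit_coco_file(path, num_classes):
    """Load a COCO JSON file from a (hypothetical) path and audit it."""
    with open(path, encoding="utf-8") as f:
        return audit_coco(json.load(f), num_classes)
```

For example, `audit_coco_file("path/to/instances_train.json", 5)` (the path is a placeholder) returns the IDs of suspicious annotations and images, which can then be fixed or dropped before resuming training.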

The log does not seem to give any obvious hint about the cause... Could someone please take a look? It is somewhat urgent.

Reference: paddlepaddle/PaddleDetection#1074