Opened Jun 02, 2020 by saxon_zh (@saxon_zh, Guest)

SENet154-vd-FPN Cascade Mask: error while reading data, help needed

Created by: KK-Jiang

Training on AI Studio fails with [operator < read > error]: Blocking queue is killed because the data reader raises an exception.

Version and environment:
1) PaddlePaddle version: 1.8.0
2) System environment / GPU: AI Studio, V100
3) PaddleDetection 0.3

Training information:
1) Single GPU
2) 16G
3) The error is [operator < read > error]

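Reproduction: I used the official PaddleDetection-release-0.3 with the config file cascade_mask_rcnn_dcnv2_se154_vd_fpn_gn_s1x.yml, changed only basic settings (class_num, batch_size, data paths, the lr schedule, and so on), and then started training directly, at which point the error occurs. I also tried training on my Windows machine; after finally getting the environment set up, it reported the same error.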

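Problem description: the error log is below. I tried the following and none of it solved the problem:
1) Gradually reducing worker_num. Any non-zero value still raises the error; with 0, training prints the normal log and then hangs, with no progress even after more than ten hours.
2) Reducing the capacity passed to DataLoader.from_generator.
3) Setting use_double_buffer to False.
4) Setting iterable to True (the default is False).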

The exact error is as follows:

2020-05-26 11:24:35,963-INFO: places would be ommited when DataLoader is not iterable
2020-05-26 11:24:39,118-WARNING: recv endsignal from outq with errmsg[consumer[consumer-c14-0] exits for reason[producer[producer-c14] failed with error: cannot reshape array of size 1 into shape (2)]]
2020-05-26 11:24:39,119-WARNING: recv endsignal from outq with errmsg[consumer[consumer-c14-1] exits for reason[consumer[consumer-c14-0] exits for reason[producer[producer-c14] failed with error: cannot reshape array of size 1 into shape (2)]]]
2020-05-26 11:24:39,119-WARNING: recv endsignal from outq with errmsg[consumer[consumer-c14-2] exits for reason[consumer[consumer-c14-1] exits for reason[consumer[consumer-c14-0] exits for reason[producer[producer-c14] failed with error: cannot reshape array of size 1 into shape (2)]]]]
2020-05-26 11:24:39,119-WARNING: recv endsignal from outq with errmsg[consumer[consumer-c14-3] exits for reason[consumer[consumer-c14-2] exits for reason[consumer[consumer-c14-1] exits for reason[consumer[consumer-c14-0] exits for reason[producer[producer-c14] failed with error: cannot reshape array of size 1 into shape (2)]]]]]
2020-05-26 11:24:39,119-WARNING: recv endsignal from outq with errmsg[consumer[consumer-c14-4] exits for reason[consumer[consumer-c14-3] exits for reason[consumer[consumer-c14-2] exits for reason[consumer[consumer-c14-1] exits for reason[consumer[consumer-c14-0] exits for reason[producer[producer-c14] failed with error: cannot reshape array of size 1 into shape (2)]]]]]]
2020-05-26 11:24:39,119-WARNING: recv endsignal from outq with errmsg[consumer[consumer-c14-5] exits for reason[consumer[consumer-c14-4] exits for reason[consumer[consumer-c14-3] exits for reason[consumer[consumer-c14-2] exits for reason[consumer[consumer-c14-1] exits for reason[consumer[consumer-c14-0] exits for reason[producer[producer-c14] failed with error: cannot reshape array of size 1 into shape (2)]]]]]]]
2020-05-26 11:24:39,119-WARNING: recv endsignal from outq with errmsg[consumer[consumer-c14-6] exits for reason[consumer[consumer-c14-5] exits for reason[consumer[consumer-c14-4] exits for reason[consumer[consumer-c14-3] exits for reason[consumer[consumer-c14-2] exits for reason[consumer[consumer-c14-1] exits for reason[consumer[consumer-c14-0] exits for reason[producer[producer-c14] failed with error: cannot reshape array of size 1 into shape (2)]]]]]]]]
2020-05-26 11:24:39,120-WARNING: recv endsignal from outq with errmsg[consumer[consumer-c14-7] exits for reason[consumer[consumer-c14-6] exits for reason[consumer[consumer-c14-5] exits for reason[consumer[consumer-c14-4] exits for reason[consumer[consumer-c14-3] exits for reason[consumer[consumer-c14-2] exits for reason[consumer[consumer-c14-1] exits for reason[consumer[consumer-c14-0] exits for reason[producer[producer-c14] failed with error: cannot reshape array of size 1 into shape (2)]]]]]]]]]
2020-05-26 11:24:39,120-WARNING: Your reader has raised an exception!
Exception in thread Thread-10:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 1156, in thread_main
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 1136, in thread_main
    for tensors in self._tensor_reader():
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 1206, in tensor_reader_impl
    for slots in paddle_reader():
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/data_feeder.py", line 505, in reader_creator
    for item in reader():
  File "/home/aistudio/PaddleDetection-release-0.3/ppdet/data/reader.py", line 421, in _reader
    reader.reset()
  File "/home/aistudio/PaddleDetection-release-0.3/ppdet/data/parallel_map.py", line 259, in reset
    assert not self._exit, "cannot reset for already stopped dataset"
AssertionError: cannot reset for already stopped dataset

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
loading annotations into memory...
Done (t=12.03s)
creating index...
index created!
Traceback (most recent call last):
  File "PaddleDetection-release-0.3/tools/train.py", line 366, in <module>
    main()
  File "PaddleDetection-release-0.3/tools/train.py", line 239, in main
    outs = exe.run(compiled_train_prog, fetch_list=train_values)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1071, in run
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1066, in run
    return_merged=return_merged)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1167, in _run_impl
    return_merged=return_merged)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 879, in _run_parallel
    tensors = exe.run(fetch_var_names, return_merged)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:

C++ Call Stacks (More useful to developers):
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::operators::reader::BlockingQueue<std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> > >::Receive(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >)
3 paddle::operators::reader::PyReader::ReadNext(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >)
4 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, unsigned long> >::_M_invoke(std::_Any_data const&)
5 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
6 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const

Python Call Stacks (More useful to users):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2610, in append_op
    attrs=kwargs.get("attrs", None))
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 1078, in _init_non_iterable
    attrs={'drop_last': self._drop_last})
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 976, in __init__
    self._init_non_iterable()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 608, in from_generator
    iterable, return_list, drop_last)
  File "/home/aistudio/PaddleDetection-release-0.3/ppdet/modeling/architectures/cascade_mask_rcnn.py", line 426, in build_inputs
    iterable=iterable) if use_dataloader else None
  File "PaddleDetection-release-0.3/tools/train.py", line 112, in main
    feed_vars, train_loader = model.build_inputs(**inputs_def)
  File "PaddleDetection-release-0.3/tools/train.py", line 366, in <module>
    main()

Error Message Summary:
Error: Blocking queue is killed because the data reader raises an exception
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] at (/paddle/paddle/fluid/operators/reader/blocking_queue.h:141)
  [operator < read > error]
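Editor's note: the innermost message in the log, "cannot reshape array of size 1 into shape (2)", is raised in the reader's producer thread, and the blocking-queue error follows from it. That message has the form NumPy produces when a one-element array is reshaped into coordinate pairs, so one possible cause (an assumption, not confirmed anywhere in this issue) is a malformed polygon in the custom dataset's COCO-style annotations, for example a segmentation stored as a flat list instead of a list of polygons. The sketch below is a hypothetical standalone check, not part of PaddleDetection; the function name find_bad_polygons and the sample data are made up for illustration.

```python
import numpy as np


def find_bad_polygons(coco):
    """Return (annotation id, reason) pairs for polygon segmentations
    that cannot be interpreted as a list of (x, y) points.

    `coco` is assumed to be a dict loaded from a COCO-style JSON file.
    """
    bad = []
    for ann in coco.get("annotations", []):
        seg = ann.get("segmentation")
        if isinstance(seg, dict):
            continue  # RLE-encoded mask, not a polygon list
        if not seg:
            bad.append((ann["id"], "empty segmentation"))
            continue
        for poly in seg:
            # A flat segmentation such as [x1, y1, x2, y2, ...] yields
            # scalars here, which show up as size-1 arrays.
            coords = np.asarray(poly, dtype=np.float32).ravel()
            if coords.size % 2 != 0:
                bad.append((ann["id"], "odd number of coordinates: %d" % coords.size))
            elif coords.size < 6:
                bad.append((ann["id"], "fewer than 3 points: %d values" % coords.size))
    return bad


# Small in-memory sample (hypothetical data) standing in for a real file.
sample = {
    "annotations": [
        {"id": 1, "segmentation": [[0, 0, 10, 0, 10, 10]]},      # valid triangle
        {"id": 2, "segmentation": [[5]]},                         # single value
        {"id": 3, "segmentation": []},                            # empty
        {"id": 4, "segmentation": {"counts": [1, 2], "size": [4, 4]}},  # RLE
    ]
}

report = find_bad_polygons(sample)
for ann_id, reason in report:
    print(ann_id, reason)
```

To run it against a real dataset, load the annotation file with json.load and pass the resulting dict to find_bad_polygons. If the report comes back empty, the reshape error likely originates elsewhere in the data pipeline.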

Reference: paddlepaddle/PaddleDetection#856