Error when training "mask_rcnn_r50_1x.yml" on a custom dataset (#1187) · Issue · PaddlePaddle / PaddleDetection

Created by: lky-bit

Problem description: training on my own dataset fails.

Dataset: annotated in labelme format and converted to COCO format with X2COCO; 3 classes; num_class in "mask_rcnn_r50_1x.yml" changed to 3.

Environment: Windows 10 + Anaconda3 + PaddleDetection

Full error output:

(paddle) C:\Users\HX\PaddleDetection>python -u tools/train.py -c configs/mask_rcnn_r50_1x.yml --eval
2020-08-10 22:41:46,220-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000100] in Optimizer will not take effect, and it will only be applied to other Parameters!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
2020-08-10 22:41:47,232-INFO: places would be ommited when DataLoader is not iterable
W0810 22:41:47.251142 19100 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.2, Runtime API Version: 10.0
W0810 22:41:47.325584 19100 device_context.cc:260] device: 0, cuDNN Version: 7.6.
2020-08-10 22:41:49,836-WARNING: C:\Users\HX/.cache/paddle/weights\ResNet50_cos_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-08-10 22:41:51,268-WARNING: C:\Users\HX/.cache/paddle/weights\ResNet50_cos_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-08-10 22:41:51,275-WARNING: variable file [ C:/Users/HX/.cache/paddle/weights/ResNet50_cos_pretrained/fc_0.b_0 C:/Users/HX/.cache/paddle/weights/ResNet50_cos_pretrained/fc_0.w_0 ] not used
loading annotations into memory...
Done (t=0.06s)
creating index...
index created!
2020-08-10 22:41:51,809-INFO: places would be ommited when DataLoader is not iterable
W0810 22:41:51.820875 19100 build_strategy.cc:170] fusion_group is not enabled for Windows/MacOS now, and only effective when running with CUDA GPU.
2020-08-10 22:41:52,678-INFO: iter: 0, lr: 0.000333, 'loss_cls': '1.187765', 'loss_bbox': '0.017275', 'loss_rpn_cls': '0.693055', 'loss_rpn_bbox': '0.062487', 'loss_mask': '3.645032', 'loss': '5.605614', time: 0.000, eta: 0:00:00
2020-08-10 22:42:03,473-INFO: iter: 20, lr: 0.000360, 'loss_cls': '0.198564', 'loss_bbox': '0.015967', 'loss_rpn_cls': '0.691949', 'loss_rpn_bbox': '0.067279', 'loss_mask': '0.622899', 'loss': '1.668356', time: 0.556, eta: 2:46:38
C:\Users\HX.conda\envs\paddle\lib\site-packages\paddle\fluid\executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "tools/train.py", line 368, in <module>
    main()
  File "tools/train.py", line 241, in main
    outs = exe.run(compiled_train_prog, fetch_list=train_values)
  File "C:\Users\HX.conda\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 1071, in run
    six.reraise(*sys.exc_info())
  File "C:\Users\HX.conda\envs\paddle\lib\site-packages\six.py", line 703, in reraise
    raise value
  File "C:\Users\HX.conda\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 1066, in run
    return_merged=return_merged)
  File "C:\Users\HX.conda\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 1167, in _run_impl
    return_merged=return_merged)
  File "C:\Users\HX.conda\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 879, in _run_parallel
    tensors = exe.run(fetch_var_names, return_merged)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:

C++ Call Stacks (More useful to developers):

Windows not support stack backtrace yet.

Python Call Stacks (More useful to users):

  File "C:\Users\HX.conda\envs\paddle\lib\site-packages\paddle\fluid\framework.py", line 2610, in append_op
    attrs=kwargs.get("attrs", None))
  File "C:\Users\HX.conda\envs\paddle\lib\site-packages\paddle\fluid\layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "C:\Users\HX.conda\envs\paddle\lib\site-packages\paddle\fluid\layers\detection.py", line 2752, in generate_mask_labels
    'resolution': resolution})
  File "C:\Users\HX\PaddleDetection\ppdet\core\workspace.py", line 164, in partial_apply
    return op(*args, **kwargs_)
  File "C:\Users\HX\PaddleDetection\ppdet\modeling\architectures\mask_rcnn.py", line 134, in build
    labels_int32=labels_int32)
  File "C:\Users\HX\PaddleDetection\ppdet\modeling\architectures\mask_rcnn.py", line 333, in train
    return self.build(feed_vars, 'train')
  File "tools/train.py", line 114, in main
    train_fetches = model.train(feed_vars)
  File "tools/train.py", line 368, in <module>
    main()

Error Message Summary:

Error: An error occurred here. There is no accurate error hint for this error yet. We are continuously in the process of increasing hint for this kind of error check. It would be helpful if you could inform us of how this conversion went by opening a github issue. And we will resolve it with high priority.

New issue link: https://github.com/PaddlePaddle/Paddle/issues/new
Recommended issue content: all error stack information
[Hint: Expected desc->total_size >= size, but received desc->total_size:0 < size:4096.] at (D:\1.8.3\paddle\paddle\fluid\memory\detail\memory_block.cc:41)
[operator < generate_mask_labels > error]
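
One thing that may be worth ruling out before digging into the operator itself is the class count. This is an assumption, not a confirmed diagnosis: as far as I know, the stock mask_rcnn_r50_1x.yml sets num_classes to 81 for COCO's 80 categories, which suggests the value counts the background class, so a dataset with 3 foreground classes would call for 4 rather than 3. Below is a minimal sketch that reads the class count off a COCO-style annotation dict and flags annotations that reference an undeclared category. The helper names and the sample data are hypothetical and are not part of PaddleDetection.

```python
import json


def expected_num_classes(coco, with_background=True):
    """Count distinct category ids, optionally adding one for background."""
    category_ids = {cat["id"] for cat in coco.get("categories", [])}
    return len(category_ids) + (1 if with_background else 0)


def unknown_category_ids(coco):
    """Return category ids used by annotations but missing from 'categories'."""
    known = {cat["id"] for cat in coco.get("categories", [])}
    used = {ann["category_id"] for ann in coco.get("annotations", [])}
    return sorted(used - known)


# Hypothetical stand-in for the converted annotation file.
sample = json.loads("""
{
  "categories": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1},
    {"id": 11, "image_id": 1, "category_id": 3},
    {"id": 12, "image_id": 2, "category_id": 4}
  ]
}
""")

print(expected_num_classes(sample))   # 4 (3 categories + background)
print(unknown_category_ids(sample))   # [4]
```

To check a real file, replace the sample with the result of json.load on the annotation file produced by X2COCO (its path is not given in the report).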

I then ran it on AI Studio and got a similar error. The error output is as follows:

2020-08-11 00:24:54,820-INFO: font search path ['/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/mpl-data/fonts/ttf', '/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/mpl-data/fonts/afm', '/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/mpl-data/fonts/pdfcorefonts']
2020-08-11 00:24:55,170-INFO: generated new fontManager
----------- Configuration Arguments -----------
MASK_ON: 1
anchor_sizes: [32, 64, 128, 256, 512]
aspect_ratios: [0.5, 1.0, 2.0]
batch_size_per_im: 512
class_num: 3
data_dir: smalldataset
dataset: coco2017
draw_threshold: 0.8
enable_ce: False
freeze_model_save_dir: freeze_model
im_per_batch: 1
image_path: dataset/coco/val2017
learning_rate: 0.01
log_window: 20
max_iter: 3000
max_size: 1333
model_save_dir: output/
nms_thresh: 0.5
padding_minibatch: False
parallel: False
pixel_means: [102.9801, 115.9465, 122.7717]
pretrained_model: imagenet_resnet50_fusebn
rpn_nms_thresh: 0.7
rpn_stride: [16.0, 16.0]
scales: [800]
score_thresh: 0.05
snapshot_stride: 10000
use_gpu: 1
use_profile: False
use_pyreader: False
variance: [1.0, 1.0, 1.0, 1.0]

W0811 00:24:57.596515 270 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0811 00:24:57.601276 270 device_context.cc:245] device: 0, cuDNN Version: 7.3.
Creating: coco2017
loading annotations into memory...
Done (t=0.10s)
creating index...
index created!
_add_gt_annotations took 0.032s
Appending horizontally-flipped training examples...
Loaded dataset: coco2017
1132 roidb entries
Filtered 0 roidb entries: 1132 -> 1132
train on coco2017 with 1132 roidbs
2020-08-11 00:25:00.030219, iter: 0, lr: 0.00333, 'loss': 3.468, 'loss_cls': 1.132, 'loss_bbox': 0.02, 'loss_rpn_cls': 0.697, 'loss_rpn_bbox': 0.013, 'loss_mask': 1.606, time: 0.080
2020-08-11 00:25:20.756619, iter: 50, lr: 0.00400, 'loss': 0.605, 'loss_cls': 0.115, 'loss_bbox': 0.101, 'loss_rpn_cls': 0.066, 'loss_rpn_bbox': 0.082, 'loss_mask': 0.22, time: 0.423
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py:782: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "train.py", line 183, in <module>
    train()
  File "train.py", line 175, in train
    train_loop()
  File "train.py", line 148, in train_loop
    feed=feeder.feed(data))
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 783, in run
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 778, in run
    use_program_cache=use_program_cache)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 831, in _run_impl
    use_program_cache=use_program_cache)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 905, in _run_program
    fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet:

C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::memory::detail::MemoryBlock::Split(paddle::memory::detail::MetadataCache*, unsigned long)
3 paddle::memory::detail::BuddyAllocator::SplitToAlloc(std::_Rb_tree_const_iterator<std::tuple<unsigned long, unsigned long, void*> >, unsigned long)
4 paddle::memory::detail::BuddyAllocator::Alloc(unsigned long)
5 void* paddle::memory::legacy::Alloc<paddle::platform::CPUPlace>(paddle::platform::CPUPlace const&, unsigned long)
6 paddle::memory::allocation::NaiveBestFitAllocator::AllocateImpl(unsigned long)
7 paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)
8 paddle::memory::allocation::AllocatorFacade::AllocShared(paddle::platform::Place const&, unsigned long)
9 paddle::memory::AllocShared(paddle::platform::Place const&, unsigned long)
10 paddle::framework::Tensor::mutable_data(paddle::platform::Place const&, paddle::framework::proto::VarType_Type, unsigned long)
11 std::vector<paddle::framework::Tensor, std::allocator<paddle::framework::Tensor> > paddle::operators::SampleMaskForOneImage(paddle::platform::CPUDeviceContext const&, paddle::framework::Tensor const&, paddle::framework::Tensor const&, paddle::framework::Tensor const&, paddle::framework::Tensor const&, paddle::framework::Tensor const&, paddle::framework::Tensor const&, int, int, std::vector<paddle::framework::Vector, std::allocator<paddle::framework::Vector > > const&)
12 paddle::operators::GenerateMaskLabelsKernel::Compute(paddle::framework::ExecutionContext const&) const
13 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CPUPlace, false, 0ul, paddle::operators::GenerateMaskLabelsKernel >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
14 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
15 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
16 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
17 paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
18 paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool, bool)

Python Call Stacks (More useful to users):

  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2525, in append_op
    attrs=kwargs.get("attrs", None))
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/detection.py", line 2704, in generate_mask_labels
    'resolution': resolution})
  File "/home/aistudio/mask-rcnn/models/model_builder.py", line 241, in rpn_heads
    resolution=cfg.resolution)
  File "/home/aistudio/mask-rcnn/models/model_builder.py", line 41, in build_model
    self.rpn_heads(body_conv)
  File "train.py", line 79, in train
    model.build_model(image_shape)
  File "train.py", line 183, in <module>
    train()

Error Message Summary:

New issue link: https://github.com/PaddlePaddle/Paddle/issues/new
Recommended issue content: all error stack information
[Hint: Expected desc->total_size >= size, but received desc->total_size:0 < size:4096.] at (/paddle/paddle/fluid/memory/detail/memory_block.cc:41)
[operator < generate_mask_labels > error]
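
Both traces end in the generate_mask_labels operator, and the C++ stack from the AI Studio run fails inside SampleMaskForOneImage while allocating a tensor. One possible cause, offered as an assumption that the report does not confirm, is an annotation whose segmentation ended up missing, empty, or too short to form a polygon after the labelme to COCO conversion. Below is a minimal sketch that lists such annotations in a COCO-style dict. The function names and the sample data are hypothetical, and RLE (dict) segmentations are skipped rather than validated.

```python
def usable_polygons(segmentation):
    """Keep polygons that have an even coordinate count and at least 3 points."""
    if not isinstance(segmentation, list):
        return []
    return [
        poly for poly in segmentation
        if isinstance(poly, list) and len(poly) >= 6 and len(poly) % 2 == 0
    ]


def find_suspect_annotations(coco):
    """Return (image_id, annotation_id) pairs that have no usable polygon."""
    suspects = []
    for ann in coco.get("annotations", []):
        seg = ann.get("segmentation")
        if isinstance(seg, dict):
            # RLE-encoded mask: out of scope for this sketch.
            continue
        if not usable_polygons(seg):
            suspects.append((ann.get("image_id"), ann.get("id")))
    return suspects


# Hypothetical stand-in for the converted annotation file.
sample = {
    "annotations": [
        {"id": 10, "image_id": 1, "segmentation": [[0, 0, 10, 0, 10, 10]]},
        {"id": 11, "image_id": 1, "segmentation": []},
        {"id": 12, "image_id": 2, "segmentation": [[5, 5, 6, 6]]},
        {"id": 13, "image_id": 2},
        {"id": 14, "image_id": 3, "segmentation": {"counts": "", "size": [4, 4]}},
    ]
}

print(find_suspect_annotations(sample))  # [(1, 11), (2, 12), (2, 13)]
```

If the list comes back non-empty for the real annotation file, removing or re-annotating those entries and retraining would show whether this is related to the failure.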
