ValueError when training mask_rcnn_r50_1x on a custom dataset (#454) · Issue · PaddlePaddle / PaddleDetection

ValueError when training mask_rcnn_r50_1x on a custom dataset

Created by: Magsun

@Baidu experts: a newcomer asking for help.

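My data was annotated with labelme and converted to a COCO-format dataset with the labelme2coco script that ships with labelme. Three classes are annotated (background not included). I changed data_dir and drop_last in mask_reader.yml, and set num_classes to 4 in mask_rcnn_r50_1x.yml.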
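As a sanity check on the class count described above, the following minimal sketch counts the categories in a COCO-style annotation dict and adds one for background, which matches the "three annotated classes, num_classes 4" setup in this post. The helper name `expected_num_classes` and the sample category names are hypothetical; in practice the dict would come from `json.load` on the converted annotation file.

```python
import json


def expected_num_classes(coco, with_background=True):
    """Return the class count implied by a COCO-style annotation dict."""
    # Count distinct category ids rather than list entries, in case of duplicates.
    category_ids = {cat["id"] for cat in coco.get("categories", [])}
    return len(category_ids) + (1 if with_background else 0)


# Hypothetical three-class annotation skeleton (names are placeholders).
sample = json.loads("""
{
  "images": [],
  "annotations": [],
  "categories": [
    {"id": 1, "name": "class_a"},
    {"id": 2, "name": "class_b"},
    {"id": 3, "name": "class_c"}
  ]
}
""")

print(expected_num_classes(sample))  # 4
```

If the number printed for the real annotation file is not 4, the category list produced by the conversion step is worth inspecting before looking elsewhere.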

Training fails with a broadcast error: an array of shape (3,800,1067) cannot be broadcast into shape (3,800). ValueError: could not broadcast input array from shape (3,800,1067) into shape (3,800)
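The message has the form NumPy produces when arrays of unequal shape are packed into a single array. The sketch below reproduces that situation with small stand-in arrays and shows zero-padding to a common size as one way to make stacking succeed. That this is what happens inside the reader here is an assumption, not something confirmed in this thread; `pad_to_common_shape` is a hypothetical helper, not a PaddleDetection API, and the exact error text depends on the NumPy version.

```python
import numpy as np


def pad_to_common_shape(images):
    """Zero-pad CHW arrays on the bottom/right to the largest H and W, then stack."""
    max_h = max(im.shape[1] for im in images)
    max_w = max(im.shape[2] for im in images)
    channels = images[0].shape[0]
    batch = np.zeros((len(images), channels, max_h, max_w), dtype=images[0].dtype)
    for i, im in enumerate(images):
        _, h, w = im.shape
        batch[i, :, :h, :w] = im
    return batch


# Two stand-in "images": same channel count and height, different widths.
a = np.ones((3, 8, 10), dtype="float32")
b = np.ones((3, 8, 12), dtype="float32")

try:
    np.array([a, b], dtype="float32")
    stacked_ok = True
except ValueError as err:
    stacked_ok = False
    print("stacking failed:", err)

batch = pad_to_common_shape([a, b])
print(batch.shape)  # (2, 3, 8, 12)
```

Under this reading, a batch holding a single image, or a batch whose images are padded to one shape, would not hit the error.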

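I searched on Baidu, and some people say the cause is an input size that differs from the template. I could not find any place that specifies an input size, and since the framework resizes the data itself, I would not expect a size problem. Any help would be greatly appreciated. QAQ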
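On the resizing point: ResizeImage settings such as target_size: 800 with max_size: 1333 are commonly interpreted as scaling the shorter side to target_size while capping the longer side at max_size. That keeps the aspect ratio, so it does not give every image the same width. The sketch below is that arithmetic under this assumption, not PaddleDetection's actual code; `resized_hw` is a hypothetical helper and the sample image sizes are made up.

```python
def resized_hw(height, width, target_size=800, max_size=1333):
    """Scale the short side to target_size, capping the long side at max_size."""
    short_side, long_side = min(height, width), max(height, width)
    scale = target_size / short_side
    # If the long side would exceed max_size, scale by the long side instead.
    if round(scale * long_side) > max_size:
        scale = max_size / long_side
    return round(height * scale), round(width * scale)


print(resized_hw(768, 1024))   # (800, 1067)
print(resized_hw(1080, 1920))  # (750, 1333)
```

Two images with different aspect ratios therefore end up with different shapes after resizing, for example a width of 1067 for a 4:3 image, which is the width that appears in the error message.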

The log is as follows:

(base) root@8bdbc88b7a0b:/workspace/PaddleDetection# python tools/train.py -c configs/mask_rcnn_r50_1x.yml
BBoxAssigner:
  batch_size_per_im: 512
  bbox_reg_weights:
  - 0.1
  - 0.1
  - 0.2
  - 0.2
  bg_thresh_hi: 0.5
  bg_thresh_lo: 0.0
  fg_fraction: 0.25
  fg_thresh: 0.5
  num_classes: 81
  shuffle_before_sample: true
BBoxHead:
  head: ResNetC5
  nms:
    keep_top_k: 100
    nms_threshold: 0.5
    normalized: false
    score_threshold: 0.05
  bbox_loss:
    sigma: 1.0
  box_coder:
    axis: 1
    box_normalized: false
    code_type: decode_center_size
    prior_box_var:
    - 0.1
    - 0.1
    - 0.2
    - 0.2
  num_classes: 81
EvalReader:
  batch_size: 1
  dataset: !COCODataSet
    anno_path: annotations/instances_val2017.json
    dataset_dir: /data/cloudcover_is/20200331/coco/
    image_dir: val2017
    sample_num: -1
    with_background: true
  drop_empty: false
  drop_last: false
  inputs_def:
    fields:
    - image
    - im_info
    - im_id
    - im_shape
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean:
    - 0.485
    - 0.456
    - 0.406
    std:
    - 0.229
    - 0.224
    - 0.225
  - !ResizeImage
    interp: 1
    max_size: 1333
    target_size: 800
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  shuffle: false
  worker_num: 4
LearningRate:
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones:
    - 120000
    - 160000
    values: null
  - !LinearWarmup
    start_factor: 0.3333333333333333
    steps: 500
  base_lr: 0.01
MaskAssigner:
  num_classes: 81
  resolution: 14
MaskHead:
  conv_dim: 256
  dilation: 1
  norm_type: null
  num_classes: 81
  num_convs: 0
  resolution: 14
MaskRCNN:
  backbone: ResNet
  rpn_head: RPNHead
  bbox_assigner: BBoxAssigner
  bbox_head: BBoxHead
  fpn: null
  mask_assigner: MaskAssigner
  mask_head: MaskHead
  roi_extractor: RoIAlign
  rpn_only: false
OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2
RPNHead:
  rpn_target_assign:
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_negative_overlap: 0.3
    rpn_positive_overlap: 0.7
    rpn_straddle_thresh: 0.0
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    post_nms_top_n: 1000
    pre_nms_top_n: 6000
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    post_nms_top_n: 2000
    pre_nms_top_n: 12000
  anchor_generator:
    anchor_sizes:
    - 32
    - 64
    - 128
    - 256
    - 512
    aspect_ratios:
    - 0.5
    - 1.0
    - 2.0
    stride:
    - 16.0
    - 16.0
    variance:
    - 1.0
    - 1.0
    - 1.0
    - 1.0
  num_classes: 1
ResNet:
  feature_maps: 4
  norm_type: affine_channel
  dcn_v2_stages: []
  depth: 50
  freeze_at: 2
  freeze_norm: true
  gcb_params: {}
  gcb_stages: []
  nonlocal_stages: []
  norm_decay: 0.0
  variant: b
  weight_prefix_name: ''
ResNetC5:
  norm_type: affine_channel
  depth: 50
  feature_maps:
  - 5
  freeze_at: 2
  freeze_norm: true
  norm_decay: 0.0
  variant: b
  weight_prefix_name: ''
RoIAlign:
  resolution: 14
  sampling_ratio: 0
  spatial_scale: 0.0625
TestReader:
  batch_size: 1
  dataset: !ImageFolder
    anno_path: annotations/instances_val2017.json
    dataset_dir: ''
    image_dir: ''
    sample_num: -1
    use_default_label: null
    with_background: true
  drop_last: false
  inputs_def:
    fields:
    - image
    - im_info
    - im_id
    - im_shape
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean:
    - 0.485
    - 0.456
    - 0.406
    std:
    - 0.229
    - 0.224
    - 0.225
  - !ResizeImage
    interp: 1
    max_size: 1333
    target_size: 800
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  shuffle: false
TrainReader:
  batch_size: 4
  dataset: !COCODataSet
    anno_path: annotations/instances_train2017.json
    dataset_dir: /data/cloudcover_is/20200331/coco/
    image_dir: train2017
    sample_num: -1
    with_background: true
  drop_last: true
  inputs_def:
    fields:
    - image
    - im_info
    - im_id
    - gt_bbox
    - gt_class
    - is_crowd
    - gt_mask
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !RandomFlipImage
    is_mask_flip: true
    is_normalized: false
    prob: 0.5
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean:
    - 0.485
    - 0.456
    - 0.406
    std:
    - 0.229
    - 0.224
    - 0.225
  - !ResizeImage
    interp: 1
    max_size: 1333
    target_size: 800
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  shuffle: true
  use_process: false
  worker_num: 4
architecture: MaskRCNN
log_smooth_window: 20
max_iters: 180000
metric: COCO
num_classes: 4
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
save_dir: output
snapshot_iter: 1000
use_gpu: true
weights: /data/cloudcover_is/20200331/output/mask_rcnn_r50_1x/model_final

W0407 08:11:58.271080 3546 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 10.0
W0407 08:11:58.277384 3546 device_context.cc:245] device: 0, cuDNN Version: 7.6.
2020-04-07 08:12:02,633-INFO: Load model and fuse batch norm if have from https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar...
2020-04-07 08:12:02,634-INFO: Found /root/.cache/paddle/weights/ResNet50_cos_pretrained
2020-04-07 08:12:02,641-INFO: Loading parameters from /root/.cache/paddle/weights/ResNet50_cos_pretrained...
2020-04-07 08:12:02,641-WARNING: /root/.cache/paddle/weights/ResNet50_cos_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-04-07 08:12:02,641-WARNING: /root/.cache/paddle/weights/ResNet50_cos_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-04-07 08:12:02,650-WARNING: variable file [ /root/.cache/paddle/weights/ResNet50_cos_pretrained/fc_0.w_0 /root/.cache/paddle/weights/ResNet50_cos_pretrained/fc_0.b_0 ] not used
2020-04-07 08:12:02,650-WARNING: variable file [ /root/.cache/paddle/weights/ResNet50_cos_pretrained/fc_0.w_0 /root/.cache/paddle/weights/ResNet50_cos_pretrained/fc_0.b_0 ] not used
loading annotations into memory...
Done (t=1.09s)
creating index...
index created!
2020-04-07 08:12:05,022-INFO: 2960 samples in file /data/cloudcover_is/20200331/coco/annotations/instances_train2017.json
2020-04-07 08:12:05,047-INFO: places would be ommited when DataLoader is not iterable
I0407 08:12:05.064105 3546 parallel_executor.cc:440] The Program will be executed on CUDA using ParallelExecutor, 1 cards are used, so 1 programs are executed in parallel.
I0407 08:12:05.089402 3546 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
I0407 08:12:05.130786 3546 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0407 08:12:05.149138 3546 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
2020-04-07 08:12:07,916-WARNING: Your reader has raised an exception!
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/root/anaconda3/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/reader.py", line 805, in thread_main
    six.reraise(*sys.exc_info())
  File "/root/anaconda3/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/reader.py", line 785, in thread_main
    for tensors in self._tensor_reader():
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/reader.py", line 853, in tensor_reader_impl
    for slots in paddle_reader():
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/data_feeder.py", line 489, in reader_creator
    yield self.feed(item)
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/data_feeder.py", line 330, in feed
    ret_dict[each_name] = each_converter.done()
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/data_feeder.py", line 139, in done
    arr = numpy.array(self.data, dtype=self.dtype)
ValueError: could not broadcast input array from shape (3,800,1067) into shape (3,800)

/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py:782: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "tools/train.py", line 323, in <module>
    main()
  File "tools/train.py", line 233, in main
    outs = exe.run(compiled_train_prog, fetch_list=train_values)
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 783, in run
    six.reraise(*sys.exc_info())
  File "/root/anaconda3/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 778, in run
    use_program_cache=use_program_cache)
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 843, in _run_impl
    return_numpy=return_numpy)
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 677, in _run_parallel
    tensors = exe.run(fetch_var_names)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:

C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::operators::reader::BlockingQueue<std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> > >::Receive(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >)
3 paddle::operators::reader::PyReader::ReadNext(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >)
4 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, unsigned long> >::_M_invoke(std::_Any_data const&)
5 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
6 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const

Python Call Stacks (More useful to users):

  File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2525, in append_op
    attrs=kwargs.get("attrs", None))
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/reader.py", line 733, in _init_non_iterable
    outputs={'Out': self._feed_list})
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/reader.py", line 646, in __init__
    self._init_non_iterable()
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/reader.py", line 280, in from_generator
    iterable, return_list)
  File "/workspace/PaddleDetection/ppdet/modeling/architectures/mask_rcnn.py", line 329, in build_inputs
    iterable=iterable) if use_dataloader else None
  File "tools/train.py", line 115, in main
    feed_vars, train_loader = model.build_inputs(**inputs_def)
  File "tools/train.py", line 323, in <module>
    main()

Error Message Summary:

Error: Blocking queue is killed because the data reader raises an exception
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] at (/paddle/paddle/fluid/operators/reader/blocking_queue.h:141)
  [operator < read > error]
terminate called without an active exception
W0407 08:12:08.303510 3594 init.cc:209] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0407 08:12:08.303535 3594 init.cc:211] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0407 08:12:08.303544 3594 init.cc:214] The detail failure signal is:

W0407 08:12:08.303556 3594 init.cc:217] *** Aborted at 1586247128 (unix time) try "date -d @1586247128" if you are using GNU date ***
W0407 08:12:08.306300 3594 init.cc:217] PC: @ 0x0 (unknown)
W0407 08:12:08.306440 3594 init.cc:217] *** SIGABRT (@0xdda) received by PID 3546 (TID 0x7f445d9fd700) from PID 3546; stack trace: ***
W0407 08:12:08.308898 3594 init.cc:217] @ 0x7f454f537390 (unknown)
W0407 08:12:08.311234 3594 init.cc:217] @ 0x7f454f191428 gsignal
W0407 08:12:08.313705 3594 init.cc:217] @ 0x7f454f19302a abort
W0407 08:12:08.317917 3594 init.cc:217] @ 0x7f45267d084a __gnu_cxx::__verbose_terminate_handler()
W0407 08:12:08.319195 3594 init.cc:217] @ 0x7f45267cef47 __cxxabiv1::__terminate()
W0407 08:12:08.320966 3594 init.cc:217] @ 0x7f45267cef7d std::terminate()
W0407 08:12:08.322532 3594 init.cc:217] @ 0x7f45267cec5a __gxx_personality_v0
W0407 08:12:08.324553 3594 init.cc:217] @ 0x7f454e796b97 _Unwind_ForcedUnwind_Phase2
W0407 08:12:08.326314 3594 init.cc:217] @ 0x7f454e796e7d _Unwind_ForcedUnwind
W0407 08:12:08.327750 3594 init.cc:217] @ 0x7f454f536070 __GI___pthread_unwind
W0407 08:12:08.329143 3594 init.cc:217] @ 0x7f454f52e845 __pthread_exit
W0407 08:12:08.329684 3594 init.cc:217] @ 0x556edfe321c9 PyThread_exit_thread
W0407 08:12:08.329820 3594 init.cc:217] @ 0x556edfcc4cb1 PyEval_RestoreThread.cold.787
W0407 08:12:08.330205 3594 init.cc:217] @ 0x7f450c77bcde (unknown)
W0407 08:12:08.330754 3594 init.cc:217] @ 0x556edfdbc114 _PyMethodDef_RawFastCallKeywords
W0407 08:12:08.331284 3594 init.cc:217] @ 0x556edfdbc231 _PyCFunction_FastCallKeywords
W0407 08:12:08.331825 3594 init.cc:217] @ 0x556edfe20a5d _PyEval_EvalFrameDefault
W0407 08:12:08.332321 3594 init.cc:217] @ 0x556edfd756f9 _PyEval_EvalCodeWithName
W0407 08:12:08.332814 3594 init.cc:217] @ 0x556edfd76805 _PyFunction_FastCallDict
W0407 08:12:08.333302 3594 init.cc:217] @ 0x556edfd91943 _PyObject_Call_Prepend
W0407 08:12:08.333562 3594 init.cc:217] @ 0x556edfdd012a slot_tp_call
W0407 08:12:08.334058 3594 init.cc:217] @ 0x556edfdd118b _PyObject_FastCallKeywords
W0407 08:12:08.334596 3594 init.cc:217] @ 0x556edfe20626 _PyEval_EvalFrameDefault
W0407 08:12:08.335088 3594 init.cc:217] @ 0x556edfd7673b _PyFunction_FastCallDict
W0407 08:12:08.335577 3594 init.cc:217] @ 0x556edfd91943 _PyObject_Call_Prepend
W0407 08:12:08.335862 3594 init.cc:217] @ 0x556edfdd012a slot_tp_call
W0407 08:12:08.336364 3594 init.cc:217] @ 0x556edfdd118b _PyObject_FastCallKeywords
W0407 08:12:08.336897 3594 init.cc:217] @ 0x556edfe20e8f _PyEval_EvalFrameDefault
W0407 08:12:08.337393 3594 init.cc:217] @ 0x556edfd756f9 _PyEval_EvalCodeWithName
W0407 08:12:08.337884 3594 init.cc:217] @ 0x556edfd76805 _PyFunction_FastCallDict
W0407 08:12:08.338369 3594 init.cc:217] @ 0x556edfd91943 _PyObject_Call_Prepend
W0407 08:12:08.338919 3594 init.cc:217] @ 0x556edfd84b9e PyObject_Call
Aborted (core dumped)
