Training with the cascade config fails with "Received [223] in X is not equal to [224] in Y at i:3" (#1492) · Issue · PaddlePaddle / PaddleDetection


Created by: skywalk163

While taking part in the 7-day object detection boot camp, I trained with the cascade config and got the error: Received [223] in X is not equal to [224] in Y at i:3.

Environment: the default AI Studio environment for the 7-day object detection boot camp, Assignment 2 (hands-on with RCNN-series models). Python 3.7.4, Paddle 1.8.0, PaddleDetection 0.4.

Command: !python -u tools/train.py -c ../cascade_rcnn_r50_fpn_1x.yml --eval

Error output:

2020-09-24 11:08:00,218-WARNING: paddle.fluid.layers.matrix_nms OP not found, maybe a newer version of paddle is required.
2020-09-24 11:08:00,753-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000100] in Optimizer will not take effect, and it will only be applied to other Parameters!
2020-09-24 11:08:01,292-INFO: font search path ['/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/mpl-data/fonts/ttf', '/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/mpl-data/fonts/afm', '/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/mpl-data/fonts/pdfcorefonts']
2020-09-24 11:08:01,717-INFO: generated new fontManager
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
2020-09-24 11:08:01,989-INFO: places would be ommited when DataLoader is not iterable
W0924 11:08:02.011317 174 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.1, Runtime API Version: 9.0
W0924 11:08:02.016479 174 device_context.cc:260] device: 0, cuDNN Version: 7.6.
2020-09-24 11:08:05,304-INFO: Downloading ResNet50_cos_pretrained.tar from https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
100%|████████████████████████████████| 100310/100310 [00:03<00:00, 28276.20KB/s]
2020-09-24 11:08:08,935-INFO: Decompressing /home/aistudio/.cache/paddle/weights/ResNet50_cos_pretrained.tar...
2020-09-24 11:08:09,122-WARNING: /home/aistudio/.cache/paddle/weights/ResNet50_cos_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-09-24 11:08:09,512-WARNING: /home/aistudio/.cache/paddle/weights/ResNet50_cos_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-09-24 11:08:09,520-WARNING: variable file [ /home/aistudio/.cache/paddle/weights/ResNet50_cos_pretrained/fc_0.b_0 /home/aistudio/.cache/paddle/weights/ResNet50_cos_pretrained/fc_0.w_0 ] not used
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
2020-09-24 11:08:09,759-INFO: places would be ommited when DataLoader is not iterable
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "tools/train.py", line 372, in <module>
    main()
  File "tools/train.py", line 245, in main
    outs = exe.run(compiled_train_prog, fetch_list=train_values)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1071, in run
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1066, in run
    return_merged=return_merged)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1167, in _run_impl
    return_merged=return_merged)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 879, in _run_parallel
    tensors = exe.run(fetch_var_names, return_merged)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:

C++ Call Stacks (More useful to developers):

0   std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2   paddle::operators::GetBroadcastDimsArrays(paddle::framework::DDim const&, paddle::framework::DDim const&, int*, int*, int*, int, int)
3   paddle::operators::ElementwiseOp::InferShape(paddle::framework::InferShapeContext*) const
4   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
5   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
6   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
7   paddle::framework::details::ComputationOpHandle::RunImpl()
8   paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
9   paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue > const&, unsigned long*)
10  std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
11  std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
12  ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const

Python Call Stacks (More useful to users):

  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2610, in append_op
    attrs=kwargs.get("attrs", None))
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py", line 242, in impl
    attrs={'axis': axis})
  File "/home/aistudio/work/PaddleDetection/ppdet/modeling/backbones/fpn.py", line 108, in _add_topdown_lateral
    return lateral + topdown
  File "/home/aistudio/work/PaddleDetection/ppdet/modeling/backbones/fpn.py", line 160, in get_output
    top_output)
  File "/home/aistudio/work/PaddleDetection/ppdet/modeling/architectures/cascade_rcnn.py", line 107, in build
    body_feats, spatial_scale = self.fpn.get_output(body_feats)
  File "/home/aistudio/work/PaddleDetection/ppdet/modeling/architectures/cascade_rcnn.py", line 327, in train
    return self.build(feed_vars, 'train')
  File "tools/train.py", line 117, in main
    train_fetches = model.train(feed_vars)
  File "tools/train.py", line 372, in <module>
    main()

Error Message Summary:

InvalidArgumentError: Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [1, 256, 200, 223] and the shape of Y = [1, 256, 200, 224]. Received [223] in X is not equal to [224] in Y at i:3.
  [Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] at (/paddle/paddle/fluid/operators/elementwise/elementwise_op_function.h:157)
  [operator < elementwise_add > error]
terminate called without an active exception
W0924 11:08:10.730060 240 init.cc:216] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0924 11:08:10.730101 240 init.cc:218] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0924 11:08:10.730108 240 init.cc:221] The detail failure signal is:

W0924 11:08:10.730116 240 init.cc:224] *** Aborted at 1600916890 (unix time) try "date -d @1600916890" if you are using GNU date ***
W0924 11:08:10.731886 240 init.cc:224] PC: @ 0x0 (unknown)
W0924 11:08:10.731998 240 init.cc:224] *** SIGABRT (@0x3e8000000ae) received by PID 174 (TID 0x7fabaebfd700) from PID 174; stack trace: ***
W0924 11:08:10.733278 240 init.cc:224] @ 0x7facd102c390 (unknown)
W0924 11:08:10.734443 240 init.cc:224] @ 0x7facd0c86428 gsignal
W0924 11:08:10.735666 240 init.cc:224] @ 0x7facd0c8802a abort
W0924 11:08:10.736531 240 init.cc:224] @ 0x7fac91a0584a __gnu_cxx::__verbose_terminate_handler()
W0924 11:08:10.737229 240 init.cc:224] @ 0x7fac91a03f47 __cxxabiv1::__terminate()
W0924 11:08:10.738013 240 init.cc:224] @ 0x7fac91a03f7d std::terminate()
W0924 11:08:10.738782 240 init.cc:224] @ 0x7fac91a03c5a __gxx_personality_v0
W0924 11:08:10.739491 240 init.cc:224] @ 0x7fac91d36b97 _Unwind_ForcedUnwind_Phase2
W0924 11:08:10.740166 240 init.cc:224] @ 0x7fac91d36e7d _Unwind_ForcedUnwind
W0924 11:08:10.741370 240 init.cc:224] @ 0x7facd102b070 __GI___pthread_unwind
W0924 11:08:10.742544 240 init.cc:224] @ 0x7facd1023845 __pthread_exit
W0924 11:08:10.742830 240 init.cc:224] @ 0x55639407be59 PyThread_exit_thread
W0924 11:08:10.742926 240 init.cc:224] @ 0x556393f01c17 PyEval_RestoreThread.cold.798
W0924 11:08:10.743966 240 init.cc:224] @ 0x7facc3b1329c (unknown)
W0924 11:08:10.744233 240 init.cc:224] @ 0x556393ffd744 _PyMethodDef_RawFastCallKeywords
W0924 11:08:10.744460 240 init.cc:224] @ 0x556393ffd861 _PyCFunction_FastCallKeywords
W0924 11:08:10.744705 240 init.cc:224] @ 0x5563940692bd _PyEval_EvalFrameDefault
W0924 11:08:10.744915 240 init.cc:224] @ 0x556393fad539 _PyEval_EvalCodeWithName
W0924 11:08:10.745121 240 init.cc:224] @ 0x556393fae635 _PyFunction_FastCallDict
W0924 11:08:10.745321 240 init.cc:224] @ 0x556393fcce53 _PyObject_Call_Prepend
W0924 11:08:10.745432 240 init.cc:224] @ 0x556394004a3a slot_tp_call
W0924 11:08:10.745647 240 init.cc:224] @ 0x5563940058fb _PyObject_FastCallKeywords
W0924 11:08:10.745869 240 init.cc:224] @ 0x556394068e86 _PyEval_EvalFrameDefault
W0924 11:08:10.746074 240 init.cc:224] @ 0x556393fae56b _PyFunction_FastCallDict
W0924 11:08:10.746273 240 init.cc:224] @ 0x556393fcce53 _PyObject_Call_Prepend
W0924 11:08:10.746382 240 init.cc:224] @ 0x556394004a3a slot_tp_call
W0924 11:08:10.746623 240 init.cc:224] @ 0x5563940058fb _PyObject_FastCallKeywords
W0924 11:08:10.746874 240 init.cc:224] @ 0x5563940696e8 _PyEval_EvalFrameDefault
W0924 11:08:10.747083 240 init.cc:224] @ 0x556393fad539 _PyEval_EvalCodeWithName
W0924 11:08:10.747289 240 init.cc:224] @ 0x556393fae635 _PyFunction_FastCallDict
W0924 11:08:10.747532 240 init.cc:224] @ 0x556393fcce53 _PyObject_Call_Prepend
W0924 11:08:10.747802 240 init.cc:224] @ 0x556393fbfdbe PyObject_Call
Aborted (core dumped)
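The summary says the `elementwise_add` in `_add_topdown_lateral` (`fpn.py`, line 108, `return lateral + topdown`) received a lateral map of width 223 and a top-down map of width 224, while the height (200) matched. The sketch below shows one way that can happen, assuming every stride-2 step in the backbone rounds the size up (out = ceil(in / 2)) and the FPN top-down path upsamples by exactly 2×. Under those assumptions a width that is a multiple of 32 always halves cleanly at every level, but a width like the one here does not. `stage_sizes`, `half_up` and `topdown_mismatches` are hypothetical helpers, not PaddleDetection APIs, and 891 is only an illustrative input width (any width from 889 to 892 gives 223 at stride 4).

```python
def half_up(x):
    """Size after one stride-2 step, assuming it rounds up: ceil(x / 2)."""
    return (x + 1) // 2


def stage_sizes(width, num_levels=4):
    """Widths of the C2..C5 maps for a given input width (assumption: ceil at each step)."""
    # C2 sits at stride 4: two stride-2 steps (stem conv, max pool) from the input.
    sizes = [half_up(half_up(width))]
    for _ in range(num_levels - 1):
        sizes.append(half_up(sizes[-1]))
    return sizes


def topdown_mismatches(width):
    """List (level, lateral_width, upsampled_width) where lateral + topdown would fail.

    Assumes the top-down map at level k is the level k+1 map upsampled by exactly 2x.
    """
    sizes = stage_sizes(width)
    bad = []
    for i in range(len(sizes) - 1, 0, -1):
        lateral = sizes[i - 1]      # lateral map at level C{i+1}
        upsampled = sizes[i] * 2    # coarser level C{i+2}, upsampled 2x
        if lateral != upsampled:
            bad.append((i + 1, lateral, upsampled))
    return bad


print(stage_sizes(891))          # [223, 112, 56, 28]
print(topdown_mismatches(891))   # [(2, 223, 224)]
print(topdown_mismatches(896))   # []
```

The height of 800 (200 at stride 4) halves evenly at every level, which matches the log: only the width axis (`i:3` in NCHW) fails.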

The config file is cascade_rcnn_r50_fpn_1x.yml from the PaddleDetection configs directory, with the dataset section modified:

architecture: CascadeRCNN
max_iters: 90000
snapshot_iter: 1000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar
#pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.tar
weights: output/cascade_rcnn_r50_fpn_1x/model_final
metric: COCO
num_classes: 7

CascadeRCNN:
  backbone: ResNet
  fpn: FPN
  rpn_head: FPNRPNHead
  roi_extractor: FPNRoIAlign
  bbox_head: CascadeBBoxHead
  bbox_assigner: CascadeBBoxAssigner

ResNet:
  norm_type: affine_channel
  depth: 50
  feature_maps: [2, 3, 4, 5]
  freeze_at: 2
  variant: b

FPN:
  min_level: 2
  max_level: 6
  num_chan: 256
  spatial_scale: [0.03125, 0.0625, 0.125, 0.25]

FPNRPNHead:
  anchor_generator:
    anchor_sizes: [32, 64, 128, 256, 512]
    aspect_ratios: [0.5, 1.0, 2.0]
    stride: [16.0, 16.0]
    variance: [1.0, 1.0, 1.0, 1.0]
  anchor_start_size: 32
  min_level: 2
  max_level: 6
  num_chan: 256
  rpn_target_assign:
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_positive_overlap: 0.7
    rpn_negative_overlap: 0.3
    rpn_straddle_thresh: 0.0
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 2000
    post_nms_top_n: 2000
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 1000
    post_nms_top_n: 1000

FPNRoIAlign:
  canconical_level: 4
  canonical_size: 224
  min_level: 2
  max_level: 5
  box_resolution: 7
  sampling_ratio: 2

CascadeBBoxAssigner:
  batch_size_per_im: 512
  bbox_reg_weights: [10, 20, 30]
  bg_thresh_lo: [0.0, 0.0, 0.0]
  bg_thresh_hi: [0.5, 0.6, 0.7]
  fg_thresh: [0.5, 0.6, 0.7]
  fg_fraction: 0.25

CascadeBBoxHead:
  head: CascadeTwoFCHead
  nms:
    keep_top_k: 100
    nms_threshold: 0.5
    score_threshold: 0.05

CascadeTwoFCHead:
  mlp_dim: 1024

LearningRate:
  base_lr: 0.002
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [60000, 80000]
  - !LinearWarmup
    start_factor: 0.3333333333333333
    steps: 500

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2

TrainReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd']
  dataset:
    !COCODataSet
    image_dir: images
    anno_path: Annotations/train.json
    dataset_dir: /home/aistudio/work/PCB_DATASET
  sample_transforms:
  - !DecodeImage
    to_rgb: true
  - !RandomFlipImage
    prob: 0.5
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    target_size: 800
    max_size: 1333
    interp: 1
    use_cv2: true
  - !Permute
    to_bgr: false
    channel_first: true
  batch_transforms:
  - !PadBatch
    pad_to_stride: -1.
    use_padded_im_info: false
  batch_size: 1
  shuffle: true
  worker_num: 2
  use_process: false

EvalReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'im_shape']
    # for voc
    #fields: ['image', 'im_info', 'im_id', 'im_shape', 'gt_bbox', 'gt_class', 'is_difficult']
  dataset:
    !COCODataSet
    image_dir: images
    anno_path: Annotations/val.json
    dataset_dir: /home/aistudio/work/PCB_DATASET
  sample_transforms:
  - !DecodeImage
    to_rgb: true
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    interp: 1
    max_size: 1333
    target_size: 800
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  batch_size: 1
  shuffle: false
  drop_empty: false
  worker_num: 2

TestReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'im_shape']
  dataset:
    !ImageFolder
    anno_path: /home/aistudio/work/PCB_DATASET/Annotations/val.json
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    interp: 1
    max_size: 1333
    target_size: 800
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  batch_size: 1
  shuffle: false
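In the TrainReader above, `PadBatch` is set to `pad_to_stride: -1.`. The FPN configs shipped with PaddleDetection appear to use `pad_to_stride: 32` at this spot (worth checking against the upstream `cascade_rcnn_r50_fpn_1x.yml`). A minimal sketch of what stride padding does to the input size, assuming `PadBatch` pads each batch to its largest height and width and, when `pad_to_stride` is positive, rounds both up to a multiple of it. `padded_shape` is a hypothetical helper, and the 800 × 891 image size is illustrative, not taken from the dataset.

```python
def padded_shape(shapes, pad_to_stride):
    """Return the (H, W) a batch of (C, H, W) images would be padded to.

    Sketch only: assumes PadBatch-like behavior, i.e. take the largest H and W
    in the batch and, if pad_to_stride > 0, round each up to a multiple of it.
    """
    max_h = max(h for _, h, _ in shapes)
    max_w = max(w for _, _, w in shapes)
    if pad_to_stride > 0:
        stride = int(pad_to_stride)
        # Integer ceiling to the next multiple of the stride.
        max_h = (max_h + stride - 1) // stride * stride
        max_w = (max_w + stride - 1) // stride * stride
    return max_h, max_w


# batch_size: 1, one resized image of 800 x 891 (illustrative size)
print(padded_shape([(3, 800, 891)], -1.))  # (800, 891): size left as-is
print(padded_shape([(3, 800, 891)], 32))   # (800, 896): both sides multiples of 32
```

With both sides a multiple of 32, the stride-4 to stride-32 maps halve exactly (896 gives 224, 112, 56, 28), so the lateral and upsampled maps added in `_add_topdown_lateral` have matching widths.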
