Shape error when doing transfer learning with Faster-RCNN (#977) · Issue · PaddlePaddle / PaddleDetection


Created by: nihuizhidao

When using the following config.yml:

architecture: FasterRCNN
max_iters: 18000
snapshot_iter: 1000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_2x.tar
weights: output/fasterrcnn/model_final
metric: COCO
finetune_exclude_pretrained_params: ['cls_score', 'bbox_pred']
num_classes: 7

FasterRCNN:
  backbone: ResNet
  fpn: FPN
  rpn_head: FPNRPNHead
  roi_extractor: FPNRoIAlign
  bbox_head: BBoxHead
  bbox_assigner: BBoxAssigner

ResNet:
  depth: 101
  feature_maps: [2, 3, 4, 5]
  freeze_at: 2
  norm_type: affine_channel
  variant: d

FPN:
  max_level: 6
  min_level: 2
  num_chan: 256
  spatial_scale: [0.03125, 0.0625, 0.125, 0.25]

FPNRPNHead:
  anchor_generator:
    anchor_sizes: [32, 64, 128, 256, 512]
    aspect_ratios: [0.5, 1.0, 2.0]
    stride: [16.0, 16.0]
    variance: [1.0, 1.0, 1.0, 1.0]
  anchor_start_size: 32
  max_level: 6
  min_level: 2
  num_chan: 256
  rpn_target_assign:
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_negative_overlap: 0.3
    rpn_positive_overlap: 0.7
    rpn_straddle_thresh: 0.0
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    post_nms_top_n: 2000
    pre_nms_top_n: 2000
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    post_nms_top_n: 1000
    pre_nms_top_n: 1000

FPNRoIAlign:
  canconical_level: 4
  canonical_size: 224
  max_level: 5
  min_level: 2
  box_resolution: 7
  sampling_ratio: 2

BBoxAssigner:
  batch_size_per_im: 512
  bbox_reg_weights: [0.1, 0.1, 0.2, 0.2]
  bg_thresh_hi: 0.5
  bg_thresh_lo: 0.0
  fg_fraction: 0.25
  fg_thresh: 0.5

BBoxHead:
  head: TwoFCHead
  nms:
    keep_top_k: 100
    nms_threshold: 0.5
    score_threshold: 0.05

TwoFCHead:
  mlp_dim: 1024

LearningRate:
  base_lr: 0.0005
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [12000, 16000]
  - !LinearWarmup
    start_factor: 0.3333333333333333
    steps: 100

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2

TrainReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd']
  dataset:
    !COCODataSet
    image_dir: train
    anno_path: annotations/instance_train.json
    dataset_dir: dataset/coco0621
  sample_transforms:
  - !DecodeImage
    to_rgb: true
  - !RandomFlipImage
    prob: 0.5
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    target_size: 800
    max_size: 1333
    interp: 1
    use_cv2: true
  - !Permute
    to_bgr: false
    channel_first: true
  batch_size: 1
  shuffle: true
  worker_num: 2

EvalReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'im_shape']
    # for voc
    #fields: ['image', 'im_info', 'im_id', 'im_shape', 'gt_bbox', 'gt_class', 'is_difficult']
  dataset:
    !COCODataSet
    anno_path: annotations/instance_val.json
    dataset_dir: dataset/coco0621
    image_dir: val
  sample_transforms:
  - !DecodeImage
    to_rgb: true
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    interp: 1
    max_size: 1333
    target_size: 800
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  batch_size: 1
  shuffle: false
  drop_empty: false
  worker_num: 2

TestReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'im_shape']
  dataset:
    !ImageFolder
    anno_path: annotations/instance_val.json
    dataset_dir: dataset/coco0621
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    interp: 1
    max_size: 1333
    target_size: 800
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  batch_size: 1
  shuffle: false

the following error occurred:

index created!
2020-06-21 22:05:50,966-INFO: 1019 samples in file dataset/coco0621/annotations/instance_train.json
2020-06-21 22:05:50,968-INFO: places would be ommited when DataLoader is not iterable
/home/scc/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "tools/train.py", line 326, in <module>
    main()
  File "tools/train.py", line 236, in main
    outs = exe.run(compiled_train_prog, fetch_list=train_values)
  File "/home/scc/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1071, in run
    six.reraise(*sys.exc_info())
  File "/home/scc/anaconda3/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/scc/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1066, in run
    return_merged=return_merged)
  File "/home/scc/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1167, in _run_impl
    return_merged=return_merged)
  File "/home/scc/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 879, in _run_parallel
    tensors = exe.run(fetch_var_names, return_merged)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:

C++ Call Stacks (More useful to developers):

0   std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2   paddle::operators::GetBroadcastDimsArrays(paddle::framework::DDim const&, paddle::framework::DDim const&, int*, int*, int*, int, int)
3   paddle::operators::ElementwiseOp::InferShape(paddle::framework::InferShapeContext*) const
4   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
5   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
6   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
7   paddle::framework::details::ComputationOpHandle::RunImpl()
8   paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
9   paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue > const&, unsigned long*)
10  std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
11  std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
12  ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const

Python Call Stacks (More useful to users):

  File "/home/scc/anaconda3/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2610, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/scc/anaconda3/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py", line 242, in impl
    attrs={'axis': axis})
  File "/home/scc/Projects/AIDetectionProjects/HuaxueYaopin/ppdet/modeling/backbones/fpn.py", line 93, in _add_topdown_lateral
    return lateral + topdown
  File "/home/scc/Projects/AIDetectionProjects/HuaxueYaopin/ppdet/modeling/backbones/fpn.py", line 144, in get_output
    top_output)
  File "/home/scc/Projects/AIDetectionProjects/HuaxueYaopin/ppdet/modeling/architectures/faster_rcnn.py", line 98, in build
    body_feats, spatial_scale = self.fpn.get_output(body_feats)
  File "/home/scc/Projects/AIDetectionProjects/HuaxueYaopin/ppdet/modeling/architectures/faster_rcnn.py", line 240, in train
    return self.build(feed_vars, 'train')
  File "tools/train.py", line 117, in main
    train_fetches = model.train(feed_vars)
  File "tools/train.py", line 326, in <module>
    main()

Error Message Summary:

InvalidArgumentError: Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [1, 256, 47, 84] and the shape of Y = [1, 256, 48, 84]. Received [47] in X is not equal to [48] in Y at i:2.
  [Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] at (/paddle/paddle/fluid/operators/elementwise/elementwise_op_function.h:157)
  [operator < elementwise_add > error]
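For readers unfamiliar with the check quoted in the Hint, the condition can be restated in a few lines of plain Python. This is only a sketch of the rule as printed in the message (two dimensions are compatible when they are equal or when either is at most 1); the function name `first_broadcast_mismatch` is hypothetical and is not part of Paddle.

```python
def first_broadcast_mismatch(x_dims, y_dims):
    """Return (axis, x_dim, y_dim) for the first incompatible axis, or None.

    Simplifying assumption: both shapes already have the same rank.
    """
    if len(x_dims) != len(y_dims):
        raise ValueError("this sketch only handles shapes of equal rank")
    for i, (x, y) in enumerate(zip(x_dims, y_dims)):
        # Same condition as in the Hint: equal, or one side is <= 1.
        if not (x == y or x <= 1 or y <= 1):
            return i, x, y
    return None


# Shapes taken from the error message above.
mismatch = first_broadcast_mismatch([1, 256, 47, 84], [1, 256, 48, 84])
print(mismatch)
```

Applied to the two shapes in the message, the check fails at axis 2 (the height), which matches the "at i:2" in the error text.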

What is the cause of this?

The dataset is my own, and it trains without problems on YOLOv3.
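One plausible reading of the shapes in the error, sketched below, is that the FPN top-down path upsamples the coarser feature map by 2 and then adds it to the finer one, so the two only line up when the input size divides evenly through every stride-2 stage. The sketch assumes each backbone stage halves the spatial size with ceiling rounding, and it uses a hypothetical input height of 750 purely for illustration; the helper names are made up and are not PaddleDetection functions.

```python
import math


def pyramid_sizes(size, num_halvings=5):
    """Sizes after repeated stride-2 downsampling (assumed ceil rounding)."""
    sizes = []
    for _ in range(num_halvings):
        size = math.ceil(size / 2)
        sizes.append(size)
    return sizes


def topdown_mismatches(size):
    """List (lateral, upsampled) pairs that differ on the FPN top-down path."""
    levels = pyramid_sizes(size)[1:]  # strides 4, 8, 16, 32
    return [
        (fine, 2 * coarse)
        for fine, coarse in zip(levels, levels[1:])
        if fine != 2 * coarse
    ]


def pad_to_stride(size, stride=32):
    """Round a size up to the next multiple of stride."""
    return math.ceil(size / stride) * stride


height = 750  # hypothetical image height after resizing
print(pyramid_sizes(height))        # sizes at strides 2, 4, 8, 16, 32
print(topdown_mismatches(height))   # lateral vs. upsampled sizes that differ
padded = pad_to_stride(height)
print(padded, topdown_mismatches(padded))
```

Under these assumptions a height of 750 yields 47 at stride 16 and 24 at stride 32, and upsampling 24 by 2 gives 48, the same 47 versus 48 pair reported in the error. If the shapes do come from unpadded inputs, padding each batch to a multiple of 32 would be the natural fix; the stock FPN configs shipped with PaddleDetection appear to do this with a PadBatch batch transform (pad_to_stride: 32), which the config above does not include. That would also fit the YOLOv3 observation, since YOLOv3 inputs are typically resized to fixed sizes that are multiples of 32.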
