Unverified commit 47fc188b, authored by Mark Ma, committed by GitHub

[PaddlePaddle Hackathon] Task 66 (#4317)

* allow forwarding for multi scale inputs

* fixed PadBatch not working with multi scale input

* add faster_rcnn_r34_fpn_multiscaletest_1x_coco.yml

* fixed multi scale input checking for batch transforms (PadBatch)

* fewer scales and faster inference for multi scale test configuration file

* fixed retrieving im_id from multi-scale inputs

* add multi scale prediction boxes merging

* allow multi scale testing for tools/infer.py (actually trainer.predict method)

* add MultiscaleTestResize setting for TestReader

* use BBoxPostProcess.nms parameter for multi scale prediction NMS of CascadeRCNN

* add documentation for multi scale testing

* raise exception if the network architecture is not supported in multi scale test mode

* move documentation file

* add test case for multi scale test

* load pretrained weights from URL

* add image files for testing. use relative path for input images

* update documentation

* use isinstance for sequence type checking. add empty checking for samples sequence
Parent 5b949596
_BASE_: [
'faster_rcnn_r34_fpn_1x_coco.yml',
]
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet34_pretrained.pdparams
weights: output/faster_rcnn_r34_fpn_multiscaletest_1x_coco/model_final
EvalReader:
sample_transforms:
- Decode: {}
# - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
- MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700 , 900], use_flip: False}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
TestReader:
sample_transforms:
- Decode: {}
# - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
- MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700 , 900], use_flip: False}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
# Multi Scale Test Configuration
Tags: Configuration
---
```yaml
##################################### Multi scale test configuration #####################################
EvalReader:
sample_transforms:
- Decode: {}
- MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700 , 900]}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
TestReader:
sample_transforms:
- Decode: {}
- MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700 , 900]}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
```
---
Multi-scale test is a TTA (Test-Time Augmentation) method that can improve object detection performance.
The input image is resized to several scales, the model generates predictions (bboxes) at each scale, and all of the predictions are then combined into the final result. (Here **NMS** is used to aggregate the predictions.)
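As a rough illustration, the merge can be sketched as below. This is a simplified, self-contained version of the `merge_multi_scale_predictions` method added by this PR (shown in the `BaseArch` diff further down); the `simple_nms` helper here is only a stand-in for PaddleDetection's own `nms` utility.

```python
import numpy as np

def simple_nms(dets, thresh):
    # Greedy NMS over [score, x1, y1, x2, y2] rows; returns indices to keep.
    scores, x1, y1, x2, y2 = (dets[:, i] for i in range(5))
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0., xx2 - xx1) * np.maximum(0., yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= thresh]
    return np.array(keep, dtype=np.int64)

def merge_scales(per_scale_bboxes, nms_threshold=0.5, keep_top_k=100):
    # Each element of per_scale_bboxes holds [class, score, x1, y1, x2, y2]
    # rows predicted at one scale; stack them, run NMS per class, then keep
    # the keep_top_k highest-scoring boxes overall.
    all_boxes = np.concatenate(per_scale_bboxes)
    merged = []
    for c in np.unique(all_boxes[:, 0]):
        cls_boxes = all_boxes[all_boxes[:, 0] == c]
        merged.append(cls_boxes[simple_nms(cls_boxes[:, 1:], nms_threshold)])
    merged = np.concatenate(merged)
    return merged[np.argsort(-merged[:, 1])[:keep_top_k]]
```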
## _MultiscaleTestResize_ option
The `MultiscaleTestResize` option enables multi-scale test prediction.
`origin_target_size: [800, 1333]` means the input image is first resized so that its short edge is 800 pixels while its long edge does not exceed 1333 pixels.
The `target_size: [700, 900]` property specifies the additional scales to predict at.
It can be plugged into the evaluation or the test (inference) pipeline by adding a `MultiscaleTestResize` entry to `EvalReader.sample_transforms` or `TestReader.sample_transforms`.
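For intuition, the short-edge resize can be sketched as follows. This is only an illustration of the scaling rule described above, not PaddleDetection's `MultiscaleTestResize` implementation, and the `short_side_scale` helper is made up for this example.

```python
def short_side_scale(im_h, im_w, short_target, max_edge=1333):
    # Scale factor that brings the short edge to short_target while keeping
    # the long edge no larger than max_edge.
    scale = short_target / min(im_h, im_w)
    if scale * max(im_h, im_w) > max_edge:
        scale = max_edge / max(im_h, im_w)
    return scale

# With origin_target_size: [800, 1333] and target_size: [700, 900],
# each image is forwarded at the 800, 700 and 900 short-edge scales.
for short in (800, 700, 900):
    s = short_side_scale(720, 1280, short)
    print(short, round(720 * s), round(1280 * s))
```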
---
### Note
Currently only CascadeRCNN, FasterRCNN and MaskRCNN are supported for multi-scale testing, and the batch size must be 1.
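As a quick start, evaluation with this configuration can be driven from Python in the same way as the unit test added by this PR (the weights URL below is the one used in that test):

```python
from ppdet.core.workspace import load_config
from ppdet.engine import Trainer

cfg = load_config(
    'configs/faster_rcnn/faster_rcnn_r34_fpn_multiscaletest_1x_coco.yml')
trainer = Trainer(cfg, mode='eval')
trainer.load_weights(
    'https://paddledet.bj.bcebos.com/models/faster_rcnn_r34_fpn_1x_coco.pdparams')
# EvalReader.sample_transforms in this config already contains MultiscaleTestResize
trainer.evaluate()
```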
# Multi Scale Test Configuration
Tags: Configuration
---
```yaml
##################################### Multi scale test configuration #####################################
EvalReader:
sample_transforms:
- Decode: {}
- MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700 , 900]}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
TestReader:
sample_transforms:
- Decode: {}
- MultiscaleTestResize: {origin_target_size: [800, 1333], target_size: [700 , 900]}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
```
---
Multi-scale test is a TTA (Test-Time Augmentation) method that can improve object detection accuracy.
The input image is first resized to several scales, the model then generates predictions for each scale, and the predictions from all scales are merged into the final result. (**NMS** is used here to aggregate the predictions from the different scales.)
## _MultiscaleTestResize_ option
The `MultiscaleTestResize` option enables multi-scale testing.
`origin_target_size: [800, 1333]` means the input image is first resized so that its short edge is 800 pixels and its long edge does not exceed 1333 pixels.
`target_size: [700, 900]` sets the additional prediction scales.
Multi-scale testing can be enabled for evaluation or inference by adding a `MultiscaleTestResize` entry to `EvalReader.sample_transforms` or `TestReader.sample_transforms`.
---
### Note
Currently only CascadeRCNN, FasterRCNN and MaskRCNN are supported for multi-scale testing, and the batch size must be 1.
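For inference, the same configuration can be used through `Trainer.predict`, mirroring the unit test added by this PR (the image path below is a placeholder):

```python
from ppdet.core.workspace import load_config
from ppdet.engine import Trainer

cfg = load_config(
    'configs/faster_rcnn/faster_rcnn_r34_fpn_multiscaletest_1x_coco.yml')
trainer = Trainer(cfg, mode='test')
trainer.load_weights(
    'https://paddledet.bj.bcebos.com/models/faster_rcnn_r34_fpn_1x_coco.pdparams')
# TestReader.sample_transforms applies MultiscaleTestResize to each input image
trainer.predict(['path/to/your_image.jpg'],
                draw_threshold=0.5,
                output_dir='output',
                save_txt=True)
```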
......@@ -16,6 +16,8 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import typing
try:
from collections.abc import Sequence
except Exception:
......@@ -58,7 +60,13 @@ class PadBatch(BaseOperator):
"""
coarsest_stride = self.pad_to_stride
max_shape = np.array([data['image'].shape for data in samples]).max(
# multi scale input is nested list
if isinstance(samples, typing.Sequence) and len(samples) > 0 and isinstance(samples[0], typing.Sequence):
inner_samples = samples[0]
else:
inner_samples = samples
max_shape = np.array([data['image'].shape for data in inner_samples]).max(
axis=0)
if coarsest_stride > 0:
max_shape[1] = int(
......@@ -66,7 +74,7 @@ class PadBatch(BaseOperator):
max_shape[2] = int(
np.ceil(max_shape[2] / coarsest_stride) * coarsest_stride)
for data in samples:
for data in inner_samples:
im = data['image']
im_c, im_h, im_w = im.shape[:]
padding_im = np.zeros(
......
......@@ -22,6 +22,7 @@ import copy
import time
import numpy as np
import typing
from PIL import Image
import paddle
......@@ -428,7 +429,11 @@ class Trainer(object):
for metric in self._metrics:
metric.update(data, outs)
sample_num += data['im_id'].numpy().shape[0]
# multi-scale inputs: all inputs have same im_id
if isinstance(data, typing.Sequence):
sample_num += data[0]['im_id'].numpy().shape[0]
else:
sample_num += data['im_id'].numpy().shape[0]
self._compose_callback.on_step_end(self.status)
self.status['sample_num'] = sample_num
......@@ -471,7 +476,10 @@ class Trainer(object):
outs = self.model(data)
for key in ['im_shape', 'scale_factor', 'im_id']:
outs[key] = data[key]
if isinstance(data, typing.Sequence):
outs[key] = data[0][key]
else:
outs[key] = data[key]
for key, value in outs.items():
if hasattr(value, 'numpy'):
outs[key] = value.numpy()
......
......@@ -21,6 +21,7 @@ import sys
import json
import paddle
import numpy as np
import typing
from .map_utils import prune_zero_padding, DetectionMAP
from .coco_utils import get_infer_results, cocoapi_eval
......@@ -97,7 +98,11 @@ class COCOMetric(Metric):
for k, v in outputs.items():
outs[k] = v.numpy() if isinstance(v, paddle.Tensor) else v
im_id = inputs['im_id']
# multi-scale inputs: all inputs have same im_id
if isinstance(inputs, typing.Sequence):
im_id = inputs[0]['im_id']
else:
im_id = inputs['im_id']
outs['im_id'] = im_id.numpy() if isinstance(im_id,
paddle.Tensor) else im_id
......
......@@ -2,9 +2,13 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import paddle
import paddle.nn as nn
import typing
from ppdet.core.workspace import register
from static.ppdet.utils.post_process import nms
__all__ = ['BaseArch']
......@@ -25,7 +29,53 @@ class BaseArch(nn.Layer):
if self.training:
out = self.get_loss()
else:
out = self.get_pred()
inputs_list = []
# multi-scale input
if not isinstance(inputs, typing.Sequence):
inputs_list.append(inputs)
else:
inputs_list.extend(inputs)
outs = []
for inp in inputs_list:
self.inputs = inp
outs.append(self.get_pred())
# multi-scale test
if len(outs)>1:
out = self.merge_multi_scale_predictions(outs)
else:
out = outs[0]
return out
def merge_multi_scale_predictions(self, outs):
# default values for architectures not included in following list
num_classes = 80
nms_threshold = 0.5
keep_top_k = 100
if self.__class__.__name__ in ('CascadeRCNN', 'FasterRCNN', 'MaskRCNN'):
num_classes = self.bbox_head.num_classes
keep_top_k = self.bbox_post_process.nms.keep_top_k
nms_threshold = self.bbox_post_process.nms.nms_threshold
else:
raise Exception("Multi scale test only supports CascadeRCNN, FasterRCNN and MaskRCNN for now")
final_boxes = []
all_scale_outs = paddle.concat([o['bbox'] for o in outs]).numpy()
for c in range(num_classes):
idxs = all_scale_outs[:, 0] == c
if np.count_nonzero(idxs) == 0:
continue
r = nms(all_scale_outs[idxs, 1:], nms_threshold)
final_boxes.append(np.concatenate([np.full((r.shape[0], 1), c), r], 1))
out = np.concatenate(final_boxes)
out = np.concatenate(sorted(out, key=lambda e: e[1])[-keep_top_k:]).reshape((-1, 6))
out = {
'bbox': paddle.to_tensor(out),
'bbox_num': paddle.to_tensor(np.array([out.shape[0], ]))
}
return out
def build_inputs(self, data, input_def):
......
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import unittest
from ppdet.core.workspace import load_config
from ppdet.engine import Trainer

class TestMultiScaleInference(unittest.TestCase):
    def setUp(self):
        self.set_config()

    def set_config(self):
        self.mstest_cfg_file = 'configs/faster_rcnn/faster_rcnn_r34_fpn_multiscaletest_1x_coco.yml'

    # test evaluation with multi scale test
    def test_eval_mstest(self):
        cfg = load_config(self.mstest_cfg_file)
        trainer = Trainer(cfg, mode='eval')
        cfg.weights = 'https://paddledet.bj.bcebos.com/models/faster_rcnn_r34_fpn_1x_coco.pdparams'
        trainer.load_weights(cfg.weights)
        trainer.evaluate()

    # test inference with multi scale test
    def test_infer_mstest(self):
        cfg = load_config(self.mstest_cfg_file)
        trainer = Trainer(cfg, mode='test')
        cfg.weights = 'https://paddledet.bj.bcebos.com/models/faster_rcnn_r34_fpn_1x_coco.pdparams'
        trainer.load_weights(cfg.weights)
        tests_img_root = os.path.join(os.path.dirname(__file__), 'imgs')
        # input images to predict
        imgs = ['coco2017_val2017_000000000139.jpg', 'coco2017_val2017_000000000724.jpg']
        imgs = [os.path.join(tests_img_root, img) for img in imgs]
        trainer.predict(imgs,
                        draw_threshold=0.5,
                        output_dir='output',
                        save_txt=True)


if __name__ == '__main__':
    unittest.main()