Unverified commit 2b8f904e authored by chengjuntao, committed by GitHub

Add RRPN models for PaddleCV (#4148)

* add rrpn for models

Parent 1853d687
# RRPN Rotated Object Detection
---
## Contents
- [Installation](#installation)
- [Introduction](#introduction)
- [Data Preparation](#data-preparation)
- [Training](#training)
- [Evaluation](#evaluation)
- [Inference and Visualization](#inference-and-visualization)
## Installation
Running the example code in this directory requires the develop branch (or a later release) of PaddlePaddle Fluid. If the PaddlePaddle in your environment is older than that, please update it following the [installation guide](http://www.paddlepaddle.org/).
## Introduction
RRPN is a two-stage detector extended from Faster R-CNN that can be used for text detection and rotated object detection. It generates candidate regions from the image, extracts their features, classifies them, and refines the positions of the candidate boxes.
The [RRPN](https://arxiv.org/abs/1703.01086) network consists of four main parts:
1. Backbone convolution layers. As a convolutional object detector, RRPN first extracts feature maps from the image with a set of base convolution layers. These feature maps are shared by the subsequent RPN and fully connected layers. This example uses [ResNet-50](https://arxiv.org/abs/1512.03385) as the backbone.
2. Region Proposal Network (RPN). The RPN generates candidate regions (proposals). It lays out a set of oriented anchors with fixed sizes, aspect ratios and angles, classifies each rotated anchor as foreground or background with softmax, and refines the anchors with box regression to obtain accurate proposals.
3. Rotated RoI Align. This layer takes the feature maps and the oriented proposals, maps each oriented proposal onto the feature map and pools it into a fixed-size region feature, which is then fed to the fully connected layers for classification.
4. Detection head. It predicts the class of each proposal from the region features and applies box regression once more to obtain the final, precise box positions. A minimal sketch of how these four stages fit together is given below.
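The sketch below wires the four stages together with the custom ops added in this PR (`rotated_anchor_generator`, `rotated_generate_proposals`, `rotated_roi_align`). It is only an illustration under assumptions: the import path `models.ext_op.rrpn_lib` and the simple conv/fc head are placeholders; the real network is built in `models/model_builder.py` and `models/resnet.py`.
```
import paddle.fluid as fluid
# Assumed import path for the Python wrappers around the compiled rrpn_lib.so.
from models.ext_op.rrpn_lib import (rotated_anchor_generator,
                                    rotated_generate_proposals,
                                    rotated_roi_align)

def rrpn_head_sketch(feature, im_info):
    """Illustrative wiring of stages 2-4 on top of a backbone feature map."""
    num_anchors = 3 * 3 * 6  # sizes x ratios x angles, as in config.py
    # 2. RPN: per-anchor objectness scores and 5-parameter (x, y, w, h, angle) deltas
    rpn_conv = fluid.layers.conv2d(feature, num_filters=1024, filter_size=3,
                                   padding=1, act='relu')
    scores = fluid.layers.conv2d(rpn_conv, num_filters=num_anchors, filter_size=1)
    deltas = fluid.layers.conv2d(rpn_conv, num_filters=5 * num_anchors, filter_size=1)
    anchors, variances = rotated_anchor_generator(
        input=feature, anchor_sizes=[128, 256, 512],
        aspect_ratios=[0.2, 0.5, 1.0],
        angles=[-30.0, 0.0, 30.0, 60.0, 90.0, 120.0],
        variance=[1.0, 1.0, 1.0, 1.0, 1.0], stride=[16.0, 16.0])
    rois, roi_probs = rotated_generate_proposals(
        fluid.layers.sigmoid(scores), deltas, im_info, anchors, variances,
        pre_nms_top_n=6000, post_nms_top_n=1000, nms_thresh=0.7)
    # 3. Rotated RoI Align pools every oriented proposal into a fixed-size feature
    roi_feat = rotated_roi_align(feature, rois, pooled_height=14,
                                 pooled_width=14, spatial_scale=1. / 16.)
    # 4. Detection head: classification plus a second box regression
    fc = fluid.layers.fc(roi_feat, size=1024, act='relu')
    cls_score = fluid.layers.fc(fc, size=2)      # text / background for ICDAR2015
    bbox_pred = fluid.layers.fc(fc, size=2 * 5)  # 5 box parameters per class
    return rois, cls_score, bbox_pred
```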
### Building the custom OPs
Build the custom OPs as follows: enter the `ext_op/src` directory and run the build script
```
cd ext_op/src
sh make.sh ${cuda_path} ${cudnn_path} ${nccl_path}
```
where ${cuda_path}, ${cudnn_path} and ${nccl_path} are the installation paths of CUDA, cuDNN and NCCL respectively, and must be passed on the command line.
After a successful build, `rrpn_lib.so` is generated under `ext_op/src`.
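To check that the library was built and can be found at runtime, it can be loaded the same way the model code does (a quick sketch; adjust the path to where you run Python, e.g. the model code in this PR loads `models/ext_op/src/rrpn_lib.so`):
```
import paddle.fluid as fluid
# Path is relative to the working directory; the model code uses
# 'models/ext_op/src/rrpn_lib.so'.
fluid.load_op_library('ext_op/src/rrpn_lib.so')
```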
## Data Preparation
### Public dataset
Training uses the [ICDAR2015 dataset](https://rrc.cvc.uab.es/?ch=4&com=downloads); you need to register on the official website before downloading it.
The data directory layout is as follows:
```
dataset/icdar2015/
├── ch4_training_images
│ ├── img_143.jpg
│ ├── img_144.jpg
| ...
├── ch4_training_localization_transcription_gt
│ ├── gt_img_143.txt
│ ├── gt_img_144.txt
| ...
├── ch4_test_images
│ ├── img_111.jpg
│ ├── img_112.jpg
| ...
├── ch4_test_localization_transcription_gt
│ ├── gt_img_111.txt
│ ├── gt_img_112.txt
| ...
```
### Custom data
The original RRPN only supports binary classification. To train on your own data with multiple classes, change dataset to icdar2017 in utility.py and set class_num to the number of classes you need, where class 0 is the background.
When training on custom data, the directory layout is the same as for ICDAR2015, and each annotation line has the following format (a small parsing sketch follows the block below):
```
x1, y1, x2, y2, x3, y3, x4, y4, class_name
x1, y1, x2, y2, x3, y3, x4, y4, class_name
```
## Training
**Download the pretrained model:** this example provides a ResNet-50 pretrained model, which can be downloaded with the command
`sh ./pretrained/download.sh`
The pretrained model is loaded by setting `pretrained_model`; the same setting is also used to load an already trained model when fine-tuning.
Please make sure the pretrained model is downloaded and loaded correctly before training, otherwise the loss may become NaN during training.
- RRPN
```
python train.py \
   --model_save_dir=output/ \
   --pretrained_model=${path_to_pretrain_model} \
   --data_dir=${path_to_data} \
```
- Set `export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7` to train on 8 GPUs.
- Run `python train.py --help` to see all optional arguments.
**Data reader:** the data reader is defined in reader.py. Every image is resized so that its short side equals `scales`; if the long side then exceeds `max_size`, the image is rescaled again so that its long side equals `max_size`. During training, images are additionally rotated by a random angle. A standalone sketch of the scaling rule is given below.
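The following is a minimal, standalone sketch of that scaling rule, mirroring `_resize` in data_utils.py with the training defaults target_size=800 and max_size=1333 (the function name is illustrative):
```
import numpy as np

def compute_scale(im_h, im_w, target_size=800, max_size=1333):
    """Scale so the short side becomes target_size, capped so the long side <= max_size."""
    im_size_min = min(im_h, im_w)
    im_size_max = max(im_h, im_w)
    im_scale = float(target_size) / float(im_size_min)
    if np.round(im_scale * im_size_max) > max_size:
        im_scale = float(max_size) / float(im_size_max)
    return im_scale

# A 720x1280 ICDAR2015 image: 800/720 would push the long side to ~1422 > 1333,
# so the scale falls back to 1333/1280 ~= 1.041.
print(compute_scale(720, 1280))
```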
**Model settings** (the corresponding config.py defaults are excerpted below this list):
* Region features are pooled with Rotated RoI Align.
* During training `pre_nms=12000` and `post_nms=2000`; during testing `pre_nms=6000` and `post_nms=1000`. The NMS threshold is 0.7.
* When generating proposal labels, `fg_fraction=0.25`, `fg_thresh=0.5`, `bg_thresh_hi=0.5`, `bg_thresh_lo=0.0`.
* When sampling anchors in the RPN, `rpn_fg_fraction=0.5`, `rpn_positive_overlap=0.7`, `rpn_negative_overlap=0.3`.
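For reference, these settings correspond to the following defaults in this PR's config.py (excerpt; note the `fg_fractrion` key is spelled this way in the file):
```
_C.TRAIN.rpn_pre_nms_top_n = 12000
_C.TRAIN.rpn_post_nms_top_n = 2000
_C.TEST.rpn_pre_nms_top_n = 6000
_C.TEST.rpn_post_nms_top_n = 1000
_C.TRAIN.rpn_nms_thresh = 0.7
_C.TRAIN.fg_fractrion = 0.25
_C.TRAIN.fg_thresh = 0.5
_C.TRAIN.bg_thresh_hi = 0.5
_C.TRAIN.bg_thresh_lo = 0.0
_C.TRAIN.rpn_fg_fraction = 0.5
_C.TRAIN.rpn_positive_overlap = 0.7
_C.TRAIN.rpn_negative_overlap = 0.3
```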
**Training strategy** (a sketch of the resulting learning-rate schedule follows this list):
* The default configuration uses 8 GPUs with a batch size of 1 per GPU.
* Training uses the momentum optimizer with momentum=0.9.
* The weight decay coefficient is 0.02. During the first 500 iterations the learning rate increases linearly from 0.00333 to 0.01; at iterations 6250 and 12500 it is decayed by factors of 0.1 and 0.01, and training runs for at most 17500 iterations. The maximum number of iterations and the learning-rate schedule can be changed through max_iter and lr_steps in config.py.
* The learning rate of the convolution biases outside the backbone is twice the global learning rate.
* In the backbone, the affine channel (affine_layers) parameters are not updated, and the res2 stage parameters are not updated.
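A pure-Python sketch of that schedule, using the values described above (warm_up_iter=500, start_factor=1/3, lr_steps=[6250, 12500], lr_gamma=0.1 in config.py); the helper name is illustrative:
```
def learning_rate_at(it, base_lr=0.01, warmup_iter=500, start_factor=1. / 3,
                     steps=(6250, 12500), gamma=0.1):
    """Linear warmup from base_lr * start_factor to base_lr, then step decay."""
    if it < warmup_iter:
        start_lr = base_lr * start_factor
        return start_lr + (base_lr - start_lr) * it / warmup_iter
    num_decays = sum(1 for s in steps if it >= s)
    return base_lr * (gamma ** num_decays)

# iteration 0 -> 0.00333..., 500 -> 0.01, 6250 -> 0.001, 12500 -> 0.0001
```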
## Evaluation
Evaluation measures the performance of a trained model. This example uses the [official ICDAR2015 evaluation](https://rrc.cvc.uab.es/?com=contestant).
`eval.py` is the main entry point for evaluation and is invoked as follows:
- RRPN
```
python eval.py \
    --dataset=icdar2015 \
    --pretrained_model=${path_to_trained_model}
```
- Set `--pretrained_model=${path_to_trained_model}` to the trained model, not the initial pretrained model.
- Set `export CUDA_VISIBLE_DEVICES=0` to evaluate on a single GPU.
The evaluation results are shown in the table below:
RRPN
| Model | Batch size | Iterations | F1 |
| :--------------- | :------------: | :------------------: |------: |
| [RRPN](https://paddleseg.bj.bcebos.com/deploy/temp/model_final.tar) | 8 | 17500 | 0.8048 |
## Inference and Visualization
Inference returns the objects in an image together with their classes. `infer.py` is the main entry point and is invoked as follows:
```
python infer.py \
   --pretrained_model=${path_to_trained_model} \
   --image_path=dataset/icdar2015 \
   --draw_threshold=0.6
```
Note: make sure the model path `${path_to_trained_model}` and the image path are set correctly. The GPU is used by default; set `--use_gpu=False` to run on the CPU. The score threshold `draw_threshold` controls how many detection boxes are drawn.
Visualized predictions are shown below:
<p align="center">
<img src="image/img_120.jpg" height=576 width=1024 hspace='10'/>
<img src="image/img_119.jpg" height=576 width=1024 hspace='10'/> <br />
RRPN visualized predictions
</p>
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import errno
import os
import shutil
import time
import numpy as np
import re
import paddle.fluid as fluid
import logging
logger = logging.getLogger(__name__)
def load_params(exe, prog, path):
"""
Load model from the given path.
Args:
exe (fluid.Executor): The fluid.Executor object.
prog (fluid.Program): load weight to which Program object.
        path (string): URL string or local model path.
"""
if not os.path.exists(path):
raise ValueError("Model pretrain path {} does not "
"exists.".format(path))
logger.info('Loading parameters from {}...'.format(path))
def _if_exist(var):
param_exist = os.path.exists(os.path.join(path, var.name))
do_load = param_exist
if do_load:
logger.debug('load weight {}'.format(var.name))
return do_load
fluid.io.load_vars(exe, path, prog, predicate=_if_exist)
def save(exe, prog, path):
"""
    Save model to the given path.
Args:
exe (fluid.Executor): The fluid.Executor object.
prog (fluid.Program): save weight from which Program object.
path (string): the path to save model.
"""
if os.path.isdir(path):
shutil.rmtree(path)
logger.info('Save model to {}.'.format(path))
fluid.io.save_persistables(exe, path, prog)
def load_and_fusebn(exe, prog, path):
"""
Fuse params of batch norm to scale and bias.
Args:
exe (fluid.Executor): The fluid.Executor object.
        prog (fluid.Program): the Program object to load weights into.
        path (string): the path to load the model from.
"""
    logger.info('Loading model and fusing batch norm (if present) from {}...'.format(
        path))
if not os.path.exists(path):
raise ValueError("Model path {} does not exists.".format(path))
def _if_exist(var):
b = os.path.exists(os.path.join(path, var.name))
if b:
logger.debug('load weight {}'.format(var.name))
return b
all_vars = list(filter(_if_exist, prog.list_vars()))
# Since the program uses affine-channel, there is no running mean and var
# in the program, here append running mean and var.
# NOTE, the params of batch norm should be like:
# x_scale
# x_offset
# x_mean
# x_variance
# x is any prefix
mean_variances = set()
bn_vars = []
bn_in_path = True
inner_prog = fluid.Program()
inner_start_prog = fluid.Program()
inner_block = inner_prog.global_block()
with fluid.program_guard(inner_prog, inner_start_prog):
for block in prog.blocks:
ops = list(block.ops)
if not bn_in_path:
break
for op in ops:
if op.type == 'affine_channel':
# remove 'scale' as prefix
scale_name = op.input('Scale')[0] # _scale
bias_name = op.input('Bias')[0] # _offset
prefix = scale_name[:-5]
mean_name = prefix + 'mean'
variance_name = prefix + 'variance'
if not os.path.exists(os.path.join(path, mean_name)):
bn_in_path = False
break
if not os.path.exists(os.path.join(path, variance_name)):
bn_in_path = False
break
bias = block.var(bias_name)
mean_vb = inner_block.create_var(
name=mean_name,
type=bias.type,
shape=bias.shape,
dtype=bias.dtype,
persistable=True)
variance_vb = inner_block.create_var(
name=variance_name,
type=bias.type,
shape=bias.shape,
dtype=bias.dtype,
persistable=True)
mean_variances.add(mean_vb)
mean_variances.add(variance_vb)
bn_vars.append(
[scale_name, bias_name, mean_name, variance_name])
if not bn_in_path:
fluid.io.load_vars(exe, path, prog, vars=all_vars)
        logger.warning(
            "There are no batch norm parameters in model {}. "
            "Skipping batch norm fusion; parameters loaded.".format(path))
return
# load running mean and running variance on cpu place into global scope.
place = fluid.CPUPlace()
exe_cpu = fluid.Executor(place)
fluid.io.load_vars(exe_cpu, path, vars=[v for v in mean_variances])
# load params on real place into global scope.
fluid.io.load_vars(exe, path, prog, vars=all_vars)
eps = 1e-5
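    # Fold the loaded batch-norm statistics into the affine_channel parameters:
    #   y = scale * (x - mean) / sqrt(var + eps) + bias
    # becomes y = new_scale * x + new_bias, with
    #   new_scale = scale / sqrt(var + eps)
    #   new_bias  = bias - mean * new_scale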
for names in bn_vars:
scale_name, bias_name, mean_name, var_name = names
scale = fluid.global_scope().find_var(scale_name).get_tensor()
bias = fluid.global_scope().find_var(bias_name).get_tensor()
mean = fluid.global_scope().find_var(mean_name).get_tensor()
var = fluid.global_scope().find_var(var_name).get_tensor()
scale_arr = np.array(scale)
bias_arr = np.array(bias)
mean_arr = np.array(mean)
var_arr = np.array(var)
bn_std = np.sqrt(np.add(var_arr, eps))
new_scale = np.float32(np.divide(scale_arr, bn_std))
new_bias = bias_arr - mean_arr * new_scale
# fuse to scale and bias in affine_channel
scale.set(new_scale, exe.place)
bias.set(new_bias, exe.place)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from edict import AttrDict
import six
import numpy as np
_C = AttrDict()
cfg = _C
#
# Training options
#
_C.TRAIN = AttrDict()
# scales an image's shortest side
_C.TRAIN.scales = [800]
# max size of longest side
_C.TRAIN.max_size = 1333
# images per GPU in minibatch
_C.TRAIN.im_per_batch = 1
# roi minibatch size per image
_C.TRAIN.batch_size_per_im = 256
# target fraction of foreground roi minibatch
_C.TRAIN.fg_fractrion = 0.25
# overlap threshold for a foreground roi
_C.TRAIN.fg_thresh = 0.5
# overlap threshold for a background roi
_C.TRAIN.bg_thresh_hi = 0.5
_C.TRAIN.bg_thresh_lo = 0.0
# If False, only resize image and not pad, image shape is different between
# GPUs in one mini-batch. If True, image shape is the same in one mini-batch.
_C.TRAIN.padding_minibatch = False
# Snapshot period
_C.TRAIN.snapshot_iter = 1000
# number of RPN proposals to keep before NMS
_C.TRAIN.rpn_pre_nms_top_n = 12000
# number of RPN proposals to keep after NMS
_C.TRAIN.rpn_post_nms_top_n = 2000
# NMS threshold used on RPN proposals
_C.TRAIN.rpn_nms_thresh = 0.7
# min size in RPN proposals
_C.TRAIN.rpn_min_size = 0.0
# eta for adaptive NMS in RPN
_C.TRAIN.rpn_eta = 1.0
# number of RPN examples per image
_C.TRAIN.rpn_batch_size_per_im = 256
# remove anchors out of the image
_C.TRAIN.rpn_straddle_thresh = 0.
# target fraction of foreground examples per RPN minibatch
_C.TRAIN.rpn_fg_fraction = 0.5
# min overlap between anchor and gt box to be a positive example
_C.TRAIN.rpn_positive_overlap = 0.7
# max overlap between anchor and gt box to be a negative example
_C.TRAIN.rpn_negative_overlap = 0.3
# stopgrad at a specified stage
_C.TRAIN.freeze_at = 2
# min area of ground truth box
_C.TRAIN.gt_min_area = -1
#
# Inference options
#
_C.TEST = AttrDict()
# scales an image's shortest side
_C.TEST.scales = [800]
# max size of longest side
_C.TEST.max_size = 1333
# eta for adaptive NMS in RPN
_C.TEST.rpn_eta = 1.0
# min score threshold to infer
_C.TEST.score_thresh = 0.01
# overlap threshold used for NMS
_C.TEST.nms_thresh = 0.3
# number of RPN proposals to keep before NMS
_C.TEST.rpn_pre_nms_top_n = 6000
# number of RPN proposals to keep after NMS
_C.TEST.rpn_post_nms_top_n = 1000
# min size in RPN proposals
_C.TEST.rpn_min_size = 0.0
# max number of detections
_C.TEST.detections_per_im = 300
# NMS threshold used on RPN proposals
_C.TEST.rpn_nms_thresh = 0.7
#
# Model options
#
# Whether use mask rcnn head
_C.MASK_ON = True
# weight for bbox regression targets
_C.bbox_reg_weights = [10.0, 10.0, 5.0, 5.0, 1.0]
# RPN anchor sizes
_C.anchor_sizes = [128, 256, 512]
# RPN anchor ratio
_C.aspect_ratio = [0.2, 0.5, 1.0]
# RPN anchor angle
_C.anchor_angle = [-30.0, 0.0, 30.0, 60.0, 90.0, 120.0]
# variance of anchors
_C.variances = [1., 1., 1., 1., 1.]
# stride of feature map
_C.rpn_stride = [16.0, 16.0]
# pooled width and pooled height
_C.roi_resolution = 14
# spatial scale
_C.spatial_scale = 1. / 16.
# resolution to represent rotated roi align
_C.resolution = 14
#
# SOLVER options
#
# base learning rate used to derive the final learning rate
_C.learning_rate = 0.01
# maximum number of iterations
_C.max_iter = 140000
# warm up to learning rate
_C.warm_up_iter = 500
_C.start_factor = 1. / 3
# lr steps_with_decay
_C.lr_steps = [6250, 12500]
_C.lr_gamma = 0.1
# L2 regularization hyperparameter
_C.weight_decay = 0.0001
# momentum with SGD
_C.momentum = 0.9
#
# ENV options
#
# support both CPU and GPU
_C.use_gpu = True
# Whether use parallel
_C.parallel = True
# Class number
_C.class_num = 81
# support pyreader
_C.use_pyreader = True
_C.TRAIN.min_size = 800
_C.TRAIN.max_size = 1333
_C.TEST.min_size = 1000
# pixel mean values
_C.pixel_means = [0.485, 0.456, 0.406]
_C.pixel_std = [0.229, 0.224, 0.225]
# clip box to prevent overflowing
_C.bbox_clip = np.log(1000. / 16.)
def merge_cfg_from_args(args, mode):
"""Merge config keys, values in args into the global config."""
if mode == 'train':
sub_d = _C.TRAIN
else:
sub_d = _C.TEST
for k, v in sorted(six.iteritems(vars(args))):
d = _C
try:
value = eval(v)
except:
value = v
if k in sub_d:
sub_d[k] = value
else:
d[k] = value
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Based on:
# --------------------------------------------------------
# Detectron
# Copyright (c) 2017-present, Facebook, Inc.
# Licensed under the Apache License, Version 2.0;
# Written by Ross Girshick
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import cv2
import numpy as np
from config import cfg
import os
from PIL import Image
class DatasetPath(object):
def __init__(self, mode, dataset_name):
self.mode = mode
self.data_dir = dataset_name
def get_data_dir(self):
if self.mode == 'train':
return os.path.join(self.data_dir, 'ch4_training_images')
elif self.mode == 'val':
return os.path.join(self.data_dir, 'ch4_test_images')
def get_file_list(self):
if self.mode == 'train':
return os.path.join(self.data_dir,
'ch4_training_localization_transcription_gt')
elif self.mode == 'val':
return os.path.join(self.data_dir,
'ch4_test_localization_transcription_gt')
def get_image_blob(roidb, mode):
"""Builds an input blob from the images in the roidb at the specified
scales.
"""
if mode == 'train' or mode == 'val':
with open(roidb['image'], 'rb') as f:
data = f.read()
data = np.frombuffer(data, dtype='uint8')
img = cv2.imdecode(data, 1)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
gt_boxes = roidb['boxes']
gt_label = roidb['gt_classes']
# resize
if mode == 'train':
img, im_scale = _resize(img, target_size=800, max_size=1333)
need_gt_boxes = gt_boxes.copy()
need_gt_boxes[:, :4] *= im_scale
img, need_gt_boxes, need_gt_label = _rotation(
img, need_gt_boxes, gt_label, prob=1.0, gt_margin=1.4)
else:
img, im_scale = _resize(img, target_size=1000, max_size=1778)
need_gt_boxes = gt_boxes
need_gt_label = gt_label
img = img.astype(np.float32, copy=False)
img = img / 255.0
mean = np.array(cfg.pixel_means)[np.newaxis, np.newaxis, :]
std = np.array(cfg.pixel_std)[np.newaxis, np.newaxis, :]
img -= mean
img /= std
img = img.transpose((2, 0, 1))
return img, im_scale, need_gt_boxes, need_gt_label
def _get_size_scale(w, h, min_size, max_size=None):
size = min_size
scale = 1.0
if max_size is not None:
min_original_size = float(min((w, h)))
max_original_size = float(max((w, h)))
if max_original_size / min_original_size * size > max_size:
size = int(round(max_size * min_original_size / max_original_size))
if (w <= h and w == size) or (h <= w and h == size):
return (h, w), scale
if w < h:
ow = size
oh = int(size * h / w)
scale = size / w
else:
oh = size
ow = int(size * w / h)
scale = size / h
scale = ow / w
return (oh, ow), scale
def _resize(im, target_size=800, max_size=1333):
    if not isinstance(im, np.ndarray):
        raise TypeError("image type is not numpy.ndarray.")
    if len(im.shape) != 3:
        raise ValueError("image is not 3-dimensional.")
im_shape = im.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
selected_size = target_size
if float(im_size_min) == 0:
raise ZeroDivisionError('min size of image is 0')
if max_size != 0:
im_scale = float(selected_size) / float(im_size_min)
# Prevent the biggest axis from being more than max_size
if np.round(im_scale * im_size_max) > max_size:
im_scale = float(max_size) / float(im_size_max)
im_scale_x = im_scale
im_scale_y = im_scale
resize_w = np.round(im_scale_x * float(im_shape[1]))
resize_h = np.round(im_scale_y * float(im_shape[0]))
im_info = [resize_h, resize_w, im_scale]
else:
im_scale_x = float(selected_size) / float(im_shape[1])
im_scale_y = float(selected_size) / float(im_shape[0])
resize_w = selected_size
resize_h = selected_size
im = Image.fromarray(im)
im = im.resize((int(resize_w), int(resize_h)), 2)
im = np.array(im)
return im, im_scale_x
def _rotation(image,
gt_boxes,
gt_label,
prob,
fixed_angle=-1,
r_range=(360, 0),
gt_margin=1.4):
rotate_range = r_range[0]
shift = r_range[1]
angle = np.array([np.max([0, fixed_angle])])
if np.random.rand() <= prob:
angle = np.array(
np.random.rand(1) * rotate_range - shift, dtype=np.int16)
'''
rotate image
'''
image = np.array(image)
(h, w) = image.shape[:2]
scale = 1.0
# set the rotation center
center = (w / 2, h / 2)
# anti-clockwise angle in the function
M = cv2.getRotationMatrix2D(center, angle, scale)
image = cv2.warpAffine(image, M, (w, h))
# back to PIL image
im_width, im_height = w, h
'''
rotate boxes
'''
need_gt_boxes = gt_boxes.copy()
origin_gt_boxes = need_gt_boxes
rotated_gt_boxes = np.empty((len(need_gt_boxes), 5), dtype=np.float32)
# anti-clockwise to clockwise arc
cos_cita = np.cos(np.pi / 180 * angle)
sin_cita = np.sin(np.pi / 180 * angle)
# clockwise matrix
rotation_matrix = np.array([[cos_cita, sin_cita], [-sin_cita, cos_cita]])
pts_ctr = origin_gt_boxes[:, 0:2]
pts_ctr = pts_ctr - np.tile((im_width / 2, im_height / 2),
(gt_boxes.shape[0], 1))
pts_ctr = np.array(np.dot(pts_ctr, rotation_matrix), dtype=np.int16)
pts_ctr = np.squeeze(
pts_ctr, axis=-1) + np.tile((im_width / 2, im_height / 2),
(gt_boxes.shape[0], 1))
origin_gt_boxes[:, 0:2] = pts_ctr
len_of_gt = len(origin_gt_boxes)
# rectificate the angle in the range of [-45, 45]
for idx in range(len_of_gt):
ori_angle = origin_gt_boxes[idx, 4]
height = origin_gt_boxes[idx, 3]
width = origin_gt_boxes[idx, 2]
# step 1: normalize gt (-45,135)
if width < height:
ori_angle += 90
width, height = height, width
# step 2: rotate (-45,495)
rotated_angle = ori_angle + angle
# step 3: normalize rotated_angle (-45,135)
while rotated_angle > 135:
rotated_angle = rotated_angle - 180
rotated_gt_boxes[idx, 0] = origin_gt_boxes[idx, 0]
rotated_gt_boxes[idx, 1] = origin_gt_boxes[idx, 1]
rotated_gt_boxes[idx, 3] = height * gt_margin
rotated_gt_boxes[idx, 2] = width * gt_margin
rotated_gt_boxes[idx, 4] = rotated_angle
x_inbound = np.logical_and(rotated_gt_boxes[:, 0] >= 0,
rotated_gt_boxes[:, 0] < im_width)
y_inbound = np.logical_and(rotated_gt_boxes[:, 1] >= 0,
rotated_gt_boxes[:, 1] < im_height)
inbound = np.logical_and(x_inbound, y_inbound)
need_gt_boxes = rotated_gt_boxes[inbound]
need_gt_label = gt_label.copy()
need_gt_label = need_gt_label[inbound]
return image, need_gt_boxes, need_gt_label
def prep_im_for_blob(im, pixel_means, target_size, max_size):
"""Prepare an image for use as a network input blob. Specially:
- Subtract per-channel pixel mean
- Convert to float32
- Rescale to each of the specified target size (capped at max_size)
Returns a list of transformed images, one for each target size. Also returns
the scale factors that were used to compute each returned image.
"""
im = im.astype(np.float32, copy=False)
im -= pixel_means
im_shape = im.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
im_scale = float(target_size) / float(im_size_min)
# Prevent the biggest axis from being more than max_size
if np.round(im_scale * im_size_max) > max_size:
im_scale = float(max_size) / float(im_size_max)
im = cv2.resize(
im,
None,
None,
fx=im_scale,
fy=im_scale,
interpolation=cv2.INTER_LINEAR)
im_height, im_width, channel = im.shape
channel_swap = (2, 0, 1) #(batch, channel, height, width)
im = im.transpose(channel_swap)
return im, im_scale
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
class AttrDict(dict):
def __init__(self, *args, **kwargs):
super(AttrDict, self).__init__(*args, **kwargs)
def __getattr__(self, name):
if name in self.__dict__:
return self.__dict__[name]
elif name in self:
return self[name]
else:
raise AttributeError(name)
def __setattr__(self, name, value):
if name in self.__dict__:
self.__dict__[name] = value
else:
self[name] = value
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import time
import numpy as np
import pickle
import paddle
import paddle.fluid as fluid
import reader
import models.model_builder as model_builder
import models.resnet as resnet
import checkpoint as checkpoint
from config import cfg
from utility import print_arguments, parse_args, check_gpu
from data_utils import DatasetPath
from eval_helper import *
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
def eval():
place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
image_shape = [3, cfg.TEST.max_size, cfg.TEST.max_size]
class_nums = cfg.class_num
model = model_builder.RRPN(
add_conv_body_func=resnet.ResNet(),
add_roi_box_head_func=resnet.ResNetC5(),
use_pyreader=False,
mode='val')
startup_prog = fluid.Program()
infer_prog = fluid.Program()
with fluid.program_guard(infer_prog, startup_prog):
with fluid.unique_name.guard():
model.build_model(image_shape)
pred_boxes = model.eval_bbox_out()
infer_prog = infer_prog.clone(True)
exe.run(startup_prog)
# yapf: disable
def if_exist(var):
return os.path.exists(os.path.join(cfg.pretrained_model, var.name))
if cfg.pretrained_model:
checkpoint.load_params(exe, infer_prog, cfg.pretrained_model)
# yapf: enable
test_reader = reader.test(1)
feeder = fluid.DataFeeder(place=place, feed_list=model.feeds())
fetch_list = [pred_boxes]
res_list = []
keys = [
'bbox', 'gt_box', 'gt_class', 'is_crowed', 'im_info', 'im_id',
'is_difficult'
]
for i, data in enumerate(test_reader()):
im_info = [data[0][1]]
result = exe.run(infer_prog,
fetch_list=[v.name for v in fetch_list],
feed=feeder.feed(data),
return_numpy=False)
pred_boxes_v = result[0]
nmsed_out = pred_boxes_v
outs = np.array(nmsed_out)
res = get_key_dict(outs, data[0], keys)
res_list.append(res)
if i % 50 == 0:
logger.info('test_iter {}'.format(i))
icdar_eval(res_list)
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
check_gpu(args.use_gpu)
eval()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import math
import logging
import six
import numpy as np
import cv2
import paddle.fluid as fluid
import Polygon as plg
from PIL import Image
from PIL import ImageDraw
from PIL import ImageFont
from config import cfg
logger = logging.getLogger(__name__)
def get_key_dict(out, data, key):
res = {}
for i in range(len(key)):
if i == 0:
res[key[i]] = out
else:
res[key[i]] = data[i]
return res
def get_labels_maps():
default_labels_maps = {1: 'text'}
if cfg.dataset == 'icdar2015':
return default_labels_maps
labels_map = {}
with open(os.path.join(cfg.data_dir, 'label_list')) as f:
lines = f.readlines()
for idx, line in enumerate(lines):
labels_map[idx + 1] = line.strip()
return labels_map
def draw_bounding_box_on_image(image_path,
image_name,
nms_out,
im_scale,
draw_threshold=0.8):
#if image is None:
image = Image.open(os.path.join(image_path, image_name))
draw = ImageDraw.Draw(image)
im_width, im_height = image.size
labels_map = get_labels_maps()
for dt in np.array(nms_out):
num_id, score = dt.tolist()[:2]
x1, y1, x2, y2, x3, y3, x4, y4 = dt.tolist()[2:] / im_scale
if score < draw_threshold:
continue
draw.line(
[(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x1, y1)],
width=2,
fill='red')
if image.mode == 'RGB':
draw.text((x1, y1), labels_map[num_id], (255, 255, 0))
print("image with bbox drawed saved as {}".format(image_name))
image.save(image_name)
def polygon_from_points(points):
"""
Returns a Polygon object to use with the Polygon2 class from a list of 8 points: x1,y1,x2,y2,x3,y3,x4,y4
"""
res_boxes = np.empty([1, 8], dtype='int32')
res_boxes[0, 0] = int(points[0])
res_boxes[0, 4] = int(points[1])
res_boxes[0, 1] = int(points[2])
res_boxes[0, 5] = int(points[3])
res_boxes[0, 2] = int(points[4])
res_boxes[0, 6] = int(points[5])
res_boxes[0, 3] = int(points[6])
res_boxes[0, 7] = int(points[7])
point_mat = res_boxes[0].reshape([2, 4]).T
return plg.Polygon(point_mat)
def clip_box(bbox, im_info):
h = im_info[0]
w = im_info[1]
res = []
for b in bbox:
pts = b.reshape(4, 2)
pts[np.where(pts < 0)] = 1
pts[np.where(pts[:, 0] > w), 0] = w - 1
pts[np.where(pts[:, 1] > h), 1] = h - 1
pts = pts.reshape(-1)
pts /= im_info[2]
res.append(pts)
return np.array(res)
def get_union(det, gt):
area_det = det.area()
area_gt = gt.area()
return area_det + area_gt - get_intersection(det, gt)
def get_intersection_over_union(det, gt):
try:
return get_intersection(det, gt) / get_union(det, gt)
except:
return 0
def get_intersection(det, gt):
inter = det & gt
if len(inter) == 0:
return 0
return inter.area()
def parse_gt(result, im_id):
for res in result:
if res['im_id'] == im_id:
gt_boxes = list(res['gt_box'])
gt_class = res['gt_class']
is_difficult = res['is_difficult'].reshape(-1)
objects = []
for i in range(len(gt_boxes)):
object_struct = {}
object_struct['bbox'] = gt_boxes[i]
object_struct['class'] = gt_class[i]
if is_difficult[i] == 1:
object_struct['difficult'] = 1
else:
object_struct['difficult'] = 0
object_struct['im_id'] = im_id
objects.append(object_struct)
return objects
def calculate_ap(rec, prec):
# 11 point metric
ap = 0.
for t in np.arange(0., 1.1, 0.1):
if np.sum(rec >= t) == 0:
p = 0
else:
p = np.max(prec[rec >= t])
ap = ap + p / 11.
return ap
def icdar_map(result, class_name, ovthresh):
im_ids = []
for res in result:
im_ids.append(res['im_id'])
recs = {}
for i, im_id in enumerate(im_ids):
recs[str(im_id)] = parse_gt(result, im_id)
class_recs = {}
npos = 0
for k in im_ids:
res = [obj for obj in recs[str(k)] if obj['class'] == class_name]
bbox = np.array([x['bbox'] for x in res])
difficult = np.array([x['difficult'] for x in res]).astype(np.bool)
det = [False] * len(res)
npos = npos + sum(~difficult)
class_recs[k] = {'bbox': bbox, 'difficult': difficult, 'det': det}
image_ids = []
confidence = []
bbox = []
for res in result:
im_info = res['im_info']
pred_boxes = res['bbox']
for box in pred_boxes:
if box[0] == class_name:
image_ids.append(res['im_id'])
confidence.append(box[1])
clipd_box = clip_box(box[2:].reshape(-1, 8), im_info)
bbox.append(clipd_box[0])
confidence = np.array(confidence)
sorted_ind = np.argsort(-confidence)
sorted_scores = np.sort(-confidence)
bbox = np.array(bbox)
bbox = bbox[sorted_ind, :]
image_ids = [image_ids[x] for x in sorted_ind]
nd = len(image_ids)
tp = np.zeros(nd)
fp = np.zeros(nd)
for d in range(nd):
res = class_recs[image_ids[d]]
bb = bbox[d, :].astype(float)
ovmax = -np.inf
gt_bbox = res['bbox'].astype(float)
if gt_bbox.size > 0:
# compute overlaps
gt_bbox_xmin = np.min(gt_bbox[:, 0::2], axis=1)
gt_bbox_ymin = np.min(gt_bbox[:, 1::2], axis=1)
gt_bbox_xmax = np.max(gt_bbox[:, 0::2], axis=1)
gt_bbox_ymax = np.max(gt_bbox[:, 1::2], axis=1)
bb_xmin = np.min(bb[0::2])
bb_ymin = np.min(bb[1::2])
bb_xmax = np.max(bb[0::2])
bb_ymax = np.max(bb[1::2])
ixmin = np.maximum(gt_bbox_xmin, bb_xmin)
iymin = np.maximum(gt_bbox_ymin, bb_ymin)
ixmax = np.minimum(gt_bbox_xmax, bb_xmax)
iymax = np.minimum(gt_bbox_ymax, bb_ymax)
iw = np.maximum(ixmax - ixmin + 1., 0.)
ih = np.maximum(iymax - iymin + 1., 0.)
inters = iw * ih
# union
uni = ((bb_xmax - bb_xmin + 1.) * (bb_ymax - bb_ymin + 1.) +
(gt_bbox_xmax - gt_bbox_xmin + 1.) *
(gt_bbox_ymax - gt_bbox_ymin + 1.) - inters)
overlaps = inters / uni
gt_bbox_keep_mask = overlaps > 0
gt_bbox_keep = gt_bbox[gt_bbox_keep_mask, :]
gt_bbox_keep_index = np.where(overlaps > 0)[0]
def calcoverlaps(gt_bbox_keep, bb):
overlaps = []
for index, _ in enumerate(gt_bbox_keep):
p_g = polygon_from_points(gt_bbox_keep[index])
p_d = polygon_from_points(bb)
overlap = get_intersection_over_union(p_d, p_g)
overlaps.append(overlap)
return overlaps
if len(gt_bbox_keep) > 0:
overlaps = calcoverlaps(gt_bbox_keep, bb)
ovmax = np.max(overlaps)
jmax = np.argmax(overlaps)
jmax = gt_bbox_keep_index[jmax]
if ovmax > ovthresh:
if not res['difficult'][jmax]:
if not res['det'][jmax]:
tp[d] = 1.
res['det'][jmax] = 1
else:
fp[d] = 1.
else:
fp[d] = 1.
# compute precision recall
fp = np.cumsum(fp)
tp = np.cumsum(tp)
rec = tp / float(npos)
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
ap = calculate_ap(rec, prec)
return rec, prec, ap
def icdar_map_eval(result, num_class):
map = 0
for i in range(num_class - 1):
rec, prec, ap = icdar_map(result, i + 1, ovthresh=0.5)
map = map + ap
map = map / (num_class - 1)
logger.info('mAP {}'.format(map))
def icdar_box_eval(result, thresh):
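    """Evaluate with the ICDAR 2015 detection protocol: predictions scoring below
    `thresh` are dropped, a detection polygon matches a ground-truth polygon when
    their IoU exceeds 0.5, "don't care" (difficult) regions are excluded, and
    recall, precision and F1 are accumulated over the whole result list."""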
matched_sum = 0
num_global_care_gt = 0
num_global_care_det = 0
for res in result:
im_info = res['im_info']
h = im_info[1]
w = im_info[2]
gt_boxes = res['gt_box']
pred_boxes = res['bbox']
pred_boxes = pred_boxes[np.where(pred_boxes[:, 1] > thresh)]
pred_boxes = pred_boxes[:, 2:]
pred_boxes = clip_box(pred_boxes, im_info)
is_difficult = res['is_difficult']
det_matched = 0
iou_mat = np.empty([1, 1])
gt_pols = []
det_pols = []
gt_pol_points = []
det_pol_points = []
gt_dont_care_pols_num = []
det_dont_care_pols_num = []
det_matched_nums = []
points_list = list(gt_boxes)
dony_care = is_difficult.reshape(-1)
for i, points in enumerate(points_list):
gt_pol = polygon_from_points(list(points))
gt_pols.append(gt_pol)
gt_pol_points.append(list(points))
if dony_care[i] == 1:
gt_dont_care_pols_num.append(len(gt_pols) - 1)
for i, points in enumerate(pred_boxes):
points = list(points.reshape(8).astype(np.int32))
det_pol = polygon_from_points(points)
det_pols.append(det_pol)
det_pol_points.append(points)
if len(gt_dont_care_pols_num) > 0:
for dont_care_pol in gt_dont_care_pols_num:
dont_care_pol = gt_pols[dont_care_pol]
intersected_area = get_intersection(dont_care_pol, det_pol)
pd_dimensions = det_pol.area()
precision = 0 if pd_dimensions == 0 else intersected_area / pd_dimensions
if (precision > 0.5):
det_dont_care_pols_num.append(len(det_pols) - 1)
break
if len(gt_pols) > 0 and len(det_pols) > 0:
# Calculate IoU and precision matrixs
output_shape = [len(gt_pols), len(det_pols)]
iou_mat = np.empty(output_shape)
gt_rect_mat = np.zeros(len(gt_pols), np.int8)
det_rect_mat = np.zeros(len(det_pols), np.int8)
for gt_num in range(len(gt_pols)):
for det_num in range(len(det_pols)):
p_d = gt_pols[gt_num]
p_g = det_pols[det_num]
iou_mat[gt_num, det_num] = get_intersection_over_union(p_d,
p_g)
for gt_num in range(len(gt_pols)):
for det_num in range(len(det_pols)):
if gt_rect_mat[gt_num] == 0 and det_rect_mat[
det_num] == 0 and gt_num not in gt_dont_care_pols_num and det_num not in det_dont_care_pols_num:
if iou_mat[gt_num, det_num] > 0.5:
gt_rect_mat[gt_num] = 1
det_rect_mat[det_num] = 1
det_matched += 1
det_matched_nums.append(det_num)
num_gt_care = (len(gt_pols) - len(gt_dont_care_pols_num))
num_det_care = (len(det_pols) - len(det_dont_care_pols_num))
matched_sum += det_matched
num_global_care_gt += num_gt_care
num_global_care_det += num_det_care
method_recall = 0 if num_global_care_gt == 0 else float(
matched_sum) / num_global_care_gt
method_precision = 0 if num_global_care_det == 0 else float(
matched_sum) / num_global_care_det
method_hmean = 0 if method_recall + method_precision == 0 else 2 * method_recall * method_precision / (
method_recall + method_precision)
logger.info('Recall {}'.format(method_recall))
logger.info('Precision {}'.format(method_precision))
logger.info('F1 {}'.format(method_hmean))
def icdar_eval(result):
if cfg.dataset == 'icdar2015':
icdar_box_eval(result, 0.8)
else:
icdar_map_eval(result, cfg.class_num)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import time
import numpy as np
import pickle
import paddle
import paddle.fluid as fluid
import reader
import models.model_builder as model_builder
import models.resnet as resnet
import checkpoint as checkpoint
from config import cfg
from data_utils import DatasetPath
from eval_helper import *
from utility import print_arguments, parse_args, check_gpu
def infer():
place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
image_shape = [3, cfg.TEST.max_size, cfg.TEST.max_size]
class_nums = cfg.class_num
model = model_builder.RRPN(
add_conv_body_func=resnet.ResNet(),
add_roi_box_head_func=resnet.ResNetC5(),
use_pyreader=False,
mode='infer')
startup_prog = fluid.Program()
infer_prog = fluid.Program()
with fluid.program_guard(infer_prog, startup_prog):
with fluid.unique_name.guard():
model.build_model(image_shape)
pred_boxes = model.eval_bbox_out()
infer_prog = infer_prog.clone(True)
exe.run(startup_prog)
# yapf: disable
def if_exist(var):
return os.path.exists(os.path.join(cfg.pretrained_model, var.name))
if cfg.pretrained_model:
checkpoint.load_params(exe, infer_prog, cfg.pretrained_model)
# yapf: enable
infer_reader = reader.infer(cfg.image_path)
feeder = fluid.DataFeeder(place=place, feed_list=model.feeds())
fetch_list = [pred_boxes]
imgs = os.listdir(cfg.image_path)
imgs.sort()
for i, data in enumerate(infer_reader()):
result = exe.run(infer_prog,
fetch_list=[v.name for v in fetch_list],
feed=feeder.feed(data),
return_numpy=False)
nmsed_out = result[0]
im_info = data[0][1]
im_scale = im_info[2]
outs = np.array(nmsed_out)
draw_bounding_box_on_image(cfg.image_path, imgs[i], outs, im_scale,
cfg.draw_threshold)
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
check_gpu(args.use_gpu)
infer()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import paddle.fluid as fluid
from paddle.fluid.layer_helper import LayerHelper
from paddle.fluid.framework import Variable
fluid.load_op_library('models/ext_op/src/rrpn_lib.so')
def rrpn_target_assign(bbox_pred,
cls_logits,
anchor_box,
gt_boxes,
im_info,
rpn_batch_size_per_im=256,
rpn_straddle_thresh=0.0,
rpn_fg_fraction=0.5,
rpn_positive_overlap=0.7,
rpn_negative_overlap=0.3,
use_random=True):
"""
**Target Assign Layer for rotated region proposal network (RRPN).**
    Given the Intersection-over-Union (IoU) overlap between anchors and
    ground truth boxes, this layer assigns classification and regression
    targets to each anchor; these targets are used to train the RPN. The
    classification target is a binary class label (being an object or not).
    Following the RRPN paper, two kinds of anchors are labeled positive:
    (i) the anchor/anchors with the highest IoU overlap with a ground-truth
    box, and (ii) anchors that have an IoU overlap higher than
    rpn_positive_overlap (0.7) with any ground-truth box. Note that a single
    ground-truth box may assign positive labels to multiple anchors. An
    anchor is labeled negative when its IoU ratio is lower than
    rpn_negative_overlap (0.3) for all ground-truth boxes. Anchors that are
neither positive nor negative do not contribute to the training objective.
The regression targets are the encoded ground-truth boxes associated with
the positive anchors.
Args:
bbox_pred(Variable): A 3-D Tensor with shape [N, M, 5] represents the
predicted locations of M bounding bboxes. N is the batch size,
and each bounding box has five coordinate values and the layout
is [x, y, w, h, angle]. The data type can be float32 or float64.
cls_logits(Variable): A 3-D Tensor with shape [N, M, 1] represents the
predicted confidence predictions. N is the batch size, 1 is the
            foreground and background sigmoid, M is the number of bounding boxes.
The data type can be float32 or float64.
anchor_box(Variable): A 2-D Tensor with shape [M, 5] holds M boxes,
each box is represented as [x, y, w, h, angle],
[x, y] is the left top coordinate of the anchor box,
if the input is image feature map, they are close to the origin
of the coordinate system. [w, h] is the right bottom
coordinate of the anchor box, angle is the rotation angle of box.
The data type can be float32 or float64.
gt_boxes (Variable): The ground-truth bounding boxes (bboxes) are a 2D
LoDTensor with shape [Ng, 5], Ng is the total number of ground-truth
bboxes of mini-batch input. The data type can be float32 or float64.
im_info (Variable): A 2-D LoDTensor with shape [N, 3]. N is the batch size,
3 is the height, width and scale.
rpn_batch_size_per_im(int): Total number of RPN examples per image.
The data type must be int32.
rpn_straddle_thresh(float): Remove RPN anchors that go outside the image
by straddle_thresh pixels. The data type must be float32.
rpn_fg_fraction(float): Target fraction of RoI minibatch that is labeled
foreground (i.e. class > 0), 0-th class is background. The data type must be float32.
rpn_positive_overlap(float): Minimum overlap required between an anchor
and ground-truth box for the (anchor, gt box) pair to be a positive
example. The data type must be float32.
rpn_negative_overlap(float): Maximum overlap allowed between an anchor
and ground-truth box for the (anchor, gt box) pair to be a negative
examples. The data type must be float32.
use_random(bool): Whether to sample randomly when sampling.
Returns:
tuple:
A tuple(predicted_scores, predicted_location, target_label,
target_bbox) is returned. The predicted_scores
and predicted_location is the predicted result of the RPN.
The target_label and target_bbox is the ground truth,
respectively. The predicted_location is a 2D Tensor with shape
[F, 5], and the shape of target_bbox is same as the shape of
the predicted_location, F is the number of the foreground
anchors. The predicted_scores is a 2D Tensor with shape
[F + B, 1], and the shape of target_label is same as the shape
of the predicted_scores, B is the number of the background
anchors, the F and B is depends on the input of this operator.
Bbox_inside_weight represents whether the predicted loc is fake_fg
or not and the shape is [F, 5].
Examples:
.. code-block:: python
import paddle.fluid as fluid
bbox_pred = fluid.data(name='bbox_pred', shape=[None, 5], dtype='float32')
cls_logits = fluid.data(name='cls_logits', shape=[None, 1], dtype='float32')
anchor_box = fluid.data(name='anchor_box', shape=[None, 5], dtype='float32')
gt_boxes = fluid.data(name='gt_boxes', shape=[None, 5], dtype='float32')
im_info = fluid.data(name='im_infoss', shape=[None, 3], dtype='float32')
loc, score, loc_target, score_target = rrpn_target_assign(
bbox_pred, cls_logits, anchor_box, gt_boxes, im_info)
"""
helper = LayerHelper('rrpn_target_assign', **locals())
# Assign target label to anchors
loc_index = helper.create_variable_for_type_inference(dtype='int32')
score_index = helper.create_variable_for_type_inference(dtype='int32')
target_label = helper.create_variable_for_type_inference(dtype='int32')
target_bbox = helper.create_variable_for_type_inference(
dtype=anchor_box.dtype)
helper.append_op(
type="rrpn_target_assign",
inputs={'Anchor': anchor_box,
'GtBoxes': gt_boxes,
'ImInfo': im_info},
outputs={
'LocationIndex': loc_index,
'ScoreIndex': score_index,
'TargetLabel': target_label,
'TargetBBox': target_bbox
},
attrs={
'rpn_batch_size_per_im': rpn_batch_size_per_im,
'rpn_straddle_thresh': rpn_straddle_thresh,
'rpn_positive_overlap': rpn_positive_overlap,
'rpn_negative_overlap': rpn_negative_overlap,
'rpn_fg_fraction': rpn_fg_fraction,
'use_random': use_random
})
loc_index.stop_gradient = True
score_index.stop_gradient = True
target_label.stop_gradient = True
target_bbox.stop_gradient = True
cls_logits = fluid.layers.reshape(x=cls_logits, shape=(-1, 1))
bbox_pred = fluid.layers.reshape(x=bbox_pred, shape=(-1, 5))
predicted_cls_logits = fluid.layers.gather(cls_logits, score_index)
predicted_bbox_pred = fluid.layers.gather(bbox_pred, loc_index)
return predicted_cls_logits, predicted_bbox_pred, target_label, target_bbox
def rotated_anchor_generator(input,
anchor_sizes=None,
aspect_ratios=None,
angles=None,
variance=[1.0, 1.0, 1.0, 1.0, 1.0],
stride=None,
offset=0.5,
name=None):
"""
**Rotated Anchor generator operator**
Generate anchors for RRPN algorithm.
Each position of the input produce N anchors, N =
size(anchor_sizes) * size(aspect_ratios) * size(angles).
The order of generated anchors is firstly aspect_ratios
loop then anchor_sizes loop.
Args:
input(Variable): 4-D Tensor with shape [N,C,H,W]. The input feature map.
anchor_sizes(float32|list|tuple): The anchor sizes of generated
anchors, given in absolute pixels e.g. [64., 128., 256., 512.].
For instance, the anchor size of 64 means the area of this anchor
equals to 64**2. None by default.
aspect_ratios(float32|list|tuple): The height / width ratios
of generated anchors, e.g. [0.5, 1.0, 2.0]. None by default.
angle(list|tuple): Rotated angle of prior boxes. The data type is float32.
variance(list|tuple): The variances to be used in box
regression deltas. The data type is float32, [1.0, 1.0, 1.0, 1.0, 1.0] by
default.
stride(list|tuple): The anchors stride across width and height.
The data type is float32. e.g. [16.0, 16.0]. None by default.
offset(float32): Prior boxes center offset. 0.5 by default.
name(str): Name of this layer. None by default.
Returns:
Anchors(Variable): The output anchors with a layout of [H, W, num_anchors, 5].
H is the height of input, W is the width of input,
num_anchors is the box count of each position. Each anchor is
in (x, y, w, h, angle) format.
Variances(Variable): The expanded variances of anchors with a layout of
[H, W, num_priors, 5]. H is the height of input,
W is the width of input num_anchors is the box count
of each position. Each variance is in (x, y, w, h, angle) format.
Examples:
.. code-block:: python
import paddle.fluid as fluid
conv1 = fluid.data(name='conv1', shape=[None, 48, 16, 16], dtype='float32')
anchor, var = rotated_anchor_generator(
input=conv1,
anchor_sizes=[128, 256, 512],
aspect_ratios=[0.2, 0.5, 1.0],
variance=[1.0, 1.0, 1.0, 1.0, 1.0],
stride=[16.0, 16.0],
offset=0.5)
"""
helper = LayerHelper("rotated_anchor_generator", **locals())
dtype = helper.input_dtype()
def _is_list_or_tuple_(data):
return (isinstance(data, list) or isinstance(data, tuple))
if not _is_list_or_tuple_(anchor_sizes):
anchor_sizes = [anchor_sizes]
if not _is_list_or_tuple_(aspect_ratios):
aspect_ratios = [aspect_ratios]
if not _is_list_or_tuple_(angles):
angles = [angles]
if not (_is_list_or_tuple_(stride) and len(stride) == 2):
raise ValueError('stride should be a list or tuple ',
'with length 2, (stride_width, stride_height).')
anchor_sizes = list(map(float, anchor_sizes))
aspect_ratios = list(map(float, aspect_ratios))
angles = list(map(float, angles))
stride = list(map(float, stride))
attrs = {
'anchor_sizes': anchor_sizes,
'aspect_ratios': aspect_ratios,
'angles': angles,
'variances': variance,
'stride': stride,
'offset': offset
}
anchor = helper.create_variable_for_type_inference(dtype)
var = helper.create_variable_for_type_inference(dtype)
helper.append_op(
type="rotated_anchor_generator",
inputs={"Input": input},
outputs={"Anchors": anchor,
"Variances": var},
attrs=attrs, )
anchor.stop_gradient = True
var.stop_gradient = True
return anchor, var
def rrpn_box_coder(prior_box, prior_box_var, target_box, name=None):
"""
Args:
prior_box(Variable): Box list prior_box is a 2-D Tensor with shape
[M, 5] holds M boxes and data type is float32 or float64. Each box
is represented as [x, y, w, h, angle], [x, y] is the
center coordinate of the anchor box, [w, h] is the width and height
of the anchor box, angle is rotated angle of prior_box.
prior_box_var(List|Variable|None): "prior_box_var is a 2-D Tensor with
shape [M, 5] holds M group of variance."
target_box(Variable): This input can be a 2-D LoDTensor with shape
[M, 5]. Each box is represented as [x, y, w, h, angle]. The data
type is float32 or float64.
name(str): Name of this layer. None by default.
Returns:
Variable:
output_box(Variable): The output tensor of rrpn_box_coder_op with shape [N, 5] representing the
result of N target boxes encoded with N Prior boxes and variances.
N represents the number of box and 5 represents [x, y, w, h ,angle].
Examples:
.. code-block:: python
import paddle.fluid as fluid
prior_box_decode = fluid.data(name='prior_box_decode',
shape=[512, 5],
dtype='float32')
target_box_decode = fluid.data(name='target_box_decode',
shape=[512, 5],
dtype='float32')
output_decode = rrpn_box_coder(prior_box=prior_box_decode,
prior_box_var=[10, 10, 5, 5, 1],
target_box=target_box_decode)
"""
helper = LayerHelper("rrpn_box_coder", **locals())
if name is None:
output_box = helper.create_variable_for_type_inference(
dtype=prior_box.dtype)
else:
output_box = helper.create_variable(
name=name, dtype=prior_box.dtype, persistable=False)
inputs = {"PriorBox": prior_box, "TargetBox": target_box}
attrs = {}
if isinstance(prior_box_var, Variable):
inputs['PriorBoxVar'] = prior_box_var
elif isinstance(prior_box_var, list):
attrs['variance'] = prior_box_var
else:
raise TypeError(
"Input variance of rrpn_box_coder must be Variable or list")
helper.append_op(
type="rrpn_box_coder",
inputs=inputs,
attrs=attrs,
outputs={"OutputBox": output_box})
return output_box
def rotated_roi_align(input,
rois,
pooled_height=1,
pooled_width=1,
spatial_scale=1.0,
name=None):
"""
**RotatedRoIAlign Operator**
Rotated Region of interest align (also known as Rotated RoI align) is to perform
bilinear interpolation on inputs of nonuniform sizes to obtain
fixed-size feature maps (e.g. 7*7)
Dividing each region proposal into equal-sized sections with
the pooled_width and pooled_height. Location remains the origin
result.
Each ROI bin are transformed to become horizontal by perspective transformation and
values in each ROI bin are computed directly through bilinear interpolation. The output is
the mean of all values.
Thus avoid the misaligned problem.
"""
helper = LayerHelper('rrpn_rotated_roi_align', **locals())
dtype = helper.input_dtype()
align_out = helper.create_variable_for_type_inference(dtype)
cx = helper.create_variable_for_type_inference('float32')
cy = helper.create_variable_for_type_inference('float32')
helper.append_op(
type="rrpn_rotated_roi_align",
inputs={"X": input,
"ROIs": rois},
outputs={"Out": align_out,
"ConIdX": cx,
"ConIdY": cy},
attrs={
"pooled_height": pooled_height,
"pooled_width": pooled_width,
"spatial_scale": spatial_scale,
})
return align_out
def rotated_generate_proposal_labels(rpn_rois,
gt_classes,
is_crowd,
gt_boxes,
im_info,
batch_size_per_im=256,
fg_fraction=0.25,
fg_thresh=0.25,
bg_thresh_hi=0.5,
bg_thresh_lo=0.0,
bbox_reg_weights=[0.1, 0.1, 0.2, 0.2],
class_nums=None,
use_random=True,
is_cls_agnostic=False):
"""
**Rotated Generate Proposal Labels**
    Given the bounding boxes produced by RotatedGenerateProposalOp and the groundtruth,
    this operator samples foreground and background boxes and computes the loss targets.
    RpnRois are the output boxes of the RPN, processed by rotated_generate_proposal_op; these boxes
    are combined with the groundtruth boxes and sampled according to batch_size_per_im and fg_fraction.
    A box whose overlap with the groundtruth is greater than fg_thresh is considered a foreground sample.
    A box whose overlap with the groundtruth is greater than bg_thresh_lo and lower than bg_thresh_hi
    is considered a background sample.
After all foreground and background boxes are chosen (so called Rois),
then we apply random sampling to make sure
the number of foreground boxes is no more than batch_size_per_im * fg_fraction.
For each box in Rois, we assign the classification (class label) and regression targets (box label) to it.
Finally BboxInsideWeights and BboxOutsideWeights are used to specify whether it would contribute to training loss.
Args:
rpn_rois(Variable): A 2-D LoDTensor with shape [N, 5]. N is the number of the RotatedGenerateProposalOp's output, each element is a bounding box with [x, y, w, h, angle] format. The data type can be float32 or float64.
gt_classes(Variable): A 2-D LoDTensor with shape [M, 1]. M is the number of groundtruth, each element is a class label of groundtruth. The data type must be int32.
is_crowd(Variable): A 2-D LoDTensor with shape [M, 1]. M is the number of groundtruth, each element is a flag indicates whether a groundtruth is crowd. The data type must be int32.
gt_boxes(Variable): A 2-D LoDTensor with shape [M, 5]. M is the number of groundtruth, each element is a bounding box with [x, y, w, h, angle] format.
im_info(Variable): A 2-D LoDTensor with shape [B, 3]. B is the number of input images, each element consists of im_height, im_width, im_scale.
batch_size_per_im(int): Batch size of rois per images. The data type must be int32.
fg_fraction(float): Foreground fraction in total batch_size_per_im. The data type must be float32.
fg_thresh(float): Overlap threshold which is used to chose foreground sample. The data type must be float32.
bg_thresh_hi(float): Overlap threshold upper bound which is used to chose background sample. The data type must be float32.
bg_thresh_lo(float): Overlap threshold lower bound which is used to chose background sample. The data type must be float32.
bbox_reg_weights(list|tuple): Box regression weights. The data type must be float32.
class_nums(int): Class number. The data type must be int32.
use_random(bool): Use random sampling to choose foreground and background boxes.
is_cls_agnostic(bool): bbox regression use class agnostic simply which only represent fg and bg boxes.
Returns:
tuple:
A tuple with format``(rois, labels_int32, bbox_targets, bbox_inside_weights, bbox_outside_weights)``.
- **rois**: 2-D LoDTensor with shape ``[batch_size_per_im * batch_size, 5]``. The data type is the same as ``rpn_rois``.
- **labels_int32**: 2-D LoDTensor with shape ``[batch_size_per_im * batch_size, 1]``. The data type must be int32.
- **bbox_targets**: 2-D LoDTensor with shape ``[batch_size_per_im * batch_size, 5 * class_num]``. The regression targets of all RoIs. The data type is the same as ``rpn_rois``.
- **bbox_inside_weights**: 2-D LoDTensor with shape ``[batch_size_per_im * batch_size, 5 * class_num]``. The weights of foreground boxes' regression loss. The data type is the same as ``rpn_rois``.
- **bbox_outside_weights**: 2-D LoDTensor with shape ``[batch_size_per_im * batch_size, 5 * class_num]``. The weights of regression loss. The data type is the same as ``rpn_rois``.
Examples:
.. code-block:: python
import paddle.fluid as fluid
rpn_rois = fluid.data(name='rpn_rois', shape=[None, 5], dtype='float32')
gt_classes = fluid.data(name='gt_classes', shape=[None, 1], dtype='float32')
is_crowd = fluid.data(name='is_crowd', shape=[None, 1], dtype='float32')
gt_boxes = fluid.data(name='gt_boxes', shape=[None, 5], dtype='float32')
im_info = fluid.data(name='im_info', shape=[None, 3], dtype='float32')
rois, labels, bbox, inside_weights, outside_weights = rotated_generate_proposal_labels(
rpn_rois, gt_classes, is_crowd, gt_boxes, im_info,
class_nums=10)
"""
helper = LayerHelper('rrpn_generate_proposal_labels', **locals())
rois = helper.create_variable_for_type_inference(dtype=rpn_rois.dtype)
labels_int32 = helper.create_variable_for_type_inference(
dtype=gt_classes.dtype)
bbox_targets = helper.create_variable_for_type_inference(
dtype=rpn_rois.dtype)
bbox_inside_weights = helper.create_variable_for_type_inference(
dtype=rpn_rois.dtype)
bbox_outside_weights = helper.create_variable_for_type_inference(
dtype=rpn_rois.dtype)
helper.append_op(
type="rrpn_generate_proposal_labels",
inputs={
'RpnRois': rpn_rois,
'GtClasses': gt_classes,
'IsCrowd': is_crowd,
'GtBoxes': gt_boxes,
'ImInfo': im_info
},
outputs={
'Rois': rois,
'LabelsInt32': labels_int32,
'BboxTargets': bbox_targets,
'BboxInsideWeights': bbox_inside_weights,
'BboxOutsideWeights': bbox_outside_weights
},
attrs={
'batch_size_per_im': batch_size_per_im,
'fg_fraction': fg_fraction,
'fg_thresh': fg_thresh,
'bg_thresh_hi': bg_thresh_hi,
'bg_thresh_lo': bg_thresh_lo,
'bbox_reg_weights': bbox_reg_weights,
'class_nums': class_nums,
'use_random': use_random,
'is_cls_agnostic': is_cls_agnostic
})
rois.stop_gradient = True
labels_int32.stop_gradient = True
bbox_targets.stop_gradient = True
bbox_inside_weights.stop_gradient = True
bbox_outside_weights.stop_gradient = True
return rois, labels_int32, bbox_targets, bbox_inside_weights, bbox_outside_weights
def rotated_generate_proposals(scores,
bbox_deltas,
im_info,
anchors,
variances,
pre_nms_top_n=6000,
post_nms_top_n=1000,
nms_thresh=0.5,
min_size=0.1,
name=None):
"""
**Rotated Generate proposal**
This operation proposes rotated RoIs according to each box's probability of being
a foreground object, where the boxes are calculated from anchors. bbox_deltas and
scores are the outputs of the RPN. The final proposals can be used to train the
detection net. To generate proposals, this operation performs the following steps:
1. Transpose and reshape scores and bbox_deltas to shapes
(H*W*A, 1) and (H*W*A, 5)
2. Calculate box locations as proposal candidates.
3. Remove predicted boxes with small area.
4. Apply NMS to get the final proposals as output.
Args:
scores(Variable): A 4-D Tensor with shape [N, A, H, W] represents
the probability for each box to be an object.
N is batch size, A is number of anchors, H and W are height and
width of the feature map. The data type must be float32.
bbox_deltas(Variable): A 4-D Tensor with shape [N, 5*A, H, W]
represents the difference between the predicted box location and the
anchor location. The data type must be float32.
im_info(Variable): A 2-D Tensor with shape [N, 3] represents origin
image information for N batch. Info contains height, width and scale
between origin image size and the size of feature map.
The data type must be float32.
anchors(Variable): A 4-D Tensor represents the anchors with a layout
of [H, W, A, 5]. H and W are height and width of the feature map,
A is the box count of each position. Each anchor is
in (x, y, w, h, angle) format. The data type must be float32.
variances(Variable): A 4-D Tensor. The expanded variances of anchors with a layout of
[H, W, num_priors, 5]. Each variance is in
(xcenter, ycenter, w, h, angle) format. The data type must be float32.
pre_nms_top_n(float): Number of total bboxes to be kept per
image before NMS. The data type must be float32. `6000` by default.
post_nms_top_n(float): Number of total bboxes to be kept per
image after NMS. The data type must be float32. `1000` by default.
nms_thresh(float): Threshold in NMS. The data type must be float32. `0.5` by default.
min_size(float): Remove predicted boxes with either height or
width < min_size. The data type must be float32. `0.1` by default.
Returns:
tuple:
A tuple with format ``(rpn_rois, rpn_roi_probs)``.
- **rpn_rois**: The generated RoIs. 2-D Tensor with shape ``[N, 5]`` where ``N`` is the number of RoIs. The data type is the same as ``scores``.
- **rpn_roi_probs**: The scores of the generated RoIs. 2-D Tensor with shape ``[N, 1]`` where ``N`` is the number of RoIs. The data type is the same as ``scores``.
Examples:
.. code-block:: python
import paddle.fluid as fluid
scores = fluid.data(name='scores', shape=[None, 4, 5, 5], dtype='float32')
bbox_deltas = fluid.data(name='bbox_deltas', shape=[None, 20, 5, 5], dtype='float32')
im_info = fluid.data(name='im_info', shape=[None, 3], dtype='float32')
anchors = fluid.data(name='anchors', shape=[None, 5, 4, 5], dtype='float32')
variances = fluid.data(name='variances', shape=[None, 5, 10, 5], dtype='float32')
rrois, rroi_probs = rotated_generate_proposals(scores, bbox_deltas,
im_info, anchors, variances)
"""
helper = LayerHelper('rrpn_generate_proposals', **locals())
rpn_rois = helper.create_variable_for_type_inference(
dtype=bbox_deltas.dtype)
rpn_roi_probs = helper.create_variable_for_type_inference(
dtype=scores.dtype)
helper.append_op(
type="rrpn_generate_proposals",
inputs={
'Scores': scores,
'BboxDeltas': bbox_deltas,
'ImInfo': im_info,
'Anchors': anchors,
'Variances': variances
},
attrs={
'pre_nms_topN': pre_nms_top_n,
'post_nms_topN': post_nms_top_n,
'nms_thresh': nms_thresh,
'min_size': min_size
},
outputs={'RpnRois': rpn_rois,
'RpnRoiProbs': rpn_roi_probs})
rpn_rois.stop_gradient = True
rpn_roi_probs.stop_gradient = True
return rpn_rois, rpn_roi_probs
# Building the custom OPs
## Code structure
- src: C++/CUDA source of the extension OPs
- rrpn_lib.py: Python wrappers
## Installing PaddlePaddle
Install PaddlePaddle in one of the following ways:
- Build and install from source on the [Paddle develop branch](https://github.com/PaddlePaddle/Paddle/tree/develop); build instructions:
1. [Ubuntu](https://www.paddlepaddle.org.cn/install/doc/source/ubuntu)
1. [CentOS](https://www.paddlepaddle.org.cn/install/doc/source/centos)
1. [macOS](https://www.paddlepaddle.org.cn/install/doc/source/macos)
1. [Windows](https://www.paddlepaddle.org.cn/install/doc/source/windows)
**Note:** building inside docker is recommended.
- Install the Paddle develop [nightly whl package](https://www.paddlepaddle.org.cn/install/doc/tables#多版本whl包列表-dev-11)
**Note:** the gcc version used to build the custom OPs must match the gcc version Paddle was built with. The Paddle develop nightly packages are currently built with **gcc 4.8.2**; if you use a nightly package, build the custom OPs with **gcc 4.8.2** as well, otherwise compatibility problems may occur.
## Building the custom OPs
The C++/CUDA implementation of the custom OPs must be compiled into a shared library. make.sh builds it with g++/nvcc; you can of course write a Makefile or use CMake instead.
The build needs to include PaddlePaddle's header files and link against PaddlePaddle's libraries. The header and library paths can be obtained with:
```
# python
>>> import paddle
>>> print(paddle.sysconfig.get_include())
/paddle/pyenv/local/lib/python2.7/site-packages/paddle/include
>>> print(paddle.sysconfig.get_lib())
/paddle/pyenv/local/lib/python2.7/site-packages/paddle/libs
```
A build script for the shared library is provided:
```
cd src
sh make.sh
```
The build produces `rrpn_lib.so`.
**Note:** if PaddlePaddle was built from source without setting `WITH_MKLDNN` in `cmake`, compiling the custom OPs will fail with errors such as `mkldnn.h` not found; in that case remove the `-DPADDLE_WITH_MKLDNN` option from the compile commands in `make.sh`.
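Once `rrpn_lib.so` is built, it has to be loaded into the Python process before the `rrpn_*` operators can be used. The snippet below is a minimal sketch; it assumes a PaddlePaddle version that exposes `fluid.load_op_library`, and the library path is only an example (the wrappers in `rrpn_lib.py` are expected to handle this step for you):
```
import paddle.fluid as fluid

# Load the compiled custom OP library so that the custom operators
# (e.g. rrpn_generate_proposals) are registered in this process.
# Adjust the path to wherever the build placed rrpn_lib.so.
fluid.load_op_library('ext_op/src/rrpn_lib.so')
```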
## Setting environment variables
Paddle's core libraries must be added to `LD_LIBRARY_PATH`. First run the following to get the library path:
```
import paddle
print(paddle.sysconfig.get_lib())
```
The path can then be added as follows:
```
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`python -c 'import paddle; print(paddle.sysconfig.get_lib())'`
```
For more details on defining C++ OPs outside the framework, see the [official documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_usage/index_cn.html).
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Based on
--------------------------------------------------------
@misc{ma2019rrpn,
author = {Jianqi Ma},
title = {{RRPN in pytorch}},
year = {2019},
howpublished = {\url{https://github.com/mjq11302010044/RRPN_pytorch}},
}
@article{Jianqi17RRPN,
Author = {Jianqi Ma and Weiyuan Shao and Hao Ye and Li Wang and Hong Wang
and Yingbin Zheng and Xiangyang Xue},
Title = {Arbitrary-Oriented Scene Text Detection via Rotation Proposals},
journal = {IEEE Transactions on Multimedia},
volume={20},
number={11},
pages={3111-3122},
year={2018}
}
--------------------------------------------------------
*/
#pragma once
#include <algorithm>
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/tensor.h"
namespace paddle {
namespace operators {
#define PI 3.141592654
struct RangeInitFunctor {
int start;
int delta;
int* out;
HOSTDEVICE void operator()(size_t i) { out[i] = start + i * delta; }
};
// get triangle area after decomposing intersecting polygons into triangles
template <typename T>
inline T trangle_area(T* a, T* b, T* c) {
return ((a[0] - c[0]) * (b[1] - c[1]) - (a[1] - c[1]) * (b[0] - c[0])) / 2.0;
}
// get the area of the intersection polygon
template <typename T>
inline T get_area(T* int_pts, int num_of_inter) {
T area = 0.0;
for (int i = 0; i < num_of_inter - 2; i++) {
area += fabs(
trangle_area<T>(int_pts, int_pts + 2 * i + 2, int_pts + 2 * i + 4));
}
return area;
}
// sort points to decompose intersecting polygons into triangles
template <typename T>
inline void reorder_pts(T* int_pts, int num_of_inter) {
if (num_of_inter > 0) {
T center[2] = {0.0, 0.0};
for (int i = 0; i < num_of_inter; i++) {
center[0] += int_pts[2 * i];
center[1] += int_pts[2 * i + 1];
}
center[0] /= num_of_inter;
center[1] /= num_of_inter;
T vs[16];
T v[2];
T d;
for (int i = 0; i < num_of_inter; i++) {
v[0] = int_pts[2 * i] - center[0];
v[1] = int_pts[2 * i + 1] - center[1];
d = sqrt(v[0] * v[0] + v[1] * v[1]);
v[0] = v[0] / d;
v[1] = v[1] / d;
if (v[1] < 0) {
v[0] = -2 - v[0];
}
vs[i] = v[0];
}
float temp, tx, ty;
int j;
for (int i = 1; i < num_of_inter; ++i) {
if (vs[i - 1] > vs[i]) {
temp = vs[i];
tx = int_pts[2 * i];
ty = int_pts[2 * i + 1];
j = i;
while (j > 0 && vs[j - 1] > temp) {
vs[j] = vs[j - 1];
int_pts[j * 2] = int_pts[j * 2 - 2];
int_pts[j * 2 + 1] = int_pts[j * 2 - 1];
j--;
}
vs[j] = temp;
int_pts[j * 2] = tx;
int_pts[j * 2 + 1] = ty;
}
}
}
}
// determine whether edge i of pts1 intersects edge j of pts2; if so, store the intersection point in temp_pts
template <typename T>
inline bool inter2line(T* pts1, T* pts2, int i, int j, T* temp_pts) {
T a[2] = {pts1[2 * i], pts1[2 * i + 1]};
T b[2] = {pts1[2 * ((i + 1) % 4)], pts1[2 * ((i + 1) % 4) + 1]};
T c[2] = {pts2[2 * j], pts2[2 * j + 1]};
T d[2] = {pts2[2 * ((j + 1) % 4)], pts2[2 * ((j + 1) % 4) + 1]};
T area_abc, area_abd, area_cda, area_cdb;
area_abc = trangle_area<T>(a, b, c);
area_abd = trangle_area<T>(a, b, d);
if (area_abc * area_abd >= -1e-5) {
return false;
}
area_cda = trangle_area<T>(c, d, a);
area_cdb = area_cda + area_abc - area_abd;
if (area_cda * area_cdb >= -1e-5) {
return false;
}
T t = area_cda / (area_abd - area_abc);
T dx = t * (b[0] - a[0]);
T dy = t * (b[1] - a[1]);
temp_pts[0] = a[0] + dx;
temp_pts[1] = a[1] + dy;
return true;
}
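// check whether point (pt_x, pt_y) lies inside the quadrilateral given by pts (x1,y1,...,x4,y4)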
template <typename T>
inline bool inrect(T pt_x, T pt_y, T* pts) {
T ab[2] = {pts[2] - pts[0], pts[3] - pts[1]};
T ad[2] = {pts[6] - pts[0], pts[7] - pts[1]};
T ap[2] = {pt_x - pts[0], pt_y - pts[1]};
T abab = ab[0] * ab[0] + ab[1] * ab[1];
T abap = ab[0] * ap[0] + ab[1] * ap[1];
T adad = ad[0] * ad[0] + ad[1] * ad[1];
T adap = ad[0] * ap[0] + ad[1] * ap[1];
bool result = (abab - abap >= -1) and (abap >= -1) and (adad - adap >= -1) and
(adap >= -1);
return result;
}
// collect the intersection points of two rotated boxes and return their count
template <typename T>
inline int inter_pts(T* pts1, T* pts2, T* int_pts) {
int num_of_inter = 0;
for (int i = 0; i < 4; i++) {
if (inrect<T>(pts1[2 * i], pts1[2 * i + 1], pts2)) {
int_pts[num_of_inter * 2] = pts1[2 * i];
int_pts[num_of_inter * 2 + 1] = pts1[2 * i + 1];
num_of_inter++;
}
if (inrect<T>(pts2[2 * i], pts2[2 * i + 1], pts1)) {
int_pts[num_of_inter * 2] = pts2[2 * i];
int_pts[num_of_inter * 2 + 1] = pts2[2 * i + 1];
num_of_inter++;
}
}
T out_pts[2];
for (int i = 0; i < 4; i++) {
for (int j = 0; j < 4; j++) {
bool has_pts = inter2line<T>(pts1, pts2, i, j, out_pts);
if (has_pts) {
int_pts[num_of_inter * 2] = out_pts[0];
int_pts[num_of_inter * 2 + 1] = out_pts[1];
num_of_inter++;
}
}
}
return num_of_inter;
}
// convert x,y,w,h,angle to x1,y1,x2,y2,x3,y3,x4,y4
template <typename T>
inline void convert_region(T* pts,
const framework::Tensor& _region,
int index) {
auto region = framework::EigenTensor<T, 2>::From(_region);
T angle = region(index, 4);
T a_cos = cos(angle / 180.0 * PI);
T a_sin = -sin(angle / 180.0 * PI); // anti clock-wise
T ctr_x = region(index, 0);
T ctr_y = region(index, 1);
T h = region(index, 3);
T w = region(index, 2);
T pts_x[4] = {-w / 2, -w / 2, w / 2, w / 2};
T pts_y[4] = {-h / 2, h / 2, h / 2, -h / 2};
for (int i = 0; i < 4; i++) {
pts[2 * i] = a_cos * pts_x[i] - a_sin * pts_y[i] + ctr_x;
pts[2 * i + 1] = a_sin * pts_x[i] + a_cos * pts_y[i] + ctr_y;
}
}
// Calculate the area of intersection
template <typename T>
inline float inter(const framework::Tensor& _region1,
const framework::Tensor& _region2,
const int& r,
const int& c) {
T pts1[8];
T pts2[8];
T int_pts[16];
int num_of_inter;
convert_region<T>(pts1, _region1, r);
convert_region<T>(pts2, _region2, c);
num_of_inter = inter_pts<T>(pts1, pts2, int_pts);
reorder_pts<T>(int_pts, num_of_inter);
return get_area<T>(int_pts, num_of_inter);
}
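// rotated IoU of box r in _region1 and box c in _region2; boxes are in (x_ctr, y_ctr, w, h, angle) format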
template <typename T>
inline float devRotateIoU(const framework::Tensor& _region1,
const framework::Tensor& _region2,
const int r,
const int c) {
auto __region1 = framework::EigenTensor<T, 2>::From(_region1);
auto __region2 = framework::EigenTensor<T, 2>::From(_region2);
if ((fabs(__region1(r, 0) - __region2(c, 0)) < 1e-5) &&
(fabs(__region1(r, 1) - __region2(c, 1)) < 1e-5) &&
(fabs(__region1(r, 2) - __region2(c, 2)) < 1e-5) &&
(fabs(__region1(r, 3) - __region2(c, 3)) < 1e-5) &&
(fabs(__region1(r, 4) - __region2(c, 4)) < 1e-5)) {
return 1.0;
}
T area1, area2, area_inter;
area1 = __region1(r, 2) * __region1(r, 3);
area2 = __region2(c, 2) * __region2(c, 3);
area_inter = inter<T>(_region1, _region2, r, c);
auto result = area_inter / (area1 + area2 - area_inter);
if (result < 0) {
result = 0.0;
}
// may have bugs which cause overlap > 1
if (result > 1.00000001) {
result = 0.0;
}
return result;
}
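// compute regression targets (dx, dy, dw, dh, dangle) from proposal boxes to matched ground-truth boxes, optionally scaled by weights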
template <typename T>
inline void BoxToDelta2(const int box_num,
const framework::Tensor& ex_boxes,
const framework::Tensor& gt_boxes,
const float* weights,
framework::Tensor* box_delta) {
auto ex_boxes_et = framework::EigenTensor<T, 2>::From(ex_boxes);
auto gt_boxes_et = framework::EigenTensor<T, 2>::From(gt_boxes);
auto trg = framework::EigenTensor<T, 2>::From(*box_delta);
T ex_w, ex_h, ex_ctr_x, ex_ctr_y, ex_angle, gt_w, gt_h, gt_ctr_x, gt_ctr_y,
gt_angle;
for (int64_t i = 0; i < box_num; ++i) {
ex_w = ex_boxes_et(i, 2);
ex_h = ex_boxes_et(i, 3);
ex_ctr_x = ex_boxes_et(i, 0);
ex_ctr_y = ex_boxes_et(i, 1);
ex_angle = ex_boxes_et(i, 4);
gt_w = gt_boxes_et(i, 2);
gt_h = gt_boxes_et(i, 3);
gt_ctr_x = gt_boxes_et(i, 0);
gt_ctr_y = gt_boxes_et(i, 1);
gt_angle = gt_boxes_et(i, 4);
trg(i, 0) = (gt_ctr_x - ex_ctr_x) / ex_w;
trg(i, 1) = (gt_ctr_y - ex_ctr_y) / ex_h;
trg(i, 2) = std::log(gt_w / ex_w);
trg(i, 3) = std::log(gt_h / ex_h);
trg(i, 4) = gt_angle - ex_angle;
if (weights) {
trg(i, 0) = trg(i, 0) * weights[0];
trg(i, 1) = trg(i, 1) * weights[1];
trg(i, 2) = trg(i, 2) * weights[2];
trg(i, 3) = trg(i, 3) * weights[3];
trg(i, 4) = trg(i, 4) * weights[4];
}
if (gt_angle <= -30 && ex_angle >= 120) {
trg(i, 4) = trg(i, 4) + 180.0;
}
if (gt_angle >= 120 && ex_angle <= -30) {
trg(i, 4) = trg(i, 4) - 180.0;
}
trg(i, 4) = (PI / 180) * trg(i, 4);
}
}
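// gather: copy the rows of `in` selected by `index` into `out`; each row holds `in_stride` elements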
template <typename T>
void Gather(
const T* in, const int in_stride, const int* index, const int num, T* out) {
const int stride_bytes = in_stride * sizeof(T);
for (int i = 0; i < num; ++i) {
int id = index[i];
memcpy(out + i * in_stride, in + id * in_stride, stride_bytes);
}
}
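// fill `overlaps` with the pairwise rotated IoU between every box in r_boxes and every box in c_boxes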
template <typename T>
void BboxOverlaps2(const framework::Tensor& r_boxes,
const framework::Tensor& c_boxes,
framework::Tensor* overlaps) {
auto overlaps_et = framework::EigenTensor<T, 2>::From(*overlaps);
int r_num = r_boxes.dims()[0];
int c_num = c_boxes.dims()[0];
for (int i = 0; i < r_num; ++i) {
for (int j = 0; j < c_num; ++j) {
overlaps_et(i, j) = devRotateIoU<T>(r_boxes, c_boxes, i, j);
}
}
}
} // namespace operators
} // namespace paddle
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/tensor.h"
#ifdef PADDLE_WITH_MKLML
#include "paddle/fluid/platform/dynload/mklml.h"
#endif
#ifdef PADDLE_WITH_LIBXSMM
#include <libxsmm.h>
#endif
#ifdef PADDLE_USE_OPENBLAS
#include <cblas.h>
#endif
namespace paddle {
namespace operators {
namespace math {
/**
* Matrix Descriptor of a memory buffer.
*
* It is used for Blas::MatMul. The MatMul operator can be batched.
* If Mat A is [BatchSize, H, W] and Mat B is [BatchSize, H, W], it will be a
* `batch_size` times of GEMM. The batched GEMM could be faster based on the
* implementation of the blas library. The batch size could be zero. If any
* matrix of `matmul` has a batch size, there will be a batched GEMM, too. e.g.,
* if Mat A is [BatchSize, H1, W1] and Mat B is [W1, W2], the result matrix will
* be [BatchSize, H1, W2].
*
* The boolean flag, `trans`, describes whether the memory holds the transpose of
* the matrix. If trans is true, the last two dims of the matrix are transposed.
* The memory layout of the matrix is [Width, Height] or [BatchSize, Width, Height].
*
* The MatDescriptor is not only the dimension or shape of a matrix, it also
* contains the layout, stride of matrix. It is clearer to have a structure than
* reuse `DDim`.
*/
struct MatDescriptor {
int64_t height_;
int64_t width_;
int64_t stride_{0};
int64_t batch_size_{0};
bool trans_;
};
/**
* Create Matrix Descriptor from a tensor dim, num_flatten_cols, and transpose
* flag
*
* @param tensor_dim: The dimension of the tensor. The rank of this dimension
* must be larger than 1.
*
* @param num_flatten_cols: Reshape a tensor to a matrix. The matrix's first
* dimension (column length) will be the product of the tensor's first
* `num_flatten_cols` dimensions. If num_flatten_cols is zero, the first N-2
* dimensions will be the batch_size of the descriptor.
*
* @param trans: True if the matrix is transposed.
*/
extern MatDescriptor CreateMatrixDescriptor(const framework::DDim& tensor_dim,
int num_flatten_cols,
bool trans);
template <typename DeviceContext>
class Blas {
public:
explicit Blas(const DeviceContext& context) : context_(context) {}
template <typename T>
void GEMM(CBLAS_TRANSPOSE transA,
CBLAS_TRANSPOSE transB,
int M,
int N,
int K,
T alpha,
const T* A,
const T* B,
T beta,
T* C) const;
template <typename T>
void GEMM(bool transA,
bool transB,
int M,
int N,
int K,
T alpha,
const T* A,
int lda,
const T* B,
int ldb,
T beta,
T* C,
int ldc) const;
template <typename T>
void GEMM(CBLAS_TRANSPOSE transA,
CBLAS_TRANSPOSE transB,
int M,
int N,
int K,
T alpha,
const T* A,
int lda,
const T* B,
int ldb,
T beta,
T* C,
int ldc) const;
#ifdef PADDLE_WITH_MKLML
template <typename T>
T* GEMM_ALLOC(const CBLAS_IDENTIFIER id,
const int M,
const int N,
const int K) const;
template <typename T>
void GEMM_PACK(const CBLAS_IDENTIFIER id,
const CBLAS_TRANSPOSE trans,
int M,
int N,
int K,
const T alpha,
const T* src,
const int ld,
T* dst) const;
template <typename T>
void GEMM_COMPUTE(int transA,
int transB,
int M,
int N,
int K,
const T* A,
const int lda,
const T* B,
const int ldb,
T beta,
T* C,
const int ldc) const;
template <typename T>
void GEMM_FREE(T* data) const;
template <typename T>
void CSRMM(const char* transa,
const int* m,
const int* n,
const int* k,
const T* alpha,
const char* matdescra,
const T* val,
const int* indx,
const int* pntrb,
const int* pntre,
const T* b,
const int* ldb,
const T* beta,
T* c,
const int* ldc) const;
#if !defined(PADDLE_WITH_CUDA)
template <typename T>
void MatMulWithHead(const framework::Tensor& mat_a,
const MatDescriptor& dim_a,
const framework::Tensor& mat_b,
const MatDescriptor& dim_b,
T alpha,
int head_number,
framework::Tensor* mat_out,
T beta,
bool mat_y_split_vertical) const;
#endif
#endif
template <typename T>
void MatMul(const int M,
const int N,
const int K,
const T* A,
const T* B,
T* C) const;
template <typename T>
void MatMul(const framework::Tensor& mat_a,
bool trans_a,
const framework::Tensor& mat_b,
bool trans_b,
T alpha,
framework::Tensor* mat_out,
T beta) const;
template <typename T>
void MatMul(const framework::Tensor& mat_a,
bool trans_a,
const framework::Tensor& mat_b,
bool trans_b,
framework::Tensor* mat_out) const {
MatMul(mat_a,
trans_a,
mat_b,
trans_b,
static_cast<T>(1.0),
mat_out,
static_cast<T>(0.0));
}
template <typename T>
void MatMul(const framework::Tensor& mat_a,
const framework::Tensor& mat_b,
framework::Tensor* mat_out) const {
this->template MatMul<T>(mat_a, false, mat_b, false, mat_out);
}
template <typename T>
void AXPY(int n, T alpha, const T* x, T* y) const;
template <typename T>
void VADD(int n, const T* x, const T* y, T* z) const;
template <typename T>
void VSUB(int n, const T* x, const T* y, T* z) const;
template <typename T>
void VMUL(int n, const T* x, const T* y, T* z) const;
template <typename T>
void VDIV(int n, const T* x, const T* y, T* z) const;
template <typename T>
void VCOPY(int n, const T* x, T* y) const;
template <typename T>
void VEXP(int n, const T* x, T* y) const;
template <typename T>
void VSQUARE(int n, const T* x, T* y) const;
template <typename T>
void VPOW(int n, const T* x, T alpha, T* y) const;
template <typename T>
void GEMV(bool trans_a,
int M,
int N,
T alpha,
const T* A,
const T* B,
T beta,
T* C) const;
template <typename T>
T DOT(int n, const T* x, const T* y) const;
template <typename T>
void SCAL(int n, const T a, T* x) const;
template <typename T>
T ASUM(int n, T* x, int inc) const;
template <typename T>
void BatchedGEMM(CBLAS_TRANSPOSE transA,
CBLAS_TRANSPOSE transB,
int M,
int N,
int K,
T alpha,
const T* A,
const T* B,
T beta,
T* C,
int batchCount,
int64_t strideA,
int64_t strideB) const;
#if defined(PADDLE_WITH_MKLML) && !defined(PADDLE_WITH_CUDA)
template <typename T>
void BatchedGEMMWithHead(CBLAS_TRANSPOSE transA,
CBLAS_TRANSPOSE transB,
int W1,
int H1,
int W2,
int H2,
T alpha,
const T* A,
const T* B,
T beta,
T* C,
int batchCount,
int64_t strideA,
int64_t strideB,
int64_t head_number,
bool split_b_vertical) const;
#endif
template <typename T>
void MatMul(const framework::Tensor& mat_a,
const MatDescriptor& dim_a,
const framework::Tensor& mat_b,
const MatDescriptor& dim_b,
T alpha,
framework::Tensor* mat_out,
T beta) const;
template <typename T>
void VINV(int n, const T* a, T* y) const;
template <typename T>
void VMERF(int n, const T* a, T* y, int64_t mode) const;
private:
const DeviceContext& context_;
};
template <typename DeviceContext, typename T>
class BlasT : private Blas<DeviceContext> {
public:
using Blas<DeviceContext>::Blas;
template <typename... ARGS>
void GEMM(ARGS... args) const {
Base()->template GEMM<T>(args...);
}
#ifdef PADDLE_WITH_MKLML
template <typename... ARGS>
T* GEMM_ALLOC(ARGS... args) const {
return Base()->template GEMM_ALLOC<T>(args...);
}
template <typename... ARGS>
void GEMM_PACK(ARGS... args) const {
Base()->template GEMM_PACK<T>(args...);
}
template <typename... ARGS>
void GEMM_COMPUTE(ARGS... args) const {
Base()->template GEMM_COMPUTE<T>(args...);
}
template <typename... ARGS>
void GEMM_FREE(ARGS... args) const {
Base()->template GEMM_FREE<T>(args...);
}
template <typename... ARGS>
void CSRMM(ARGS... args) const {
Base()->template CSRMM<T>(args...);
}
#if !defined(PADDLE_WITH_CUDA)
template <typename... ARGS>
void MatMulWithHead(ARGS... args) const {
Base()->template MatMulWithHead<T>(args...);
}
#endif
#endif
template <typename... ARGS>
void MatMul(ARGS... args) const {
Base()->template MatMul<T>(args...);
}
template <typename... ARGS>
void AXPY(ARGS... args) const {
Base()->template AXPY<T>(args...);
}
template <typename... ARGS>
void VADD(ARGS... args) const {
Base()->template VADD<T>(args...);
}
template <typename... ARGS>
void VSUB(ARGS... args) const {
Base()->template VSUB<T>(args...);
}
template <typename... ARGS>
void VMUL(ARGS... args) const {
Base()->template VMUL<T>(args...);
}
template <typename... ARGS>
void VDIV(ARGS... args) const {
Base()->template VDIV<T>(args...);
}
template <typename... ARGS>
void VCOPY(ARGS... args) const {
Base()->template VCOPY<T>(args...);
}
template <typename... ARGS>
void VEXP(ARGS... args) const {
Base()->template VEXP<T>(args...);
}
template <typename... ARGS>
void VSQUARE(ARGS... args) const {
Base()->template VSQUARE<T>(args...);
}
template <typename... ARGS>
void VPOW(ARGS... args) const {
Base()->template VPOW<T>(args...);
}
template <typename... ARGS>
void GEMV(ARGS... args) const {
Base()->template GEMV<T>(args...);
}
template <typename... ARGS>
T DOT(ARGS... args) const {
return Base()->template DOT<T>(args...);
}
template <typename... ARGS>
void SCAL(ARGS... args) const {
Base()->template SCAL<T>(args...);
}
template <typename... ARGS>
T ASUM(ARGS... args) const {
return Base()->template ASUM<T>(args...);
}
template <typename... ARGS>
void BatchedGEMM(ARGS... args) const {
Base()->template BatchedGEMM<T>(args...);
}
template <typename... ARGS>
void VINV(ARGS... args) const {
Base()->template VINV<T>(args...);
}
template <typename... ARGS>
void VMERF(ARGS... args) const {
Base()->template VMERF<T>(args...);
}
private:
const Blas<DeviceContext>* Base() const {
return static_cast<const Blas<DeviceContext>*>(this);
}
};
template <typename DeviceContext, typename T>
inline BlasT<DeviceContext, T> GetBlas(
const framework::ExecutionContext& exe_ctx) {
return BlasT<DeviceContext, T>(
exe_ctx.template device_context<DeviceContext>());
}
template <typename DeviceContext, typename T>
inline BlasT<DeviceContext, T> GetBlas(const DeviceContext& dev_ctx) {
return BlasT<DeviceContext, T>(dev_ctx);
}
} // namespace math
} // namespace operators
} // namespace paddle
#include "paddle/fluid/operators/math/blas_impl.h"
#ifdef PADDLE_WITH_CUDA
#include "paddle/fluid/operators/math/blas_impl.cu.h"
#endif
/* Copyright (c) 2019 paddlepaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "concat_and_split.h"
#include <vector>
namespace paddle {
namespace operators {
namespace math {
/*
* All tensors' dimension should be the same and the values of
* each dimension must be the same, except the axis dimension.
*/
template <typename T>
class ConcatFunctor<platform::CPUDeviceContext, T> {
public:
void operator()(const platform::CPUDeviceContext& context,
const std::vector<framework::Tensor>& input,
int axis,
framework::Tensor* output) {
// TODO(zcd): Add input data validity checking
int num = input.size();
int rows = 1;
auto dim_0 = input[0].dims();
for (int i = 0; i < axis; ++i) {
rows *= dim_0[i];
}
int out_rows = rows, out_cols = 0;
std::vector<int64_t> input_cols(input.size());
for (int i = 0; i < num; ++i) {
int t_cols = input[i].numel() / rows;
out_cols += t_cols;
input_cols[i] = t_cols;
}
auto cpu_place = boost::get<platform::CPUPlace>(context.GetPlace());
// computation
auto output_data = output->data<T>();
int col_idx = 0;
for (int j = 0; j < num; ++j) {
int col_len = input_cols[j];
auto input_data = input[j].data<T>();
for (int k = 0; k < out_rows; ++k) {
memory::Copy(cpu_place,
output_data + k * out_cols + col_idx,
cpu_place,
input_data + k * col_len,
sizeof(T) * col_len);
}
col_idx += col_len;
}
}
};
#define DEFINE_FUNCTOR(type) \
template class ConcatFunctor<platform::CPUDeviceContext, type>;
FOR_ALL_TYPES(DEFINE_FUNCTOR);
} // namespace math
} // namespace operators
} // namespace paddle
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <vector>
#include "paddle/fluid/framework/data_type.h"
#include "paddle/fluid/framework/lod_tensor.h"
namespace paddle {
namespace operators {
namespace math {
/*
* \brief Concatenate the input tensors along the dimension axis.
* TODO(zcd): maybe it needs to be more detailed.
* Examples:
* Input[0] = [[1,2],[3,4]]
* Input[1] = [[5,6]]
* axis = 0
*
* Output = [[1,2],
* [3,4],
* [5,6]]
*/
template <typename DeviceContext, typename T>
class ConcatFunctor {
public:
void operator()(const DeviceContext& context,
const std::vector<framework::Tensor>& input,
int axis,
framework::Tensor* output);
};
} // namespace math
} // namespace operators
} // namespace paddle
#define FOR_ALL_TYPES(macro) \
macro(int); \
macro(float); \
macro(double); \
macro(bool); \
macro(int64_t); \
macro(int16_t); \
macro(uint8_t); \
macro(int8_t); \
macro(::paddle::platform::float16)
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <vector>
#include "paddle/fluid/framework/dim.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/tensor.h"
#include "paddle/fluid/memory/malloc.h"
#include "paddle/fluid/platform/cuda_primitives.h"
#include "paddle/fluid/platform/place.h"
namespace paddle {
namespace operators {
using framework::Tensor;
using platform::DeviceContext;
#define CUDA_1D_KERNEL_LOOP(i, n) \
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); \
i += blockDim.x * gridDim.x)
template <typename T, typename IndexT = int>
__global__ void GatherCUDAKernel(const T* params,
const IndexT* indices,
T* output,
size_t index_size,
size_t slice_size) {
CUDA_1D_KERNEL_LOOP(i, index_size * slice_size) {
int indices_i = i / slice_size;
int slice_i = i - indices_i * slice_size; // offset inside the slice
IndexT gather_i = indices[indices_i];
IndexT params_i = gather_i * slice_size + slice_i;
*(output + i) = *(params + params_i);
}
}
template <typename T, typename IndexT = int>
__global__ void GatherNdCUDAKernel(const T* input,
const int* input_dims,
const IndexT* indices,
T* output,
size_t remain_size,
size_t slice_size,
size_t end_size) {
CUDA_1D_KERNEL_LOOP(i, remain_size * slice_size) {
int indices_i = i / slice_size;
int slice_i = i - indices_i * slice_size; // offset inside the slice
IndexT gather_i = 0;
int64_t temp = slice_size;
for (int64_t j = end_size - 1; j >= 0; --j) {
auto index_value = indices[indices_i * end_size + j];
assert(index_value >= 0 && index_value < input_dims[j]);
gather_i += (index_value * temp);
temp *= input_dims[j];
}
IndexT input_i = gather_i + slice_i;
*(output + i) = *(input + input_i);
}
}
/**
* A thin wrapper on gpu tensor
* Return a new tensor from source tensor, gathered according to index
* input[src]: type-T source Tensor
* input[index]: type-IndexT index Tensor (1-D)
* return: output tensor
*/
template <typename T, typename IndexT = int>
void GPUGather(const platform::DeviceContext& ctx,
const Tensor& src,
const Tensor& index,
Tensor* output) {
// check index of shape 1-D
if (index.dims().size() == 1) {
PADDLE_ENFORCE_GT(index.dims()[0],
0,
"The index of gather_op should not be empty when the "
"index's rank is 1.");
} else if (index.dims().size() == 2) {
PADDLE_ENFORCE_EQ(index.dims()[1],
1,
" If the index's rank of gather_op is 2, the second "
"dimension should be 1.");
}
int index_size = index.dims()[0];
auto src_dims = src.dims();
framework::DDim output_dims(src_dims);
output_dims[0] = index_size;
// slice size
int slice_size = 1;
for (int i = 1; i < src_dims.size(); ++i) slice_size *= src_dims[i];
const T* p_src = src.data<T>();
const IndexT* p_index = index.data<IndexT>();
T* p_output = output->data<T>();
int block = 512;
int n = slice_size * index_size;
int grid = (n + block - 1) / block;
GatherCUDAKernel<T, IndexT><<<
grid,
block,
0,
reinterpret_cast<const platform::CUDADeviceContext&>(ctx).stream()>>>(
p_src, p_index, p_output, index_size, slice_size);
}
} // namespace operators
} // namespace paddle
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <memory.h>
#include <cstring>
#include "paddle/fluid/framework/ddim.h"
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/tensor.h"
#include "paddle/fluid/platform/place.h"
namespace paddle {
namespace operators {
using framework::Tensor;
/**
* A thin wrapper for gathering on cpu tensor
* Return a new tensor from source tensor, gathered according to index
* input[src]: type-T source Tensor
* input[index]: type-IndexT index Tensor (1-D)
* return: output tensor
*/
template <typename T, typename IndexT = int>
void CPUGather(const platform::DeviceContext& ctx,
const Tensor& src,
const Tensor& index,
Tensor* output) {
PADDLE_ENFORCE_EQ(platform::is_cpu_place(ctx.GetPlace()), true);
// check index of shape 1-D
if (index.dims().size() == 2) {
PADDLE_ENFORCE_EQ(index.dims()[1],
1,
"index.dims()[1] should be 1 when index.dims().size() == "
"2 in gather_op.");
} else {
PADDLE_ENFORCE_EQ(index.dims().size(),
1,
"index.dims().size() should be 1 or 2 in gather_op.");
}
int64_t index_size = index.dims()[0];
auto src_dims = src.dims();
const T* p_src = src.data<T>();
const IndexT* p_index = index.data<IndexT>();
T* p_output = output->data<T>();
// slice size
int slice_size = 1;
for (int i = 1; i < src_dims.size(); ++i) slice_size *= src_dims[i];
const size_t slice_bytes = slice_size * sizeof(T);
for (int64_t i = 0; i < index_size; ++i) {
IndexT index_ = p_index[i];
memcpy(p_output + i * slice_size, p_src + index_ * slice_size, slice_bytes);
}
}
} // namespace operators
} // namespace paddle
include_dir=$( python -c 'import paddle; print(paddle.sysconfig.get_include())' )
lib_dir=$( python -c 'import paddle; print(paddle.sysconfig.get_lib())' )
echo $include_dir
echo $lib_dir
CUDA=$1
CUDNN=$2
NCCL=$3
if [ ! -d "$CUDA" ]; then
echo "Usage: sh make.sh \$CUDA_PATH \$CUDNN_PATH \$NCCL_PATH"
exit
fi
if [ ! -d "$CUDNN" ]; then
echo "Usage: sh make.sh \${CUDA_PATH} \${CUDNN_PATH} \${NCCL_PATH}"
exit
fi
if [ ! -d "$NCCL" ]; then
echo "Usage: sh make.sh \${CUDA_PATH} \${CUDNN_PATH} \${NCCL_PATH}"
exit
fi
git clone https://github.com/NVlabs/cub.git
nvcc rrpn_generate_proposals_op.cu -c -o rrpn_generate_proposals_op.cu.o -ccbin cc -DPADDLE_WITH_MKLDNN -DPADDLE_WITH_CUDA -DEIGEN_USE_GPU -DPADDLE_USE_DSO -Xcompiler -fPIC -std=c++11 -Xcompiler -fPIC -w --expt-relaxed-constexpr -O3 -DNVCC \
-I ${include_dir} \
-I ${include_dir}/third_party \
-I ${CUDA}/include \
-I ${CUDNN}/include \
-I ${NCCL}/include \
-L ${lib_dir} -lpaddle_framework \
-L ${CUDA}/lib64 -lcudart
nvcc rotated_anchor_generator_op.cu -c -o rotated_anchor_generator_op.cu.o -ccbin cc -DPADDLE_WITH_MKLDNN -DPADDLE_WITH_CUDA -DEIGEN_USE_GPU -DPADDLE_USE_DSO -Xcompiler -fPIC -std=c++11 -Xcompiler -fPIC -w --expt-relaxed-constexpr -O3 -DNVCC \
-I ${include_dir} \
-I ${include_dir}/third_party \
-I ${CUDA}/include \
-I ${CUDNN}/include \
-I ${NCCL}/include \
-L ${lib_dir} -lpaddle_framework \
-L ${CUDA}/lib64 -lcudart
nvcc rrpn_box_coder_op.cu -c -o rrpn_box_coder_op.cu.o -ccbin cc -DPADDLE_WITH_MKLDNN -DPADDLE_WITH_CUDA -DEIGEN_USE_GPU -DPADDLE_USE_DSO -Xcompiler -fPIC -std=c++11 -Xcompiler -fPIC -w --expt-relaxed-constexpr -O3 -DNVCC \
-I ${include_dir} \
-I ${include_dir}/third_party \
-I ${CUDA}/include \
-I ${CUDNN}/include \
-I ${NCCL}/include \
-L ${lib_dir} -lpaddle_framework \
-L ${CUDA}/lib64 -lcudart
nvcc rrpn_rotated_roi_align_op.cu -c -o rrpn_rotated_roi_align_op.cu.o -ccbin cc -DPADDLE_WITH_MKLDNN -DPADDLE_WITH_CUDA -DEIGEN_USE_GPU -DPADDLE_USE_DSO -Xcompiler -fPIC -std=c++11 -Xcompiler -fPIC -w --expt-relaxed-constexpr -O3 -DNVCC \
-I ${include_dir} \
-I ${include_dir}/third_party \
-I ${CUDA}/include \
-I ${CUDNN}/include \
-I ${NCCL}/include \
-L ${lib_dir} -lpaddle_framework \
-L ${CUDA}/lib64 -lcudart
g++ rotated_anchor_generator_op.cc concat_and_split.cc rrpn_generate_proposal_labels_op.cc rrpn_generate_proposals_op.cc rrpn_target_assign_op.cc rrpn_box_coder_op.cc rrpn_rotated_roi_align_op.cc rrpn_rotated_roi_align_op.cu.o rrpn_box_coder_op.cu.o rotated_anchor_generator_op.cu.o rrpn_generate_proposals_op.cu.o -o rrpn_lib.so -shared -fPIC -std=c++11 -O3 -DPADDLE_WITH_MKLDNN -DPADDLE_WITH_CUDA -DEIGEN_USE_GPU -DPADDLE_USE_DSO \
-I ${include_dir} \
-I ${include_dir}/third_party \
-I ${CUDA}/include \
-I ${CUDNN}/include \
-I ${NCCL}/include \
-L ${lib_dir} -lpaddle_framework \
-L ${CUDA}/lib64 -lcudart
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "math_function.h"
#ifdef PADDLE_WITH_MKLML
#include "paddle/fluid/platform/dynload/mklml.h"
#endif
#ifdef PADDLE_USE_OPENBLAS
#include <cblas.h>
#endif
#include <vector>
#include "math_function_impl.h"
#include "paddle/fluid/framework/data_type.h"
#include "paddle/fluid/platform/float16.h"
namespace paddle {
namespace operators {
namespace math {
#define DEFINE_CPU_TRANS(RANK) \
template struct Transpose<platform::CPUDeviceContext, \
platform::float16, \
RANK>; \
template struct Transpose<platform::CPUDeviceContext, float, RANK>; \
template struct Transpose<platform::CPUDeviceContext, double, RANK>; \
template struct Transpose<platform::CPUDeviceContext, int, RANK>; \
template struct Transpose<platform::CPUDeviceContext, int64_t, RANK>; \
template struct Transpose<platform::CPUDeviceContext, bool, RANK>; \
template struct Transpose<platform::CPUDeviceContext, int16_t, RANK>; \
template struct Transpose<platform::CPUDeviceContext, uint8_t, RANK>; \
template struct Transpose<platform::CPUDeviceContext, int8_t, RANK>;
DEFINE_CPU_TRANS(1);
DEFINE_CPU_TRANS(2);
DEFINE_CPU_TRANS(3);
DEFINE_CPU_TRANS(4);
DEFINE_CPU_TRANS(5);
DEFINE_CPU_TRANS(6);
template <typename DeviceContext, typename T, int Rank>
void Transpose<DeviceContext, T, Rank>::operator()(
const DeviceContext& context,
const framework::Tensor& in,
framework::Tensor* out,
const std::vector<int>& axis) {
Eigen::array<int, Rank> permute;
for (int i = 0; i < Rank; i++) {
permute[i] = axis[i];
}
auto eigen_in = framework::EigenTensor<T, Rank>::From(in);
auto eigen_out = framework::EigenTensor<T, Rank>::From(*out);
auto* dev = context.eigen_device();
eigen_out.device(*dev) = eigen_in.shuffle(permute);
}
} // namespace math
} // namespace operators
} // namespace paddle
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <cmath>
#include <vector>
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/tensor.h"
#include "paddle/fluid/framework/tensor_util.h"
#include "paddle/fluid/platform/device_context.h"
#include "paddle/fluid/platform/enforce.h"
namespace paddle {
namespace operators {
namespace math {
template <typename DeviceContext, typename T, int Rank>
struct Transpose {
void operator()(const DeviceContext& context,
const framework::Tensor& in,
framework::Tensor* out,
const std::vector<int>& axis);
};
void set_constant(const platform::DeviceContext& context,
framework::Tensor* tensor,
float value);
} // namespace math
} // namespace operators
} // namespace paddle
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "rotated_anchor_generator_op.h"
namespace paddle {
namespace operators {
class RotatedAnchorGeneratorOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(
ctx->HasInput("Input"),
"Input(Input) of RotatedAnchorGeneratorOp should not be null.");
PADDLE_ENFORCE(
ctx->HasOutput("Anchors"),
"Output(Anchors) of RotatedAnchorGeneratorOp should not be null.");
PADDLE_ENFORCE(
ctx->HasOutput("Variances"),
"Output(Variances) of RotatedAnchorGeneratorOp should not be null.");
auto input_dims = ctx->GetInputDim("Input");
PADDLE_ENFORCE(input_dims.size() == 4, "The layout of input is NCHW.");
auto anchor_sizes = ctx->Attrs().Get<std::vector<float>>("anchor_sizes");
auto aspect_ratios = ctx->Attrs().Get<std::vector<float>>("aspect_ratios");
auto angles = ctx->Attrs().Get<std::vector<float>>("angles");
auto stride = ctx->Attrs().Get<std::vector<float>>("stride");
auto variances = ctx->Attrs().Get<std::vector<float>>("variances");
size_t num_anchors =
aspect_ratios.size() * anchor_sizes.size() * angles.size();
std::vector<int64_t> dim_vec(4);
dim_vec[0] = input_dims[2];
dim_vec[1] = input_dims[3];
dim_vec[2] = num_anchors;
dim_vec[3] = 5;
ctx->SetOutputDim("Anchors", framework::make_ddim(dim_vec));
ctx->SetOutputDim("Variances", framework::make_ddim(dim_vec));
}
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(
ctx.Input<framework::Tensor>("Input")->type(), ctx.device_context());
}
};
class RotatedAnchorGeneratorOpMaker : public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddInput("Input",
"(Tensor, default Tensor<float>), "
"the input feature is a tensor with a rank of 4. "
"The layout is NCHW.");
AddOutput("Anchors",
"(Tensor, default Tensor<float>), the output is a "
"tensor with a rank of 4. The layout is [H, W, num_anchors, 5]. "
"H is the height of input, W is the width of input, num_anchors "
"is the box count of each position. "
"Each anchor is in (xctr, yctr, w, h, thelta) format");
AddOutput("Variances",
"(Tensor, default Tensor<float>), the expanded variances for "
"normalizing bbox regression targets. The layout is [H, W, "
"num_anchors, 5]. "
"H is the height of input, W is the width of input, num_anchors "
"is the box count of each position. "
"Each variance is in (xctr, yctr, w, h, thelta) format");
AddAttr<std::vector<float>>(
"anchor_sizes",
"(vector<float>) List of Rotated Region Proposal Network(RRPN) anchor "
"sizes "
" given in absolute pixels e.g. (64, 128, 256, 512)."
" For instance, the anchor size of 64 means the area of this anchor "
"equals to 64**2.")
.AddCustomChecker([](const std::vector<float>& anchor_sizes) {
PADDLE_ENFORCE_GT(anchor_sizes.size(),
0UL,
"Size of anchor_sizes must be at least 1.");
for (size_t i = 0; i < anchor_sizes.size(); ++i) {
PADDLE_ENFORCE_GT(
anchor_sizes[i], 0.0, "anchor_sizes[%d] must be positive.", i);
}
});
AddAttr<std::vector<float>>(
"aspect_ratios",
"(vector<float>) List of Rotated Region Proposal Network(RRPN) anchor "
"aspect "
"ratios, e.g. (0.5, 1, 2)."
"For instacne, the aspect ratio of 0.5 means the height / width of "
"this anchor equals 0.5.");
AddAttr<std::vector<float>>(
"angles",
"(vector<float>) List of Rotated Region Proposal Network(RRPN) anchor "
"angles, "
"e.g. (-30.0, 0.0, 30.0, 60.0, 90.0, 120.0)."
"For instacne, the aspect ratio of 0.5 means the height / width of "
"this anchor equals 0.5.");
AddAttr<std::vector<float>>("variances",
"(vector<float>) List of variances to be used "
"in box regression deltas")
.AddCustomChecker([](const std::vector<float>& variances) {
PADDLE_ENFORCE_EQ(
variances.size(), 5UL, "Must and only provide 5 variances.");
for (size_t i = 0; i < variances.size(); ++i) {
PADDLE_ENFORCE_GT(
variances[i], 0.0, "variance[%d] must be greater than 0.", i);
}
});
AddAttr<std::vector<float>>("stride",
"Anchors stride across width and height, "
"with a default of (16, 16)")
.SetDefault(std::vector<float>(2, 16.0))
.AddCustomChecker([](const std::vector<float>& stride) {
PADDLE_ENFORCE_EQ(
stride.size(),
2UL,
"Must and only provide 2 stride for width and height.");
for (size_t i = 0; i < stride.size(); ++i) {
PADDLE_ENFORCE_GT(
stride[i], 0.0, "stride[%d] should be larger than 0.", i);
}
});
AddAttr<float>("offset",
"(float) "
"Anchor center offset, with a default of 0.5")
.SetDefault(0.5);
AddComment(R"DOC(
RotatedAnchorGenerator operator
Generates anchors for RRPN. algorithm.
Each position of the input produce N anchors, N =
size(anchor_sizes) * size(aspect_ratios) * size(angles).
Please get more information from the following papers:
https://arxiv.org/abs/1703.01086.
)DOC");
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(
rotated_anchor_generator,
ops::RotatedAnchorGeneratorOp,
ops::RotatedAnchorGeneratorOpMaker,
paddle::framework::EmptyGradOpMaker<paddle::framework::OpDesc>,
paddle::framework::EmptyGradOpMaker<paddle::imperative::OpBase>);
REGISTER_OP_CPU_KERNEL(rotated_anchor_generator,
ops::RotatedAnchorGeneratorOpKernel<float>,
ops::RotatedAnchorGeneratorOpKernel<double>);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "rotated_anchor_generator_op.h"
namespace paddle {
namespace operators {
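// CUDA kernel: each thread writes one rotated anchor (x_ctr, y_ctr, w, h, angle)
// for one (position, aspect_ratio, anchor_size, angle) combination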
template <typename T>
__global__ void GenRAnchors(T* out,
const T* aspect_ratios,
const int ar_num,
const T* anchor_sizes,
const int as_num,
const T* angles,
const int aa_num,
const T* stride,
const int sd_num,
const int height,
const int width,
const T offset) {
int num_anchors = as_num * ar_num * aa_num;
int box_num = height * width * num_anchors;
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < box_num;
i += blockDim.x * gridDim.x) {
int h_idx = i / (num_anchors * width);
int w_idx = (i / num_anchors) % width;
T stride_width = stride[0];
T stride_height = stride[1];
T x_ctr = (w_idx * stride_width) + offset * stride_width - 1;
T y_ctr = (h_idx * stride_height) + offset * stride_height - 1;
T area, area_ratios;
T base_w, base_h;
T scale_w, scale_h;
T anchor_width, anchor_height;
int anch_idx = i % num_anchors;
int ar_idx = anch_idx / (as_num * aa_num);
int as_idx = anch_idx / aa_num % as_num;
int aa_idx = anch_idx % aa_num;
T aspect_ratio = aspect_ratios[ar_idx];
T anchor_size = anchor_sizes[as_idx];
T angle = angles[aa_idx];
area = stride_width * stride_height;
area_ratios = area / aspect_ratio;
base_w = round(sqrt(area_ratios));
base_h = round(base_w * aspect_ratio);
scale_w = anchor_size / stride_width;
scale_h = anchor_size / stride_height;
anchor_width = scale_w * base_w;
anchor_height = scale_h * base_h;
out[i * 5] = x_ctr;
out[i * 5 + 1] = y_ctr;
out[i * 5 + 2] = anchor_width;
out[i * 5 + 3] = anchor_height;
out[i * 5 + 4] = angle;
}
}
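// CUDA kernel: broadcast the `vnum` variance values across all `num` output elements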
template <typename T>
__global__ void SetVariance(T* out,
const T* var,
const int vnum,
const int num) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < num;
i += blockDim.x * gridDim.x) {
out[i] = var[i % vnum];
}
}
template <typename T>
class RotatedAnchorGeneratorOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* input = ctx.Input<paddle::framework::Tensor>("Input");
auto* anchors = ctx.Output<paddle::framework::Tensor>("Anchors");
auto* vars = ctx.Output<paddle::framework::Tensor>("Variances");
auto anchor_sizes = ctx.Attr<std::vector<float>>("anchor_sizes");
auto aspect_ratios = ctx.Attr<std::vector<float>>("aspect_ratios");
auto angles = ctx.Attr<std::vector<float>>("angles");
auto stride = ctx.Attr<std::vector<float>>("stride");
auto variances = ctx.Attr<std::vector<float>>("variances");
T offset = static_cast<T>(ctx.Attr<float>("offset"));
auto width = input->dims()[3];
auto height = input->dims()[2];
int num_anchors =
aspect_ratios.size() * anchor_sizes.size() * angles.size();
int box_num = width * height * num_anchors;
int block = 512;
int grid = (box_num + block - 1) / block;
auto stream =
ctx.template device_context<platform::CUDADeviceContext>().stream();
anchors->mutable_data<T>(ctx.GetPlace());
vars->mutable_data<T>(ctx.GetPlace());
framework::Tensor ar;
framework::TensorFromVector(aspect_ratios, ctx.device_context(), &ar);
framework::Tensor as;
framework::TensorFromVector(anchor_sizes, ctx.device_context(), &as);
framework::Tensor aa;
framework::TensorFromVector(angles, ctx.device_context(), &aa);
framework::Tensor sd;
framework::TensorFromVector(stride, ctx.device_context(), &sd);
GenRAnchors<T><<<grid, block, 0, stream>>>(anchors->data<T>(),
ar.data<T>(),
aspect_ratios.size(),
as.data<T>(),
anchor_sizes.size(),
aa.data<T>(),
angles.size(),
sd.data<T>(),
stride.size(),
height,
width,
offset);
framework::Tensor v;
framework::TensorFromVector(variances, ctx.device_context(), &v);
grid = (box_num * 5 + block - 1) / block;
SetVariance<T><<<grid, block, 0, stream>>>(
vars->data<T>(), v.data<T>(), variances.size(), box_num * 5);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(rotated_anchor_generator,
ops::RotatedAnchorGeneratorOpCUDAKernel<float>,
ops::RotatedAnchorGeneratorOpCUDAKernel<double>);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <algorithm>
#include <vector>
#include "paddle/fluid/framework/op_registry.h"
//#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/platform/transform.h"
namespace paddle {
namespace operators {
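// CPU kernel: loop over every feature-map position and every (aspect_ratio, anchor_size, angle)
// combination to fill Anchors, then broadcast the variances into Variances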
template <typename T>
class RotatedAnchorGeneratorOpKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* input = ctx.Input<paddle::framework::Tensor>("Input");
auto* anchors = ctx.Output<paddle::framework::Tensor>("Anchors");
auto* vars = ctx.Output<paddle::framework::Tensor>("Variances");
auto anchor_sizes = ctx.Attr<std::vector<float>>("anchor_sizes");
auto aspect_ratios = ctx.Attr<std::vector<float>>("aspect_ratios");
auto angles = ctx.Attr<std::vector<float>>("angles");
auto stride = ctx.Attr<std::vector<float>>("stride");
auto variances = ctx.Attr<std::vector<float>>("variances");
T offset = static_cast<T>(ctx.Attr<float>("offset"));
auto feature_width = input->dims()[3];
auto feature_height = input->dims()[2];
T stride_width, stride_height;
stride_width = stride[0];
stride_height = stride[1];
int num_anchors =
aspect_ratios.size() * anchor_sizes.size() * angles.size();
anchors->mutable_data<T>(ctx.GetPlace());
vars->mutable_data<T>(ctx.GetPlace());
auto e_anchors = framework::EigenTensor<T, 4>::From(*anchors);
for (int h_idx = 0; h_idx < feature_height; ++h_idx) {
for (int w_idx = 0; w_idx < feature_width; ++w_idx) {
T x_ctr = (w_idx * stride_width) + offset * stride_width - 1;
T y_ctr = (h_idx * stride_height) + offset * stride_height - 1;
T area, area_ratios;
T base_w, base_h;
T scale_w, scale_h;
T anchor_width, anchor_height;
int idx = 0;
for (size_t r = 0; r < aspect_ratios.size(); ++r) {
auto ar = aspect_ratios[r];
for (size_t s = 0; s < anchor_sizes.size(); ++s) {
auto anchor_size = anchor_sizes[s];
area = stride_width * stride_height;
area_ratios = area / ar;
base_w = round(sqrt(area_ratios));
base_h = round(base_w * ar);
scale_w = anchor_size / stride_width;
scale_h = anchor_size / stride_height;
anchor_width = scale_w * base_w;
anchor_height = scale_h * base_h;
for (size_t a = 0; a < angles.size(); ++a) {
auto angle = angles[a];
e_anchors(h_idx, w_idx, idx, 0) = x_ctr;
e_anchors(h_idx, w_idx, idx, 1) = y_ctr;
e_anchors(h_idx, w_idx, idx, 2) = anchor_width;
e_anchors(h_idx, w_idx, idx, 3) = anchor_height;
e_anchors(h_idx, w_idx, idx, 4) = angle;
idx++;
}
}
}
}
}
framework::Tensor var_t;
var_t.mutable_data<T>(
framework::make_ddim({1, static_cast<int>(variances.size())}),
ctx.GetPlace());
auto var_et = framework::EigenTensor<T, 2>::From(var_t);
for (size_t i = 0; i < variances.size(); ++i) {
var_et(0, i) = variances[i];
}
int anchor_num = feature_height * feature_width * num_anchors;
auto var_dim = vars->dims();
vars->Resize({anchor_num, static_cast<int>(variances.size())});
auto e_vars = framework::EigenMatrix<T, Eigen::RowMajor>::From(*vars);
e_vars = var_et.broadcast(Eigen::DSizes<int, 2>(anchor_num, 1));
vars->Resize(var_dim);
}
};
} // namespace operators
} // namespace paddle
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
//#include "rrpn_box_coder_op.h"
#include <string>
#include <vector>
#include "paddle/fluid/framework/op_registry.h"
namespace paddle {
namespace operators {
class RRPNBoxCoderOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("PriorBox"),
"Input(PriorBox) of BoxCoderOp should not be null.");
PADDLE_ENFORCE(ctx->HasInput("TargetBox"),
"Input(TargetBox) of BoxCoderOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("OutputBox"),
"Output(OutputBox) of BoxCoderOp should not be null.");
auto prior_box_dims = ctx->GetInputDim("PriorBox");
// auto target_box_dims = ctx->GetInputDim("TargetBox");
if (ctx->IsRuntime()) {
PADDLE_ENFORCE_EQ(
prior_box_dims.size(), 2, "The rank of Input PriorBox must be 2");
PADDLE_ENFORCE_EQ(
prior_box_dims[1], 5, "The shape of PriorBox is [N, 5]");
if (ctx->HasInput("PriorBoxVar")) {
auto prior_box_var_dims = ctx->GetInputDim("PriorBoxVar");
PADDLE_ENFORCE(prior_box_var_dims.size() == 2,
"Input(PriorBoxVar) of BoxCoderOp should be 2.");
PADDLE_ENFORCE_EQ(
prior_box_dims,
prior_box_var_dims,
"The dimension of Input(PriorBoxVar) should be equal to"
"the dimension of Input(PriorBox) when the rank is 2.");
}
}
}
};
class RRPNBoxCoderOpMaker : public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddInput(
"PriorBox",
"(Tensor, default Tensor<float>) "
"Box list PriorBox is a 2-D Tensor with shape [M, 5] holds M boxes, "
"each box is represented as [x, y, w, h, angle], "
"[x, y] is the center coordinate of the anchor box, "
"if the input is image feature map, they are close to the origin "
"of the coordinate system. [w, h] is the width and height "
"of the anchor box, angle is angle of rotation.");
AddInput("PriorBoxVar",
"(Tensor, default Tensor<float>, optional) "
"PriorBoxVar is a 2-D Tensor with shape [M, 5] holds M group "
"of variance. PriorBoxVar will set all elements to 1 by "
"default.")
.AsDispensable();
AddInput(
"TargetBox",
"(LoDTensor or Tensor) This input can be a 2-D LoDTensor with shape "
"[N, 5], each box is represented as [x, y, w, h, angle],"
"[x, y] is the center coordinate of the box, [w, h] is width and "
"height of the box,"
"angle is angle of rotation around the center of box.");
AddAttr<std::vector<float>>(
"variance",
"(vector<float>, default {}),"
"variance of prior box with shape [5]. PriorBoxVar and variance can"
"not be provided at the same time.")
.SetDefault(std::vector<float>{});
AddOutput("OutputBox",
"(Tensor) "
"2-D Tensor with shape [M, 5] which M represents the number of "
"deocded boxes"
"and 5 represents [x, y, w, h, angle]");
AddComment(R"DOC(
Rotated Bounding Box Coder.
Decode the target bounding box with the priorbox information.
The Decoding schema described below:
ox = pw * tx / pxv + cx
oy = ph * ty / pyv + cy
ow = exp(tw / pwv) * pw
oh = exp(th / phv) * ph
oa = ta / pav * 1.0 / 3.141592653 * 180 + pa
where `tx`, `ty`, `tw`, `th`, `ta` denote the target box's center coordinates, width,
height and angle respectively. Similarly, `px`, `py`, `pw`, `ph`, `pa` denote the
prior box's (anchor) center coordinates, width, height and angle. `pxv`, `pyv`, `pwv`,
`phv`, `pav` denote the variances of the prior box, and `ox`, `oy`, `ow`, `oh`, `oa`
denote the decoded center coordinates, width, height and angle.
)DOC");
}
};
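// Illustrative sketch (not part of the op definition): assuming variances of 1
// and a prior box [cx=10, cy=20, pw=4, ph=8, pa=0] with a target delta
// [tx=0.5, ty=0.25, tw=0, th=0, ta=0], the decoding schema documented above gives
//   ox = 4 * 0.5 + 10 = 12,  oy = 8 * 0.25 + 20 = 22,
//   ow = exp(0) * 4 = 4,     oh = exp(0) * 8 = 8,   oa = 0 degrees.
// Note that the GPU kernel DecodeCenterSizeKernel below additionally divides
// the decoded width/height by 1.4 and emits the four rotated corner points
// (8 values per box) instead of a [x, y, w, h, angle] tuple.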
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(
rrpn_box_coder,
ops::RRPNBoxCoderOp,
ops::RRPNBoxCoderOpMaker,
paddle::framework::EmptyGradOpMaker<paddle::framework::OpDesc>,
paddle::framework::EmptyGradOpMaker<paddle::imperative::OpBase>);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <string>
#include <vector>
#include "paddle/fluid/memory/memory.h"
//#include "rrpn_box_coder_op.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/platform/cuda_primitives.h"
namespace paddle {
namespace operators {
#define PI 3.141592654
template <typename T>
__global__ void DecodeCenterSizeKernel(const T* prior_box_data,
const T* prior_box_var_data,
const T* target_box_data,
const int row,
const int len,
const T prior_box_var_size,
const float* variance,
const int var_size,
T* output) {
const int idx = threadIdx.x + blockIdx.x * blockDim.x;
int prior_box_offset = 0;
if (idx < row) {
const int row_idx = idx;
prior_box_offset = row_idx * len;
T prior_box_width = prior_box_data[prior_box_offset + 2];
T prior_box_height = prior_box_data[prior_box_offset + 3];
T prior_box_center_x = prior_box_data[prior_box_offset];
T prior_box_center_y = prior_box_data[prior_box_offset + 1];
T prior_box_angle = prior_box_data[prior_box_offset + 4];
T target_box_width, target_box_height, target_box_angle;
T target_box_center_x, target_box_center_y;
T box_var_x = T(1), box_var_y = T(1);
T box_var_w = T(1), box_var_h = T(1), box_var_angle = T(1);
if (prior_box_var_data) {
int prior_var_offset = row_idx * len;
box_var_x = prior_box_var_data[prior_var_offset];
box_var_y = prior_box_var_data[prior_var_offset + 1];
box_var_w = prior_box_var_data[prior_var_offset + 2];
box_var_h = prior_box_var_data[prior_var_offset + 3];
box_var_angle = prior_box_var_data[prior_var_offset + 4];
} else if (var_size == 5) {
box_var_x = static_cast<T>(variance[0]);
box_var_y = static_cast<T>(variance[1]);
box_var_w = static_cast<T>(variance[2]);
box_var_h = static_cast<T>(variance[3]);
box_var_angle = static_cast<T>(variance[4]);
}
target_box_width =
exp(target_box_data[idx * len + 2] / box_var_w) * prior_box_width / 1.4;
target_box_height = exp(target_box_data[idx * len + 3] / box_var_h) *
prior_box_height / 1.4;
target_box_center_x =
target_box_data[idx * len] / box_var_x * prior_box_width +
prior_box_center_x;
target_box_center_y =
target_box_data[idx * len + 1] / box_var_y * prior_box_height +
prior_box_center_y;
target_box_angle =
(target_box_data[idx * len + 4] / box_var_angle) * 1.0 / PI * 180 +
prior_box_angle;
T a_cos = cos(PI / 180 * target_box_angle);
T a_sin = -sin(PI / 180 * target_box_angle);
T rotation_matrix[3][3];
rotation_matrix[0][0] = a_cos;
rotation_matrix[0][1] = a_sin;
rotation_matrix[0][2] = 0;
rotation_matrix[1][0] = -a_sin;
rotation_matrix[1][1] = a_cos;
rotation_matrix[1][2] = 0;
rotation_matrix[2][0] = -target_box_center_x * a_cos +
target_box_center_y * a_sin + target_box_center_x;
rotation_matrix[2][1] = -target_box_center_x * a_sin -
target_box_center_y * a_cos + target_box_center_y;
rotation_matrix[2][2] = 1;
T pt_x0 = target_box_center_x - target_box_width / 2;
T pt_x1 = target_box_center_x + target_box_width / 2;
T pt_x2 = target_box_center_x + target_box_width / 2;
T pt_x3 = target_box_center_x - target_box_width / 2;
T pt_y0 = target_box_center_y - target_box_height / 2;
T pt_y1 = target_box_center_y - target_box_height / 2;
T pt_y2 = target_box_center_y + target_box_height / 2;
T pt_y3 = target_box_center_y + target_box_height / 2;
output[idx * 8] = pt_x0 * rotation_matrix[0][0] +
pt_y0 * rotation_matrix[1][0] + rotation_matrix[2][0];
output[idx * 8 + 1] = pt_x0 * rotation_matrix[0][1] +
pt_y0 * rotation_matrix[1][1] + rotation_matrix[2][1];
output[idx * 8 + 2] = pt_x1 * rotation_matrix[0][0] +
pt_y1 * rotation_matrix[1][0] + rotation_matrix[2][0];
output[idx * 8 + 3] = pt_x1 * rotation_matrix[0][1] +
pt_y1 * rotation_matrix[1][1] + rotation_matrix[2][1];
output[idx * 8 + 4] = pt_x2 * rotation_matrix[0][0] +
pt_y2 * rotation_matrix[1][0] + rotation_matrix[2][0];
output[idx * 8 + 5] = pt_x2 * rotation_matrix[0][1] +
pt_y2 * rotation_matrix[1][1] + rotation_matrix[2][1];
output[idx * 8 + 6] = pt_x3 * rotation_matrix[0][0] +
pt_y3 * rotation_matrix[1][0] + rotation_matrix[2][0];
output[idx * 8 + 7] = pt_x3 * rotation_matrix[0][1] +
pt_y3 * rotation_matrix[1][1] + rotation_matrix[2][1];
}
}
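// Note: the kernel above converts each decoded [cx, cy, w, h, angle] box into
// its four corner points via a rotation about the box center, so every output
// row stores 8 values (x0, y0, ..., x3, y3) rather than a 5-tuple.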
template <typename DeviceContext, typename T>
class RRPNBoxCoderCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
PADDLE_ENFORCE(platform::is_gpu_place(context.GetPlace()),
"This kernel only runs on GPU device.");
auto* prior_box = context.Input<framework::Tensor>("PriorBox");
auto* prior_box_var = context.Input<framework::Tensor>("PriorBoxVar");
auto* target_box = context.Input<framework::LoDTensor>("TargetBox");
auto* output_box = context.Output<framework::Tensor>("OutputBox");
std::vector<float> variance = context.Attr<std::vector<float>>("variance");
const T* prior_box_data = prior_box->data<T>();
const T* target_box_data = target_box->data<T>();
const T* prior_box_var_data = nullptr;
auto prior_box_var_size = 0;
if (prior_box_var) {
PADDLE_ENFORCE(variance.empty(),
"Input 'PriorBoxVar' and attribute 'variance' should not"
"be used at the same time.");
prior_box_var_data = prior_box_var->data<T>();
prior_box_var_size = prior_box_var->dims().size();
}
if (!(variance.empty())) {
PADDLE_ENFORCE(static_cast<int>(variance.size()) == 5,
"Size of attribute 'variance' should be 4");
}
if (target_box->lod().size()) {
PADDLE_ENFORCE_EQ(
target_box->lod().size(), 1, "Only support 1 level of LoD.");
}
const int var_size = static_cast<int>(variance.size());
auto row = target_box->dims()[0];
auto len = 5;
int block = 512;
int grid = (row + block - 1) / block;
auto& device_ctx = context.cuda_device_context();
int bytes = var_size * sizeof(float);
auto dev_var = memory::Alloc(device_ctx, bytes);
float* dev_var_data = reinterpret_cast<float*>(dev_var->ptr());
auto cplace = platform::CPUPlace();
const auto gplace = boost::get<platform::CUDAPlace>(context.GetPlace());
memory::Copy(
gplace, dev_var_data, cplace, &variance[0], bytes, device_ctx.stream());
output_box->mutable_data<T>({row, 8}, context.GetPlace());
T* output = output_box->data<T>();
DecodeCenterSizeKernel<T><<<grid, block, 0, device_ctx.stream()>>>(
prior_box_data,
prior_box_var_data,
target_box_data,
row,
len,
prior_box_var_size,
dev_var_data,
var_size,
output);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(
rrpn_box_coder,
ops::RRPNBoxCoderCUDAKernel<paddle::platform::CUDADeviceContext, float>,
ops::RRPNBoxCoderCUDAKernel<paddle::platform::CUDADeviceContext, double>);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <math.h>
#include <algorithm>
#include <fstream>
#include <string>
#include <vector>
#include "bbox_util.h"
#include "concat_and_split.h"
#include "gather.h"
#include "math_function.h"
#include "paddle/fluid/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;
const int kBoxDim = 5;
template <typename T>
void AppendRois(LoDTensor* out, int64_t offset, Tensor* to_add) {
auto* out_data = out->data<T>();
auto* to_add_data = to_add->data<T>();
memcpy(out_data + offset, to_add_data, to_add->numel() * sizeof(T));
}
class RRPNGenerateProposalLabelsOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("RpnRois"),
"Input(RpnRois) shouldn't be null.");
PADDLE_ENFORCE(ctx->HasInput("GtClasses"),
"Input(GtClasses) shouldn't be null.");
PADDLE_ENFORCE(ctx->HasInput("IsCrowd"),
"Input(IsCrowd) shouldn't be null.");
PADDLE_ENFORCE(ctx->HasInput("GtBoxes"),
"Input(GtBoxes) shouldn't be null.");
PADDLE_ENFORCE(ctx->HasInput("ImInfo"), "Input(ImInfo) shouldn't be null.");
PADDLE_ENFORCE(
ctx->HasOutput("Rois"),
"Output(Rois) of RRPNGenerateProposalLabelsOp should not be null");
PADDLE_ENFORCE(ctx->HasOutput("LabelsInt32"),
"Output(LabelsInt32) of RRPNGenerateProposalLabelsOp should "
"not be null");
PADDLE_ENFORCE(ctx->HasOutput("BboxTargets"),
"Output(BboxTargets) of RRPNGenerateProposalLabelsOp should "
"not be null");
PADDLE_ENFORCE(ctx->HasOutput("BboxInsideWeights"),
"Output(BboxInsideWeights) of RRPNGenerateProposalLabelsOp "
"should not be null");
PADDLE_ENFORCE(ctx->HasOutput("BboxOutsideWeights"),
"Output(BboxOutsideWeights) of RRPNGenerateProposalLabelsOp "
"should not be null");
auto rpn_rois_dims = ctx->GetInputDim("RpnRois");
auto gt_boxes_dims = ctx->GetInputDim("GtBoxes");
auto im_info_dims = ctx->GetInputDim("ImInfo");
PADDLE_ENFORCE_EQ(
rpn_rois_dims.size(), 2, "The rank of Input(RpnRois) must be 2.");
PADDLE_ENFORCE_EQ(
gt_boxes_dims.size(), 2, "The rank of Input(GtBoxes) must be 2.");
PADDLE_ENFORCE_EQ(
im_info_dims.size(), 2, "The rank of Input(ImInfo) must be 2.");
int class_nums = ctx->Attrs().Get<int>("class_nums");
ctx->SetOutputDim("Rois", {-1, 5});
ctx->SetOutputDim("LabelsInt32", {-1, 1});
ctx->SetOutputDim("BboxTargets", {-1, 5 * class_nums});
ctx->SetOutputDim("BboxInsideWeights", {-1, 5 * class_nums});
ctx->SetOutputDim("BboxOutsideWeights", {-1, 5 * class_nums});
}
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(
ctx.Input<framework::LoDTensor>("RpnRois")->type(),
platform::CPUPlace());
}
};
template <typename T>
void Concat(const platform::CPUDeviceContext& context,
const Tensor& in_tensor_a,
const Tensor& in_tensor_b,
Tensor* out_tensor) {
int axis = 0;
std::vector<Tensor> inputs;
inputs.emplace_back(in_tensor_a);
inputs.emplace_back(in_tensor_b);
math::ConcatFunctor<platform::CPUDeviceContext, T> concat_functor;
concat_functor(context, inputs, axis, out_tensor);
}
template <typename T>
std::vector<std::vector<int>> SampleFgBgGt(
const platform::CPUDeviceContext& context,
Tensor* iou,
const Tensor& is_crowd,
const int batch_size_per_im,
const float fg_fraction,
const float fg_thresh,
const float bg_thresh_hi,
const float bg_thresh_lo,
std::minstd_rand engine,
const bool use_random,
const Tensor& rpn_rois) {
std::vector<int> fg_inds;
std::vector<int> bg_inds;
std::vector<int> mapped_gt_inds;
int64_t gt_num = is_crowd.numel();
const int* crowd_data = is_crowd.data<int>();
T* proposal_to_gt_overlaps = iou->data<T>();
int64_t row = iou->dims()[0];
int64_t col = iou->dims()[1];
float epsilon = 0.00001;
const T* rpn_rois_dt = rpn_rois.data<T>();
// Follow the Faster RCNN's implementation
for (int64_t i = 0; i < row; ++i) {
const T* v = proposal_to_gt_overlaps + i * col;
T max_overlap = *std::max_element(v, v + col);
if ((i < gt_num) && (crowd_data[i])) {
max_overlap = -1.0;
}
if (max_overlap >= fg_thresh) {
// fg mapped gt label index
for (int64_t j = 0; j < col; ++j) {
T val = proposal_to_gt_overlaps[i * col + j];
auto diff = std::abs(max_overlap - val);
if (diff < epsilon) {
fg_inds.emplace_back(i);
mapped_gt_inds.emplace_back(j);
break;
}
}
} else if ((max_overlap >= bg_thresh_lo) && (max_overlap < bg_thresh_hi)) {
bg_inds.emplace_back(i);
} else {
continue;
}
}
std::vector<std::vector<int>> res;
// sampling fg
std::uniform_real_distribution<float> uniform(0, 1);
int fg_rois_per_im = std::floor(batch_size_per_im * fg_fraction);
int fg_rois_this_image = fg_inds.size();
int fg_rois_per_this_image = std::min(fg_rois_per_im, fg_rois_this_image);
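  // The loops below perform a reservoir-style random subsample via
  // Fisher-Yates-like swaps: for every index i beyond the number of samples to
  // keep, a random position in [0, i) is drawn and, if it falls inside the
  // kept prefix, swapped with element i. The first fg_rois_per_this_image
  // (resp. bg_rois_per_this_image) entries then form an approximately uniform
  // random sample.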
if (use_random) {
const int64_t fg_size = static_cast<int64_t>(fg_inds.size());
if (fg_size > fg_rois_per_this_image) {
for (int64_t i = fg_rois_per_this_image; i < fg_size; ++i) {
int rng_ind = std::floor(uniform(engine) * i);
if (rng_ind < fg_rois_per_this_image) {
std::iter_swap(fg_inds.begin() + rng_ind, fg_inds.begin() + i);
std::iter_swap(mapped_gt_inds.begin() + rng_ind,
mapped_gt_inds.begin() + i);
}
}
}
}
std::vector<int> new_fg_inds(fg_inds.begin(),
fg_inds.begin() + fg_rois_per_this_image);
std::vector<int> new_gt_inds(mapped_gt_inds.begin(),
mapped_gt_inds.begin() + fg_rois_per_this_image);
// sampling bg
int bg_rois_per_image = batch_size_per_im - fg_rois_per_this_image;
int bg_rois_this_image = bg_inds.size();
int bg_rois_per_this_image = std::min(bg_rois_per_image, bg_rois_this_image);
if (use_random) {
const int64_t bg_size = static_cast<int64_t>(bg_inds.size());
if (bg_size > bg_rois_per_this_image) {
for (int64_t i = bg_rois_per_this_image; i < bg_size; ++i) {
int rng_ind = std::floor(uniform(engine) * i);
        if (rng_ind < bg_rois_per_this_image)
std::iter_swap(bg_inds.begin() + rng_ind, bg_inds.begin() + i);
}
}
}
std::vector<int> new_bg_inds(bg_inds.begin(),
bg_inds.begin() + bg_rois_per_this_image);
res.emplace_back(new_fg_inds);
res.emplace_back(new_bg_inds);
res.emplace_back(new_gt_inds);
return res;
}
template <typename T>
void GatherBoxesLabels(const platform::CPUDeviceContext& context,
const Tensor& boxes,
const Tensor& gt_boxes,
const Tensor& gt_classes,
const std::vector<int>& fg_inds,
const std::vector<int>& bg_inds,
const std::vector<int>& gt_inds,
Tensor* sampled_boxes,
Tensor* sampled_labels,
Tensor* sampled_gts) {
int fg_num = fg_inds.size();
int bg_num = bg_inds.size();
Tensor fg_inds_t, bg_inds_t, gt_box_inds_t, gt_label_inds_t;
int* fg_inds_data = fg_inds_t.mutable_data<int>({fg_num}, context.GetPlace());
int* bg_inds_data = bg_inds_t.mutable_data<int>({bg_num}, context.GetPlace());
int* gt_box_inds_data =
gt_box_inds_t.mutable_data<int>({fg_num}, context.GetPlace());
int* gt_label_inds_data =
gt_label_inds_t.mutable_data<int>({fg_num}, context.GetPlace());
std::copy(fg_inds.begin(), fg_inds.end(), fg_inds_data);
std::copy(bg_inds.begin(), bg_inds.end(), bg_inds_data);
std::copy(gt_inds.begin(), gt_inds.end(), gt_box_inds_data);
std::copy(gt_inds.begin(), gt_inds.end(), gt_label_inds_data);
Tensor fg_boxes, bg_boxes, fg_labels, bg_labels;
fg_boxes.mutable_data<T>({fg_num, kBoxDim}, context.GetPlace());
CPUGather<T>(context, boxes, fg_inds_t, &fg_boxes);
bg_boxes.mutable_data<T>({bg_num, kBoxDim}, context.GetPlace());
CPUGather<T>(context, boxes, bg_inds_t, &bg_boxes);
Concat<T>(context, fg_boxes, bg_boxes, sampled_boxes);
CPUGather<T>(context, gt_boxes, gt_box_inds_t, sampled_gts);
fg_labels.mutable_data<int>({fg_num}, context.GetPlace());
CPUGather<int>(context, gt_classes, gt_label_inds_t, &fg_labels);
bg_labels.mutable_data<int>({bg_num}, context.GetPlace());
math::set_constant(context, &bg_labels, 0);
Concat<int>(context, fg_labels, bg_labels, sampled_labels);
}
template <typename T>
std::vector<Tensor> SampleRoisForOneImage(
const platform::CPUDeviceContext& context,
const Tensor& rpn_rois_in,
const Tensor& gt_classes,
const Tensor& is_crowd,
const Tensor& gt_boxes,
const Tensor& im_info,
const int batch_size_per_im,
const float fg_fraction,
const float fg_thresh,
const float bg_thresh_hi,
const float bg_thresh_lo,
const std::vector<float>& bbox_reg_weights,
const int class_nums,
std::minstd_rand engine,
bool use_random,
bool is_cls_agnostic) {
// 1.1 map to original image
auto im_scale = im_info.data<T>()[2];
Tensor rpn_rois_slice;
Tensor rpn_rois;
rpn_rois.mutable_data<T>(rpn_rois_in.dims(), context.GetPlace());
const T* rpn_rois_in_dt = rpn_rois_in.data<T>();
T* rpn_rois_dt = rpn_rois.data<T>();
for (int i = 0; i < rpn_rois.numel(); ++i) {
rpn_rois_dt[i] = rpn_rois_in_dt[i];
}
// 1.2 compute overlaps
int proposals_num = gt_boxes.dims()[0] + rpn_rois.dims()[0];
Tensor boxes;
boxes.mutable_data<T>({proposals_num, kBoxDim}, context.GetPlace());
Concat<T>(context, gt_boxes, rpn_rois, &boxes);
Tensor proposal_to_gt_overlaps;
proposal_to_gt_overlaps.mutable_data<T>({proposals_num, gt_boxes.dims()[0]},
context.GetPlace());
BboxOverlaps2<T>(boxes, gt_boxes, &proposal_to_gt_overlaps);
std::vector<std::vector<int>> fg_bg_gt =
SampleFgBgGt<T>(context,
&proposal_to_gt_overlaps,
is_crowd,
batch_size_per_im,
fg_fraction,
fg_thresh,
bg_thresh_hi,
bg_thresh_lo,
engine,
use_random,
boxes);
std::vector<int> fg_inds = fg_bg_gt[0];
std::vector<int> bg_inds = fg_bg_gt[1];
std::vector<int> mapped_gt_inds = fg_bg_gt[2]; // mapped_gt_labels
Tensor sampled_boxes, sampled_labels, sampled_gts;
int fg_num = fg_inds.size();
int bg_num = bg_inds.size();
int boxes_num = fg_num + bg_num;
framework::DDim bbox_dim({boxes_num, kBoxDim});
sampled_boxes.mutable_data<T>(bbox_dim, context.GetPlace());
sampled_labels.mutable_data<int>({boxes_num}, context.GetPlace());
sampled_gts.mutable_data<T>({fg_num, kBoxDim}, context.GetPlace());
GatherBoxesLabels<T>(context,
boxes,
gt_boxes,
gt_classes,
fg_inds,
bg_inds,
mapped_gt_inds,
&sampled_boxes,
&sampled_labels,
&sampled_gts);
// Compute targets
Tensor bbox_targets_single;
bbox_targets_single.mutable_data<T>(bbox_dim, context.GetPlace());
BoxToDelta2<T>(fg_num,
sampled_boxes,
sampled_gts,
bbox_reg_weights.data(),
&bbox_targets_single);
// Scale rois
Tensor sampled_rois;
sampled_rois.mutable_data<T>(sampled_boxes.dims(), context.GetPlace());
auto sampled_rois_et = framework::EigenTensor<T, 2>::From(sampled_rois);
auto sampled_boxes_et = framework::EigenTensor<T, 2>::From(sampled_boxes);
sampled_rois_et = sampled_boxes_et;
// Expand box targets
Tensor bbox_targets, bbox_inside_weights, bbox_outside_weights;
framework::DDim bbox_expand_dim({boxes_num, kBoxDim * class_nums});
bbox_targets.mutable_data<T>(bbox_expand_dim, context.GetPlace());
bbox_inside_weights.mutable_data<T>(bbox_expand_dim, context.GetPlace());
bbox_outside_weights.mutable_data<T>(bbox_expand_dim, context.GetPlace());
math::set_constant(context, &bbox_targets, 0.0);
math::set_constant(context, &bbox_inside_weights, 0.0);
math::set_constant(context, &bbox_outside_weights, 0.0);
auto* bbox_targets_single_data = bbox_targets_single.data<T>();
auto* sampled_labels_data = sampled_labels.data<int>();
auto* bbox_targets_data = bbox_targets.data<T>();
auto* bbox_inside_weights_data = bbox_inside_weights.data<T>();
auto* bbox_outside_weights_data = bbox_outside_weights.data<T>();
int width = kBoxDim * class_nums;
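  // Per-class target layout: each row holds class_nums blocks of kBoxDim (5)
  // values; only the block belonging to the sampled label is filled with the
  // regression target, and the matching inside/outside weights are set to 1.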
for (int64_t i = 0; i < boxes_num; ++i) {
int label = sampled_labels_data[i];
if (label > 0) {
if (is_cls_agnostic) {
label = 1;
}
int dst_idx = i * width + kBoxDim * label;
int src_idx = kBoxDim * i;
bbox_targets_data[dst_idx] = bbox_targets_single_data[src_idx];
bbox_targets_data[dst_idx + 1] = bbox_targets_single_data[src_idx + 1];
bbox_targets_data[dst_idx + 2] = bbox_targets_single_data[src_idx + 2];
bbox_targets_data[dst_idx + 3] = bbox_targets_single_data[src_idx + 3];
bbox_targets_data[dst_idx + 4] = bbox_targets_single_data[src_idx + 4];
bbox_inside_weights_data[dst_idx] = 1;
bbox_inside_weights_data[dst_idx + 1] = 1;
bbox_inside_weights_data[dst_idx + 2] = 1;
bbox_inside_weights_data[dst_idx + 3] = 1;
bbox_inside_weights_data[dst_idx + 4] = 1;
bbox_outside_weights_data[dst_idx] = 1;
bbox_outside_weights_data[dst_idx + 1] = 1;
bbox_outside_weights_data[dst_idx + 2] = 1;
bbox_outside_weights_data[dst_idx + 3] = 1;
bbox_outside_weights_data[dst_idx + 4] = 1;
}
}
std::vector<Tensor> res;
res.emplace_back(sampled_rois);
res.emplace_back(sampled_labels);
res.emplace_back(bbox_targets);
res.emplace_back(bbox_inside_weights);
res.emplace_back(bbox_outside_weights);
return res;
}
template <typename T>
class RRPNGenerateProposalLabelsKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
auto* rpn_rois = context.Input<LoDTensor>("RpnRois");
auto* gt_classes = context.Input<LoDTensor>("GtClasses");
auto* is_crowd = context.Input<LoDTensor>("IsCrowd");
auto* gt_boxes = context.Input<LoDTensor>("GtBoxes");
auto* im_info = context.Input<LoDTensor>("ImInfo");
auto* rois = context.Output<LoDTensor>("Rois");
auto* labels_int32 = context.Output<LoDTensor>("LabelsInt32");
auto* bbox_targets = context.Output<LoDTensor>("BboxTargets");
auto* bbox_inside_weights = context.Output<LoDTensor>("BboxInsideWeights");
auto* bbox_outside_weights =
context.Output<LoDTensor>("BboxOutsideWeights");
int batch_size_per_im = context.Attr<int>("batch_size_per_im");
float fg_fraction = context.Attr<float>("fg_fraction");
float fg_thresh = context.Attr<float>("fg_thresh");
float bg_thresh_hi = context.Attr<float>("bg_thresh_hi");
float bg_thresh_lo = context.Attr<float>("bg_thresh_lo");
std::vector<float> bbox_reg_weights =
context.Attr<std::vector<float>>("bbox_reg_weights");
int class_nums = context.Attr<int>("class_nums");
bool use_random = context.Attr<bool>("use_random");
bool is_cls_agnostic = context.Attr<bool>("is_cls_agnostic");
PADDLE_ENFORCE_EQ(
rpn_rois->lod().size(),
1UL,
"RRPNGenerateProposalLabelsOp rpn_rois needs 1 level of LoD");
PADDLE_ENFORCE_EQ(
gt_classes->lod().size(),
1UL,
"RRPNGenerateProposalLabelsOp gt_classes needs 1 level of LoD");
PADDLE_ENFORCE_EQ(
is_crowd->lod().size(),
1UL,
"RRPNGenerateProposalLabelsOp is_crowd needs 1 level of LoD");
PADDLE_ENFORCE_EQ(
gt_boxes->lod().size(),
1UL,
"RRPNGenerateProposalLabelsOp gt_boxes needs 1 level of LoD");
int64_t n = static_cast<int64_t>(rpn_rois->lod().back().size() - 1);
rois->mutable_data<T>({n * batch_size_per_im, kBoxDim}, context.GetPlace());
labels_int32->mutable_data<int>({n * batch_size_per_im, 1},
context.GetPlace());
bbox_targets->mutable_data<T>({n * batch_size_per_im, kBoxDim * class_nums},
context.GetPlace());
bbox_inside_weights->mutable_data<T>(
{n * batch_size_per_im, kBoxDim * class_nums}, context.GetPlace());
bbox_outside_weights->mutable_data<T>(
{n * batch_size_per_im, kBoxDim * class_nums}, context.GetPlace());
std::random_device rnd;
std::minstd_rand engine;
int seed = rnd();
engine.seed(seed);
framework::LoD lod;
std::vector<size_t> lod0(1, 0);
int64_t num_rois = 0;
auto& dev_ctx = context.device_context<platform::CPUDeviceContext>();
auto rpn_rois_lod = rpn_rois->lod().back();
auto gt_classes_lod = gt_classes->lod().back();
auto is_crowd_lod = is_crowd->lod().back();
auto gt_boxes_lod = gt_boxes->lod().back();
for (int i = 0; i < n; ++i) {
if (rpn_rois_lod[i] == rpn_rois_lod[i + 1]) {
lod0.emplace_back(num_rois);
continue;
}
Tensor rpn_rois_slice =
rpn_rois->Slice(rpn_rois_lod[i], rpn_rois_lod[i + 1]);
Tensor gt_classes_slice =
gt_classes->Slice(gt_classes_lod[i], gt_classes_lod[i + 1]);
Tensor is_crowd_slice =
is_crowd->Slice(is_crowd_lod[i], is_crowd_lod[i + 1]);
Tensor gt_boxes_slice =
gt_boxes->Slice(gt_boxes_lod[i], gt_boxes_lod[i + 1]);
Tensor im_info_slice = im_info->Slice(i, i + 1);
std::vector<Tensor> tensor_output =
SampleRoisForOneImage<T>(dev_ctx,
rpn_rois_slice,
gt_classes_slice,
is_crowd_slice,
gt_boxes_slice,
im_info_slice,
batch_size_per_im,
fg_fraction,
fg_thresh,
bg_thresh_hi,
bg_thresh_lo,
bbox_reg_weights,
class_nums,
engine,
use_random,
is_cls_agnostic);
Tensor sampled_rois = tensor_output[0];
Tensor sampled_labels_int32 = tensor_output[1];
Tensor sampled_bbox_targets = tensor_output[2];
Tensor sampled_bbox_inside_weights = tensor_output[3];
Tensor sampled_bbox_outside_weights = tensor_output[4];
AppendRois<T>(rois, kBoxDim * num_rois, &sampled_rois);
AppendRois<int>(labels_int32, num_rois, &sampled_labels_int32);
AppendRois<T>(
bbox_targets, kBoxDim * num_rois * class_nums, &sampled_bbox_targets);
AppendRois<T>(bbox_inside_weights,
kBoxDim * num_rois * class_nums,
&sampled_bbox_inside_weights);
AppendRois<T>(bbox_outside_weights,
kBoxDim * num_rois * class_nums,
&sampled_bbox_outside_weights);
num_rois += sampled_rois.dims()[0];
lod0.emplace_back(num_rois);
}
lod.emplace_back(lod0);
rois->set_lod(lod);
labels_int32->set_lod(lod);
bbox_targets->set_lod(lod);
bbox_inside_weights->set_lod(lod);
bbox_outside_weights->set_lod(lod);
rois->Resize({num_rois, kBoxDim});
labels_int32->Resize({num_rois, 1});
bbox_targets->Resize({num_rois, kBoxDim * class_nums});
bbox_inside_weights->Resize({num_rois, kBoxDim * class_nums});
bbox_outside_weights->Resize({num_rois, kBoxDim * class_nums});
}
};
class RRPNGenerateProposalLabelsOpMaker
: public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddInput("RpnRois",
"(LoDTensor), This input is a 2D LoDTensor with shape [N, 5]. "
"N is the number of the GenerateProposalOp's output, "
"each element is a bounding box with [x, y, w, h, angle] format.");
AddInput("GtClasses",
"(LoDTensor), This input is a 2D LoDTensor with shape [M, 1]. "
"M is the number of groundtruth, "
"each element is a class label of groundtruth.");
AddInput(
"IsCrowd",
"(LoDTensor), This input is a 2D LoDTensor with shape [M, 1]. "
"M is the number of groundtruth, "
"each element is a flag indicates whether a groundtruth is crowd.");
AddInput("GtBoxes",
"(LoDTensor), This input is a 2D LoDTensor with shape [M, 5. "
"M is the number of groundtruth, "
"each element is a bounding box with [x, y, w, h, angle] format.");
AddInput("ImInfo",
"(Tensor), This input is a 2D Tensor with shape [B, 3]. "
"B is the number of input images, "
"each element consists of im_height, im_width, im_scale.");
AddOutput(
"Rois",
"(LoDTensor), This output is a 2D LoDTensor with shape [P, 5]. "
"P usuall equal to batch_size_per_im * batch_size, "
"each element is a bounding box with [x, y, w, h ,angle] format.");
AddOutput("LabelsInt32",
"(LoDTensor), This output is a 2D LoDTensor with shape [P, 1], "
"each element repersents a class label of a roi");
AddOutput("BboxTargets",
"(LoDTensor), This output is a 2D LoDTensor with shape [P, 5 * "
"class_nums], "
"each element repersents a box label of a roi");
AddOutput(
"BboxInsideWeights",
"(LoDTensor), This output is a 2D LoDTensor with shape [P, 5 * "
"class_nums], "
"each element indicates whether a box should contribute to loss.");
AddOutput(
"BboxOutsideWeights",
"(LoDTensor), This output is a 2D LoDTensor with shape [P, 5 * "
"class_nums], "
"each element indicates whether a box should contribute to loss.");
AddAttr<int>("batch_size_per_im", "Batch size of rois per images.");
AddAttr<float>("fg_fraction",
"Foreground fraction in total batch_size_per_im.");
    AddAttr<float>(
        "fg_thresh",
        "Overlap threshold which is used to choose foreground samples.");
    AddAttr<float>("bg_thresh_hi",
                   "Overlap threshold upper bound which is used to choose "
                   "background samples.");
    AddAttr<float>("bg_thresh_lo",
                   "Overlap threshold lower bound which is used to choose "
                   "background samples.");
AddAttr<std::vector<float>>("bbox_reg_weights", "Box regression weights.");
AddAttr<int>("class_nums", "Class number.");
AddAttr<bool>(
"use_random",
"Use random sampling to choose foreground and background boxes.")
.SetDefault(true);
    AddAttr<bool>(
        "is_cls_agnostic",
        "If true, box regression is class-agnostic and only distinguishes "
        "foreground from background locations.")
        .SetDefault(false);
AddComment(R"DOC(
Given the rotated bounding boxes produced by RotatedGenerateProposalOp and the groundtruth,
this operator samples foreground and background boxes and computes their loss targets.
RpnRois are the output boxes of the RPN, processed by rotated_generate_proposal_op; these boxes
are combined with the groundtruth boxes and sampled according to batch_size_per_im and fg_fraction.
An instance whose overlap with the groundtruth is greater than fg_thresh is treated as a foreground sample,
while an instance whose overlap lies in [bg_thresh_lo, bg_thresh_hi) is treated as a background sample.
After all foreground and background boxes are chosen (the so-called RoIs),
random sampling ensures that the number of foreground boxes is no more than batch_size_per_im * fg_fraction.
For each box in the RoIs, the classification target (class label) and regression target (box label) are assigned.
Finally, BboxInsideWeights and BboxOutsideWeights specify whether a box contributes to the training loss.
)DOC");
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(
rrpn_generate_proposal_labels,
ops::RRPNGenerateProposalLabelsOp,
ops::RRPNGenerateProposalLabelsOpMaker,
paddle::framework::EmptyGradOpMaker<paddle::framework::OpDesc>,
paddle::framework::EmptyGradOpMaker<paddle::imperative::OpBase>);
REGISTER_OP_CPU_KERNEL(rrpn_generate_proposal_labels,
ops::RRPNGenerateProposalLabelsKernel<float>,
ops::RRPNGenerateProposalLabelsKernel<double>);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <cmath>
#include <cstring>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include "gather.h"
#include "math_function.h"
#include "paddle/fluid/framework/op_registry.h"
#include "safe_ref.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;
static const double kBBoxClipDefault = std::log(1000.0 / 16.0);
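// kBBoxClipDefault (log(1000 / 16)) caps the width/height deltas fed to exp()
// so that extreme regression outputs cannot overflow the decoded box size.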
#define PI 3.141592654
static void RRPNAppendProposals(Tensor *dst,
int64_t offset,
const Tensor &src) {
auto *out_data = dst->data<void>();
auto *to_add_data = src.data<void>();
size_t size_of_t = framework::SizeOfType(src.type());
offset *= size_of_t;
std::memcpy(
reinterpret_cast<void *>(reinterpret_cast<uintptr_t>(out_data) + offset),
to_add_data,
src.numel() * size_of_t);
}
template <class T>
inline T axr(T x, T r) {
return 0.5 * PI * r * r - x * sqrt(r * r - x * x) - r * r * std::asin(x / r);
}
class RRPNGenerateProposalsOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("Scores"), "Input(Scores) shouldn't be null.");
PADDLE_ENFORCE(ctx->HasInput("BboxDeltas"),
"Input(BboxDeltas) shouldn't be null.");
PADDLE_ENFORCE(ctx->HasInput("ImInfo"), "Input(ImInfo) shouldn't be null.");
PADDLE_ENFORCE(ctx->HasInput("Anchors"),
"Input(Anchors) shouldn't be null.");
PADDLE_ENFORCE(ctx->HasInput("Variances"),
"Input(Variances) shouldn't be null.");
ctx->SetOutputDim("RpnRois", {-1, 5});
ctx->SetOutputDim("RpnRoiProbs", {-1, 1});
}
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext &ctx) const override {
return framework::OpKernelType(ctx.Input<Tensor>("Anchors")->type(),
ctx.device_context());
}
};
template <class T>
static inline void RBoxCoder(const platform::DeviceContext &ctx,
Tensor *all_anchors,
Tensor *bbox_deltas,
Tensor *variances,
Tensor *proposals) {
T *proposals_data = proposals->mutable_data<T>(ctx.GetPlace());
int64_t row = all_anchors->dims()[0];
int64_t len = all_anchors->dims()[1];
auto *bbox_deltas_data = bbox_deltas->data<T>();
auto *anchor_data = all_anchors->data<T>();
const T *variances_data = nullptr;
if (variances) {
variances_data = variances->data<T>();
}
for (int64_t i = 0; i < row; ++i) {
T anchor_width = anchor_data[i * len + 2];
T anchor_height = anchor_data[i * len + 3];
T anchor_angle = anchor_data[i * len + 4];
T anchor_center_x = anchor_data[i * len];
T anchor_center_y = anchor_data[i * len + 1];
T bbox_center_x = 0, bbox_center_y = 0;
T bbox_width = 0, bbox_height = 0, bbox_angle = 0;
if (variances) {
bbox_center_x =
bbox_deltas_data[i * len] / variances_data[i * len] * anchor_width +
anchor_center_x;
bbox_center_y = bbox_deltas_data[i * len + 1] /
variances_data[i * len + 1] * anchor_height +
anchor_center_y;
bbox_width = std::exp(std::min<T>(bbox_deltas_data[i * len + 2] /
variances_data[i * len + 2],
kBBoxClipDefault)) *
anchor_width;
bbox_height = std::exp(std::min<T>(bbox_deltas_data[i * len + 3] /
variances_data[i * len + 3],
kBBoxClipDefault)) *
anchor_height;
bbox_angle =
(bbox_deltas_data[i * len + 4] / variances_data[i * len + 4]) * 1.0 /
PI * 180 +
anchor_angle;
} else {
bbox_center_x =
bbox_deltas_data[i * len] * anchor_width + anchor_center_x;
bbox_center_y =
bbox_deltas_data[i * len + 1] * anchor_height + anchor_center_y;
bbox_width = std::exp(std::min<T>(bbox_deltas_data[i * len + 2],
kBBoxClipDefault)) *
anchor_width;
bbox_height = std::exp(std::min<T>(bbox_deltas_data[i * len + 3],
kBBoxClipDefault)) *
anchor_height;
bbox_angle =
bbox_deltas_data[i * len + 4] * 1.0 / PI * 180 + anchor_angle;
}
proposals_data[i * len] = bbox_center_x;
proposals_data[i * len + 1] = bbox_center_y;
proposals_data[i * len + 2] = bbox_width;
proposals_data[i * len + 3] = bbox_height;
proposals_data[i * len + 4] = bbox_angle;
}
// return proposals;
}
template <class T>
static inline void RFilterBoxes(const platform::DeviceContext &ctx,
Tensor *boxes,
float min_size,
const Tensor &im_info,
Tensor *keep) {
T *boxes_data = boxes->mutable_data<T>(ctx.GetPlace());
keep->Resize({boxes->dims()[0]});
min_size = std::max(min_size, 0.0f);
int *keep_data = keep->mutable_data<int>(ctx.GetPlace());
int keep_len = 0;
for (int i = 0; i < boxes->dims()[0]; ++i) {
T ws = boxes_data[5 * i + 2];
T hs = boxes_data[5 * i + 3];
if (ws >= min_size && hs >= min_size) {
keep_data[keep_len++] = i;
}
}
keep->Resize({keep_len});
}
template <class T>
static inline std::vector<std::pair<T, int>> GetSortedScoreIndex(
const std::vector<T> &scores) {
std::vector<std::pair<T, int>> sorted_indices;
sorted_indices.reserve(scores.size());
for (size_t i = 0; i < scores.size(); ++i) {
sorted_indices.emplace_back(scores[i], i);
}
  // Sort the score pairs by score in ascending order; the highest score ends
  // up at the back, which is where RNMS pops candidates from.
std::stable_sort(sorted_indices.begin(),
sorted_indices.end(),
[](const std::pair<T, int> &a, const std::pair<T, int> &b) {
return a.first < b.first;
});
return sorted_indices;
}
template <typename T>
static inline Tensor VectorToTensor(const std::vector<T> &selected_indices,
int selected_num) {
Tensor keep_nms;
keep_nms.Resize({selected_num});
auto *keep_data = keep_nms.mutable_data<T>(platform::CPUPlace());
for (int i = 0; i < selected_num; ++i) {
keep_data[i] = selected_indices[i];
}
return keep_nms;
}
template <typename T>
inline T trangle_area(T *a, T *b, T *c) {
return ((a[0] - c[0]) * (b[1] - c[1]) - (a[1] - c[1]) * (b[0] - c[0])) / 2.0;
}
template <typename T>
inline T area(T *int_pts, int num_of_inter) {
float area = 0.0;
for (int i = 0; i < num_of_inter - 2; i++) {
area +=
fabs(trangle_area(int_pts, int_pts + 2 * i + 2, int_pts + 2 * i + 4));
}
return area;
}
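// reorder_pts orders the intersection points angularly around their centroid
// using a pseudo-angle key (simple insertion sort), so that the polygon area
// can then be computed with a triangle fan in area().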
template <typename T>
inline void reorder_pts(T *int_pts, int num_of_inter) {
if (num_of_inter > 0) {
float center[2];
center[0] = 0.0;
center[1] = 0.0;
for (int i = 0; i < num_of_inter; i++) {
center[0] += int_pts[2 * i];
center[1] += int_pts[2 * i + 1];
}
center[0] /= num_of_inter;
center[1] /= num_of_inter;
float vs[16];
float v[2];
float d;
for (int i = 0; i < num_of_inter; i++) {
v[0] = int_pts[2 * i] - center[0];
v[1] = int_pts[2 * i + 1] - center[1];
d = sqrt(v[0] * v[0] + v[1] * v[1]);
v[0] = v[0] / d;
v[1] = v[1] / d;
if (v[1] < 0) {
v[0] = -2 - v[0];
}
vs[i] = v[0];
}
float temp, tx, ty;
int j;
for (int i = 1; i < num_of_inter; ++i) {
if (vs[i - 1] > vs[i]) {
temp = vs[i];
tx = int_pts[2 * i];
ty = int_pts[2 * i + 1];
j = i;
while (j > 0 && vs[j - 1] > temp) {
vs[j] = vs[j - 1];
int_pts[j * 2] = int_pts[j * 2 - 2];
int_pts[j * 2 + 1] = int_pts[j * 2 - 1];
j--;
}
vs[j] = temp;
int_pts[j * 2] = tx;
int_pts[j * 2 + 1] = ty;
}
}
}
}
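// inter2line tests whether edge i of pts1 intersects edge j of pts2 using
// signed triangle areas; if they do, the intersection point is written to
// temp_pts.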
template <typename T>
inline bool inter2line(T *pts1, T *pts2, int i, int j, T *temp_pts) {
T a[2];
T b[2];
T c[2];
T d[2];
T area_abc, area_abd, area_cda, area_cdb;
a[0] = pts1[2 * i];
a[1] = pts1[2 * i + 1];
b[0] = pts1[2 * ((i + 1) % 4)];
b[1] = pts1[2 * ((i + 1) % 4) + 1];
c[0] = pts2[2 * j];
c[1] = pts2[2 * j + 1];
d[0] = pts2[2 * ((j + 1) % 4)];
d[1] = pts2[2 * ((j + 1) % 4) + 1];
area_abc = trangle_area(a, b, c);
area_abd = trangle_area(a, b, d);
if (area_abc * area_abd >= 0) {
return false;
}
area_cda = trangle_area(c, d, a);
area_cdb = area_cda + area_abc - area_abd;
if (area_cda * area_cdb >= 0) {
return false;
}
float t = area_cda / (area_abd - area_abc);
float dx = t * (b[0] - a[0]);
float dy = t * (b[1] - a[1]);
temp_pts[0] = a[0] + dx;
temp_pts[1] = a[1] + dy;
return true;
}
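// in_rect checks whether point (pt_x, pt_y) lies inside the quadrilateral pts
// by projecting the point onto the two edge vectors AB and AD and requiring
// both projections to fall within the corresponding edge lengths.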
template <typename T>
inline bool in_rect(T pt_x, T pt_y, T *pts) {
float ab[2];
float ad[2];
float ap[2];
float abab;
float abap;
float adad;
float adap;
ab[0] = pts[2] - pts[0];
ab[1] = pts[3] - pts[1];
ad[0] = pts[6] - pts[0];
ad[1] = pts[7] - pts[1];
ap[0] = pt_x - pts[0];
ap[1] = pt_y - pts[1];
abab = ab[0] * ab[0] + ab[1] * ab[1];
abap = ab[0] * ap[0] + ab[1] * ap[1];
adad = ad[0] * ad[0] + ad[1] * ad[1];
adap = ad[0] * ap[0] + ad[1] * ap[1];
return abab >= abap and abap >= 0 and adad >= adap and adap >= 0;
}
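// inter_pts collects the vertices of the intersection polygon of two rotated
// rectangles: corners of either rectangle that lie inside the other, plus all
// pairwise edge-edge intersection points (at most 8 points / 16 coordinates).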
template <typename T>
inline int inter_pts(T *pts1, T *pts2, T *int_pts) {
int num_of_inter = 0;
for (int i = 0; i < 4; i++) {
if (in_rect(pts1[2 * i], pts1[2 * i + 1], pts2)) {
int_pts[num_of_inter * 2] = pts1[2 * i];
int_pts[num_of_inter * 2 + 1] = pts1[2 * i + 1];
num_of_inter++;
}
if (in_rect(pts2[2 * i], pts2[2 * i + 1], pts1)) {
int_pts[num_of_inter * 2] = pts2[2 * i];
int_pts[num_of_inter * 2 + 1] = pts2[2 * i + 1];
num_of_inter++;
}
}
T temp_pts[2];
for (int i = 0; i < 4; i++) {
for (int j = 0; j < 4; j++) {
bool has_pts = inter2line(pts1, pts2, i, j, temp_pts);
if (has_pts) {
int_pts[num_of_inter * 2] = temp_pts[0];
int_pts[num_of_inter * 2 + 1] = temp_pts[1];
num_of_inter++;
}
}
}
return num_of_inter;
}
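// convert_region expands a rotated box [x_ctr, y_ctr, w, h, angle(deg)] into
// its four corner points by rotating the axis-aligned corners about the box
// center.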
template <typename T>
inline void convert_region(T *pts, const T *region) {
float angle = region[4];
float a_cos = cos(angle / 180.0 * PI);
float a_sin = -sin(angle / 180.0 * PI); // anti clock-wise
float ctr_x = region[0];
float ctr_y = region[1];
float h = region[3];
float w = region[2];
float pts_x[4];
float pts_y[4];
pts_x[0] = -w / 2;
pts_x[1] = -w / 2;
pts_x[2] = w / 2;
pts_x[3] = w / 2;
pts_y[0] = -h / 2;
pts_y[1] = h / 2;
pts_y[2] = h / 2;
pts_y[3] = -h / 2;
for (int i = 0; i < 4; i++) {
pts[2 * i] = a_cos * pts_x[i] - a_sin * pts_y[i] + ctr_x;
pts[2 * i + 1] = a_sin * pts_x[i] + a_cos * pts_y[i] + ctr_y;
}
}
template <typename T>
inline float inter(const T *region1, const T *region2) {
T pts1[8];
T pts2[8];
T int_pts[16];
int num_of_inter;
convert_region<T>(pts1, region1);
convert_region<T>(pts2, region2);
num_of_inter = inter_pts<T>(pts1, pts2, int_pts);
reorder_pts<T>(int_pts, num_of_inter);
return area<T>(int_pts, num_of_inter);
}
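// DevRotateIoU computes the IoU of two rotated boxes: the intersection area
// comes from the clipped polygon built above, and the union is
// area1 + area2 - intersection.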
template <typename T>
inline float DevRotateIoU(const T *region1, const T *region2) {
T area1 = region1[2] * region1[3];
T area2 = region2[2] * region2[3];
T area_inter = inter<T>(region1, region2);
return area_inter / (area1 + area2 - area_inter);
}
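// RNMS performs greedy non-maximum suppression over rotated boxes: candidates
// are visited from highest to lowest score, and a candidate is kept only if
// its rotated IoU with every previously kept box does not exceed
// nms_threshold.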
template <class T>
static inline Tensor RNMS(const platform::DeviceContext &ctx,
Tensor *bbox,
Tensor *scores,
T nms_threshold) {
PADDLE_ENFORCE_NOT_NULL(bbox);
int64_t num_boxes = bbox->dims()[0];
  // 5: [x_ctr, y_ctr, w, h, angle]
int64_t box_size = bbox->dims()[1];
std::vector<T> scores_data(num_boxes);
std::copy_n(scores->data<T>(), num_boxes, scores_data.begin());
std::vector<std::pair<T, int>> sorted_indices =
GetSortedScoreIndex<T>(scores_data);
std::vector<int> selected_indices;
int selected_num = 0;
T adaptive_threshold = nms_threshold;
const T *bbox_data = bbox->data<T>();
while (sorted_indices.size() != 0) {
int idx = sorted_indices.back().second;
bool flag = true;
for (int kept_idx : selected_indices) {
if (flag) {
T overlap = DevRotateIoU<T>(bbox_data + idx * box_size,
bbox_data + kept_idx * box_size);
flag = (overlap <= adaptive_threshold);
} else {
break;
}
}
if (flag) {
selected_indices.push_back(idx);
++selected_num;
}
sorted_indices.erase(sorted_indices.end() - 1);
}
return VectorToTensor(selected_indices, selected_num);
}
template <typename T>
class RRPNGenerateProposalsKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext &context) const override {
auto *scores = context.Input<Tensor>("Scores");
auto *bbox_deltas = context.Input<Tensor>("BboxDeltas");
auto *im_info = context.Input<Tensor>("ImInfo");
auto anchors = detail::Ref(context.Input<Tensor>("Anchors"),
"Cannot find input Anchors(%s) in scope",
context.InputNames("Anchors")[0]);
auto variances = detail::Ref(context.Input<Tensor>("Variances"),
"Cannot find input Variances(%s) in scope",
context.InputNames("Variances")[0]);
auto *rpn_rois = context.Output<LoDTensor>("RpnRois");
auto *rpn_roi_probs = context.Output<LoDTensor>("RpnRoiProbs");
int pre_nms_top_n = context.Attr<int>("pre_nms_topN");
int post_nms_top_n = context.Attr<int>("post_nms_topN");
float nms_thresh = context.Attr<float>("nms_thresh");
float min_size = context.Attr<float>("min_size");
auto &dev_ctx =
context.template device_context<platform::CPUDeviceContext>();
auto &scores_dim = scores->dims();
int64_t num = scores_dim[0];
int64_t c_score = scores_dim[1];
int64_t h_score = scores_dim[2];
int64_t w_score = scores_dim[3];
auto &bbox_dim = bbox_deltas->dims();
int64_t c_bbox = bbox_dim[1];
int64_t h_bbox = bbox_dim[2];
int64_t w_bbox = bbox_dim[3];
rpn_rois->mutable_data<T>({bbox_deltas->numel() / 5, 5},
context.GetPlace());
rpn_roi_probs->mutable_data<T>({scores->numel(), 1}, context.GetPlace());
Tensor bbox_deltas_swap, scores_swap;
bbox_deltas_swap.mutable_data<T>({num, h_bbox, w_bbox, c_bbox},
dev_ctx.GetPlace());
scores_swap.mutable_data<T>({num, h_score, w_score, c_score},
dev_ctx.GetPlace());
math::Transpose<platform::CPUDeviceContext, T, 4> trans;
std::vector<int> axis = {0, 2, 3, 1};
trans(dev_ctx, *bbox_deltas, &bbox_deltas_swap, axis);
trans(dev_ctx, *scores, &scores_swap, axis);
framework::LoD lod;
lod.resize(1);
auto &lod0 = lod[0];
lod0.push_back(0);
anchors.Resize({anchors.numel() / 5, 5});
variances.Resize({variances.numel() / 5, 5});
int64_t num_proposals = 0;
for (int64_t i = 0; i < num; ++i) {
Tensor im_info_slice = im_info->Slice(i, i + 1);
Tensor bbox_deltas_slice = bbox_deltas_swap.Slice(i, i + 1);
Tensor scores_slice = scores_swap.Slice(i, i + 1);
bbox_deltas_slice.Resize({h_bbox * w_bbox * c_bbox / 5, 5});
scores_slice.Resize({h_score * w_score * c_score, 1});
std::pair<Tensor, Tensor> tensor_pair =
ProposalForOneImage(dev_ctx,
im_info_slice,
anchors,
variances,
bbox_deltas_slice,
scores_slice,
pre_nms_top_n,
post_nms_top_n,
nms_thresh,
min_size);
Tensor &proposals = tensor_pair.first;
Tensor &scores = tensor_pair.second;
RRPNAppendProposals(rpn_rois, 5 * num_proposals, proposals);
RRPNAppendProposals(rpn_roi_probs, num_proposals, scores);
num_proposals += proposals.dims()[0];
lod0.push_back(num_proposals);
}
rpn_rois->set_lod(lod);
rpn_roi_probs->set_lod(lod);
rpn_rois->Resize({num_proposals, 5});
rpn_roi_probs->Resize({num_proposals, 1});
}
std::pair<Tensor, Tensor> ProposalForOneImage(
const platform::CPUDeviceContext &ctx,
const Tensor &im_info_slice,
const Tensor &anchors,
const Tensor &variances,
const Tensor &bbox_deltas_slice, // [M, 5]
const Tensor &scores_slice, // [N, 1]
int pre_nms_top_n,
int post_nms_top_n,
float nms_thresh,
float min_size) const {
auto *scores_data = scores_slice.data<T>();
// Sort index
Tensor index_t;
index_t.Resize({scores_slice.numel()});
int *index = index_t.mutable_data<int>(ctx.GetPlace());
for (int i = 0; i < scores_slice.numel(); ++i) {
index[i] = i;
}
auto compare = [scores_data](const int64_t &i, const int64_t &j) {
return scores_data[i] > scores_data[j];
};
if (pre_nms_top_n <= 0 || pre_nms_top_n >= scores_slice.numel()) {
std::sort(index, index + scores_slice.numel(), compare);
} else {
std::nth_element(
index, index + pre_nms_top_n, index + scores_slice.numel(), compare);
index_t.Resize({pre_nms_top_n});
}
Tensor scores_sel, bbox_sel, anchor_sel, var_sel;
scores_sel.mutable_data<T>({index_t.numel(), 1}, ctx.GetPlace());
bbox_sel.mutable_data<T>({index_t.numel(), 5}, ctx.GetPlace());
anchor_sel.mutable_data<T>({index_t.numel(), 5}, ctx.GetPlace());
var_sel.mutable_data<T>({index_t.numel(), 5}, ctx.GetPlace());
CPUGather<T>(ctx, scores_slice, index_t, &scores_sel);
CPUGather<T>(ctx, bbox_deltas_slice, index_t, &bbox_sel);
CPUGather<T>(ctx, anchors, index_t, &anchor_sel);
CPUGather<T>(ctx, variances, index_t, &var_sel);
auto *scores_ = scores_sel.data<T>();
Tensor proposals;
proposals.mutable_data<T>({index_t.numel(), 5}, ctx.GetPlace());
RBoxCoder<T>(ctx, &anchor_sel, &bbox_sel, &var_sel, &proposals);
Tensor keep;
RFilterBoxes<T>(ctx, &proposals, min_size, im_info_slice, &keep);
Tensor scores_filter;
bbox_sel.mutable_data<T>({keep.numel(), 5}, ctx.GetPlace());
scores_filter.mutable_data<T>({keep.numel(), 1}, ctx.GetPlace());
CPUGather<T>(ctx, proposals, keep, &bbox_sel);
CPUGather<T>(ctx, scores_sel, keep, &scores_filter);
if (nms_thresh <= 0) {
return std::make_pair(bbox_sel, scores_filter);
}
Tensor keep_nms = RNMS<T>(ctx, &bbox_sel, &scores_filter, nms_thresh);
if (post_nms_top_n > 0 && post_nms_top_n < keep_nms.numel()) {
keep_nms.Resize({post_nms_top_n});
}
proposals.mutable_data<T>({keep_nms.numel(), 5}, ctx.GetPlace());
scores_sel.mutable_data<T>({keep_nms.numel(), 1}, ctx.GetPlace());
CPUGather<T>(ctx, bbox_sel, keep_nms, &proposals);
CPUGather<T>(ctx, scores_filter, keep_nms, &scores_sel);
return std::make_pair(proposals, scores_sel);
}
};
class RRPNGenerateProposalsOpMaker : public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddInput("Scores",
"(Tensor) The scores from conv is in shape (N, A, H, W), "
"N is batch size, A is number of anchors, "
"H and W are height and width of the feature map");
AddInput("BboxDeltas",
"(Tensor) Bounding box deltas from conv is in "
"shape (N, 5*A, H, W).");
AddInput("ImInfo",
"(Tensor) Information for image reshape is in shape (N, 3), "
"in format (height, width, scale)");
AddInput("Anchors",
"(Tensor) Bounding box anchors from anchor_generator_op "
"is in shape (A, H, W, 5).");
AddInput("Variances",
"(Tensor) Bounding box variances with same shape as `Anchors`.");
AddOutput("RpnRois",
"(LoDTensor), Output proposals with shape (rois_num, 5).");
AddOutput("RpnRoiProbs",
"(LoDTensor) Scores of proposals with shape (rois_num, 1).");
AddAttr<int>("pre_nms_topN",
"Number of top scoring RPN proposals to keep before "
"applying NMS.");
AddAttr<int>("post_nms_topN",
"Number of top scoring RPN proposals to keep after "
"applying NMS");
AddAttr<float>("nms_thresh", "NMS threshold used on RPN proposals.");
AddAttr<float>("min_size",
"Proposal height and width both need to be greater "
"than this min_size.");
AddComment(R"DOC(
This operator generates rotated bounding box proposals for Faster RCNN.
The proposals are generated for a list of images based on the objectness
score 'Scores', the bounding box regression result 'BboxDeltas' and the
predefined rotated anchor boxes 'Anchors'. Greedy non-maximum suppression
is applied to produce the final bounding boxes.
)DOC");
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(
rrpn_generate_proposals,
ops::RRPNGenerateProposalsOp,
ops::RRPNGenerateProposalsOpMaker,
paddle::framework::EmptyGradOpMaker<paddle::framework::OpDesc>,
paddle::framework::EmptyGradOpMaker<paddle::imperative::OpBase>);
REGISTER_OP_CPU_KERNEL(rrpn_generate_proposals,
ops::RRPNGenerateProposalsKernel<float>,
ops::RRPNGenerateProposalsKernel<double>);
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Based on
--------------------------------------------------------
@misc{ma2019rrpn,
author = {Jianqi Ma},
title = {{RRPN in pytorch}},
year = {2019},
howpublished = {\url{https://github.com/mjq11302010044/RRPN_pytorch}},
}
@article{Jianqi17RRPN,
Author = {Jianqi Ma and Weiyuan Shao and Hao Ye and Li Wang and Hong Wang
and Yingbin Zheng and Xiangyang Xue},
Title = {Arbitrary-Oriented Scene Text Detection via Rotation Proposals},
journal = {IEEE Transactions on Multimedia},
volume={20},
number={11},
pages={3111-3122},
year={2018}
}
--------------------------------------------------------
*/
#include <paddle/fluid/memory/allocation/allocator.h>
#include <stdio.h>
#include <string>
#include <vector>
#include "cub/cub/cub.cuh"
#include "gather.cu.h"
#include "math_function.h"
#include "paddle/fluid/framework/mixed_vector.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/memory/memory.h"
#include "paddle/fluid/platform/for_range.h"
#include "safe_ref.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;
#define PI 3.141592654
namespace {
#define DIVUP(m, n) ((m) / (n) + ((m) % (n) > 0))
#define CUDA_1D_KERNEL_LOOP(i, n) \
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); \
i += blockDim.x * gridDim.x)
int const kThreadsPerBlock = sizeof(uint64_t) * 8;
static const double kBBoxClipDefault = std::log(1000.0 / 16.0);
struct RangeInitFunctor {
int start_;
int delta_;
int *out_;
__device__ void operator()(size_t i) { out_[i] = start_ + i * delta_; }
};
template <typename T>
static void RSortDescending(const platform::CUDADeviceContext &ctx,
const Tensor &value,
Tensor *value_out,
Tensor *index_out) {
int num = static_cast<int>(value.numel());
Tensor index_in_t;
int *idx_in = index_in_t.mutable_data<int>({num}, ctx.GetPlace());
platform::ForRange<platform::CUDADeviceContext> for_range(ctx, num);
for_range(RangeInitFunctor{0, 1, idx_in});
int *idx_out = index_out->mutable_data<int>({num}, ctx.GetPlace());
const T *keys_in = value.data<T>();
T *keys_out = value_out->mutable_data<T>({num}, ctx.GetPlace());
// Determine temporary device storage requirements
size_t temp_storage_bytes = 0;
cub::DeviceRadixSort::SortPairsDescending<T, int>(
nullptr, temp_storage_bytes, keys_in, keys_out, idx_in, idx_out, num);
// Allocate temporary storage
auto place = boost::get<platform::CUDAPlace>(ctx.GetPlace());
auto d_temp_storage = memory::Alloc(place, temp_storage_bytes);
// Run sorting operation
cub::DeviceRadixSort::SortPairsDescending<T, int>(d_temp_storage->ptr(),
temp_storage_bytes,
keys_in,
keys_out,
idx_in,
idx_out,
num);
}
template <typename T>
struct RBoxDecodeAndClipFunctor {
const T *anchor;
const T *deltas;
const T *var;
const int *index;
const T *im_info;
T *proposals;
RBoxDecodeAndClipFunctor(const T *anchor,
const T *deltas,
const T *var,
const int *index,
const T *im_info,
T *proposals)
: anchor(anchor),
deltas(deltas),
var(var),
index(index),
im_info(im_info),
proposals(proposals) {}
T bbox_clip_default{static_cast<T>(kBBoxClipDefault)};
__device__ void operator()(size_t i) {
int k = index[i] * 5;
T w = anchor[k + 2];
T h = anchor[k + 3];
T cx = anchor[k];
T cy = anchor[k + 1];
T angle = anchor[k + 4];
T de_cx = deltas[k];
T de_cy = deltas[k + 1];
T de_w = deltas[k + 2];
T de_h = deltas[k + 3];
T de_g = deltas[k + 4];
T d_cx, d_cy, d_w, d_h, d_g;
if (var) {
d_cx = cx + de_cx * w / var[k];
d_cy = cy + de_cy * h / var[k + 1];
d_w = exp(Min(de_w / var[k + 2], bbox_clip_default)) * w;
d_h = exp(Min(de_h / var[k + 3], bbox_clip_default)) * h;
d_g = de_g / var[k + 4] * 1.0 / PI * 180 + angle;
} else {
d_cx = cx + de_cx * w;
d_cy = cy + de_cy * h;
d_w = exp(Min(de_w, bbox_clip_default)) * w;
d_h = exp(Min(de_h, bbox_clip_default)) * h;
d_g = de_g * 1.0 / PI * 180 + angle;
}
proposals[i * 5] = d_cx;
proposals[i * 5 + 1] = d_cy;
proposals[i * 5 + 2] = d_w;
proposals[i * 5 + 3] = d_h;
proposals[i * 5 + 4] = d_g;
}
__device__ __forceinline__ T Min(T a, T b) const { return a > b ? b : a; }
__device__ __forceinline__ T Max(T a, T b) const { return a > b ? a : b; }
};
template <typename T, int BlockSize>
static __global__ void RFilterBBoxes(const T *bboxes,
const T *im_info,
const T min_size,
const int num,
int *keep_num,
int *keep) {
T im_h = im_info[0];
T im_w = im_info[1];
T im_scale = im_info[2];
int cnt = 0;
__shared__ int keep_index[BlockSize];
CUDA_1D_KERNEL_LOOP(i, num) {
keep_index[threadIdx.x] = -1;
__syncthreads();
int k = i * 5;
T cx = bboxes[k];
T cy = bboxes[k + 1];
T w_s = bboxes[k + 2];
T h_s = bboxes[k + 3];
if (w_s >= min_size && h_s >= min_size) {
keep_index[threadIdx.x] = i;
}
__syncthreads();
if (threadIdx.x == 0) {
int size = (num - i) < BlockSize ? num - i : BlockSize;
for (int j = 0; j < size; ++j) {
if (keep_index[j] > -1) {
keep[cnt++] = keep_index[j];
}
}
}
__syncthreads();
}
if (threadIdx.x == 0) {
keep_num[0] = cnt;
}
}
__device__ inline float trangle_area(float *a, float *b, float *c) {
return ((a[0] - c[0]) * (b[1] - c[1]) - (a[1] - c[1]) * (b[0] - c[0])) / 2.0;
}
__device__ inline float area(float *int_pts, int num_of_inter) {
float area = 0.0;
for (int i = 0; i < num_of_inter - 2; i++) {
area +=
fabs(trangle_area(int_pts, int_pts + 2 * i + 2, int_pts + 2 * i + 4));
}
return area;
}
__device__ inline void reorder_pts(float *int_pts, int num_of_inter) {
if (num_of_inter > 0) {
float center[2] = {0.0, 0.0};
for (int i = 0; i < num_of_inter; i++) {
center[0] += int_pts[2 * i];
center[1] += int_pts[2 * i + 1];
}
center[0] /= num_of_inter;
center[1] /= num_of_inter;
float vs[16];
float v[2];
float d;
for (int i = 0; i < num_of_inter; i++) {
v[0] = int_pts[2 * i] - center[0];
v[1] = int_pts[2 * i + 1] - center[1];
d = sqrt(v[0] * v[0] + v[1] * v[1]);
v[0] = v[0] / d;
v[1] = v[1] / d;
if (v[1] < 0) {
v[0] = -2 - v[0];
}
vs[i] = v[0];
}
float temp, tx, ty;
int j;
for (int i = 1; i < num_of_inter; ++i) {
if (vs[i - 1] > vs[i]) {
temp = vs[i];
tx = int_pts[2 * i];
ty = int_pts[2 * i + 1];
j = i;
while (j > 0 && vs[j - 1] > temp) {
vs[j] = vs[j - 1];
int_pts[j * 2] = int_pts[j * 2 - 2];
int_pts[j * 2 + 1] = int_pts[j * 2 - 1];
j--;
}
vs[j] = temp;
int_pts[j * 2] = tx;
int_pts[j * 2 + 1] = ty;
}
}
}
}
__device__ inline bool inter2line(
float *pts1, float *pts2, int i, int j, float *temp_pts) {
float a[2] = {pts1[2 * i], pts1[2 * i + 1]};
float b[2] = {pts1[2 * ((i + 1) % 4)], pts1[2 * ((i + 1) % 4) + 1]};
float c[2] = {pts2[2 * j], pts2[2 * j + 1]};
float d[2] = {pts2[2 * ((j + 1) % 4)], pts2[2 * ((j + 1) % 4) + 1]};
float area_abc = trangle_area(a, b, c);
float area_abd = trangle_area(a, b, d);
if (area_abc * area_abd >= 0) {
return false;
}
float area_cda = trangle_area(c, d, a);
float area_cdb = area_cda + area_abc - area_abd;
if (area_cda * area_cdb >= 0) {
return false;
}
float t = area_cda / (area_abd - area_abc);
float dx = t * (b[0] - a[0]);
float dy = t * (b[1] - a[1]);
temp_pts[0] = a[0] + dx;
temp_pts[1] = a[1] + dy;
return true;
}
__device__ inline bool in_rect(float pt_x, float pt_y, float *pts) {
float ab[2] = {pts[2] - pts[0], pts[3] - pts[1]};
float ad[2] = {pts[6] - pts[0], pts[7] - pts[1]};
float ap[2] = {pt_x - pts[0], pt_y - pts[1]};
float abab = ab[0] * ab[0] + ab[1] * ab[1];
float abap = ab[0] * ap[0] + ab[1] * ap[1];
float adad = ad[0] * ad[0] + ad[1] * ad[1];
float adap = ad[0] * ap[0] + ad[1] * ap[1];
  return abab >= abap && abap >= 0 && adad >= adap && adap >= 0;
}
__device__ inline int inter_pts(float *pts1, float *pts2, float *int_pts) {
int num_of_inter = 0;
for (int i = 0; i < 4; i++) {
if (in_rect(pts1[2 * i], pts1[2 * i + 1], pts2)) {
int_pts[num_of_inter * 2] = pts1[2 * i];
int_pts[num_of_inter * 2 + 1] = pts1[2 * i + 1];
num_of_inter++;
}
if (in_rect(pts2[2 * i], pts2[2 * i + 1], pts1)) {
int_pts[num_of_inter * 2] = pts2[2 * i];
int_pts[num_of_inter * 2 + 1] = pts2[2 * i + 1];
num_of_inter++;
}
}
float temp_pts[2];
for (int i = 0; i < 4; i++) {
for (int j = 0; j < 4; j++) {
bool has_pts = inter2line(pts1, pts2, i, j, temp_pts);
if (has_pts) {
int_pts[num_of_inter * 2] = temp_pts[0];
int_pts[num_of_inter * 2 + 1] = temp_pts[1];
num_of_inter++;
}
}
}
return num_of_inter;
}
__device__ inline void convert_region(float *pts, const float *region) {
float angle = region[4];
float a_cos = cos(angle / 180.0 * PI);
  float a_sin = -sin(angle / 180.0 * PI);  // anti-clockwise
float ctr_x = region[0];
float ctr_y = region[1];
float h = region[3];
float w = region[2];
float pts_x[4] = {-w / 2, -w / 2, w / 2, w / 2};
float pts_y[4] = {-h / 2, h / 2, h / 2, -h / 2};
for (int i = 0; i < 4; i++) {
pts[2 * i] = a_cos * pts_x[i] - a_sin * pts_y[i] + ctr_x;
pts[2 * i + 1] = a_sin * pts_x[i] + a_cos * pts_y[i] + ctr_y;
}
}
__device__ inline float inter(const float *region1, const float *region2) {
float pts1[8];
float pts2[8];
float int_pts[16];
int num_of_inter;
convert_region(pts1, region1);
convert_region(pts2, region2);
num_of_inter = inter_pts(pts1, pts2, int_pts);
reorder_pts(int_pts, num_of_inter);
return area(int_pts, num_of_inter);
}
__device__ inline float IoU(const float *region1, const float *region2) {
float area1 = region1[2] * region1[3];
float area2 = region2[2] * region2[3];
float area_inter = inter(region1, region2);
return area_inter / (area1 + area2 - area_inter);
}
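// Rotated NMS: each (row, col) thread block loads up to kThreadsPerBlock
// column boxes into shared memory, compares them against its row boxes, and
// records every pair with IoU > nms_overlap_thresh as a bit in dev_mask.
// RNMS() then walks the boxes in descending score order on the CPU and keeps a
// box only if it is not suppressed by any previously kept box.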
static __global__ void RNMSKernel(const int n_boxes,
const float nms_overlap_thresh,
const float *dev_boxes,
uint64_t *dev_mask) {
const int row_start = blockIdx.y;
const int col_start = blockIdx.x;
const int row_size =
min(n_boxes - row_start * kThreadsPerBlock, kThreadsPerBlock);
const int col_size =
min(n_boxes - col_start * kThreadsPerBlock, kThreadsPerBlock);
__shared__ float block_boxes[kThreadsPerBlock * 5];
if (threadIdx.x < col_size) {
block_boxes[threadIdx.x * 5 + 0] =
dev_boxes[(kThreadsPerBlock * col_start + threadIdx.x) * 5 + 0];
block_boxes[threadIdx.x * 5 + 1] =
dev_boxes[(kThreadsPerBlock * col_start + threadIdx.x) * 5 + 1];
block_boxes[threadIdx.x * 5 + 2] =
dev_boxes[(kThreadsPerBlock * col_start + threadIdx.x) * 5 + 2];
block_boxes[threadIdx.x * 5 + 3] =
dev_boxes[(kThreadsPerBlock * col_start + threadIdx.x) * 5 + 3];
block_boxes[threadIdx.x * 5 + 4] =
dev_boxes[(kThreadsPerBlock * col_start + threadIdx.x) * 5 + 4];
}
__syncthreads();
if (threadIdx.x < row_size) {
const int cur_box_idx = kThreadsPerBlock * row_start + threadIdx.x;
const float *cur_box = dev_boxes + cur_box_idx * 5;
int i = 0;
uint64_t t = 0;
int start = 0;
if (row_start == col_start) {
start = threadIdx.x + 1;
}
for (i = start; i < col_size; i++) {
if (IoU(cur_box, block_boxes + i * 5) > nms_overlap_thresh) {
t |= 1ULL << i;
}
}
const int col_blocks = DIVUP(n_boxes, kThreadsPerBlock);
dev_mask[cur_box_idx * col_blocks + col_start] = t;
}
}
template <typename T>
static void RNMS(const platform::CUDADeviceContext &ctx,
const Tensor &proposals,
const Tensor &sorted_indices,
const T nms_threshold,
Tensor *keep_out) {
int boxes_num = proposals.dims()[0];
PADDLE_ENFORCE_EQ(boxes_num, sorted_indices.dims()[0]);
const int col_blocks = DIVUP(boxes_num, kThreadsPerBlock);
dim3 blocks(DIVUP(boxes_num, kThreadsPerBlock),
DIVUP(boxes_num, kThreadsPerBlock));
dim3 threads(kThreadsPerBlock);
const T *boxes = proposals.data<T>();
auto place = boost::get<platform::CUDAPlace>(ctx.GetPlace());
framework::Vector<uint64_t> mask(boxes_num * col_blocks);
RNMSKernel<<<blocks, threads>>>(
boxes_num,
nms_threshold,
boxes,
mask.CUDAMutableData(boost::get<platform::CUDAPlace>(ctx.GetPlace())));
std::vector<uint64_t> remv(col_blocks);
memset(&remv[0], 0, sizeof(uint64_t) * col_blocks);
std::vector<int> keep_vec;
int num_to_keep = 0;
for (int i = 0; i < boxes_num; i++) {
int nblock = i / kThreadsPerBlock;
int inblock = i % kThreadsPerBlock;
if (!(remv[nblock] & (1ULL << inblock))) {
++num_to_keep;
keep_vec.push_back(i);
uint64_t *p = &mask[0] + i * col_blocks;
for (int j = nblock; j < col_blocks; j++) {
remv[j] |= p[j];
}
}
}
int *keep = keep_out->mutable_data<int>({num_to_keep}, ctx.GetPlace());
memory::Copy(place,
keep,
platform::CPUPlace(),
keep_vec.data(),
sizeof(int) * num_to_keep,
ctx.stream());
ctx.Wait();
}
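// Proposal generation for a single image:
//   1. sort anchor scores in descending order and keep the top pre_nms_top_n;
//   2. decode bbox deltas into rotated proposals (RBoxDecodeAndClipFunctor);
//   3. filter out proposals smaller than min_size (RFilterBBoxes);
//   4. run rotated NMS and keep at most post_nms_top_n proposals.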
template <typename T>
static std::pair<Tensor, Tensor> RRPNProposalForOneImage(
const platform::CUDADeviceContext &ctx,
const Tensor &im_info,
const Tensor &anchors,
const Tensor &variances,
const Tensor &bbox_deltas, // [M, 5]
const Tensor &scores, // [N, 1]
int pre_nms_top_n,
int post_nms_top_n,
float nms_thresh,
float min_size) {
// 1. pre nms
Tensor scores_sort, index_sort;
RSortDescending<T>(ctx, scores, &scores_sort, &index_sort);
int num = scores.numel();
int pre_nms_num = (pre_nms_top_n <= 0 || pre_nms_top_n > num) ? scores.numel()
: pre_nms_top_n;
scores_sort.Resize({pre_nms_num, 1});
index_sort.Resize({pre_nms_num, 1});
// 2. box decode and clipping
Tensor proposals;
proposals.mutable_data<T>({pre_nms_num, 5}, ctx.GetPlace());
{
platform::ForRange<platform::CUDADeviceContext> for_range(ctx, pre_nms_num);
for_range(RBoxDecodeAndClipFunctor<T>{anchors.data<T>(),
bbox_deltas.data<T>(),
variances.data<T>(),
index_sort.data<int>(),
im_info.data<T>(),
proposals.data<T>()});
}
// 3. filter
Tensor keep_index, keep_num_t;
keep_index.mutable_data<int>({pre_nms_num}, ctx.GetPlace());
keep_num_t.mutable_data<int>({1}, ctx.GetPlace());
min_size = std::max(min_size, 0.0f);
auto stream = ctx.stream();
RFilterBBoxes<T, 256><<<1, 256, 0, stream>>>(proposals.data<T>(),
im_info.data<T>(),
min_size,
pre_nms_num,
keep_num_t.data<int>(),
keep_index.data<int>());
int keep_num;
const auto gpu_place = boost::get<platform::CUDAPlace>(ctx.GetPlace());
memory::Copy(platform::CPUPlace(),
&keep_num,
gpu_place,
keep_num_t.data<int>(),
sizeof(int),
ctx.stream());
ctx.Wait();
keep_index.Resize({keep_num});
Tensor scores_filter, proposals_filter;
proposals_filter.mutable_data<T>({keep_num, 5}, ctx.GetPlace());
scores_filter.mutable_data<T>({keep_num, 1}, ctx.GetPlace());
GPUGather<T>(ctx, proposals, keep_index, &proposals_filter);
GPUGather<T>(ctx, scores_sort, keep_index, &scores_filter);
if (nms_thresh <= 0) {
return std::make_pair(proposals_filter, scores_filter);
}
// 4. nms
Tensor keep_nms;
RNMS<T>(ctx, proposals_filter, keep_index, nms_thresh, &keep_nms);
if (post_nms_top_n > 0 && post_nms_top_n < keep_nms.numel()) {
keep_nms.Resize({post_nms_top_n});
}
Tensor scores_nms, proposals_nms;
proposals_nms.mutable_data<T>({keep_nms.numel(), 5}, ctx.GetPlace());
scores_nms.mutable_data<T>({keep_nms.numel(), 1}, ctx.GetPlace());
GPUGather<T>(ctx, proposals_filter, keep_nms, &proposals_nms);
GPUGather<T>(ctx, scores_filter, keep_nms, &scores_nms);
return std::make_pair(proposals_nms, scores_nms);
}
} // namespace
template <typename DeviceContext, typename T>
class CUDARRPNGenerateProposalsKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext &context) const override {
auto *scores = context.Input<Tensor>("Scores");
auto *bbox_deltas = context.Input<Tensor>("BboxDeltas");
auto *im_info = context.Input<Tensor>("ImInfo");
auto anchors = detail::Ref(context.Input<Tensor>("Anchors"),
"Cannot find input Anchors(%s) in scope",
context.InputNames("Anchors")[0]);
auto variances = detail::Ref(context.Input<Tensor>("Variances"),
"Cannot find input Variances(%s) in scope",
context.InputNames("Variances")[0]);
auto *rpn_rois = context.Output<LoDTensor>("RpnRois");
auto *rpn_roi_probs = context.Output<LoDTensor>("RpnRoiProbs");
int pre_nms_top_n = context.Attr<int>("pre_nms_topN");
int post_nms_top_n = context.Attr<int>("post_nms_topN");
float nms_thresh = context.Attr<float>("nms_thresh");
float min_size = context.Attr<float>("min_size");
auto &dev_ctx = context.template device_context<DeviceContext>();
auto scores_dim = scores->dims();
int64_t num = scores_dim[0];
int64_t c_score = scores_dim[1];
int64_t h_score = scores_dim[2];
int64_t w_score = scores_dim[3];
auto bbox_dim = bbox_deltas->dims();
int64_t c_bbox = bbox_dim[1];
int64_t h_bbox = bbox_dim[2];
int64_t w_bbox = bbox_dim[3];
Tensor bbox_deltas_swap, scores_swap;
bbox_deltas_swap.mutable_data<T>({num, h_bbox, w_bbox, c_bbox},
dev_ctx.GetPlace());
scores_swap.mutable_data<T>({num, h_score, w_score, c_score},
dev_ctx.GetPlace());
math::Transpose<DeviceContext, T, 4> trans;
std::vector<int> axis = {0, 2, 3, 1};
trans(dev_ctx, *bbox_deltas, &bbox_deltas_swap, axis);
trans(dev_ctx, *scores, &scores_swap, axis);
anchors.Resize({anchors.numel() / 5, 5});
variances.Resize({variances.numel() / 5, 5});
rpn_rois->mutable_data<T>({bbox_deltas->numel() / 5, 5},
context.GetPlace());
rpn_roi_probs->mutable_data<T>({scores->numel(), 1}, context.GetPlace());
T *rpn_rois_data = rpn_rois->data<T>();
T *rpn_roi_probs_data = rpn_roi_probs->data<T>();
auto place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int64_t num_proposals = 0;
std::vector<size_t> offset(1, 0);
for (int64_t i = 0; i < num; ++i) {
Tensor im_info_slice = im_info->Slice(i, i + 1);
Tensor bbox_deltas_slice = bbox_deltas_swap.Slice(i, i + 1);
Tensor scores_slice = scores_swap.Slice(i, i + 1);
bbox_deltas_slice.Resize({h_bbox * w_bbox * c_bbox / 5, 5});
scores_slice.Resize({h_score * w_score * c_score, 1});
std::pair<Tensor, Tensor> box_score_pair =
RRPNProposalForOneImage<T>(dev_ctx,
im_info_slice,
anchors,
variances,
bbox_deltas_slice,
scores_slice,
pre_nms_top_n,
post_nms_top_n,
nms_thresh,
min_size);
Tensor &proposals = box_score_pair.first;
Tensor &scores = box_score_pair.second;
memory::Copy(place,
rpn_rois_data + num_proposals * 5,
place,
proposals.data<T>(),
sizeof(T) * proposals.numel(),
dev_ctx.stream());
memory::Copy(place,
rpn_roi_probs_data + num_proposals,
place,
scores.data<T>(),
sizeof(T) * scores.numel(),
dev_ctx.stream());
dev_ctx.Wait();
num_proposals += proposals.dims()[0];
offset.emplace_back(num_proposals);
}
framework::LoD lod;
lod.emplace_back(offset);
rpn_rois->set_lod(lod);
rpn_roi_probs->set_lod(lod);
rpn_rois->Resize({num_proposals, 5});
rpn_roi_probs->Resize({num_proposals, 1});
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(
rrpn_generate_proposals,
ops::CUDARRPNGenerateProposalsKernel<paddle::platform::CUDADeviceContext,
float>);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <algorithm>
#include <limits>
#include <memory>
#include "math_function.h"
#include "paddle/fluid/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;
class RRPNRotatedROIAlignOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of Rotated ROIAlignOp should not be null.");
PADDLE_ENFORCE(ctx->HasInput("ROIs"),
"Input(ROIs) of Rotated ROIAlignOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of Rotated ROIAlignOp should not be null.");
auto input_dims = ctx->GetInputDim("X");
auto rois_dims = ctx->GetInputDim("ROIs");
PADDLE_ENFORCE(input_dims.size() == 4,
"The format of input tensor is NCHW.");
PADDLE_ENFORCE(rois_dims.size() == 2,
               "ROIs should be a 2-D LoDTensor of shape (num_rois, 5) "
               "given as [[x, y, w, h, theta], ...].");
if (ctx->IsRuntime()) {
  PADDLE_ENFORCE(rois_dims[1] == 5,
                 "ROIs should be a 2-D LoDTensor of shape (num_rois, 5) "
                 "given as [[x, y, w, h, theta], ...].");
}
int pooled_height = ctx->Attrs().Get<int>("pooled_height");
int pooled_width = ctx->Attrs().Get<int>("pooled_width");
float spatial_scale = ctx->Attrs().Get<float>("spatial_scale");
PADDLE_ENFORCE_GT(
    pooled_height, 0, "The pooled output height must be greater than 0");
PADDLE_ENFORCE_GT(
    pooled_width, 0, "The pooled output width must be greater than 0");
PADDLE_ENFORCE_GT(
    spatial_scale, 0.0f, "The spatial scale must be greater than 0");
auto out_dims = input_dims;
out_dims[0] = rois_dims[0];
out_dims[1] = input_dims[1];
out_dims[2] = pooled_height;
out_dims[3] = pooled_width;
ctx->SetOutputDim("Out", out_dims);
ctx->SetOutputDim("ConIdX", out_dims);
ctx->SetOutputDim("ConIdY", out_dims);
}
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(ctx.Input<framework::Tensor>("X")->type(),
ctx.device_context());
}
};
class RRPNRotatedROIAlignGradOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
"The GRAD@Out of RotatedROIAlignGradOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutputs(framework::GradVarName("X")),
"The GRAD@X of RotatedROIAlignGradOp should not be null.");
ctx->SetOutputsDim(framework::GradVarName("X"), ctx->GetInputsDim("X"));
}
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(ctx.Input<framework::Tensor>("ROIs")->type(),
ctx.device_context());
}
};
class RRPNRotatedROIAlignOpMaker : public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddInput("X",
"(Tensor), "
"The input of RRPNRotatedROIAlignOp. The data type is float32 or "
"float64."
"The format of input tensor is NCHW. Where N is batch size, "
"C is the number of input channels, "
"H is the height of the feature, and "
"W is the width of the feature.");
AddInput("ROIs",
"(LoDTensor), "
"ROIs (Regions of Interest) to pool over. "
"should be a 2-D LoDTensor of shape (num_rois, 5)"
"given as [[x, y, w, h, theta], ...]. "
"(x, y) is the center coordinates, and "
"(w, h) is the bottom right coordinates, theta is rotation angle"
"of ROI.");
AddOutput("Out",
"(Tensor), "
"The output of ROIAlignOp is a 4-D tensor with shape "
"(num_rois, channels, pooled_h, pooled_w). The data type is "
"float32 or float64.");
AddOutput("ConIdX",
"(Tensor), "
"index x of affine transform");
AddOutput("ConIdY",
"(Tensor), "
"index y of affine transform");
AddAttr<float>("spatial_scale",
"(float, default 1.0), "
"Multiplicative spatial scale factor "
"to translate ROI coords from their input scale "
"to the scale used when pooling.")
.SetDefault(1.0);
AddAttr<int>("pooled_height",
"(int, default 1), "
"The pooled output height.")
.SetDefault(1);
AddAttr<int>("pooled_width",
"(int, default 1), "
"The pooled output width.")
.SetDefault(1);
AddComment(R"DOC(
**RotatedRoIAlign Operator**
Rotated Region of Interest Align (Rotated RoI Align) performs bilinear
interpolation on inputs of nonuniform sizes to obtain fixed-size feature maps
(e.g. 7*7), while keeping the orientation of each region proposal.
Each rotated proposal is divided into equal-sized bins according to
pooled_width and pooled_height. In each RoI bin, the values of the four
regularly sampled locations are computed directly through bilinear
interpolation, and the output is the mean of the four locations, which avoids
the misalignment problem.
)DOC");
}
};
template <typename T>
class RRPNRotatedROIAlignGradMaker : public framework::SingleGradOpMaker<T> {
public:
using framework::SingleGradOpMaker<T>::SingleGradOpMaker;
protected:
std::unique_ptr<T> Apply() const override {
std::unique_ptr<T> op(new T);
op->SetType("rrpn_rotated_roi_align_grad");
op->SetInput("X", this->Input("X"));
op->SetInput("ROIs", this->Input("ROIs"));
op->SetInput("ConIdX", this->Output("ConIdX"));
op->SetInput("ConIdY", this->Output("ConIdY"));
op->SetInput(framework::GradVarName("Out"), this->OutputGrad("Out"));
op->SetOutput(framework::GradVarName("X"), this->InputGrad("X"));
op->SetAttrMap(this->Attrs());
return op;
}
};
DECLARE_NO_NEED_BUFFER_VARS_INFERENCE(
RRPNRotatedRoiAlignGradNoNeedBufVarsInferer, "X");
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(
rrpn_rotated_roi_align,
ops::RRPNRotatedROIAlignOp,
ops::RRPNRotatedROIAlignOpMaker,
ops::RRPNRotatedROIAlignGradMaker<paddle::framework::OpDesc>,
ops::RRPNRotatedROIAlignGradMaker<paddle::imperative::OpBase>);
REGISTER_OPERATOR(rrpn_rotated_roi_align_grad,
ops::RRPNRotatedROIAlignGradOp,
ops::RRPNRotatedRoiAlignGradNoNeedBufVarsInferer);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Based on
@misc{ma2019rrpn,
author = {Jianqi Ma},
title = {{RRPN in pytorch}},
year = {2019},
howpublished = {\url{https://github.com/mjq11302010044/RRPN_pytorch}},
}
@article{Jianqi17RRPN,
Author = {Jianqi Ma and Weiyuan Shao and Hao Ye and Li Wang and Hong Wang
and Yingbin Zheng and Xiangyang Xue},
Title = {Arbitrary-Oriented Scene Text Detection via Rotation Proposals},
journal = {IEEE Transactions on Multimedia},
volume={20},
number={11},
pages={3111-3122},
year={2018}
}*/
#include <algorithm>
#include <limits>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/memory/memory.h"
#include "paddle/fluid/platform/cuda_primitives.h"
#define CUDA_1D_KERNEL_LOOP(i, n) \
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
i += blockDim.x * gridDim.x)
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;
static constexpr int kNumCUDAThreads = 512;
static constexpr int kNumMaxinumNumBlocks = 4096;
#define PI 3.141592654
static inline int NumBlocks(const int N) {
return std::min((N + kNumCUDAThreads - 1) / kNumCUDAThreads,
kNumMaxinumNumBlocks);
}
template <typename T>
__global__ void Zero(T* x, int num) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < num;
i += blockDim.x * gridDim.x) {
x[i] = static_cast<T>(0);
}
}
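// Forward pass of Rotated RoI Align: for every output element (n, c, ph, pw)
// an affine transform M (built from the ROI center, size, angle and
// spatial_scale) maps the pooled bin onto the feature map, the bin center is
// bilinearly interpolated from its 4 neighbouring pixels, and the bin-center
// coordinates are also accumulated into con_idx_x / con_idx_y for the
// backward pass.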
template <typename T>
__global__ void RROIAlignForward(const int nthreads,
const T* bottom_data,
const T spatial_scale,
int height,
int width,
int channels,
const int pooled_height,
const int pooled_width,
const T* bottom_rois,
int* roi_batch_id_data,
T* top_data,
T* con_idx_x,
T* con_idx_y) {
CUDA_1D_KERNEL_LOOP(index, nthreads) {
int imageWidth = width;
int imageHeight = height;
// (n, c, ph, pw) is an element in the pooled output
int n = index;
int pw = n % pooled_width;
n /= pooled_width;
int ph = n % pooled_height;
n /= pooled_height;
int c = n % channels;
n /= channels;
const T* offset_bottom_rois = bottom_rois + n * 5;
int roi_batch_ind = roi_batch_id_data[n];
T cx = offset_bottom_rois[0];
T cy = offset_bottom_rois[1];
T h = offset_bottom_rois[3];
T w = offset_bottom_rois[2];
T angle = offset_bottom_rois[4] / 180.0 * PI;
// TransformPrepare
T dx = -pooled_width / 2.0;
T dy = -pooled_height / 2.0;
T Sx = w * spatial_scale / pooled_width;
T Sy = h * spatial_scale / pooled_height;
T Alpha = cos(angle);
T Beta = sin(angle);
T Dx = cx * spatial_scale;
T Dy = cy * spatial_scale;
T M[2][3];
M[0][0] = Alpha * Sx;
M[0][1] = Beta * Sy;
M[0][2] = Alpha * Sx * dx + Beta * Sy * dy + Dx;
M[1][0] = -Beta * Sx;
M[1][1] = Alpha * Sy;
M[1][2] = -Beta * Sx * dx + Alpha * Sy * dy + Dy;
T P[8];
P[0] = M[0][0] * pw + M[0][1] * ph + M[0][2];
P[1] = M[1][0] * pw + M[1][1] * ph + M[1][2];
P[2] = M[0][0] * pw + M[0][1] * (ph + 1) + M[0][2];
P[3] = M[1][0] * pw + M[1][1] * (ph + 1) + M[1][2];
P[4] = M[0][0] * (pw + 1) + M[0][1] * ph + M[0][2];
P[5] = M[1][0] * (pw + 1) + M[1][1] * ph + M[1][2];
P[6] = M[0][0] * (pw + 1) + M[0][1] * (ph + 1) + M[0][2];
P[7] = M[1][0] * (pw + 1) + M[1][1] * (ph + 1) + M[1][2];
T leftMost = (max(round(min(min(P[0], P[2]), min(P[4], P[6]))), 0.0));
T rightMost =
(min(round(max(max(P[0], P[2]), max(P[4], P[6]))), imageWidth - 1.0));
T topMost = (max(round(min(min(P[1], P[3]), min(P[5], P[7]))), 0.0));
T bottomMost =
(min(round(max(max(P[1], P[3]), max(P[5], P[7]))), imageHeight - 1.0));
const T* offset_bottom_data =
bottom_data + (roi_batch_ind * channels + c) * height * width;
float bin_cx = (leftMost + rightMost) / 2.0; // shift
float bin_cy = (topMost + bottomMost) / 2.0;
int bin_l = (int)floor(bin_cx);
int bin_r = (int)ceil(bin_cx);
int bin_t = (int)floor(bin_cy);
int bin_b = (int)ceil(bin_cy);
T lt_value = 0.0;
if (bin_t > 0 && bin_l > 0 && bin_t < height && bin_l < width)
lt_value = offset_bottom_data[bin_t * width + bin_l];
T rt_value = 0.0;
if (bin_t > 0 && bin_r > 0 && bin_t < height && bin_r < width)
rt_value = offset_bottom_data[bin_t * width + bin_r];
T lb_value = 0.0;
if (bin_b > 0 && bin_l > 0 && bin_b < height && bin_l < width)
lb_value = offset_bottom_data[bin_b * width + bin_l];
T rb_value = 0.0;
if (bin_b > 0 && bin_r > 0 && bin_b < height && bin_r < width)
rb_value = offset_bottom_data[bin_b * width + bin_r];
T rx = bin_cx - floor(bin_cx);
T ry = bin_cy - floor(bin_cy);
T wlt = (1.0 - rx) * (1.0 - ry);
T wrt = rx * (1.0 - ry);
T wrb = rx * ry;
T wlb = (1.0 - rx) * ry;
T inter_val = 0.0;
inter_val += lt_value * wlt;
inter_val += rt_value * wrt;
inter_val += rb_value * wrb;
inter_val += lb_value * wlb;
platform::CudaAtomicAdd(top_data + index, static_cast<T>(inter_val));
platform::CudaAtomicAdd(con_idx_x + index, static_cast<T>(bin_cx));
platform::CudaAtomicAdd(con_idx_y + index, static_cast<T>(bin_cy));
}
}
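// Backward pass: read back the bin centers recorded in con_idx_x / con_idx_y
// and scatter each output gradient to the 4 neighbouring input pixels with the
// same bilinear weights used in the forward pass (via atomic adds).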
template <typename T>
__global__ void RROIAlignBackward(const int nthreads,
const T* top_diff,
const float* con_idx_x,
const float* con_idx_y,
const int num_rois,
const float spatial_scale,
const int height,
const int width,
const int channels,
const int pooled_height,
const int pooled_width,
T* bottom_diff,
const T* bottom_rois,
int* roi_batch_id_data) {
CUDA_1D_KERNEL_LOOP(index, nthreads) {
// (n, c, ph, pw) is an element in the pooled output
int n = index;
n /= pooled_width;
n /= pooled_height;
int c = n % channels;
n /= channels;
const T* offset_bottom_rois = bottom_rois + n * 5;
int roi_batch_ind = roi_batch_id_data[n];
T* offset_bottom_diff =
bottom_diff + (roi_batch_ind * channels + c) * height * width;
float bw = con_idx_x[index];
float bh = con_idx_y[index];
int bin_xs = int(floor(bw));
int bin_ys = int(floor(bh));
float rx = bw - float(bin_xs);
float ry = bh - float(bin_ys);
T wlt = (1.0 - rx) * (1.0 - ry);
T wrt = rx * (1.0 - ry);
T wrb = rx * ry;
T wlb = (1.0 - rx) * ry;
int min_x = (int)floor(bw);
int max_x = (int)ceil(bw);
int min_y = (int)floor(bh);
int max_y = (int)ceil(bh);
T top_diff_of_bin = top_diff[index];
T v1 = wlt * top_diff_of_bin;
T v2 = wrt * top_diff_of_bin;
T v3 = wrb * top_diff_of_bin;
T v4 = wlb * top_diff_of_bin;
if (min_y > 0 && min_x > 0 && min_y < height - 1 && min_x < width - 1)
platform::CudaAtomicAdd(offset_bottom_diff + min_y * width + min_x,
static_cast<T>(v1));
if (min_y > 0 && max_x < width - 1 && min_y < height - 1 && max_x > 0)
platform::CudaAtomicAdd(offset_bottom_diff + min_y * width + max_x,
static_cast<T>(v2));
if (max_y < height - 1 && max_x < width - 1 && max_y > 0 && max_x > 0)
platform::CudaAtomicAdd(offset_bottom_diff + max_y * width + max_x,
static_cast<T>(v3));
if (max_y < height - 1 && min_x > 0 && max_y > 0 && min_x < width - 1)
platform::CudaAtomicAdd(offset_bottom_diff + max_y * width + min_x,
static_cast<T>(v4));
}
}
template <typename Place, typename T>
class RRPNROIAlignRotatedCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* input = ctx.Input<Tensor>("X");
auto* rois = ctx.Input<LoDTensor>("ROIs");
auto* out = ctx.Output<Tensor>("Out");
auto* con_idx_x = ctx.Output<Tensor>("ConIdX");
auto* con_idx_y = ctx.Output<Tensor>("ConIdY");
auto pooled_height = ctx.Attr<int>("pooled_height");
auto pooled_width = ctx.Attr<int>("pooled_width");
auto spatial_scale = ctx.Attr<float>("spatial_scale");
auto in_dims = input->dims();
int batch_size = in_dims[0];
int channels = in_dims[1];
int height = in_dims[2];
int width = in_dims[3];
int rois_num = rois->dims()[0];
if (rois_num == 0) return;
int output_size = out->numel();
int blocks = NumBlocks(output_size);
int threads = kNumCUDAThreads;
Tensor roi_batch_id_list;
roi_batch_id_list.Resize({rois_num});
auto cplace = platform::CPUPlace();
int* roi_batch_id_data = roi_batch_id_list.mutable_data<int>(cplace);
auto lod = rois->lod();
PADDLE_ENFORCE_EQ(
lod.empty(),
false,
"Input(ROIs) Tensor of ROIAlignOp does not contain LoD information.");
auto rois_lod = lod.back();
int rois_batch_size = rois_lod.size() - 1;
PADDLE_ENFORCE_EQ(
rois_batch_size,
batch_size,
"The rois_batch_size and imgs batch_size must be the same.");
int rois_num_with_lod = rois_lod[rois_batch_size];
PADDLE_ENFORCE_EQ(rois_num,
rois_num_with_lod,
"The rois_num from input and lod must be the same.");
for (int n = 0; n < rois_batch_size; ++n) {
for (size_t i = rois_lod[n]; i < rois_lod[n + 1]; ++i) {
roi_batch_id_data[i] = n;
}
}
auto& dev_ctx = ctx.cuda_device_context();
int bytes = roi_batch_id_list.numel() * sizeof(int);
auto roi_ptr = memory::Alloc(dev_ctx, bytes);
int* roi_id_data = reinterpret_cast<int*>(roi_ptr->ptr());
const auto gplace = boost::get<platform::CUDAPlace>(ctx.GetPlace());
memory::Copy(gplace,
roi_id_data,
cplace,
roi_batch_id_data,
bytes,
dev_ctx.stream());
T* out_ = out->mutable_data<T>(ctx.GetPlace());
T* con_idx_x_ = con_idx_x->mutable_data<T>(ctx.GetPlace());
T* con_idx_y_ = con_idx_y->mutable_data<T>(ctx.GetPlace());
int idx_x_num = con_idx_x->numel();
int idx_y_num = con_idx_y->numel();
int out_num = out->numel();
Zero<<<(idx_x_num + 512 - 1) / 512, 512, 0, dev_ctx.stream()>>>(con_idx_x_,
idx_x_num);
Zero<<<(idx_y_num + 512 - 1) / 512, 512, 0, dev_ctx.stream()>>>(con_idx_y_,
idx_y_num);
Zero<<<(out_num + 512 - 1) / 512, 512, 0, dev_ctx.stream()>>>(out_,
out_num);
RROIAlignForward<T><<<blocks, threads, 0, dev_ctx.stream()>>>(
output_size,
input->data<T>(),
spatial_scale,
height,
width,
channels,
pooled_height,
pooled_width,
rois->data<T>(),
roi_id_data,
out_,
con_idx_x_,
con_idx_y_);
}
};
template <typename Place, typename T>
class RRPNROIAlignRotatedGradCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* input = ctx.Input<Tensor>("X");
auto* rois = ctx.Input<LoDTensor>("ROIs");
auto* out_grad = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* in_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
auto* con_idx_x = ctx.Input<Tensor>("ConIdX");
auto* con_idx_y = ctx.Input<Tensor>("ConIdY");
auto pooled_height = ctx.Attr<int>("pooled_height");
auto pooled_width = ctx.Attr<int>("pooled_width");
auto spatial_scale = ctx.Attr<float>("spatial_scale");
int rois_num = rois->dims()[0];
int channels = input->dims()[1];
int height = input->dims()[2];
int width = input->dims()[3];
if (!in_grad) {
return;
}
Tensor roi_batch_id_list;
roi_batch_id_list.Resize({rois_num});
auto cplace = platform::CPUPlace();
int* roi_batch_id_data = roi_batch_id_list.mutable_data<int>(cplace);
auto rois_lod = rois->lod().back();
int rois_batch_size = rois_lod.size() - 1;
for (int n = 0; n < rois_batch_size; ++n) {
for (size_t i = rois_lod[n]; i < rois_lod[n + 1]; ++i) {
roi_batch_id_data[i] = n;
}
}
auto& dev_ctx = ctx.cuda_device_context();
auto roi_ptr =
memory::Alloc(dev_ctx, roi_batch_id_list.numel() * sizeof(int));
int* roi_id_data = reinterpret_cast<int*>(roi_ptr->ptr());
int bytes = roi_batch_id_list.numel() * sizeof(int);
const auto gplace = boost::get<platform::CUDAPlace>(ctx.GetPlace());
memory::Copy(gplace,
roi_id_data,
cplace,
roi_batch_id_data,
bytes,
dev_ctx.stream());
T* in_grad_ = in_grad->mutable_data<T>(ctx.GetPlace());
int in_grad_num = in_grad->numel();
Zero<<<(in_grad_num + 512 - 1) / 512, 512, 0, dev_ctx.stream()>>>(
in_grad_, in_grad_num);
int output_grad_size = out_grad->numel();
int blocks = NumBlocks(output_grad_size);
int threads = kNumCUDAThreads;
con_idx_x->data<float>();
con_idx_y->data<float>();
out_grad->data<T>();
rois->data<T>();
if (output_grad_size > 0) {
RROIAlignBackward<T><<<blocks, threads, 0, dev_ctx.stream()>>>(
output_grad_size,
out_grad->data<T>(),
con_idx_x->data<float>(),
con_idx_y->data<float>(),
rois_num,
spatial_scale,
height,
width,
channels,
pooled_height,
pooled_width,
in_grad_,
// in_grad->mutable_data<T>(ctx.GetPlace()),
rois->data<T>(),
roi_id_data);
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(
rrpn_rotated_roi_align,
ops::RRPNROIAlignRotatedCUDAKernel<paddle::platform::CUDADeviceContext,
float>,
ops::RRPNROIAlignRotatedCUDAKernel<paddle::platform::CUDADeviceContext,
double>);
REGISTER_OP_CUDA_KERNEL(
rrpn_rotated_roi_align_grad,
ops::RRPNROIAlignRotatedGradCUDAKernel<paddle::platform::CUDADeviceContext,
float>,
ops::RRPNROIAlignRotatedGradCUDAKernel<paddle::platform::CUDADeviceContext,
double>);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <fstream>
#include <iostream>
#include <random>
#include "bbox_util.h"
#include "paddle/fluid/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;
template <typename T,
int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>;
class RRpnTargetAssignOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("Anchor"),
"Input(Anchor) of RRpnTargetAssignOp should not be null");
PADDLE_ENFORCE(ctx->HasInput("GtBoxes"),
"Input(GtBoxes) of RRpnTargetAssignOp should not be null");
PADDLE_ENFORCE(ctx->HasInput("ImInfo"),
"Input(ImInfo) of RRpnTargetAssignOp should not be null");
PADDLE_ENFORCE(
ctx->HasOutput("LocationIndex"),
"Output(LocationIndex) of RRpnTargetAssignOp should not be null");
PADDLE_ENFORCE(
ctx->HasOutput("ScoreIndex"),
"Output(ScoreIndex) of RRpnTargetAssignOp should not be null");
PADDLE_ENFORCE(
ctx->HasOutput("TargetLabel"),
"Output(TargetLabel) of RRpnTargetAssignOp should not be null");
PADDLE_ENFORCE(
ctx->HasOutput("TargetBBox"),
"Output(TargetBBox) of RRpnTargetAssignOp should not be null");
auto anchor_dims = ctx->GetInputDim("Anchor");
auto gt_boxes_dims = ctx->GetInputDim("GtBoxes");
auto im_info_dims = ctx->GetInputDim("ImInfo");
PADDLE_ENFORCE_EQ(
anchor_dims.size(), 2, "The rank of Input(Anchor) must be 2.");
PADDLE_ENFORCE_EQ(
gt_boxes_dims.size(), 2, "The rank of Input(GtBoxes) must be 2.");
PADDLE_ENFORCE_EQ(
im_info_dims.size(), 2, "The rank of Input(ImInfo) must be 2.");
ctx->SetOutputDim("LocationIndex", {-1});
ctx->SetOutputDim("ScoreIndex", {-1});
ctx->SetOutputDim("TargetLabel", {-1, 1});
ctx->SetOutputDim("TargetBBox", {-1, 5});
}
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(
ctx.Input<framework::LoDTensor>("Anchor")->type(),
platform::CPUPlace());
}
};
template <typename T>
void AppendRpns(LoDTensor* out, int64_t offset, Tensor* to_add) {
auto* out_data = out->data<T>();
auto* to_add_data = to_add->data<T>();
memcpy(out_data + offset, to_add_data, to_add->numel() * sizeof(T));
}
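// Keep only anchors that lie inside the image (within rpn_straddle_thresh
// pixels of the border); a negative threshold disables the filtering. Returns
// the kept indices and the corresponding anchor rows.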
template <typename T>
std::vector<Tensor> FilterStraddleAnchor(
const platform::CPUDeviceContext& context,
const Tensor* anchor,
const float rpn_straddle_thresh,
T im_height,
T im_width,
int64_t offset) {
std::vector<int> inds_inside;
int anchor_num = anchor->dims()[0];
auto* anchor_data = anchor->data<T>();
if (rpn_straddle_thresh >= 0) {
int index;
for (int i = 0; i < anchor_num; ++i) {
index = i * offset;
if ((anchor_data[index + 0] >= -rpn_straddle_thresh) &&
(anchor_data[index + 1] >= -rpn_straddle_thresh) &&
(anchor_data[index + 2] < im_width + rpn_straddle_thresh) &&
(anchor_data[index + 3] < im_height + rpn_straddle_thresh)) {
inds_inside.emplace_back(i);
}
}
} else {
for (int i = 0; i < anchor_num; ++i) {
inds_inside.emplace_back(i);
}
}
int inside_num = inds_inside.size();
Tensor inds_inside_t;
int* inds_inside_data =
inds_inside_t.mutable_data<int>({inside_num}, context.GetPlace());
std::copy(inds_inside.begin(), inds_inside.end(), inds_inside_data);
Tensor inside_anchor_t;
T* inside_anchor_data =
inside_anchor_t.mutable_data<T>({inside_num, offset}, context.GetPlace());
Gather<T>(anchor->data<T>(),
offset,
inds_inside_data,
inside_num,
inside_anchor_data);
std::vector<Tensor> res;
res.emplace_back(inds_inside_t);
res.emplace_back(inside_anchor_t);
return res;
}
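// Reservoir sampling: keep at most `num` indices from `inds`, chosen uniformly
// at random when use_random is true, otherwise simply truncate.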
void ReservoirSampling(const int num,
std::vector<int>* inds,
std::minstd_rand engine,
bool use_random) {
std::uniform_real_distribution<float> uniform(0, 1);
size_t len = inds->size();
if (len > static_cast<size_t>(num)) {
if (use_random) {
for (size_t i = num; i < len; ++i) {
int rng_ind = std::floor(uniform(engine) * i);
if (rng_ind < num)
std::iter_swap(inds->begin() + rng_ind, inds->begin() + i);
}
}
inds->resize(num);
}
}
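// Label assignment: an anchor is foreground if it has the highest overlap with
// some gt box or its max IoU >= rpn_positive_overlap, and background if its
// max IoU < rpn_negative_overlap. Both sets are then subsampled with reservoir
// sampling so that at most rpn_fg_fraction * rpn_batch_size_per_im anchors are
// foreground and the rest of the batch is background.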
template <typename T>
void RRpnScoreAssign(const T* anchor_by_gt_overlap_data,
const Tensor& anchor_to_gt_max,
const Tensor& gt_to_anchor_max,
const int rpn_batch_size_per_im,
const float rpn_fg_fraction,
const float rpn_positive_overlap,
const float rpn_negative_overlap,
std::vector<int>* fg_inds,
std::vector<int>* bg_inds,
std::vector<int>* tgt_lbl,
std::minstd_rand engine,
bool use_random) {
float epsilon = 0.00000001;
int anchor_num = anchor_to_gt_max.dims()[0];
int gt_num = gt_to_anchor_max.dims()[0];
std::vector<int> target_label(anchor_num, -1);
const T* anchor_to_gt_max_data = anchor_to_gt_max.data<T>();
const T* gt_to_anchor_max_data = gt_to_anchor_max.data<T>();
for (int64_t i = 0; i < anchor_num; ++i) {
bool is_anchors_with_max_overlap = false;
int64_t j = 0;
for (; j < gt_num; ++j) {
T value = anchor_by_gt_overlap_data[i * gt_num + j];
T diff = std::abs(value - gt_to_anchor_max_data[j]);
if (diff < epsilon) {
is_anchors_with_max_overlap = true;
break;
}
}
bool is_anchor_great_than_thresh =
(anchor_to_gt_max_data[i] >= rpn_positive_overlap);
if (is_anchors_with_max_overlap || is_anchor_great_than_thresh) {
fg_inds->emplace_back(i);
target_label[i] = 1;
}
}
// Reservoir Sampling
int fg_num = 0;
if (rpn_fg_fraction > 0 && rpn_batch_size_per_im > 0) {
fg_num = static_cast<int>(rpn_fg_fraction * rpn_batch_size_per_im);
ReservoirSampling(fg_num, fg_inds, engine, use_random);
}
fg_num = static_cast<int>(fg_inds->size());
for (int64_t i = 0; i < anchor_num; ++i) {
if (anchor_to_gt_max_data[i] < rpn_negative_overlap &&
target_label[i] != 1) {
bg_inds->emplace_back(i);
target_label[i] = 0;
}
}
int bg_num = 0;
if (rpn_fg_fraction > 0 && rpn_batch_size_per_im > 0) {
bg_num = rpn_batch_size_per_im - fg_num;
ReservoirSampling(bg_num, bg_inds, engine, use_random);
}
bg_num = static_cast<int>(bg_inds->size());
tgt_lbl->resize(fg_num + bg_num, 0);
std::vector<int> fg_lbl(fg_num, 1);
std::vector<int> bg_lbl(bg_num, 0);
std::copy(fg_lbl.begin(), fg_lbl.end(), tgt_lbl->data());
std::copy(bg_lbl.begin(), bg_lbl.end(), tgt_lbl->data() + fg_num);
}
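// Sample foreground/background anchors for one image and return four tensors:
// foreground anchor indices (for bbox regression), foreground + background
// indices (for classification), their target labels, and the matched gt box
// index of each foreground anchor.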
template <typename T>
std::vector<Tensor> SampleRRpnFgBgGt(const platform::CPUDeviceContext& ctx,
const Tensor& anchor_by_gt_overlap,
const int rpn_batch_size_per_im,
const float rpn_positive_overlap,
const float rpn_negative_overlap,
const float rpn_fg_fraction,
std::minstd_rand engine,
bool use_random) {
auto* anchor_by_gt_overlap_data = anchor_by_gt_overlap.data<T>();
int anchor_num = anchor_by_gt_overlap.dims()[0];
int gt_num = anchor_by_gt_overlap.dims()[1];
std::vector<int> fg_inds;
std::vector<int> bg_inds;
std::vector<int> gt_inds;
std::vector<int> tgt_lbl;
// Calculate the max IoU between anchors and gt boxes
// Map from anchor to gt box that has highest overlap
auto place = ctx.GetPlace();
Tensor anchor_to_gt_max, anchor_to_gt_argmax, gt_to_anchor_max;
anchor_to_gt_max.mutable_data<T>({anchor_num}, place);
int* argmax = anchor_to_gt_argmax.mutable_data<int>({anchor_num}, place);
gt_to_anchor_max.mutable_data<T>({gt_num}, place);
auto anchor_by_gt_overlap_et =
framework::EigenMatrix<T>::From(anchor_by_gt_overlap);
auto anchor_to_gt_max_et =
framework::EigenVector<T>::Flatten(anchor_to_gt_max);
auto gt_to_anchor_max_et =
framework::EigenVector<T>::Flatten(gt_to_anchor_max);
auto anchor_to_gt_argmax_et =
framework::EigenVector<int>::Flatten(anchor_to_gt_argmax);
anchor_to_gt_max_et =
anchor_by_gt_overlap_et.maximum(Eigen::DSizes<int, 1>(1));
anchor_to_gt_argmax_et =
anchor_by_gt_overlap_et.argmax(1).template cast<int>();
gt_to_anchor_max_et =
anchor_by_gt_overlap_et.maximum(Eigen::DSizes<int, 1>(0));
// Follow the Faster RCNN's implementation
RRpnScoreAssign(anchor_by_gt_overlap_data,
anchor_to_gt_max,
gt_to_anchor_max,
rpn_batch_size_per_im,
rpn_fg_fraction,
rpn_positive_overlap,
rpn_negative_overlap,
&fg_inds,
&bg_inds,
&tgt_lbl,
engine,
use_random);
int fg_num = fg_inds.size();
int bg_num = bg_inds.size();
gt_inds.reserve(fg_num);
for (int i = 0; i < fg_num; ++i) {
gt_inds.emplace_back(argmax[fg_inds[i]]);
}
Tensor loc_index_t, score_index_t, tgt_lbl_t, gt_inds_t;
int* loc_index_data = loc_index_t.mutable_data<int>({fg_num}, place);
int* score_index_data =
score_index_t.mutable_data<int>({fg_num + bg_num}, place);
int* tgt_lbl_data = tgt_lbl_t.mutable_data<int>({fg_num + bg_num}, place);
int* gt_inds_data = gt_inds_t.mutable_data<int>({fg_num}, place);
std::copy(fg_inds.begin(), fg_inds.end(), loc_index_data);
std::copy(fg_inds.begin(), fg_inds.end(), score_index_data);
std::copy(bg_inds.begin(), bg_inds.end(), score_index_data + fg_num);
std::copy(tgt_lbl.begin(), tgt_lbl.end(), tgt_lbl_data);
std::copy(gt_inds.begin(), gt_inds.end(), gt_inds_data);
std::vector<Tensor> loc_score_tgtlbl_gt;
loc_score_tgtlbl_gt.emplace_back(loc_index_t);
loc_score_tgtlbl_gt.emplace_back(score_index_t);
loc_score_tgtlbl_gt.emplace_back(tgt_lbl_t);
loc_score_tgtlbl_gt.emplace_back(gt_inds_t);
return loc_score_tgtlbl_gt;
}
template <typename T>
class RRpnTargetAssignKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
auto* anchor = context.Input<Tensor>("Anchor"); // (H*W*A) * 5
auto* gt_boxes = context.Input<LoDTensor>("GtBoxes");
auto* im_info = context.Input<LoDTensor>("ImInfo");
auto* loc_index = context.Output<LoDTensor>("LocationIndex");
auto* score_index = context.Output<LoDTensor>("ScoreIndex");
auto* tgt_bbox = context.Output<LoDTensor>("TargetBBox");
auto* tgt_lbl = context.Output<LoDTensor>("TargetLabel");
PADDLE_ENFORCE_EQ(gt_boxes->lod().size(),
1UL,
"RRpnTargetAssignOp gt_boxes needs 1 level of LoD");
int64_t anchor_num = static_cast<int64_t>(anchor->dims()[0]);
int64_t batch_num = static_cast<int64_t>(gt_boxes->lod().back().size() - 1);
int rpn_batch_size_per_im = context.Attr<int>("rpn_batch_size_per_im");
float rpn_straddle_thresh = context.Attr<float>("rpn_straddle_thresh");
float rpn_positive_overlap = context.Attr<float>("rpn_positive_overlap");
float rpn_negative_overlap = context.Attr<float>("rpn_negative_overlap");
float rpn_fg_fraction = context.Attr<float>("rpn_fg_fraction");
bool use_random = context.Attr<bool>("use_random");
int64_t max_num = batch_num * rpn_batch_size_per_im;
auto place = context.GetPlace();
loc_index->mutable_data<int>({max_num}, place);
score_index->mutable_data<int>({max_num}, place);
tgt_bbox->mutable_data<T>({max_num, 5}, place);
tgt_lbl->mutable_data<int>({max_num, 1}, place);
auto& dev_ctx = context.device_context<platform::CPUDeviceContext>();
std::random_device rnd;
std::minstd_rand engine;
int seed = rnd();
engine.seed(seed);
framework::LoD lod_loc, loc_score;
std::vector<size_t> lod0_loc(1, 0);
std::vector<size_t> lod0_score(1, 0);
int total_loc_num = 0;
int total_score_num = 0;
auto gt_boxes_lod = gt_boxes->lod().back();
for (int i = 0; i < batch_num; ++i) {
Tensor gt_boxes_slice =
gt_boxes->Slice(gt_boxes_lod[i], gt_boxes_lod[i + 1]);
Tensor im_info_slice = im_info->Slice(i, i + 1);
auto* im_info_data = im_info_slice.data<T>();
auto im_height = im_info_data[0];
auto im_width = im_info_data[1];
// auto im_scale = im_info_data[2];
// Filter straddle anchor
std::vector<Tensor> filter_output = FilterStraddleAnchor<T>(
dev_ctx, anchor, rpn_straddle_thresh, im_height, im_width, 5);
Tensor inds_inside = filter_output[0];
Tensor inside_anchor = filter_output[1];
Tensor anchor_by_gt_overlap;
anchor_by_gt_overlap.mutable_data<T>(
{inside_anchor.dims()[0], gt_boxes_slice.dims()[0]}, place);
BboxOverlaps2<T>(inside_anchor, gt_boxes_slice, &anchor_by_gt_overlap);
auto loc_score_tgtlbl_gt = SampleRRpnFgBgGt<T>(dev_ctx,
anchor_by_gt_overlap,
rpn_batch_size_per_im,
rpn_positive_overlap,
rpn_negative_overlap,
rpn_fg_fraction,
engine,
use_random);
Tensor sampled_loc_index = loc_score_tgtlbl_gt[0];
Tensor sampled_score_index = loc_score_tgtlbl_gt[1];
Tensor sampled_tgtlbl = loc_score_tgtlbl_gt[2];
Tensor sampled_gt_index = loc_score_tgtlbl_gt[3];
int loc_num = sampled_loc_index.dims()[0];
int score_num = sampled_score_index.dims()[0];
// unmap to all anchor
Tensor sampled_loc_index_unmap, sampled_score_index_unmap;
sampled_loc_index_unmap.mutable_data<int>({loc_num}, place);
sampled_score_index_unmap.mutable_data<int>({score_num}, place);
Gather<int>(inds_inside.data<int>(),
1,
sampled_loc_index.data<int>(),
loc_num,
sampled_loc_index_unmap.data<int>());
Gather<int>(inds_inside.data<int>(),
1,
sampled_score_index.data<int>(),
score_num,
sampled_score_index_unmap.data<int>());
// get target bbox deltas
Tensor sampled_anchor, sampled_gt, sampled_tgt_bbox;
auto* sampled_anchor_data =
sampled_anchor.mutable_data<T>({loc_num, 5}, place);
auto* sampled_gt_data = sampled_gt.mutable_data<T>({loc_num, 5}, place);
Gather<T>(anchor->data<T>(),
5,
sampled_loc_index_unmap.data<int>(),
loc_num,
sampled_anchor_data);
Gather<T>(gt_boxes_slice.data<T>(),
5,
sampled_gt_index.data<int>(),
loc_num,
sampled_gt_data);
sampled_tgt_bbox.mutable_data<T>({loc_num, 5}, place);
BoxToDelta2<T>(
loc_num, sampled_anchor, sampled_gt, nullptr, &sampled_tgt_bbox);
// Add anchor offset
int anchor_offset = i * anchor_num;
auto sampled_loc_index_unmap_et =
framework::EigenTensor<int, 1>::From(sampled_loc_index_unmap);
sampled_loc_index_unmap_et = sampled_loc_index_unmap_et + anchor_offset;
auto sampled_score_index_unmap_et =
framework::EigenTensor<int, 1>::From(sampled_score_index_unmap);
sampled_score_index_unmap_et =
sampled_score_index_unmap_et + anchor_offset;
AppendRpns<int>(loc_index, total_loc_num, &sampled_loc_index_unmap);
AppendRpns<int>(score_index, total_score_num, &sampled_score_index_unmap);
AppendRpns<T>(tgt_bbox, total_loc_num * 5, &sampled_tgt_bbox);
AppendRpns<int>(tgt_lbl, total_score_num, &sampled_tgtlbl);
total_loc_num += loc_num;
total_score_num += score_num;
lod0_loc.emplace_back(total_loc_num);
lod0_score.emplace_back(total_score_num);
}
PADDLE_ENFORCE_LE(total_loc_num, max_num);
PADDLE_ENFORCE_LE(total_score_num, max_num);
lod_loc.emplace_back(lod0_loc);
loc_score.emplace_back(lod0_score);
loc_index->set_lod(lod_loc);
score_index->set_lod(loc_score);
tgt_bbox->set_lod(lod_loc);
tgt_lbl->set_lod(loc_score);
loc_index->Resize({total_loc_num});
score_index->Resize({total_score_num});
tgt_bbox->Resize({total_loc_num, 5});
tgt_lbl->Resize({total_score_num, 1});
}
};
class RRpnTargetAssignOpMaker : public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddInput("Anchor",
"(Tensor) input anchor is a 2-D Tensor with shape [H*W*A, 5].");
AddInput("GtBoxes",
"(LoDTensor) input ground-truth bbox with shape [K, 5].");
AddInput("ImInfo",
"(LoDTensor) input image information with shape [N, 3]. "
"N is the batch size, each image information includes height, "
"width and scale.");
AddAttr<int>("rpn_batch_size_per_im",
"Total number of RPN examples per image.")
.SetDefault(256);
AddAttr<float>(
"rpn_straddle_thresh",
"Remove RPN anchors that go outside the image by straddle_thresh "
"pixels, "
"Set to -1 or a large value, e.g. 100000, to disable pruning anchors.");
AddAttr<float>(
"rpn_positive_overlap",
"Minimum overlap required between an anchor and ground-truth "
"box for the (anchor, gt box) pair to be a positive example.")
.SetDefault(0.7);
AddAttr<float>(
"rpn_negative_overlap",
"Maximum overlap allowed between an anchor and ground-truth "
"box for the (anchor, gt box) pair to be a negative examples.")
.SetDefault(0.3);
AddAttr<float>(
"rpn_fg_fraction",
"Target fraction of RoI minibatch that "
"is labeled foreground (i.e. class > 0), 0-th class is background.")
.SetDefault(0.25);
AddAttr<bool>("use_random",
"A flag indicating whether to use a ReservoirSampling. "
"NOTE: DO NOT set this flag to false in training. "
"Setting this flag to false is only useful in unittest.")
.SetDefault(true);
AddOutput(
"LocationIndex",
"(Tensor), The indexes of foreground anchors in all RPN anchors, the "
"shape of the LocationIndex is [F], F depends on the value of input "
"tensor and attributes.");
AddOutput(
"ScoreIndex",
"(Tensor), The indexes of foreground and background anchors in all "
"RPN anchors(The rest anchors are ignored). The shape of the "
"ScoreIndex is [F + B], F and B are sampled foreground and background "
" number.");
AddOutput("TargetBBox",
"(Tensor), The target bbox deltas with shape "
"[F, 5], F is the sampled foreground number.");
AddOutput(
"TargetLabel",
"(Tensor<int>), The target labels of each anchor with shape "
"[F + B, 1], F and B are sampled foreground and background number.");
AddComment(R"DOC(
Given a set of ground-truth bboxes and anchors, this operator assigns
classification and regression targets to each anchor.
The ScoreIndex and LocationIndex are generated according to the
anchor-to-ground-truth IoU; the remaining anchors do not contribute to the RPN
training loss.
ScoreIndex is composed of foreground anchor indexes (positive labels) and
background anchor indexes (negative labels). LocationIndex is exactly the same
as the foreground anchor indexes, since no regression target can be assigned
to background anchors.
The classification target (TargetLabel) is a binary class label (of being an
object or not). Following the Faster R-CNN paper, positive labels are assigned
to two kinds of anchors: (i) the anchor/anchors with the highest IoU overlap
with a ground-truth box, or (ii) an anchor that has an IoU overlap higher than
rpn_positive_overlap (0.7) with any ground-truth box. Note that a single
ground-truth box may assign positive labels to multiple anchors. An anchor is
labeled negative when its IoU ratio is lower than rpn_negative_overlap (0.3)
for all ground-truth boxes. Anchors that are neither positive nor negative do
not contribute to the training objective.
)DOC");
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(
rrpn_target_assign,
ops::RRpnTargetAssignOp,
ops::RRpnTargetAssignOpMaker,
paddle::framework::EmptyGradOpMaker<paddle::framework::OpDesc>,
paddle::framework::EmptyGradOpMaker<paddle::imperative::OpBase>);
REGISTER_OP_CPU_KERNEL(rrpn_target_assign,
ops::RRpnTargetAssignKernel<float>,
ops::RRpnTargetAssignKernel<double>);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <vector>
#include "paddle/fluid/platform/enforce.h"
namespace paddle {
namespace operators {
namespace detail {
/**
 * Get a reference from a pointer, with a null check. The error message is in
 * printf format and is passed via `args`.
 */
template <typename T, typename... ARGS>
inline T& Ref(T* ptr, ARGS&&... args) {
PADDLE_ENFORCE_NOT_NULL(ptr, ::paddle::string::Sprintf(args...));
return *ptr;
}
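// For example (matching how the proposal op above calls it):
//   auto anchors = detail::Ref(ctx.Input<Tensor>("Anchors"),
//                              "Cannot find input Anchors(%s) in scope", name);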
} // namespace detail
} // namespace operators
} // namespace paddle
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Constant
from paddle.fluid.initializer import Normal
from paddle.fluid.initializer import MSRA
from paddle.fluid.regularizer import L2Decay
from config import cfg
from models.ext_op.rrpn_lib import *
class RRPN(object):
def __init__(self,
add_conv_body_func=None,
add_roi_box_head_func=None,
mode='train',
use_pyreader=True,
use_random=True):
self.add_conv_body_func = add_conv_body_func
self.add_roi_box_head_func = add_roi_box_head_func
self.mode = mode
self.use_pyreader = use_pyreader
self.use_random = use_random
def build_model(self, image_shape):
self.build_input(image_shape)
body_conv = self.add_conv_body_func(self.image)
# RPN
self.rpn_heads(body_conv)
# Fast RCNN
self.fast_rcnn_heads(body_conv)
if self.mode != 'train':
self.eval_bbox()
def loss(self):
losses = []
# Fast RCNN loss
loss_cls, loss_bbox = self.fast_rcnn_loss()
# RPN loss
rpn_cls_loss, rpn_reg_loss = self.rpn_loss()
losses = [loss_cls, loss_bbox, rpn_cls_loss, rpn_reg_loss]
rkeys = ['loss', 'loss_cls', 'loss_bbox', \
'loss_rpn_cls', 'loss_rpn_bbox',]
loss = fluid.layers.sum(losses)
rloss = [loss] + losses
return rloss, rkeys, self.rpn_rois
def eval_bbox_out(self):
return self.pred_result
def build_input(self, image_shape):
if self.use_pyreader:
in_shapes = [[-1] + image_shape, [-1, 5], [-1, 1], [-1, 1],
[-1, 3], [-1, 1]]
lod_levels = [0, 1, 1, 1, 0, 0]
dtypes = [
'float32', 'float32', 'int32', 'int32', 'float32', 'int64'
]
self.py_reader = fluid.layers.py_reader(
capacity=64,
shapes=in_shapes,
lod_levels=lod_levels,
dtypes=dtypes,
use_double_buffer=True)
ins = fluid.layers.read_file(self.py_reader)
self.image = ins[0]
self.gt_box = ins[1]
self.gt_label = ins[2]
self.is_crowd = ins[3]
self.im_info = ins[4]
self.im_id = ins[5]
else:
self.image = fluid.layers.data(
name='image', shape=image_shape, dtype='float32')
self.gt_box = fluid.layers.data(
    name='gt_box', shape=[5], dtype='float32', lod_level=1)
self.gt_label = fluid.layers.data(
name='gt_label', shape=[1], dtype='int32', lod_level=1)
self.is_crowd = fluid.layers.data(
name='is_crowd', shape=[1], dtype='int32', lod_level=1)
self.im_info = fluid.layers.data(
name='im_info', shape=[3], dtype='float32')
self.im_id = fluid.layers.data(
name='im_id', shape=[1], dtype='int64')
self.difficult = fluid.layers.data(
name='difficult', shape=[1], dtype='float32', lod_level=1)
def feeds(self):
if self.mode == 'infer':
return [self.image, self.im_info]
if self.mode == 'val':
return [
self.image, self.gt_box, self.gt_label, self.is_crowd,
self.im_info, self.im_id, self.difficult
]
return [
self.image, self.gt_box, self.gt_label, self.is_crowd, self.im_info,
self.im_id
]
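# Inference post-processing: for each foreground class, decode the per-class
# bbox deltas against the RPN proposals with rrpn_box_coder, run
# multiclass_nms on the decoded rotated boxes, offset the predicted label by
# the class index so each detection carries its class id, concatenate the
# per-class results, and finally gather only the valid rows.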
def eval_bbox(self):
self.im_scale = fluid.layers.slice(
self.im_info, [1], starts=[2], ends=[3])
im_scale_lod = fluid.layers.sequence_expand(self.im_scale,
self.rpn_rois)
results = []
boxes = self.rpn_rois
cls_prob = fluid.layers.softmax(self.cls_score, use_cudnn=False)
bbox_pred = fluid.layers.reshape(self.bbox_pred, (-1, cfg.class_num, 5))
for i in range(cfg.class_num - 1):
bbox_pred_slice = fluid.layers.slice(
bbox_pred, axes=[1], starts=[i + 1], ends=[i + 2])
bbox_pred_reshape = fluid.layers.reshape(bbox_pred_slice, (-1, 5))
decoded_box = rrpn_box_coder(prior_box=boxes, \
target_box=bbox_pred_reshape, \
prior_box_var=cfg.bbox_reg_weights)
score_slice = fluid.layers.slice(
cls_prob, axes=[1], starts=[i + 1], ends=[i + 2])
score_slice = fluid.layers.reshape(score_slice, shape=[-1, 1])
box_positive = fluid.layers.reshape(decoded_box, shape=[-1, 8])
box_reshape = fluid.layers.reshape(x=box_positive, shape=[1, -1, 8])
score_reshape = fluid.layers.reshape(
x=score_slice, shape=[1, 1, -1])
pred_result = fluid.layers.multiclass_nms(
bboxes=box_reshape,
scores=score_reshape,
score_threshold=cfg.TEST.score_thresh,
nms_top_k=-1,
nms_threshold=cfg.TEST.nms_thresh,
keep_top_k=cfg.TEST.detections_per_im,
normalized=False,
background_label=-1)
result_shape = fluid.layers.shape(pred_result)
res_dimension = fluid.layers.slice(
result_shape, axes=[0], starts=[1], ends=[2])
res_dimension = fluid.layers.reshape(res_dimension, shape=[1, 1])
dimension = fluid.layers.fill_constant(
shape=[1, 1], value=2, dtype='int32')
cond = fluid.layers.less_than(dimension, res_dimension)
res = fluid.layers.create_global_var(
shape=[1, 10], value=0.0, dtype='float32', persistable=False)
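            # If real detections exist, add (i + 1) to the label column so the
            # single-class NMS output carries the true class id; otherwise
            # store a row of -1 placeholders to be filtered out afterwards.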
with fluid.layers.control_flow.Switch() as switch:
with switch.case(cond):
coordinate = fluid.layers.fill_constant(
shape=[9], value=0.0, dtype='float32')
pred_class = fluid.layers.fill_constant(
shape=[1], value=i + 1, dtype='float32')
add_class = fluid.layers.concat(
[pred_class, coordinate], axis=0)
normal_result = fluid.layers.elementwise_add(pred_result,
add_class)
fluid.layers.assign(normal_result, res)
with switch.default():
normal_result = fluid.layers.fill_constant(
shape=[1, 10], value=-1.0, dtype='float32')
fluid.layers.assign(normal_result, res)
results.append(res)
if len(results) == 1:
self.pred_result = results[0]
return
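        # Concatenate the per-class results and keep only rows whose label
        # column is positive, dropping the -1 placeholder rows.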
outs = []
out = fluid.layers.concat(results)
zero = fluid.layers.fill_constant(
shape=[1, 1], value=0.0, dtype='float32')
out_split, _ = fluid.layers.split(out, dim=1, num_or_sections=[1, 9])
out_bool = fluid.layers.greater_than(out_split, zero)
idx = fluid.layers.where(out_bool)
idx_split, _ = fluid.layers.split(idx, dim=1, num_or_sections=[1, 1])
idx = fluid.layers.reshape(idx_split, [-1, 1])
self.pred_result = fluid.layers.gather(input=out, index=idx)
def rpn_heads(self, rpn_input):
# RPN hidden representation
dim_out = rpn_input.shape[1]
rpn_conv = fluid.layers.conv2d(
input=rpn_input,
num_filters=dim_out,
filter_size=3,
stride=1,
padding=1,
act='relu',
name='conv_rpn',
param_attr=ParamAttr(
name="conv_rpn_w", initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name="conv_rpn_b", learning_rate=2., regularizer=L2Decay(0.)))
self.anchor, self.var = rotated_anchor_generator(
input=rpn_conv,
anchor_sizes=cfg.anchor_sizes,
aspect_ratios=cfg.aspect_ratios,
angles=cfg.anchor_angle,
variance=cfg.variance,
stride=cfg.rpn_stride,
offset=0.5)
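        # One anchor is generated per (size, aspect ratio, angle) combination
        # at every feature-map location.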
num_anchor = self.anchor.shape[2]
# Proposal classification scores
self.rpn_cls_score = fluid.layers.conv2d(
rpn_conv,
num_filters=num_anchor,
filter_size=1,
stride=1,
padding=0,
act=None,
name='rpn_cls_score',
param_attr=ParamAttr(
name="rpn_cls_logits_w", initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name="rpn_cls_logits_b",
learning_rate=2.,
regularizer=L2Decay(0.)))
# Proposal bbox regression deltas
self.rpn_bbox_pred = fluid.layers.conv2d(
rpn_conv,
num_filters=5 * num_anchor,
filter_size=1,
stride=1,
padding=0,
act=None,
name='rpn_bbox_pred',
param_attr=ParamAttr(
name="rpn_bbox_pred_w", initializer=Normal(
loc=0., scale=0.01)),
bias_attr=ParamAttr(
name="rpn_bbox_pred_b",
learning_rate=2.,
regularizer=L2Decay(0.)))
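        # Objectness probabilities come from a sigmoid over the per-anchor
        # logits; rotated_generate_proposals then decodes the deltas and
        # applies rotated NMS to produce the RPN proposals.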
rpn_cls_score_prob = fluid.layers.sigmoid(
self.rpn_cls_score, name='rpn_cls_score_prob')
param_obj = cfg.TRAIN if self.mode == 'train' else cfg.TEST
pre_nms_top_n = param_obj.rpn_pre_nms_top_n
post_nms_top_n = param_obj.rpn_post_nms_top_n
nms_thresh = param_obj.rpn_nms_thresh
min_size = param_obj.rpn_min_size
self.rpn_rois, self.rpn_roi_probs = rotated_generate_proposals(
scores=rpn_cls_score_prob,
bbox_deltas=self.rpn_bbox_pred,
im_info=self.im_info,
anchors=self.anchor,
variances=self.var,
pre_nms_top_n=pre_nms_top_n,
post_nms_top_n=post_nms_top_n,
nms_thresh=param_obj.rpn_nms_thresh,
min_size=param_obj.rpn_min_size)
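        # During training, sample foreground/background RoIs from the
        # proposals and compute their classification and regression targets.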
if self.mode == 'train':
outs = rotated_generate_proposal_labels(
rpn_rois=self.rpn_rois,
gt_classes=self.gt_label,
is_crowd=self.is_crowd,
gt_boxes=self.gt_box,
im_info=self.im_info,
batch_size_per_im=cfg.TRAIN.batch_size_per_im,
fg_fraction=cfg.TRAIN.fg_fractrion,
fg_thresh=cfg.TRAIN.fg_thresh,
bg_thresh_hi=cfg.TRAIN.bg_thresh_hi,
bg_thresh_lo=cfg.TRAIN.bg_thresh_lo,
bbox_reg_weights=cfg.bbox_reg_weights,
class_nums=cfg.class_num,
use_random=self.use_random)
self.rois = outs[0]
self.labels_int32 = outs[1]
self.bbox_targets = outs[2]
self.bbox_inside_weights = outs[3]
self.bbox_outside_weights = outs[4]
def fast_rcnn_heads(self, roi_input):
if self.mode == 'train':
pool_rois = self.rois
else:
pool_rois = self.rpn_rois
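        # Rotated RoI Align pools every (possibly rotated) RoI into a fixed
        # roi_resolution x roi_resolution feature map for the detection head.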
pool = rotated_roi_align(
input=roi_input,
rois=pool_rois,
pooled_height=cfg.roi_resolution,
pooled_width=cfg.roi_resolution,
spatial_scale=cfg.spatial_scale)
self.res5_2_sum = self.add_roi_box_head_func(pool)
rcnn_out = fluid.layers.pool2d(
self.res5_2_sum, pool_type='avg', pool_size=7, name='res5_pool')
self.cls_score = fluid.layers.fc(input=rcnn_out,
size=cfg.class_num,
act=None,
name='cls_score',
param_attr=ParamAttr(
name='cls_score_w',
initializer=Normal(
loc=0.0, scale=0.001)),
bias_attr=ParamAttr(
name='cls_score_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
self.bbox_pred = fluid.layers.fc(input=rcnn_out,
size=5 * cfg.class_num,
act=None,
name='bbox_pred',
param_attr=ParamAttr(
name='bbox_pred_w',
initializer=Normal(
loc=0.0, scale=0.01)),
bias_attr=ParamAttr(
name='bbox_pred_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
def fast_rcnn_loss(self):
labels_int64 = fluid.layers.cast(x=self.labels_int32, dtype='int64')
labels_int64.stop_gradient = True
loss_cls = fluid.layers.softmax_with_cross_entropy(
logits=self.cls_score,
label=labels_int64,
numeric_stable_mode=True, )
loss_cls = fluid.layers.reduce_mean(loss_cls)
loss_bbox = fluid.layers.smooth_l1(
x=self.bbox_pred,
y=self.bbox_targets,
inside_weight=self.bbox_inside_weights,
outside_weight=self.bbox_outside_weights,
sigma=1.0)
loss_bbox = fluid.layers.reduce_mean(loss_bbox)
return loss_cls, loss_bbox
def rpn_loss(self):
rpn_cls_score_reshape = fluid.layers.transpose(
self.rpn_cls_score, perm=[0, 2, 3, 1])
rpn_bbox_pred_reshape = fluid.layers.transpose(
self.rpn_bbox_pred, perm=[0, 2, 3, 1])
anchor_reshape = fluid.layers.reshape(self.anchor, shape=(-1, 5))
var_reshape = fluid.layers.reshape(self.var, shape=(-1, 5))
rpn_cls_score_reshape = fluid.layers.reshape(
x=rpn_cls_score_reshape, shape=(0, -1, 1))
rpn_bbox_pred_reshape = fluid.layers.reshape(
x=rpn_bbox_pred_reshape, shape=(0, -1, 5))
score_pred, loc_pred, score_tgt, loc_tgt = \
rrpn_target_assign(
bbox_pred=rpn_bbox_pred_reshape,
cls_logits=rpn_cls_score_reshape,
anchor_box=anchor_reshape,
gt_boxes=self.gt_box,
im_info=self.im_info,
rpn_batch_size_per_im=cfg.TRAIN.rpn_batch_size_per_im,
rpn_straddle_thresh=-1,
rpn_fg_fraction=cfg.TRAIN.rpn_fg_fraction,
rpn_positive_overlap=cfg.TRAIN.rpn_positive_overlap,
rpn_negative_overlap=cfg.TRAIN.rpn_negative_overlap,
use_random=self.use_random)
score_tgt = fluid.layers.cast(x=score_tgt, dtype='float32')
rpn_cls_loss = fluid.layers.sigmoid_cross_entropy_with_logits(
x=score_pred, label=score_tgt)
rpn_cls_loss = fluid.layers.reduce_mean(
rpn_cls_loss, name='loss_rpn_cls')
rpn_reg_loss = fluid.layers.smooth_l1(x=loc_pred, y=loc_tgt, sigma=3.0)
rpn_reg_loss = fluid.layers.reduce_sum(
rpn_reg_loss, name='loss_rpn_bbox')
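        # Normalize the regression loss by the number of sampled anchors,
        # i.e. the number of elements in the score target tensor.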
score_shape = fluid.layers.shape(score_tgt)
score_shape = fluid.layers.cast(x=score_shape, dtype='float32')
norm = fluid.layers.reduce_prod(score_shape)
norm.stop_gradient = True
rpn_reg_loss = rpn_reg_loss / norm
return rpn_cls_loss, rpn_reg_loss
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
class NameAdapter(object):
"""Fix the backbones variable names for pretrained weight"""
def __init__(self, model):
super(NameAdapter, self).__init__()
self.model = model
@property
def model_type(self):
return getattr(self.model, '_model_type', '')
@property
def variant(self):
return getattr(self.model, 'variant', '')
def fix_conv_norm_name(self, name):
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
        # the naming rule is the same as in the pretrained weights
if self.model_type == 'SEResNeXt':
bn_name = name + "_bn"
return bn_name
def fix_shortcut_name(self, name):
if self.model_type == 'SEResNeXt':
name = 'conv' + name + '_prj'
return name
def fix_bottleneck_name(self, name):
if self.model_type == 'SEResNeXt':
conv_name1 = 'conv' + name + '_x1'
conv_name2 = 'conv' + name + '_x2'
conv_name3 = 'conv' + name + '_x3'
shortcut_name = name
else:
conv_name1 = name + "_branch2a"
conv_name2 = name + "_branch2b"
conv_name3 = name + "_branch2c"
shortcut_name = name + "_branch1"
return conv_name1, conv_name2, conv_name3, shortcut_name
def fix_layer_warp_name(self, stage_num, count, i):
name = 'res' + str(stage_num)
if count > 10 and stage_num == 4:
if i == 0:
conv_name = name + "a"
else:
conv_name = name + "b" + str(i)
else:
conv_name = name + chr(ord("a") + i)
return conv_name
def fix_c1_stage_name(self):
return "conv1"
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.framework import Variable
from paddle.fluid.regularizer import L2Decay
from paddle.fluid.initializer import Constant
from numbers import Integral
from .name_adapter import NameAdapter
class ResNet(object):
"""
Residual Network, see https://arxiv.org/abs/1512.03385
Args:
depth (int): ResNet depth, should be 18, 34, 50, 101, 152.
freeze_at (int): freeze the backbone at which stage
norm_type (str): normalization type, 'bn'/'sync_bn'/'affine_channel'
freeze_norm (bool): freeze normalization layers
norm_decay (float): weight decay for normalization layer weights
variant (str): ResNet variant, supports 'a', 'b', 'c', 'd' currently
feature_maps (list): index of stages whose feature maps are returned
"""
__shared__ = ['norm_type', 'freeze_norm', 'weight_prefix_name']
def __init__(self,
depth=50,
freeze_at=2,
norm_type='affine_channel',
freeze_norm=True,
norm_decay=0.,
variant='b',
feature_maps=4,
weight_prefix_name=''):
super(ResNet, self).__init__()
if isinstance(feature_maps, Integral):
feature_maps = [feature_maps]
assert depth in [18, 34, 50, 101, 152], \
"depth {} not in [18, 34, 50, 101, 152]"
assert variant in ['a', 'b', 'c', 'd'], "invalid ResNet variant"
assert 0 <= freeze_at <= 4, "freeze_at should be 0, 1, 2, 3 or 4"
assert len(feature_maps) > 0, "need one or more feature maps"
assert norm_type in ['bn', 'sync_bn', 'affine_channel']
self.depth = depth
self.freeze_at = freeze_at
self.norm_type = norm_type
self.norm_decay = norm_decay
self.freeze_norm = freeze_norm
self.variant = variant
self._model_type = 'ResNet'
self.feature_maps = feature_maps
self.depth_cfg = {
18: ([2, 2, 2, 2], self.basicblock),
34: ([3, 4, 6, 3], self.basicblock),
50: ([3, 4, 6, 3], self.bottleneck),
101: ([3, 4, 23, 3], self.bottleneck),
152: ([3, 8, 36, 3], self.bottleneck)
}
self.stage_filters = [64, 128, 256, 512]
self._c1_out_chan_num = 64
self.na = NameAdapter(self)
self.prefix_name = weight_prefix_name
def _conv_offset(self,
input,
filter_size,
stride,
padding,
act=None,
name=None):
out_channel = filter_size * filter_size * 3
out = fluid.layers.conv2d(
input,
num_filters=out_channel,
filter_size=filter_size,
stride=stride,
padding=padding,
param_attr=ParamAttr(
initializer=Constant(0.0), name=name + ".w_0"),
bias_attr=ParamAttr(
initializer=Constant(0.0), name=name + ".b_0"),
act=act,
name=name)
return out
def _conv_norm(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None):
_name = self.prefix_name + name if self.prefix_name != '' else name
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=ParamAttr(name=_name + "_weights"),
bias_attr=False,
name=_name + '.conv2d.output.1')
bn_name = self.na.fix_conv_norm_name(name)
bn_name = self.prefix_name + bn_name if self.prefix_name != '' else bn_name
norm_lr = 0. if self.freeze_norm else 1.
norm_decay = self.norm_decay
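        # With freeze_norm the scale/offset get a zero learning rate (and
        # stop_gradient below), keeping the pretrained statistics fixed.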
pattr = ParamAttr(
name=bn_name + '_scale',
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay))
battr = ParamAttr(
name=bn_name + '_offset',
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay))
if self.norm_type in ['bn', 'sync_bn']:
global_stats = True if self.freeze_norm else False
out = fluid.layers.batch_norm(
input=conv,
act=act,
name=bn_name + '.output.1',
param_attr=pattr,
bias_attr=battr,
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance',
use_global_stats=global_stats)
scale = fluid.framework._get_var(pattr.name)
bias = fluid.framework._get_var(battr.name)
elif self.norm_type == 'affine_channel':
scale = fluid.layers.create_parameter(
shape=[conv.shape[1]],
dtype=conv.dtype,
attr=pattr,
default_initializer=fluid.initializer.Constant(1.))
bias = fluid.layers.create_parameter(
shape=[conv.shape[1]],
dtype=conv.dtype,
attr=battr,
default_initializer=fluid.initializer.Constant(0.))
out = fluid.layers.affine_channel(
x=conv, scale=scale, bias=bias, act=act)
if self.freeze_norm:
scale.stop_gradient = True
bias.stop_gradient = True
return out
def _shortcut(self, input, ch_out, stride, is_first, name):
max_pooling_in_short_cut = self.variant == 'd'
ch_in = input.shape[1]
        # the naming rule is the same as in the pretrained weights
name = self.na.fix_shortcut_name(name)
std_senet = getattr(self, 'std_senet', False)
if ch_in != ch_out or stride != 1 or (self.depth < 50 and is_first):
if std_senet:
if is_first:
return self._conv_norm(input, ch_out, 1, stride, name=name)
else:
return self._conv_norm(input, ch_out, 3, stride, name=name)
if max_pooling_in_short_cut and not is_first:
input = fluid.layers.pool2d(
input=input,
pool_size=2,
pool_stride=2,
pool_padding=0,
ceil_mode=True,
pool_type='avg')
return self._conv_norm(input, ch_out, 1, 1, name=name)
return self._conv_norm(input, ch_out, 1, stride, name=name)
else:
return input
def bottleneck(self, input, num_filters, stride, is_first, name):
if self.variant == 'a':
stride1, stride2 = stride, 1
else:
stride1, stride2 = 1, stride
# ResNeXt
groups = getattr(self, 'groups', 1)
group_width = getattr(self, 'group_width', -1)
if groups == 1:
expand = 4
elif (groups * group_width) == 256:
expand = 1
else: # FIXME hard code for now, handles 32x4d, 64x4d and 32x8d
num_filters = num_filters // 2
expand = 2
conv_name1, conv_name2, conv_name3, \
shortcut_name = self.na.fix_bottleneck_name(name)
std_senet = getattr(self, 'std_senet', False)
conv_def = [[num_filters, 1, stride1, 'relu', 1, conv_name1],
[num_filters, 3, stride2, 'relu', groups, conv_name2],
[num_filters * expand, 1, 1, None, 1, conv_name3]]
residual = input
for i, (c, k, s, act, g, _name) in enumerate(conv_def):
residual = self._conv_norm(
input=residual,
num_filters=c,
filter_size=k,
stride=s,
act=act,
groups=g,
name=_name)
short = self._shortcut(
input,
num_filters * expand,
stride,
is_first=is_first,
name=shortcut_name)
return fluid.layers.elementwise_add(
x=short, y=residual, act='relu', name=name + ".add.output.5")
    def basicblock(self, input, num_filters, stride, is_first, name):
conv0 = self._conv_norm(
input=input,
num_filters=num_filters,
filter_size=3,
act='relu',
stride=stride,
name=name + "_branch2a")
conv1 = self._conv_norm(
input=conv0,
num_filters=num_filters,
filter_size=3,
act=None,
name=name + "_branch2b")
short = self._shortcut(
input, num_filters, stride, is_first, name=name + "_branch1")
return fluid.layers.elementwise_add(x=short, y=conv1, act='relu')
def layer_warp(self, input, stage_num):
"""
Args:
input (Variable): input variable.
stage_num (int): the stage number, should be 2, 3, 4, 5
Returns:
            The output variable of the last block in the requested stage.
"""
assert stage_num in [2, 3, 4, 5]
stages, block_func = self.depth_cfg[self.depth]
count = stages[stage_num - 2]
ch_out = self.stage_filters[stage_num - 2]
is_first = False if stage_num != 2 else True
# Make the layer name and parameter name consistent
# with ImageNet pre-trained model
conv = input
for i in range(count):
conv_name = self.na.fix_layer_warp_name(stage_num, count, i)
if self.depth < 50:
is_first = True if i == 0 and stage_num == 2 else False
conv = block_func(
input=conv,
num_filters=ch_out,
stride=2 if i == 0 and stage_num != 2 else 1,
is_first=is_first,
name=conv_name)
return conv
def c1_stage(self, input):
out_chan = self._c1_out_chan_num
conv1_name = self.na.fix_c1_stage_name()
if self.variant in ['c', 'd']:
conv_def = [
[out_chan // 2, 3, 2, "conv1_1"],
[out_chan // 2, 3, 1, "conv1_2"],
[out_chan, 3, 1, "conv1_3"],
]
else:
conv_def = [[out_chan, 7, 2, conv1_name]]
for (c, k, s, _name) in conv_def:
input = self._conv_norm(
input=input,
num_filters=c,
filter_size=k,
stride=s,
act='relu',
name=_name)
output = fluid.layers.pool2d(
input=input,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
return output
def __call__(self, input):
assert isinstance(input, Variable)
assert not (set(self.feature_maps) - set([2, 3, 4, 5])), \
"feature maps {} not in [2, 3, 4, 5]".format(self.feature_maps)
res_endpoints = []
res = input
feature_maps = self.feature_maps
severed_head = getattr(self, 'severed_head', False)
if not severed_head:
res = self.c1_stage(res)
feature_maps = range(2, max(self.feature_maps) + 1)
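        # Outputs of stages up to freeze_at are marked stop_gradient, so the
        # frozen backbone stages are not updated during detection training.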
for i in feature_maps:
res = self.layer_warp(res, i)
if i in self.feature_maps:
res_endpoints.append(res)
if self.freeze_at >= i:
res.stop_gradient = True
return res
class ResNetC5(ResNet):
__doc__ = ResNet.__doc__
def __init__(self,
depth=50,
freeze_at=2,
norm_type='affine_channel',
freeze_norm=True,
norm_decay=0.,
variant='b',
feature_maps=[5],
weight_prefix_name=''):
super(ResNetC5, self).__init__(depth, freeze_at, norm_type, freeze_norm,
norm_decay, variant, feature_maps)
self.severed_head = True
# Download the ResNet-50 pretrained weights.
echo "Downloading..."
wget https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar --no-check-certificate
echo "Extracting..."
tar -xf ResNet50_cos_pretrained.tar
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import random
import numpy as np
import xml.etree.ElementTree
import os
import time
import copy
import six
import cv2
import math
import paddle
from collections import deque
import data_utils
from roidbs import ICDAR2015Dataset, ICDAR2017Dataset
from config import cfg
from PIL import Image
from data_utils import _resize
num_trainers = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
np.random.seed(10)
def roidb_reader(roidb, mode):
im, im_scales, gt_boxes, gt_classes = data_utils.get_image_blob(roidb, mode)
im_id = roidb['im_id']
is_crowd = roidb['is_crowd']
im_height = np.round(roidb['height'] * im_scales)
im_width = np.round(roidb['width'] * im_scales)
is_difficult = roidb['is_difficult']
im_info = np.array([im_height, im_width, im_scales], dtype=np.float32)
if mode == 'val':
return im, gt_boxes, gt_classes, is_crowd, im_info, im_id, is_difficult
outs = (im, gt_boxes, gt_classes, is_crowd, im_info, im_id)
return outs
def RRPNData(mode,
batch_size=None,
total_batch_size=None,
padding_total=False,
shuffle=False,
             shuffle_seed=None):
total_batch_size = total_batch_size if total_batch_size else batch_size
assert total_batch_size % batch_size == 0
if cfg.dataset == "icdar2015":
icdar2015_dataset = ICDAR2015Dataset(mode)
roidbs = icdar2015_dataset.get_roidb()
else:
icdar2017_dataset = ICDAR2017Dataset(mode)
roidbs = icdar2017_dataset.get_roidb()
print("{} on {} with {} roidbs".format(mode, cfg.dataset, len(roidbs)))
def reader():
if mode == "train":
if shuffle:
if shuffle_seed is not None:
np.random.seed(shuffle_seed)
roidb_perm = deque(np.random.permutation(roidbs))
else:
roidb_perm = deque(roidbs)
roidb_cur = 0
count = 0
batch_out = []
            device_num = total_batch_size // batch_size
while True:
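                # Cycle through a (shuffled) permutation of the roidbs
                # indefinitely, re-shuffling after each full pass; stop after
                # enough batches for cfg.max_iter iterations have been yielded.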
start = time.time()
roidb = roidb_perm[0]
roidb_cur += 1
roidb_perm.rotate(-1)
if roidb_cur >= len(roidbs):
if shuffle:
roidb_perm = deque(np.random.permutation(roidbs))
else:
roidb_perm = deque(roidbs)
roidb_cur = 0
                # im, gt_boxes, gt_classes, is_crowd, im_info, im_id
datas = roidb_reader(roidb, mode)
if datas[1].shape[0] == 0:
continue
batch_out.append(datas)
end = time.time()
#print('reader time:', end - start)
if len(batch_out) == batch_size:
yield batch_out
count += 1
batch_out = []
iter_id = count // device_num
if iter_id >= cfg.max_iter * num_trainers:
return
elif mode == "val":
batch_out = []
for roidb in roidbs:
im, gt_boxes, gt_classes, is_crowd, im_info, im_id, is_difficult = roidb_reader(
roidb, mode)
batch_out.append((im, gt_boxes, gt_classes, is_crowd, im_info,
im_id, is_difficult))
if len(batch_out) == batch_size:
yield batch_out
batch_out = []
if len(batch_out) != 0:
yield batch_out
return reader
def train(batch_size,
total_batch_size=None,
padding_total=False,
num_workers=20,
shuffle=True,
shuffle_seed=None):
return RRPNData(
'train',
batch_size,
total_batch_size,
padding_total,
shuffle=shuffle,
shuffle_seed=shuffle_seed)
def test(batch_size, total_batch_size=None, padding_total=False):
return RRPNData('val', batch_size, total_batch_size, shuffle=False)
def infer(file_path):
def reader():
imgs = os.listdir(file_path)
imgs.sort()
for image in imgs:
            if not os.path.exists(os.path.join(file_path, image)):
                raise ValueError("Image file [%s] does not exist." %
                                 os.path.join(file_path, image))
with open(os.path.join(file_path, image), 'rb') as f:
data = f.read()
data = np.frombuffer(data, dtype='uint8')
img = cv2.imdecode(data, 1)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img, im_scale = _resize(img, target_size=1000, max_size=1778)
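            # Scale to [0, 1], normalize with the configured pixel mean/std
            # and convert HWC to CHW, matching the training preprocessing.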
img = img.astype(np.float32, copy=False)
img = img / 255.0
mean = np.array(cfg.pixel_means)[np.newaxis, np.newaxis, :]
std = np.array(cfg.pixel_std)[np.newaxis, np.newaxis, :]
img -= mean
img /= std
img = img.transpose((2, 0, 1))
h = img.shape[1]
w = img.shape[2]
im_info = np.array([h, w, im_scale], dtype=np.float32)
yield [(img, im_info)]
return reader
if __name__ == '__main__':
from utility import parse_args
args = parse_args()
train_reader = train(1, shuffle=True)
import time
time0 = time.time()
for iter_id, data in enumerate(train_reader()):
print('iter:', iter_id)
print('cost:', time.time() - time0)
time0 = time.time()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Based on:
# --------------------------------------------------------
# Detectron
# Copyright (c) 2017-present, Facebook, Inc.
# Licensed under the Apache License, Version 2.0;
# Written by Ross Girshick
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import copy
import logging
import numpy as np
import os
import scipy.sparse
import random
import time
import matplotlib
import cv2
#import segm_utils
from config import cfg
from data_utils import DatasetPath
logger = logging.getLogger(__name__)
class ICDAR2015Dataset(object):
"""A class representing a ICDAR2015 dataset."""
def __init__(self, mode):
print('Creating: {}'.format(cfg.dataset))
self.name = cfg.data_dir
self.mode = mode
data_path = DatasetPath(mode, self.name)
data_dir = data_path.get_data_dir()
file_list = data_path.get_file_list()
self.image_dir = data_dir
self.gt_dir = file_list
def get_roidb(self):
"""Return an roidb corresponding to the txt dataset. Optionally:
- include ground truth boxes in the roidb
"""
image_list = os.listdir(self.image_dir)
image_list.sort()
im_infos = []
count = 0
for image in image_list:
prefix = image[:-4]
if image.split('.')[-1] != 'jpg':
continue
img_name = os.path.join(self.image_dir, image)
gt_name = os.path.join(self.gt_dir, 'gt_' + prefix + '.txt')
easy_boxes = []
hard_boxes = []
boxes = []
gt_obj = open(gt_name, 'r', encoding='UTF-8-sig')
gt_txt = gt_obj.read()
gt_split = gt_txt.split('\n')
img = cv2.imread(img_name)
f = False
for gt_line in gt_split:
gt_ind = gt_line.split(',')
                # readable text region (transcription is not '###')
if len(gt_ind) > 3 and '###' not in gt_ind[8]:
pt1 = (int(gt_ind[0]), int(gt_ind[1]))
pt2 = (int(gt_ind[2]), int(gt_ind[3]))
pt3 = (int(gt_ind[4]), int(gt_ind[5]))
pt4 = (int(gt_ind[6]), int(gt_ind[7]))
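                    # Convert the 4 corner points into a rotated box
                    # (x_ctr, y_ctr, width, height, angle): the center is the
                    # midpoint of the pt1-pt3 diagonal, the longer edge becomes
                    # the width, and the angle (in degrees) is that edge's
                    # orientation relative to the horizontal axis.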
edge1 = np.sqrt((pt1[0] - pt2[0]) * (pt1[0] - pt2[0]) + (
pt1[1] - pt2[1]) * (pt1[1] - pt2[1]))
edge2 = np.sqrt((pt2[0] - pt3[0]) * (pt2[0] - pt3[0]) + (
pt2[1] - pt3[1]) * (pt2[1] - pt3[1]))
angle = 0
if edge1 > edge2:
width = edge1
height = edge2
if pt1[0] - pt2[0] != 0:
angle = -np.arctan(
float(pt1[1] - pt2[1]) /
float(pt1[0] - pt2[0])) / np.pi * 180
else:
angle = 90.0
elif edge2 >= edge1:
width = edge2
height = edge1
# print pt2[0], pt3[0]
if pt2[0] - pt3[0] != 0:
angle = -np.arctan(
float(pt2[1] - pt3[1]) /
float(pt2[0] - pt3[0])) / np.pi * 180
else:
angle = 90.0
if angle < -45.0:
angle = angle + 180
x_ctr = float(pt1[0] + pt3[
0]) / 2 # pt1[0] + np.abs(float(pt1[0] - pt3[0])) / 2
y_ctr = float(pt1[1] + pt3[
1]) / 2 # pt1[1] + np.abs(float(pt1[1] - pt3[1])) / 2
if self.mode == 'val':
easy_boxes.append(
list(np.array([pt1, pt2, pt3, pt4]).reshape(8)))
else:
easy_boxes.append([x_ctr, y_ctr, width, height, angle])
                # unreadable text region (transcription marked as '###')
if len(gt_ind) > 3 and '###' in gt_ind[8]:
pt1 = (int(gt_ind[0]), int(gt_ind[1]))
pt2 = (int(gt_ind[2]), int(gt_ind[3]))
pt3 = (int(gt_ind[4]), int(gt_ind[5]))
pt4 = (int(gt_ind[6]), int(gt_ind[7]))
edge1 = np.sqrt((pt1[0] - pt2[0]) * (pt1[0] - pt2[0]) + (
pt1[1] - pt2[1]) * (pt1[1] - pt2[1]))
edge2 = np.sqrt((pt2[0] - pt3[0]) * (pt2[0] - pt3[0]) + (
pt2[1] - pt3[1]) * (pt2[1] - pt3[1]))
angle = 0
if edge1 > edge2:
width = edge1
height = edge2
if pt1[0] - pt2[0] != 0:
angle = -np.arctan(
float(pt1[1] - pt2[1]) /
float(pt1[0] - pt2[0])) / np.pi * 180
else:
angle = 90.0
elif edge2 >= edge1:
width = edge2
height = edge1
if pt2[0] - pt3[0] != 0:
angle = -np.arctan(
float(pt2[1] - pt3[1]) /
float(pt2[0] - pt3[0])) / np.pi * 180
else:
angle = 90.0
if angle < -45.0:
angle = angle + 180
x_ctr = float(pt1[0] + pt3[
0]) / 2 # pt1[0] + np.abs(float(pt1[0] - pt3[0])) / 2
y_ctr = float(pt1[1] + pt3[
1]) / 2 # pt1[1] + np.abs(float(pt1[1] - pt3[1])) / 2
if self.mode == 'val':
hard_boxes.append(
list(np.array([pt1, pt2, pt3, pt4]).reshape(8)))
else:
hard_boxes.append([x_ctr, y_ctr, width, height, angle])
#print(easy_boxes)
if self.mode == 'train':
boxes.extend(easy_boxes)
                # keep only one third of the hard boxes for training
boxes.extend(hard_boxes[0:int(len(hard_boxes) / 3)])
is_difficult = [0] * len(easy_boxes)
is_difficult.extend([1] * int(len(hard_boxes) / 3))
else:
boxes.extend(easy_boxes)
boxes.extend(hard_boxes)
is_difficult = [0] * len(easy_boxes)
is_difficult.extend([1] * int(len(hard_boxes)))
len_of_bboxes = len(boxes)
#is_difficult = [0] * len(easy_boxes)
#is_difficult.extend([1] * int(len(hard_boxes)))
is_difficult = np.array(is_difficult).reshape(
1, len_of_bboxes).astype(np.int32)
if self.mode == 'train':
gt_boxes = np.zeros((len_of_bboxes, 5), dtype=np.int32)
else:
gt_boxes = np.zeros((len_of_bboxes, 8), dtype=np.int32)
gt_classes = np.zeros((len_of_bboxes), dtype=np.int32)
is_crowd = np.zeros((len_of_bboxes), dtype=np.int32)
for idx in range(len(boxes)):
if self.mode == 'train':
gt_boxes[idx, :] = [
boxes[idx][0], boxes[idx][1], boxes[idx][2],
boxes[idx][3], boxes[idx][4]
]
else:
gt_boxes[idx, :] = [
boxes[idx][0], boxes[idx][1], boxes[idx][2],
boxes[idx][3], boxes[idx][4], boxes[idx][5],
boxes[idx][6], boxes[idx][7]
]
gt_classes[idx] = 1
if gt_boxes.shape[0] <= 0:
continue
gt_boxes = gt_boxes.astype(np.float64)
im_info = {
'im_id': count,
'gt_classes': gt_classes,
'image': img_name,
'boxes': gt_boxes,
'height': img.shape[0],
'width': img.shape[1],
'is_crowd': is_crowd,
'is_difficult': is_difficult
}
im_infos.append(im_info)
count += 1
return im_infos
class ICDAR2017Dataset(object):
"""A class representing a ICDAR2017 dataset."""
def __init__(self, mode):
print('Creating: {}'.format(cfg.dataset))
self.name = cfg.data_dir
#print('**************', self.name)
self.mode = mode
data_path = DatasetPath(mode, self.name)
data_dir = data_path.get_data_dir()
#print("&**************", data_dir)
file_list = data_path.get_file_list()
self.image_dir = data_dir
self.gt_dir = file_list
def get_roidb(self):
"""Return an roidb corresponding to the json dataset. Optionally:
- include ground truth boxes in the roidb
"""
image_list = os.listdir(self.image_dir)
image_list.sort()
im_infos = []
count = 0
class_idx = 1
class_name = {}
post_fix = ['jpg', 'bmp', 'png']
if self.mode == 'val':
labels_map = get_labels_maps()
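        # During training, class ids are assigned in the order class names are
        # first seen and written to label_list; at eval time that saved
        # mapping is reloaded via get_labels_maps().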
for image in image_list:
prefix = image[:-4]
#print(image)
if image.split('.')[-1] not in post_fix:
continue
img_name = os.path.join(self.image_dir, image)
gt_name = os.path.join(self.gt_dir, 'gt_' + prefix + '.txt')
gt_classes = []
#boxes = []
#hard_boxes = []
boxes = []
gt_obj = open(gt_name, 'r', encoding='UTF-8-sig')
gt_txt = gt_obj.read()
gt_split = gt_txt.split('\n')
img = cv2.imread(img_name)
f = False
for gt_line in gt_split:
gt_ind = gt_line.split(',')
# can get the text information
if len(gt_ind) > 3:
if self.mode == 'val':
gt_classes.append(labels_map[gt_ind[-1]])
else:
if gt_ind[-1] not in class_name:
class_name[gt_ind[-1]] = class_idx
#gt_classes.append(class_idx)
class_idx += 1
gt_classes.append(class_name[gt_ind[-1]])
pt1 = (int(gt_ind[0]), int(gt_ind[1]))
pt2 = (int(gt_ind[2]), int(gt_ind[3]))
pt3 = (int(gt_ind[4]), int(gt_ind[5]))
pt4 = (int(gt_ind[6]), int(gt_ind[7]))
edge1 = np.sqrt((pt1[0] - pt2[0]) * (pt1[0] - pt2[0]) + (
pt1[1] - pt2[1]) * (pt1[1] - pt2[1]))
edge2 = np.sqrt((pt2[0] - pt3[0]) * (pt2[0] - pt3[0]) + (
pt2[1] - pt3[1]) * (pt2[1] - pt3[1]))
angle = 0
if edge1 > edge2:
width = edge1
height = edge2
if pt1[0] - pt2[0] != 0:
angle = -np.arctan(
float(pt1[1] - pt2[1]) /
float(pt1[0] - pt2[0])) / np.pi * 180
else:
angle = 90.0
elif edge2 >= edge1:
width = edge2
height = edge1
# print pt2[0], pt3[0]
if pt2[0] - pt3[0] != 0:
angle = -np.arctan(
float(pt2[1] - pt3[1]) /
float(pt2[0] - pt3[0])) / np.pi * 180
else:
angle = 90.0
if angle < -45.0:
angle = angle + 180
x_ctr = float(pt1[0] + pt3[
0]) / 2 # pt1[0] + np.abs(float(pt1[0] - pt3[0])) / 2
y_ctr = float(pt1[1] + pt3[
1]) / 2 # pt1[1] + np.abs(float(pt1[1] - pt3[1])) / 2
if self.mode == 'val':
boxes.append(
list(np.array([pt1, pt2, pt3, pt4]).reshape(8)))
else:
boxes.append([x_ctr, y_ctr, width, height, angle])
len_of_bboxes = len(boxes)
#print(len_of_bboxes)
is_difficult = np.zeros((len_of_bboxes, 1), dtype=np.int32)
if self.mode == 'train':
gt_boxes = np.zeros((len_of_bboxes, 5), dtype=np.int32)
else:
gt_boxes = np.zeros((len_of_bboxes, 8), dtype=np.int32)
gt_classes = np.array(gt_classes).reshape(len_of_bboxes, 1)
is_crowd = np.zeros((len_of_bboxes), dtype=np.int32)
for idx in range(len(boxes)):
if self.mode == 'train':
gt_boxes[idx, :] = [
boxes[idx][0], boxes[idx][1], boxes[idx][2],
boxes[idx][3], boxes[idx][4]
]
else:
gt_boxes[idx, :] = [
boxes[idx][0], boxes[idx][1], boxes[idx][2],
boxes[idx][3], boxes[idx][4], boxes[idx][5],
boxes[idx][6], boxes[idx][7]
]
#gt_classes[idx] = 1
if gt_boxes.shape[0] <= 0:
continue
gt_boxes = gt_boxes.astype(np.float64)
im_info = {
'im_id': count,
'gt_classes': gt_classes,
'image': img_name,
'boxes': gt_boxes,
'height': img.shape[0],
'width': img.shape[1],
'is_crowd': is_crowd,
'is_difficult': is_difficult
}
im_infos.append(im_info)
count += 1
if self.mode == 'train':
with open(os.path.join(cfg.data_dir, 'label_list'), 'w') as g:
for k in class_name:
g.write(k + "\n")
return im_infos
def get_labels_maps():
labels_map = {}
with open(os.path.join(cfg.data_dir, 'label_list')) as f:
lines = f.readlines()
for idx, line in enumerate(lines):
labels_map[line.strip()] = idx + 1
return labels_map
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
def set_paddle_flags(flags):
for key, value in flags.items():
if os.environ.get(key, None) is None:
os.environ[key] = str(value)
set_paddle_flags({
'FLAGS_conv_workspace_size_limit': 500,
'FLAGS_eager_delete_tensor_gb': 0, # enable gc
'FLAGS_memory_fraction_of_eager_deletion': 1,
'FLAGS_fraction_of_gpu_memory_to_use': 0.98
})
import sys
import numpy as np
import time
import shutil
import collections
import paddle
import paddle.fluid as fluid
import reader
import models.model_builder as model_builder
import models.resnet as resnet
import checkpoint as checkpoint
from config import cfg
from utility import parse_args, print_arguments, SmoothedValue, TrainingStats, now_time, check_gpu
num_trainers = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
def get_device_num():
    # NOTE(zcd): for multi-process training, each process uses one GPU card.
if num_trainers > 1:
return 1
return fluid.core.get_cuda_device_count()
def train():
learning_rate = cfg.learning_rate
image_shape = [3, cfg.TRAIN.max_size, cfg.TRAIN.max_size]
devices_num = get_device_num()
total_batch_size = devices_num * cfg.TRAIN.im_per_batch
use_random = True
startup_prog = fluid.Program()
train_prog = fluid.Program()
with fluid.program_guard(train_prog, startup_prog):
with fluid.unique_name.guard():
model = model_builder.RRPN(
add_conv_body_func=resnet.ResNet(),
add_roi_box_head_func=resnet.ResNetC5(),
use_pyreader=cfg.use_pyreader,
use_random=use_random)
model.build_model(image_shape)
losses, keys, rpn_rois = model.loss()
loss = losses[0]
fetch_list = losses
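            # Piecewise learning-rate decay at cfg.lr_steps combined with a
            # linear warm-up from start_factor * base_lr over the first
            # warm_up_iter iterations (see config.py).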
boundaries = cfg.lr_steps
gamma = cfg.lr_gamma
step_num = len(cfg.lr_steps)
values = [learning_rate * (gamma**i) for i in range(step_num + 1)]
start_lr = learning_rate * cfg.start_factor
lr = fluid.layers.piecewise_decay(boundaries, values)
lr = fluid.layers.linear_lr_warmup(lr, cfg.warm_up_iter, start_lr,
learning_rate)
optimizer = fluid.optimizer.Momentum(
learning_rate=lr,
regularization=fluid.regularizer.L2Decay(cfg.weight_decay),
momentum=cfg.momentum)
optimizer.minimize(loss)
fetch_list = fetch_list + [lr]
for var in fetch_list:
var.persistable = True
gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
place = fluid.CUDAPlace(gpu_id) if cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
build_strategy = fluid.BuildStrategy()
build_strategy.fuse_all_optimizer_ops = False
build_strategy.fuse_elewise_add_act_ops = True
exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_iteration_per_drop_scope = 1
exe.run(startup_prog)
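    # Load the pretrained backbone weights; load_and_fusebn folds batch-norm
    # statistics into the affine_channel scale/bias used by the backbone.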
if cfg.pretrained_model:
checkpoint.load_and_fusebn(exe, train_prog, cfg.pretrained_model)
compiled_train_prog = fluid.CompiledProgram(train_prog).with_data_parallel(
loss_name=loss.name,
build_strategy=build_strategy,
exec_strategy=exec_strategy)
shuffle = True
shuffle_seed = None
if num_trainers > 1:
shuffle_seed = 1
if cfg.use_pyreader:
train_reader = reader.train(
batch_size=cfg.TRAIN.im_per_batch,
total_batch_size=total_batch_size,
padding_total=cfg.TRAIN.padding_minibatch,
shuffle=shuffle,
shuffle_seed=shuffle_seed)
if num_trainers > 1:
assert shuffle_seed is not None, \
"If num_trainers > 1, the shuffle_seed must be set, because " \
"the order of batch data generated by reader " \
"must be the same in the respective processes."
# NOTE: the order of batch data generated by batch_reader
# must be the same in the respective processes.
if num_trainers > 1:
train_reader = fluid.contrib.reader.distributed_batch_reader(
train_reader)
py_reader = model.py_reader
py_reader.decorate_paddle_reader(train_reader)
else:
if num_trainers > 1: shuffle = False
train_reader = reader.train(
batch_size=total_batch_size, shuffle=shuffle)
feeder = fluid.DataFeeder(place=place, feed_list=model.feeds())
def train_loop_pyreader():
py_reader.start()
train_stats = TrainingStats(cfg.log_window, keys)
try:
start_time = time.time()
prev_start_time = start_time
for iter_id in range(cfg.max_iter):
prev_start_time = start_time
start_time = time.time()
outs = exe.run(compiled_train_prog,
fetch_list=[v.name for v in fetch_list])
stats = {k: np.array(v).mean() for k, v in zip(keys, outs[:-1])}
train_stats.update(stats)
logs = train_stats.log()
if iter_id % 10 == 0:
strs = '{}, iter: {}, lr: {:.5f}, {}, time: {:.3f}'.format(
now_time(), iter_id,
np.mean(outs[-1]), logs, start_time - prev_start_time)
print(strs)
sys.stdout.flush()
if (iter_id) % cfg.TRAIN.snapshot_iter == 0 and iter_id != 0:
save_name = "{}".format(iter_id)
checkpoint.save(exe, train_prog,
os.path.join(cfg.model_save_dir, save_name))
if (iter_id) == cfg.max_iter:
checkpoint.save(
exe, train_prog,
os.path.join(cfg.model_save_dir, "model_final"))
break
end_time = time.time()
total_time = end_time - start_time
last_loss = np.array(outs[0]).mean()
except (StopIteration, fluid.core.EOFException):
py_reader.reset()
def train_loop():
start_time = time.time()
prev_start_time = start_time
start = start_time
train_stats = TrainingStats(cfg.log_window, keys)
for iter_id, data in enumerate(train_reader()):
prev_start_time = start_time
start_time = time.time()
if data[0][1].shape[0] == 0:
continue
outs = exe.run(compiled_train_prog,
fetch_list=[v.name for v in fetch_list],
feed=feeder.feed(data))
stats = {k: np.array(v).mean() for k, v in zip(keys, outs[:-1])}
train_stats.update(stats)
logs = train_stats.log()
if iter_id % 10 == 0:
strs = '{}, iter: {}, lr: {:.5f}, {}, time: {:.3f}'.format(
now_time(), iter_id,
np.mean(outs[-1]), logs, start_time - prev_start_time)
print(strs)
sys.stdout.flush()
if (iter_id + 1) % cfg.TRAIN.snapshot_iter == 0 and iter_id != 0:
save_name = "{}".format(iter_id + 1)
checkpoint.save(exe, train_prog,
os.path.join(cfg.model_save_dir, save_name))
if (iter_id + 1) == cfg.max_iter:
checkpoint.save(exe, train_prog,
os.path.join(cfg.model_save_dir, "model_final"))
break
end_time = time.time()
total_time = end_time - start_time
last_loss = np.array(outs[0]).mean()
if cfg.use_pyreader:
train_loop_pyreader()
else:
train_loop()
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
check_gpu(args.use_gpu)
train()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
"""
Contains common utility functions.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import paddle.fluid as fluid
import distutils.util
import numpy as np
import six
import argparse
import functools
import collections
import datetime
from collections import deque
from paddle.fluid import core
from collections import deque
from config import *
def print_arguments(args):
"""Print argparse's arguments.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
parser.add_argument("name", default="Jonh", type=str, help="User name.")
args = parser.parse_args()
print_arguments(args)
:param args: Input argparse.Namespace for printing.
:type args: argparse.Namespace
"""
print("----------- Configuration Arguments -----------")
for arg, value in sorted(six.iteritems(vars(args))):
print("%s: %s" % (arg, value))
print("------------------------------------------------")
def add_arguments(argname, type, default, help, argparser, **kwargs):
"""Add argparse's argument.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
add_argument("name", str, "Jonh", "User name.", parser)
args = parser.parse_args()
"""
type = distutils.util.strtobool if type == bool else type
argparser.add_argument(
"--" + argname,
default=default,
type=type,
help=help + ' Default: %(default)s.',
**kwargs)
class SmoothedValue(object):
"""Track a series of values and provide access to smoothed values over a
window or the global series average.
"""
def __init__(self, window_size):
self.deque = deque(maxlen=window_size)
def add_value(self, value):
self.deque.append(value)
def get_median_value(self):
return np.median(self.deque)
def now_time():
return datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')
class TrainingStats(object):
def __init__(self, window_size, stats_keys):
self.smoothed_losses_and_metrics = {
key: SmoothedValue(window_size)
for key in stats_keys
}
def update(self, stats):
for k, v in self.smoothed_losses_and_metrics.items():
v.add_value(stats[k])
def get(self, extras=None):
stats = collections.OrderedDict()
if extras:
for k, v in extras.items():
stats[k] = v
for k, v in self.smoothed_losses_and_metrics.items():
stats[k] = round(v.get_median_value(), 3)
return stats
def log(self, extras=None):
d = self.get(extras)
strs = ', '.join(str(dict({x: y})).strip('{}') for x, y in d.items())
return strs
def parse_args():
"""return all args
"""
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
# ENV
add_arg('use_gpu', bool, True, "Whether use GPU.")
add_arg('model_save_dir', str, 'output', "The path to save model.")
add_arg('pretrained_model', str, 'ResNet50_cos_pretrained', "The init model path.")
add_arg('dataset', str, 'icdar2015', "icdar2015, icdar2017.")
add_arg('class_num', int, 2, "Class number.")
add_arg('data_dir', str, 'dataset/icdar2015', "The data root path.")
add_arg('use_pyreader', bool, False, "Use pyreader.")
add_arg('use_profile', bool, False, "Whether use profiler.")
add_arg('padding_minibatch',bool, False,
"If False, only resize image and not pad, image shape is different between"
" GPUs in one mini-batch. If True, image shape is the same in one mini-batch.")
#SOLVER
add_arg('learning_rate', float, 0.02, "Learning rate.")
add_arg('max_iter', int, 17500, "Iter number.")
add_arg('log_window', int, 20, "Log smooth window, set 1 for debug, set 20 for train.")
# RCNN
# RPN
add_arg('anchor_sizes', int, [128, 256, 512], "The size of anchors.")
add_arg('aspect_ratios', float, [0.2, 0.5,1.0], "The ratio of anchors.")
add_arg('anchor_angle', float, [-30.0, 0.0, 30.0, 60.0, 90.0, 120.0], "The angles of anchors.")
add_arg('variance', float, [1.0, 1.0, 1.0, 1.0, 1.0], "The variance of anchors.")
add_arg('rpn_stride', float, [16.,16.], "Stride of the feature map that RPN is attached.")
add_arg('rpn_nms_thresh', float, 0.7, "NMS threshold used on RPN proposals")
# TRAIN VAL INFER
add_arg('im_per_batch', int, 1, "Minibatch size.")
add_arg('pixel_means', float, [0.485, 0.456, 0.406], "pixel mean")
add_arg('nms_thresh', float, 0.3, "NMS threshold.")
add_arg('score_thresh', float, 0.01, "score threshold for NMS.")
add_arg('snapshot_stride', int, 1000, "save model every snapshot stride.")
# SINGLE EVAL AND DRAW
add_arg('draw_threshold', float, 0.8, "Confidence threshold to draw bbox.")
add_arg('image_path', str, 'ICDAR2015/tmp/', "The image path used to inference and visualize.")
# yapf: enable
args = parser.parse_args()
file_name = sys.argv[0]
if 'train' in file_name or 'profile' in file_name:
merge_cfg_from_args(args, 'train')
else:
merge_cfg_from_args(args, 'val')
return args
def check_gpu(use_gpu):
"""
    Log an error and exit when use_gpu is set to true but the CPU-only
    version of paddlepaddle is installed.
"""
err = "Config use_gpu cannot be set as true while you are " \
"using paddlepaddle cpu version ! \nPlease try: \n" \
"\t1. Install paddlepaddle-gpu to run model on GPU \n" \
"\t2. Set use_gpu as false in config file to run " \
"model on CPU"
try:
if use_gpu and not fluid.is_compiled_with_cuda():
logger.error(err)
sys.exit(1)
except Exception as e:
pass