Unverified commit 64e5cd96, authored by wangguanzhong, committed by GitHub

Add cornernet (#438)

* code for cornernet

* add doc

* refine custom op

* refine code from review comments

* refine doc

* update doc

* update code

* add cornernet_squeeze_mixup_cosine config

* update code

* update code
Parent commit: f891064a
# Anchor-Free Detection Models
## Contents
- [Introduction](#introduction)
- [Model Zoo and Baselines](#model-zoo-and-baselines)
- [Algorithm Details](#algorithm-details)
- [How to Contribute](#how-to-contribute)
## Introduction
Mainstream detection algorithms fall broadly into two categories: single-stage and two-stage. Classic single-stage methods include SSD and YOLO, while the two-stage family covers the RCNN series; both are provided in the [PaddleDetection Model Zoo](../MODEL_ZOO.md). What they share is that a dense set of prior anchor regions of varying sizes is defined first, and classification and regression are then performed against those priors, so the results are heavily constrained by the anchor design itself. Since CornerNet was proposed, many anchor-free methods have emerged, and PaddleDetection integrates a series of them.
## Model Zoo and Baselines
The table below lists the network architectures currently supported by PaddleDetection; see [Algorithm Details](#algorithm-details) for specifics.
| | ResNet50 | ResNet50-vd | Hourglass104 |
|:------------------------:|:--------:|:--------------------------:|:------------------------:|
| [CornerNet-Squeeze](#cornernet-squeeze) | x | ✓ | ✓ |
| [FCOS](#fcos) | ✓ | x | x |
### Model Zoo
#### mAP on the COCO dataset
| Architecture | Backbone | Images/GPU | Pretrained model | mAP | FPS | Download |
|:------------:|:--------:|:----:|:-------:|:-------:|:---------:|:----------:|
| CornerNet-Squeeze | Hourglass104 | 14 | None | 34.5 | 35.5 | [download](https://paddlemodels.bj.bcebos.com/object_detection/cornernet_squeeze_hg104.tar) |
| CornerNet-Squeeze | ResNet50-vd | 14 | [faster\_rcnn\_r50\_vd\_fpn\_2x](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_2x.tar) | 32.7 | 42.45 | [download](https://paddlemodels.bj.bcebos.com/object_detection/cornernet_squeeze_r50_vd_fpn.tar) |
| CornerNet-Squeeze-dcn | ResNet50-vd | 14 | [faster\_rcnn\_dcn\_r50\_vd\_fpn\_2x](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_2x.tar) | 34.9 | 40.05 | [download](https://paddlemodels.bj.bcebos.com/object_detection/cornernet_squeeze_dcn_r50_vd_fpn.tar) |
| CornerNet-Squeeze-dcn-mixup-cosine* | ResNet50-vd | 14 | [faster\_rcnn\_dcn\_r50\_vd\_fpn\_2x](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_2x.tar) | 38.2 | 40.05 | [download](https://paddlemodels.bj.bcebos.com/object_detection/cornernet_squeeze_dcn_r50_vd_fpn_mixup_cosine.pdparams) |
| FCOS | ResNet50 | 2 | [ResNet50\_cos\_pretrained](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar) | 39.8 | - | [download](https://paddlemodels.bj.bcebos.com/object_detection/fcos_r50_fpn_1x.pdparams) |
| FCOS+multiscale_train | ResNet50 | 2 | [ResNet50\_cos\_pretrained](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar) | 42.0 | - | [download](https://paddlemodels.bj.bcebos.com/object_detection/fcos_r50_fpn_multiscale_2x.pdparams) |
**Notes:**
- FPS is measured with tools/eval.py on a single Tesla V100 GPU.
- When CornerNet-Squeeze uses a ResNet backbone, an FPN is added and the P3 level of the FPN is used as the backbone output feature map.
- \*CornerNet-Squeeze-dcn-mixup-cosine is the best-performing variant derived from the original CornerNet-Squeeze; on top of the ResNet backbone it adds mixup preprocessing and a cosine_decay learning-rate schedule (sketched after these notes).
- FCOS uses GIoU loss, predicts centerness from the location branch, normalizes the top-left/bottom-right corner offsets, and matches ground truth around box centers.
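The mixup and cosine_decay pieces mentioned in the notes amount to only a few lines. Below is a minimal NumPy sketch for orientation, not the PaddleDetection implementation; the Beta(1.5, 1.5) parameters, base_lr, warmup steps, and iteration count are taken from the cornernet_squeeze_dcn_r50_vd_fpn_mixup_cosine config included in this commit.

```python
import numpy as np

def mixup(img1, img2, alpha=1.5, beta=1.5):
    # Blend two decoded images of the same size with a Beta-distributed factor;
    # the ground-truth boxes of both images are kept for training.
    lam = np.random.beta(alpha, beta)
    return lam * img1 + (1.0 - lam) * img2

def cosine_lr(step, base_lr=0.005, max_iters=500000, warmup_steps=4000):
    # Linear warmup from 0, then cosine decay towards 0 over max_iters,
    # mirroring the LinearWarmup + CosineDecay schedulers in the config.
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * 0.5 * (1.0 + np.cos(np.pi * step / max_iters))
```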
## Algorithm Details
### CornerNet-Squeeze
**Overview:** [CornerNet-Squeeze](https://arxiv.org/abs/1904.08900) improves on [CornerNet](https://arxiv.org/abs/1808.01244): it predicts the top-left and bottom-right corners of each object box and, borrowing ideas from SqueezeNet and MobileNet, streamlines CornerNet's Hourglass-104 backbone, which greatly speeds up inference. Compared with the original [YOLO-v3](https://arxiv.org/abs/1804.02767), it has advantages in both accuracy and inference speed.
**Highlights:**
- Uses corner pooling to locate the top-left and bottom-right corners of candidate boxes (see the sketch below)
- Replaces the residual blocks in Hourglass-104 with the fire modules from SqueezeNet
- Replaces the second 3x3 convolution with a 3x3 depthwise separable convolution
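Corner pooling itself is a directional running maximum over the feature map: the top-left branch pools from the right and from below, the bottom-right branch from the left and from above. A minimal NumPy sketch of the idea (the shipped implementation uses the custom ext_op operators added later in this commit):

```python
import numpy as np

def top_pool(x):     # max over rows i..H-1 (scan from bottom to top)
    return np.maximum.accumulate(x[:, :, ::-1, :], axis=2)[:, :, ::-1, :]

def left_pool(x):    # max over columns j..W-1 (scan from right to left)
    return np.maximum.accumulate(x[:, :, :, ::-1], axis=3)[:, :, :, ::-1]

def bottom_pool(x):  # max over rows 0..i (scan from top to bottom)
    return np.maximum.accumulate(x, axis=2)

def right_pool(x):   # max over columns 0..j (scan from left to right)
    return np.maximum.accumulate(x, axis=3)

x = np.random.rand(1, 4, 8, 8).astype(np.float32)  # NCHW feature map
tl_response = top_pool(x) + left_pool(x)            # feeds the top-left corner heatmap
br_response = bottom_pool(x) + right_pool(x)        # feeds the bottom-right corner heatmap
```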
### FCOS
**Overview:** [FCOS](https://arxiv.org/abs/1904.01355) is a dense-prediction, anchor-free detector. Built on the RetinaNet skeleton, it regresses the box extents directly on the feature map and predicts the object class together with a centerness score (how far a feature-map location is from the object center); the centerness is ultimately used as a weight to rescale the object score.
**Highlights:**
- Predicts boxes of different scales at different FPN levels, avoiding multiple overlapping boxes at the same feature-map location
- A single centerness branch predicts how close each location is to an object center, suppressing low-quality false detections (see the sketch below)
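For reference, the centerness mentioned above is computed from the distances between a location and the four sides of its matched ground-truth box. The sketch below (plain NumPy, array shapes assumed) shows the formula; at inference the predicted centerness multiplies the classification score, suppressing low-quality boxes far from object centers.

```python
import numpy as np

def fcos_centerness(l, t, r, b):
    # l, t, r, b: distances from each location to the left/top/right/bottom
    # edges of its matched ground-truth box (arrays of the same shape).
    lr = np.minimum(l, r) / np.maximum(l, r)
    tb = np.minimum(t, b) / np.maximum(t, b)
    return np.sqrt(lr * tb)  # 1.0 at the box center, approaching 0 near the edges
```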
## How to Contribute
Contributions of anchor-free detection models to PaddleDetection are very welcome; feel free to open a PR for review. Feedback is equally appreciated: please open an issue and we will respond promptly.
architecture: CornerNetSqueeze
use_gpu: true
max_iters: 500000
log_smooth_window: 20
log_iter: 20
save_dir: output
snapshot_iter: 10000
metric: COCO
pretrain_weights: NULL
weights: output/cornernet_squeeze/model_final
num_classes: 80
stack: 2
CornerNetSqueeze:
backbone: Hourglass
corner_head: CornerHead
Hourglass:
dims: [256, 256, 384, 384, 512]
modules: [2, 2, 2, 2, 4]
CornerHead:
train_batch_size: 14
test_batch_size: 1
ae_threshold: 0.5
num_dets: 100
top_k: 20
PostProcess:
use_soft_nms: true
detections_per_im: 100
nms_thresh: 0.001
sigma: 0.5
LearningRate:
base_lr: 0.00025
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 450000
OptimizerBuilder:
optimizer:
type: Adam
regularizer: NULL
TrainReader:
inputs_def:
image_shape: [3, 511, 511]
fields: ['image', 'im_id', 'gt_bbox', 'gt_class', 'tl_heatmaps', 'br_heatmaps', 'tl_regrs', 'br_regrs', 'tl_tags', 'br_tags', 'tag_masks']
output_size: 64
dataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: False
- !CornerCrop
input_size: 511
- !Resize
target_dim: 511
- !RandomFlipImage
prob: 0.5
- !CornerRandColor
saturation: 0.4
contrast: 0.4
brightness: 0.4
- !Lighting
eigval: [0.2141788, 0.01817699, 0.00341571]
eigvec: [[-0.58752847, -0.69563484, 0.41340352],
[-0.5832747, 0.00994535, -0.81221408],
[-0.56089297, 0.71832671, 0.41158938]]
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: False
is_channel_first: False
- !Permute
to_bgr: False
- !CornerTarget
output_size: [64, 64]
num_classes: 80
batch_size: 14
shuffle: true
drop_last: true
worker_num: 2
use_process: true
drop_empty: false
EvalReader:
inputs_def:
fields: ['image', 'im_id', 'ratios', 'borders']
output_size: 64
dataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: false
- !CornerCrop
is_train: false
- !CornerRatio
input_size: 511
output_size: 64
- !Permute
to_bgr: False
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: True
is_channel_first: True
batch_size: 1
drop_empty: false
worker_num: 2
use_process: true
TestReader:
inputs_def:
fields: ['image', 'im_id', 'ratios', 'borders']
output_size: 64
dataset:
!ImageFolder
anno_path: annotations/instances_val2017.json
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: false
- !CornerCrop
is_train: false
- !CornerRatio
input_size: 511
output_size: 64
- !Permute
to_bgr: False
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: True
is_channel_first: True
batch_size: 1
architecture: CornerNetSqueeze
use_gpu: true
max_iters: 500000
log_smooth_window: 20
log_iter: 20
save_dir: output
snapshot_iter: 10000
metric: COCO
pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_2x.tar
weights: output/cornernet_squeeze_dcn_r50_vd_fpn/model_final
num_classes: 80
stack: 1
CornerNetSqueeze:
backbone: ResNet
fpn: FPN
corner_head: CornerHead
ResNet:
norm_type: bn
depth: 50
feature_maps: [3, 4, 5]
freeze_at: 2
variant: d
dcn_v2_stages: [3, 4, 5]
FPN:
min_level: 3
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125]
CornerHead:
train_batch_size: 14
test_batch_size: 1
ae_threshold: 0.5
num_dets: 100
top_k: 20
PostProcess:
use_soft_nms: true
detections_per_im: 100
nms_thresh: 0.001
sigma: 0.5
LearningRate:
base_lr: 0.0005
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 400000
- 450000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
TrainReader:
inputs_def:
image_shape: [3, 511, 511]
fields: ['image', 'im_id', 'gt_bbox', 'gt_class', 'tl_heatmaps', 'br_heatmaps', 'tl_regrs', 'br_regrs', 'tl_tags', 'br_tags', 'tag_masks']
output_size: 64
dataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: False
- !CornerCrop
input_size: 511
- !Resize
target_dim: 511
- !RandomFlipImage
prob: 0.5
- !CornerRandColor
saturation: 0.4
contrast: 0.4
brightness: 0.4
- !Lighting
eigval: [0.2141788, 0.01817699, 0.00341571]
eigvec: [[-0.58752847, -0.69563484, 0.41340352],
[-0.5832747, 0.00994535, -0.81221408],
[-0.56089297, 0.71832671, 0.41158938]]
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: False
is_channel_first: False
- !Permute
to_bgr: False
- !CornerTarget
output_size: [64, 64]
num_classes: 80
batch_size: 14
shuffle: true
drop_last: true
worker_num: 2
use_process: true
drop_empty: false
EvalReader:
inputs_def:
fields: ['image', 'im_id', 'ratios', 'borders']
output_size: 64
dataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: false
- !CornerCrop
is_train: false
- !CornerRatio
input_size: 511
output_size: 64
- !Permute
to_bgr: False
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: True
is_channel_first: True
use_process: true
batch_size: 1
drop_empty: false
worker_num: 2
TestReader:
inputs_def:
fields: ['image', 'im_id', 'ratios', 'borders']
output_size: 64
dataset:
!ImageFolder
anno_path: annotations/instances_val2017.json
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: false
- !CornerCrop
is_train: false
- !CornerRatio
input_size: 511
output_size: 64
- !Permute
to_bgr: False
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: True
is_channel_first: True
batch_size: 1
architecture: CornerNetSqueeze
use_gpu: true
max_iters: 500000
log_smooth_window: 20
log_iter: 20
save_dir: output
snapshot_iter: 10000
metric: COCO
pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_2x.tar
weights: output/cornernet_squeeze_dcn_r50_vd_fpn_mixup_cosine/model_final
num_classes: 80
stack: 1
CornerNetSqueeze:
backbone: ResNet
fpn: FPN
corner_head: CornerHead
ResNet:
norm_type: bn
depth: 50
feature_maps: [3, 4, 5]
freeze_at: 2
variant: d
dcn_v2_stages: [3, 4, 5]
FPN:
min_level: 3
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125]
CornerHead:
train_batch_size: 14
test_batch_size: 1
ae_threshold: 0.5
num_dets: 100
top_k: 20
PostProcess:
use_soft_nms: true
detections_per_im: 100
nms_thresh: 0.001
sigma: 0.5
LearningRate:
base_lr: 0.005
schedulers:
- !CosineDecay
max_iters: 500000
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
TrainReader:
inputs_def:
image_shape: [3, 511, 511]
fields: ['image', 'im_id', 'gt_bbox', 'gt_class', 'tl_heatmaps', 'br_heatmaps', 'tl_regrs', 'br_regrs', 'tl_tags', 'br_tags', 'tag_masks']
output_size: 64
max_tag_len: 256
dataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: False
with_mixup: True
- !MixupImage
alpha: 1.5
beta: 1.5
- !CornerCrop
input_size: 511
- !Resize
target_dim: 511
- !RandomFlipImage
prob: 0.5
- !CornerRandColor
saturation: 0.4
contrast: 0.4
brightness: 0.4
- !Lighting
eigval: [0.2141788, 0.01817699, 0.00341571]
eigvec: [[-0.58752847, -0.69563484, 0.41340352],
[-0.5832747, 0.00994535, -0.81221408],
[-0.56089297, 0.71832671, 0.41158938]]
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: False
is_channel_first: False
- !Permute
to_bgr: False
- !CornerTarget
output_size: [64, 64]
num_classes: 80
max_tag_len: 256
batch_size: 14
shuffle: true
drop_last: true
worker_num: 2
use_process: true
drop_empty: false
mixup_epoch: 200
EvalReader:
inputs_def:
fields: ['image', 'im_id', 'ratios', 'borders']
output_size: 64
dataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: false
- !CornerCrop
is_train: false
- !CornerRatio
input_size: 511
output_size: 64
- !Permute
to_bgr: False
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: True
is_channel_first: True
use_process: true
batch_size: 1
drop_empty: false
worker_num: 2
TestReader:
inputs_def:
fields: ['image', 'im_id', 'ratios', 'borders']
output_size: 64
dataset:
!ImageFolder
anno_path: annotations/instances_val2017.json
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: false
- !CornerCrop
is_train: false
- !CornerRatio
input_size: 511
output_size: 64
- !Permute
to_bgr: False
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: True
is_channel_first: True
batch_size: 1
architecture: CornerNetSqueeze
use_gpu: true
max_iters: 500000
log_smooth_window: 20
log_iter: 20
save_dir: output
snapshot_iter: 10000
metric: COCO
pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_dcn_r50_vd_fpn_2x.tar
weights: output/cornernet_squeeze_r50_vd_fpn/model_final
num_classes: 80
stack: 1
CornerNetSqueeze:
backbone: ResNet
fpn: FPN
corner_head: CornerHead
ResNet:
norm_type: affine_channel
depth: 50
feature_maps: [3, 4, 5]
freeze_at: 2
variant: d
FPN:
min_level: 3
max_level: 6
num_chan: 256
spatial_scale: [0.03125, 0.0625, 0.125]
CornerHead:
train_batch_size: 14
test_batch_size: 1
ae_threshold: 0.5
num_dets: 100
top_k: 20
PostProcess:
use_soft_nms: true
detections_per_im: 100
nms_thresh: 0.001
sigma: 0.5
LearningRate:
base_lr: 0.0005
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 450000
OptimizerBuilder:
optimizer:
type: Adam
regularizer: NULL
TrainReader:
inputs_def:
image_shape: [3, 511, 511]
fields: ['image', 'im_id', 'gt_bbox', 'gt_class', 'tl_heatmaps', 'br_heatmaps', 'tl_regrs', 'br_regrs', 'tl_tags', 'br_tags', 'tag_masks']
output_size: 64
dataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: False
- !CornerCrop
input_size: 511
- !Resize
target_dim: 511
- !RandomFlipImage
prob: 0.5
- !CornerRandColor
saturation: 0.4
contrast: 0.4
brightness: 0.4
- !Lighting
eigval: [0.2141788, 0.01817699, 0.00341571]
eigvec: [[-0.58752847, -0.69563484, 0.41340352],
[-0.5832747, 0.00994535, -0.81221408],
[-0.56089297, 0.71832671, 0.41158938]]
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: False
is_channel_first: False
- !Permute
to_bgr: False
- !CornerTarget
output_size: [64, 64]
num_classes: 80
batch_size: 14
shuffle: true
drop_last: true
worker_num: 2
use_process: true
drop_empty: false
EvalReader:
inputs_def:
fields: ['image', 'im_id', 'ratios', 'borders']
output_size: 64
dataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: false
- !CornerCrop
is_train: false
- !CornerRatio
input_size: 511
output_size: 64
- !Permute
to_bgr: False
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: True
is_channel_first: True
use_process: true
batch_size: 1
drop_empty: false
worker_num: 2
TestReader:
inputs_def:
fields: ['image', 'im_id', 'ratios', 'borders']
output_size: 64
dataset:
!ImageFolder
anno_path: annotations/instances_val2017.json
with_background: false
sample_transforms:
- !DecodeImage
to_rgb: false
- !CornerCrop
is_train: false
- !CornerRatio
input_size: 511
output_size: 64
- !Permute
to_bgr: False
- !NormalizeImage
mean: [0.40789654, 0.44719302, 0.47026115]
std: [0.28863828, 0.27408164, 0.2780983]
is_scale: True
is_channel_first: True
batch_size: 1
@@ -214,3 +214,7 @@ Please refer [face detection models](https://github.com/PaddlePaddle/PaddleDetec
### Object Detection in Open Images Dataset V5
Please refer to [Open Images Dataset V5 Baseline model](featured_model/OIDV5_BASELINE_MODEL.md) for details.
### Anchor Free Models
Please refer to [Anchor Free Models](featured_model/ANCHOR_FREE_DETECTION.md) for details.
@@ -204,3 +204,7 @@ Paddle provides backbone models pretrained on ImageNet. All pretrained models
### Object Detection on the Open Images V5 Dataset
Please refer to [Open Images V5 baseline model](featured_model/OIDV5_BASELINE_MODEL.md) for details.
### Anchor-Free Models
Please refer to [Anchor-Free Models](featured_model/ANCHOR_FREE_DETECTION.md) for details.
**For the documentation tutorial, please refer to:** [ANCHOR\_FREE\_DETECTION.md](../../configs/anchor_free/README.md) <br/>
@@ -393,3 +393,52 @@ def is_poly(segm):
assert isinstance(segm, (list, dict)), \
"Invalid segm type: {}".format(type(segm))
return isinstance(segm, list)
def gaussian_radius(bbox_size, min_overlap):
    # Find the largest radius by which the two corners may be shifted while the
    # resulting box still has IoU >= min_overlap with the ground-truth box;
    # the three quadratic cases cover the possible overlap patterns.
    height, width = bbox_size

    a1 = 1
    b1 = (height + width)
    c1 = width * height * (1 - min_overlap) / (1 + min_overlap)
    sq1 = np.sqrt(b1**2 - 4 * a1 * c1)
    radius1 = (b1 - sq1) / (2 * a1)

    a2 = 4
    b2 = 2 * (height + width)
    c2 = (1 - min_overlap) * width * height
    sq2 = np.sqrt(b2**2 - 4 * a2 * c2)
    radius2 = (b2 - sq2) / (2 * a2)

    a3 = 4 * min_overlap
    b3 = -2 * min_overlap * (height + width)
    c3 = (min_overlap - 1) * width * height
    sq3 = np.sqrt(b3**2 - 4 * a3 * c3)
    radius3 = (b3 + sq3) / (2 * a3)
    return min(radius1, radius2, radius3)


def draw_gaussian(heatmap, center, radius, k=1, delte=6):
    # Splat a 2D Gaussian bump of the given radius onto `heatmap`, centered at
    # `center` (x, y), keeping the element-wise maximum with existing values.
    # `delte` controls the bandwidth: sigma = diameter / delte.
    diameter = 2 * radius + 1
    gaussian = gaussian2D((diameter, diameter), sigma=diameter / delte)

    x, y = center
    height, width = heatmap.shape[0:2]

    # Clip the Gaussian window so that it stays inside the heatmap.
    left, right = min(x, radius), min(width - x, radius + 1)
    top, bottom = min(y, radius), min(height - y, radius + 1)

    masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right]
    masked_gaussian = gaussian[radius - top:radius + bottom, radius - left:
                               radius + right]
    np.maximum(masked_heatmap, masked_gaussian * k, out=masked_heatmap)


def gaussian2D(shape, sigma=1):
    # Un-normalized 2D Gaussian kernel of the given (h, w) shape.
    m, n = [(ss - 1.) / 2. for ss in shape]
    y, x = np.ogrid[-m:m + 1, -n:n + 1]
    h = np.exp(-(x * x + y * y) / (2 * sigma * sigma))
    h[h < np.finfo(h.dtype).eps * h.max()] = 0
    return h
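A quick illustration of how these helpers are used when corner heatmap targets are built; the numbers are made up and `gaussian_radius`/`draw_gaussian` from above are assumed to be in scope (the `CornerTarget` operator later in this commit does the same thing per ground-truth box):

```python
import numpy as np

heatmap = np.zeros((64, 64), dtype=np.float32)  # one class channel of a 64x64 target
box_h, box_w = 12, 20                           # box size already scaled to heatmap coordinates
radius = max(0, int(gaussian_radius((box_h, box_w), min_overlap=0.3)))
draw_gaussian(heatmap, [30, 40], radius)        # Gaussian bump centered at x=30, y=40
assert np.isclose(heatmap[40, 30], 1.0)         # the peak sits at the corner location
```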
@@ -42,7 +42,7 @@ from .op_helper import (satisfy_sample_constraint, filter_and_process,
generate_sample_bbox, clip_bbox, data_anchor_sampling,
satisfy_sample_constraint_coverage, crop_image_sampling,
generate_sample_bbox_square, bbox_area_sampling,
is_poly)
is_poly, gaussian_radius, draw_gaussian)
logger = logging.getLogger(__name__)
@@ -1243,10 +1243,13 @@ class ColorDistort(BaseOperator):
def __call__(self, sample, context=None):
img = sample['image']
if self.random_apply:
distortions = np.random.permutation([
self.apply_brightness, self.apply_contrast,
self.apply_saturation, self.apply_hue
])
functions = [
self.apply_brightness,
self.apply_contrast,
self.apply_saturation,
self.apply_hue,
]
distortions = np.random.permutation(functions)
for func in distortions:
img = func(img)
sample['image'] = img
@@ -1266,6 +1269,66 @@ class ColorDistort(BaseOperator):
return sample
@register_op
class CornerRandColor(ColorDistort):
"""Random color for CornerNet series models.
Args:
saturation (float): saturation settings.
contrast (float): contrast settings.
brightness (float): brightness settings.
is_scale (bool): whether to scale the input image.
"""
def __init__(self,
saturation=0.4,
contrast=0.4,
brightness=0.4,
is_scale=True):
super(CornerRandColor, self).__init__(
saturation=saturation, contrast=contrast, brightness=brightness)
self.is_scale = is_scale
def apply_saturation(self, img, img_gray):
alpha = 1. + np.random.uniform(
low=-self.saturation, high=self.saturation)
self._blend(alpha, img, img_gray[:, :, None])
return img
def apply_contrast(self, img, img_gray):
alpha = 1. + np.random.uniform(low=-self.contrast, high=self.contrast)
img_mean = img_gray.mean()
self._blend(alpha, img, img_mean)
return img
def apply_brightness(self, img, img_gray):
alpha = 1 + np.random.uniform(
low=-self.brightness, high=self.brightness)
img *= alpha
return img
def _blend(self, alpha, img, img_mean):
img *= alpha
img_mean *= (1 - alpha)
img += img_mean
def __call__(self, sample, context=None):
img = sample['image']
if self.is_scale:
img = img.astype(np.float32, copy=False)
img /= 255.
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
functions = [
self.apply_brightness,
self.apply_contrast,
self.apply_saturation,
]
distortions = np.random.permutation(functions)
for func in distortions:
img = func(img, img_gray)
sample['image'] = img
return sample
@register_op
class NormalizePermute(BaseOperator):
"""Normalize and permute channel order.
@@ -1672,3 +1735,239 @@ class BboxXYXY2XYWH(BaseOperator):
bbox[:, :2] = bbox[:, :2] + bbox[:, 2:4] / 2.
sample['gt_bbox'] = bbox
return sample
@register_op
class Lighting(BaseOperator):
"""
    Lighting the image by eigenvalues and eigenvectors
Args:
eigval (list): eigenvalues
eigvec (list): eigenvectors
alphastd (float): random weight of lighting, 0.1 by default
"""
def __init__(self, eigval, eigvec, alphastd=0.1):
super(Lighting, self).__init__()
self.alphastd = alphastd
self.eigval = np.array(eigval).astype('float32')
self.eigvec = np.array(eigvec).astype('float32')
def __call__(self, sample, context=None):
alpha = np.random.normal(scale=self.alphastd, size=(3, ))
sample['image'] += np.dot(self.eigvec, self.eigval * alpha)
return sample
@register_op
class CornerTarget(BaseOperator):
"""
Generate targets for CornerNet by ground truth data.
Args:
output_size (int): the size of output heatmaps.
num_classes (int): num of classes.
gaussian_bump (bool): whether to apply gaussian bump on gt targets.
True by default.
gaussian_rad (int): radius of gaussian bump. If it is set to -1, the
radius will be calculated by iou. -1 by default.
        gaussian_iou (float): the IoU threshold between the predicted box and
            the ground-truth box; predictions whose IoU exceeds this threshold
            are regarded as positive samples, and the Gaussian radius is
            derived from it. 0.3 by default.
max_tag_len (int): max num of gt box per image.
"""
def __init__(self,
output_size,
num_classes,
gaussian_bump=True,
gaussian_rad=-1,
gaussian_iou=0.3,
max_tag_len=128):
super(CornerTarget, self).__init__()
self.num_classes = num_classes
self.output_size = output_size
self.gaussian_bump = gaussian_bump
self.gaussian_rad = gaussian_rad
self.gaussian_iou = gaussian_iou
self.max_tag_len = max_tag_len
def __call__(self, sample, context=None):
tl_heatmaps = np.zeros(
(self.num_classes, self.output_size[0], self.output_size[1]),
dtype=np.float32)
br_heatmaps = np.zeros(
(self.num_classes, self.output_size[0], self.output_size[1]),
dtype=np.float32)
tl_regrs = np.zeros((self.max_tag_len, 2), dtype=np.float32)
br_regrs = np.zeros((self.max_tag_len, 2), dtype=np.float32)
tl_tags = np.zeros((self.max_tag_len), dtype=np.int64)
br_tags = np.zeros((self.max_tag_len), dtype=np.int64)
tag_masks = np.zeros((self.max_tag_len), dtype=np.uint8)
tag_lens = np.zeros((), dtype=np.int32)
tag_nums = np.zeros((1), dtype=np.int32)
gt_bbox = sample['gt_bbox']
gt_class = sample['gt_class']
keep_inds = ((gt_bbox[:, 2] - gt_bbox[:, 0]) > 0) & \
((gt_bbox[:, 3] - gt_bbox[:, 1]) > 0)
gt_bbox = gt_bbox[keep_inds]
gt_class = gt_class[keep_inds]
sample['gt_bbox'] = gt_bbox
sample['gt_class'] = gt_class
width_ratio = self.output_size[1] / sample['w']
height_ratio = self.output_size[0] / sample['h']
for i in range(gt_bbox.shape[0]):
width = gt_bbox[i][2] - gt_bbox[i][0]
height = gt_bbox[i][3] - gt_bbox[i][1]
xtl, ytl = gt_bbox[i][0], gt_bbox[i][1]
xbr, ybr = gt_bbox[i][2], gt_bbox[i][3]
fxtl = (xtl * width_ratio)
fytl = (ytl * height_ratio)
fxbr = (xbr * width_ratio)
fybr = (ybr * height_ratio)
xtl = int(fxtl)
ytl = int(fytl)
xbr = int(fxbr)
ybr = int(fybr)
if self.gaussian_bump:
width = math.ceil(width * width_ratio)
height = math.ceil(height * height_ratio)
if self.gaussian_rad == -1:
radius = gaussian_radius((height, width), self.gaussian_iou)
radius = max(0, int(radius))
else:
radius = self.gaussian_rad
draw_gaussian(tl_heatmaps[gt_class[i][0]], [xtl, ytl], radius)
draw_gaussian(br_heatmaps[gt_class[i][0]], [xbr, ybr], radius)
else:
tl_heatmaps[gt_class[i][0], ytl, xtl] = 1
br_heatmaps[gt_class[i][0], ybr, xbr] = 1
tl_regrs[i, :] = [fxtl - xtl, fytl - ytl]
br_regrs[i, :] = [fxbr - xbr, fybr - ybr]
tl_tags[tag_lens] = ytl * self.output_size[1] + xtl
br_tags[tag_lens] = ybr * self.output_size[1] + xbr
tag_lens += 1
tag_masks[:tag_lens] = 1
sample['tl_heatmaps'] = tl_heatmaps
sample['br_heatmaps'] = br_heatmaps
sample['tl_regrs'] = tl_regrs
sample['br_regrs'] = br_regrs
sample['tl_tags'] = tl_tags
sample['br_tags'] = br_tags
sample['tag_masks'] = tag_masks
return sample
@register_op
class CornerCrop(BaseOperator):
"""
Random crop for CornerNet
Args:
        random_scales (list): candidate scales of the crop size relative to input_size.
        border (int): border margin used when sampling the crop center.
        is_train (bool): whether the op is used for training or testing.
        input_size (int): size of the input image.
"""
def __init__(self,
random_scales=[0.6, 0.7, 0.8, 0.9, 1., 1.1, 1.2, 1.3],
border=128,
is_train=True,
input_size=511):
super(CornerCrop, self).__init__()
self.random_scales = random_scales
self.border = border
self.is_train = is_train
self.input_size = input_size
def __call__(self, sample, context=None):
im_h, im_w = int(sample['h']), int(sample['w'])
if self.is_train:
scale = np.random.choice(self.random_scales)
height = int(self.input_size * scale)
width = int(self.input_size * scale)
w_border = self._get_border(self.border, im_w)
h_border = self._get_border(self.border, im_h)
ctx = np.random.randint(low=w_border, high=im_w - w_border)
cty = np.random.randint(low=h_border, high=im_h - h_border)
else:
cty, ctx = im_h // 2, im_w // 2
height = im_h | 127
width = im_w | 127
cropped_image = np.zeros(
(height, width, 3), dtype=sample['image'].dtype)
x0, x1 = max(ctx - width // 2, 0), min(ctx + width // 2, im_w)
y0, y1 = max(cty - height // 2, 0), min(cty + height // 2, im_h)
left_w, right_w = ctx - x0, x1 - ctx
top_h, bottom_h = cty - y0, y1 - cty
# crop image
cropped_ctx, cropped_cty = width // 2, height // 2
x_slice = slice(int(cropped_ctx - left_w), int(cropped_ctx + right_w))
y_slice = slice(int(cropped_cty - top_h), int(cropped_cty + bottom_h))
cropped_image[y_slice, x_slice, :] = sample['image'][y0:y1, x0:x1, :]
sample['image'] = cropped_image
sample['h'], sample['w'] = height, width
if self.is_train:
# crop detections
gt_bbox = sample['gt_bbox']
gt_bbox[:, 0:4:2] -= x0
gt_bbox[:, 1:4:2] -= y0
gt_bbox[:, 0:4:2] += cropped_ctx - left_w
gt_bbox[:, 1:4:2] += cropped_cty - top_h
else:
sample['borders'] = np.array(
[
cropped_cty - top_h, cropped_cty + bottom_h,
cropped_ctx - left_w, cropped_ctx + right_w
],
dtype=np.float32)
return sample
def _get_border(self, border, size):
i = 1
while size - border // i <= border // i:
i *= 2
return border // i
@register_op
class CornerRatio(BaseOperator):
"""
Ratio of output size to image size
Args:
        input_size (int): the size of the input image
output_size (int): the size of heatmap
"""
def __init__(self, input_size=511, output_size=64):
super(CornerRatio, self).__init__()
self.input_size = input_size
self.output_size = output_size
def __call__(self, sample, context=None):
scale = (self.input_size + 1) // self.output_size
out_height, out_width = (sample['h'] + 1) // scale, (
sample['w'] + 1) // scale
height_ratio = out_height / float(sample['h'])
width_ratio = out_width / float(sample['w'])
sample['ratios'] = np.array([height_ratio, width_ratio])
return sample
# Compiling the Custom OPs
**Note:** The gcc version used to compile the custom OPs must match the gcc version used to build Paddle. The Paddle develop nightly builds are currently compiled with **gcc 4.8.2**; if you use a nightly build, please compile the custom OPs with **gcc 4.8.2** as well, otherwise compatibility problems may occur.
## Code Structure
- src: C++/CUDA source code of the extension OPs
- cornerpool_lib.py: Python API wrapper
- tests: unit tests for each OP
## Build the custom OPs
The C++/CUDA implementation of a custom op must be compiled into a shared library. ```src/make.sh``` does this with g++/nvcc; you may of course write a Makefile or use CMake instead.
Compilation needs to include the PaddlePaddle headers and link against the PaddlePaddle libraries. Their locations can be obtained with the following commands:
```
# python
>>> import paddle
>>> print(paddle.sysconfig.get_include())
/paddle/pyenv/local/lib/python2.7/site-packages/paddle/include
>>> print(paddle.sysconfig.get_lib())
/paddle/pyenv/local/lib/python2.7/site-packages/paddle/libs
```
A build script for the shared library is provided:
```
cd src
sh make.sh
```
The build produces `cornerpool_lib.so`.
**Note:** If PaddlePaddle was installed by compiling from source and `WITH_MKLDNN` was not set in `cmake`, building the custom OPs will fail with errors such as `mkldnn.h` not found.
In that case, remove the `-DPADDLE_WITH_MKLDNN` option from the compile commands in `make.sh`.
## Run the Unit Tests
Run the following unit tests to make sure the custom operators can be used correctly in a network:
```
# go back to the ext_op directory and add it to PYTHONPATH
cd ..
export PYTHONPATH=$PYTHONPATH:`pwd`
# run the unit tests
python test/test_corner_op.py
```
A successful run prints output like the following:
```
.
----------------------------------------------------------------------
Ran 4 tests in 2.858s
OK
```
For more on writing custom C++ OPs outside the framework, see the [official documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_usage/index_cn.html).
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from . import cornerpool_lib
from .cornerpool_lib import *
__all__ = cornerpool_lib.__all__
import os
import paddle.fluid as fluid
file_dir = os.path.dirname(os.path.abspath(__file__))
fluid.load_op_library(os.path.join(file_dir, 'src/cornerpool_lib.so'))
from paddle.fluid.layer_helper import LayerHelper
__all__ = [
'bottom_pool',
'top_pool',
'right_pool',
'left_pool',
]
def bottom_pool(input, is_test=False, name=None):
"""
This layer calculates the bottom pooling output based on the input.
Scan the input from top to bottm for the vertical max-pooling.
The output has the same shape with input.
Args:
input(Variable): This input is a Tensor with shape [N, C, H, W].
The data type is float32 or float64.
Returns:
Variable(Tensor): The output of bottom_pool, with shape [N, C, H, W].
The data type is float32 or float64.
Examples:
..code-block:: python
import paddle.fluid as fluid
import cornerpool_lib
input = fluid.data(
name='input', shape=[2, 64, 10, 10], dtype='float32')
output = corner_pool.bottom_pool(input)
"""
if is_test:
helper = LayerHelper('bottom_pool', **locals())
dtype = helper.input_dtype()
output = helper.create_variable_for_type_inference(dtype)
max_map = helper.create_variable_for_type_inference(dtype)
helper.append_op(
type="bottom_pool",
inputs={"X": input},
outputs={"Output": output,
"MaxMap": max_map})
return output
H = input.shape[2]
i = 1
output = input
while i < H:
cur = output[:, :, i:, :]
next = output[:, :, :H - i, :]
max_v = fluid.layers.elementwise_max(cur, next)
output = fluid.layers.concat([output[:, :, :i, :], max_v], axis=2)
i *= 2
return output
def top_pool(input, is_test=False, name=None):
"""
This layer calculates the top pooling output based on the input.
Scan the input from bottom to top for the vertical max-pooling.
The output has the same shape with input.
Args:
input(Variable): This input is a Tensor with shape [N, C, H, W].
The data type is float32 or float64.
Returns:
Variable(Tensor): The output of top_pool, with shape [N, C, H, W].
The data type is float32 or float64.
Examples:
..code-block:: python
import paddle.fluid as fluid
import cornerpool_lib
input = fluid.data(
name='input', shape=[2, 64, 10, 10], dtype='float32')
output = corner_pool.top_pool(input)
"""
if is_test:
helper = LayerHelper('top_pool', **locals())
dtype = helper.input_dtype()
output = helper.create_variable_for_type_inference(dtype)
max_map = helper.create_variable_for_type_inference(dtype)
helper.append_op(
type="top_pool",
inputs={"X": input},
outputs={"Output": output,
"MaxMap": max_map})
return output
H = input.shape[2]
i = 1
output = input
while i < H:
cur = output[:, :, :H - i, :]
next = output[:, :, i:, :]
max_v = fluid.layers.elementwise_max(cur, next)
output = fluid.layers.concat([max_v, output[:, :, H - i:, :]], axis=2)
i *= 2
return output
def right_pool(input, is_test=False, name=None):
"""
This layer calculates the right pooling output based on the input.
Scan the input from left to right for the horizontal max-pooling.
The output has the same shape with input.
Args:
input(Variable): This input is a Tensor with shape [N, C, H, W].
The data type is float32 or float64.
Returns:
Variable(Tensor): The output of right_pool, with shape [N, C, H, W].
The data type is float32 or float64.
Examples:
..code-block:: python
import paddle.fluid as fluid
import cornerpool_lib
input = fluid.data(
name='input', shape=[2, 64, 10, 10], dtype='float32')
output = corner_pool.right_pool(input)
"""
if is_test:
helper = LayerHelper('right_pool', **locals())
dtype = helper.input_dtype()
output = helper.create_variable_for_type_inference(dtype)
max_map = helper.create_variable_for_type_inference(dtype)
helper.append_op(
type="right_pool",
inputs={"X": input},
outputs={"Output": output,
"MaxMap": max_map})
return output
W = input.shape[3]
i = 1
output = input
while i < W:
cur = output[:, :, :, i:]
next = output[:, :, :, :W - i]
max_v = fluid.layers.elementwise_max(cur, next)
output = fluid.layers.concat([output[:, :, :, :i], max_v], axis=-1)
i *= 2
return output
def left_pool(input, is_test=False, name=None):
"""
This layer calculates the left pooling output based on the input.
Scan the input from right to left for the horizontal max-pooling.
The output has the same shape with input.
Args:
input(Variable): This input is a Tensor with shape [N, C, H, W].
The data type is float32 or float64.
Returns:
Variable(Tensor): The output of left_pool, with shape [N, C, H, W].
The data type is float32 or float64.
Examples:
..code-block:: python
import paddle.fluid as fluid
import cornerpool_lib
input = fluid.data(
name='input', shape=[2, 64, 10, 10], dtype='float32')
output = corner_pool.left_pool(input)
"""
if is_test:
helper = LayerHelper('left_pool', **locals())
dtype = helper.input_dtype()
output = helper.create_variable_for_type_inference(dtype)
max_map = helper.create_variable_for_type_inference(dtype)
helper.append_op(
type="left_pool",
inputs={"X": input},
outputs={"Output": output,
"MaxMap": max_map})
return output
W = input.shape[3]
i = 1
output = input
while i < W:
cur = output[:, :, :, :W - i]
next = output[:, :, :, i:]
max_v = fluid.layers.elementwise_max(cur, next)
output = fluid.layers.concat([max_v, output[:, :, :, W - i:]], axis=-1)
i *= 2
return output
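When the custom CUDA ops are not used (`is_test=False`), the branches above build the running maximum by repeatedly doubling the shift, so only O(log H) concat/max operations are needed instead of H sequential steps. A small NumPy sanity check of that doubling scheme (independent of Paddle, shapes assumed):

```python
import numpy as np

def bottom_pool_doubling(x):
    # Same doubling recurrence as the fluid branch above, on a NumPy NCHW array.
    h, out, i = x.shape[2], x.copy(), 1
    while i < h:
        out[:, :, i:, :] = np.maximum(out[:, :, i:, :], out[:, :, :h - i, :])
        i *= 2
    return out

x = np.random.rand(2, 3, 16, 16).astype(np.float32)
reference = np.maximum.accumulate(x, axis=2)  # direct top-to-bottom running max
assert np.allclose(bottom_pool_doubling(x), reference)
```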
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
class BottomPoolOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
ctx->ShareDim("X", /*->*/ "MaxMap");
ctx->ShareDim("X", /*->*/ "Output");
}
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(ctx.Input<Tensor>("X")->type(),
ctx.GetPlace());
}
};
class BottomPoolOpMaker : public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddInput("X",
"Input with shape (batch, C, H, W)");
AddOutput("MaxMap", "Max map with index of maximum value of input");
AddOutput("Output", "output with same shape as input(X)");
AddComment(
R"Doc(
This operation calculates the bottom pooling output based on the input.
Scan the input from top to bottom for the vertical max-pooling.
The output has the same shape as the input.
)Doc");
}
};
class BottomPoolOpGrad : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
PADDLE_ENFORCE(ctx->HasInput("MaxMap"), "Input(MaxMap) should not be null");
PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Output")),
"Input(Output@GRAD) should not be null");
auto out_grad_name = framework::GradVarName("Output");
ctx->ShareDim(out_grad_name, framework::GradVarName("X"));
}
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(
ctx.Input<Tensor>(framework::GradVarName("Output"))->type(),
ctx.GetPlace());
}
};
template <typename T>
class BottomPoolGradDescMaker : public framework::SingleGradOpMaker<T> {
public:
using framework::SingleGradOpMaker<T>::SingleGradOpMaker;
protected:
void Apply(GradOpPtr<T> op) const override {
op->SetType("bottom_pool_grad");
op->SetInput("X", this->Input("X"));
op->SetInput(framework::GradVarName("Output"), this->OutputGrad("Output"));
op->SetInput("MaxMap", this->Output("MaxMap"));
op->SetOutput(framework::GradVarName("X"), this->InputGrad("X"));
op->SetAttrMap(this->Attrs());
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(bottom_pool,
ops::BottomPoolOp,
ops::BottomPoolOpMaker,
ops::BottomPoolGradDescMaker<paddle::framework::OpDesc>,
ops::BottomPoolGradDescMaker<paddle::imperative::OpBase>);
REGISTER_OPERATOR(bottom_pool_grad, ops::BottomPoolOpGrad);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/platform/cuda_primitives.h"
#include "paddle/fluid/memory/memory.h"
#include <vector>
#include "util.cu.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
static constexpr int kNumCUDAThreads = 512;
static constexpr int kNumMaximumNumBlocks = 4096;
static inline int NumBlocks(const int N) {
return std::min((N + kNumCUDAThreads - 1) / kNumCUDAThreads,
kNumMaximumNumBlocks);
}
template <typename T>
class BottomPoolOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext &ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"This kernel only runs on GPU device.");
auto *x = ctx.Input<Tensor>("X");
auto *max_map = ctx.Output<Tensor>("MaxMap");
auto *output = ctx.Output<Tensor>("Output");
auto *x_data = x->data<T>();
auto x_dims = x->dims();
int NC_num = x_dims[0] * x_dims[1];
int height = x_dims[2];
int width = x_dims[3];
int num = x->numel();
auto& dev_ctx = ctx.cuda_device_context();
int *max_map_data = max_map->mutable_data<int>(x_dims, dev_ctx.GetPlace());
T *output_data = output->mutable_data<T>(x_dims, dev_ctx.GetPlace());
auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int threads = kNumCUDAThreads;
int blocks = NumBlocks(num / height);
auto max_val_ptr = memory::Alloc(gpu_place, num / height * sizeof(T));
T* max_val_data = reinterpret_cast<T*>(max_val_ptr->ptr());
auto max_ind_ptr = memory::Alloc(gpu_place, num / height * sizeof(int));
int* max_ind_data = reinterpret_cast<int*>(max_ind_ptr->ptr());
GetMaxInfo<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), NC_num, height, width, 2, false, max_val_data, max_ind_data, max_map_data);
blocks = NumBlocks(num);
ScatterAddFw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), max_map_data, NC_num, height, width, 2, output_data);
}
};
template <typename T>
class BottomPoolGradOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* max_map = ctx.Input<Tensor>("MaxMap");
auto* out_grad = ctx.Input<Tensor>(framework::GradVarName("Output"));
auto* in_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
auto x_dims = x->dims();
auto& dev_ctx = ctx.cuda_device_context();
T* in_grad_data = in_grad->mutable_data<T>(x_dims, dev_ctx.GetPlace());
auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int threads = kNumCUDAThreads;
int NC_num = x_dims[0] * x_dims[1];
int height = x_dims[2];
int width = x_dims[3];
int grad_num = in_grad->numel();
int blocks = NumBlocks(grad_num);
FillConstant<T><<<blocks, threads, 0, dev_ctx.stream()>>>(in_grad_data, 0, grad_num);
ScatterAddBw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(out_grad->data<T>(), max_map->data<int>(), NC_num, height, width, 2, in_grad_data);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(bottom_pool,
ops::BottomPoolOpCUDAKernel<float>,
ops::BottomPoolOpCUDAKernel<double>);
REGISTER_OP_CUDA_KERNEL(bottom_pool_grad,
ops::BottomPoolGradOpCUDAKernel<float>,
ops::BottomPoolGradOpCUDAKernel<double>);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
class LeftPoolOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
ctx->ShareDim("X", /*->*/ "MaxMap");
ctx->ShareDim("X", /*->*/ "Output");
}
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(ctx.Input<Tensor>("X")->type(),
ctx.GetPlace());
}
};
class LeftPoolOpMaker : public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddInput("X",
"Input with shape (batch, C, H, W)");
AddOutput("MaxMap", "Max map with index of maximum value of input");
AddOutput("Output", "output with same shape as input(X)");
AddComment(
R"Doc(
This operation calculates the left pooling output based on the input.
Scan the input from right to left for the horizontal max-pooling.
The output has the same shape as the input.
)Doc");
}
};
class LeftPoolOpGrad : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
PADDLE_ENFORCE(ctx->HasInput("MaxMap"), "Input(MaxMap) should not be null");
PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Output")),
"Input(Output@GRAD) should not be null");
auto out_grad_name = framework::GradVarName("Output");
ctx->ShareDim(out_grad_name, framework::GradVarName("X"));
}
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(
ctx.Input<Tensor>(framework::GradVarName("Output"))->type(),
ctx.GetPlace());
}
};
template <typename T>
class LeftPoolGradDescMaker : public framework::SingleGradOpMaker<T> {
public:
using framework::SingleGradOpMaker<T>::SingleGradOpMaker;
protected:
void Apply(GradOpPtr<T> op) const override {
op->SetType("left_pool_grad");
op->SetInput("X", this->Input("X"));
op->SetInput(framework::GradVarName("Output"), this->OutputGrad("Output"));
op->SetInput("MaxMap", this->Output("MaxMap"));
op->SetOutput(framework::GradVarName("X"), this->InputGrad("X"));
op->SetAttrMap(this->Attrs());
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(left_pool,
ops::LeftPoolOp,
ops::LeftPoolOpMaker,
ops::LeftPoolGradDescMaker<paddle::framework::OpDesc>,
ops::LeftPoolGradDescMaker<paddle::imperative::OpBase>);
REGISTER_OPERATOR(left_pool_grad, ops::LeftPoolOpGrad);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/platform/cuda_primitives.h"
#include "paddle/fluid/memory/memory.h"
#include <vector>
#include "util.cu.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
static constexpr int kNumCUDAThreads = 512;
static constexpr int kNumMaximumNumBlocks = 4096;
static inline int NumBlocks(const int N) {
return std::min((N + kNumCUDAThreads - 1) / kNumCUDAThreads,
kNumMaximumNumBlocks);
}
template <typename T>
class LeftPoolOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext &ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"This kernel only runs on GPU device.");
auto *x = ctx.Input<Tensor>("X");
auto *max_map = ctx.Output<Tensor>("MaxMap");
auto *output = ctx.Output<Tensor>("Output");
auto *x_data = x->data<T>();
auto x_dims = x->dims();
int NC_num = x_dims[0] * x_dims[1];
int height = x_dims[2];
int width = x_dims[3];
int num = x->numel();
auto& dev_ctx = ctx.cuda_device_context();
int *max_map_data = max_map->mutable_data<int>(x_dims, dev_ctx.GetPlace());
T *output_data = output->mutable_data<T>(x_dims, dev_ctx.GetPlace());
auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int threads = kNumCUDAThreads;
int blocks = NumBlocks(num / width);
auto max_val_ptr = memory::Alloc(gpu_place, num / width * sizeof(T));
T* max_val_data = reinterpret_cast<T*>(max_val_ptr->ptr());
auto max_ind_ptr = memory::Alloc(gpu_place, num / width * sizeof(int));
int* max_ind_data = reinterpret_cast<int*>(max_ind_ptr->ptr());
GetMaxInfo<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), NC_num, height, width, 3, true, max_val_data, max_ind_data, max_map_data);
blocks = NumBlocks(num);
ScatterAddFw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), max_map_data, NC_num, height, width, 3, output_data);
}
};
template <typename T>
class LeftPoolGradOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* max_map = ctx.Input<Tensor>("MaxMap");
auto* out_grad = ctx.Input<Tensor>(framework::GradVarName("Output"));
auto* in_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
auto x_dims = x->dims();
auto& dev_ctx = ctx.cuda_device_context();
T* in_grad_data = in_grad->mutable_data<T>(x_dims, dev_ctx.GetPlace());
auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int threads = kNumCUDAThreads;
int NC_num = x_dims[0] * x_dims[1];
int height = x_dims[2];
int width = x_dims[3];
int grad_num = in_grad->numel();
int blocks = NumBlocks(grad_num);
FillConstant<T><<<blocks, threads, 0, dev_ctx.stream()>>>(in_grad_data, 0, grad_num);
ScatterAddBw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(out_grad->data<T>(), max_map->data<int>(), NC_num, height, width, 3, in_grad_data);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(left_pool,
ops::LeftPoolOpCUDAKernel<float>,
ops::LeftPoolOpCUDAKernel<double>);
REGISTER_OP_CUDA_KERNEL(left_pool_grad,
ops::LeftPoolGradOpCUDAKernel<float>,
ops::LeftPoolGradOpCUDAKernel<double>);
include_dir=$( python -c 'import paddle; print(paddle.sysconfig.get_include())' )
lib_dir=$( python -c 'import paddle; print(paddle.sysconfig.get_lib())' )
echo $include_dir
echo $lib_dir
OPS='bottom_pool_op top_pool_op right_pool_op left_pool_op'
for op in ${OPS}
do
nvcc ${op}.cu -c -o ${op}.cu.o -ccbin cc -DPADDLE_WITH_CUDA -DEIGEN_USE_GPU -DPADDLE_USE_DSO -DPADDLE_WITH_MKLDNN -Xcompiler -fPIC -std=c++11 -Xcompiler -fPIC -w --expt-relaxed-constexpr -O0 -g -DNVCC \
-I ${include_dir}/third_party/ \
-I ${include_dir}
done
g++ bottom_pool_op.cc bottom_pool_op.cu.o top_pool_op.cc top_pool_op.cu.o right_pool_op.cc right_pool_op.cu.o left_pool_op.cc left_pool_op.cu.o -o cornerpool_lib.so -DPADDLE_WITH_MKLDNN -shared -fPIC -std=c++11 -O0 -g \
-I ${include_dir}/third_party/ \
-I ${include_dir} \
-L ${lib_dir} \
-L /usr/local/cuda/lib64 -lpaddle_framework -lcudart
rm *.cu.o
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$lib_dir
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
class RightPoolOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
ctx->ShareDim("X", /*->*/ "MaxMap");
ctx->ShareDim("X", /*->*/ "Output");
}
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(ctx.Input<Tensor>("X")->type(),
ctx.GetPlace());
}
};
class RightPoolOpMaker : public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddInput("X",
"Input with shape (batch, C, H, W)");
AddOutput("MaxMap", "Max map with index of maximum value of input");
AddOutput("Output", "output with same shape as input(X)");
AddComment(
R"Doc(
This operation calculates the right pooling output based on the input.
Scan the input from left to right for the horizontal max-pooling.
The output has the same shape as the input.
)Doc");
}
};
class RightPoolOpGrad : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
PADDLE_ENFORCE(ctx->HasInput("MaxMap"), "Input(MaxMap) should not be null");
PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Output")),
"Input(Output@GRAD) should not be null");
auto out_grad_name = framework::GradVarName("Output");
ctx->ShareDim(out_grad_name, framework::GradVarName("X"));
}
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(
ctx.Input<Tensor>(framework::GradVarName("Output"))->type(),
ctx.GetPlace());
}
};
template <typename T>
class RightPoolGradDescMaker : public framework::SingleGradOpMaker<T> {
public:
using framework::SingleGradOpMaker<T>::SingleGradOpMaker;
protected:
void Apply(GradOpPtr<T> op) const override {
op->SetType("right_pool_grad");
op->SetInput("X", this->Input("X"));
op->SetInput(framework::GradVarName("Output"), this->OutputGrad("Output"));
op->SetInput("MaxMap", this->Output("MaxMap"));
op->SetOutput(framework::GradVarName("X"), this->InputGrad("X"));
op->SetAttrMap(this->Attrs());
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(right_pool,
ops::RightPoolOp,
ops::RightPoolOpMaker,
ops::RightPoolGradDescMaker<paddle::framework::OpDesc>,
ops::RightPoolGradDescMaker<paddle::imperative::OpBase>);
REGISTER_OPERATOR(right_pool_grad, ops::RightPoolOpGrad);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/platform/cuda_primitives.h"
#include "paddle/fluid/memory/memory.h"
#include <vector>
#include "util.cu.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
static constexpr int kNumCUDAThreads = 512;
static constexpr int kNumMaximumNumBlocks = 4096;
static inline int NumBlocks(const int N) {
return std::min((N + kNumCUDAThreads - 1) / kNumCUDAThreads,
kNumMaximumNumBlocks);
}
template <typename T>
class RightPoolOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext &ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"This kernel only runs on GPU device.");
auto *x = ctx.Input<Tensor>("X");
auto *max_map = ctx.Output<Tensor>("MaxMap");
auto *output = ctx.Output<Tensor>("Output");
auto *x_data = x->data<T>();
auto x_dims = x->dims();
int NC_num = x_dims[0] * x_dims[1];
int height = x_dims[2];
int width = x_dims[3];
int num = x->numel();
auto& dev_ctx = ctx.cuda_device_context();
int *max_map_data = max_map->mutable_data<int>(x_dims, dev_ctx.GetPlace());
T *output_data = output->mutable_data<T>(x_dims, dev_ctx.GetPlace());
auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int threads = kNumCUDAThreads;
int blocks = NumBlocks(num / width);
auto max_val_ptr = memory::Alloc(gpu_place, num / width * sizeof(T));
T* max_val_data = reinterpret_cast<T*>(max_val_ptr->ptr());
auto max_ind_ptr = memory::Alloc(gpu_place, num / width * sizeof(int));
int* max_ind_data = reinterpret_cast<int*>(max_ind_ptr->ptr());
GetMaxInfo<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), NC_num, height, width, 3, false, max_val_data, max_ind_data, max_map_data);
blocks = NumBlocks(num);
ScatterAddFw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), max_map_data, NC_num, height, width, 3, output_data);
}
};
template <typename T>
class RightPoolGradOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* max_map = ctx.Input<Tensor>("MaxMap");
auto* out_grad = ctx.Input<Tensor>(framework::GradVarName("Output"));
auto* in_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
auto x_dims = x->dims();
auto& dev_ctx = ctx.cuda_device_context();
T* in_grad_data = in_grad->mutable_data<T>(x_dims, dev_ctx.GetPlace());
auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int threads = kNumCUDAThreads;
int NC_num = x_dims[0] * x_dims[1];
int height = x_dims[2];
int width = x_dims[3];
int grad_num = in_grad->numel();
int blocks = NumBlocks(grad_num);
FillConstant<T><<<blocks, threads, 0, dev_ctx.stream()>>>(in_grad_data, 0, grad_num);
ScatterAddBw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(out_grad->data<T>(), max_map->data<int>(), NC_num, height, width, 3, in_grad_data);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(right_pool,
ops::RightPoolOpCUDAKernel<float>,
ops::RightPoolOpCUDAKernel<double>);
REGISTER_OP_CUDA_KERNEL(right_pool_grad,
ops::RightPoolGradOpCUDAKernel<float>,
ops::RightPoolGradOpCUDAKernel<double>);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
class TopPoolOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
ctx->ShareDim("X", /*->*/ "MaxMap");
ctx->ShareDim("X", /*->*/ "Output");
}
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(ctx.Input<Tensor>("X")->type(),
ctx.GetPlace());
}
};
class TopPoolOpMaker : public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddInput("X",
"Input with shape (batch, C, H, W)");
AddOutput("MaxMap", "Max map with index of maximum value of input");
AddOutput("Output", "Output with same shape as input(X)");
AddComment(
R"Doc(
This operation calculates the top pooling output based on the input,
scanning the input from bottom to top for the vertical max-pooling.
The output has the same shape as the input.
)Doc");
}
};
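// Illustrative sketch (not part of the original source; values are made up):
// scanning a single column from bottom to top, top pooling replaces every
// element with the maximum of itself and all elements below it, e.g.
//   input column (top -> bottom): [1, 3, 2, 0]
//   top_pool output:              [3, 3, 2, 0]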
class TopPoolOpGrad : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
PADDLE_ENFORCE(ctx->HasInput("MaxMap"), "Input(MaxMap) should not be null");
PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Output")),
"Input(Output@GRAD) should not be null");
auto out_grad_name = framework::GradVarName("Output");
ctx->ShareDim(out_grad_name, framework::GradVarName("X"));
}
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(
ctx.Input<Tensor>(framework::GradVarName("Output"))->type(),
ctx.GetPlace());
}
};
template <typename T>
class TopPoolGradDescMaker : public framework::SingleGradOpMaker<T> {
public:
using framework::SingleGradOpMaker<T>::SingleGradOpMaker;
protected:
void Apply(GradOpPtr<T> op) const override {
op->SetType("top_pool_grad");
op->SetInput("X", this->Input("X"));
op->SetInput(framework::GradVarName("Output"), this->OutputGrad("Output"));
op->SetInput("MaxMap", this->Output("MaxMap"));
op->SetOutput(framework::GradVarName("X"), this->InputGrad("X"));
op->SetAttrMap(this->Attrs());
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(top_pool,
ops::TopPoolOp,
ops::TopPoolOpMaker,
ops::TopPoolGradDescMaker<paddle::framework::OpDesc>,
ops::TopPoolGradDescMaker<paddle::imperative::OpBase>);
REGISTER_OPERATOR(top_pool_grad, ops::TopPoolOpGrad);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/platform/cuda_primitives.h"
#include "paddle/fluid/memory/memory.h"
#include <vector>
#include "util.cu.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
static constexpr int kNumCUDAThreads = 512;
static constexpr int kNumMaximumNumBlocks = 4096;
static inline int NumBlocks(const int N) {
return std::min((N + kNumCUDAThreads - 1) / kNumCUDAThreads,
kNumMaximumNumBlocks);
}
template <typename T>
class TopPoolOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext &ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"This kernel only runs on GPU device.");
auto *x = ctx.Input<Tensor>("X");
auto *max_map = ctx.Output<Tensor>("MaxMap");
auto *output = ctx.Output<Tensor>("Output");
auto *x_data = x->data<T>();
auto x_dims = x->dims();
int NC_num = x_dims[0] * x_dims[1];
int height = x_dims[2];
int width = x_dims[3];
int num = x->numel();
auto& dev_ctx = ctx.cuda_device_context();
int *max_map_data = max_map->mutable_data<int>(x_dims, dev_ctx.GetPlace());
T *output_data = output->mutable_data<T>(x_dims, dev_ctx.GetPlace());
auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int threads = kNumCUDAThreads;
int blocks = NumBlocks(num / height);
auto max_val_ptr = memory::Alloc(gpu_place, num / height * sizeof(T));
T* max_val_data = reinterpret_cast<T*>(max_val_ptr->ptr());
auto max_ind_ptr = memory::Alloc(gpu_place, num / height * sizeof(int));
int* max_ind_data = reinterpret_cast<int*>(max_ind_ptr->ptr());
GetMaxInfo<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), NC_num, height, width, 2, true, max_val_data, max_ind_data, max_map_data);
blocks = NumBlocks(num);
ScatterAddFw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), max_map_data, NC_num, height, width, 2, output_data);
}
};
template <typename T>
class TopPoolGradOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* max_map = ctx.Input<Tensor>("MaxMap");
auto* out_grad = ctx.Input<Tensor>(framework::GradVarName("Output"));
auto* in_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
auto x_dims = x->dims();
auto& dev_ctx = ctx.cuda_device_context();
T* in_grad_data = in_grad->mutable_data<T>(x_dims, dev_ctx.GetPlace());
auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int threads = kNumCUDAThreads;
int NC_num = x_dims[0] * x_dims[1];
int height = x_dims[2];
int width = x_dims[3];
int grad_num = in_grad->numel();
int blocks = NumBlocks(grad_num);
FillConstant<T><<<blocks, threads, 0, dev_ctx.stream()>>>(in_grad_data, 0, grad_num);
ScatterAddBw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(out_grad->data<T>(), max_map->data<int>(), NC_num, height, width, 2, in_grad_data);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(top_pool,
ops::TopPoolOpCUDAKernel<float>,
ops::TopPoolOpCUDAKernel<double>);
REGISTER_OP_CUDA_KERNEL(top_pool_grad,
ops::TopPoolGradOpCUDAKernel<float>,
ops::TopPoolGradOpCUDAKernel<double>);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/platform/cuda_primitives.h"
#include "paddle/fluid/memory/memory.h"
#include <vector>
namespace paddle {
namespace operators {
using framework::Tensor;
#define CUDA_1D_KERNEL_LOOP(i, n) \
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); \
i += blockDim.x * gridDim.x)
template <typename T>
__global__ void FillConstant(T* x, int num, int fill_num) {
CUDA_1D_KERNEL_LOOP(i, fill_num) {
x[i] = static_cast<T>(num);
}
}
template <typename T>
__global__ void SliceOnAxis(const T* x, const int NC_num, const int H, const int W,
const int axis, const int start, const int end,
T* output) {
int HW_num = H * W;
int length = axis == 2 ? W : H;
int sliced_len = end - start;
int cur_HW_num = length * sliced_len;
// slice input on H or W (axis is 2 or 3)
CUDA_1D_KERNEL_LOOP(i, NC_num * cur_HW_num) {
int NC_id = i / cur_HW_num;
int HW_id = i % cur_HW_num;
if (axis == 2){
output[i] = x[NC_id * HW_num + start * W + HW_id];
} else if (axis == 3) {
int col = HW_id % sliced_len;
int row = HW_id / sliced_len;
output[i] = x[NC_id * HW_num + row * W + start + col];
}
}
}
template <typename T>
__global__ void MaxOut(const T* input, const int next_ind, const int NC_num,
const int H, const int W, const int axis,
const int start, const int end, T* output) {
int HW_num = H * W;
int length = axis == 2 ? W : H;
T cur = static_cast<T>(0.);
T next = static_cast<T>(0.);
T max_v = static_cast<T>(0.);
int sliced_len = end - start;
int cur_HW_num = length * sliced_len;
// compare cur and next and assign max values to output
CUDA_1D_KERNEL_LOOP(i, NC_num * cur_HW_num) {
int NC_id = i / cur_HW_num;
int HW_id = i % cur_HW_num;
if (axis == 2){
cur = input[NC_id * HW_num + start * W + HW_id];
next = input[NC_id * HW_num + next_ind * W + HW_id];
max_v = cur > next ? cur : next;
output[NC_id * HW_num + start * W + HW_id] = max_v;
} else if (axis == 3) {
int col = HW_id % sliced_len;
int row = HW_id / sliced_len;
cur = input[NC_id * HW_num + row * W + start + col];
next = input[NC_id * HW_num + row * W + next_ind + col];
max_v = cur > next ? cur : next;
output[NC_id * HW_num + row * W + start + col] = max_v;
}
__syncthreads();
}
}
template <typename T>
__global__ void UpdateMaxInfo(const T* input, const int NC_num,
const int H, const int W, const int axis,
const int index, T* max_val, int* max_ind) {
int length = axis == 2 ? W : H;
int HW_num = H * W;
T val = static_cast<T>(0.);
CUDA_1D_KERNEL_LOOP(i, NC_num * length) {
int NC_id = i / length;
int length_id = i % length;
if (axis == 2) {
val = input[NC_id * HW_num + index * W + length_id];
} else if (axis == 3) {
val = input[NC_id * HW_num + length_id * W + index];
}
if (val > max_val[i]) {
max_val[i] = val;
max_ind[i] = index;
}
__syncthreads();
}
}
template <typename T>
__global__ void ScatterAddOnAxis(const T* input, const int start, const int* max_ind, const int NC_num, const int H, const int W, const int axis, T* output) {
int length = axis == 2 ? W : H;
int HW_num = H * W;
CUDA_1D_KERNEL_LOOP(i, NC_num * length) {
int NC_id = i / length;
int length_id = i % length;
int id_ = max_ind[i];
if (axis == 2) {
platform::CudaAtomicAdd(output + NC_id * HW_num + id_ * W + length_id, input[NC_id * HW_num + start * W + length_id]);
//output[NC_id * HW_num + id_ * W + length_id] += input[NC_id * HW_num + start * W + length_id];
} else if (axis == 3) {
platform::CudaAtomicAdd(output + NC_id * HW_num + length_id * W + id_, input[NC_id * HW_num + length_id * W + start]);
//output[NC_id * HW_num + length_id * W + id_] += input[NC_id * HW_num + length_id * W + start];
}
__syncthreads();
}
}
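// Computes a running max along H (axis == 2) or W (axis == 3): for every step
// of the scan it records the running max value (max_val), the index of that
// max (max_ind) and, per input element, the index used at that step (max_map);
// `reverse` selects whether the scan runs front-to-back or back-to-front.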
template <typename T>
__global__ void GetMaxInfo(const T* input, const int NC_num,
const int H, const int W, const int axis,
const bool reverse, T* max_val, int* max_ind,
int* max_map) {
int start = 0;
int end = axis == 2 ? H: W;
int s = reverse ? end-1 : start;
int e = reverse ? start-1 : end;
int step = reverse ? -1 : 1;
int len = axis == 2 ? W : H;
int loc = 0;
T val = static_cast<T>(0.);
for (int i = s; ; ) {
if (i == s) {
CUDA_1D_KERNEL_LOOP(j, NC_num * len) {
int NC_id = j / len;
int len_id = j % len;
if (axis == 2) {
loc = NC_id * H * W + i * W + len_id;
} else if (axis == 3){
loc = NC_id * H * W + len_id * W + i;
}
max_ind[j] = i;
max_map[loc] = max_ind[j];
max_val[j] = input[loc];
__syncthreads();
}
} else {
CUDA_1D_KERNEL_LOOP(j, NC_num * len) {
int NC_id = j / len;
int len_id = j % len;
if (axis == 2) {
loc = NC_id * H * W + i * W + len_id;
} else if (axis == 3){
loc = NC_id * H * W + len_id * W + i;
}
val = input[loc];
T max_v = max_val[j];
if (val > max_v) {
max_val[j] = val;
max_map[loc] = i;
max_ind[j] = i;
} else {
max_map[loc] = max_ind[j];
}
__syncthreads();
}
}
i += step;
if (s < e && i >= e) break;
if (s > e && i <= e) break;
}
}
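// Forward gather: each output element copies the input value found at the
// running-max index stored in max_map for its position along the pooled axis.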
template <typename T>
__global__ void ScatterAddFw(const T* input, const int* max_map, const int NC_num, const int H, const int W, const int axis, T* output){
CUDA_1D_KERNEL_LOOP(i, NC_num * H * W) {
int loc = max_map[i];
int NC_id = i / (H * W);
int len_id = 0;
if (axis == 2) {
len_id = i % W;
output[i] = input[NC_id * H * W + loc * W + len_id];
} else {
len_id = i % (H * W) / W;
output[i] = input[NC_id * H * W + len_id * W + loc];
}
}
}
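// Backward scatter: each incoming gradient element is atomically added onto
// the input position that produced the running max (several outputs may map
// to the same input element, hence the atomic add).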
template <typename T>
__global__ void ScatterAddBw(const T* input, const int* max_map, const int NC_num, const int H, const int W, const int axis, T* output){
CUDA_1D_KERNEL_LOOP(i, NC_num * H * W) {
int loc = max_map[i];
int NC_id = i / (H * W);
int len_id = 0;
int offset = 0;
if (axis == 2) {
len_id = i % W;
offset = NC_id * H * W + loc * W + len_id;
} else {
len_id = i % (H * W) / W;
offset = NC_id * H * W + len_id * W + loc;
}
platform::CudaAtomicAdd(output + offset, input[i]);
}
}
} // namespace operators
} // namespace paddle
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import unittest
import numpy as np
import paddle.fluid as fluid
import cornerpool_lib
def bottom_pool_np(x):
height = x.shape[2]
output = x.copy()
for ind in range(height):
cur = output[:, :, ind:height, :]
next = output[:, :, :height - ind, :]
output[:, :, ind:height, :] = np.maximum(cur, next)
return output
def top_pool_np(x):
height = x.shape[2]
output = x.copy()
for ind in range(height):
cur = output[:, :, :height - ind, :]
next = output[:, :, ind:height, :]
output[:, :, :height - ind, :] = np.maximum(cur, next)
return output
def right_pool_np(x):
width = x.shape[3]
output = x.copy()
for ind in range(width):
cur = output[:, :, :, ind:width]
next = output[:, :, :, :width - ind]
output[:, :, :, ind:width] = np.maximum(cur, next)
return output
def left_pool_np(x):
width = x.shape[3]
output = x.copy()
for ind in range(width):
cur = output[:, :, :, :width - ind]
next = output[:, :, :, ind:width]
output[:, :, :, :width - ind] = np.maximum(cur, next)
return output
class TestRightPoolOp(unittest.TestCase):
def funcmap(self):
self.func_map = {
'bottom_x': [cornerpool_lib.bottom_pool, bottom_pool_np],
'top_x': [cornerpool_lib.top_pool, top_pool_np],
'right_x': [cornerpool_lib.right_pool, right_pool_np],
'left_x': [cornerpool_lib.left_pool, left_pool_np]
}
def setup(self):
self.name = 'right_x'
def test_check_output(self):
self.funcmap()
self.setup()
x_shape = (2, 10, 16, 16)
x_type = "float64"
sp = fluid.Program()
tp = fluid.Program()
place = fluid.CUDAPlace(0)
with fluid.program_guard(tp, sp):
x = fluid.layers.data(
name=self.name,
shape=x_shape,
dtype=x_type,
append_batch_size=False)
y = self.func_map[self.name][0](x)
np.random.seed(0)
x_np = np.random.uniform(-1000, 1000, x_shape).astype(x_type)
out_np = self.func_map[self.name][1](x_np)
exe = fluid.Executor(place)
outs = exe.run(tp, feed={self.name: x_np}, fetch_list=[y])
self.assertTrue(np.allclose(outs, out_np))
class TestTopPoolOp(TestRightPoolOp):
def setup(self):
self.name = 'top_x'
class TestBottomPoolOp(TestRightPoolOp):
def setup(self):
self.name = 'bottom_x'
class TestLeftPoolOp(TestRightPoolOp):
def setup(self):
self.name = 'left_x'
if __name__ == "__main__":
unittest.main()
......@@ -18,8 +18,10 @@ from . import rpn_head
from . import yolo_head
from . import retina_head
from . import fcos_head
from . import corner_head
from .rpn_head import *
from .yolo_head import *
from .retina_head import *
from .fcos_head import *
from .corner_head import *
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Constant
from ..backbones.hourglass import _conv_norm, kaiming_init
from ppdet.core.workspace import register
import numpy as np
try:
import cornerpool_lib
except:
print(
"warning: cornerpool_lib not found, compile in ext_op at first if needed"
)
__all__ = ['CornerHead']
def corner_output(x, pool1, pool2, dim, name=None):
p_conv1 = fluid.layers.conv2d(
pool1 + pool2,
filter_size=3,
num_filters=dim,
padding=1,
param_attr=ParamAttr(
name=name + "_p_conv1_weight",
initializer=kaiming_init(pool1 + pool2, 3)),
bias_attr=False,
name=name + '_p_conv1')
p_bn1 = fluid.layers.batch_norm(
p_conv1,
param_attr=ParamAttr(name=name + '_p_bn1_weight'),
bias_attr=ParamAttr(name=name + '_p_bn1_bias'),
moving_mean_name=name + '_p_bn1_running_mean',
moving_variance_name=name + '_p_bn1_running_var',
name=name + '_p_bn1')
conv1 = fluid.layers.conv2d(
x,
filter_size=1,
num_filters=dim,
param_attr=ParamAttr(
name=name + "_conv1_weight", initializer=kaiming_init(x, 1)),
bias_attr=False,
name=name + '_conv1')
bn1 = fluid.layers.batch_norm(
conv1,
param_attr=ParamAttr(name=name + '_bn1_weight'),
bias_attr=ParamAttr(name=name + '_bn1_bias'),
moving_mean_name=name + '_bn1_running_mean',
moving_variance_name=name + '_bn1_running_var',
name=name + '_bn1')
relu1 = fluid.layers.relu(p_bn1 + bn1)
conv2 = _conv_norm(
relu1, 3, dim, pad=1, bn_act='relu', name=name + '_conv2')
return conv2
def corner_pool(x, dim, pool1, pool2, is_test=False, name=None):
p1_conv1 = _conv_norm(
x, 3, 128, pad=1, bn_act='relu', name=name + '_p1_conv1')
pool1 = pool1(p1_conv1, is_test=is_test, name=name + '_pool1')
p2_conv1 = _conv_norm(
x, 3, 128, pad=1, bn_act='relu', name=name + '_p2_conv1')
pool2 = pool2(p2_conv1, is_test=is_test, name=name + '_pool2')
conv2 = corner_output(x, pool1, pool2, dim, name)
return conv2
def gather_feat(feat, ind, batch_size=1):
feats = []
for bind in range(batch_size):
feat_b = feat[bind]
ind_b = ind[bind]
ind_b.stop_gradient = True
feat_bg = fluid.layers.gather(feat_b, ind_b)
feats.append(fluid.layers.unsqueeze(feat_bg, axes=[0]))
feat_g = fluid.layers.concat(feats, axis=0)
return feat_g
def mask_feat(feat, ind, batch_size=1):
feat_t = fluid.layers.transpose(feat, [0, 2, 3, 1])
C = feat_t.shape[3]
feat_r = fluid.layers.reshape(feat_t, [0, -1, C])
return gather_feat(feat_r, ind, batch_size)
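# Heatmap NMS: keep only local peaks, i.e. points equal to their 3x3 max-pooled value.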
def nms(heat):
hmax = fluid.layers.pool2d(heat, pool_size=3, pool_padding=1)
keep = fluid.layers.cast(heat == hmax, 'float32')
return heat * keep
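# Top-K selection over the flattened (class, y, x) heatmap; the flat index is
# decoded back into class id, row and column.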
def _topk(scores, batch_size, height, width, K):
scores_r = fluid.layers.reshape(scores, [batch_size, -1])
topk_scores, topk_inds = fluid.layers.topk(scores_r, K)
topk_clses = topk_inds / (height * width)
topk_inds = topk_inds % (height * width)
topk_ys = fluid.layers.cast(topk_inds / width, 'float32')
topk_xs = fluid.layers.cast(topk_inds % width, 'float32')
return topk_scores, topk_inds, topk_clses, topk_ys, topk_xs
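# Invalidate corner pairs flagged by any index mask: their scores are pushed to -1
# so they fall out of the final top-k.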
def filter_scores(scores, index_list):
for ind in index_list:
tmp = scores * fluid.layers.cast((1 - ind), 'float32')
scores = tmp - fluid.layers.cast(ind, 'float32')
return scores
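# Decode corner heatmaps into detections: pick the top-K top-left and bottom-right
# corners, pair every combination, adjust coordinates by the predicted offsets,
# and filter pairs with mismatched classes, distant embeddings or inverted
# geometry before the final top num_dets selection.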
def decode(tl_heat,
br_heat,
tl_tag,
br_tag,
tl_regr,
br_regr,
ae_threshold=1,
num_dets=1000,
K=100,
batch_size=1):
shape = fluid.layers.shape(tl_heat)
H, W = shape[2], shape[3]
tl_heat = fluid.layers.sigmoid(tl_heat)
br_heat = fluid.layers.sigmoid(br_heat)
tl_heat_nms = nms(tl_heat)
br_heat_nms = nms(br_heat)
tl_scores, tl_inds, tl_clses, tl_ys, tl_xs = _topk(tl_heat_nms, batch_size,
H, W, K)
br_scores, br_inds, br_clses, br_ys, br_xs = _topk(br_heat_nms, batch_size,
H, W, K)
tl_ys = fluid.layers.expand(
fluid.layers.reshape(tl_ys, [-1, K, 1]), [1, 1, K])
tl_xs = fluid.layers.expand(
fluid.layers.reshape(tl_xs, [-1, K, 1]), [1, 1, K])
br_ys = fluid.layers.expand(
fluid.layers.reshape(br_ys, [-1, 1, K]), [1, K, 1])
br_xs = fluid.layers.expand(
fluid.layers.reshape(br_xs, [-1, 1, K]), [1, K, 1])
tl_regr = mask_feat(tl_regr, tl_inds, batch_size)
br_regr = mask_feat(br_regr, br_inds, batch_size)
tl_regr = fluid.layers.reshape(tl_regr, [-1, K, 1, 2])
br_regr = fluid.layers.reshape(br_regr, [-1, 1, K, 2])
tl_xs = tl_xs + tl_regr[:, :, :, 0]
tl_ys = tl_ys + tl_regr[:, :, :, 1]
br_xs = br_xs + br_regr[:, :, :, 0]
br_ys = br_ys + br_regr[:, :, :, 1]
bboxes = fluid.layers.stack([tl_xs, tl_ys, br_xs, br_ys], axis=-1)
tl_tag = mask_feat(tl_tag, tl_inds, batch_size)
br_tag = mask_feat(br_tag, br_inds, batch_size)
tl_tag = fluid.layers.expand(
fluid.layers.reshape(tl_tag, [-1, K, 1]), [1, 1, K])
br_tag = fluid.layers.expand(
fluid.layers.reshape(br_tag, [-1, 1, K]), [1, K, 1])
dists = fluid.layers.abs(tl_tag - br_tag)
tl_scores = fluid.layers.expand(
fluid.layers.reshape(tl_scores, [-1, K, 1]), [1, 1, K])
br_scores = fluid.layers.expand(
fluid.layers.reshape(br_scores, [-1, 1, K]), [1, K, 1])
scores = (tl_scores + br_scores) / 2.
tl_clses = fluid.layers.expand(
fluid.layers.reshape(tl_clses, [-1, K, 1]), [1, 1, K])
br_clses = fluid.layers.expand(
fluid.layers.reshape(br_clses, [-1, 1, K]), [1, K, 1])
cls_inds = fluid.layers.cast(tl_clses != br_clses, 'int32')
dist_inds = fluid.layers.cast(dists > ae_threshold, 'int32')
width_inds = fluid.layers.cast(br_xs < tl_xs, 'int32')
height_inds = fluid.layers.cast(br_ys < tl_ys, 'int32')
scores = filter_scores(scores,
[cls_inds, dist_inds, width_inds, height_inds])
scores = fluid.layers.reshape(scores, [-1, K * K])
scores, inds = fluid.layers.topk(scores, num_dets)
scores = fluid.layers.reshape(scores, [-1, num_dets, 1])
bboxes = fluid.layers.reshape(bboxes, [batch_size, -1, 4])
bboxes = gather_feat(bboxes, inds, batch_size)
clses = fluid.layers.reshape(tl_clses, [batch_size, -1, 1])
clses = gather_feat(clses, inds, batch_size)
tl_scores = fluid.layers.reshape(tl_scores, [batch_size, -1, 1])
tl_scores = gather_feat(tl_scores, inds, batch_size)
br_scores = fluid.layers.reshape(br_scores, [batch_size, -1, 1])
br_scores = gather_feat(br_scores, inds, batch_size)
bboxes = fluid.layers.cast(bboxes, 'float32')
clses = fluid.layers.cast(clses, 'float32')
return bboxes, scores, tl_scores, br_scores, clses
@register
class CornerHead(object):
"""
CornerNet head with corner_pooling
Args:
train_batch_size(int): batch_size in training process
test_batch_size(int): batch_size in test process, 1 by default
num_classes(int): num of classes, 80 by default
stack(int): stack of backbone, 2 by default
pull_weight(float): weight of pull_loss, 0.1 by default
push_weight(float): weight of push_loss, 0.1 by default
ae_threshold(float|int): threshold for valid distance of predicted tags, 1 by default
num_dets(int): num of detections, 1000 by default
top_k(int): choose top_k pair of corners in prediction, 100 by default
"""
__shared__ = ['num_classes', 'stack']
def __init__(self,
train_batch_size,
test_batch_size=1,
num_classes=80,
stack=2,
pull_weight=0.1,
push_weight=0.1,
ae_threshold=1,
num_dets=1000,
top_k=100):
self.train_batch_size = train_batch_size
self.test_batch_size = test_batch_size
self.num_classes = num_classes
self.stack = stack
self.pull_weight = pull_weight
self.push_weight = push_weight
self.ae_threshold = ae_threshold
self.num_dets = num_dets
self.K = top_k
self.tl_heats = []
self.br_heats = []
self.tl_tags = []
self.br_tags = []
self.tl_offs = []
self.br_offs = []
def pred_mod(self, x, dim, name=None):
conv0 = _conv_norm(
x, 1, 256, with_bn=False, bn_act='relu', name=name + '_0')
conv1 = fluid.layers.conv2d(
input=conv0,
filter_size=1,
num_filters=dim,
param_attr=ParamAttr(
name=name + "_1_weight", initializer=kaiming_init(conv0, 1)),
bias_attr=ParamAttr(
name=name + "_1_bias", initializer=Constant(-2.19)),
name=name + '_1')
return conv1
def get_output(self, input):
for ind in range(self.stack):
cnv = input[ind]
tl_modules = corner_pool(
cnv,
256,
cornerpool_lib.top_pool,
cornerpool_lib.left_pool,
name='tl_modules_' + str(ind))
br_modules = corner_pool(
cnv,
256,
cornerpool_lib.bottom_pool,
cornerpool_lib.right_pool,
name='br_modules_' + str(ind))
tl_heat = self.pred_mod(
tl_modules, self.num_classes, name='tl_heats_' + str(ind))
br_heat = self.pred_mod(
br_modules, self.num_classes, name='br_heats_' + str(ind))
tl_tag = self.pred_mod(tl_modules, 1, name='tl_tags_' + str(ind))
br_tag = self.pred_mod(br_modules, 1, name='br_tags_' + str(ind))
tl_off = self.pred_mod(tl_modules, 2, name='tl_offs_' + str(ind))
br_off = self.pred_mod(br_modules, 2, name='br_offs_' + str(ind))
self.tl_heats.append(tl_heat)
self.br_heats.append(br_heat)
self.tl_tags.append(tl_tag)
self.br_tags.append(br_tag)
self.tl_offs.append(tl_off)
self.br_offs.append(br_off)
def focal_loss(self, preds, gt, gt_masks):
preds_clip = []
none_pos = fluid.layers.cast(
fluid.layers.reduce_sum(gt_masks) == 0, 'float32')
none_pos.stop_gradient = True
min = fluid.layers.assign(np.array([1e-4], dtype='float32'))
max = fluid.layers.assign(np.array([1 - 1e-4], dtype='float32'))
for pred in preds:
pred_s = fluid.layers.sigmoid(pred)
pred_min = fluid.layers.elementwise_max(pred_s, min)
pred_max = fluid.layers.elementwise_min(pred_min, max)
preds_clip.append(pred_max)
ones = fluid.layers.ones_like(gt)
fg_map = fluid.layers.cast(gt == ones, 'float32')
fg_map.stop_gradient = True
num_pos = fluid.layers.reduce_sum(fg_map)
min_num = fluid.layers.ones_like(num_pos)
num_pos = fluid.layers.elementwise_max(num_pos, min_num)
num_pos.stop_gradient = True
bg_map = fluid.layers.cast(gt < ones, 'float32')
bg_map.stop_gradient = True
neg_weights = fluid.layers.pow(1 - gt, 4) * bg_map
neg_weights.stop_gradient = True
loss = fluid.layers.assign(np.array([0], dtype='float32'))
for ind, pred in enumerate(preds_clip):
pos_loss = fluid.layers.log(pred) * fluid.layers.pow(1 - pred,
2) * fg_map
neg_loss = fluid.layers.log(1 - pred) * fluid.layers.pow(
pred, 2) * neg_weights
pos_loss = fluid.layers.reduce_sum(pos_loss)
neg_loss = fluid.layers.reduce_sum(neg_loss)
focal_loss_ = (neg_loss + pos_loss) / (num_pos + none_pos)
loss -= focal_loss_
return loss
def ae_loss(self, tl_tag, br_tag, gt_masks):
num = fluid.layers.reduce_sum(gt_masks, dim=1)
num.stop_gradient = True
tag0 = fluid.layers.squeeze(tl_tag, [2])
tag1 = fluid.layers.squeeze(br_tag, [2])
tag_mean = (tag0 + tag1) / 2
tag0 = fluid.layers.pow(tag0 - tag_mean, 2) / (num + 1e-4) * gt_masks
tag1 = fluid.layers.pow(tag1 - tag_mean, 2) / (num + 1e-4) * gt_masks
tag0 = fluid.layers.reduce_sum(tag0)
tag1 = fluid.layers.reduce_sum(tag1)
pull = tag0 + tag1
mask_1 = fluid.layers.expand(
fluid.layers.unsqueeze(gt_masks, [1]), [1, gt_masks.shape[1], 1])
mask_2 = fluid.layers.expand(
fluid.layers.unsqueeze(gt_masks, [2]), [1, 1, gt_masks.shape[1]])
mask = fluid.layers.cast((mask_1 + mask_2) == 2, 'float32')
mask.stop_gradient = True
num2 = (num - 1) * num
num2.stop_gradient = True
tag_mean_1 = fluid.layers.expand(
fluid.layers.unsqueeze(tag_mean, [1]), [1, tag_mean.shape[1], 1])
tag_mean_2 = fluid.layers.expand(
fluid.layers.unsqueeze(tag_mean, [2]), [1, 1, tag_mean.shape[1]])
dist = tag_mean_1 - tag_mean_2
dist = 1 - fluid.layers.abs(dist)
dist = fluid.layers.relu(dist)
dist = dist - 1 / (num + 1e-4)
dist = dist / (num2 + 1e-4)
dist = dist * mask
push = fluid.layers.reduce_sum(dist)
return pull, push
def off_loss(self, off, gt_off, gt_masks):
mask = fluid.layers.unsqueeze(gt_masks, [2])
mask = fluid.layers.expand_as(mask, gt_off)
mask.stop_gradient = True
off_loss = fluid.layers.smooth_l1(off, gt_off, mask, mask)
off_loss = fluid.layers.reduce_sum(off_loss)
total_num = fluid.layers.reduce_sum(gt_masks)
total_num.stop_gradient = True
return off_loss / (total_num + 1e-4)
def get_loss(self, targets):
gt_tl_heat = targets['tl_heatmaps']
gt_br_heat = targets['br_heatmaps']
gt_masks = targets['tag_masks']
gt_tl_off = targets['tl_regrs']
gt_br_off = targets['br_regrs']
gt_tl_ind = targets['tl_tags']
gt_br_ind = targets['br_tags']
gt_masks = fluid.layers.cast(gt_masks, 'float32')
focal_loss = 0
focal_loss_ = self.focal_loss(self.tl_heats, gt_tl_heat, gt_masks)
focal_loss += focal_loss_
focal_loss_ = self.focal_loss(self.br_heats, gt_br_heat, gt_masks)
focal_loss += focal_loss_
pull_loss = 0
push_loss = 0
ones = fluid.layers.assign(np.array([1], dtype='float32'))
tl_tags = [
mask_feat(tl_tag, gt_tl_ind, self.train_batch_size)
for tl_tag in self.tl_tags
]
br_tags = [
mask_feat(br_tag, gt_br_ind, self.train_batch_size)
for br_tag in self.br_tags
]
pull_loss, push_loss = 0, 0
for tl_tag, br_tag in zip(tl_tags, br_tags):
pull, push = self.ae_loss(tl_tag, br_tag, gt_masks)
pull_loss += pull
push_loss += push
tl_offs = [
mask_feat(tl_off, gt_tl_ind, self.train_batch_size)
for tl_off in self.tl_offs
]
br_offs = [
mask_feat(br_off, gt_br_ind, self.train_batch_size)
for br_off in self.br_offs
]
off_loss = 0
for tl_off, br_off in zip(tl_offs, br_offs):
off_loss += self.off_loss(tl_off, gt_tl_off, gt_masks)
off_loss += self.off_loss(br_off, gt_br_off, gt_masks)
pull_loss = self.pull_weight * pull_loss
push_loss = self.push_weight * push_loss
loss = (
focal_loss + pull_loss + push_loss + off_loss) / len(self.tl_heats)
return {'loss': loss}
def get_prediction(self, input):
ind = self.stack - 1
tl_modules = corner_pool(
input,
256,
cornerpool_lib.top_pool,
cornerpool_lib.left_pool,
is_test=True,
name='tl_modules_' + str(ind))
br_modules = corner_pool(
input,
256,
cornerpool_lib.bottom_pool,
cornerpool_lib.right_pool,
is_test=True,
name='br_modules_' + str(ind))
tl_heat = self.pred_mod(
tl_modules, self.num_classes, name='tl_heats_' + str(ind))
br_heat = self.pred_mod(
br_modules, self.num_classes, name='br_heats_' + str(ind))
tl_tag = self.pred_mod(tl_modules, 1, name='tl_tags_' + str(ind))
br_tag = self.pred_mod(br_modules, 1, name='br_tags_' + str(ind))
tl_off = self.pred_mod(tl_modules, 2, name='tl_offs_' + str(ind))
br_off = self.pred_mod(br_modules, 2, name='br_offs_' + str(ind))
return decode(tl_heat, br_heat, tl_tag, br_tag, tl_off, br_off,
self.ae_threshold, self.num_dets, self.K,
self.test_batch_size)
......@@ -25,6 +25,7 @@ from . import retinanet
from . import blazeface
from . import faceboxes
from . import fcos
from . import cornernet_squeeze
from .faster_rcnn import *
from .mask_rcnn import *
......@@ -37,3 +38,4 @@ from .retinanet import *
from .blazeface import *
from .faceboxes import *
from .fcos import *
from .cornernet_squeeze import *
......@@ -408,7 +408,7 @@ class CascadeMaskRCNN(object):
box_fields = ['bbox', 'bbox_flip'] if use_flip else ['bbox']
for key in box_fields:
inputs_def[key] = {
'shape': [6],
'shape': [None, 6],
'dtype': 'float32',
'lod_level': 1
}
......
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from paddle import fluid
from ppdet.core.workspace import register
import numpy as np
__all__ = ['CornerNetSqueeze']
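# Map predicted boxes from network-input coordinates back to the original image:
# undo the resize ratios and subtract the padded border offsets.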
def rescale_bboxes(bboxes, ratios, borders):
x1, y1, x2, y2 = fluid.layers.split(bboxes, 4)
x1 = x1 / ratios[:, 1] - borders[:, 2]
x2 = x2 / ratios[:, 1] - borders[:, 2]
y1 = y1 / ratios[:, 0] - borders[:, 0]
y2 = y2 / ratios[:, 0] - borders[:, 0]
return fluid.layers.concat([x1, y1, x2, y2], axis=2)
@register
class CornerNetSqueeze(object):
"""
"""
__category__ = 'architecture'
__inject__ = ['backbone', 'corner_head', 'fpn']
__shared__ = ['num_classes']
def __init__(self,
backbone,
corner_head='CornerHead',
num_classes=80,
fpn=None):
super(CornerNetSqueeze, self).__init__()
self.backbone = backbone
self.corner_head = corner_head
self.num_classes = num_classes
self.fpn = fpn
def build(self, feed_vars, mode='train'):
im = feed_vars['image']
body_feats = self.backbone(im)
if self.fpn is not None:
body_feats, _ = self.fpn.get_output(body_feats)
body_feats = [body_feats.values()[-1]]
if mode == 'train':
target_vars = [
'tl_heatmaps', 'br_heatmaps', 'tag_masks', 'tl_regrs',
'br_regrs', 'tl_tags', 'br_tags'
]
target = {key: feed_vars[key] for key in target_vars}
self.corner_head.get_output(body_feats)
loss = self.corner_head.get_loss(target)
return loss
elif mode == 'test':
ratios = feed_vars['ratios']
borders = feed_vars['borders']
bboxes, scores, tl_scores, br_scores, clses = self.corner_head.get_prediction(
body_feats[-1])
bboxes = rescale_bboxes(bboxes, ratios, borders)
detections = fluid.layers.concat([clses, scores, bboxes], axis=2)
detections = detections[0]
return {'bbox': detections}
def _inputs_def(self, image_shape, output_size, max_tag_len):
im_shape = [None] + image_shape
C = self.num_classes
# yapf: disable
inputs_def = {
'image': {'shape': im_shape, 'dtype': 'float32', 'lod_level': 0},
'im_id': {'shape': [None, 1], 'dtype': 'int64', 'lod_level': 0},
'gt_bbox': {'shape': [None, 4], 'dtype': 'float32', 'lod_level': 1},
'gt_class': {'shape': [None, 1], 'dtype': 'int32', 'lod_level': 1},
'ratios': {'shape': [None, 2], 'dtype': 'float32', 'lod_level': 0},
'borders': {'shape': [None, 4], 'dtype': 'float32', 'lod_level': 0},
'tl_heatmaps': {'shape': [None, C, output_size, output_size], 'dtype': 'float32', 'lod_level': 0},
'br_heatmaps': {'shape': [None, C, output_size, output_size], 'dtype': 'float32', 'lod_level': 0},
'tl_regrs': {'shape': [None, max_tag_len, 2], 'dtype': 'float32', 'lod_level': 0},
'br_regrs': {'shape': [None, max_tag_len, 2], 'dtype': 'float32', 'lod_level': 0},
'tl_tags': {'shape': [None, max_tag_len], 'dtype': 'int64', 'lod_level': 0},
'br_tags': {'shape': [None, max_tag_len], 'dtype': 'int64', 'lod_level': 0},
'tag_masks': {'shape': [None, max_tag_len], 'dtype': 'int32', 'lod_level': 0},
}
# yapf: enable
return inputs_def
def build_inputs(
self,
image_shape=[3, None, None],
fields=[
'image', 'im_id', 'gt_box', 'gt_class', 'tl_heatmaps',
'br_heatmaps', 'tl_regrs', 'br_regrs', 'tl_tags', 'br_tags',
'tag_masks'
], # for train
output_size=64,
max_tag_len=128,
use_dataloader=True,
iterable=False):
inputs_def = self._inputs_def(image_shape, output_size, max_tag_len)
feed_vars = OrderedDict([(key, fluid.data(
name=key,
shape=inputs_def[key]['shape'],
dtype=inputs_def[key]['dtype'],
lod_level=inputs_def[key]['lod_level'])) for key in fields])
loader = fluid.io.DataLoader.from_generator(
feed_list=list(feed_vars.values()),
capacity=64,
use_double_buffer=True,
iterable=iterable) if use_dataloader else None
return feed_vars, loader
def train(self, feed_vars):
return self.build(feed_vars, mode='train')
def eval(self, feed_vars):
return self.build(feed_vars, mode='test')
def test(self, feed_vars):
return self.build(feed_vars, mode='test')
......@@ -311,7 +311,7 @@ class MaskRCNN(object):
box_fields = ['bbox', 'bbox_flip'] if use_flip else ['bbox']
for key in box_fields:
inputs_def[key] = {
'shape': [6],
'shape': [None, 6],
'dtype': 'float32',
'lod_level': 1
}
......
......@@ -29,6 +29,7 @@ from . import res2net
from . import hrnet
from . import hrfpn
from . import bfp
from . import hourglass
from .resnet import *
from .resnext import *
......@@ -45,3 +46,4 @@ from .res2net import *
from .hrnet import *
from .hrfpn import *
from .bfp import *
from .hourglass import *
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Uniform
import functools
from ppdet.core.workspace import register
from .resnet import ResNet
import math
__all__ = ['Hourglass']
def kaiming_init(input, filter_size):
fan_in = input.shape[1]
std = (1.0 / (fan_in * filter_size * filter_size))**0.5
return Uniform(0. - std, std)
def _conv_norm(x,
k,
out_dim,
stride=1,
pad=0,
groups=None,
with_bn=True,
bn_act=None,
ind=None,
name=None):
conv_name = "_conv" if ind is None else "_conv" + str(ind)
bn_name = "_bn" if ind is None else "_bn" + str(ind)
conv = fluid.layers.conv2d(
input=x,
filter_size=k,
num_filters=out_dim,
stride=stride,
padding=pad,
groups=groups,
param_attr=ParamAttr(
name=name + conv_name + "_weight", initializer=kaiming_init(x, k)),
bias_attr=ParamAttr(
name=name + conv_name + "_bias", initializer=kaiming_init(x, k))
if not with_bn else False,
name=name + '_output')
if with_bn:
pattr = ParamAttr(name=name + bn_name + '_weight')
battr = ParamAttr(name=name + bn_name + '_bias')
out = fluid.layers.batch_norm(
input=conv,
act=bn_act,
name=name + '_bn_output',
param_attr=pattr,
bias_attr=battr,
moving_mean_name=name + bn_name + '_running_mean',
moving_variance_name=name + bn_name +
'_running_var') if with_bn else conv
else:
out = fluid.layers.relu(conv)
return out
def residual_block(x, out_dim, k=3, stride=1, name=None):
p = (k - 1) // 2
conv1 = _conv_norm(
x, k, out_dim, pad=p, stride=stride, bn_act='relu', ind=1, name=name)
conv2 = _conv_norm(conv1, k, out_dim, pad=p, ind=2, name=name)
skip = _conv_norm(
x, 1, out_dim, stride=stride,
name=name + '_skip') if stride != 1 or x.shape[1] != out_dim else x
return fluid.layers.elementwise_add(
x=skip, y=conv2, act='relu', name=name + "_add")
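# SqueezeNet-style fire block: a 1x1 squeeze conv followed by parallel 1x1 and
# grouped 3x3 expand convs whose outputs are concatenated and batch-normed;
# a residual add is used when the spatial size and channel count are unchanged.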
def fire_block(x, out_dim, sr=2, stride=1, name=None):
conv1 = _conv_norm(x, 1, out_dim // sr, ind=1, name=name)
conv_1x1 = fluid.layers.conv2d(
conv1,
filter_size=1,
num_filters=out_dim // 2,
stride=stride,
param_attr=ParamAttr(
name=name + "_conv_1x1_weight", initializer=kaiming_init(conv1, 1)),
bias_attr=False,
name=name + '_conv_1x1')
conv_3x3 = fluid.layers.conv2d(
conv1,
filter_size=3,
num_filters=out_dim // 2,
stride=stride,
padding=1,
groups=out_dim // sr,
param_attr=ParamAttr(
name=name + "_conv_3x3_weight", initializer=kaiming_init(conv1, 3)),
bias_attr=False,
name=name + '_conv_3x3',
use_cudnn=False)
conv2 = fluid.layers.concat(
[conv_1x1, conv_3x3], axis=1, name=name + '_conv2')
pattr = ParamAttr(name=name + '_bn2_weight')
battr = ParamAttr(name=name + '_bn2_bias')
bn2 = fluid.layers.batch_norm(
input=conv2,
name=name + '_bn2',
param_attr=pattr,
bias_attr=battr,
moving_mean_name=name + '_bn2_running_mean',
moving_variance_name=name + '_bn2_running_var')
if stride == 1 and x.shape[1] == out_dim:
return fluid.layers.elementwise_add(
x=bn2, y=x, act='relu', name=name + "_add_relu")
else:
return fluid.layers.relu(bn2, name="_relu")
def make_layer(x, in_dim, out_dim, modules, block, name=None):
layers = block(x, out_dim, name=name + '_0')
for i in range(1, modules):
layers = block(layers, out_dim, name=name + '_' + str(i))
return layers
def make_hg_layer(x, in_dim, out_dim, modules, block, name=None):
layers = block(x, out_dim, stride=2, name=name + '_0')
for i in range(1, modules):
layers = block(layers, out_dim, name=name + '_' + str(i))
return layers
def make_layer_revr(x, in_dim, out_dim, modules, block, name=None):
for i in range(modules - 1):
x = block(x, in_dim, name=name + '_' + str(i))
layers = block(x, out_dim, name=name + '_' + str(modules - 1))
return layers
def make_unpool_layer(x, dim, name=None):
pattr = ParamAttr(name=name + '_weight', initializer=kaiming_init(x, 4))
battr = ParamAttr(name=name + '_bias', initializer=kaiming_init(x, 4))
layer = fluid.layers.conv2d_transpose(
input=x,
num_filters=dim,
filter_size=4,
stride=2,
padding=1,
param_attr=pattr,
bias_attr=battr)
return layer
@register
class Hourglass(object):
"""
Hourglass Network, see https://arxiv.org/abs/1603.06937
Args:
stack (int): stack of hourglass, 2 by default
dims (list): dims of each level in hg_module
modules (list): num of modules in each level
block_name (str): type of block used in each hg_module, 'fire' by default
"""
__shared__ = ['stack']
def __init__(self,
stack=2,
dims=[256, 256, 384, 384, 512],
modules=[2, 2, 2, 2, 4],
block_name='fire'):
super(Hourglass, self).__init__()
self.stack = stack
assert len(dims) == len(modules), \
"Expected len of dims equal to len of modules, Receiced len of "\
"dims: {}, len of modules: {}".format(len(dims), len(modules))
self.dims = dims
self.modules = modules
self.num_level = len(dims) - 1
block_dict = {'fire': fire_block}
self.block = block_dict[block_name]
def __call__(self, input, name='hg'):
inter = self.pre(input, name + '_pre')
cnvs = []
for ind in range(self.stack):
hg = self.hg_module(
inter,
self.num_level,
self.dims,
self.modules,
name=name + '_hgs_' + str(ind))
cnv = _conv_norm(
hg,
3,
256,
bn_act='relu',
pad=1,
name=name + '_cnvs_' + str(ind))
cnvs.append(cnv)
if ind < self.stack - 1:
inter = _conv_norm(
inter, 1, 256, name=name + '_inters__' +
str(ind)) + _conv_norm(
cnv, 1, 256, name=name + '_cnvs__' + str(ind))
inter = fluid.layers.relu(inter)
inter = residual_block(
inter, 256, name=name + '_inters_' + str(ind))
return cnvs
def pre(self, x, name=None):
conv = _conv_norm(
x, 7, 128, stride=2, pad=3, bn_act='relu', name=name + '_0')
res1 = residual_block(conv, 256, stride=2, name=name + '_1')
res2 = residual_block(res1, 256, stride=2, name=name + '_2')
return res2
def hg_module(self,
x,
n=4,
dims=[256, 256, 384, 384, 512],
modules=[2, 2, 2, 2, 4],
make_up_layer=make_layer,
make_hg_layer=make_hg_layer,
make_low_layer=make_layer,
make_hg_layer_revr=make_layer_revr,
make_unpool_layer=make_unpool_layer,
name=None):
curr_mod = modules[0]
next_mod = modules[1]
curr_dim = dims[0]
next_dim = dims[1]
up1 = make_up_layer(
x, curr_dim, curr_dim, curr_mod, self.block, name=name + '_up1')
max1 = x
low1 = make_hg_layer(
max1, curr_dim, next_dim, curr_mod, self.block, name=name + '_low1')
low2 = self.hg_module(
low1,
n - 1,
dims[1:],
modules[1:],
make_up_layer=make_up_layer,
make_hg_layer=make_hg_layer,
make_low_layer=make_low_layer,
make_hg_layer_revr=make_hg_layer_revr,
make_unpool_layer=make_unpool_layer,
name=name + '_low2') if n > 1 else make_low_layer(
low1,
next_dim,
next_dim,
next_mod,
self.block,
name=name + '_low2')
low3 = make_hg_layer_revr(
low2, next_dim, curr_dim, curr_mod, self.block, name=name + '_low3')
up2 = make_unpool_layer(low3, curr_dim, name=name + '_up2')
merg = fluid.layers.elementwise_add(x=up1, y=up2, name=name + '_merg')
return merg
......@@ -378,7 +378,7 @@ class MultiClassSoftNMS(object):
fluid.default_main_program(),
name='softnms_pred_result',
dtype='float32',
shape=[6],
shape=[-1, 6],
lod_level=1)
fluid.layers.py_func(
func=_soft_nms, x=[bboxes, scores], out=pred_result)
......
......@@ -148,10 +148,12 @@ class OptimizerBuilder():
self.optimizer = optimizer
def __call__(self, learning_rate):
if self.regularizer:
reg_type = self.regularizer['type'] + 'Decay'
reg_factor = self.regularizer['factor']
regularization = getattr(regularizer, reg_type)(reg_factor)
else:
regularization = None
optim_args = self.optimizer.copy()
optim_type = optim_args['type']
del optim_args['type']
......
......@@ -230,9 +230,10 @@ def bbox2out(results, clsid2catid, is_bbox_normalized=False):
xywh_res = []
for t in results:
bboxes = t['bbox'][0]
if len(t['bbox'][1]) == 0: continue
lengths = t['bbox'][1][0]
im_ids = np.array(t['im_id'][0]).flatten()
if bboxes.shape == (1, 1) or bboxes is None:
if bboxes.shape == (1, 1) or bboxes is None or len(bboxes) == 0:
continue
k = 0
......
......@@ -135,7 +135,8 @@ def eval_run(exe,
mask_multi_scale_test = multi_scale_test and 'Mask' in cfg.architecture
if multi_scale_test:
post_res = mstest_box_post_process(res, cfg)
post_res = mstest_box_post_process(res, multi_scale_test,
cfg.num_classes)
res.update(post_res)
if mask_multi_scale_test:
place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
......@@ -156,10 +157,16 @@ def eval_run(exe,
if 'mask' in res:
from ppdet.utils.post_process import mask_encode
res['mask'] = mask_encode(res, resolution)
post_config = getattr(cfg, 'PostProcess', None)
if 'Corner' in cfg.architecture and post_config is not None:
from ppdet.utils.post_process import corner_post_process
corner_post_process(res, post_config, cfg.num_classes)
results.append(res)
if iter_id % 100 == 0:
logger.info('Test iter {}'.format(iter_id))
iter_id += 1
if len(res['bbox'][1]) == 0:
has_bbox = False
images_num += len(res['bbox'][1][0]) if has_bbox else 1
except (StopIteration, fluid.core.EOFException):
loader.reset()
......
......@@ -38,7 +38,7 @@ def box_flip(boxes, im_shape):
def nms(dets, thresh):
"""Apply classic DPM-style greedy NMS."""
if dets.shape[0] == 0:
return []
return dets[[], :]
scores = dets[:, 0]
x1 = dets[:, 1]
y1 = dets[:, 2]
......@@ -86,8 +86,40 @@ def nms(dets, thresh):
ovr = inter / (iarea + areas[j] - inter)
if ovr >= thresh:
suppressed[j] = 1
keep = np.where(suppressed == 0)[0]
dets = dets[keep, :]
return dets
return np.where(suppressed == 0)[0]
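# Gaussian soft-NMS: rather than discarding overlapping boxes, decay their scores
# by exp(-iou^2 / sigma) against the current best box and keep boxes whose score
# stays at or above thres.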
def soft_nms(dets, sigma, thres):
dets_final = []
while len(dets) > 0:
maxpos = np.argmax(dets[:, 0])
dets_final.append(dets[maxpos].copy())
ts, tx1, ty1, tx2, ty2 = dets[maxpos]
scores = dets[:, 0]
# force remove bbox at maxpos
scores[maxpos] = -1
x1 = dets[:, 1]
y1 = dets[:, 2]
x2 = dets[:, 3]
y2 = dets[:, 4]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
xx1 = np.maximum(tx1, x1)
yy1 = np.maximum(ty1, y1)
xx2 = np.minimum(tx2, x2)
yy2 = np.minimum(ty2, y2)
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (areas + areas[maxpos] - inter)
weight = np.exp(-(ovr * ovr) / sigma)
scores = scores * weight
idx_keep = np.where(scores >= thres)
dets[:, 0] = scores
dets = dets[idx_keep]
dets_final = np.array(dets_final).reshape(-1, 5)
return dets_final
def bbox_area(box):
......@@ -128,39 +160,49 @@ def box_voting(nms_dets, dets, vote_thresh):
return top_dets
def get_nms_result(boxes, scores, cfg):
cls_boxes = [[] for _ in range(cfg.num_classes)]
for j in range(1, cfg.num_classes):
inds = np.where(scores[:, j] > cfg.MultiScaleTEST['score_thresh'])[0]
scores_j = scores[inds, j]
boxes_j = boxes[inds, j * 4:(j + 1) * 4]
def get_nms_result(boxes,
scores,
config,
num_classes,
background_label=0,
labels=None):
has_labels = labels is not None
cls_boxes = [[] for _ in range(num_classes)]
start_idx = 1 if background_label == 0 else 0
for j in range(start_idx, num_classes):
inds = np.where(labels == j)[0] if has_labels else np.where(
scores[:, j] > config['score_thresh'])[0]
scores_j = scores[inds] if has_labels else scores[inds, j]
boxes_j = boxes[inds, :] if has_labels else boxes[inds, j * 4:(j + 1) *
4]
dets_j = np.hstack((scores_j[:, np.newaxis], boxes_j)).astype(
np.float32, copy=False)
keep = nms(dets_j, cfg.MultiScaleTEST['nms_thresh'])
nms_dets = dets_j[keep, :]
if cfg.MultiScaleTEST['enable_voting']:
nms_dets = box_voting(nms_dets, dets_j,
cfg.MultiScaleTEST['vote_thresh'])
if config.get('use_soft_nms', False):
nms_dets = soft_nms(dets_j, config['sigma'], config['nms_thresh'])
else:
nms_dets = nms(dets_j, config['nms_thresh'])
if config.get('enable_voting', False):
nms_dets = box_voting(nms_dets, dets_j, config['vote_thresh'])
#add labels
label = np.array([j for _ in range(len(keep))])
label = np.array([j for _ in range(len(nms_dets))])
nms_dets = np.hstack((label[:, np.newaxis], nms_dets)).astype(
np.float32, copy=False)
cls_boxes[j] = nms_dets
# Limit to max_per_image detections **over all classes**
image_scores = np.hstack(
[cls_boxes[j][:, 1] for j in range(1, cfg.num_classes)])
if len(image_scores) > cfg.MultiScaleTEST['detections_per_im']:
image_thresh = np.sort(image_scores)[-cfg.MultiScaleTEST[
'detections_per_im']]
for j in range(1, cfg.num_classes):
[cls_boxes[j][:, 1] for j in range(start_idx, num_classes)])
if len(image_scores) > config['detections_per_im']:
image_thresh = np.sort(image_scores)[-config['detections_per_im']]
for j in range(start_idx, num_classes):
keep = np.where(cls_boxes[j][:, 1] >= image_thresh)[0]
cls_boxes[j] = cls_boxes[j][keep, :]
im_results = np.vstack([cls_boxes[j] for j in range(1, cfg.num_classes)])
im_results = np.vstack(
[cls_boxes[j] for j in range(start_idx, num_classes)])
return im_results
def mstest_box_post_process(result, cfg):
def mstest_box_post_process(result, config, num_classes):
"""
Multi-scale Test
Only available for batch_size=1 now.
......@@ -173,7 +215,7 @@ def mstest_box_post_process(result, cfg):
for k in result.keys():
if 'bbox' in k:
boxes = result[k][0]
boxes = np.reshape(boxes, (-1, 4 * cfg.num_classes))
boxes = np.reshape(boxes, (-1, 4 * num_classes))
scores = result['score' + k[4:]][0]
if 'flip' in k:
boxes = box_flip(boxes, im_shape)
......@@ -183,7 +225,7 @@ def mstest_box_post_process(result, cfg):
ms_boxes = np.concatenate(ms_boxes)
ms_scores = np.concatenate(ms_scores)
bbox_pred = get_nms_result(ms_boxes, ms_scores, cfg)
bbox_pred = get_nms_result(ms_boxes, ms_scores, config, num_classes)
post_bbox.update({'bbox': (bbox_pred, [[len(bbox_pred)]])})
if use_flip:
bbox = bbox_pred[:, 2:]
......@@ -271,3 +313,15 @@ def mask_encode(results, resolution, thresh_binarize=0.5):
im_mask[:, :, np.newaxis], order='F'))[0]
segms.append(segm)
return segms
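# CornerNet post-processing: drop detections filtered out during decoding (score
# forced to -1), split the decoded results into labels/scores/boxes and run
# per-class (soft-)NMS via get_nms_result with the labels supplied directly.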
def corner_post_process(results, config, num_classes):
detections = results['bbox'][0]
keep_inds = (detections[:, 1] > -1)
detections = detections[keep_inds]
labels = detections[:, 0]
scores = detections[:, 1]
boxes = detections[:, 2:6]
cls_boxes = get_nms_result(
boxes, scores, config, num_classes, background_label=-1, labels=labels)
results.update({'bbox': (cls_boxes, [[len(cls_boxes)]])})