Unverified commit eb8b4899, authored by qingqing01, committed by GitHub

Clean code (#1895)

* Clean code
* rm py_op/post_processing.py and utils/data_structure.py
Parent ad353419
Simplified Chinese | [English](README_en.md)
Documentation: [https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io)
# PaddleDetection
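PaddleDetection, developed by PaddlePaddle, is an end-to-end object detection development kit that aims to help developers complete the whole workflow faster and better, from training detection models, through accuracy and speed optimization, to deployment. PaddleDetection implements many mainstream object detection algorithms in a modular design, provides a rich set of modules such as data augmentation, network components, and loss functions, and integrates model compression and cross-platform high-performance deployment. Projects already deployed with PaddleDetection span fields such as industrial quality inspection, remote sensing image detection, and unmanned inspection.
**All models in this detection library currently require PaddlePaddle 1.7 or later, or a suitable develop version.**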
<div align="center">
<img src="docs/images/000000570688.jpg" />
</div>
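## Introduction
Features:
- Rich models:
PaddleDetection provides a rich model collection, with 100+ pretrained models covering object detection, instance segmentation, and face detection, including champion solutions from several dataset competitions and detection solutions suited to cloud and edge deployment.
- Easy deployment:
The core operators used by PaddleDetection models are all implemented in C++ or CUDA, and with PaddlePaddle's high-performance inference engine the models can be deployed conveniently on a variety of hardware platforms.
- High flexibility:
PaddleDetection decouples its components through modular design, so a variety of detection models can be assembled easily from configuration files.
- High performance:
Thanks to the high-performance kernels of the PaddlePaddle framework, it has some advantage in training speed and GPU memory usage. For example, YOLOv3 trains faster than in other frameworks, and on a Tesla V100 16GB, Mask-RCNN (ResNet50) can reach a single-GPU batch size of 4 (or even 5).
The dynamic graph version of PaddleDetection supports the following models:
Supported model architectures: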
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet | HRNet | Res2Net |
|--------------------|:------:|------------------------------:|:----------:|:-----:|:---------:|:------:| :--: |
| Faster R-CNN | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ |
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ |
| Mask R-CNN | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ |
| Cascade Faster-RCNN | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Cascade Mask-RCNN | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| Libra R-CNN | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| YOLOv3 | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| BlazeFace | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Faceboxes | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
<a name="vd">[1]</a> [ResNet-vd](https://arxiv.org/pdf/1812.01187) models improve accuracy while keeping inference speed essentially unchanged.
More models:
- EfficientDet
- FCOS
- CornerNet-Squeeze
- YOLOv4
More backbones:
- DarkNet
- VGG
- GCNet
- CBNet
- Hourglass
- Faster-RCNN (FPN)
- Mask-RCNN (FPN)
- Cascade RCNN
- YOLOv3
Extended features:
......@@ -74,45 +13,15 @@
- [x] **Group Norm**
- [x] **Modulated Deformable Convolution**
- [x] **Deformable PSRoI Pooling**
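- [x] **Non-local and GCNet**
**Note:** Synchronized batch normalization can only be used in multi-GPU environments; it cannot be used on CPU or with a single GPU.
The figure below plots COCO mAP against inference speed (FPS) on a single Tesla V100 for representative models of each architecture and backbone.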
<div align="center">
<img src="docs/images/map_fps.png" />
</div>
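**Notes:**
- `CBResNet` denotes the `Cascade-Faster-RCNN-CBResNet200vd-FPN` model, which reaches 53.3% mAP on COCO
- `Cascade-Faster-RCNN` denotes `Cascade-Faster-RCNN-ResNet50vd-DCN`, which PaddleDetection has optimized to run at 20 FPS with 47.8% mAP on COCO
- The enhanced PaddleDetection `YOLOv3-ResNet50vd-DCN` scores 10.6 absolute percentage points higher COCO mAP than the original paper and runs at 61.3 FPS, about 70% faster than the original
- All models in the figure are available in the [model zoo](#模型库)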
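The YOLOv3 figures in the notes above can be sanity-checked with a little arithmetic. This sketch assumes the 43.6% and 33.0% COCO mAP values quoted for the enhanced YOLOv3 in the model zoo list of this README, and reads "about 70% faster" as a 1.7x FPS ratio; the implied baseline speed is derived from those assumptions and is not a measured value.

```python
# Sanity check of the YOLOv3-ResNet50vd-DCN claims quoted in this README.
enhanced_map = 43.6   # COCO mAP (%) quoted for the enhanced model
paper_map = 33.0      # COCO mAP (%) quoted for the original paper
enhanced_fps = 61.3   # inference speed quoted in the notes
speedup = 1.70        # assumption: "about 70% faster" read as a 1.7x ratio

map_gain = round(enhanced_map - paper_map, 1)
implied_baseline_fps = round(enhanced_fps / speedup, 1)

print(f"absolute mAP gain: {map_gain} points")
print(f"implied baseline speed: {implied_baseline_fps} FPS")
```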
## Documentation and Tutorials
### Getting Started
### Tutorials
- [Installation](docs/tutorials/INSTALL_cn.md)
- [Quick start](docs/tutorials/QUICK_STARTED_cn.md)
- [Training, evaluation, and inference workflow](docs/tutorials/GETTING_STARTED_cn.md)
- [FAQ](docs/FAQ.md)
### Advanced Tutorials
- [Data preprocessing and custom datasets](docs/advanced_tutorials/READER.md)
- [Steps for building a model](docs/advanced_tutorials/MODEL_TECHNICAL.md)
- [Model parameter configuration](docs/advanced_tutorials/config_doc):
  - [Configuration module design and overview](docs/advanced_tutorials/config_doc/CONFIG_cn.md)
  - [RCNN model parameter reference](docs/advanced_tutorials/config_doc/RCNN_PARAMS_DOC.md)
- [Transfer learning tutorial](docs/advanced_tutorials/TRANSFER_LEARNING_cn.md)
- [IPython Notebook demo](demo/mask_rcnn_demo.ipynb)
- [Model compression](slim)
  - [Compression benchmark](slim)
  - [Quantization](slim/quantization)
  - [Pruning](slim/prune)
  - [Distillation](slim/distillation)
  - [Neural architecture search](slim/nas)
- [Inference deployment](deploy)
  - [Model export tutorial](docs/advanced_tutorials/deploy/EXPORT_MODEL.md)
  - [Python inference deployment](deploy/python)
......@@ -120,25 +29,7 @@
  - [Inference benchmark](docs/advanced_tutorials/deploy/BENCHMARK_INFER_cn.md)
<a name="模型库"></a>
## Model Zoo
- [Model zoo](docs/MODEL_ZOO_cn.md)
- [Mobile models](configs/mobile/README.md)
- [Anchor-free models](configs/anchor_free/README.md)
- [Face detection models](docs/featured_model/FACE_DETECTION.md)
- [Enhanced YOLOv3 models](docs/featured_model/YOLOv3_ENHANCEMENT.md): COCO mAP reaches 43.6%, versus 33.0% in the original paper
- [Pretrained pedestrian detection models](docs/featured_model/CONTRIB_cn.md)
- [Pretrained vehicle detection models](docs/featured_model/CONTRIB_cn.md)
- [Winning model of the Objects365 2019 Challenge](docs/featured_model/champion_model/CACascadeRCNN.md)
- [Best single model of the Open Images 2019 Object Detection competition](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md)
- [Practical server-side object detection models](configs/rcnn_enhance/README.md): 47.8% COCO mAP at 20 FPS on a V100.
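## License
This project is released under the [Apache 2.0 license](LICENSE).
## Release Notes
v0.3.0 was released in `05/2020`. It adds several models, including anchor-free models, EfficientDet, and YOLOv4, and introduces practical, efficient models for mobile and server use: for example, YOLOv3-MobileNetv3 is accelerated 3.5x on mobile, and the two-stage models are optimized on the server side for a strong balance of speed and accuracy. It also refactors the inference deployment features for better usability and fixes many known bugs. See the [changelog](docs/CHANGELOG.md) for details.
## Contributing
Contributions of code to PaddleDetection are very welcome, and we greatly appreciate your feedback.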
# All rights `PaddleDetection` reserved
# References:
# @TechReport{fddbTech,
# author = {Vidit Jain and Erik Learned-Miller},
# title = {FDDB: A Benchmark for Face Detection in Unconstrained Settings},
# institution = {University of Massachusetts, Amherst},
# year = {2010},
# number = {UM-CS-2010-009}
# }
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
# external link to the Faces in the Wild data set and annotations file
wget http://tamaraberg.com/faceDataset/originalPics.tar.gz
wget http://vis-www.cs.umass.edu/fddb/FDDB-folds.tgz
wget http://vis-www.cs.umass.edu/fddb/evaluation.tgz
# Extract the data.
echo "Extracting..."
tar -zxf originalPics.tar.gz
tar -zxf FDDB-folds.tgz
tar -zxf evaluation.tgz
# Generate full image path list and groundtruth in FDDB-folds:
cd FDDB-folds
cat `ls|grep -v "ellipse"` > filePath.txt && cat *ellipse* > fddb_annotFile.txt
cd ..
echo "------------- All done! --------------"
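The last step of the script above concatenates the fold files into two lists. A minimal Python sketch of the same step follows, assuming the extracted `FDDB-folds` directory holds plain-text files and that the annotation files are the ones with `ellipse` in their names; the function name `build_fddb_lists` is hypothetical and not part of PaddleDetection.

```python
from pathlib import Path


def build_fddb_lists(folds_dir):
    """Merge FDDB fold files into filePath.txt and fddb_annotFile.txt."""
    folds = Path(folds_dir)
    outputs = {"filePath.txt", "fddb_annotFile.txt"}
    # Snapshot the inputs first so the output files are never read back in.
    sources = sorted(p for p in folds.iterdir()
                     if p.is_file() and p.name not in outputs)
    path_files = [p for p in sources if "ellipse" not in p.name]
    annot_files = [p for p in sources if "ellipse" in p.name]
    (folds / "filePath.txt").write_text(
        "".join(p.read_text() for p in path_files))
    (folds / "fddb_annotFile.txt").write_text(
        "".join(p.read_text() for p in annot_files))
    return len(path_files), len(annot_files)
```

Unlike the shell one-liner, this version skips its own output files, so it can be rerun in the same directory.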
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os.path as osp
import logging
# add python path of PaddleDetection to sys.path
parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3)))
if parent_path not in sys.path:
    sys.path.append(parent_path)
from ppdet.utils.download import download_dataset
logging.basicConfig(level=logging.INFO)
download_path = osp.split(osp.realpath(sys.argv[0]))[0]
download_dataset(download_path, 'fruit')
# All rights `PaddleDetection` reserved
# References:
# @inproceedings{yang2016wider,
# Author = {Yang, Shuo and Luo, Ping and Loy, Chen Change and Tang, Xiaoou},
# Booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
# Title = {WIDER FACE: A Face Detection Benchmark},
# Year = {2016}}
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
# Download the data.
echo "Downloading..."
wget https://dataset.bj.bcebos.com/wider_face/WIDER_train.zip
wget https://dataset.bj.bcebos.com/wider_face/WIDER_val.zip
wget https://dataset.bj.bcebos.com/wider_face/wider_face_split.zip
# Extract the data.
echo "Extracting..."
unzip WIDER_train.zip
unzip WIDER_val.zip
unzip wider_face_split.zip
TrainReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_mask']
  dataset:
    !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco
    sample_num: 10
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !RandomFlipImage
    is_mask_flip: true
    is_normalized: false
    prob: 0.5
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485, 0.456, 0.406]
    std: [0.229, 0.224, 0.225]
  - !ResizeImage
    interp: 1
    max_size: 1333
    target_size: 800
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: false
  batch_size: 1
  shuffle: true
  worker_num: 2
  drop_last: false
  use_process: false
EvalReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id']
  dataset:
    !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco
    sample_num: 10
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485, 0.456, 0.406]
    std: [0.229, 0.224, 0.225]
  - !ResizeImage
    interp: 1
    max_size: 1333
    target_size: 800
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: true
  batch_size: 1
  shuffle: false
  drop_last: false
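To see why the reader tests expect image heights and widths divisible by 32, the sketch below estimates output shapes for the `ResizeImage` and `PadBatch` settings in this config. It assumes the usual short-side resize rule (scale the short side to `target_size`, then shrink if the long side would exceed `max_size`) and zero-padding up to the next multiple of `pad_to_stride`. The helper names are hypothetical, and the real operators may round differently.

```python
import math


def resized_shape(height, width, target_size=800, max_size=1333):
    """Estimate (h, w) after a short-side resize capped by max_size."""
    scale = target_size / min(height, width)
    if round(scale * max(height, width)) > max_size:
        scale = max_size / max(height, width)
    return round(height * scale), round(width * scale)


def padded_shape(shapes, stride=32):
    """Pad to the largest (h, w) in the batch, rounded up to the stride."""
    max_h = max(h for h, _ in shapes)
    max_w = max(w for _, w in shapes)
    return (math.ceil(max_h / stride) * stride,
            math.ceil(max_w / stride) * stride)


batch = [resized_shape(480, 640), resized_shape(427, 640)]
print(batch, padded_shape(batch))
```

With `pad_to_stride: 32`, every padded batch ends up with spatial dimensions that are multiples of 32, which is what the assertions on `im_shape[1] % 32` and `im_shape[2] % 32` check.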
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
import unittest
import sys
import logging
import random
import copy
# add python path of PaddleDetection to sys.path
parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 4)))
if parent_path not in sys.path:
    sys.path.append(parent_path)
from ppdet.data.parallel_map import ParallelMap
class MemorySource(object):
    """ memory data source for testing
    """

    def __init__(self, samples):
        self._epoch = -1
        self._pos = -1
        self._drained = False
        self._samples = samples

    def __iter__(self):
        return self

    def __next__(self):
        return self.next()

    def next(self):
        if self._epoch < 0:
            self.reset()
        if self._pos >= self.size():
            self._drained = True
            raise StopIteration("no more data in " + str(self))
        else:
            sample = copy.deepcopy(self._samples[self._pos])
            self._pos += 1
            return sample

    def reset(self):
        if self._epoch < 0:
            self._epoch = 0
        else:
            self._epoch += 1
        self._pos = 0
        self._drained = False
        random.shuffle(self._samples)

    def size(self):
        return len(self._samples)

    def drained(self):
        assert self._epoch >= 0, "the first epoch has not started yet"
        return self._pos >= self.size()

    def epoch_id(self):
        return self._epoch
class TestDataset(unittest.TestCase):
    """Test cases for ppdet.data.dataset
    """

    @classmethod
    def setUpClass(cls):
        """ setup
        """
        pass

    @classmethod
    def tearDownClass(cls):
        """ tearDownClass """
        pass

    def test_next(self):
        """ test next
        """
        samples = list(range(10))
        mem_sc = MemorySource(samples)
        for i, d in enumerate(mem_sc):
            self.assertTrue(d in samples)

    def test_transform_with_abnormal_worker(self):
        """ test dataset transform with abnormally exit process
        """
        samples = list(range(20))
        mem_sc = MemorySource(samples)

        def _worker(sample):
            if sample == 3:
                sys.exit(1)
            return 2 * sample

        test_worker = ParallelMap(
            mem_sc, _worker, worker_num=2, use_process=True, memsize='2M')
        ct = 0
        for i, d in enumerate(test_worker):
            ct += 1
            self.assertTrue(d / 2 in samples)
        self.assertEqual(len(samples) - 1, ct)

    def test_transform_with_delay_worker(self):
        """ test dataset transform with delayed process
        """
        samples = list(range(20))
        mem_sc = MemorySource(samples)

        def _worker(sample):
            if sample == 3:
                time.sleep(30)
            return 2 * sample

        test_worker = ParallelMap(
            mem_sc, _worker, worker_num=2, use_process=True, memsize='2M')
        ct = 0
        for i, d in enumerate(test_worker):
            ct += 1
            self.assertTrue(d / 2 in samples)
        self.assertEqual(len(samples), ct)
if __name__ == '__main__':
    logging.basicConfig()
    unittest.main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest
import os
import sys
# add python path of PaddleDetection to sys.path
parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 4)))
if parent_path not in sys.path:
    sys.path.append(parent_path)
from ppdet.data.source.coco import COCODataSet
from ppdet.data.reader import Reader
from ppdet.utils.download import get_path
from ppdet.utils.download import DATASET_HOME
from ppdet.data.transform.operators import DecodeImage, ResizeImage, Permute
from ppdet.data.transform.batch_operators import PadBatch
COCO_VAL_URL = 'http://images.cocodataset.org/zips/val2017.zip'
COCO_VAL_MD5SUM = '442b8da7639aecaf257c1dceb8ba8c80'
COCO_ANNO_URL = 'http://images.cocodataset.org/annotations/annotations_trainval2017.zip'
COCO_ANNO_MD5SUM = 'f4bbac642086de4f52a3fdda2de5fa2c'
class TestReader(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        """ setup
        """
        root_path = os.path.join(DATASET_HOME, 'coco')
        _, _ = get_path(COCO_VAL_URL, root_path, COCO_VAL_MD5SUM)
        _, _ = get_path(COCO_ANNO_URL, root_path, COCO_ANNO_MD5SUM)
        cls.anno_path = 'annotations/instances_val2017.json'
        cls.image_dir = 'val2017'
        cls.root_path = root_path

    @classmethod
    def tearDownClass(cls):
        """ tearDownClass """
        pass
    def test_loader(self):
        coco_loader = COCODataSet(
            dataset_dir=self.root_path,
            image_dir=self.image_dir,
            anno_path=self.anno_path,
            sample_num=10)
        sample_trans = [
            DecodeImage(to_rgb=True), ResizeImage(
                target_size=800, max_size=1333, interp=1), Permute(to_bgr=False)
        ]
        batch_trans = [PadBatch(pad_to_stride=32, use_padded_im_info=True), ]
        inputs_def = {
            'fields': [
                'image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd',
                'gt_mask'
            ],
        }
        data_loader = Reader(
            coco_loader,
            sample_transforms=sample_trans,
            batch_transforms=batch_trans,
            batch_size=2,
            shuffle=True,
            drop_empty=True,
            inputs_def=inputs_def)()
        for i in range(2):
            for samples in data_loader:
                for sample in samples:
                    im_shape = sample[0].shape
                    self.assertEqual(im_shape[0], 3)
                    self.assertEqual(im_shape[1] % 32, 0)
                    self.assertEqual(im_shape[2] % 32, 0)
                    im_info_shape = sample[1].shape
                    self.assertEqual(im_info_shape[-1], 3)
                    im_id_shape = sample[2].shape
                    self.assertEqual(im_id_shape[-1], 1)
                    gt_bbox_shape = sample[3].shape
                    self.assertEqual(gt_bbox_shape[-1], 4)
                    gt_class_shape = sample[4].shape
                    self.assertEqual(gt_class_shape[-1], 1)
                    self.assertEqual(gt_class_shape[0], gt_bbox_shape[0])
                    is_crowd_shape = sample[5].shape
                    self.assertEqual(is_crowd_shape[-1], 1)
                    self.assertEqual(is_crowd_shape[0], gt_bbox_shape[0])
                    mask = sample[6]
                    self.assertEqual(len(mask), gt_bbox_shape[0])
                    self.assertEqual(mask[0][0].shape[-1], 2)
            data_loader.reset()
    def test_loader_multi_threads(self):
        coco_loader = COCODataSet(
            dataset_dir=self.root_path,
            image_dir=self.image_dir,
            anno_path=self.anno_path,
            sample_num=10)
        sample_trans = [
            DecodeImage(to_rgb=True), ResizeImage(
                target_size=800, max_size=1333, interp=1), Permute(to_bgr=False)
        ]
        batch_trans = [PadBatch(pad_to_stride=32, use_padded_im_info=True), ]
        inputs_def = {
            'fields': [
                'image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd',
                'gt_mask'
            ],
        }
        data_loader = Reader(
            coco_loader,
            sample_transforms=sample_trans,
            batch_transforms=batch_trans,
            batch_size=2,
            shuffle=True,
            drop_empty=True,
            worker_num=2,
            use_process=False,
            bufsize=8,
            inputs_def=inputs_def)()
        for i in range(2):
            for samples in data_loader:
                for sample in samples:
                    im_shape = sample[0].shape
                    self.assertEqual(im_shape[0], 3)
                    self.assertEqual(im_shape[1] % 32, 0)
                    self.assertEqual(im_shape[2] % 32, 0)
                    im_info_shape = sample[1].shape
                    self.assertEqual(im_info_shape[-1], 3)
                    im_id_shape = sample[2].shape
                    self.assertEqual(im_id_shape[-1], 1)
                    gt_bbox_shape = sample[3].shape
                    self.assertEqual(gt_bbox_shape[-1], 4)
                    gt_class_shape = sample[4].shape
                    self.assertEqual(gt_class_shape[-1], 1)
                    self.assertEqual(gt_class_shape[0], gt_bbox_shape[0])
                    is_crowd_shape = sample[5].shape
                    self.assertEqual(is_crowd_shape[-1], 1)
                    self.assertEqual(is_crowd_shape[0], gt_bbox_shape[0])
                    mask = sample[6]
                    self.assertEqual(len(mask), gt_bbox_shape[0])
                    self.assertEqual(mask[0][0].shape[-1], 2)
            data_loader.reset()


if __name__ == '__main__':
    unittest.main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest
import os
import yaml
import logging
import sys
# add python path of PaddleDetection to sys.path
parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 4)))
if parent_path not in sys.path:
    sys.path.append(parent_path)
from ppdet.utils.download import get_path
from ppdet.utils.download import DATASET_HOME
from ppdet.core.workspace import load_config, merge_config
from ppdet.data.reader import create_reader
COCO_VAL_URL = 'http://images.cocodataset.org/zips/val2017.zip'
COCO_VAL_MD5SUM = '442b8da7639aecaf257c1dceb8ba8c80'
COCO_ANNO_URL = 'http://images.cocodataset.org/annotations/annotations_trainval2017.zip'
COCO_ANNO_MD5SUM = 'f4bbac642086de4f52a3fdda2de5fa2c'
FORMAT = '[%(asctime)s-%(filename)s-%(levelname)s:%(message)s]'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
class TestReaderYAML(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        """ setup
        """
        root_path = os.path.join(DATASET_HOME, 'coco')
        _, _ = get_path(COCO_VAL_URL, root_path, COCO_VAL_MD5SUM)
        _, _ = get_path(COCO_ANNO_URL, root_path, COCO_ANNO_MD5SUM)
        cls.anno_path = 'annotations/instances_val2017.json'
        cls.image_dir = 'val2017'
        cls.root_path = root_path

    @classmethod
    def tearDownClass(cls):
        """ tearDownClass """
        pass
    def test_loader_yaml(self):
        cfg_file = 'ppdet/data/tests/test.yml'
        cfg = load_config(cfg_file)
        data_cfg = '[!COCODataSet {{image_dir: {0}, dataset_dir: {1}, ' \
            'anno_path: {2}, sample_num: 10}}]'.format(
                self.image_dir, self.root_path, self.anno_path)
        dataset_ins = yaml.load(data_cfg, Loader=yaml.Loader)
        update_train_cfg = {'TrainReader': {'dataset': dataset_ins[0]}}
        update_test_cfg = {'EvalReader': {'dataset': dataset_ins[0]}}
        merge_config(update_train_cfg)
        merge_config(update_test_cfg)

        reader = create_reader(cfg['TrainReader'], 10)()
        for samples in reader:
            for sample in samples:
                im_shape = sample[0].shape
                self.assertEqual(im_shape[0], 3)
                self.assertEqual(im_shape[1] % 32, 0)
                self.assertEqual(im_shape[2] % 32, 0)
                im_info_shape = sample[1].shape
                self.assertEqual(im_info_shape[-1], 3)
                im_id_shape = sample[2].shape
                self.assertEqual(im_id_shape[-1], 1)
                gt_bbox_shape = sample[3].shape
                self.assertEqual(gt_bbox_shape[-1], 4)
                gt_class_shape = sample[4].shape
                self.assertEqual(gt_class_shape[-1], 1)
                self.assertEqual(gt_class_shape[0], gt_bbox_shape[0])
                is_crowd_shape = sample[5].shape
                self.assertEqual(is_crowd_shape[-1], 1)
                self.assertEqual(is_crowd_shape[0], gt_bbox_shape[0])
                mask = sample[6]
                self.assertEqual(len(mask), gt_bbox_shape[0])
                self.assertEqual(mask[0][0].shape[-1], 2)

        reader = create_reader(cfg['EvalReader'], 10)()
        for samples in reader:
            for sample in samples:
                im_shape = sample[0].shape
                self.assertEqual(im_shape[0], 3)
                self.assertEqual(im_shape[1] % 32, 0)
                self.assertEqual(im_shape[2] % 32, 0)
                im_info_shape = sample[1].shape
                self.assertEqual(im_info_shape[-1], 3)
                im_id_shape = sample[2].shape
                self.assertEqual(im_id_shape[-1], 1)


if __name__ == '__main__':
    unittest.main()
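# Compilation Process for Custom OPs
**Note:** The gcc version used to compile custom OPs must match the gcc version used to build Paddle. The Paddle develop daily builds are currently compiled with **gcc 4.8.2**, so if you use a daily build, compile the custom OPs with **gcc 4.8.2** as well; otherwise compatibility problems may occur.
## Code Structure
- src: C++/CUDA source code of the extension OPs
- cornerpool_lib.py: Python API wrapper
- tests: unit tests for each OP
## Compiling the Custom OPs
Custom OPs require the C++ and CUDA implementation to be compiled into a dynamic library. ```src/make.sh``` does this with g++/nvcc, though you can also write a Makefile or use CMake.
Compilation needs to include the PaddlePaddle headers and link against the PaddlePaddle libraries. Their locations can be obtained with the following commands: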
```
# python
>>> import paddle
>>> print(paddle.sysconfig.get_include())
/paddle/pyenv/local/lib/python2.7/site-packages/paddle/include
>>> print(paddle.sysconfig.get_lib())
/paddle/pyenv/local/lib/python2.7/site-packages/paddle/libs
```
A build script for the dynamic library is provided and is run as follows:
```
cd src
sh make.sh
```
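The build finally produces `cornerpool_lib.so`.
**Note:** If PaddlePaddle was built and installed from source without `WITH_MKLDNN` set in `cmake`,
compiling the custom OPs fails because files such as `mkldnn.h` cannot be found. In that case, remove the `-DPADDLE_WITH_MKLDNN` option from the compile command in `make.sh`.
## Setting Environment Variables
Paddle's core library directory must be added to `LD_LIBRARY_PATH`. First run the following program to get the path: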
```
import paddle
print(paddle.sysconfig.get_lib())
```
The dynamic library path can then be added as follows:
```
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`python -c 'import paddle; print(paddle.sysconfig.get_lib())'`
```
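The same path handling can be sketched in Python, for example to prepare the environment of a subprocess. This is a minimal sketch: `with_library_path` is a hypothetical helper, and the directory passed in stands for whatever `paddle.sysconfig.get_lib()` returned on your machine.

```python
import os


def with_library_path(lib_dir, env=None):
    """Return a copy of env with lib_dir appended to LD_LIBRARY_PATH."""
    new_env = dict(os.environ if env is None else env)
    current = new_env.get("LD_LIBRARY_PATH", "")
    parts = [p for p in current.split(os.pathsep) if p]
    if lib_dir not in parts:  # avoid duplicate entries on repeated calls
        parts.append(lib_dir)
    new_env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return new_env


# Placeholder directory; substitute the output of paddle.sysconfig.get_lib().
env = with_library_path("/path/to/paddle/libs",
                        {"LD_LIBRARY_PATH": "/usr/local/lib"})
print(env["LD_LIBRARY_PATH"])
```

The returned mapping can be passed as `env=` to `subprocess.run`. Changing `LD_LIBRARY_PATH` inside an already running process generally does not affect how that process resolves shared libraries, which is why the variable is exported in the shell before Python starts.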
## Running the Unit Tests
Run the following unit test to make sure the custom operators work correctly inside a network:
```
# Return to the ext_op directory and run the unit test
cd ..
python test/test_corner_pool.py
```
When the unit test succeeds, it prints a message like the following:
```
.
----------------------------------------------------------------------
Ran 4 test in 2.858s
OK
```
For more on defining custom C++ OPs outside the framework, see the [official documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_usage/index_cn.html).
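As a reference for what these operators compute, bottom pooling replaces each value with the maximum of everything at or above it in the same column, that is, a running maximum scanned from top to bottom. The sketch below is a pure-Python illustration on a single 2-D feature map (a list of rows), not the Paddle implementation; `bottom_pool_2d` and `bottom_pool_doubling` are hypothetical names. The second function mirrors the step-doubling loop of the Python fallback in `cornerpool_lib.py`, which needs only about log2(H) passes.

```python
def bottom_pool_2d(feature):
    """Running maximum down each column (scan from top to bottom)."""
    out = [list(feature[0])]
    for row in feature[1:]:
        out.append([max(prev, cur) for prev, cur in zip(out[-1], row)])
    return out


def bottom_pool_doubling(feature):
    """Same result using the step-doubling scheme: i = 1, 2, 4, ..."""
    height = len(feature)
    out = [list(row) for row in feature]
    i = 1
    while i < height:
        # Row r (r >= i) takes the max of itself and the row i steps above.
        shifted = [
            [max(a, b) for a, b in zip(out[r], out[r - i])]
            for r in range(i, height)
        ]
        out = out[:i] + shifted
        i *= 2
    return out


feature = [[1, 5], [3, 2], [2, 4], [0, 6]]
assert bottom_pool_2d(feature) == bottom_pool_doubling(feature)
print(bottom_pool_2d(feature))
```

Top, left, and right pooling follow the same pattern with the scan direction or axis changed.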
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from . import cornerpool_lib
from .cornerpool_lib import *
__all__ = cornerpool_lib.__all__
import os
import paddle.fluid as fluid
use_cpp = False
file_dir = os.path.dirname(os.path.abspath(__file__))
try:
    fluid.load_op_library(os.path.join(file_dir, 'src/cornerpool_lib.so'))
    use_cpp = True
except Exception:
    # Fall back to the pure-Python implementation if the library is missing.
    print(
        'Warning: cornerpool_lib.so not found, use python version instead which may drop the inference speed. Compile in ppdet/ext_op at first if you need cpp version.'
    )
from paddle.fluid.layer_helper import LayerHelper
__all__ = [
    'bottom_pool',
    'top_pool',
    'right_pool',
    'left_pool',
]
def cornerpool_op(layer_type, input, name):
    helper = LayerHelper(layer_type, input=input, name=name)
    dtype = helper.input_dtype()
    output = helper.create_variable_for_type_inference(dtype)
    max_map = helper.create_variable_for_type_inference(dtype)
    helper.append_op(
        type=layer_type,
        inputs={"X": input},
        outputs={"Output": output,
                 "MaxMap": max_map})
    return output
def bottom_pool(input, is_test=False, name=None):
    """
    This layer calculates the bottom pooling output based on the input.
    Scan the input from top to bottom for the vertical max-pooling.
    The output has the same shape as the input.

    Args:
        input(Variable): This input is a Tensor with shape [N, C, H, W].
            The data type is float32 or float64.

    Returns:
        Variable(Tensor): The output of bottom_pool, with shape [N, C, H, W].
        The data type is float32 or float64.

    Examples:
        .. code-block:: python

            import paddle.fluid as fluid
            import cornerpool_lib
            input = fluid.data(
                name='input', shape=[2, 64, 10, 10], dtype='float32')
            output = cornerpool_lib.bottom_pool(input)
    """
    if is_test:
        if use_cpp:
            output = cornerpool_op("bottom_pool", input, name)
            return output

        def cond(i, output):
            return i < H

        def body(i, output):
            cur = fluid.layers.slice(output, [2], [i], [H])
            next = fluid.layers.slice(output, [2], [0], [H - i])
            max_v = fluid.layers.elementwise_max(cur, next)
            orig = fluid.layers.slice(output, [2], [0], [i])
            output = fluid.layers.concat([orig, max_v], axis=2)
            i = i * 2
            return [i, output]

        H = fluid.layers.shape(input)[2]
        i = fluid.layers.fill_constant(shape=[1], dtype='int32', value=1)
        output = input
        output = fluid.layers.while_loop(cond, body, [i, output])
        return output[-1]

    H = input.shape[2]
    i = 1
    output = input
    while i < H:
        cur = output[:, :, i:, :]
        next = output[:, :, :H - i, :]
        max_v = fluid.layers.elementwise_max(cur, next)
        output = fluid.layers.concat([output[:, :, :i, :], max_v], axis=2)
        i *= 2
    return output
def top_pool(input, is_test=False, name=None):
    """
    This layer calculates the top pooling output based on the input.
    Scan the input from bottom to top for the vertical max-pooling.
    The output has the same shape as the input.

    Args:
        input(Variable): This input is a Tensor with shape [N, C, H, W].
            The data type is float32 or float64.

    Returns:
        Variable(Tensor): The output of top_pool, with shape [N, C, H, W].
        The data type is float32 or float64.

    Examples:
        .. code-block:: python

            import paddle.fluid as fluid
            import cornerpool_lib
            input = fluid.data(
                name='input', shape=[2, 64, 10, 10], dtype='float32')
            output = cornerpool_lib.top_pool(input)
    """
    if is_test:
        if use_cpp:
            output = cornerpool_op("top_pool", input, name)
            return output

        def cond(i, output):
            return i < H

        def body(i, output):
            cur = fluid.layers.slice(output, [2], [0], [H - i])
            next = fluid.layers.slice(output, [2], [i], [H])
            max_v = fluid.layers.elementwise_max(cur, next)
            orig = fluid.layers.slice(output, [2], [H - i], [H])
            output = fluid.layers.concat([max_v, orig], axis=2)
            i = i * 2
            return [i, output]

        H = fluid.layers.shape(input)[2]
        i = fluid.layers.fill_constant(shape=[1], dtype='int32', value=1)
        output = input
        output = fluid.layers.while_loop(cond, body, [i, output])
        return output[-1]

    H = input.shape[2]
    i = 1
    output = input
    while i < H:
        cur = output[:, :, :H - i, :]
        next = output[:, :, i:, :]
        max_v = fluid.layers.elementwise_max(cur, next)
        output = fluid.layers.concat([max_v, output[:, :, H - i:, :]], axis=2)
        i *= 2
    return output
def right_pool(input, is_test=False, name=None):
    """
    This layer calculates the right pooling output based on the input.
    Scan the input from left to right for the horizontal max-pooling.
    The output has the same shape as the input.

    Args:
        input(Variable): This input is a Tensor with shape [N, C, H, W].
            The data type is float32 or float64.

    Returns:
        Variable(Tensor): The output of right_pool, with shape [N, C, H, W].
        The data type is float32 or float64.

    Examples:
        .. code-block:: python

            import paddle.fluid as fluid
            import cornerpool_lib
            input = fluid.data(
                name='input', shape=[2, 64, 10, 10], dtype='float32')
            output = cornerpool_lib.right_pool(input)
    """
    if is_test:
        if use_cpp:
            output = cornerpool_op("right_pool", input, name)
            return output

        def cond(i, output):
            return i < W

        def body(i, output):
            cur = fluid.layers.slice(output, [3], [i], [W])
            next = fluid.layers.slice(output, [3], [0], [W - i])
            max_v = fluid.layers.elementwise_max(cur, next)
            orig = fluid.layers.slice(output, [3], [0], [i])
            output = fluid.layers.concat([orig, max_v], axis=-1)
            i = i * 2
            return [i, output]

        W = fluid.layers.shape(input)[3]
        i = fluid.layers.fill_constant(shape=[1], dtype='int32', value=1)
        output = input
        output = fluid.layers.while_loop(cond, body, [i, output])
        return output[-1]

    W = input.shape[3]
    i = 1
    output = input
    while i < W:
        cur = output[:, :, :, i:]
        next = output[:, :, :, :W - i]
        max_v = fluid.layers.elementwise_max(cur, next)
        output = fluid.layers.concat([output[:, :, :, :i], max_v], axis=-1)
        i *= 2
    return output
def left_pool(input, is_test=False, name=None):
    """
    This layer calculates the left pooling output based on the input.
    Scan the input from right to left for the horizontal max-pooling.
    The output has the same shape as the input.

    Args:
        input(Variable): This input is a Tensor with shape [N, C, H, W].
            The data type is float32 or float64.

    Returns:
        Variable(Tensor): The output of left_pool, with shape [N, C, H, W].
        The data type is float32 or float64.

    Examples:
        .. code-block:: python

            import paddle.fluid as fluid
            import cornerpool_lib
            input = fluid.data(
                name='input', shape=[2, 64, 10, 10], dtype='float32')
            output = cornerpool_lib.left_pool(input)
    """
    if is_test:
        if use_cpp:
            output = cornerpool_op("left_pool", input, name)
            return output

        def cond(i, output):
            return i < W

        def body(i, output):
            cur = fluid.layers.slice(output, [3], [0], [W - i])
            next = fluid.layers.slice(output, [3], [i], [W])
            max_v = fluid.layers.elementwise_max(cur, next)
            orig = fluid.layers.slice(output, [3], [W - i], [W])
            output = fluid.layers.concat([max_v, orig], axis=-1)
            i = i * 2
            return [i, output]

        W = fluid.layers.shape(input)[3]
        i = fluid.layers.fill_constant(shape=[1], dtype='int32', value=1)
        output = input
        output = fluid.layers.while_loop(cond, body, [i, output])
        return output[-1]

    W = input.shape[3]
    i = 1
    output = input
    while i < W:
        cur = output[:, :, :, :W - i]
        next = output[:, :, :, i:]
        max_v = fluid.layers.elementwise_max(cur, next)
        output = fluid.layers.concat([max_v, output[:, :, :, W - i:]], axis=-1)
        i *= 2
    return output
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
class BottomPoolOp : public framework::OperatorWithKernel {
 public:
  using framework::OperatorWithKernel::OperatorWithKernel;

  void InferShape(framework::InferShapeContext* ctx) const override {
    PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
    ctx->ShareDim("X", /*->*/ "MaxMap");
    ctx->ShareDim("X", /*->*/ "Output");
  }

 protected:
  framework::OpKernelType GetExpectedKernelType(
      const framework::ExecutionContext& ctx) const override {
    return framework::OpKernelType(ctx.Input<Tensor>("X")->type(),
                                   ctx.GetPlace());
  }
};
class BottomPoolOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
  void Make() override {
    AddInput("X",
             "Input with shape (batch, C, H, W)");
    AddOutput("MaxMap", "Max map with index of maximum value of input");
    AddOutput("Output", "output with same shape as input(X)");
    AddComment(
        R"Doc(
This operator calculates the bottom pooling output based on the input.
Scan the input from top to bottom for the vertical max-pooling.
The output has the same shape as the input.
)Doc");
  }
};
class BottomPoolOpGrad : public framework::OperatorWithKernel {
 public:
  using framework::OperatorWithKernel::OperatorWithKernel;

 protected:
  void InferShape(framework::InferShapeContext* ctx) const override {
    PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
    PADDLE_ENFORCE(ctx->HasInput("MaxMap"), "Input(MaxMap) should not be null");
    PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Output")),
                   "Input(Output@GRAD) should not be null");
    auto out_grad_name = framework::GradVarName("Output");
    ctx->ShareDim(out_grad_name, framework::GradVarName("X"));
  }

  framework::OpKernelType GetExpectedKernelType(
      const framework::ExecutionContext& ctx) const override {
    return framework::OpKernelType(
        ctx.Input<Tensor>(framework::GradVarName("Output"))->type(),
        ctx.GetPlace());
  }
};
template <typename T>
class BottomPoolGradDescMaker : public framework::SingleGradOpMaker<T> {
 public:
  using framework::SingleGradOpMaker<T>::SingleGradOpMaker;

 protected:
  void Apply(GradOpPtr<T> op) const override {
    op->SetType("bottom_pool_grad");
    op->SetInput("X", this->Input("X"));
    op->SetInput(framework::GradVarName("Output"), this->OutputGrad("Output"));
    op->SetInput("MaxMap", this->Output("MaxMap"));
    op->SetOutput(framework::GradVarName("X"), this->InputGrad("X"));
    op->SetAttrMap(this->Attrs());
  }
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(bottom_pool,
ops::BottomPoolOp,
ops::BottomPoolOpMaker,
ops::BottomPoolGradDescMaker<paddle::framework::OpDesc>,
ops::BottomPoolGradDescMaker<paddle::imperative::OpBase>);
REGISTER_OPERATOR(bottom_pool_grad, ops::BottomPoolOpGrad);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/platform/cuda_primitives.h"
#include "paddle/fluid/memory/memory.h"
#include <algorithm>  // std::min, used by NumBlocks
#include <vector>
#include "util.cu.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
static constexpr int kNumCUDAThreads = 512;
static constexpr int kNumMaximumNumBlocks = 4096;
static inline int NumBlocks(const int N) {
return std::min((N + kNumCUDAThreads - 1) / kNumCUDAThreads,
kNumMaximumNumBlocks);
}
template <typename T>
class BottomPoolOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext &ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"This kernel only runs on GPU device.");
auto *x = ctx.Input<Tensor>("X");
auto *max_map = ctx.Output<Tensor>("MaxMap");
auto *output = ctx.Output<Tensor>("Output");
auto *x_data = x->data<T>();
auto x_dims = x->dims();
int NC_num = x_dims[0] * x_dims[1];
int height = x_dims[2];
int width = x_dims[3];
int num = x->numel();
auto& dev_ctx = ctx.cuda_device_context();
int *max_map_data = max_map->mutable_data<int>(x_dims, dev_ctx.GetPlace());
T *output_data = output->mutable_data<T>(x_dims, dev_ctx.GetPlace());
auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int threads = kNumCUDAThreads;
int blocks = NumBlocks(num / height);
auto max_val_ptr = memory::Alloc(gpu_place, num / height * sizeof(T));
T* max_val_data = reinterpret_cast<T*>(max_val_ptr->ptr());
auto max_ind_ptr = memory::Alloc(gpu_place, num / height * sizeof(int));
int* max_ind_data = reinterpret_cast<int*>(max_ind_ptr->ptr());
GetMaxInfo<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), NC_num, height, width, 2, false, max_val_data, max_ind_data, max_map_data);
blocks = NumBlocks(num);
ScatterAddFw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), max_map_data, NC_num, height, width, 2, output_data);
}
};
template <typename T>
class BottomPoolGradOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* max_map = ctx.Input<Tensor>("MaxMap");
auto* out_grad = ctx.Input<Tensor>(framework::GradVarName("Output"));
auto* in_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
auto x_dims = x->dims();
auto& dev_ctx = ctx.cuda_device_context();
T* in_grad_data = in_grad->mutable_data<T>(x_dims, dev_ctx.GetPlace());
auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int threads = kNumCUDAThreads;
int NC_num = x_dims[0] * x_dims[1];
int height = x_dims[2];
int width = x_dims[3];
int grad_num = in_grad->numel();
int blocks = NumBlocks(grad_num);
FillConstant<T><<<blocks, threads, 0, dev_ctx.stream()>>>(in_grad_data, 0, grad_num);
ScatterAddBw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(out_grad->data<T>(), max_map->data<int>(), NC_num, height, width, 2, in_grad_data);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(bottom_pool,
ops::BottomPoolOpCUDAKernel<float>,
ops::BottomPoolOpCUDAKernel<double>);
REGISTER_OP_CUDA_KERNEL(bottom_pool_grad,
ops::BottomPoolGradOpCUDAKernel<float>,
ops::BottomPoolGradOpCUDAKernel<double>);
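The gradient kernel above zero-fills the input gradient and then scatter-adds each output gradient into the position recorded in `MaxMap`. The following NumPy sketch illustrates that idea on the host; it is not part of the commit, the function name `corner_pool_grad_reference` is hypothetical, and it assumes an NCHW layout with `axis` equal to 2 (height) or 3 (width), as in the kernel calls.

```python
import numpy as np


def corner_pool_grad_reference(out_grad, max_map, axis):
    """Host-side sketch of FillConstant followed by ScatterAddBw (assumed semantics).

    out_grad: gradient w.r.t. the pooled output, shape (N, C, H, W).
    max_map:  integer array of the same shape; each entry is the index along
              `axis` of the input element that produced that output value.
    """
    assert out_grad.ndim == 4 and axis in (2, 3)
    # Mirrors FillConstant: start from an all-zero input gradient.
    in_grad = np.zeros_like(out_grad)
    # Index grids for every (n, c, h, w) position.
    n, c, h, w = np.indices(out_grad.shape)
    if axis == 2:
        # Vertical pooling: the winning element shares n, c, w; its row is max_map.
        np.add.at(in_grad, (n, c, max_map, w), out_grad)
    else:
        # Horizontal pooling: the winning element shares n, c, h; its column is max_map.
        np.add.at(in_grad, (n, c, h, max_map), out_grad)
    return in_grad
```

`np.add.at` is used rather than fancy-index assignment because several output positions can point at the same input element, which is the same reason the CUDA kernel relies on an atomic add.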
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
class LeftPoolOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
ctx->ShareDim("X", /*->*/ "MaxMap");
ctx->ShareDim("X", /*->*/ "Output");
}
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(ctx.Input<Tensor>("X")->type(),
ctx.GetPlace());
}
};
class LeftPoolOpMaker : public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddInput("X",
"Input with shape (batch, C, H, W)");
AddOutput("MaxMap", "Max map with index of maximum value of input");
AddOutput("Output", "Output with same shape as input(X)");
AddComment(
R"Doc(
This operator calculates the left pooling output based on the input.
Scan the input from right to left for the horizontal max-pooling.
The output has the same shape as the input.
)Doc");
}
};
class LeftPoolOpGrad : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
PADDLE_ENFORCE(ctx->HasInput("MaxMap"), "Input(MaxMap) should not be null");
PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Output")),
"Input(Output@GRAD) should not be null");
auto out_grad_name = framework::GradVarName("Output");
ctx->ShareDim(out_grad_name, framework::GradVarName("X"));
}
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(
ctx.Input<Tensor>(framework::GradVarName("Output"))->type(),
ctx.GetPlace());
}
};
template <typename T>
class LeftPoolGradDescMaker : public framework::SingleGradOpMaker<T> {
public:
using framework::SingleGradOpMaker<T>::SingleGradOpMaker;
protected:
void Apply(GradOpPtr<T> op) const override {
op->SetType("left_pool_grad");
op->SetInput("X", this->Input("X"));
op->SetInput(framework::GradVarName("Output"), this->OutputGrad("Output"));
op->SetInput("MaxMap", this->Output("MaxMap"));
op->SetOutput(framework::GradVarName("X"), this->InputGrad("X"));
op->SetAttrMap(this->Attrs());
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(left_pool,
ops::LeftPoolOp,
ops::LeftPoolOpMaker,
ops::LeftPoolGradDescMaker<paddle::framework::OpDesc>,
ops::LeftPoolGradDescMaker<paddle::imperative::OpBase>);
REGISTER_OPERATOR(left_pool_grad, ops::LeftPoolOpGrad);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/platform/cuda_primitives.h"
#include "paddle/fluid/memory/memory.h"
#include <algorithm>  // std::min, used by NumBlocks
#include <vector>
#include "util.cu.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
static constexpr int kNumCUDAThreads = 512;
static constexpr int kNumMaximumNumBlocks = 4096;
static inline int NumBlocks(const int N) {
return std::min((N + kNumCUDAThreads - 1) / kNumCUDAThreads,
kNumMaximumNumBlocks);
}
template <typename T>
class LeftPoolOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext &ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"This kernel only runs on GPU device.");
auto *x = ctx.Input<Tensor>("X");
auto *max_map = ctx.Output<Tensor>("MaxMap");
auto *output = ctx.Output<Tensor>("Output");
auto *x_data = x->data<T>();
auto x_dims = x->dims();
int NC_num = x_dims[0] * x_dims[1];
int height = x_dims[2];
int width = x_dims[3];
int num = x->numel();
auto& dev_ctx = ctx.cuda_device_context();
int *max_map_data = max_map->mutable_data<int>(x_dims, dev_ctx.GetPlace());
T *output_data = output->mutable_data<T>(x_dims, dev_ctx.GetPlace());
auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int threads = kNumCUDAThreads;
int blocks = NumBlocks(num / width);
auto max_val_ptr = memory::Alloc(gpu_place, num / width * sizeof(T));
T* max_val_data = reinterpret_cast<T*>(max_val_ptr->ptr());
auto max_ind_ptr = memory::Alloc(gpu_place, num / width * sizeof(int));
int* max_ind_data = reinterpret_cast<int*>(max_ind_ptr->ptr());
GetMaxInfo<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), NC_num, height, width, 3, true, max_val_data, max_ind_data, max_map_data);
blocks = NumBlocks(num);
ScatterAddFw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), max_map_data, NC_num, height, width, 3, output_data);
}
};
template <typename T>
class LeftPoolGradOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* max_map = ctx.Input<Tensor>("MaxMap");
auto* out_grad = ctx.Input<Tensor>(framework::GradVarName("Output"));
auto* in_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
auto x_dims = x->dims();
auto& dev_ctx = ctx.cuda_device_context();
T* in_grad_data = in_grad->mutable_data<T>(x_dims, dev_ctx.GetPlace());
auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int threads = kNumCUDAThreads;
int NC_num = x_dims[0] * x_dims[1];
int height = x_dims[2];
int width = x_dims[3];
int grad_num = in_grad->numel();
int blocks = NumBlocks(grad_num);
FillConstant<T><<<blocks, threads, 0, dev_ctx.stream()>>>(in_grad_data, 0, grad_num);
ScatterAddBw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(out_grad->data<T>(), max_map->data<int>(), NC_num, height, width, 3, in_grad_data);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(left_pool,
ops::LeftPoolOpCUDAKernel<float>,
ops::LeftPoolOpCUDAKernel<double>);
REGISTER_OP_CUDA_KERNEL(left_pool_grad,
ops::LeftPoolGradOpCUDAKernel<float>,
ops::LeftPoolGradOpCUDAKernel<double>);
include_dir=$( python -c 'import paddle; print(paddle.sysconfig.get_include())' )
lib_dir=$( python -c 'import paddle; print(paddle.sysconfig.get_lib())' )
echo "$include_dir"
echo "$lib_dir"
OPS='bottom_pool_op top_pool_op right_pool_op left_pool_op'
for op in ${OPS}
do
nvcc ${op}.cu -c -o ${op}.cu.o -ccbin cc -DPADDLE_WITH_CUDA -DEIGEN_USE_GPU -DPADDLE_USE_DSO -DPADDLE_WITH_MKLDNN -Xcompiler -fPIC -std=c++11 -w --expt-relaxed-constexpr -O0 -g -DNVCC \
-I ${include_dir}/third_party/ \
-I ${include_dir}
done
g++ bottom_pool_op.cc bottom_pool_op.cu.o top_pool_op.cc top_pool_op.cu.o right_pool_op.cc right_pool_op.cu.o left_pool_op.cc left_pool_op.cu.o -o cornerpool_lib.so -DPADDLE_WITH_MKLDNN -shared -fPIC -std=c++11 -O0 -g \
-I ${include_dir}/third_party/ \
-I ${include_dir} \
-L ${lib_dir} \
-L /usr/local/cuda/lib64 -lpaddle_framework -lcudart
rm *.cu.o
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
class RightPoolOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
ctx->ShareDim("X", /*->*/ "MaxMap");
ctx->ShareDim("X", /*->*/ "Output");
}
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(ctx.Input<Tensor>("X")->type(),
ctx.GetPlace());
}
};
class RightPoolOpMaker : public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddInput("X",
"Input with shape (batch, C, H, W)");
AddOutput("MaxMap", "Max map with index of maximum value of input");
AddOutput("Output", "Output with same shape as input(X)");
AddComment(
R"Doc(
This operator calculates the right pooling output based on the input.
Scan the input from left to right for the horizontal max-pooling.
The output has the same shape as the input.
)Doc");
}
};
class RightPoolOpGrad : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
PADDLE_ENFORCE(ctx->HasInput("MaxMap"), "Input(MaxMap) should not be null");
PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Output")),
"Input(Output@GRAD) should not be null");
auto out_grad_name = framework::GradVarName("Output");
ctx->ShareDim(out_grad_name, framework::GradVarName("X"));
}
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(
ctx.Input<Tensor>(framework::GradVarName("Output"))->type(),
ctx.GetPlace());
}
};
template <typename T>
class RightPoolGradDescMaker : public framework::SingleGradOpMaker<T> {
public:
using framework::SingleGradOpMaker<T>::SingleGradOpMaker;
protected:
void Apply(GradOpPtr<T> op) const override {
op->SetType("right_pool_grad");
op->SetInput("X", this->Input("X"));
op->SetInput(framework::GradVarName("Output"), this->OutputGrad("Output"));
op->SetInput("MaxMap", this->Output("MaxMap"));
op->SetOutput(framework::GradVarName("X"), this->InputGrad("X"));
op->SetAttrMap(this->Attrs());
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(right_pool,
ops::RightPoolOp,
ops::RightPoolOpMaker,
ops::RightPoolGradDescMaker<paddle::framework::OpDesc>,
ops::RightPoolGradDescMaker<paddle::imperative::OpBase>);
REGISTER_OPERATOR(right_pool_grad, ops::RightPoolOpGrad);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/platform/cuda_primitives.h"
#include "paddle/fluid/memory/memory.h"
#include <algorithm>  // std::min, used by NumBlocks
#include <vector>
#include "util.cu.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
static constexpr int kNumCUDAThreads = 512;
static constexpr int kNumMaximumNumBlocks = 4096;
static inline int NumBlocks(const int N) {
return std::min((N + kNumCUDAThreads - 1) / kNumCUDAThreads,
kNumMaximumNumBlocks);
}
template <typename T>
class RightPoolOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext &ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"This kernel only runs on GPU device.");
auto *x = ctx.Input<Tensor>("X");
auto *max_map = ctx.Output<Tensor>("MaxMap");
auto *output = ctx.Output<Tensor>("Output");
auto *x_data = x->data<T>();
auto x_dims = x->dims();
int NC_num = x_dims[0] * x_dims[1];
int height = x_dims[2];
int width = x_dims[3];
int num = x->numel();
auto& dev_ctx = ctx.cuda_device_context();
int *max_map_data = max_map->mutable_data<int>(x_dims, dev_ctx.GetPlace());
T *output_data = output->mutable_data<T>(x_dims, dev_ctx.GetPlace());
auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int threads = kNumCUDAThreads;
int blocks = NumBlocks(num / width);
auto max_val_ptr = memory::Alloc(gpu_place, num / width * sizeof(T));
T* max_val_data = reinterpret_cast<T*>(max_val_ptr->ptr());
auto max_ind_ptr = memory::Alloc(gpu_place, num / width * sizeof(int));
int* max_ind_data = reinterpret_cast<int*>(max_ind_ptr->ptr());
GetMaxInfo<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), NC_num, height, width, 3, false, max_val_data, max_ind_data, max_map_data);
blocks = NumBlocks(num);
ScatterAddFw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), max_map_data, NC_num, height, width, 3, output_data);
}
};
template <typename T>
class RightPoolGradOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* max_map = ctx.Input<Tensor>("MaxMap");
auto* out_grad = ctx.Input<Tensor>(framework::GradVarName("Output"));
auto* in_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
auto x_dims = x->dims();
auto& dev_ctx = ctx.cuda_device_context();
T* in_grad_data = in_grad->mutable_data<T>(x_dims, dev_ctx.GetPlace());
auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int threads = kNumCUDAThreads;
int NC_num = x_dims[0] * x_dims[1];
int height = x_dims[2];
int width = x_dims[3];
int grad_num = in_grad->numel();
int blocks = NumBlocks(grad_num);
FillConstant<T><<<blocks, threads, 0, dev_ctx.stream()>>>(in_grad_data, 0, grad_num);
ScatterAddBw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(out_grad->data<T>(), max_map->data<int>(), NC_num, height, width, 3, in_grad_data);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(right_pool,
ops::RightPoolOpCUDAKernel<float>,
ops::RightPoolOpCUDAKernel<double>);
REGISTER_OP_CUDA_KERNEL(right_pool_grad,
ops::RightPoolGradOpCUDAKernel<float>,
ops::RightPoolGradOpCUDAKernel<double>);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
class TopPoolOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
ctx->ShareDim("X", /*->*/ "MaxMap");
ctx->ShareDim("X", /*->*/ "Output");
}
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(ctx.Input<Tensor>("X")->type(),
ctx.GetPlace());
}
};
class TopPoolOpMaker : public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddInput("X",
"Input with shape (batch, C, H, W)");
AddOutput("MaxMap", "Max map with index of maximum value of input");
AddOutput("Output", "Output with same shape as input(X)");
AddComment(
R"Doc(
This operator calculates the top pooling output based on the input.
Scan the input from bottom to top for the vertical max-pooling.
The output has the same shape as the input.
)Doc");
}
};
class TopPoolOpGrad : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
PADDLE_ENFORCE(ctx->HasInput("MaxMap"), "Input(MaxMap) should not be null");
PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Output")),
"Input(Output@GRAD) should not be null");
auto out_grad_name = framework::GradVarName("Output");
ctx->ShareDim(out_grad_name, framework::GradVarName("X"));
}
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(
ctx.Input<Tensor>(framework::GradVarName("Output"))->type(),
ctx.GetPlace());
}
};
template <typename T>
class TopPoolGradDescMaker : public framework::SingleGradOpMaker<T> {
public:
using framework::SingleGradOpMaker<T>::SingleGradOpMaker;
protected:
void Apply(GradOpPtr<T> op) const override {
op->SetType("top_pool_grad");
op->SetInput("X", this->Input("X"));
op->SetInput(framework::GradVarName("Output"), this->OutputGrad("Output"));
op->SetInput("MaxMap", this->Output("MaxMap"));
op->SetOutput(framework::GradVarName("X"), this->InputGrad("X"));
op->SetAttrMap(this->Attrs());
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(top_pool,
ops::TopPoolOp,
ops::TopPoolOpMaker,
ops::TopPoolGradDescMaker<paddle::framework::OpDesc>,
ops::TopPoolGradDescMaker<paddle::imperative::OpBase>);
REGISTER_OPERATOR(top_pool_grad, ops::TopPoolOpGrad);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/platform/cuda_primitives.h"
#include "paddle/fluid/memory/memory.h"
#include <algorithm>  // std::min, used by NumBlocks
#include <vector>
#include "util.cu.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
static constexpr int kNumCUDAThreads = 512;
static constexpr int kNumMaximumNumBlocks = 4096;
static inline int NumBlocks(const int N) {
return std::min((N + kNumCUDAThreads - 1) / kNumCUDAThreads,
kNumMaximumNumBlocks);
}
template <typename T>
class TopPoolOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext &ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"This kernel only runs on GPU device.");
auto *x = ctx.Input<Tensor>("X");
auto *max_map = ctx.Output<Tensor>("MaxMap");
auto *output = ctx.Output<Tensor>("Output");
auto *x_data = x->data<T>();
auto x_dims = x->dims();
int NC_num = x_dims[0] * x_dims[1];
int height = x_dims[2];
int width = x_dims[3];
int num = x->numel();
auto& dev_ctx = ctx.cuda_device_context();
int *max_map_data = max_map->mutable_data<int>(x_dims, dev_ctx.GetPlace());
T *output_data = output->mutable_data<T>(x_dims, dev_ctx.GetPlace());
auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int threads = kNumCUDAThreads;
int blocks = NumBlocks(num / height);
auto max_val_ptr = memory::Alloc(gpu_place, num / height * sizeof(T));
T* max_val_data = reinterpret_cast<T*>(max_val_ptr->ptr());
auto max_ind_ptr = memory::Alloc(gpu_place, num / height * sizeof(int));
int* max_ind_data = reinterpret_cast<int*>(max_ind_ptr->ptr());
GetMaxInfo<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), NC_num, height, width, 2, true, max_val_data, max_ind_data, max_map_data);
blocks = NumBlocks(num);
ScatterAddFw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), max_map_data, NC_num, height, width, 2, output_data);
}
};
template <typename T>
class TopPoolGradOpCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* max_map = ctx.Input<Tensor>("MaxMap");
auto* out_grad = ctx.Input<Tensor>(framework::GradVarName("Output"));
auto* in_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
auto x_dims = x->dims();
auto& dev_ctx = ctx.cuda_device_context();
T* in_grad_data = in_grad->mutable_data<T>(x_dims, dev_ctx.GetPlace());
auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
int threads = kNumCUDAThreads;
int NC_num = x_dims[0] * x_dims[1];
int height = x_dims[2];
int width = x_dims[3];
int grad_num = in_grad->numel();
int blocks = NumBlocks(grad_num);
FillConstant<T><<<blocks, threads, 0, dev_ctx.stream()>>>(in_grad_data, 0, grad_num);
ScatterAddBw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(out_grad->data<T>(), max_map->data<int>(), NC_num, height, width, 2, in_grad_data);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(top_pool,
ops::TopPoolOpCUDAKernel<float>,
ops::TopPoolOpCUDAKernel<double>);
REGISTER_OP_CUDA_KERNEL(top_pool_grad,
ops::TopPoolGradOpCUDAKernel<float>,
ops::TopPoolGradOpCUDAKernel<double>);
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/platform/cuda_primitives.h"
#include "paddle/fluid/memory/memory.h"
#include <vector>
namespace paddle {
namespace operators {
using framework::Tensor;
#define CUDA_1D_KERNEL_LOOP(i, n) \
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); \
i += blockDim.x * gridDim.x)
template <typename T>
__global__ void FillConstant(T* x, int num, int fill_num) {
CUDA_1D_KERNEL_LOOP(i, fill_num) {
x[i] = static_cast<T>(num);
}
}
template <typename T>
__global__ void SliceOnAxis(const T* x, const int NC_num, const int H, const int W,
const int axis, const int start, const int end,
T* output) {
int HW_num = H * W;
int length = axis == 2 ? W : H;
int sliced_len = end - start;
int cur_HW_num = length * sliced_len;
// slice input on H or W (axis is 2 or 3)
CUDA_1D_KERNEL_LOOP(i, NC_num * cur_HW_num) {
int NC_id = i / cur_HW_num;
int HW_id = i % cur_HW_num;
if (axis == 2){
output[i] = x[NC_id * HW_num + start * W + HW_id];
} else if (axis == 3) {
int col = HW_id % sliced_len;
int row = HW_id / sliced_len;
output[i] = x[NC_id * HW_num + row * W + start + col];
}
}
}
template <typename T>
__global__ void MaxOut(const T* input, const int next_ind, const int NC_num,
const int H, const int W, const int axis,
const int start, const int end, T* output) {
int HW_num = H * W;
int length = axis == 2 ? W : H;
T cur = static_cast<T>(0.);
T next = static_cast<T>(0.);
T max_v = static_cast<T>(0.);
int sliced_len = end - start;
int cur_HW_num = length * sliced_len;
// compare cur and next and assign max values to output
CUDA_1D_KERNEL_LOOP(i, NC_num * cur_HW_num) {
int NC_id = i / cur_HW_num;
int HW_id = i % cur_HW_num;
if (axis == 2){
cur = input[NC_id * HW_num + start * W + HW_id];
next = input[NC_id * HW_num + next_ind * W + HW_id];
max_v = cur > next ? cur : next;
output[NC_id * HW_num + start * W + HW_id] = max_v;
} else if (axis == 3) {
int col = HW_id % sliced_len;
int row = HW_id / sliced_len;
cur = input[NC_id * HW_num + row * W + start + col];
next = input[NC_id * HW_num + row * W + next_ind + col];
max_v = cur > next ? cur : next;
output[NC_id * HW_num + row * W + start + col] = max_v;
}
__syncthreads();
}
}
template <typename T>
__global__ void UpdateMaxInfo(const T* input, const int NC_num,
const int H, const int W, const int axis,
const int index, T* max_val, int* max_ind) {
int length = axis == 2 ? W : H;
int HW_num = H * W;
T val = static_cast<T>(0.);
CUDA_1D_KERNEL_LOOP(i, NC_num * length) {
int NC_id = i / length;
int length_id = i % length;
if (axis == 2) {
val = input[NC_id * HW_num + index * W + length_id];
} else if (axis == 3) {
val = input[NC_id * HW_num + length_id * W + index];
}
if (val > max_val[i]) {
max_val[i] = val;
max_ind[i] = index;
}
__syncthreads();
}
}
template <typename T>
__global__ void ScatterAddOnAxis(const T* input, const int start, const int* max_ind, const int NC_num, const int H, const int W, const int axis, T* output) {
int length = axis == 2 ? W : H;
int HW_num = H * W;
CUDA_1D_KERNEL_LOOP(i, NC_num * length) {
int NC_id = i / length;
int length_id = i % length;
int id_ = max_ind[i];
if (axis == 2) {
platform::CudaAtomicAdd(output + NC_id * HW_num + id_ * W + length_id, input[NC_id * HW_num + start * W + length_id]);
//output[NC_id * HW_num + id_ * W + length_id] += input[NC_id * HW_num + start * W + length_id];
} else if (axis == 3) {
platform::CudaAtomicAdd(output + NC_id * HW_num + length_id * W + id_, input[NC_id * HW_num + length_id * W + start]);
//output[NC_id * HW_num + length_id * W + id_] += input[NC_id * HW_num + length_id * W + start];
}
__syncthreads();
}
}
template <typename T>
__global__ void GetMaxInfo(const T* input, const int NC_num,
const int H, const int W, const int axis,
const bool reverse, T* max_val, int* max_ind,
int* max_map) {
int start = 0;
int end = axis == 2 ? H: W;
int s = reverse ? end-1 : start;
int e = reverse ? start-1 : end;
int step = reverse ? -1 : 1;
int len = axis == 2 ? W : H;
int loc = 0;
T val = static_cast<T>(0.);
for (int i = s; ; ) {
if (i == s) {
CUDA_1D_KERNEL_LOOP(j, NC_num * len) {
int NC_id = j / len;
int len_id = j % len;
if (axis == 2) {
loc = NC_id * H * W + i * W + len_id;
} else if (axis == 3){
loc = NC_id * H * W + len_id * W + i;
}
max_ind[j] = i;
max_map[loc] = max_ind[j];
max_val[j] = input[loc];
__syncthreads();
}
} else {
CUDA_1D_KERNEL_LOOP(j, NC_num * len) {
int NC_id = j / len;
int len_id = j % len;
if (axis == 2) {
loc = NC_id * H * W + i * W + len_id;
} else if (axis == 3){
loc = NC_id * H * W + len_id * W + i;
}
val = input[loc];
T max_v = max_val[j];
if (val > max_v) {
max_val[j] = val;
max_map[loc] = i;
max_ind[j] = i;
} else {
max_map[loc] = max_ind[j];
}
__syncthreads();
}
}
i += step;
if (s < e && i >= e) break;
if (s > e && i <= e) break;
}
}
template <typename T>
__global__ void ScatterAddFw(const T* input, const int* max_map, const int NC_num, const int H, const int W, const int axis, T* output){
CUDA_1D_KERNEL_LOOP(i, NC_num * H * W) {
int loc = max_map[i];
int NC_id = i / (H * W);
int len_id = 0;
if (axis == 2) {
len_id = i % W;
output[i] = input[NC_id * H * W + loc * W + len_id];
} else {
len_id = i % (H * W) / W;
output[i] = input[NC_id * H * W + len_id * W + loc];
}
}
}
template <typename T>
__global__ void ScatterAddBw(const T* input, const int* max_map, const int NC_num, const int H, const int W, const int axis, T* output){
CUDA_1D_KERNEL_LOOP(i, NC_num * H * W) {
int loc = max_map[i];
int NC_id = i / (H * W);
int len_id = 0;
int offset = 0;
if (axis == 2) {
len_id = i % W;
offset = NC_id * H * W + loc * W + len_id;
} else {
len_id = i % (H * W) / W;
offset = NC_id * H * W + len_id * W + loc;
}
platform::CudaAtomicAdd(output + offset, input[i]);
}
}
} // namespace operators
} // namespace paddle
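The kernels above record, for each position along the pooled axis, the index of the running maximum (`max_map`), gather the forward output from it, and scatter-add gradients back to it. Below is a minimal NumPy sketch of that logic for one direction only (NCHW input, `axis == 2`, not reversed). The function names `running_argmax`, `pool_forward` and `pool_backward` are hypothetical and not part of the op.

```python
import numpy as np


def running_argmax(x):
    """Index along H of the running maximum, scanning rows top to bottom."""
    n, c, h, w = x.shape
    max_val = x[:, :, 0, :].copy()
    max_ind = np.zeros((n, c, w), dtype=np.int64)
    max_map = np.zeros((n, c, h, w), dtype=np.int64)
    for i in range(1, h):
        row = x[:, :, i, :]
        better = row > max_val  # strict '>' keeps the earliest maximum on ties
        max_val = np.where(better, row, max_val)
        max_ind = np.where(better, i, max_ind)
        max_map[:, :, i, :] = max_ind
    return max_map


def pool_forward(x, max_map):
    # Gather: every output element copies the input at its recorded argmax row.
    return np.take_along_axis(x, max_map, axis=2)


def pool_backward(grad_out, max_map):
    # Scatter-add: np.add.at accumulates repeated targets, standing in for
    # the atomic add used on the GPU.
    grad_in = np.zeros_like(grad_out)
    nn, cc, _, ww = np.indices(grad_out.shape)
    np.add.at(grad_in, (nn, cc, max_map, ww), grad_out)
    return grad_in
```

Several output positions can share the same argmax row, so the backward pass has to accumulate rather than assign, which is why the CUDA code uses an atomic add.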
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import unittest
import numpy as np
import paddle.fluid as fluid
import os
import sys
# add python path of PaddleDetection to sys.path
parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 4)))
if parent_path not in sys.path:
sys.path.append(parent_path)
from ppdet.ext_op import cornerpool_lib
def bottom_pool_np(x):
height = x.shape[2]
output = x.copy()
for ind in range(height):
cur = output[:, :, ind:height, :]
next = output[:, :, :height - ind, :]
output[:, :, ind:height, :] = np.maximum(cur, next)
return output
def top_pool_np(x):
height = x.shape[2]
output = x.copy()
for ind in range(height):
cur = output[:, :, :height - ind, :]
next = output[:, :, ind:height, :]
output[:, :, :height - ind, :] = np.maximum(cur, next)
return output
def right_pool_np(x):
width = x.shape[3]
output = x.copy()
for ind in range(width):
cur = output[:, :, :, ind:width]
next = output[:, :, :, :width - ind]
output[:, :, :, ind:width] = np.maximum(cur, next)
return output
def left_pool_np(x):
width = x.shape[3]
output = x.copy()
for ind in range(width):
cur = output[:, :, :, :width - ind]
next = output[:, :, :, ind:width]
output[:, :, :, :width - ind] = np.maximum(cur, next)
return output
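The four NumPy reference functions above each compute a running maximum along one spatial axis. As a cross-check, the same results can be written with `np.maximum.accumulate`. This is a sketch for illustration; the `*_ref` names are hypothetical and not part of the test file.

```python
import numpy as np


def bottom_pool_ref(x):
    # Max over rows 0..i for every row i (scan top to bottom).
    return np.maximum.accumulate(x, axis=2)


def top_pool_ref(x):
    # Max over rows i..H-1 for every row i (scan bottom to top).
    return np.maximum.accumulate(x[:, :, ::-1, :], axis=2)[:, :, ::-1, :]


def right_pool_ref(x):
    # Max over columns 0..j for every column j (scan left to right).
    return np.maximum.accumulate(x, axis=3)


def left_pool_ref(x):
    # Max over columns j..W-1 for every column j (scan right to left).
    return np.maximum.accumulate(x[:, :, :, ::-1], axis=3)[:, :, :, ::-1]
```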
class TestRightPoolOp(unittest.TestCase):
def funcmap(self):
self.func_map = {
'bottom_x': [cornerpool_lib.bottom_pool, bottom_pool_np],
'top_x': [cornerpool_lib.top_pool, top_pool_np],
'right_x': [cornerpool_lib.right_pool, right_pool_np],
'left_x': [cornerpool_lib.left_pool, left_pool_np]
}
def setup(self):
self.name = 'right_x'
def test_check_output(self):
self.funcmap()
self.setup()
x_shape = (2, 10, 16, 16)
x_type = "float64"
sp = fluid.Program()
tp = fluid.Program()
place = fluid.CUDAPlace(0)
with fluid.program_guard(tp, sp):
x = fluid.data(name=self.name, shape=x_shape, dtype=x_type)
y = self.func_map[self.name][0](x)
np.random.seed(0)
x_np = np.random.uniform(-1000, 1000, x_shape).astype(x_type)
out_np = self.func_map[self.name][1](x_np)
exe = fluid.Executor(place)
outs = exe.run(tp, feed={self.name: x_np}, fetch_list=[y])
self.assertTrue(np.allclose(outs, out_np))
class TestTopPoolOp(TestRightPoolOp):
def setup(self):
self.name = 'top_x'
class TestBottomPoolOp(TestRightPoolOp):
def setup(self):
self.name = 'bottom_x'
class TestLeftPoolOp(TestRightPoolOp):
def setup(self):
self.name = 'left_x'
if __name__ == "__main__":
unittest.main()
import six
import os
import numpy as np
from numba import jit
from .bbox import nms
@jit
def box_decoder(deltas, boxes, weights, bbox_clip=4.13):
if boxes.shape[0] == 0:
return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
boxes = boxes.astype(deltas.dtype, copy=False)
widths = boxes[:, 2] - boxes[:, 0] + 1.0
heights = boxes[:, 3] - boxes[:, 1] + 1.0
ctr_x = boxes[:, 0] + 0.5 * widths
ctr_y = boxes[:, 1] + 0.5 * heights
wx, wy, ww, wh = weights
dx = deltas[:, 0::4] * wx
dy = deltas[:, 1::4] * wy
dw = deltas[:, 2::4] * ww
dh = deltas[:, 3::4] * wh
# Prevent sending too large values into np.exp()
dw = np.minimum(dw, bbox_clip)
dh = np.minimum(dh, bbox_clip)
pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
pred_w = np.exp(dw) * widths[:, np.newaxis]
pred_h = np.exp(dh) * heights[:, np.newaxis]
pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
# x1
pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
# y1
pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
# x2 (note: "- 1" is correct; don't be fooled by the asymmetry)
pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w - 1
# y2 (note: "- 1" is correct; don't be fooled by the asymmetry)
pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h - 1
return pred_boxes
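To make the arithmetic in `box_decoder` concrete, here is a single-box sketch of the same formulas, including the `+ 1` / `- 1` pixel convention. The helper name `decode_one` and its default weights are assumptions made for the example only.

```python
import numpy as np


def decode_one(box, delta, weights=(1.0, 1.0, 1.0, 1.0), bbox_clip=4.13):
    """Decode one (dx, dy, dw, dh) delta against one [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = box
    w = x2 - x1 + 1.0
    h = y2 - y1 + 1.0
    cx = x1 + 0.5 * w
    cy = y1 + 0.5 * h
    dx, dy, dw, dh = (d * k for d, k in zip(delta, weights))
    # Clamp the log-scale terms before exponentiating, as box_decoder does.
    dw = min(dw, bbox_clip)
    dh = min(dh, bbox_clip)
    pcx = dx * w + cx
    pcy = dy * h + cy
    pw = np.exp(dw) * w
    ph = np.exp(dh) * h
    # The "- 1" on x2/y2 mirrors the "+ 1" used for width and height.
    return np.array(
        [pcx - 0.5 * pw, pcy - 0.5 * ph, pcx + 0.5 * pw - 1, pcy + 0.5 * ph - 1])
```

With all-zero deltas the box comes back unchanged, which is a quick way to check that the `+ 1` and `- 1` terms cancel.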
@jit
def clip_tiled_boxes(boxes, im_shape):
"""Clip boxes to image boundaries. im_shape is [height, width] and boxes
has shape (N, 4 * num_tiled_boxes)."""
assert boxes.shape[1] % 4 == 0, \
'boxes.shape[1] is {:d}, but must be divisible by 4.'.format(
boxes.shape[1]
)
# x1 >= 0
boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
# y1 >= 0
boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
# x2 < im_shape[1]
boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
# y2 < im_shape[0]
boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
return boxes
#@jit
def get_nmsed_box(rpn_rois,
                  confs,
                  locs,
                  class_nums,
                  im_info,
                  bbox_reg_weights=[0.1, 0.1, 0.2, 0.2],
                  score_thresh=0.05,
                  nms_thresh=0.5,
                  detections_per_im=100):
    box_nums = [0, rpn_rois.shape[0]]
    variance_v = np.array(bbox_reg_weights)
    rpn_rois_v = np.array(rpn_rois)
    confs_v = np.array(confs)
    locs_v = np.array(locs)
    im_results = [[] for _ in range(len(box_nums) - 1)]
    new_box_nums = [0]
    for i in range(len(box_nums) - 1):
        start = box_nums[i]
        end = box_nums[i + 1]
        if start == end:
            continue
        locs_n = locs_v[start:end, :]  # box delta
        rois_n = rpn_rois_v[start:end, :]  # box
        rois_n = rois_n / im_info[i][2]  # scale
        rois_n = box_decoder(locs_n, rois_n, variance_v)
        rois_n = clip_tiled_boxes(rois_n, im_info[i][:2] / im_info[i][2])
        cls_boxes = [[] for _ in range(class_nums)]
        scores_n = confs_v[start:end, :]
        for j in range(1, class_nums):
            # use the score_thresh argument (TEST was never defined here)
            inds = np.where(scores_n[:, j] > score_thresh)[0]
            scores_j = scores_n[inds, j]
            rois_j = rois_n[inds, j * 4:(j + 1) * 4]
            dets_j = np.hstack((scores_j[:, np.newaxis], rois_j)).astype(
                np.float32, copy=False)
            # use the nms_thresh argument (TEST was never defined here)
            keep = nms(dets_j, nms_thresh)
            nms_dets = dets_j[keep, :]
            # add labels
            label = np.array([j for _ in range(len(keep))])
            nms_dets = np.hstack((label[:, np.newaxis], nms_dets)).astype(
                np.float32, copy=False)
            cls_boxes[j] = nms_dets
        # Limit to max_per_image detections **over all classes**
        image_scores = np.hstack(
            [cls_boxes[j][:, 1] for j in range(1, class_nums)])
        if len(image_scores) > detections_per_im:
            image_thresh = np.sort(image_scores)[-detections_per_im]
            for j in range(1, class_nums):
                keep = np.where(cls_boxes[j][:, 1] >= image_thresh)[0]
                cls_boxes[j] = cls_boxes[j][keep, :]
        im_results_n = np.vstack([cls_boxes[j] for j in range(1, class_nums)])
        im_results[i] = im_results_n
        new_box_nums.append(len(im_results_n) + new_box_nums[-1])
        labels = im_results_n[:, 0]
        scores = im_results_n[:, 1]
        boxes = im_results_n[:, 2:]
    im_results = np.vstack([im_results[k] for k in range(len(box_nums) - 1)])
    return new_box_nums, im_results
@jit
def get_dt_res(batch_size, box_nums, nmsed_out, data, num_id_to_cat_id_map):
dts_res = []
nmsed_out_v = np.array(nmsed_out)
if nmsed_out_v.shape == (
1,
1, ):
return dts_res
assert (len(box_nums) == batch_size + 1), \
"Error Tensor offset dimension. Box Nums({}) vs. batch_size({})"\
.format(len(box_nums), batch_size)
k = 0
for i in range(batch_size):
dt_num_this_img = box_nums[i + 1] - box_nums[i]
image_id = int(data[i][-1])
image_width = int(data[i][1][1])
image_height = int(data[i][1][2])
for j in range(dt_num_this_img):
dt = nmsed_out_v[k]
k = k + 1
num_id, score, xmin, ymin, xmax, ymax = dt.tolist()
category_id = num_id_to_cat_id_map[num_id]
w = xmax - xmin + 1
h = ymax - ymin + 1
bbox = [xmin, ymin, w, h]
dt_res = {
'image_id': image_id,
'category_id': category_id,
'bbox': bbox,
'score': score
}
dts_res.append(dt_res)
return dts_res
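`get_dt_res` turns each `[label, score, xmin, ymin, xmax, ymax]` row into a COCO-style record whose `bbox` is `[x, y, width, height]`, with the inclusive-pixel `+ 1` convention. A small sketch of that conversion for one row, where the helper name `to_coco_record` and the sample mapping are assumptions for illustration:

```python
def to_coco_record(image_id, row, num_id_to_cat_id_map):
    """Convert one [label, score, xmin, ymin, xmax, ymax] row to a COCO-style dict."""
    num_id, score, xmin, ymin, xmax, ymax = row
    return {
        'image_id': image_id,
        'category_id': num_id_to_cat_id_map[num_id],
        # Corner format -> [x, y, w, h]; "+ 1" treats both corners as inclusive.
        'bbox': [xmin, ymin, xmax - xmin + 1, ymax - ymin + 1],
        'score': score,
    }
```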
@jit
def get_segms_res(batch_size, box_nums, segms_out, data, num_id_to_cat_id_map):
segms_res = []
segms_out_v = np.array(segms_out)
k = 0
for i in range(batch_size):
dt_num_this_img = box_nums[i + 1] - box_nums[i]
image_id = int(data[i][-1])
for j in range(dt_num_this_img):
dt = segms_out_v[k]
k = k + 1
segm, num_id, score = dt.tolist()
cat_id = num_id_to_cat_id_map[num_id]
if six.PY3:
if 'counts' in segm:
segm['counts'] = segm['counts'].decode("utf8")
segm_res = {
'image_id': image_id,
'category_id': cat_id,
'segmentation': segm,
'score': score
}
segms_res.append(segm_res)
return segms_res
import numpy as np
class BufferDict(dict):
def __init__(self, **kwargs):
super(BufferDict, self).__init__(**kwargs)
def __getitem__(self, key):
if key in self.keys():
return super(BufferDict, self).__getitem__(key)
else:
raise Exception("The %s is not in global inputs dict" % key)
def __setitem__(self, key, value):
if key not in self.keys():
super(BufferDict, self).__setitem__(key, value)
else:
raise Exception("The %s is already in global inputs dict" % key)
def update(self, *args, **kwargs):
for k, v in dict(*args, **kwargs).items():
self[k] = v
def update_v(self, key, value):
if key in self.keys():
super(BufferDict, self).__setitem__(key, value)
else:
raise Exception("The %s is not in global inputs dict" % key)
def get(self, key):
return self.__getitem__(key)
def set(self, key, value):
return self.__setitem__(key, value)
def debug(self, dshape=True, dvalue=True, dtype=False):
if self['open_debug']:
if 'debug_names' not in self.keys():
ditems = self.keys()
else:
ditems = self['debug_names']
infos = {}
for k in ditems:
if type(k) is dict:
i_d = {}
for i, j in k.items():
if type(j) is list:
for jj in j:
i_d[jj] = self.get_debug_info(self[i][jj])
infos[i] = i_d
else:
infos[k] = self.get_debug_info(self[k])
print(infos)
def get_debug_info(self, v, dshape=True, dvalue=True, dtype=False):
info = []
if dshape == True and hasattr(v, 'shape'):
info.append(v.shape)
if dvalue == True and hasattr(v, 'numpy'):
info.append(np.mean(np.abs(v.numpy())))
if dtype == True:
info.append(type(v))
return info
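The core idea of `BufferDict` is that a key may be set only once through item assignment, while overwriting requires an explicit call (`update_v`). A reduced sketch of just that behaviour, using a hypothetical class name `WriteOnceDict` and `KeyError` instead of the generic `Exception` raised above:

```python
class WriteOnceDict(dict):
    """dict whose item assignment refuses to overwrite an existing key."""

    def __setitem__(self, key, value):
        if key in self:
            raise KeyError("%s is already set; use overwrite()" % key)
        super(WriteOnceDict, self).__setitem__(key, value)

    def overwrite(self, key, value):
        # Explicit replacement, only allowed for keys that already exist.
        if key not in self:
            raise KeyError("%s has not been set yet" % key)
        super(WriteOnceDict, self).__setitem__(key, value)
```

Making overwrites explicit is one way to catch two pipeline stages accidentally writing the same named buffer.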
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import sys
import numpy as np
from .coco_eval import bbox2out
import logging
logger = logging.getLogger(__name__)
__all__ = ['bbox2out', 'get_category_info']
def get_category_info(anno_file=None,
with_background=True,
use_default_label=False):
clsid2catid = {k: k for k in range(1, 501)}
catid2name = {
0: "background",
1: "Infant bed",
2: "Rose",
3: "Flag",
4: "Flashlight",
5: "Sea turtle",
6: "Camera",
7: "Animal",
8: "Glove",
9: "Crocodile",
10: "Cattle",
11: "House",
12: "Guacamole",
13: "Penguin",
14: "Vehicle registration plate",
15: "Bench",
16: "Ladybug",
17: "Human nose",
18: "Watermelon",
19: "Flute",
20: "Butterfly",
21: "Washing machine",
22: "Raccoon",
23: "Segway",
24: "Taco",
25: "Jellyfish",
26: "Cake",
27: "Pen",
28: "Cannon",
29: "Bread",
30: "Tree",
31: "Shellfish",
32: "Bed",
33: "Hamster",
34: "Hat",
35: "Toaster",
36: "Sombrero",
37: "Tiara",
38: "Bowl",
39: "Dragonfly",
40: "Moths and butterflies",
41: "Antelope",
42: "Vegetable",
43: "Torch",
44: "Building",
45: "Power plugs and sockets",
46: "Blender",
47: "Billiard table",
48: "Cutting board",
49: "Bronze sculpture",
50: "Turtle",
51: "Broccoli",
52: "Tiger",
53: "Mirror",
54: "Bear",
55: "Zucchini",
56: "Dress",
57: "Volleyball",
58: "Guitar",
59: "Reptile",
60: "Golf cart",
61: "Tart",
62: "Fedora",
63: "Carnivore",
64: "Car",
65: "Lighthouse",
66: "Coffeemaker",
67: "Food processor",
68: "Truck",
69: "Bookcase",
70: "Surfboard",
71: "Footwear",
72: "Bench",
73: "Necklace",
74: "Flower",
75: "Radish",
76: "Marine mammal",
77: "Frying pan",
78: "Tap",
79: "Peach",
80: "Knife",
81: "Handbag",
82: "Laptop",
83: "Tent",
84: "Ambulance",
85: "Christmas tree",
86: "Eagle",
87: "Limousine",
88: "Kitchen & dining room table",
89: "Polar bear",
90: "Tower",
91: "Football",
92: "Willow",
93: "Human head",
94: "Stop sign",
95: "Banana",
96: "Mixer",
97: "Binoculars",
98: "Dessert",
99: "Bee",
100: "Chair",
101: "Wood-burning stove",
102: "Flowerpot",
103: "Beaker",
104: "Oyster",
105: "Woodpecker",
106: "Harp",
107: "Bathtub",
108: "Wall clock",
109: "Sports uniform",
110: "Rhinoceros",
111: "Beehive",
112: "Cupboard",
113: "Chicken",
114: "Man",
115: "Blue jay",
116: "Cucumber",
117: "Balloon",
118: "Kite",
119: "Fireplace",
120: "Lantern",
121: "Missile",
122: "Book",
123: "Spoon",
124: "Grapefruit",
125: "Squirrel",
126: "Orange",
127: "Coat",
128: "Punching bag",
129: "Zebra",
130: "Billboard",
131: "Bicycle",
132: "Door handle",
133: "Mechanical fan",
134: "Ring binder",
135: "Table",
136: "Parrot",
137: "Sock",
138: "Vase",
139: "Weapon",
140: "Shotgun",
141: "Glasses",
142: "Seahorse",
143: "Belt",
144: "Watercraft",
145: "Window",
146: "Giraffe",
147: "Lion",
148: "Tire",
149: "Vehicle",
150: "Canoe",
151: "Tie",
152: "Shelf",
153: "Picture frame",
154: "Printer",
155: "Human leg",
156: "Boat",
157: "Slow cooker",
158: "Croissant",
159: "Candle",
160: "Pancake",
161: "Pillow",
162: "Coin",
163: "Stretcher",
164: "Sandal",
165: "Woman",
166: "Stairs",
167: "Harpsichord",
168: "Stool",
169: "Bus",
170: "Suitcase",
171: "Human mouth",
172: "Juice",
173: "Skull",
174: "Door",
175: "Violin",
176: "Chopsticks",
177: "Digital clock",
178: "Sunflower",
179: "Leopard",
180: "Bell pepper",
181: "Harbor seal",
182: "Snake",
183: "Sewing machine",
184: "Goose",
185: "Helicopter",
186: "Seat belt",
187: "Coffee cup",
188: "Microwave oven",
189: "Hot dog",
190: "Countertop",
191: "Serving tray",
192: "Dog bed",
193: "Beer",
194: "Sunglasses",
195: "Golf ball",
196: "Waffle",
197: "Palm tree",
198: "Trumpet",
199: "Ruler",
200: "Helmet",
201: "Ladder",
202: "Office building",
203: "Tablet computer",
204: "Toilet paper",
205: "Pomegranate",
206: "Skirt",
207: "Gas stove",
208: "Cookie",
209: "Cart",
210: "Raven",
211: "Egg",
212: "Burrito",
213: "Goat",
214: "Kitchen knife",
215: "Skateboard",
216: "Salt and pepper shakers",
217: "Lynx",
218: "Boot",
219: "Platter",
220: "Ski",
221: "Swimwear",
222: "Swimming pool",
223: "Drinking straw",
224: "Wrench",
225: "Drum",
226: "Ant",
227: "Human ear",
228: "Headphones",
229: "Fountain",
230: "Bird",
231: "Jeans",
232: "Television",
233: "Crab",
234: "Microphone",
235: "Home appliance",
236: "Snowplow",
237: "Beetle",
238: "Artichoke",
239: "Jet ski",
240: "Stationary bicycle",
241: "Human hair",
242: "Brown bear",
243: "Starfish",
244: "Fork",
245: "Lobster",
246: "Corded phone",
247: "Drink",
248: "Saucer",
249: "Carrot",
250: "Insect",
251: "Clock",
252: "Castle",
253: "Tennis racket",
254: "Ceiling fan",
255: "Asparagus",
256: "Jaguar",
257: "Musical instrument",
258: "Train",
259: "Cat",
260: "Rifle",
261: "Dumbbell",
262: "Mobile phone",
263: "Taxi",
264: "Shower",
265: "Pitcher",
266: "Lemon",
267: "Invertebrate",
268: "Turkey",
269: "High heels",
270: "Bust",
271: "Elephant",
272: "Scarf",
273: "Barrel",
274: "Trombone",
275: "Pumpkin",
276: "Box",
277: "Tomato",
278: "Frog",
279: "Bidet",
280: "Human face",
281: "Houseplant",
282: "Van",
283: "Shark",
284: "Ice cream",
285: "Swim cap",
286: "Falcon",
287: "Ostrich",
288: "Handgun",
289: "Whiteboard",
290: "Lizard",
291: "Pasta",
292: "Snowmobile",
293: "Light bulb",
294: "Window blind",
295: "Muffin",
296: "Pretzel",
297: "Computer monitor",
298: "Horn",
299: "Furniture",
300: "Sandwich",
301: "Fox",
302: "Convenience store",
303: "Fish",
304: "Fruit",
305: "Earrings",
306: "Curtain",
307: "Grape",
308: "Sofa bed",
309: "Horse",
310: "Luggage and bags",
311: "Desk",
312: "Crutch",
313: "Bicycle helmet",
314: "Tick",
315: "Airplane",
316: "Canary",
317: "Spatula",
318: "Watch",
319: "Lily",
320: "Kitchen appliance",
321: "Filing cabinet",
322: "Aircraft",
323: "Cake stand",
324: "Candy",
325: "Sink",
326: "Mouse",
327: "Wine",
328: "Wheelchair",
329: "Goldfish",
330: "Refrigerator",
331: "French fries",
332: "Drawer",
333: "Treadmill",
334: "Picnic basket",
335: "Dice",
336: "Cabbage",
337: "Football helmet",
338: "Pig",
339: "Person",
340: "Shorts",
341: "Gondola",
342: "Honeycomb",
343: "Doughnut",
344: "Chest of drawers",
345: "Land vehicle",
346: "Bat",
347: "Monkey",
348: "Dagger",
349: "Tableware",
350: "Human foot",
351: "Mug",
352: "Alarm clock",
353: "Pressure cooker",
354: "Human hand",
355: "Tortoise",
356: "Baseball glove",
357: "Sword",
358: "Pear",
359: "Miniskirt",
360: "Traffic sign",
361: "Girl",
362: "Roller skates",
363: "Dinosaur",
364: "Porch",
365: "Human beard",
366: "Submarine sandwich",
367: "Screwdriver",
368: "Strawberry",
369: "Wine glass",
370: "Seafood",
371: "Racket",
372: "Wheel",
373: "Sea lion",
374: "Toy",
375: "Tea",
376: "Tennis ball",
377: "Waste container",
378: "Mule",
379: "Cricket ball",
380: "Pineapple",
381: "Coconut",
382: "Doll",
383: "Coffee table",
384: "Snowman",
385: "Lavender",
386: "Shrimp",
387: "Maple",
388: "Cowboy hat",
389: "Goggles",
390: "Rugby ball",
391: "Caterpillar",
392: "Poster",
393: "Rocket",
394: "Organ",
395: "Saxophone",
396: "Traffic light",
397: "Cocktail",
398: "Plastic bag",
399: "Squash",
400: "Mushroom",
401: "Hamburger",
402: "Light switch",
403: "Parachute",
404: "Teddy bear",
405: "Winter melon",
406: "Deer",
407: "Musical keyboard",
408: "Plumbing fixture",
409: "Scoreboard",
410: "Baseball bat",
411: "Envelope",
412: "Adhesive tape",
413: "Briefcase",
414: "Paddle",
415: "Bow and arrow",
416: "Telephone",
417: "Sheep",
418: "Jacket",
419: "Boy",
420: "Pizza",
421: "Otter",
422: "Office supplies",
423: "Couch",
424: "Cello",
425: "Bull",
426: "Camel",
427: "Ball",
428: "Duck",
429: "Whale",
430: "Shirt",
431: "Tank",
432: "Motorcycle",
433: "Accordion",
434: "Owl",
435: "Porcupine",
436: "Sun hat",
437: "Nail",
438: "Scissors",
439: "Swan",
440: "Lamp",
441: "Crown",
442: "Piano",
443: "Sculpture",
444: "Cheetah",
445: "Oboe",
446: "Tin can",
447: "Mango",
448: "Tripod",
449: "Oven",
450: "Mouse",
451: "Barge",
452: "Coffee",
453: "Snowboard",
454: "Common fig",
455: "Salad",
456: "Marine invertebrates",
457: "Umbrella",
458: "Kangaroo",
459: "Human arm",
460: "Measuring cup",
461: "Snail",
462: "Loveseat",
463: "Suit",
464: "Teapot",
465: "Bottle",
466: "Alpaca",
467: "Kettle",
468: "Trousers",
469: "Popcorn",
470: "Centipede",
471: "Spider",
472: "Sparrow",
473: "Plate",
474: "Bagel",
475: "Personal care",
476: "Apple",
477: "Brassiere",
478: "Bathroom cabinet",
479: "studio couch",
480: "Computer keyboard",
481: "Table tennis racket",
482: "Sushi",
483: "Cabinetry",
484: "Street light",
485: "Towel",
486: "Nightstand",
487: "Rabbit",
488: "Dolphin",
489: "Dog",
490: "Jug",
491: "Wok",
492: "Fire hydrant",
493: "Human eye",
494: "Skyscraper",
495: "Backpack",
496: "Potato",
497: "Paper towel",
498: "Lifejacket",
499: "Bicycle wheel",
500: "Toilet",
}
if not with_background:
clsid2catid = {k - 1: v for k, v in clsid2catid.items()}
return clsid2catid, catid2name
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import numpy as np
from ppdet.data.source.widerface import widerface_label
from ppdet.utils.coco_eval import bbox2out
import logging
logger = logging.getLogger(__name__)
__all__ = [
'get_shrink', 'bbox_vote', 'save_widerface_bboxes', 'save_fddb_bboxes',
'to_chw_bgr', 'bbox2out', 'get_category_info'
]
def to_chw_bgr(image):
    """
    Transpose image from HWC to CHW and from RGB to BGR.
    Args:
        image (np.array): an image with HWC and RGB layout.
    """
    # HWC to CHW
    if len(image.shape) == 3:
        image = np.swapaxes(image, 1, 2)
        image = np.swapaxes(image, 1, 0)
    # RGB to BGR
    image = image[[2, 1, 0], :, :]
    return image
def bbox_vote(det):
order = det[:, 4].ravel().argsort()[::-1]
det = det[order, :]
if det.shape[0] == 0:
dets = np.array([[10, 10, 20, 20, 0.002]])
det = np.empty(shape=[0, 5])
while det.shape[0] > 0:
# IOU
area = (det[:, 2] - det[:, 0] + 1) * (det[:, 3] - det[:, 1] + 1)
xx1 = np.maximum(det[0, 0], det[:, 0])
yy1 = np.maximum(det[0, 1], det[:, 1])
xx2 = np.minimum(det[0, 2], det[:, 2])
yy2 = np.minimum(det[0, 3], det[:, 3])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
o = inter / (area[0] + area[:] - inter)
# nms
merge_index = np.where(o >= 0.3)[0]
det_accu = det[merge_index, :]
det = np.delete(det, merge_index, 0)
if merge_index.shape[0] <= 1:
if det.shape[0] == 0:
try:
dets = np.row_stack((dets, det_accu))
except:
dets = det_accu
continue
det_accu[:, 0:4] = det_accu[:, 0:4] * np.tile(det_accu[:, -1:], (1, 4))
max_score = np.max(det_accu[:, 4])
det_accu_sum = np.zeros((1, 5))
det_accu_sum[:, 0:4] = np.sum(det_accu[:, 0:4],
axis=0) / np.sum(det_accu[:, -1:])
det_accu_sum[:, 4] = max_score
try:
dets = np.row_stack((dets, det_accu_sum))
except:
dets = det_accu_sum
dets = dets[0:750, :]
# Only keep detections with score >= 0.01
keep_index = np.where(dets[:, 4] >= 0.01)[0]
dets = dets[keep_index, :]
return dets
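The merging step inside `bbox_vote` replaces a cluster of overlapping boxes with their score-weighted average and keeps the highest score. A sketch of that step in isolation, assuming the cluster has already been selected by IoU; the function name `merge_cluster` is hypothetical:

```python
import numpy as np


def merge_cluster(cluster):
    """cluster: (K, 5) array of [x1, y1, x2, y2, score] rows that overlap."""
    scores = cluster[:, 4:5]
    # Score-weighted mean of the coordinates.
    coords = (cluster[:, :4] * scores).sum(axis=0) / scores.sum()
    # The merged box inherits the best score in the cluster.
    return np.append(coords, cluster[:, 4].max())
```

For example, merging `[0, 0, 10, 10, 0.9]` with `[2, 2, 12, 12, 0.1]` pulls the result only slightly toward the low-score box.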
def get_shrink(height, width):
"""
Args:
height (int): image height.
width (int): image width.
"""
# avoid out of memory
max_shrink_v1 = (0x7fffffff / 577.0 / (height * width))**0.5
max_shrink_v2 = ((678 * 1024 * 2.0 * 2.0) / (height * width))**0.5
def get_round(x, loc):
str_x = str(x)
if '.' in str_x:
str_before, str_after = str_x.split('.')
len_after = len(str_after)
if len_after >= 3:
str_final = str_before + '.' + str_after[0:loc]
return float(str_final)
else:
return x
max_shrink = get_round(min(max_shrink_v1, max_shrink_v2), 2) - 0.3
if max_shrink >= 1.5 and max_shrink < 2:
max_shrink = max_shrink - 0.1
elif max_shrink >= 2 and max_shrink < 3:
max_shrink = max_shrink - 0.2
elif max_shrink >= 3 and max_shrink < 4:
max_shrink = max_shrink - 0.3
elif max_shrink >= 4 and max_shrink < 5:
max_shrink = max_shrink - 0.4
elif max_shrink >= 5:
max_shrink = max_shrink - 0.5
elif max_shrink <= 0.1:
max_shrink = 0.1
shrink = max_shrink if max_shrink < 1 else 1
return shrink, max_shrink
def save_widerface_bboxes(image_path, bboxes_scores, output_dir):
image_name = image_path.split('/')[-1]
image_class = image_path.split('/')[-2]
odir = os.path.join(output_dir, image_class)
if not os.path.exists(odir):
os.makedirs(odir)
ofname = os.path.join(odir, '%s.txt' % (image_name[:-4]))
f = open(ofname, 'w')
f.write('{:s}\n'.format(image_class + '/' + image_name))
f.write('{:d}\n'.format(bboxes_scores.shape[0]))
for box_score in bboxes_scores:
xmin, ymin, xmax, ymax, score = box_score
f.write('{:.1f} {:.1f} {:.1f} {:.1f} {:.3f}\n'.format(xmin, ymin, (
xmax - xmin + 1), (ymax - ymin + 1), score))
f.close()
logger.info("The predicted result is saved as {}".format(ofname))
def save_fddb_bboxes(bboxes_scores,
                     output_dir,
                     output_fname='pred_fddb_res.txt'):
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    predict_file = os.path.join(output_dir, output_fname)
    # 'with' closes the file even if a write fails
    with open(predict_file, 'w') as f:
        # dict.items() works on both Python 2 and 3; iteritems() is Python 2 only
        for image_path, dets in bboxes_scores.items():
            f.write('{:s}\n'.format(image_path))
            f.write('{:d}\n'.format(dets.shape[0]))
            for box_score in dets:
                xmin, ymin, xmax, ymax, score = box_score
                width, height = xmax - xmin, ymax - ymin
                f.write('{:.1f} {:.1f} {:.1f} {:.1f} {:.3f}\n'
                        .format(xmin, ymin, width, height, score))
    logger.info("The predicted result is saved as {}".format(predict_file))
    return predict_file
def get_category_info(anno_file=None,
with_background=True,
use_default_label=False):
if use_default_label or anno_file is None \
or not os.path.exists(anno_file):
logger.info("Not found annotation file {}, load "
"wider-face categories.".format(anno_file))
return widerfaceall_category_info(with_background)
else:
logger.info("Load categories from {}".format(anno_file))
return get_category_info_from_anno(anno_file, with_background)
def get_category_info_from_anno(anno_file, with_background=True):
"""
Get class id to category id map and category id
to category name map from annotation file.
Args:
anno_file (str): annotation file path
with_background (bool, default True):
whether load background as class 0.
"""
cats = []
with open(anno_file) as f:
for line in f.readlines():
cats.append(line.strip())
if cats[0] != 'background' and with_background:
cats.insert(0, 'background')
if cats[0] == 'background' and not with_background:
cats = cats[1:]
clsid2catid = {i: i for i in range(len(cats))}
catid2name = {i: name for i, name in enumerate(cats)}
return clsid2catid, catid2name
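The two maps returned by `get_category_info_from_anno` are easiest to see on a tiny label list. The sketch below repeats the same background handling on an in-memory list instead of a file; the function name `build_category_maps` is an assumption for the example.

```python
def build_category_maps(names, with_background=True):
    """Return (clsid2catid, catid2name) for a list of category names."""
    cats = list(names)
    has_bg = bool(cats) and cats[0] == 'background'
    if with_background and not has_bg:
        cats.insert(0, 'background')
    elif not with_background and has_bg:
        cats = cats[1:]
    clsid2catid = {i: i for i in range(len(cats))}
    catid2name = {i: name for i, name in enumerate(cats)}
    return clsid2catid, catid2name
```

With `with_background=True`, a single-class list such as `['face']` yields class 0 for background and class 1 for the face category.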
def widerfaceall_category_info(with_background=True):
"""
Get class id to category id map and category id
to category name map of mixup wider_face dataset
Args:
with_background (bool, default True):
whether load background as class 0.
"""
label_map = widerface_label(with_background)
label_map = sorted(label_map.items(), key=lambda x: x[1])
cats = [l[0] for l in label_map]
if with_background:
cats.insert(0, 'background')
clsid2catid = {i: i for i in range(len(cats))}
catid2name = {i: name for i, name in enumerate(cats)}
return clsid2catid, catid2name
......@@ -29,6 +29,7 @@ import random
import datetime
import numpy as np
from collections import deque
import paddle
from ppdet.core.workspace import load_config, merge_config, create
from ppdet.utils.stats import TrainingStats
......@@ -37,6 +38,7 @@ from ppdet.utils.cli import ArgsParser
from ppdet.utils.checkpoint import load_weight, load_pretrain_weight, save_model
from export_model import dygraph_to_static
from paddle.distributed import ParallelEnv
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
......@@ -71,16 +73,6 @@ def parse_args():
default=None,
type=str,
help="Evaluation directory, default is current directory.")
parser.add_argument(
"--use_tb",
type=bool,
default=False,
help="whether to record the data to Tensorboard.")
parser.add_argument(
'--tb_log_dir',
type=str,
default="tb_log_dir/scalar",
help='Tensorboard logging directory for scalar.')
parser.add_argument(
"--enable_ce",
type=bool,
......@@ -89,13 +81,6 @@ def parse_args():
"This flag is only used for internal test.")
parser.add_argument(
"--use_gpu", action='store_true', default=False, help="data parallel")
parser.add_argument(
'--is_profiler',
type=int,
default=0,
help='The switch of profiler tools. (used for benchmark)')
args = parser.parse_args()
return args
......