Clean code (#1895)

* Clean code
* rm py_op/post_processing.py and utils/data_structure.py

eb8b4899 · qingqing01 · GitHub · ad353419 · eb8b4899 · ad353419
28 changed files
--- a/README.md
+++ b/README.md
-简体中文 | [English](README_en.md)
-
-文档：[https://paddledetection.readthedocs.io](https://paddledetection.readthedocs.io)
-
 # PaddleDetection

-飞桨推出的PaddleDetection是端到端目标检测开发套件，旨在帮助开发者更快更好地完成检测模型的训练、精度速度优化到部署全流程。PaddleDetection以模块化的设计实现了多种主流目标检测算法，并且提供了丰富的数据增强、网络组件、损失函数等模块，集成了模型压缩和跨平台高性能部署能力。目前基于PaddleDetection已经完成落地的项目涉及工业质检、遥感图像检测、无人巡检等多个领域。
-
-**目前检测库下模型均要求使用PaddlePaddle 1.7及以上版本或适当的develop版本。**
-
-<div align="center">
-  <img src="docs/images/000000570688.jpg" />
-</div>
-
-
-## 简介
-
-特性：
-
- 模型丰富：
-
-  PaddleDetection提供了丰富的模型，包含目标检测、实例分割、人脸检测等100+个预训练模型，涵盖多种数据集竞赛冠军方案、适合云端/边缘端设备部署的检测方案。
-
- 易部署:
-
-  PaddleDetection的模型中使用的核心算子均通过C++或CUDA实现，同时基于PaddlePaddle的高性能推理引擎可以方便地部署在多种硬件平台上。
-
- 高灵活度：
-
-  PaddleDetection通过模块化设计来解耦各个组件，基于配置文件可以轻松地搭建各种检测模型。
-
- 高性能：
-
-  基于PaddlePaddle框架的高性能内核，在模型训练速度、显存占用上有一定的优势。例如，YOLOv3的训练速度快于其他框架，在Tesla V100 16GB环境下，Mask-RCNN(ResNet50)可以单卡Batch Size可以达到4 (甚至到5)。
-
+动态图版本的PaddleDetection, 支持的模型:

-支持的模型结构：
-
-|                    | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt-vd | SENet | MobileNet |  HRNet | Res2Net |
-|--------------------|:------:|------------------------------:|:----------:|:-----:|:---------:|:------:| :--:    |
-| Faster R-CNN       | ✓      |                             ✓ | x          | ✓     | ✗         |  ✗     |  ✗      |
-| Faster R-CNN + FPN | ✓      |                             ✓ | ✓          | ✓     | ✗         |  ✓     |  ✓      |
-| Mask R-CNN         | ✓      |                             ✓ | x          | ✓     | ✗         |  ✗     |  ✗      |
-| Mask R-CNN + FPN   | ✓      |                             ✓ | ✓          | ✓     | ✗         |  ✗     |  ✓      |
-| Cascade Faster-RCNN | ✓     |                             ✓ | ✓          | ✗     | ✗         |  ✗     |  ✗      |
-| Cascade Mask-RCNN  | ✓      |                             ✗ | ✗          | ✓     | ✗         |  ✗     |  ✗      |
-| Libra R-CNN        | ✗      |                             ✓ | ✗          | ✗     | ✗         |  ✗     |  ✗      |
-| RetinaNet          | ✓      |                             ✗ | ✓          | ✗     | ✗         |  ✗     |  ✗      |
-| YOLOv3             | ✓      |                             ✗ | ✗          | ✗     | ✓         |  ✗     |  ✗      |
-| SSD                | ✗      |                             ✗ | ✗          | ✗     | ✓         |  ✗     |  ✗      |
-| BlazeFace          | ✗      |                             ✗ | ✗          | ✗     | ✗         |  ✗     |  ✗      |
-| Faceboxes          | ✗      |                             ✗ | ✗          | ✗     | ✗         |  ✗     |  ✗      |
-
-<a name="vd">[1]</a> [ResNet-vd](https://arxiv.org/pdf/1812.01187) 模型预测速度基本不变的情况下提高了精度。
-
-更多的模型:
-
- EfficientDet
- FCOS
- CornerNet-Squeeze
- YOLOv4
-
-更多的Backone：
-
- DarkNet
- VGG
- GCNet
- CBNet
- Hourglass
+- Faster-RCNN (FPN)
+- Mask-RCNN (FPN)
+- Cascade RCNN
+- YOLOv3

 扩展特性：

@@ -74,45 +13,15 @@
 - [x] **Group Norm**
 - [x] **Modulated Deformable Convolution**
 - [x] **Deformable PSRoI Pooling**
- [x] **Non-local和GCNet**
-
-**注意:** Synchronized batch normalization 只能在多GPU环境下使用，不能在CPU环境或者单GPU环境下使用。
-
-以下为选取各模型结构和骨干网络的代表模型COCO数据集精度mAP和单卡Tesla V100上预测速度(FPS)关系图。
-
-<div align="center">
-  <img src="docs/images/map_fps.png" />
-</div>
-
-**说明：**
- `CBResNet`为`Cascade-Faster-RCNN-CBResNet200vd-FPN`模型，COCO数据集mAP高达53.3%
- `Cascade-Faster-RCNN`为`Cascade-Faster-RCNN-ResNet50vd-DCN`，PaddleDetection将其优化到COCO数据mAP为47.8%时推理速度为20FPS
- PaddleDetection增强版`YOLOv3-ResNet50vd-DCN`在COCO数据集mAP高于原作10.6个绝对百分点，推理速度为61.3FPS，快于原作约70%
- 图中模型均可在[模型库](#模型库)中获取

 ## 文档教程

-### 入门教程
+### 教程

 - [安装说明](docs/tutorials/INSTALL_cn.md)
 - [快速开始](docs/tutorials/QUICK_STARTED_cn.md)
 - [训练/评估/预测流程](docs/tutorials/GETTING_STARTED_cn.md)
 - [常见问题汇总](docs/FAQ.md)
-
-### 进阶教程
- [数据预处理及自定义数据集](docs/advanced_tutorials/READER.md)
- [搭建模型步骤](docs/advanced_tutorials/MODEL_TECHNICAL.md)
- [模型参数配置](docs/advanced_tutorials/config_doc):
-  - [配置模块设计和介绍](docs/advanced_tutorials/config_doc/CONFIG_cn.md)
-  - [RCNN模型参数说明](docs/advanced_tutorials/config_doc/RCNN_PARAMS_DOC.md)
- [迁移学习教程](docs/advanced_tutorials/TRANSFER_LEARNING_cn.md)
- [IPython Notebook demo](demo/mask_rcnn_demo.ipynb)
- [模型压缩](slim)
-    - [压缩benchmark](slim)
-    - [量化](slim/quantization)
-    - [剪枝](slim/prune)
-    - [蒸馏](slim/distillation)
-    - [神经网络搜索](slim/nas)
 - [推理部署](deploy)
    - [模型导出教程](docs/advanced_tutorials/deploy/EXPORT_MODEL.md)
    - [Python端推理部署](deploy/python)
@@ -120,25 +29,7 @@
    - [推理Benchmark](docs/advanced_tutorials/deploy/BENCHMARK_INFER_cn.md)

 ## 模型库
-
 - [模型库](docs/MODEL_ZOO_cn.md)
- [移动端模型](configs/mobile/README.md)
- [Anchor free模型](configs/anchor_free/README.md)
- [人脸检测模型](docs/featured_model/FACE_DETECTION.md)
- [YOLOv3增强模型](docs/featured_model/YOLOv3_ENHANCEMENT.md): COCO mAP高达43.6%，原论文精度为33.0%
- [行人检测预训练模型](docs/featured_model/CONTRIB_cn.md)
- [车辆检测预训练模型](docs/featured_model/CONTRIB_cn.md)
- [Objects365 2019 Challenge夺冠模型](docs/featured_model/champion_model/CACascadeRCNN.md)
- [Open Images 2019-Object Detction比赛最佳单模型](docs/featured_model/champion_model/OIDV5_BASELINE_MODEL.md)
- [服务器端实用目标检测模型](configs/rcnn_enhance/README.md): V100上速度20FPS时，COCO mAP高达47.8%。
-

 ## 许可证书
 本项目的发布受[Apache 2.0 license](LICENSE)许可认证。
-
-## 版本更新
-v0.3.0版本已经在`05/2020`发布，增加Anchor-free、EfficientDet和YOLOv4等多个模型，推出移动端、服务器端实用高效多个模型，例如移动端将YOLOv3-MobileNetv3加速3.5倍，服务器端优化两阶段模型，速度和精度具备较高性价比。重构预测部署功能，提升易用性，修复已知诸多bug等，详细内容请参考[版本更新文档](docs/CHANGELOG.md)。
-
-## 如何贡献代码
-
-我们非常欢迎你可以为PaddleDetection提供代码，也十分感谢你的反馈。
--- a/dataset/fddb/download.sh
+++ b/dataset/fddb/download.sh
-# All rights `PaddleDetection` reserved
-# References:
-#   @TechReport{fddbTech,
-#      author = {Vidit Jain and Erik Learned-Miller},
-#      title =  {FDDB: A Benchmark for Face Detection in Unconstrained Settings},
-#      institution =  {University of Massachusetts, Amherst},
-#      year = {2010},
-#      number = {UM-CS-2010-009}
-#   }
-
-DIR="$( cd "$(dirname "$0")" ; pwd -P )"
-cd "$DIR"
-
-# Download the data.
-echo "Downloading..."
-# external link to the Faces in the Wild data set and annotations file
-wget http://tamaraberg.com/faceDataset/originalPics.tar.gz
-wget http://vis-www.cs.umass.edu/fddb/FDDB-folds.tgz
-wget http://vis-www.cs.umass.edu/fddb/evaluation.tgz
-
-# Extract the data.
-echo "Extracting..."
-tar -zxf originalPics.tar.gz
-tar -zxf FDDB-folds.tgz
-tar -zxf evaluation.tgz
-
-# Generate full image path list and groundtruth in FDDB-folds:
-cd FDDB-folds
-cat `ls|grep -v"ellipse"` > filePath.txt && cat *ellipse* > fddb_annotFile.txt
-cd ..
-echo "-------------   All done!   --------------"
--- a/dataset/fruit/download_fruit.py
+++ b/dataset/fruit/download_fruit.py
-# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import sys
-import os.path as osp
-import logging
-# add python path of PadleDetection to sys.path
-parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3)))
-if parent_path not in sys.path:
-    sys.path.append(parent_path)
-
-from ppdet.utils.download import download_dataset
-
-logging.basicConfig(level=logging.INFO)
-
-download_path = osp.split(osp.realpath(sys.argv[0]))[0]
-download_dataset(download_path, 'fruit')
--- a/dataset/fruit/label_list.txt
+++ b/dataset/fruit/label_list.txt
-apple
-banana
-orange
--- a/dataset/wider_face/download.sh
+++ b/dataset/wider_face/download.sh
-# All rights `PaddleDetection` reserved
-# References:
-#   @inproceedings{yang2016wider,
-#   Author = {Yang, Shuo and Luo, Ping and Loy, Chen Change and Tang, Xiaoou},
-#   Booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
-#   Title = {WIDER FACE: A Face Detection Benchmark},
-#   Year = {2016}}
-
-DIR="$( cd "$(dirname "$0")" ; pwd -P )"
-cd "$DIR"
-
-# Download the data.
-echo "Downloading..."
-wget https://dataset.bj.bcebos.com/wider_face/WIDER_train.zip
-wget https://dataset.bj.bcebos.com/wider_face/WIDER_val.zip
-wget https://dataset.bj.bcebos.com/wider_face/wider_face_split.zip
-# Extract the data.
-echo "Extracting..."
-unzip WIDER_train.zip
-unzip WIDER_val.zip
-unzip wider_face_split.zip
--- a/ppdet/data/tests/test.yml
+++ b/ppdet/data/tests/test.yml
-TrainReader:
-  inputs_def:
-    fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_mask']
-  dataset:
-    !COCODataSet
-    image_dir: val2017
-    anno_path: annotations/instances_val2017.json
-    dataset_dir: dataset/coco
-    sample_num: 10
-  sample_transforms:
-  - !DecodeImage
-    to_rgb: true
-    with_mixup: false
-  - !RandomFlipImage
-    is_mask_flip: true
-    is_normalized: false
-    prob: 0.5
-  - !NormalizeImage
-    is_channel_first: false
-    is_scale: true
-    mean: [0.485,0.456,0.406]
-    std: [0.229, 0.224,0.225]
-  - !ResizeImage
-    interp: 1
-    max_size: 1333
-    target_size: 800
-    use_cv2: true
-  - !Permute
-    channel_first: true
-    to_bgr: false
-  batch_transforms:
-  - !PadBatch
-    pad_to_stride: 32
-    use_padded_im_info: false
-  batch_size: 1
-  shuffle: true
-  worker_num: 2
-  drop_last: false
-  use_process: false
-
-EvalReader:
-  inputs_def:
-    fields: ['image', 'im_info', 'im_id']
-  dataset:
-    !COCODataSet
-    image_dir: val2017
-    anno_path: annotations/instances_val2017.json
-    dataset_dir: dataset/coco
-    sample_num: 10
-  sample_transforms:
-  - !DecodeImage
-    to_rgb: true
-    with_mixup: false
-  - !NormalizeImage
-    is_channel_first: false
-    is_scale: true
-    mean: [0.485,0.456,0.406]
-    std: [0.229, 0.224,0.225]
-  - !ResizeImage
-    interp: 1
-    max_size: 1333
-    target_size: 800
-    use_cv2: true
-  - !Permute
-    channel_first: true
-    to_bgr: false
-  batch_transforms:
-  - !PadBatch
-    pad_to_stride: 32
-    use_padded_im_info: true
-  batch_size: 1
-  shuffle: false
-  drop_last: false
--- a/ppdet/data/tests/test_dataset.py
+++ b/ppdet/data/tests/test_dataset.py
-#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-import os
-import time
-import unittest
-import sys
-import logging
-import random
-import copy
-# add python path of PadleDetection to sys.path
-parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 4)))
-if parent_path not in sys.path:
-    sys.path.append(parent_path)
-
-from ppdet.data.parallel_map import ParallelMap
-
-
-class MemorySource(object):
-    """ memory data source for testing
-    """
-
-    def __init__(self, samples):
-        self._epoch = -1
-
-        self._pos = -1
-        self._drained = False
-        self._samples = samples
-
-    def __iter__(self):
-        return self
-
-    def __next__(self):
-        return self.next()
-
-    def next(self):
-        if self._epoch < 0:
-            self.reset()
-
-        if self._pos >= self.size():
-            self._drained = True
-            raise StopIteration("no more data in " + str(self))
-        else:
-            sample = copy.deepcopy(self._samples[self._pos])
-            self._pos += 1
-            return sample
-
-    def reset(self):
-        if self._epoch < 0:
-            self._epoch = 0
-        else:
-            self._epoch += 1
-
-        self._pos = 0
-        self._drained = False
-        random.shuffle(self._samples)
-
-    def size(self):
-        return len(self._samples)
-
-    def drained(self):
-        assert self._epoch >= 0, "the first epoch has not started yet"
-        return self._pos >= self.size()
-
-    def epoch_id(self):
-        return self._epoch
-
-
-class TestDataset(unittest.TestCase):
-    """Test cases for ppdet.data.dataset
-    """
-
-    @classmethod
-    def setUpClass(cls):
-        """ setup
-        """
-        pass
-
-    @classmethod
-    def tearDownClass(cls):
-        """ tearDownClass """
-        pass
-
-    def test_next(self):
-        """ test next
-        """
-        samples = list(range(10))
-        mem_sc = MemorySource(samples)
-
-        for i, d in enumerate(mem_sc):
-            self.assertTrue(d in samples)
-
-    def test_transform_with_abnormal_worker(self):
-        """ test dataset transform with abnormally exit process
-        """
-        samples = list(range(20))
-        mem_sc = MemorySource(samples)
-
-        def _worker(sample):
-            if sample == 3:
-                sys.exit(1)
-
-            return 2 * sample
-
-        test_worker = ParallelMap(
-            mem_sc, _worker, worker_num=2, use_process=True, memsize='2M')
-
-        ct = 0
-        for i, d in enumerate(test_worker):
-            ct += 1
-            self.assertTrue(d / 2 in samples)
-
-        self.assertEqual(len(samples) - 1, ct)
-
-    def test_transform_with_delay_worker(self):
-        """ test dataset transform with delayed process
-        """
-        samples = list(range(20))
-        mem_sc = MemorySource(samples)
-
-        def _worker(sample):
-            if sample == 3:
-                time.sleep(30)
-
-            return 2 * sample
-
-        test_worker = ParallelMap(
-            mem_sc, _worker, worker_num=2, use_process=True, memsize='2M')
-
-        ct = 0
-        for i, d in enumerate(test_worker):
-            ct += 1
-            self.assertTrue(d / 2 in samples)
-
-        self.assertEqual(len(samples), ct)
-
-
-if __name__ == '__main__':
-    logging.basicConfig()
-    unittest.main()
--- a/ppdet/data/tests/test_loader.py
+++ b/ppdet/data/tests/test_loader.py
-#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import unittest
-import os
-import sys
-# add python path of PadleDetection to sys.path
-parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 4)))
-if parent_path not in sys.path:
-    sys.path.append(parent_path)
-
-from ppdet.data.source.coco import COCODataSet
-from ppdet.data.reader import Reader
-from ppdet.utils.download import get_path
-from ppdet.utils.download import DATASET_HOME
-
-from ppdet.data.transform.operators import DecodeImage, ResizeImage, Permute
-from ppdet.data.transform.batch_operators import PadBatch
-
-COCO_VAL_URL = 'http://images.cocodataset.org/zips/val2017.zip'
-COCO_VAL_MD5SUM = '442b8da7639aecaf257c1dceb8ba8c80'
-COCO_ANNO_URL = 'http://images.cocodataset.org/annotations/annotations_trainval2017.zip'
-COCO_ANNO_MD5SUM = 'f4bbac642086de4f52a3fdda2de5fa2c'
-
-
-class TestReader(unittest.TestCase):
-    @classmethod
-    def setUpClass(cls):
-        """ setup
-        """
-        root_path = os.path.join(DATASET_HOME, 'coco')
-        _, _ = get_path(COCO_VAL_URL, root_path, COCO_VAL_MD5SUM)
-        _, _ = get_path(COCO_ANNO_URL, root_path, COCO_ANNO_MD5SUM)
-        cls.anno_path = 'annotations/instances_val2017.json'
-        cls.image_dir = 'val2017'
-        cls.root_path = root_path
-
-    @classmethod
-    def tearDownClass(cls):
-        """ tearDownClass """
-        pass
-
-    def test_loader(self):
-        coco_loader = COCODataSet(
-            dataset_dir=self.root_path,
-            image_dir=self.image_dir,
-            anno_path=self.anno_path,
-            sample_num=10)
-        sample_trans = [
-            DecodeImage(to_rgb=True), ResizeImage(
-                target_size=800, max_size=1333, interp=1), Permute(to_bgr=False)
-        ]
-        batch_trans = [PadBatch(pad_to_stride=32, use_padded_im_info=True), ]
-
-        inputs_def = {
-            'fields': [
-                'image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd',
-                'gt_mask'
-            ],
-        }
-        data_loader = Reader(
-            coco_loader,
-            sample_transforms=sample_trans,
-            batch_transforms=batch_trans,
-            batch_size=2,
-            shuffle=True,
-            drop_empty=True,
-            inputs_def=inputs_def)()
-        for i in range(2):
-            for samples in data_loader:
-                for sample in samples:
-                    im_shape = sample[0].shape
-                    self.assertEqual(im_shape[0], 3)
-                    self.assertEqual(im_shape[1] % 32, 0)
-                    self.assertEqual(im_shape[2] % 32, 0)
-
-                    im_info_shape = sample[1].shape
-                    self.assertEqual(im_info_shape[-1], 3)
-
-                    im_id_shape = sample[2].shape
-                    self.assertEqual(im_id_shape[-1], 1)
-
-                    gt_bbox_shape = sample[3].shape
-                    self.assertEqual(gt_bbox_shape[-1], 4)
-
-                    gt_class_shape = sample[4].shape
-                    self.assertEqual(gt_class_shape[-1], 1)
-                    self.assertEqual(gt_class_shape[0], gt_bbox_shape[0])
-
-                    is_crowd_shape = sample[5].shape
-                    self.assertEqual(is_crowd_shape[-1], 1)
-                    self.assertEqual(is_crowd_shape[0], gt_bbox_shape[0])
-
-                    mask = sample[6]
-                    self.assertEqual(len(mask), gt_bbox_shape[0])
-                    self.assertEqual(mask[0][0].shape[-1], 2)
-            data_loader.reset()
-
-    def test_loader_multi_threads(self):
-        coco_loader = COCODataSet(
-            dataset_dir=self.root_path,
-            image_dir=self.image_dir,
-            anno_path=self.anno_path,
-            sample_num=10)
-        sample_trans = [
-            DecodeImage(to_rgb=True), ResizeImage(
-                target_size=800, max_size=1333, interp=1), Permute(to_bgr=False)
-        ]
-        batch_trans = [PadBatch(pad_to_stride=32, use_padded_im_info=True), ]
-
-        inputs_def = {
-            'fields': [
-                'image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd',
-                'gt_mask'
-            ],
-        }
-        data_loader = Reader(
-            coco_loader,
-            sample_transforms=sample_trans,
-            batch_transforms=batch_trans,
-            batch_size=2,
-            shuffle=True,
-            drop_empty=True,
-            worker_num=2,
-            use_process=False,
-            bufsize=8,
-            inputs_def=inputs_def)()
-        for i in range(2):
-            for samples in data_loader:
-                for sample in samples:
-                    im_shape = sample[0].shape
-                    self.assertEqual(im_shape[0], 3)
-                    self.assertEqual(im_shape[1] % 32, 0)
-                    self.assertEqual(im_shape[2] % 32, 0)
-
-                    im_info_shape = sample[1].shape
-                    self.assertEqual(im_info_shape[-1], 3)
-
-                    im_id_shape = sample[2].shape
-                    self.assertEqual(im_id_shape[-1], 1)
-
-                    gt_bbox_shape = sample[3].shape
-                    self.assertEqual(gt_bbox_shape[-1], 4)
-
-                    gt_class_shape = sample[4].shape
-                    self.assertEqual(gt_class_shape[-1], 1)
-                    self.assertEqual(gt_class_shape[0], gt_bbox_shape[0])
-
-                    is_crowd_shape = sample[5].shape
-                    self.assertEqual(is_crowd_shape[-1], 1)
-                    self.assertEqual(is_crowd_shape[0], gt_bbox_shape[0])
-
-                    mask = sample[6]
-                    self.assertEqual(len(mask), gt_bbox_shape[0])
-                    self.assertEqual(mask[0][0].shape[-1], 2)
-            data_loader.reset()
-
-
-if __name__ == '__main__':
-    unittest.main()
--- a/ppdet/data/tests/test_loader_yaml.py
+++ b/ppdet/data/tests/test_loader_yaml.py
-#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import unittest
-import os
-import yaml
-import logging
-import sys
-# add python path of PadleDetection to sys.path
-parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 4)))
-if parent_path not in sys.path:
-    sys.path.append(parent_path)
-
-from ppdet.utils.download import get_path
-from ppdet.utils.download import DATASET_HOME
-from ppdet.core.workspace import load_config, merge_config
-
-from ppdet.data.reader import create_reader
-
-COCO_VAL_URL = 'http://images.cocodataset.org/zips/val2017.zip'
-COCO_VAL_MD5SUM = '442b8da7639aecaf257c1dceb8ba8c80'
-COCO_ANNO_URL = 'http://images.cocodataset.org/annotations/annotations_trainval2017.zip'
-COCO_ANNO_MD5SUM = 'f4bbac642086de4f52a3fdda2de5fa2c'
-
-FORMAT = '[%(asctime)s-%(filename)s-%(levelname)s:%(message)s]'
-logging.basicConfig(level=logging.INFO, format=FORMAT)
-logger = logging.getLogger(__name__)
-
-
-class TestReaderYAML(unittest.TestCase):
-    @classmethod
-    def setUpClass(cls):
-        """ setup
-        """
-        root_path = os.path.join(DATASET_HOME, 'coco')
-        _, _ = get_path(COCO_VAL_URL, root_path, COCO_VAL_MD5SUM)
-        _, _ = get_path(COCO_ANNO_URL, root_path, COCO_ANNO_MD5SUM)
-        cls.anno_path = 'annotations/instances_val2017.json'
-        cls.image_dir = 'val2017'
-        cls.root_path = root_path
-
-    @classmethod
-    def tearDownClass(cls):
-        """ tearDownClass """
-        pass
-
-    def test_loader_yaml(self):
-        cfg_file = 'ppdet/data/tests/test.yml'
-        cfg = load_config(cfg_file)
-        data_cfg = '[!COCODataSet {{image_dir: {0}, dataset_dir: {1}, ' \
-            'anno_path: {2}, sample_num: 10}}]'.format(
-                self.image_dir, self.root_path, self.anno_path)
-        dataset_ins = yaml.load(data_cfg, Loader=yaml.Loader)
-        update_train_cfg = {'TrainReader': {'dataset': dataset_ins[0]}}
-        update_test_cfg = {'EvalReader': {'dataset': dataset_ins[0]}}
-        merge_config(update_train_cfg)
-        merge_config(update_test_cfg)
-
-        reader = create_reader(cfg['TrainReader'], 10)()
-        for samples in reader:
-            for sample in samples:
-                im_shape = sample[0].shape
-                self.assertEqual(im_shape[0], 3)
-                self.assertEqual(im_shape[1] % 32, 0)
-                self.assertEqual(im_shape[2] % 32, 0)
-
-                im_info_shape = sample[1].shape
-                self.assertEqual(im_info_shape[-1], 3)
-
-                im_id_shape = sample[2].shape
-                self.assertEqual(im_id_shape[-1], 1)
-
-                gt_bbox_shape = sample[3].shape
-                self.assertEqual(gt_bbox_shape[-1], 4)
-
-                gt_class_shape = sample[4].shape
-                self.assertEqual(gt_class_shape[-1], 1)
-                self.assertEqual(gt_class_shape[0], gt_bbox_shape[0])
-
-                is_crowd_shape = sample[5].shape
-                self.assertEqual(is_crowd_shape[-1], 1)
-                self.assertEqual(is_crowd_shape[0], gt_bbox_shape[0])
-
-                mask = sample[6]
-                self.assertEqual(len(mask), gt_bbox_shape[0])
-                self.assertEqual(mask[0][0].shape[-1], 2)
-
-        reader = create_reader(cfg['EvalReader'], 10)()
-        for samples in reader:
-            for sample in samples:
-                im_shape = sample[0].shape
-                self.assertEqual(im_shape[0], 3)
-                self.assertEqual(im_shape[1] % 32, 0)
-                self.assertEqual(im_shape[2] % 32, 0)
-
-                im_info_shape = sample[1].shape
-                self.assertEqual(im_info_shape[-1], 3)
-
-                im_id_shape = sample[2].shape
-                self.assertEqual(im_id_shape[-1], 1)
-
-
-if __name__ == '__main__':
-    unittest.main()
--- a/ppdet/ext_op/README.md
+++ b/ppdet/ext_op/README.md
-# 自定义OP的编译过程
-
-**注意：** 编译自定义OP使用的gcc版本须与Paddle编译使用gcc版本一致，Paddle develop每日版本目前采用**gcc 4.8.2**版本编译，若使用每日版本，请使用**gcc 4.8.2**版本编译自定义OP，否则可能出现兼容性问题。
-
-## 代码结构
-
-  - src: 扩展OP C++/CUDA 源码
-  - cornerpool_lib.py: Python API封装
-  - tests: 各OP单测程序
-
-
-## 编译自定义OP
-
-自定义op需要将实现的C++、CUDA代码编译成动态库，```src/mask.sh```中通过g++/nvcc编译，当然您也可以写Makefile或者CMake。
-
-编译需要include PaddlePaddle的相关头文件，链接PaddlePaddle的lib库。 头文件和lib库可通过下面命令获取到:
-
-```
-# python
->>> import paddle
->>> print(paddle.sysconfig.get_include())
-/paddle/pyenv/local/lib/python2.7/site-packages/paddle/include
->>> print(paddle.sysconfig.get_lib())
-/paddle/pyenv/local/lib/python2.7/site-packages/paddle/libs
-```
-
-我们提供动态库编译脚本如下：
-
-```
-cd src
-sh make.sh
-```
-
-最终编译会产出`cornerpool_lib.so`
-
-**说明：** 若使用源码编译安装PaddlePaddle的方式，编译过程中`cmake`未设置`WITH_MKLDNN`的方式，
-编译自定义OP时会报错找不到`mkldnn.h`等文件，可在`make.sh`中删除编译命令中的`-DPADDLE_WITH_MKLDNN`选项。
-
-
-## 设置环境变量
-
-需要将Paddle的核心库设置到`LD_LIBRARY_PATH`里, 先运行下面程序获取路径:
-
-```
-import paddle
-print(paddle.sysconfig.get_lib())
-```
-
-可通过如下方式添加动态库路径:
-
-```
-export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`python -c 'import paddle; print(paddle.sysconfig.get_lib())'`
-```
-
-
-
-## 执行单测
-
-执行下列单测，确保自定义算子可在网络中正确使用：
-
-```
-# 回到 ext_op 目录，运行单测
-cd ..
-python test/test_corner_pool.py
-```
-
-单测运行成功会输出提示信息，如下所示：
-
-```
-.
----------------------------------------------------------------------
-Ran 4 test in 2.858s
-
-OK
-```
-
-更多关于如何在框架外部自定义 C++ OP，可阅读[官网说明文档](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_usage/index_cn.html)
--- a/ppdet/ext_op/__init__.py
+++ b/ppdet/ext_op/__init__.py
-#  Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
-#Licensed under the Apache License, Version 2.0 (the "License");
-#you may not use this file except in compliance with the License.
-#You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-#Unless required by applicable law or agreed to in writing, software
-#distributed under the License is distributed on an "AS IS" BASIS,
-#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-#See the License for the specific language governing permissions and
-#limitations under the License.
-
-from . import cornerpool_lib
-from .cornerpool_lib import *
-
-__all__ = cornerpool_lib.__all__
--- a/ppdet/ext_op/cornerpool_lib.py
+++ b/ppdet/ext_op/cornerpool_lib.py
-import os
-import paddle.fluid as fluid
-
-use_cpp = False
-
-file_dir = os.path.dirname(os.path.abspath(__file__))
-try:
-    fluid.load_op_library(os.path.join(file_dir, 'src/cornerpool_lib.so'))
-    use_cpp = True
-except:
-    print(
-        'Warning: cornerpool_lib.so not found, use python version instead which may drop the inference speed. Compile in ppdet/ext_op at first if you need cpp version.'
-    )
-
-from paddle.fluid.layer_helper import LayerHelper
-
-__all__ = [
-    'bottom_pool',
-    'top_pool',
-    'right_pool',
-    'left_pool',
-]
-
-
-def cornerpool_op(layer_type, input, name):
-    helper = LayerHelper(layer_type, input=input, name=name)
-    dtype = helper.input_dtype()
-    output = helper.create_variable_for_type_inference(dtype)
-    max_map = helper.create_variable_for_type_inference(dtype)
-    helper.append_op(
-        type=layer_type,
-        inputs={"X": input},
-        outputs={"Output": output,
-                 "MaxMap": max_map})
-    return output
-
-
-def bottom_pool(input, is_test=False, name=None):
-    """
-    This layer calculates the bottom pooling output based on the input.
-    Scan the input from top to bottm for the vertical max-pooling.
-    The output has the same shape with input.
-    Args:
-        input(Variable): This input is a Tensor with shape [N, C, H, W].
-            The data type is float32 or float64.
-    Returns:
-        Variable(Tensor): The output of bottom_pool, with shape [N, C, H, W].
-        The data type is float32 or float64.
-    Examples:
-        ..code-block:: python
-            import paddle.fluid as fluid
-            import cornerpool_lib
-            input = fluid.data(
-                name='input', shape=[2, 64, 10, 10], dtype='float32')
-            output = corner_pool.bottom_pool(input)
-    """
-    if is_test:
-        if use_cpp:
-            output = cornerpool_op("bottom_pool", input, name)
-            return output
-
-        def cond(i, output):
-            return i < H
-
-        def body(i, output):
-            cur = fluid.layers.slice(output, [2], [i], [H])
-            next = fluid.layers.slice(output, [2], [0], [H - i])
-            max_v = fluid.layers.elementwise_max(cur, next)
-            orig = fluid.layers.slice(output, [2], [0], [i])
-            output = fluid.layers.concat([orig, max_v], axis=2)
-            i = i * 2
-            return [i, output]
-
-        H = fluid.layers.shape(input)[2]
-        i = fluid.layers.fill_constant(shape=[1], dtype='int32', value=1)
-        output = input
-        output = fluid.layers.while_loop(cond, body, [i, output])
-        return output[-1]
-
-    H = input.shape[2]
-    i = 1
-    output = input
-    while i < H:
-        cur = output[:, :, i:, :]
-        next = output[:, :, :H - i, :]
-        max_v = fluid.layers.elementwise_max(cur, next)
-        output = fluid.layers.concat([output[:, :, :i, :], max_v], axis=2)
-        i *= 2
-
-    return output
-
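Note on the removed `bottom_pool` above: its non-test branch appears to compute a running maximum along the height axis by comparing the tensor with a copy of itself shifted by 1, 2, 4, ... rows, so it needs about log2(H) steps rather than H. The following is a minimal sketch of that doubling idea on a single column, using plain Python lists instead of Paddle tensors. The names `bottom_pool_column` and `bottom_pool_naive` are hypothetical and are not part of the deleted file.

```python
from itertools import accumulate


def bottom_pool_column(column):
    """Running maximum from top to bottom, built with doubling shifts.

    Mirrors the slicing pattern of the removed loop: at each step the
    rows from `step` onward are compared with the rows `step` above them.
    """
    height = len(column)
    output = list(column)
    step = 1
    while step < height:
        # cur = output[step:], shifted = output[:height - step]
        shifted_max = [
            max(cur, shifted)
            for cur, shifted in zip(output[step:], output[:height - step])
        ]
        # The first `step` rows are already final and are kept unchanged.
        output = output[:step] + shifted_max
        step *= 2
    return output


def bottom_pool_naive(column):
    """Reference version: a plain prefix maximum, one row at a time."""
    return list(accumulate(column, max))
```

After the step with shift `s`, each row holds the maximum over a window of up to `2 * s` rows ending at that row, so once `step` reaches the height every row holds the maximum of all rows at or above it. The removed `top_pool` uses the same pattern with the slices mirrored.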
-
-def top_pool(input, is_test=False, name=None):
-    """
-    This layer calculates the top pooling output based on the input.
-    Scan the input from bottom to top for the vertical max-pooling.
-    The output has the same shape with input.
-    Args:
-        input(Variable): This input is a Tensor with shape [N, C, H, W].
-            The data type is float32 or float64.
-    Returns:
-        Variable(Tensor): The output of top_pool, with shape [N, C, H, W].
-        The data type is float32 or float64.
-    Examples:
-        ..code-block:: python
-            import paddle.fluid as fluid
-            import cornerpool_lib
-            input = fluid.data(
-                name='input', shape=[2, 64, 10, 10], dtype='float32')
-            output = corner_pool.top_pool(input)
-    """
-    if is_test:
-        if use_cpp:
-            output = cornerpool_op("top_pool", input, name)
-            return output
-
-        def cond(i, output):
-            return i < H
-
-        def body(i, output):
-            cur = fluid.layers.slice(output, [2], [0], [H - i])
-            next = fluid.layers.slice(output, [2], [i], [H])
-            max_v = fluid.layers.elementwise_max(cur, next)
-            orig = fluid.layers.slice(output, [2], [H - i], [H])
-            output = fluid.layers.concat([max_v, orig], axis=2)
-            i = i * 2
-            return [i, output]
-
-        H = fluid.layers.shape(input)[2]
-        i = fluid.layers.fill_constant(shape=[1], dtype='int32', value=1)
-        output = input
-        output = fluid.layers.while_loop(cond, body, [i, output])
-        return output[-1]
-
-    H = input.shape[2]
-    i = 1
-    output = input
-    while i < H:
-        cur = output[:, :, :H - i, :]
-        next = output[:, :, i:, :]
-        max_v = fluid.layers.elementwise_max(cur, next)
-        output = fluid.layers.concat([max_v, output[:, :, H - i:, :]], axis=2)
-        i *= 2
-
-    return output
-
-
-def right_pool(input, is_test=False, name=None):
-    """
-    This layer calculates the right pooling output based on the input.
-    Scan the input from left to right for the horizontal max-pooling.
-    The output has the same shape as the input.
-    Args:
-        input(Variable): This input is a Tensor with shape [N, C, H, W].
-            The data type is float32 or float64.
-    Returns:
-        Variable(Tensor): The output of right_pool, with shape [N, C, H, W].
-        The data type is float32 or float64.
-    Examples:
-        .. code-block:: python
-
-            import paddle.fluid as fluid
-            import cornerpool_lib
-            input = fluid.data(
-                name='input', shape=[2, 64, 10, 10], dtype='float32')
-            output = cornerpool_lib.right_pool(input)
-    """
-    if is_test:
-        if use_cpp:
-            output = cornerpool_op("right_pool", input, name)
-            return output
-
-        def cond(i, output):
-            return i < W
-
-        def body(i, output):
-            cur = fluid.layers.slice(output, [3], [i], [W])
-            next = fluid.layers.slice(output, [3], [0], [W - i])
-            max_v = fluid.layers.elementwise_max(cur, next)
-            orig = fluid.layers.slice(output, [3], [0], [i])
-            output = fluid.layers.concat([orig, max_v], axis=-1)
-            i = i * 2
-            return [i, output]
-
-        W = fluid.layers.shape(input)[3]
-        i = fluid.layers.fill_constant(shape=[1], dtype='int32', value=1)
-        output = input
-        output = fluid.layers.while_loop(cond, body, [i, output])
-        return output[-1]
-
-    W = input.shape[3]
-    i = 1
-    output = input
-    while i < W:
-        cur = output[:, :, :, i:]
-        next = output[:, :, :, :W - i]
-        max_v = fluid.layers.elementwise_max(cur, next)
-        output = fluid.layers.concat([output[:, :, :, :i], max_v], axis=-1)
-        i *= 2
-
-    return output
-
-
-def left_pool(input, is_test=False, name=None):
-    """
-    This layer calculates the left pooling output based on the input.
-    Scan the input from right to left for the horizontal max-pooling.
-    The output has the same shape as the input.
-    Args:
-        input(Variable): This input is a Tensor with shape [N, C, H, W].
-            The data type is float32 or float64.
-    Returns:
-        Variable(Tensor): The output of left_pool, with shape [N, C, H, W].
-        The data type is float32 or float64.
-    Examples:
-        .. code-block:: python
-
-            import paddle.fluid as fluid
-            import cornerpool_lib
-            input = fluid.data(
-                name='input', shape=[2, 64, 10, 10], dtype='float32')
-            output = cornerpool_lib.left_pool(input)
-    """
-    if is_test:
-        if use_cpp:
-            output = cornerpool_op("left_pool", input, name)
-            return output
-
-        def cond(i, output):
-            return i < W
-
-        def body(i, output):
-            cur = fluid.layers.slice(output, [3], [0], [W - i])
-            next = fluid.layers.slice(output, [3], [i], [W])
-            max_v = fluid.layers.elementwise_max(cur, next)
-            orig = fluid.layers.slice(output, [3], [W - i], [W])
-            output = fluid.layers.concat([max_v, orig], axis=-1)
-            i = i * 2
-            return [i, output]
-
-        W = fluid.layers.shape(input)[3]
-        i = fluid.layers.fill_constant(shape=[1], dtype='int32', value=1)
-        output = input
-        output = fluid.layers.while_loop(cond, body, [i, output])
-        return output[-1]
-
-    W = input.shape[3]
-    i = 1
-    output = input
-    while i < W:
-        cur = output[:, :, :, :W - i]
-        next = output[:, :, :, i:]
-        max_v = fluid.layers.elementwise_max(cur, next)
-        output = fluid.layers.concat([max_v, output[:, :, :, W - i:]], axis=-1)
-        i *= 2
-
-    return output
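The non-test branch of the removed pooling functions computes a running maximum by doubling the shift `i` on each pass, so a column of length H needs about log2(H) passes rather than H. A minimal pure-Python sketch of that idea on a single column follows; the names `bottom_pool_1d` and `top_pool_1d` are hypothetical and are not part of the removed module.

```python
def bottom_pool_1d(column):
    """Running max scanning top to bottom, using the step-doubling trick."""
    h = len(column)
    out = list(column)
    i = 1
    while i < h:
        # Combine each element with the one i positions above it.
        shifted = [max(cur, prev) for cur, prev in zip(out[i:], out[:h - i])]
        # The first i elements already hold their final running max.
        out = out[:i] + shifted
        i *= 2
    return out


def top_pool_1d(column):
    """Running max scanning bottom to top: the mirrored case."""
    return bottom_pool_1d(column[::-1])[::-1]


column = [3, 1, 4, 1, 5, 9, 2, 6]
print(bottom_pool_1d(column))  # [3, 3, 4, 4, 5, 9, 9, 9]
print(top_pool_1d(column))     # [9, 9, 9, 9, 9, 9, 6, 6]
```

After the pass with shift `i`, each position holds the maximum over a window of `2 * i` elements ending at that position, which is why the loop can stop once `i >= h`.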
--- a/ppdet/ext_op/src/bottom_pool_op.cc
+++ b/ppdet/ext_op/src/bottom_pool_op.cc
-/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License. */
-
-#include "paddle/fluid/framework/op_registry.h"
-namespace paddle {
-namespace operators {
-
-using Tensor = framework::Tensor;
-
-class BottomPoolOp : public framework::OperatorWithKernel {
-public:
-  using framework::OperatorWithKernel::OperatorWithKernel;
-
-  void InferShape(framework::InferShapeContext* ctx) const override {
-    PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
-    ctx->ShareDim("X", /*->*/ "MaxMap");
-    ctx->ShareDim("X", /*->*/ "Output");
-  }
-
-protected:
-  framework::OpKernelType GetExpectedKernelType(
-      const framework::ExecutionContext& ctx) const override {
-    return framework::OpKernelType(ctx.Input<Tensor>("X")->type(),
-                                   ctx.GetPlace());
-  }
-};
-
-class BottomPoolOpMaker : public framework::OpProtoAndCheckerMaker {
-public:
-  void Make() override {
-    AddInput("X",
-             "Input with shape (batch, C, H, W)");
-    AddOutput("MaxMap", "Max map with index of maximum value of input");
-    AddOutput("Output", "output with same shape as input(X)");
-    AddComment(
-        R"Doc(
-This operation calculates the bottom pooling output based on the input.
-Scan the input from top to bottom for the vertical max-pooling.
-The output has the same shape as the input.
-        )Doc");
-  }
-};
-
-class BottomPoolOpGrad : public framework::OperatorWithKernel {
-public:
-  using framework::OperatorWithKernel::OperatorWithKernel;
-
-protected:
-  void InferShape(framework::InferShapeContext* ctx) const override {
-    PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
-    PADDLE_ENFORCE(ctx->HasInput("MaxMap"), "Input(MaxMap) should not be null");
-    PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Output")),
-                   "Input(Output@GRAD) should not be null");
-    auto out_grad_name = framework::GradVarName("Output");
-    ctx->ShareDim(out_grad_name, framework::GradVarName("X"));
-  }
-
-  framework::OpKernelType GetExpectedKernelType(
-      const framework::ExecutionContext& ctx) const override {
-    return framework::OpKernelType(
-        ctx.Input<Tensor>(framework::GradVarName("Output"))->type(),
-        ctx.GetPlace());
-  }
-};
-
-template <typename T>
-class BottomPoolGradDescMaker : public framework::SingleGradOpMaker<T> {
-public:
-  using framework::SingleGradOpMaker<T>::SingleGradOpMaker;
-
-protected:
-  void Apply(GradOpPtr<T> op) const override {
-    op->SetType("bottom_pool_grad");
-    op->SetInput("X", this->Input("X"));
-    op->SetInput(framework::GradVarName("Output"), this->OutputGrad("Output"));
-    op->SetInput("MaxMap", this->Output("MaxMap"));
-    op->SetOutput(framework::GradVarName("X"), this->InputGrad("X"));
-    op->SetAttrMap(this->Attrs());
-  }
-};
-
-}  // namespace operators
-}  // namespace paddle
-
-namespace ops = paddle::operators;
-REGISTER_OPERATOR(bottom_pool,
-                  ops::BottomPoolOp,
-                  ops::BottomPoolOpMaker,
-                  ops::BottomPoolGradDescMaker<paddle::framework::OpDesc>,
-                  ops::BottomPoolGradDescMaker<paddle::imperative::OpBase>);
-REGISTER_OPERATOR(bottom_pool_grad, ops::BottomPoolOpGrad);
--- a/ppdet/ext_op/src/bottom_pool_op.cu
+++ b/ppdet/ext_op/src/bottom_pool_op.cu
-/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License. */
-
-#include "paddle/fluid/framework/op_registry.h"
-#include "paddle/fluid/platform/cuda_primitives.h"
-#include "paddle/fluid/memory/memory.h"
-#include <vector>
-#include "util.cu.h"
-
-namespace paddle {
-namespace operators {
-
-using Tensor = framework::Tensor;
-
-static constexpr int kNumCUDAThreads = 512;
-static constexpr int kNumMaximumNumBlocks = 4096;
-
-static inline int NumBlocks(const int N) {
-  return std::min((N + kNumCUDAThreads - 1) / kNumCUDAThreads,
-                  kNumMaximumNumBlocks);
-}
-
-template <typename T>
-class BottomPoolOpCUDAKernel : public framework::OpKernel<T> {
-public:
-  void Compute(const framework::ExecutionContext &ctx) const override {
-    PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
-                   "This kernel only runs on GPU device.");
-    auto *x = ctx.Input<Tensor>("X");
-    auto *max_map = ctx.Output<Tensor>("MaxMap");
-    auto *output = ctx.Output<Tensor>("Output");
-    auto *x_data = x->data<T>();
-    auto x_dims = x->dims();
-    int NC_num = x_dims[0] * x_dims[1];
-    int height = x_dims[2];
-    int width = x_dims[3];
-    int num = x->numel();
-    auto& dev_ctx = ctx.cuda_device_context();
-
-    int *max_map_data = max_map->mutable_data<int>(x_dims, dev_ctx.GetPlace());
-    T *output_data = output->mutable_data<T>(x_dims, dev_ctx.GetPlace());
-    auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
-
-    int threads = kNumCUDAThreads;
-    int blocks = NumBlocks(num / height);
-  
-    auto max_val_ptr = memory::Alloc(gpu_place, num / height * sizeof(T));
-    T* max_val_data = reinterpret_cast<T*>(max_val_ptr->ptr());
-    auto max_ind_ptr = memory::Alloc(gpu_place, num / height * sizeof(int));
-    int* max_ind_data = reinterpret_cast<int*>(max_ind_ptr->ptr());
-
-    GetMaxInfo<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), NC_num, height, width, 2, false, max_val_data, max_ind_data, max_map_data);
-
-    blocks = NumBlocks(num);
-    ScatterAddFw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), max_map_data, NC_num, height, width, 2, output_data);
-  }
-};
-
-template <typename T>
-class BottomPoolGradOpCUDAKernel : public framework::OpKernel<T> {
- public:
-  void Compute(const framework::ExecutionContext& ctx) const override {
-    auto* x = ctx.Input<Tensor>("X");
-    auto* max_map = ctx.Input<Tensor>("MaxMap");
-    auto* out_grad = ctx.Input<Tensor>(framework::GradVarName("Output"));
-    auto* in_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
-    auto x_dims = x->dims();
-    
-    auto& dev_ctx = ctx.cuda_device_context();
-    T* in_grad_data = in_grad->mutable_data<T>(x_dims, dev_ctx.GetPlace());
-    auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
-    
-    int threads = kNumCUDAThreads;
-    int NC_num = x_dims[0] * x_dims[1];
-    int height = x_dims[2];
-    int width = x_dims[3];
-    int grad_num = in_grad->numel();
-    int blocks = NumBlocks(grad_num);
-    FillConstant<T><<<blocks, threads, 0, dev_ctx.stream()>>>(in_grad_data, 0, grad_num);
-
-    ScatterAddBw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(out_grad->data<T>(), max_map->data<int>(), NC_num, height, width, 2, in_grad_data);
-  }
-};
-
-}  // namespace operators
-}  // namespace paddle
-
-namespace ops = paddle::operators;
-REGISTER_OP_CUDA_KERNEL(bottom_pool,
-                        ops::BottomPoolOpCUDAKernel<float>,
-                        ops::BottomPoolOpCUDAKernel<double>);
-REGISTER_OP_CUDA_KERNEL(bottom_pool_grad,
-                        ops::BottomPoolGradOpCUDAKernel<float>,
-                        ops::BottomPoolGradOpCUDAKernel<double>);
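The forward kernel above fills a `MaxMap` output and the gradient kernel zero-fills `X@GRAD` before scattering `Output@GRAD` into it. `util.cu.h` is not shown in this diff, so the exact behavior of `GetMaxInfo`, `ScatterAddFw`, and `ScatterAddBw` is an assumption here: the sketch below guesses that `MaxMap` stores, for each position, the index of the running maximum along the pooled axis, and that ties keep the earlier index. It works on one column in pure Python, with hypothetical function names.

```python
def bottom_pool_forward(column):
    """Return (max_map, output) for one column, scanning top to bottom."""
    max_map, output = [], []
    best = 0  # index of the largest value seen so far
    for j, value in enumerate(column):
        if value > column[best]:  # assumed tie rule: keep the earlier index
            best = j
        max_map.append(best)
        output.append(column[best])
    return max_map, output


def bottom_pool_backward(max_map, out_grad):
    """Zero-fill the input gradient, then add each output gradient
    to the input position that produced that output."""
    in_grad = [0.0] * len(out_grad)
    for j, grad in enumerate(out_grad):
        in_grad[max_map[j]] += grad
    return in_grad


max_map, output = bottom_pool_forward([3, 1, 4, 1, 5])
print(max_map)  # [0, 0, 2, 2, 4]
print(output)   # [3, 3, 4, 4, 5]
print(bottom_pool_backward(max_map, [1.0] * 5))  # [2.0, 0.0, 2.0, 0.0, 1.0]
```

Several output positions can share one source index, which is why the backward step accumulates rather than assigns.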
--- a/ppdet/ext_op/src/left_pool_op.cc
+++ b/ppdet/ext_op/src/left_pool_op.cc
-/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License. */
-
-#include "paddle/fluid/framework/op_registry.h"
-namespace paddle {
-namespace operators {
-
-using Tensor = framework::Tensor;
-
-class LeftPoolOp : public framework::OperatorWithKernel {
-public:
-  using framework::OperatorWithKernel::OperatorWithKernel;
-
-  void InferShape(framework::InferShapeContext* ctx) const override {
-    PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
-    ctx->ShareDim("X", /*->*/ "MaxMap");
-    ctx->ShareDim("X", /*->*/ "Output");
-  }
-
-protected:
-  framework::OpKernelType GetExpectedKernelType(
-      const framework::ExecutionContext& ctx) const override {
-    return framework::OpKernelType(ctx.Input<Tensor>("X")->type(),
-                                   ctx.GetPlace());
-  }
-};
-
-class LeftPoolOpMaker : public framework::OpProtoAndCheckerMaker {
-public:
-  void Make() override {
-    AddInput("X",
-             "Input with shape (batch, C, H, W)");
-    AddOutput("MaxMap", "Max map with index of maximum value of input");
-    AddOutput("Output", "output with same shape as input(X)");
-    AddComment(
-        R"Doc(
-This operation calculates the left pooling output based on the input.
-Scan the input from right to left for the horizontal max-pooling.
-The output has the same shape as the input.
-        )Doc");
-  }
-};
-
-class LeftPoolOpGrad : public framework::OperatorWithKernel {
-public:
-  using framework::OperatorWithKernel::OperatorWithKernel;
-
-protected:
-  void InferShape(framework::InferShapeContext* ctx) const override {
-    PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
-    PADDLE_ENFORCE(ctx->HasInput("MaxMap"), "Input(MaxMap) should not be null");
-    PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Output")),
-                   "Input(Output@GRAD) should not be null");
-    auto out_grad_name = framework::GradVarName("Output");
-    ctx->ShareDim(out_grad_name, framework::GradVarName("X"));
-  }
-
-  framework::OpKernelType GetExpectedKernelType(
-      const framework::ExecutionContext& ctx) const override {
-    return framework::OpKernelType(
-        ctx.Input<Tensor>(framework::GradVarName("Output"))->type(),
-        ctx.GetPlace());
-  }
-};
-
-template <typename T>
-class LeftPoolGradDescMaker : public framework::SingleGradOpMaker<T> {
-public:
-  using framework::SingleGradOpMaker<T>::SingleGradOpMaker;
-
-protected:
-  void Apply(GradOpPtr<T> op) const override {
-    op->SetType("left_pool_grad");
-    op->SetInput("X", this->Input("X"));
-    op->SetInput(framework::GradVarName("Output"), this->OutputGrad("Output"));
-    op->SetInput("MaxMap", this->Output("MaxMap"));
-    op->SetOutput(framework::GradVarName("X"), this->InputGrad("X"));
-    op->SetAttrMap(this->Attrs());
-  }
-};
-
-}  // namespace operators
-}  // namespace paddle
-
-namespace ops = paddle::operators;
-REGISTER_OPERATOR(left_pool,
-                  ops::LeftPoolOp,
-                  ops::LeftPoolOpMaker,
-                  ops::LeftPoolGradDescMaker<paddle::framework::OpDesc>,
-                  ops::LeftPoolGradDescMaker<paddle::imperative::OpBase>);
-REGISTER_OPERATOR(left_pool_grad, ops::LeftPoolOpGrad);
--- a/ppdet/ext_op/src/left_pool_op.cu
+++ b/ppdet/ext_op/src/left_pool_op.cu
-/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License. */
-
-#include "paddle/fluid/framework/op_registry.h"
-#include "paddle/fluid/platform/cuda_primitives.h"
-#include "paddle/fluid/memory/memory.h"
-#include <vector>
-#include "util.cu.h"
-
-namespace paddle {
-namespace operators {
-
-using Tensor = framework::Tensor;
-
-static constexpr int kNumCUDAThreads = 512;
-static constexpr int kNumMaximumNumBlocks = 4096;
-
-static inline int NumBlocks(const int N) {
-  return std::min((N + kNumCUDAThreads - 1) / kNumCUDAThreads,
-                  kNumMaximumNumBlocks);
-}
-
-template <typename T>
-class LeftPoolOpCUDAKernel : public framework::OpKernel<T> {
-public:
-  void Compute(const framework::ExecutionContext &ctx) const override {
-    PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
-                   "This kernel only runs on GPU device.");
-    auto *x = ctx.Input<Tensor>("X");
-    auto *max_map = ctx.Output<Tensor>("MaxMap");
-    auto *output = ctx.Output<Tensor>("Output");
-    auto *x_data = x->data<T>();
-    auto x_dims = x->dims();
-    int NC_num = x_dims[0] * x_dims[1];
-    int height = x_dims[2];
-    int width = x_dims[3];
-    int num = x->numel();
-    auto& dev_ctx = ctx.cuda_device_context();
-
-    int *max_map_data = max_map->mutable_data<int>(x_dims, dev_ctx.GetPlace());
-    T *output_data = output->mutable_data<T>(x_dims, dev_ctx.GetPlace());
-    auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
-    
-    int threads = kNumCUDAThreads;
-    int blocks = NumBlocks(num / width);
-
-    auto max_val_ptr = memory::Alloc(gpu_place, num / width * sizeof(T));
-    T* max_val_data = reinterpret_cast<T*>(max_val_ptr->ptr());
-    auto max_ind_ptr = memory::Alloc(gpu_place, num / width * sizeof(int));
-    int* max_ind_data = reinterpret_cast<int*>(max_ind_ptr->ptr());
-
-    GetMaxInfo<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), NC_num, height, width, 3, true, max_val_data, max_ind_data, max_map_data);
-
-    blocks = NumBlocks(num);
-    ScatterAddFw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), max_map_data, NC_num, height, width, 3, output_data);
-
-  }
-};
-
-template <typename T>
-class LeftPoolGradOpCUDAKernel : public framework::OpKernel<T> {
- public:
-  void Compute(const framework::ExecutionContext& ctx) const override {
-    auto* x = ctx.Input<Tensor>("X");
-    auto* max_map = ctx.Input<Tensor>("MaxMap");
-    auto* out_grad = ctx.Input<Tensor>(framework::GradVarName("Output"));
-    auto* in_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
-    auto x_dims = x->dims();
-    
-    auto& dev_ctx = ctx.cuda_device_context();
-    T* in_grad_data = in_grad->mutable_data<T>(x_dims, dev_ctx.GetPlace());
-    auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
-    
-    int threads = kNumCUDAThreads;
-    int NC_num = x_dims[0] * x_dims[1];
-    int height = x_dims[2]; 
-    int width = x_dims[3];
-    int grad_num = in_grad->numel();
-    int blocks = NumBlocks(grad_num);
-    FillConstant<T><<<blocks, threads, 0, dev_ctx.stream()>>>(in_grad_data, 0, grad_num);
-
-    ScatterAddBw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(out_grad->data<T>(), max_map->data<int>(), NC_num, height, width, 3, in_grad_data);
-  }
-};
-
-
-}  // namespace operators
-}  // namespace paddle
-
-namespace ops = paddle::operators;
-REGISTER_OP_CUDA_KERNEL(left_pool,
-                        ops::LeftPoolOpCUDAKernel<float>,
-                        ops::LeftPoolOpCUDAKernel<double>);
-REGISTER_OP_CUDA_KERNEL(left_pool_grad,
-                        ops::LeftPoolGradOpCUDAKernel<float>,
-                        ops::LeftPoolGradOpCUDAKernel<double>);
--- a/ppdet/ext_op/src/make.sh
+++ b/ppdet/ext_op/src/make.sh
-include_dir=$( python -c 'import paddle; print(paddle.sysconfig.get_include())' )
-lib_dir=$( python -c 'import paddle; print(paddle.sysconfig.get_lib())' )
-
-echo $include_dir
-echo $lib_dir
-
-OPS='bottom_pool_op top_pool_op right_pool_op left_pool_op'
-for op in ${OPS}
-do
-nvcc ${op}.cu -c -o ${op}.cu.o -ccbin cc -DPADDLE_WITH_CUDA -DEIGEN_USE_GPU -DPADDLE_USE_DSO -DPADDLE_WITH_MKLDNN -Xcompiler -fPIC -std=c++11 -Xcompiler -fPIC -w --expt-relaxed-constexpr -O0 -g -DNVCC \
-    -I ${include_dir}/third_party/ \
-    -I ${include_dir}
-done
-
-g++ bottom_pool_op.cc bottom_pool_op.cu.o top_pool_op.cc top_pool_op.cu.o right_pool_op.cc right_pool_op.cu.o left_pool_op.cc left_pool_op.cu.o -o cornerpool_lib.so -DPADDLE_WITH_MKLDNN -shared -fPIC -std=c++11 -O0 -g \
-  -I ${include_dir}/third_party/ \
-  -I ${include_dir} \
-  -L ${lib_dir} \
-  -L /usr/local/cuda/lib64 -lpaddle_framework -lcudart
-
-rm *.cu.o
--- a/ppdet/ext_op/src/right_pool_op.cc
+++ b/ppdet/ext_op/src/right_pool_op.cc
-/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License. */
-
-#include "paddle/fluid/framework/op_registry.h"
-namespace paddle {
-namespace operators {
-
-using Tensor = framework::Tensor;
-
-class RightPoolOp : public framework::OperatorWithKernel {
-public:
-  using framework::OperatorWithKernel::OperatorWithKernel;
-
-  void InferShape(framework::InferShapeContext* ctx) const override {
-    PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
-    ctx->ShareDim("X", /*->*/ "MaxMap");
-    ctx->ShareDim("X", /*->*/ "Output");
-  }
-
-protected:
-  framework::OpKernelType GetExpectedKernelType(
-      const framework::ExecutionContext& ctx) const override {
-    return framework::OpKernelType(ctx.Input<Tensor>("X")->type(),
-                                   ctx.GetPlace());
-  }
-};
-
-class RightPoolOpMaker : public framework::OpProtoAndCheckerMaker {
-public:
-  void Make() override {
-    AddInput("X",
-             "Input with shape (batch, C, H, W)");
-    AddOutput("MaxMap", "Max map with index of maximum value of input"); 
-    AddOutput("Output", "output with same shape as input(X)");
-    AddComment(
-        R"Doc(
-This operation calculates the right pooling output based on the input.
-Scan the input from left to right for the horizontal max-pooling.
-The output has the same shape as the input.
-        )Doc");
-  }
-};
-
-class RightPoolOpGrad : public framework::OperatorWithKernel {
-public:
-  using framework::OperatorWithKernel::OperatorWithKernel;
-
-protected:
-  void InferShape(framework::InferShapeContext* ctx) const override {
-    PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
-    PADDLE_ENFORCE(ctx->HasInput("MaxMap"), "Input(MaxMap) should not be null");
-    PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Output")),
-                   "Input(Output@GRAD) should not be null");
-    auto out_grad_name = framework::GradVarName("Output");
-    ctx->ShareDim(out_grad_name, framework::GradVarName("X"));
-  }
-
-  framework::OpKernelType GetExpectedKernelType(
-      const framework::ExecutionContext& ctx) const override {
-    return framework::OpKernelType(
-        ctx.Input<Tensor>(framework::GradVarName("Output"))->type(),
-        ctx.GetPlace());
-  }
-};
-
-template <typename T>
-class RightPoolGradDescMaker : public framework::SingleGradOpMaker<T> {
-public:
-  using framework::SingleGradOpMaker<T>::SingleGradOpMaker;
-
-protected:
-  void Apply(GradOpPtr<T> op) const override {
-    op->SetType("right_pool_grad");
-    op->SetInput("X", this->Input("X"));
-    op->SetInput(framework::GradVarName("Output"), this->OutputGrad("Output"));
-    op->SetInput("MaxMap", this->Output("MaxMap"));
-    op->SetOutput(framework::GradVarName("X"), this->InputGrad("X"));
-    op->SetAttrMap(this->Attrs());
-  }
-};
-
-}  // namespace operators
-}  // namespace paddle
-
-namespace ops = paddle::operators;
-REGISTER_OPERATOR(right_pool,
-                  ops::RightPoolOp,
-                  ops::RightPoolOpMaker,
-                  ops::RightPoolGradDescMaker<paddle::framework::OpDesc>,
-                  ops::RightPoolGradDescMaker<paddle::imperative::OpBase>);
-REGISTER_OPERATOR(right_pool_grad, ops::RightPoolOpGrad);
--- a/ppdet/ext_op/src/right_pool_op.cu
+++ b/ppdet/ext_op/src/right_pool_op.cu
-/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License. */
-
-#include "paddle/fluid/framework/op_registry.h"
-#include "paddle/fluid/platform/cuda_primitives.h"
-#include "paddle/fluid/memory/memory.h"
-#include <vector>
-#include "util.cu.h"
-
-namespace paddle {
-namespace operators {
-
-using Tensor = framework::Tensor;
-
-static constexpr int kNumCUDAThreads = 512;
-static constexpr int kNumMaximumNumBlocks = 4096;
-
-static inline int NumBlocks(const int N) {
-  return std::min((N + kNumCUDAThreads - 1) / kNumCUDAThreads,
-                  kNumMaximumNumBlocks);
-}
-
-template <typename T>
-class RightPoolOpCUDAKernel : public framework::OpKernel<T> {
-public:
-  void Compute(const framework::ExecutionContext &ctx) const override {
-    PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
-                   "This kernel only runs on GPU device.");
-    auto *x = ctx.Input<Tensor>("X");
-    auto *max_map = ctx.Output<Tensor>("MaxMap");
-    auto *output = ctx.Output<Tensor>("Output");
-    auto *x_data = x->data<T>();
-    auto x_dims = x->dims();
-    int NC_num = x_dims[0] * x_dims[1];
-    int height = x_dims[2];
-    int width = x_dims[3];
-    int num = x->numel();
-    auto& dev_ctx = ctx.cuda_device_context();
-
-    int *max_map_data = max_map->mutable_data<int>(x_dims, dev_ctx.GetPlace());
-    T *output_data = output->mutable_data<T>(x_dims, dev_ctx.GetPlace());
-    auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
-    
-    int threads = kNumCUDAThreads;
-    int blocks = NumBlocks(num / width);
-  
-    auto max_val_ptr = memory::Alloc(gpu_place, num / width * sizeof(T));
-    T* max_val_data = reinterpret_cast<T*>(max_val_ptr->ptr());
-    auto max_ind_ptr = memory::Alloc(gpu_place, num / width * sizeof(int));
-    int* max_ind_data = reinterpret_cast<int*>(max_ind_ptr->ptr());
-
-    GetMaxInfo<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), NC_num, height, width, 3, false, max_val_data, max_ind_data, max_map_data);
-
-    blocks = NumBlocks(num);
-    ScatterAddFw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), max_map_data, NC_num, height, width, 3, output_data);
-
-  }
-};
-
-template <typename T>
-class RightPoolGradOpCUDAKernel : public framework::OpKernel<T> {
- public:
-  void Compute(const framework::ExecutionContext& ctx) const override {
-    auto* x = ctx.Input<Tensor>("X");
-    auto* max_map = ctx.Input<Tensor>("MaxMap");
-    auto* out_grad = ctx.Input<Tensor>(framework::GradVarName("Output"));
-    auto* in_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
-    auto x_dims = x->dims();
-    
-    auto& dev_ctx = ctx.cuda_device_context();
-    T* in_grad_data = in_grad->mutable_data<T>(x_dims, dev_ctx.GetPlace());
-    auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
-    
-    int threads = kNumCUDAThreads;
-    int NC_num = x_dims[0] * x_dims[1];
-    int height = x_dims[2];
-    int width = x_dims[3];
-    int grad_num = in_grad->numel();
-    int blocks = NumBlocks(grad_num);
-    FillConstant<T><<<blocks, threads, 0, dev_ctx.stream()>>>(in_grad_data, 0, grad_num);
-
-    ScatterAddBw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(out_grad->data<T>(), max_map->data<int>(), NC_num, height, width, 3, in_grad_data);
-  }
-};
-
-}  // namespace operators
-}  // namespace paddle
-
-namespace ops = paddle::operators;
-REGISTER_OP_CUDA_KERNEL(right_pool,
-                        ops::RightPoolOpCUDAKernel<float>,
-                        ops::RightPoolOpCUDAKernel<double>);
-REGISTER_OP_CUDA_KERNEL(right_pool_grad,
-                        ops::RightPoolGradOpCUDAKernel<float>,
-                        ops::RightPoolGradOpCUDAKernel<double>);
--- a/ppdet/ext_op/src/top_pool_op.cc
+++ b/ppdet/ext_op/src/top_pool_op.cc
-/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License. */
-
-#include "paddle/fluid/framework/op_registry.h"
-namespace paddle {
-namespace operators {
-
-using Tensor = framework::Tensor;
-
-class TopPoolOp : public framework::OperatorWithKernel {
-public:
-  using framework::OperatorWithKernel::OperatorWithKernel;
-
-  void InferShape(framework::InferShapeContext* ctx) const override {
-    PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
-    ctx->ShareDim("X", /*->*/ "MaxMap");
-    ctx->ShareDim("X", /*->*/ "Output");
-  }
-
-protected:
-  framework::OpKernelType GetExpectedKernelType(
-      const framework::ExecutionContext& ctx) const override {
-    return framework::OpKernelType(ctx.Input<Tensor>("X")->type(), 
-                                   ctx.GetPlace());
-  }
-};
-
-class TopPoolOpMaker : public framework::OpProtoAndCheckerMaker {
-public:
-  void Make() override {
-    AddInput("X",
-             "Input with shape (batch, C, H, W)");
-    AddOutput("MaxMap", "Max map with index of maximum value of input");
-    AddOutput("Output", "Output with same shape as input(X)");
-    AddComment(
-        R"Doc(
-This operator calculates the top pooling output based on the input.
-Scan the input from bottom to top for the vertical max-pooling.
-The output has the same shape as the input.
-        )Doc");
-  }
-};
-
-class TopPoolOpGrad : public framework::OperatorWithKernel {
-public:
-  using framework::OperatorWithKernel::OperatorWithKernel;
-
-protected:
-  void InferShape(framework::InferShapeContext* ctx) const override {
-    PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should not be null");
-    PADDLE_ENFORCE(ctx->HasInput("MaxMap"), "Input(MaxMap) should not be null");
-    PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Output")),
-                   "Input(Output@GRAD) should not be null");
-    
-    auto out_grad_name = framework::GradVarName("Output");
-    ctx->ShareDim(out_grad_name, framework::GradVarName("X"));
-  }
-
-  framework::OpKernelType GetExpectedKernelType(
-      const framework::ExecutionContext& ctx) const override {
-    return framework::OpKernelType(
-        ctx.Input<Tensor>(framework::GradVarName("Output"))->type(),
-        ctx.GetPlace());
-  }
-};
-
-template <typename T>
-class TopPoolGradDescMaker : public framework::SingleGradOpMaker<T> {
- public:
-  using framework::SingleGradOpMaker<T>::SingleGradOpMaker;
-
- protected:
-  void Apply(GradOpPtr<T> op) const override {
-    op->SetType("top_pool_grad");
-    op->SetInput("X", this->Input("X"));
-    op->SetInput(framework::GradVarName("Output"), this->OutputGrad("Output"));
-    op->SetInput("MaxMap", this->Output("MaxMap"));
-    op->SetOutput(framework::GradVarName("X"), this->InputGrad("X"));
-    op->SetAttrMap(this->Attrs());
-  }
-};
-
-}  // namespace operators
-}  // namespace paddle
-
-namespace ops = paddle::operators;
-REGISTER_OPERATOR(top_pool,
-                  ops::TopPoolOp,
-                  ops::TopPoolOpMaker,
-                  ops::TopPoolGradDescMaker<paddle::framework::OpDesc>,
-                  ops::TopPoolGradDescMaker<paddle::imperative::OpBase>);
-REGISTER_OPERATOR(top_pool_grad, ops::TopPoolOpGrad);
--- a/ppdet/ext_op/src/top_pool_op.cu
+++ b/ppdet/ext_op/src/top_pool_op.cu
-/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License. */
-
-#include "paddle/fluid/framework/op_registry.h"
-#include "paddle/fluid/platform/cuda_primitives.h"
-#include "paddle/fluid/memory/memory.h"
-#include <vector>
-#include "util.cu.h"
-
-namespace paddle {
-namespace operators {
-
-using Tensor = framework::Tensor;
-
-static constexpr int kNumCUDAThreads = 512;
-static constexpr int kNumMaximumNumBlocks = 4096;
-
-static inline int NumBlocks(const int N) {
-  return std::min((N + kNumCUDAThreads - 1) / kNumCUDAThreads,
-                  kNumMaximumNumBlocks);
-}
-
-template <typename T>
-class TopPoolOpCUDAKernel : public framework::OpKernel<T> {
-public:
-  void Compute(const framework::ExecutionContext &ctx) const override {
-    PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
-                   "This kernel only runs on GPU device.");
-    auto *x = ctx.Input<Tensor>("X");
-    auto *max_map = ctx.Output<Tensor>("MaxMap");
-    auto *output = ctx.Output<Tensor>("Output");
-    auto *x_data = x->data<T>();
-    auto x_dims = x->dims();
-    int NC_num = x_dims[0] * x_dims[1];
-    int height = x_dims[2];
-    int width = x_dims[3];
-    int num = x->numel();
-    auto& dev_ctx = ctx.cuda_device_context();
-
-    int *max_map_data = max_map->mutable_data<int>(x_dims, dev_ctx.GetPlace());
-    T *output_data = output->mutable_data<T>(x_dims, dev_ctx.GetPlace());
-    auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
-    
-    int threads = kNumCUDAThreads;
-    int blocks = NumBlocks(num / height);
-  
-    auto max_val_ptr = memory::Alloc(gpu_place, num / height * sizeof(T));
-    T* max_val_data = reinterpret_cast<T*>(max_val_ptr->ptr());
-    auto max_ind_ptr = memory::Alloc(gpu_place, num / height * sizeof(int));
-    int* max_ind_data = reinterpret_cast<int*>(max_ind_ptr->ptr());
-
-    GetMaxInfo<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), NC_num, height, width, 2, true, max_val_data, max_ind_data, max_map_data);
-
-    blocks = NumBlocks(num);
-    ScatterAddFw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(x->data<T>(), max_map_data, NC_num, height, width, 2, output_data);
-  }
-};
-
-template <typename T>
-class TopPoolGradOpCUDAKernel : public framework::OpKernel<T> {
- public:
-  void Compute(const framework::ExecutionContext& ctx) const override {
-    auto* x = ctx.Input<Tensor>("X");
-    auto* max_map = ctx.Input<Tensor>("MaxMap");
-    auto* out_grad = ctx.Input<Tensor>(framework::GradVarName("Output"));
-    auto* in_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
-    auto x_dims = x->dims();
-    auto& dev_ctx = ctx.cuda_device_context();
-    T* in_grad_data = in_grad->mutable_data<T>(x_dims, dev_ctx.GetPlace());
-    auto gpu_place = boost::get<platform::CUDAPlace>(dev_ctx.GetPlace());
-    
-    int threads = kNumCUDAThreads;
-    int NC_num = x_dims[0] * x_dims[1];
-    int height = x_dims[2];
-    int width = x_dims[3];
-    int grad_num = in_grad->numel();
-    int blocks = NumBlocks(grad_num);
-    FillConstant<T><<<blocks, threads, 0, dev_ctx.stream()>>>(in_grad_data, 0, grad_num);
-
-    ScatterAddBw<T><<<blocks, threads, 0, dev_ctx.stream()>>>(out_grad->data<T>(), max_map->data<int>(), NC_num, height, width, 2, in_grad_data);
-  }
-};
-
-}  // namespace operators
-}  // namespace paddle
-
-namespace ops = paddle::operators;
-REGISTER_OP_CUDA_KERNEL(top_pool,
-                        ops::TopPoolOpCUDAKernel<float>,
-                        ops::TopPoolOpCUDAKernel<double>);
-REGISTER_OP_CUDA_KERNEL(top_pool_grad,
-                        ops::TopPoolGradOpCUDAKernel<float>,
-                        ops::TopPoolGradOpCUDAKernel<double>);
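The launch configuration in the removed `top_pool_op.cu` can be checked with a little arithmetic. The sketch below mirrors `NumBlocks` in Python (the names `K_NUM_CUDA_THREADS`, `K_MAX_NUM_BLOCKS`, and `num_blocks` are illustrative, not part of the original code): the block count is a ceiling division by the thread count, capped at 4096, and the grid-stride loop in `CUDA_1D_KERNEL_LOOP` covers whatever the cap leaves over.

```python
# Illustrative constants mirroring kNumCUDAThreads and kNumMaximumNumBlocks.
K_NUM_CUDA_THREADS = 512
K_MAX_NUM_BLOCKS = 4096


def num_blocks(n):
    """Ceiling division of n by the thread count, capped at the block limit."""
    return min((n + K_NUM_CUDA_THREADS - 1) // K_NUM_CUDA_THREADS,
               K_MAX_NUM_BLOCKS)


# Shape used by the removed unit test: (2, 10, 16, 16).
num = 2 * 10 * 16 * 16   # 5120 elements in total
height = 16
per_column = num // height   # 320 running-max slots, one per (N, C, W) column

blocks_for_max_info = num_blocks(per_column)   # launch size for GetMaxInfo
blocks_for_scatter = num_blocks(num)           # launch size for ScatterAddFw
```

For this shape the `GetMaxInfo` launch needs a single block, while the per-element `ScatterAddFw` launch needs ten.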
--- a/ppdet/ext_op/src/util.cu.h
+++ b/ppdet/ext_op/src/util.cu.h
-/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License. */
-
-#include "paddle/fluid/framework/op_registry.h"
-#include "paddle/fluid/platform/cuda_primitives.h"
-#include "paddle/fluid/memory/memory.h"
-#include <vector>
-
-namespace paddle {
-namespace operators {
-
-using framework::Tensor;
-
-#define CUDA_1D_KERNEL_LOOP(i, n)                              \
-  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); \
-       i += blockDim.x * gridDim.x)
-
-template <typename T>
-__global__ void FillConstant(T* x, int num, int fill_num) {
-  CUDA_1D_KERNEL_LOOP(i, fill_num) {
-    x[i] = static_cast<T>(num);
-  }
-}
-
-template <typename T>
-__global__ void SliceOnAxis(const T* x, const int NC_num, const int H, const int W,
-                   const int axis, const int start, const int end, 
-                   T* output) {
-  int HW_num = H * W;
-  int length = axis == 2 ? W : H;
-  int sliced_len = end - start;
-  int cur_HW_num = length * sliced_len;
-  // slice input on H or W (axis is 2 or 3)
-  CUDA_1D_KERNEL_LOOP(i, NC_num * cur_HW_num) {
-    int NC_id = i / cur_HW_num;
-    int HW_id = i % cur_HW_num;
-    if (axis == 2){
-      output[i] = x[NC_id * HW_num + start * W + HW_id];
-    } else if (axis == 3) {
-      int col = HW_id % sliced_len;
-      int row = HW_id / sliced_len;
-      output[i] = x[NC_id * HW_num + row * W + start + col];
-    }
-  } 
-}
-
-template <typename T>
-__global__  void MaxOut(const T* input, const int next_ind, const int NC_num,
-                        const int H, const int W, const int axis, 
-                        const int start, const int end, T* output) {
-  int HW_num = H * W;
-  int length = axis == 2 ? W : H; 
-  T cur = static_cast<T>(0.);
-  T next = static_cast<T>(0.);
-  T max_v = static_cast<T>(0.);
-  int sliced_len = end - start;
-  int cur_HW_num = length * sliced_len;
-  // compare cur and next and assign max values to output
-  CUDA_1D_KERNEL_LOOP(i, NC_num * cur_HW_num) {
-    int NC_id = i / cur_HW_num;
-    int HW_id = i % cur_HW_num;
-   
-    if (axis == 2){
-      cur = input[NC_id * HW_num + start * W + HW_id];
-      next = input[NC_id * HW_num + next_ind * W + HW_id];
-      max_v = cur > next ? cur : next; 
-      output[NC_id * HW_num + start * W + HW_id] = max_v;
-    } else if (axis == 3) {
-      int col = HW_id % sliced_len;
-      int row = HW_id / sliced_len;
-      cur = input[NC_id * HW_num + row * W + start + col];
-      next = input[NC_id * HW_num + row * W + next_ind + col];
-      max_v = cur > next ? cur : next;
-      output[NC_id * HW_num + row * W + start + col] = max_v;
-    }
-    __syncthreads();
-  }
-}
-
-template <typename T>
-__global__  void UpdateMaxInfo(const T* input, const int NC_num, 
-                               const int H, const int W, const int axis, 
-                               const int index, T* max_val, int* max_ind) {
-  int length = axis == 2 ? W : H;
-  int HW_num = H * W; 
-  T val = static_cast<T>(0.);
-  CUDA_1D_KERNEL_LOOP(i, NC_num * length) {
-    int NC_id = i / length;
-    int length_id = i % length;
-    if (axis == 2) {
-      val = input[NC_id * HW_num + index * W + length_id];
-    } else if (axis == 3) {
-      val = input[NC_id * HW_num + length_id * W + index];
-    }
-    if (val > max_val[i]) {
-      max_val[i] = val;
-      max_ind[i] = index;
-    }
-    __syncthreads();
-  }
-}
-
-template <typename T>
-__global__  void ScatterAddOnAxis(const T* input, const int start, const int* max_ind, const int NC_num, const int H, const int W, const int axis, T* output) {
-  int length = axis == 2 ? W : H;
-  int HW_num = H * W;
-  CUDA_1D_KERNEL_LOOP(i, NC_num * length) { 
-    int NC_id = i / length;
-    int length_id = i % length;
-    int id_ = max_ind[i];
-    if (axis == 2) {
-      platform::CudaAtomicAdd(output + NC_id * HW_num + id_ * W + length_id, input[NC_id * HW_num + start * W + length_id]);
-      //output[NC_id * HW_num + id_ * W + length_id] += input[NC_id * HW_num + start * W + length_id];
-    } else if (axis == 3) {
-      platform::CudaAtomicAdd(output + NC_id * HW_num + length_id * W + id_, input[NC_id * HW_num + length_id * W + start]);
-      //output[NC_id * HW_num + length_id * W + id_] += input[NC_id * HW_num + length_id * W + start];
-    }
-    __syncthreads();
-  }
-}
-
-template <typename T>
-__global__ void GetMaxInfo(const T* input, const int NC_num,
-                           const int H, const int W, const int axis,
-                           const bool reverse, T* max_val, int* max_ind,
-                           int* max_map) {
-   int start = 0;
-   int end = axis == 2 ? H: W;
-   int s = reverse ? end-1 : start;
-   int e = reverse ? start-1 : end;
-   int step = reverse ? -1 : 1;
-   int len = axis == 2 ? W : H;
-   int loc = 0;
-   T val = static_cast<T>(0.);
-   for (int i = s; ; ) {
-     if (i == s) {
-       CUDA_1D_KERNEL_LOOP(j, NC_num * len) {
-         int NC_id = j / len;
-         int len_id = j % len;
-         if (axis == 2) {
-           loc = NC_id * H * W + i * W + len_id;
-         }  else if (axis == 3){
-           loc = NC_id * H * W + len_id * W + i;
-         }
-         max_ind[j] = i;
-         max_map[loc] = max_ind[j];
-         max_val[j] = input[loc];   
-         __syncthreads();
-       }
-     } else {
-       CUDA_1D_KERNEL_LOOP(j, NC_num * len) {
-         int NC_id = j / len;
-         int len_id = j % len;
-       
-         if (axis == 2) {
-           loc = NC_id * H * W + i * W + len_id;
-         } else if (axis == 3){
-           loc = NC_id * H * W + len_id * W + i;
-         }
-         val = input[loc];
-         T max_v = max_val[j];
-         if (val > max_v) {
-           max_val[j] = val;
-           max_map[loc] = i;
-           max_ind[j] = i;
-         } else {
-           max_map[loc] = max_ind[j];
-         }
-         __syncthreads();
-       }
-     }
-     i += step;
-     if (s < e && i >= e) break;
-     if (s > e && i <= e) break;
-   }
-}
-
-template <typename T>
-__global__ void ScatterAddFw(const T* input, const int* max_map, const int NC_num, const int H, const int W, const int axis, T* output){
-  CUDA_1D_KERNEL_LOOP(i, NC_num * H * W) {
-    int loc = max_map[i];
-    int NC_id = i / (H * W);
-    int len_id = 0;
-    if (axis == 2) {
-      len_id = i % W;
-      output[i] = input[NC_id * H * W + loc * W + len_id];
-    } else {
-      len_id = i % (H * W) / W;
-      output[i] = input[NC_id * H * W + len_id * W + loc];
-    }
-  }
-}
-
-template <typename T>
-__global__ void ScatterAddBw(const T* input, const int* max_map, const int NC_num, const int H, const int W, const int axis, T* output){
-  CUDA_1D_KERNEL_LOOP(i, NC_num * H * W) {
-    int loc = max_map[i];
-    int NC_id = i / (H * W);
-    int len_id = 0;
-    int offset = 0;
-    if (axis == 2) {
-      len_id = i % W;
-      offset = NC_id * H * W + loc * W + len_id;
-    } else {
-      len_id = i % (H * W) / W;
-      offset = NC_id * H * W + len_id * W + loc;
-    }
-    platform::CudaAtomicAdd(output + offset, input[i]);
-  }
-}
-
-}  // namespace operators
-}  // namespace paddle
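For readers who want to follow what the removed `GetMaxInfo`, `ScatterAddFw`, and `ScatterAddBw` kernels compute, here is a minimal NumPy sketch of the top-pooling case (`axis == 2`, `reverse == true`). It is a reading aid under the assumption that the input is a dense `(N, C, H, W)` array; the function names are hypothetical and the code is not a drop-in replacement for the CUDA kernels.

```python
import numpy as np


def top_pool_with_max_map(x):
    """Top pooling over axis 2 of an (N, C, H, W) array.

    Returns (output, max_map). max_map[n, c, h, w] is the row index of the
    running maximum seen while scanning rows from H - 1 up to h.
    """
    n, c, h, w = x.shape
    max_val = x[:, :, h - 1, :].copy()                    # running max per column
    max_ind = np.full((n, c, w), h - 1, dtype=np.int64)   # row where it occurred
    max_map = np.empty(x.shape, dtype=np.int64)
    max_map[:, :, h - 1, :] = max_ind
    for i in range(h - 2, -1, -1):
        row = x[:, :, i, :]
        better = row > max_val        # strict '>' keeps the earlier row on ties
        max_val = np.where(better, row, max_val)
        max_ind = np.where(better, i, max_ind)
        max_map[:, :, i, :] = max_ind
    # Forward pass: gather the input value at the recorded row.
    output = np.take_along_axis(x, max_map, axis=2)
    return output, max_map


def top_pool_grad(out_grad, max_map):
    """Backward pass: add each output gradient to the input cell that won."""
    in_grad = np.zeros_like(out_grad)
    n_idx, c_idx, _, w_idx = np.indices(out_grad.shape)
    # np.add.at accumulates correctly when several outputs share one source.
    np.add.at(in_grad, (n_idx, c_idx, max_map, w_idx), out_grad)
    return in_grad
```

The unbuffered `np.add.at` plays the role that `platform::CudaAtomicAdd` plays in `ScatterAddBw`: several output positions can point at the same input row, so their gradients have to be accumulated rather than overwritten.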
--- a/ppdet/ext_op/test/test_corner_pool.py
+++ b/ppdet/ext_op/test/test_corner_pool.py
-#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import print_function
-
-import unittest
-import numpy as np
-import paddle.fluid as fluid
-import os
-import sys
-# Add the PaddleDetection root directory to sys.path
-parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 4)))
-if parent_path not in sys.path:
-    sys.path.append(parent_path)
-
-from ppdet.ext_op import cornerpool_lib
-
-
-def bottom_pool_np(x):
-    height = x.shape[2]
-    output = x.copy()
-    for ind in range(height):
-        cur = output[:, :, ind:height, :]
-        next = output[:, :, :height - ind, :]
-        output[:, :, ind:height, :] = np.maximum(cur, next)
-    return output
-
-
-def top_pool_np(x):
-    height = x.shape[2]
-    output = x.copy()
-    for ind in range(height):
-        cur = output[:, :, :height - ind, :]
-        next = output[:, :, ind:height, :]
-        output[:, :, :height - ind, :] = np.maximum(cur, next)
-    return output
-
-
-def right_pool_np(x):
-    width = x.shape[3]
-    output = x.copy()
-    for ind in range(width):
-        cur = output[:, :, :, ind:width]
-        next = output[:, :, :, :width - ind]
-        output[:, :, :, ind:width] = np.maximum(cur, next)
-    return output
-
-
-def left_pool_np(x):
-    width = x.shape[3]
-    output = x.copy()
-    for ind in range(width):
-        cur = output[:, :, :, :width - ind]
-        next = output[:, :, :, ind:width]
-        output[:, :, :, :width - ind] = np.maximum(cur, next)
-    return output
-
-
-class TestRightPoolOp(unittest.TestCase):
-    def funcmap(self):
-        self.func_map = {
-            'bottom_x': [cornerpool_lib.bottom_pool, bottom_pool_np],
-            'top_x': [cornerpool_lib.top_pool, top_pool_np],
-            'right_x': [cornerpool_lib.right_pool, right_pool_np],
-            'left_x': [cornerpool_lib.left_pool, left_pool_np]
-        }
-
-    def setup(self):
-        self.name = 'right_x'
-
-    def test_check_output(self):
-        self.funcmap()
-        self.setup()
-        x_shape = (2, 10, 16, 16)
-        x_type = "float64"
-
-        sp = fluid.Program()
-        tp = fluid.Program()
-        place = fluid.CUDAPlace(0)
-
-        with fluid.program_guard(tp, sp):
-            x = fluid.data(name=self.name, shape=x_shape, dtype=x_type)
-            y = self.func_map[self.name][0](x)
-
-            np.random.seed(0)
-            x_np = np.random.uniform(-1000, 1000, x_shape).astype(x_type)
-
-        out_np = self.func_map[self.name][1](x_np)
-
-        exe = fluid.Executor(place)
-        outs = exe.run(tp, feed={self.name: x_np}, fetch_list=[y])
-
-        self.assertTrue(np.allclose(outs, out_np))
-
-
-class TestTopPoolOp(TestRightPoolOp):
-    def setup(self):
-        self.name = 'top_x'
-
-
-class TestBottomPoolOp(TestRightPoolOp):
-    def setup(self):
-        self.name = 'bottom_x'
-
-
-class TestLeftPoolOp(TestRightPoolOp):
-    def setup(self):
-        self.name = 'left_x'
-
-
-if __name__ == "__main__":
-    unittest.main()
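The four NumPy reference functions in the removed test each amount to a cumulative maximum along one axis in one direction. A compact sketch using `np.maximum.accumulate` is shown below; the helper name `corner_pools` is hypothetical and was not part of the deleted file.

```python
import numpy as np


def corner_pools(x):
    """Return the four corner-pooling results for an (N, C, H, W) array."""
    flip_h = x[:, :, ::-1, :]   # rows reversed, so the scan starts at the bottom
    flip_w = x[:, :, :, ::-1]   # columns reversed, so the scan starts at the right
    return {
        # bottom pool: each row holds the max of itself and all rows above it
        'bottom': np.maximum.accumulate(x, axis=2),
        # top pool: each row holds the max of itself and all rows below it
        'top': np.maximum.accumulate(flip_h, axis=2)[:, :, ::-1, :],
        # right pool: each column holds the max of itself and all columns to its left
        'right': np.maximum.accumulate(x, axis=3),
        # left pool: each column holds the max of itself and all columns to its right
        'left': np.maximum.accumulate(flip_w, axis=3)[:, :, :, ::-1],
    }


x = np.array([[1.0, 3.0, 2.0],
              [0.0, 5.0, 4.0]]).reshape(1, 1, 2, 3)
pools = corner_pools(x)
```

For this 2x3 input, `pools['top'][0, 0]` is `[[1, 5, 4], [0, 5, 4]]` and `pools['right'][0, 0]` is `[[1, 3, 3], [0, 5, 5]]`.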
--- a/ppdet/py_op/post_processing.py
+++ b/ppdet/py_op/post_processing.py
-import six
-import os
-import numpy as np
-from numba import jit
-from .bbox import nms
-
-
-@jit
-def box_decoder(deltas, boxes, weights, bbox_clip=4.13):
-    if boxes.shape[0] == 0:
-        return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
-    boxes = boxes.astype(deltas.dtype, copy=False)
-
-    widths = boxes[:, 2] - boxes[:, 0] + 1.0
-    heights = boxes[:, 3] - boxes[:, 1] + 1.0
-    ctr_x = boxes[:, 0] + 0.5 * widths
-    ctr_y = boxes[:, 1] + 0.5 * heights
-
-    wx, wy, ww, wh = weights
-    dx = deltas[:, 0::4] * wx
-    dy = deltas[:, 1::4] * wy
-    dw = deltas[:, 2::4] * ww
-    dh = deltas[:, 3::4] * wh
-
-    # Prevent sending too large values into np.exp()
-    dw = np.minimum(dw, bbox_clip)
-    dh = np.minimum(dh, bbox_clip)
-
-    pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
-    pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
-    pred_w = np.exp(dw) * widths[:, np.newaxis]
-    pred_h = np.exp(dh) * heights[:, np.newaxis]
-
-    pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
-    # x1
-    pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
-    # y1
-    pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
-    # x2 (note: "- 1" is correct; don't be fooled by the asymmetry)
-    pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w - 1
-    # y2 (note: "- 1" is correct; don't be fooled by the asymmetry)
-    pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h - 1
-
-    return pred_boxes
-
-
-@jit
-def clip_tiled_boxes(boxes, im_shape):
-    """Clip boxes to image boundaries. im_shape is [height, width] and boxes
-    has shape (N, 4 * num_tiled_boxes)."""
-    assert boxes.shape[1] % 4 == 0, \
-        'boxes.shape[1] is {:d}, but must be divisible by 4.'.format(
-        boxes.shape[1]
-    )
-    # x1 >= 0
-    boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
-    # y1 >= 0
-    boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
-    # x2 < im_shape[1]
-    boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
-    # y2 < im_shape[0]
-    boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
-    return boxes
-
-
-#@jit 
-def get_nmsed_box(rpn_rois,
-                  confs,
-                  locs,
-                  class_nums,
-                  im_info,
-                  bbox_reg_weights=[0.1, 0.1, 0.2, 0.2],
-                  score_thresh=0.05,
-                  nms_thresh=0.5,
-                  detections_per_im=100):
-    box_nums = [0, rpn_rois.shape[0]]
-    variance_v = np.array(bbox_reg_weights)
-    rpn_rois_v = np.array(rpn_rois)
-    confs_v = np.array(confs)
-    locs_v = np.array(locs)
-
-    im_results = [[] for _ in range(len(box_nums) - 1)]
-    new_box_nums = [0]
-    for i in range(len(box_nums) - 1):
-        start = box_nums[i]
-        end = box_nums[i + 1]
-        if start == end:
-            continue
-
-        locs_n = locs_v[start:end, :]  # box delta 
-        rois_n = rpn_rois_v[start:end, :]  # box 
-        rois_n = rois_n / im_info[i][2]  # scale 
-        rois_n = box_decoder(locs_n, rois_n, variance_v)
-        rois_n = clip_tiled_boxes(rois_n, im_info[i][:2] / im_info[i][2])
-        cls_boxes = [[] for _ in range(class_nums)]
-        scores_n = confs_v[start:end, :]
-        for j in range(1, class_nums):
-            inds = np.where(scores_n[:, j] > score_thresh)[0]
-            scores_j = scores_n[inds, j]
-            rois_j = rois_n[inds, j * 4:(j + 1) * 4]
-            dets_j = np.hstack((scores_j[:, np.newaxis], rois_j)).astype(
-                np.float32, copy=False)
-            keep = nms(dets_j, nms_thresh)
-            nms_dets = dets_j[keep, :]
-            #add labels
-            label = np.array([j for _ in range(len(keep))])
-            nms_dets = np.hstack((label[:, np.newaxis], nms_dets)).astype(
-                np.float32, copy=False)
-            cls_boxes[j] = nms_dets
-
-        # Limit to detections_per_im detections **over all classes**
-        image_scores = np.hstack(
-            [cls_boxes[j][:, 1] for j in range(1, class_nums)])
-        if len(image_scores) > detections_per_im:
-            image_thresh = np.sort(image_scores)[-detections_per_im]
-            for j in range(1, class_nums):
-                keep = np.where(cls_boxes[j][:, 1] >= image_thresh)[0]
-                cls_boxes[j] = cls_boxes[j][keep, :]
-        im_results_n = np.vstack([cls_boxes[j] for j in range(1, class_nums)])
-        im_results[i] = im_results_n
-        new_box_nums.append(len(im_results_n) + new_box_nums[-1])
-        labels = im_results_n[:, 0]
-        scores = im_results_n[:, 1]
-        boxes = im_results_n[:, 2:]
-    im_results = np.vstack([im_results[k] for k in range(len(box_nums) - 1)])
-    return new_box_nums, im_results
-
-
-@jit
-def get_dt_res(batch_size, box_nums, nmsed_out, data, num_id_to_cat_id_map):
-    dts_res = []
-    nmsed_out_v = np.array(nmsed_out)
-    if nmsed_out_v.shape == (
-            1,
-            1, ):
-        return dts_res
-    assert (len(box_nums) == batch_size + 1), \
-      "Error Tensor offset dimension. Box Nums({}) vs. batch_size({})"\
-                    .format(len(box_nums), batch_size)
-    k = 0
-    for i in range(batch_size):
-        dt_num_this_img = box_nums[i + 1] - box_nums[i]
-        image_id = int(data[i][-1])
-        image_width = int(data[i][1][1])
-        image_height = int(data[i][1][2])
-        for j in range(dt_num_this_img):
-            dt = nmsed_out_v[k]
-            k = k + 1
-            num_id, score, xmin, ymin, xmax, ymax = dt.tolist()
-            category_id = num_id_to_cat_id_map[num_id]
-            w = xmax - xmin + 1
-            h = ymax - ymin + 1
-            bbox = [xmin, ymin, w, h]
-            dt_res = {
-                'image_id': image_id,
-                'category_id': category_id,
-                'bbox': bbox,
-                'score': score
-            }
-            dts_res.append(dt_res)
-    return dts_res
-
-
-@jit
-def get_segms_res(batch_size, box_nums, segms_out, data, num_id_to_cat_id_map):
-    segms_res = []
-    segms_out_v = np.array(segms_out)
-    k = 0
-    for i in range(batch_size):
-        dt_num_this_img = box_nums[i + 1] - box_nums[i]
-        image_id = int(data[i][-1])
-        for j in range(dt_num_this_img):
-            dt = segms_out_v[k]
-            k = k + 1
-            segm, num_id, score = dt.tolist()
-            cat_id = num_id_to_cat_id_map[num_id]
-            if six.PY3:
-                if 'counts' in segm:
-                    segm['counts'] = segm['counts'].decode("utf8")
-            segm_res = {
-                'image_id': image_id,
-                'category_id': cat_id,
-                'segmentation': segm,
-                'score': score
-            }
-            segms_res.append(segm_res)
-    return segms_res
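The delta decoding in the removed `box_decoder` is easier to follow with one box and one class. The sketch below is a simplified single-class version with no `numba`, under the same inclusive-pixel (`+ 1`) convention; `decode_boxes` is a hypothetical name used only for this illustration.

```python
import numpy as np


def decode_boxes(deltas, boxes, weights=(0.1, 0.1, 0.2, 0.2), bbox_clip=4.13):
    """Apply (dx, dy, dw, dh) deltas to (x1, y1, x2, y2) boxes, one class each."""
    boxes = np.asarray(boxes, dtype=np.float64)
    deltas = np.asarray(deltas, dtype=np.float64)

    # Inclusive pixel coordinates: a box from 0 to 9 is 10 pixels wide.
    widths = boxes[:, 2] - boxes[:, 0] + 1.0
    heights = boxes[:, 3] - boxes[:, 1] + 1.0
    ctr_x = boxes[:, 0] + 0.5 * widths
    ctr_y = boxes[:, 1] + 0.5 * heights

    wx, wy, ww, wh = weights
    dx = deltas[:, 0] * wx
    dy = deltas[:, 1] * wy
    # Clip the log-scale terms so np.exp() cannot overflow.
    dw = np.minimum(deltas[:, 2] * ww, bbox_clip)
    dh = np.minimum(deltas[:, 3] * wh, bbox_clip)

    pred_ctr_x = dx * widths + ctr_x
    pred_ctr_y = dy * heights + ctr_y
    pred_w = np.exp(dw) * widths
    pred_h = np.exp(dh) * heights

    # The "- 1" on x2 and y2 undoes the "+ 1" used for widths and heights.
    return np.stack([pred_ctr_x - 0.5 * pred_w,
                     pred_ctr_y - 0.5 * pred_h,
                     pred_ctr_x + 0.5 * pred_w - 1.0,
                     pred_ctr_y + 0.5 * pred_h - 1.0], axis=1)


box = np.array([[0.0, 0.0, 9.0, 9.0]])
unchanged = decode_boxes(np.zeros((1, 4)), box)                 # zero deltas
shifted = decode_boxes(np.array([[1.0, 0.0, 0.0, 0.0]]), box)   # dx = 1
```

With zero deltas the box comes back unchanged, which is a quick way to see why the `- 1` on `x2` and `y2` is needed. With `dx = 1` and a weight of 0.1, the 10-pixel-wide box shifts right by one pixel.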
--- a/ppdet/utils/data_structure.py
+++ b/ppdet/utils/data_structure.py
-import numpy as np
-
-
-class BufferDict(dict):
-    def __init__(self, **kwargs):
-        super(BufferDict, self).__init__(**kwargs)
-
-    def __getitem__(self, key):
-        if key in self.keys():
-            return super(BufferDict, self).__getitem__(key)
-        else:
-            raise Exception("The %s is not in global inputs dict" % key)
-
-    def __setitem__(self, key, value):
-        if key not in self.keys():
-            super(BufferDict, self).__setitem__(key, value)
-        else:
-            raise Exception("The %s is already in global inputs dict" % key)
-
-    def update(self, *args, **kwargs):
-        for k, v in dict(*args, **kwargs).items():
-            self[k] = v
-
-    def update_v(self, key, value):
-        if key in self.keys():
-            super(BufferDict, self).__setitem__(key, value)
-        else:
-            raise Exception("The %s is not in global inputs dict" % key)
-
-    def get(self, key):
-        return self.__getitem__(key)
-
-    def set(self, key, value):
-        return self.__setitem__(key, value)
-
-    def debug(self, dshape=True, dvalue=True, dtype=False):
-        if self['open_debug']:
-            if 'debug_names' not in self.keys():
-                ditems = self.keys()
-            else:
-                ditems = self['debug_names']
-
-            infos = {}
-            for k in ditems:
-                if type(k) is dict:
-                    i_d = {}
-                    for i, j in k.items():
-                        if type(j) is list:
-                            for jj in j:
-                                i_d[jj] = self.get_debug_info(self[i][jj])
-                        infos[i] = i_d
-                else:
-                    infos[k] = self.get_debug_info(self[k])
-            print(infos)
-
-    def get_debug_info(self, v, dshape=True, dvalue=True, dtype=False):
-        info = []
-        if dshape == True and hasattr(v, 'shape'):
-            info.append(v.shape)
-        if dvalue == True and hasattr(v, 'numpy'):
-            info.append(np.mean(np.abs(v.numpy())))
-        if dtype == True:
-            info.append(type(v))
-        return info
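The removed `BufferDict` is essentially a write-once dictionary: plain assignment rejects keys that already exist, and a separate method (`update_v`) is needed to overwrite one. A minimal sketch of that pattern follows. The names `WriteOnceDict` and `overwrite` are hypothetical, and raising `KeyError` instead of the original bare `Exception` is a choice made for this illustration.

```python
class WriteOnceDict(dict):
    """A dict whose keys can be assigned once; overwriting must be explicit."""

    def __setitem__(self, key, value):
        if key in self:
            raise KeyError("%s is already set" % key)
        super().__setitem__(key, value)

    def update(self, *args, **kwargs):
        # dict.update() does not call an overridden __setitem__, so route each
        # item through it explicitly to keep the write-once check.
        for k, v in dict(*args, **kwargs).items():
            self[k] = v

    def overwrite(self, key, value):
        """Replace the value of a key that must already exist."""
        if key not in self:
            raise KeyError("%s is not set" % key)
        super().__setitem__(key, value)


inputs = WriteOnceDict()
inputs['image'] = 'first'
try:
    inputs['image'] = 'second'      # rejected: key already present
except KeyError:
    pass
inputs.overwrite('image', 'second')  # explicit replacement is allowed
```

Keeping overwrites behind a separate method makes accidental reuse of a key fail loudly, which appears to be the intent of the original class.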
--- a/ppdet/utils/oid_eval.py
+++ b/ppdet/utils/oid_eval.py
-# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-from __future__ import unicode_literals
-
-import os
-import sys
-import numpy as np
-
-from .coco_eval import bbox2out
-
-import logging
-logger = logging.getLogger(__name__)
-
-__all__ = ['bbox2out', 'get_category_info']
-
-
-def get_category_info(anno_file=None,
-                      with_background=True,
-                      use_default_label=False):
-    clsid2catid = {k: k for k in range(1, 501)}
-
-    catid2name = {
-        0: "background",
-        1: "Infant bed",
-        2: "Rose",
-        3: "Flag",
-        4: "Flashlight",
-        5: "Sea turtle",
-        6: "Camera",
-        7: "Animal",
-        8: "Glove",
-        9: "Crocodile",
-        10: "Cattle",
-        11: "House",
-        12: "Guacamole",
-        13: "Penguin",
-        14: "Vehicle registration plate",
-        15: "Bench",
-        16: "Ladybug",
-        17: "Human nose",
-        18: "Watermelon",
-        19: "Flute",
-        20: "Butterfly",
-        21: "Washing machine",
-        22: "Raccoon",
-        23: "Segway",
-        24: "Taco",
-        25: "Jellyfish",
-        26: "Cake",
-        27: "Pen",
-        28: "Cannon",
-        29: "Bread",
-        30: "Tree",
-        31: "Shellfish",
-        32: "Bed",
-        33: "Hamster",
-        34: "Hat",
-        35: "Toaster",
-        36: "Sombrero",
-        37: "Tiara",
-        38: "Bowl",
-        39: "Dragonfly",
-        40: "Moths and butterflies",
-        41: "Antelope",
-        42: "Vegetable",
-        43: "Torch",
-        44: "Building",
-        45: "Power plugs and sockets",
-        46: "Blender",
-        47: "Billiard table",
-        48: "Cutting board",
-        49: "Bronze sculpture",
-        50: "Turtle",
-        51: "Broccoli",
-        52: "Tiger",
-        53: "Mirror",
-        54: "Bear",
-        55: "Zucchini",
-        56: "Dress",
-        57: "Volleyball",
-        58: "Guitar",
-        59: "Reptile",
-        60: "Golf cart",
-        61: "Tart",
-        62: "Fedora",
-        63: "Carnivore",
-        64: "Car",
-        65: "Lighthouse",
-        66: "Coffeemaker",
-        67: "Food processor",
-        68: "Truck",
-        69: "Bookcase",
-        70: "Surfboard",
-        71: "Footwear",
-        72: "Bench",
-        73: "Necklace",
-        74: "Flower",
-        75: "Radish",
-        76: "Marine mammal",
-        77: "Frying pan",
-        78: "Tap",
-        79: "Peach",
-        80: "Knife",
-        81: "Handbag",
-        82: "Laptop",
-        83: "Tent",
-        84: "Ambulance",
-        85: "Christmas tree",
-        86: "Eagle",
-        87: "Limousine",
-        88: "Kitchen & dining room table",
-        89: "Polar bear",
-        90: "Tower",
-        91: "Football",
-        92: "Willow",
-        93: "Human head",
-        94: "Stop sign",
-        95: "Banana",
-        96: "Mixer",
-        97: "Binoculars",
-        98: "Dessert",
-        99: "Bee",
-        100: "Chair",
-        101: "Wood-burning stove",
-        102: "Flowerpot",
-        103: "Beaker",
-        104: "Oyster",
-        105: "Woodpecker",
-        106: "Harp",
-        107: "Bathtub",
-        108: "Wall clock",
-        109: "Sports uniform",
-        110: "Rhinoceros",
-        111: "Beehive",
-        112: "Cupboard",
-        113: "Chicken",
-        114: "Man",
-        115: "Blue jay",
-        116: "Cucumber",
-        117: "Balloon",
-        118: "Kite",
-        119: "Fireplace",
-        120: "Lantern",
-        121: "Missile",
-        122: "Book",
-        123: "Spoon",
-        124: "Grapefruit",
-        125: "Squirrel",
-        126: "Orange",
-        127: "Coat",
-        128: "Punching bag",
-        129: "Zebra",
-        130: "Billboard",
-        131: "Bicycle",
-        132: "Door handle",
-        133: "Mechanical fan",
-        134: "Ring binder",
-        135: "Table",
-        136: "Parrot",
-        137: "Sock",
-        138: "Vase",
-        139: "Weapon",
-        140: "Shotgun",
-        141: "Glasses",
-        142: "Seahorse",
-        143: "Belt",
-        144: "Watercraft",
-        145: "Window",
-        146: "Giraffe",
-        147: "Lion",
-        148: "Tire",
-        149: "Vehicle",
-        150: "Canoe",
-        151: "Tie",
-        152: "Shelf",
-        153: "Picture frame",
-        154: "Printer",
-        155: "Human leg",
-        156: "Boat",
-        157: "Slow cooker",
-        158: "Croissant",
-        159: "Candle",
-        160: "Pancake",
-        161: "Pillow",
-        162: "Coin",
-        163: "Stretcher",
-        164: "Sandal",
-        165: "Woman",
-        166: "Stairs",
-        167: "Harpsichord",
-        168: "Stool",
-        169: "Bus",
-        170: "Suitcase",
-        171: "Human mouth",
-        172: "Juice",
-        173: "Skull",
-        174: "Door",
-        175: "Violin",
-        176: "Chopsticks",
-        177: "Digital clock",
-        178: "Sunflower",
-        179: "Leopard",
-        180: "Bell pepper",
-        181: "Harbor seal",
-        182: "Snake",
-        183: "Sewing machine",
-        184: "Goose",
-        185: "Helicopter",
-        186: "Seat belt",
-        187: "Coffee cup",
-        188: "Microwave oven",
-        189: "Hot dog",
-        190: "Countertop",
-        191: "Serving tray",
-        192: "Dog bed",
-        193: "Beer",
-        194: "Sunglasses",
-        195: "Golf ball",
-        196: "Waffle",
-        197: "Palm tree",
-        198: "Trumpet",
-        199: "Ruler",
-        200: "Helmet",
-        201: "Ladder",
-        202: "Office building",
-        203: "Tablet computer",
-        204: "Toilet paper",
-        205: "Pomegranate",
-        206: "Skirt",
-        207: "Gas stove",
-        208: "Cookie",
-        209: "Cart",
-        210: "Raven",
-        211: "Egg",
-        212: "Burrito",
-        213: "Goat",
-        214: "Kitchen knife",
-        215: "Skateboard",
-        216: "Salt and pepper shakers",
-        217: "Lynx",
-        218: "Boot",
-        219: "Platter",
-        220: "Ski",
-        221: "Swimwear",
-        222: "Swimming pool",
-        223: "Drinking straw",
-        224: "Wrench",
-        225: "Drum",
-        226: "Ant",
-        227: "Human ear",
-        228: "Headphones",
-        229: "Fountain",
-        230: "Bird",
-        231: "Jeans",
-        232: "Television",
-        233: "Crab",
-        234: "Microphone",
-        235: "Home appliance",
-        236: "Snowplow",
-        237: "Beetle",
-        238: "Artichoke",
-        239: "Jet ski",
-        240: "Stationary bicycle",
-        241: "Human hair",
-        242: "Brown bear",
-        243: "Starfish",
-        244: "Fork",
-        245: "Lobster",
-        246: "Corded phone",
-        247: "Drink",
-        248: "Saucer",
-        249: "Carrot",
-        250: "Insect",
-        251: "Clock",
-        252: "Castle",
-        253: "Tennis racket",
-        254: "Ceiling fan",
-        255: "Asparagus",
-        256: "Jaguar",
-        257: "Musical instrument",
-        258: "Train",
-        259: "Cat",
-        260: "Rifle",
-        261: "Dumbbell",
-        262: "Mobile phone",
-        263: "Taxi",
-        264: "Shower",
-        265: "Pitcher",
-        266: "Lemon",
-        267: "Invertebrate",
-        268: "Turkey",
-        269: "High heels",
-        270: "Bust",
-        271: "Elephant",
-        272: "Scarf",
-        273: "Barrel",
-        274: "Trombone",
-        275: "Pumpkin",
-        276: "Box",
-        277: "Tomato",
-        278: "Frog",
-        279: "Bidet",
-        280: "Human face",
-        281: "Houseplant",
-        282: "Van",
-        283: "Shark",
-        284: "Ice cream",
-        285: "Swim cap",
-        286: "Falcon",
-        287: "Ostrich",
-        288: "Handgun",
-        289: "Whiteboard",
-        290: "Lizard",
-        291: "Pasta",
-        292: "Snowmobile",
-        293: "Light bulb",
-        294: "Window blind",
-        295: "Muffin",
-        296: "Pretzel",
-        297: "Computer monitor",
-        298: "Horn",
-        299: "Furniture",
-        300: "Sandwich",
-        301: "Fox",
-        302: "Convenience store",
-        303: "Fish",
-        304: "Fruit",
-        305: "Earrings",
-        306: "Curtain",
-        307: "Grape",
-        308: "Sofa bed",
-        309: "Horse",
-        310: "Luggage and bags",
-        311: "Desk",
-        312: "Crutch",
-        313: "Bicycle helmet",
-        314: "Tick",
-        315: "Airplane",
-        316: "Canary",
-        317: "Spatula",
-        318: "Watch",
-        319: "Lily",
-        320: "Kitchen appliance",
-        321: "Filing cabinet",
-        322: "Aircraft",
-        323: "Cake stand",
-        324: "Candy",
-        325: "Sink",
-        326: "Mouse",
-        327: "Wine",
-        328: "Wheelchair",
-        329: "Goldfish",
-        330: "Refrigerator",
-        331: "French fries",
-        332: "Drawer",
-        333: "Treadmill",
-        334: "Picnic basket",
-        335: "Dice",
-        336: "Cabbage",
-        337: "Football helmet",
-        338: "Pig",
-        339: "Person",
-        340: "Shorts",
-        341: "Gondola",
-        342: "Honeycomb",
-        343: "Doughnut",
-        344: "Chest of drawers",
-        345: "Land vehicle",
-        346: "Bat",
-        347: "Monkey",
-        348: "Dagger",
-        349: "Tableware",
-        350: "Human foot",
-        351: "Mug",
-        352: "Alarm clock",
-        353: "Pressure cooker",
-        354: "Human hand",
-        355: "Tortoise",
-        356: "Baseball glove",
-        357: "Sword",
-        358: "Pear",
-        359: "Miniskirt",
-        360: "Traffic sign",
-        361: "Girl",
-        362: "Roller skates",
-        363: "Dinosaur",
-        364: "Porch",
-        365: "Human beard",
-        366: "Submarine sandwich",
-        367: "Screwdriver",
-        368: "Strawberry",
-        369: "Wine glass",
-        370: "Seafood",
-        371: "Racket",
-        372: "Wheel",
-        373: "Sea lion",
-        374: "Toy",
-        375: "Tea",
-        376: "Tennis ball",
-        377: "Waste container",
-        378: "Mule",
-        379: "Cricket ball",
-        380: "Pineapple",
-        381: "Coconut",
-        382: "Doll",
-        383: "Coffee table",
-        384: "Snowman",
-        385: "Lavender",
-        386: "Shrimp",
-        387: "Maple",
-        388: "Cowboy hat",
-        389: "Goggles",
-        390: "Rugby ball",
-        391: "Caterpillar",
-        392: "Poster",
-        393: "Rocket",
-        394: "Organ",
-        395: "Saxophone",
-        396: "Traffic light",
-        397: "Cocktail",
-        398: "Plastic bag",
-        399: "Squash",
-        400: "Mushroom",
-        401: "Hamburger",
-        402: "Light switch",
-        403: "Parachute",
-        404: "Teddy bear",
-        405: "Winter melon",
-        406: "Deer",
-        407: "Musical keyboard",
-        408: "Plumbing fixture",
-        409: "Scoreboard",
-        410: "Baseball bat",
-        411: "Envelope",
-        412: "Adhesive tape",
-        413: "Briefcase",
-        414: "Paddle",
-        415: "Bow and arrow",
-        416: "Telephone",
-        417: "Sheep",
-        418: "Jacket",
-        419: "Boy",
-        420: "Pizza",
-        421: "Otter",
-        422: "Office supplies",
-        423: "Couch",
-        424: "Cello",
-        425: "Bull",
-        426: "Camel",
-        427: "Ball",
-        428: "Duck",
-        429: "Whale",
-        430: "Shirt",
-        431: "Tank",
-        432: "Motorcycle",
-        433: "Accordion",
-        434: "Owl",
-        435: "Porcupine",
-        436: "Sun hat",
-        437: "Nail",
-        438: "Scissors",
-        439: "Swan",
-        440: "Lamp",
-        441: "Crown",
-        442: "Piano",
-        443: "Sculpture",
-        444: "Cheetah",
-        445: "Oboe",
-        446: "Tin can",
-        447: "Mango",
-        448: "Tripod",
-        449: "Oven",
-        450: "Mouse",
-        451: "Barge",
-        452: "Coffee",
-        453: "Snowboard",
-        454: "Common fig",
-        455: "Salad",
-        456: "Marine invertebrates",
-        457: "Umbrella",
-        458: "Kangaroo",
-        459: "Human arm",
-        460: "Measuring cup",
-        461: "Snail",
-        462: "Loveseat",
-        463: "Suit",
-        464: "Teapot",
-        465: "Bottle",
-        466: "Alpaca",
-        467: "Kettle",
-        468: "Trousers",
-        469: "Popcorn",
-        470: "Centipede",
-        471: "Spider",
-        472: "Sparrow",
-        473: "Plate",
-        474: "Bagel",
-        475: "Personal care",
-        476: "Apple",
-        477: "Brassiere",
-        478: "Bathroom cabinet",
-        479: "studio couch",
-        480: "Computer keyboard",
-        481: "Table tennis racket",
-        482: "Sushi",
-        483: "Cabinetry",
-        484: "Street light",
-        485: "Towel",
-        486: "Nightstand",
-        487: "Rabbit",
-        488: "Dolphin",
-        489: "Dog",
-        490: "Jug",
-        491: "Wok",
-        492: "Fire hydrant",
-        493: "Human eye",
-        494: "Skyscraper",
-        495: "Backpack",
-        496: "Potato",
-        497: "Paper towel",
-        498: "Lifejacket",
-        499: "Bicycle wheel",
-        500: "Toilet",
-    }
-
-    if not with_background:
-        clsid2catid = {k - 1: v for k, v in clsid2catid.items()}
-    return clsid2catid, catid2name
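Note on the removed helper above: when `with_background` is false, it shifts every class id down by one so that the first real category becomes class 0. A minimal sketch of that remapping follows. The function name `shift_class_ids` and the sample ids are hypothetical and only illustrate the dictionary comprehension used in the removed code.

```python
def shift_class_ids(clsid2catid, with_background=True):
    """Return the class-id -> category-id map, shifted down by one
    when no background class occupies id 0 (hypothetical helper)."""
    if not with_background:
        # Same comprehension as in the removed code: key k becomes k - 1.
        clsid2catid = {k - 1: v for k, v in clsid2catid.items()}
    return clsid2catid


# Illustrative ids only; the real maps are much larger.
print(shift_class_ids({1: 10, 2: 20}, with_background=False))
```

With a background class the map is returned unchanged; without one, class ids start at 0 while the category ids stay the same.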
--- a/ppdet/utils/widerface_eval_utils.py
+++ b/ppdet/utils/widerface_eval_utils.py
-# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-import numpy as np
-
-from ppdet.data.source.widerface import widerface_label
-from ppdet.utils.coco_eval import bbox2out
-
-import logging
-logger = logging.getLogger(__name__)
-
-__all__ = [
-    'get_shrink', 'bbox_vote', 'save_widerface_bboxes', 'save_fddb_bboxes',
-    'to_chw_bgr', 'bbox2out', 'get_category_info'
-]
-
-
-def to_chw_bgr(image):
-    """
-    Transpose image from HWC to CHW and from RGB to BGR.
-    Args:
-        image (np.array): an image with HWC and RGB layout.
-    """
-    # HWC to CHW
-    if len(image.shape) == 3:
-        image = np.swapaxes(image, 1, 2)
-        image = np.swapaxes(image, 1, 0)
-    # RGB to BGR
-    image = image[[2, 1, 0], :, :]
-    return image
-
-
-def bbox_vote(det):
-    order = det[:, 4].ravel().argsort()[::-1]
-    det = det[order, :]
-    if det.shape[0] == 0:
-        dets = np.array([[10, 10, 20, 20, 0.002]])
-        det = np.empty(shape=[0, 5])
-    while det.shape[0] > 0:
-        # IOU
-        area = (det[:, 2] - det[:, 0] + 1) * (det[:, 3] - det[:, 1] + 1)
-        xx1 = np.maximum(det[0, 0], det[:, 0])
-        yy1 = np.maximum(det[0, 1], det[:, 1])
-        xx2 = np.minimum(det[0, 2], det[:, 2])
-        yy2 = np.minimum(det[0, 3], det[:, 3])
-        w = np.maximum(0.0, xx2 - xx1 + 1)
-        h = np.maximum(0.0, yy2 - yy1 + 1)
-        inter = w * h
-        o = inter / (area[0] + area[:] - inter)
-
-        # nms
-        merge_index = np.where(o >= 0.3)[0]
-        det_accu = det[merge_index, :]
-        det = np.delete(det, merge_index, 0)
-        if merge_index.shape[0] <= 1:
-            if det.shape[0] == 0:
-                try:
-                    dets = np.row_stack((dets, det_accu))
-                except:
-                    dets = det_accu
-            continue
-        det_accu[:, 0:4] = det_accu[:, 0:4] * np.tile(det_accu[:, -1:], (1, 4))
-        max_score = np.max(det_accu[:, 4])
-        det_accu_sum = np.zeros((1, 5))
-        det_accu_sum[:, 0:4] = np.sum(det_accu[:, 0:4],
-                                      axis=0) / np.sum(det_accu[:, -1:])
-        det_accu_sum[:, 4] = max_score
-        try:
-            dets = np.row_stack((dets, det_accu_sum))
-        except:
-            dets = det_accu_sum
-    dets = dets[0:750, :]
-    # Only keep detections with score >= 0.01
-    keep_index = np.where(dets[:, 4] >= 0.01)[0]
-    dets = dets[keep_index, :]
-    return dets
-
-
-def get_shrink(height, width):
-    """
-    Args:
-        height (int): image height.
-        width (int): image width.
-    """
-    # avoid out of memory
-    max_shrink_v1 = (0x7fffffff / 577.0 / (height * width))**0.5
-    max_shrink_v2 = ((678 * 1024 * 2.0 * 2.0) / (height * width))**0.5
-
-    def get_round(x, loc):
-        str_x = str(x)
-        if '.' in str_x:
-            str_before, str_after = str_x.split('.')
-            len_after = len(str_after)
-            if len_after >= 3:
-                str_final = str_before + '.' + str_after[0:loc]
-                return float(str_final)
-            else:
-                return x
-
-    max_shrink = get_round(min(max_shrink_v1, max_shrink_v2), 2) - 0.3
-    if max_shrink >= 1.5 and max_shrink < 2:
-        max_shrink = max_shrink - 0.1
-    elif max_shrink >= 2 and max_shrink < 3:
-        max_shrink = max_shrink - 0.2
-    elif max_shrink >= 3 and max_shrink < 4:
-        max_shrink = max_shrink - 0.3
-    elif max_shrink >= 4 and max_shrink < 5:
-        max_shrink = max_shrink - 0.4
-    elif max_shrink >= 5:
-        max_shrink = max_shrink - 0.5
-    elif max_shrink <= 0.1:
-        max_shrink = 0.1
-
-    shrink = max_shrink if max_shrink < 1 else 1
-    return shrink, max_shrink
-
-
-def save_widerface_bboxes(image_path, bboxes_scores, output_dir):
-    image_name = image_path.split('/')[-1]
-    image_class = image_path.split('/')[-2]
-    odir = os.path.join(output_dir, image_class)
-    if not os.path.exists(odir):
-        os.makedirs(odir)
-
-    ofname = os.path.join(odir, '%s.txt' % (image_name[:-4]))
-    f = open(ofname, 'w')
-    f.write('{:s}\n'.format(image_class + '/' + image_name))
-    f.write('{:d}\n'.format(bboxes_scores.shape[0]))
-    for box_score in bboxes_scores:
-        xmin, ymin, xmax, ymax, score = box_score
-        f.write('{:.1f} {:.1f} {:.1f} {:.1f} {:.3f}\n'.format(xmin, ymin, (
-            xmax - xmin + 1), (ymax - ymin + 1), score))
-    f.close()
-    logger.info("The predicted result is saved as {}".format(ofname))
-
-
-def save_fddb_bboxes(bboxes_scores,
-                     output_dir,
-                     output_fname='pred_fddb_res.txt'):
-    if not os.path.exists(output_dir):
-        os.makedirs(output_dir)
-    predict_file = os.path.join(output_dir, output_fname)
-    # Use a context manager so the file is closed, and items() so the
-    # loop also runs on Python 3 (iteritems() exists only on Python 2).
-    with open(predict_file, 'w') as f:
-        for image_path, dets in bboxes_scores.items():
-            f.write('{:s}\n'.format(image_path))
-            f.write('{:d}\n'.format(dets.shape[0]))
-            for box_score in dets:
-                xmin, ymin, xmax, ymax, score = box_score
-                width, height = xmax - xmin, ymax - ymin
-                f.write('{:.1f} {:.1f} {:.1f} {:.1f} {:.3f}\n'
-                        .format(xmin, ymin, width, height, score))
-    logger.info("The predicted result is saved as {}".format(predict_file))
-    return predict_file
-
-
-def get_category_info(anno_file=None,
-                      with_background=True,
-                      use_default_label=False):
-    if use_default_label or anno_file is None \
-            or not os.path.exists(anno_file):
-        logger.info("Not found annotation file {}, load "
-                    "wider-face categories.".format(anno_file))
-        return widerfaceall_category_info(with_background)
-    else:
-        logger.info("Load categories from {}".format(anno_file))
-        return get_category_info_from_anno(anno_file, with_background)
-
-
-def get_category_info_from_anno(anno_file, with_background=True):
-    """
-    Get class id to category id map and category id
-    to category name map from annotation file.
-    Args:
-        anno_file (str): annotation file path
-        with_background (bool, default True):
-            whether load background as class 0.
-    """
-    cats = []
-    with open(anno_file) as f:
-        for line in f.readlines():
-            cats.append(line.strip())
-
-    if cats[0] != 'background' and with_background:
-        cats.insert(0, 'background')
-    if cats[0] == 'background' and not with_background:
-        cats = cats[1:]
-
-    clsid2catid = {i: i for i in range(len(cats))}
-    catid2name = {i: name for i, name in enumerate(cats)}
-
-    return clsid2catid, catid2name
-
-
-def widerfaceall_category_info(with_background=True):
-    """
-    Get class id to category id map and category id
-    to category name map of mixup wider_face dataset
-
-    Args:
-        with_background (bool, default True):
-            whether load background as class 0.
-    """
-    label_map = widerface_label(with_background)
-    label_map = sorted(label_map.items(), key=lambda x: x[1])
-    cats = [l[0] for l in label_map]
-
-    if with_background:
-        cats.insert(0, 'background')
-
-    clsid2catid = {i: i for i in range(len(cats))}
-    catid2name = {i: name for i, name in enumerate(cats)}
-
-    return clsid2catid, catid2name
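Note on the removed `bbox_vote`: for each group of boxes whose IoU with the top-scoring box is at least 0.3, it replaces the group with one box whose coordinates are the score-weighted average of the group and whose score is the group maximum. The sketch below shows only that merge step in plain Python, assuming the group has already been selected. The name `merge_cluster` and the sample boxes are hypothetical, and the IoU grouping, the 750-box cap, and the final score filter from the removed code are left out.

```python
def merge_cluster(boxes):
    """Merge overlapping detections into a single box.

    boxes: list of (xmin, ymin, xmax, ymax, score) tuples that were
    already grouped by IoU (the grouping itself is not shown here).
    """
    total_score = sum(b[4] for b in boxes)
    # Each coordinate is averaged with the detection scores as weights.
    merged = [
        sum(b[i] * b[4] for b in boxes) / total_score for i in range(4)
    ]
    # The merged box keeps the highest score in the group.
    merged.append(max(b[4] for b in boxes))
    return merged


# Two overlapping detections; the higher-scoring box dominates the result.
print(merge_cluster([(0, 0, 10, 10, 0.75), (4, 4, 14, 14, 0.25)]))
```

Weighting by score pulls the merged box toward the more confident detections, which is the difference between this voting scheme and plain NMS, where lower-scoring overlaps are simply discarded.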
--- a/tools/train.py
+++ b/tools/train.py
@@ -29,6 +29,7 @@ import random
 import datetime
 import numpy as np
 from collections import deque
+
 import paddle
 from ppdet.core.workspace import load_config, merge_config, create
 from ppdet.utils.stats import TrainingStats
@@ -37,6 +38,7 @@ from ppdet.utils.cli import ArgsParser
 from ppdet.utils.checkpoint import load_weight, load_pretrain_weight, save_model
 from export_model import dygraph_to_static
 from paddle.distributed import ParallelEnv
+
 import logging
 FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
 logging.basicConfig(level=logging.INFO, format=FORMAT)
@@ -71,16 +73,6 @@ def parse_args():
        default=None,
        type=str,
        help="Evaluation directory, default is current directory.")
-    parser.add_argument(
-        "--use_tb",
-        type=bool,
-        default=False,
-        help="whether to record the data to Tensorboard.")
-    parser.add_argument(
-        '--tb_log_dir',
-        type=str,
-        default="tb_log_dir/scalar",
-        help='Tensorboard logging directory for scalar.')
    parser.add_argument(
        "--enable_ce",
        type=bool,
@@ -89,13 +81,6 @@ def parse_args():
        "This flag is only used for internal test.")
    parser.add_argument(
        "--use_gpu", action='store_true', default=False, help="data parallel")
-
-    parser.add_argument(
-        '--is_profiler',
-        type=int,
-        default=0,
-        help='The switch of profiler tools. (used for benchmark)')
-
    args = parser.parse_args()
    return args