Unverified commit 9b67ef53, authored by Kaipeng Deng, committed via GitHub

Add PointRCNN model (#3967)

* add PointRCNN model
   by heavengate, FDInSky, tink2123

Parent commit: 3584aeec
*log*
checkpoints*
build
output
result_dir
pp_pointrcnn*
data/gt_database
utils/pts_utils/dist
utils/pts_utils/build
utils/pts_utils/pts_utils.egg-info
utils/cyops/*.c
utils/cyops/*.so
ext_op/src/*.o
ext_op/src/*.so
# PointRCNN: 3D Object Detection Model
---
## Contents
- [Introduction](#introduction)
- [Quick Start](#quick-start)
- [References](#references)
- [Release Notes](#release-notes)
## Introduction
[PointRCNN](https://arxiv.org/abs/1812.04244), proposed by Shaoshuai Shi, Xiaogang Wang, and Hongsheng Li, is the first two-stage 3D object detector that works directly on raw point clouds. The first stage uses PointNet++ with MSG (Multi-scale Grouping) as the backbone to segment the raw point cloud into foreground and background points, and generates bounding box proposals from the foreground points. The second stage filters and refines the generated bounding boxes in a canonical coordinate system. The model also proposes a bin-based scheme that turns box regression into a classification problem, and shows that it is effective for 3D bounding-box regression. PointRCNN is evaluated on the KITTI dataset and, at the time of publication, achieved the best performance on the KITTI 3D object detection leaderboard.
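For intuition, bin-based localization classifies a coordinate offset into one of several discrete bins over a fixed scope, then refines it with a small regressed residual inside the chosen bin. The snippet below is a minimal illustrative sketch only, not this repo's exact implementation; `loc_scope` and `bin_size` mirror the `LOC_SCOPE` and `LOC_BIN_SIZE` entries in `cfgs/default.yml`.
```
import numpy as np

def encode_bin_target(delta, loc_scope=3.0, bin_size=0.5):
    """Encode a 1-D offset (gt - anchor) as a bin id plus a normalized residual.

    Minimal sketch of bin-based localization: the scope [-loc_scope, loc_scope)
    is split into bins; the network classifies the bin and regresses the
    residual inside it (values mirror RPN.LOC_SCOPE/LOC_BIN_SIZE in the config).
    """
    delta = np.clip(delta, -loc_scope, loc_scope - 1e-3)
    bin_id = int(np.floor((delta + loc_scope) / bin_size))
    # residual relative to the bin center, normalized to [-0.5, 0.5)
    residual = (delta + loc_scope - (bin_id + 0.5) * bin_size) / bin_size
    return bin_id, residual

print(encode_bin_target(1.3))  # -> (8, ~0.1): bin 8 of 12 plus a small residual
```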
The network architecture is shown below:
<p align="center">
<img src="images/teaser.png" height=300 width=800 hspace='10'/> <br />
PointRCNN architecture for 3D object detection on point clouds
</p>
**Note:** The PointRCNN model depends on custom C++ operators, which currently can only be compiled for GPU devices on Linux/Unix systems. This model **cannot run on Windows or on CPU-only devices**.
## Quick Start
### Installation
**Install [PaddlePaddle](https://github.com/PaddlePaddle/Paddle):**
Running the sample code in this directory requires the PaddlePaddle Fluid [develop daily build](https://www.paddlepaddle.org.cn/install/doc/tables#多版本whl包列表-dev-11), or PaddlePaddle compiled from source on the [develop branch](https://github.com/PaddlePaddle/Paddle/tree/develop).
To keep the custom operators compatible with your Paddle version, we recommend **compiling Paddle from source**; see [Compile from Source](https://www.paddlepaddle.org.cn/install/doc/source/ubuntu) for instructions.
**Install PointRCNN:**
1. Download the [PaddlePaddle/models](https://github.com/PaddlePaddle/models) repository
Download the Paddle models repository with:
```
git clone https://github.com/PaddlePaddle/models
```
2. Download [pybind11](https://github.com/pybind/pybind11) into the `PaddleCV/Paddle3D/PointRCNN` directory
`pts_utils` depends on `pybind11` for compilation, so the `pybind11` repository must be cloned into `PaddleCV/Paddle3D/PointRCNN` first:
```
cd PaddleCV/Paddle3D/PointRCNN
git clone https://github.com/pybind/pybind11
```
3. Build and install the `pts_utils`, `kitti_utils`, `roipool3d_utils`, and `iou_utils` modules
Build and install these modules with:
```
sh build_and_install.sh
```
4. Install the Python dependencies
Install the Python dependencies with:
```
pip install -r requirement.txt
```
**Note:** The KITTI mAP evaluation tool requires Python 3.6 or later, and the Python 3 environment must have `scikit-image`, `Numba`, and `fire` installed.
The `scikit-image`, `Numba`, and `fire` entries in `requirement.txt` are exactly the dependencies required by the KITTI mAP evaluation tool.
### Compiling the custom operators
Make sure your Paddle installation is the PaddlePaddle Fluid develop daily build, or was compiled from source on the Paddle develop branch; **compiling from source is recommended**.
Compile the custom operators as follows:
Enter the `ext_op/src` directory and run the build script:
```
cd ext_op/src
sh make.sh
```
On success, `pointnet2_lib.so` is generated under the `ext_op/src` directory.
Run the following to verify that the custom operators compiled correctly:
```
# add the Paddle dynamic library path to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`python -c 'import paddle; print(paddle.sysconfig.get_lib())'`
# go back to the ext_op directory and extend PYTHONPATH
cd ..
export PYTHONPATH=$PYTHONPATH:`pwd`
# run the unit tests
python tests/test_farthest_point_sampling_op.py
python tests/test_gather_point_op.py
python tests/test_group_points_op.py
python tests/test_query_ball_op.py
python tests/test_three_interp_op.py
python tests/test_three_nn_op.py
```
A successful test run prints a message like the following:
```
.
----------------------------------------------------------------------
Ran 1 test in 13.205s
OK
```
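If the unit tests pass, you can additionally verify that the shared library loads into Paddle. This is a minimal sketch assuming the Paddle 1.x Fluid API (`fluid.load_op_library`) used with this repo's develop-branch dependency, run from the `ext_op` directory:
```
import paddle.fluid as fluid

# load the custom operator library built by ext_op/src/make.sh
# (path assumes the current working directory is ext_op/)
fluid.load_op_library('src/pointnet2_lib.so')
print('pointnet2_lib.so loaded successfully')
```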
**Note:** The custom operators here are compiled in the same way as in [PointNet++](../PointNet++); for more details on compiling custom operators, see [Custom Operator Compilation](../PointNet++/ext_op/README.md).
### Data preparation
**KITTI 3D object detection dataset:**
PointRCNN is trained on the [KITTI 3D object detection](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d) dataset.
The dataset can be downloaded as follows:
```
cd data/KITTI/object
sh download.sh
```
The images here are used for visualization only. Training additionally uses the [road planes](https://drive.google.com/file/d/1d5mq0RXRnvHPVeKx6Q612z0YRO1t2wAp/view?usp=sharing) data for data augmentation;
please download it and extract it into the `./data/KITTI/object/training` directory.
The data directory layout is as follows:
```
PointRCNN
├── data
│ ├── KITTI
│ │ ├── ImageSets
│ │ ├── object
│ │ │ ├──training
│ │ │ │ ├──calib & velodyne & label_2 & image_2 & planes
│ │ │ ├──testing
│ │ │ │ ├──calib & velodyne & image_2
```
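The following sketch (paths taken from the tree above) can be used to sanity-check the layout before training:
```
import os

# minimal sanity check for the KITTI layout expected by this repo
data_root = 'data/KITTI/object'
splits = [('training', ['calib', 'velodyne', 'label_2', 'image_2', 'planes']),
          ('testing',  ['calib', 'velodyne', 'image_2'])]
for split, subdirs in splits:
    for d in subdirs:
        path = os.path.join(data_root, split, d)
        print('%-45s %s' % (path, 'OK' if os.path.isdir(path) else 'MISSING'))
```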
### Training
**PointRCNN model:**
Training of the PointRCNN model can be started as follows:
1. Select a single GPU and set the dynamic library path
```
# train on a single GPU
export CUDA_VISIBLE_DEVICES=0
# add the Paddle dynamic library path to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`python -c 'import paddle; print(paddle.sysconfig.get_lib())'`
```
2. Generate the ground-truth sampling database (a quick way to inspect the result is sketched after the command):
```
python tools/generate_gt_database.py --class_name 'Car' --split train
```
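The generated database is a pickle file under `data/gt_database` holding per-object point crops that are later pasted into training scenes (see `KittiRCNNReader.apply_gt_aug_to_one_scene`). A minimal inspection sketch; the file name below is an assumption, use the path reported by `generate_gt_database.py`:
```
import pickle

# the file name is a placeholder; use the one written by generate_gt_database.py
with open('data/gt_database/train_gt_database.pkl', 'rb') as f:
    gt_database = pickle.load(f)
print('number of sampled objects:', len(gt_database))
# each entry holds the cropped object points, their intensity, the 3D box and the label object
print(sorted(gt_database[0].keys()))  # expected: ['gt_box3d', 'intensity', 'obj', 'points']
```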
3. Train the RPN model
```
python train.py --cfg=./cfgs/default.yml \
--train_mode=rpn \
--batch_size=16 \
--epoch=200 \
--save_dir=checkpoints
```
RPN checkpoints are saved under the `checkpoints/rpn` directory by default; the location can be changed with `--save_dir`.
4. Generate augmented offline scene data, and save the RPN model's output features and ROIs for offline RCNN training
Generate the augmented offline scene data with:
```
python tools/generate_aug_scene.py --class_name 'Car' --split train --aug_times 4
```
Then save the RPN model's output features and ROIs for the offline augmented data. Use `--ckpt_dir` to point to the final RPN weights, which are saved under `checkpoints/rpn` by default.
When saving the output features and ROIs, set `TEST.SPLIT` to `train_aug`, `TEST.RPN_POST_NMS_TOP_N` to `300`, and `TEST.RPN_NMS_THRESH` to `0.85`.
Use `--output_dir` to choose where the output features and ROIs are saved; the default is the `./output` directory.
```
python eval.py --cfg=cfgs/default.yml \
--eval_mode=rpn \
--ckpt_dir=./checkpoints/rpn/199 \
--save_rpn_feature \
--output_dir=output \
--set TEST.SPLIT train_aug TEST.RPN_POST_NMS_TOP_N 300 TEST.RPN_NMS_THRESH 0.85
```
The data saved under `--output_dir` is laid out as follows (a sketch after the tree shows how to load these files):
```
output
├── detections
│ │ ├── data # ROI data
│ │ ├── 000000.txt
│ │ ├── 000003.txt
│ │ ├── ...
├── features # output features
│ ├── 000000_intensity.npy
│ ├── 000000.npy
│ ├── 000000_rawscore.npy
│ ├── 000000_seg.npy
│ ├── 000000_xyz.npy
│ ├── ...
├── seg_result # semantic segmentation results
│ ├── 000000.npy
│ ├── 000003.npy
│ ├── ...
```
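These `.npy` files are exactly what `KittiRCNNReader.get_rpn_features` reads back during offline RCNN training, and they can be inspected directly; the shapes in the comments are indicative only:
```
import numpy as np

# inspect the RPN outputs saved for one sample (shapes indicative only)
xyz       = np.load('output/features/000000_xyz.npy')        # point coordinates, (N, 3)
feature   = np.load('output/features/000000.npy')            # backbone features, (N, C)
intensity = np.load('output/features/000000_intensity.npy')  # reflectance, (N,)
seg       = np.load('output/features/000000_seg.npy')        # foreground scores, (N,)
print(xyz.shape, feature.shape, intensity.shape, seg.shape)
```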
5. Train the RCNN model offline, using `--rcnn_training_roi_dir` and `--rcnn_training_feature_dir` to point to the output features and ROIs saved by the RPN model.
```
python train.py --cfg=./cfgs/default.yml \
--train_mode=rcnn_offline \
--batch_size=4 \
--epoch=30 \
--save_dir=checkpoints \
--rcnn_training_roi_dir=output/detections/data \
--rcnn_training_feature_dir=output/features
```
RCNN checkpoints are saved under the `checkpoints/rcnn` directory by default; the location can be changed with `--save_dir`.
**Note**: The best model is obtained by saving the RPN output features and ROIs and training the RCNN model on offline-augmented data; currently this is the only mode supported by default.
### Evaluation
**PointRCNN model:**
Evaluation of the PointRCNN model can be started as follows:
1. Select a single GPU and set the dynamic library path
```
# evaluate on a single GPU
export CUDA_VISIBLE_DEVICES=0
# add the Paddle dynamic library path to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`python -c 'import paddle; print(paddle.sysconfig.get_lib())'`
```
2. Save the RPN model's output features and ROIs for the evaluation data
Save the RPN model's output features and ROIs for the evaluation data with the command below. Use `--ckpt_dir` to point to the final RPN weights, which are saved under `checkpoints/rpn` by default.
Use `--output_dir` to choose where the output features and ROIs are saved; the default is the `./output` directory.
```
python eval.py --cfg=cfgs/default.yml \
--eval_mode=rpn \
--ckpt_dir=./checkpoints/rpn/199 \
--save_rpn_feature \
--output_dir=output/val
```
The directory layout of the features and ROIs saved for the evaluation data is the same as the layout described above for the offline augmented data.
3. Evaluate the offline RCNN model
Evaluate the offline RCNN model with:
```
python eval.py --cfg=cfgs/default.yml \
--eval_mode=rcnn_offline \
--ckpt_dir=./checkpoints/rcnn_offline/29 \
--rcnn_eval_roi_dir=output/val/detections/data \
--rcnn_eval_feature_dir=output/val/features \
--save_result
```
The final detection results are saved in the `final_result` folder under the `./result_dir` directory; passing `--save_result` additionally saves the `roi_output` and `refine_output` result files.
The `result_dir` layout is as follows (a parsing sketch follows the tree):
```
result_dir
├── final_result
│ │ ├── data # final detection results
│ │ ├── 000001.txt
│ │ ├── 000002.txt
│ │ ├── ...
├── roi_output
│ ├── data # detection ROIs output by the RCNN model
│ │ ├── 000001.txt
│ │ ├── 000002.txt
│ │ ├── ...
├── refine_output
│ ├── data # decoded detection results
│ │ ├── 000001.txt
│ │ ├── 000002.txt
│ │ ├── ...
```
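The `.txt` files use the standard KITTI detection format (object type, truncation, occlusion, alpha, 2D box, 3D height/width/length, 3D location, rotation_y, score), so they can be parsed with a few lines; a minimal sketch:
```
def parse_kitti_det(line):
    """Parse one line of a KITTI-format detection result file."""
    v = line.strip().split(' ')
    return {
        'type': v[0],
        'alpha': float(v[3]),
        'bbox_2d': [float(x) for x in v[4:8]],   # x1, y1, x2, y2
        'hwl': [float(x) for x in v[8:11]],      # height, width, length
        'xyz': [float(x) for x in v[11:14]],     # location in rect camera coords
        'ry': float(v[14]),                      # rotation around the y axis
        'score': float(v[15]),
    }

with open('result_dir/final_result/data/000001.txt') as f:
    dets = [parse_kitti_det(l) for l in f if l.strip()]
```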
4. Obtain evaluation results with the KITTI mAP tool
If evaluation runs under Python 3.6 or later, the KITTI mAP evaluation is executed automatically. If your Python version is below 3.6,
run the evaluation separately with a suitable Python version, since the KITTI mAP tool requires Python 3.6+:
```
python3 kitti_map.py
```
Evaluation results with the final trained weights ([RPN model](https://paddlemodels.bj.bcebos.com/Paddle3D/pointrcnn_rpn.tar) and [RCNN model](https://paddlemodels.bj.bcebos.com/Paddle3D/pointrcnn_rcnn_offline.tar)) are as follows:
| Car AP@ | 0.70(easy) | 0.70(moderate) | 0.70(hard) |
| :------- | :--------: | :------------: | :--------: |
| bbox AP: | 90.20 | 88.85 | 88.59 |
| bev AP: | 89.50 | 86.97 | 85.58 |
| 3d AP: | 86.66 | 76.65 | 75.90 |
| aos AP: | 90.10 | 88.64 | 88.26 |
## References
- [PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud](https://arxiv.org/abs/1812.04244), Shaoshuai Shi, Xiaogang Wang, Hongsheng Li.
- [PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space](https://arxiv.org/abs/1706.02413), Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas.
- [PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation](https://www.semanticscholar.org/paper/PointNet%3A-Deep-Learning-on-Point-Sets-for-3D-and-Qi-Su/d997beefc0922d97202789d2ac307c55c2c52fba), Charles Ruizhongtai Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas.
## Release Notes
- 11/2019: Added the PointRCNN model.
# compile cyops
python utils/cyops/setup.py develop
# compile and install pts_utils
cd utils/pts_utils
python setup.py install
cd ../..
# This config is based on https://github.com/sshaoshuai/PointRCNN/blob/master/tools/cfgs/default.yaml
CLASSES: Car
INCLUDE_SIMILAR_TYPE: True
# config of augmentation
AUG_DATA: True
AUG_METHOD_LIST: ['rotation', 'scaling', 'flip']
AUG_METHOD_PROB: [1.0, 1.0, 0.5]
AUG_ROT_RANGE: 18
GT_AUG_ENABLED: True
GT_EXTRA_NUM: 15
GT_AUG_RAND_NUM: True
GT_AUG_APPLY_PROB: 1.0
GT_AUG_HARD_RATIO: 0.6
PC_REDUCE_BY_RANGE: True
PC_AREA_SCOPE: [[-40, 40], [-1, 3], [0, 70.4]] # x, y, z scope in rect camera coords
CLS_MEAN_SIZE: [[1.52563191462, 1.62856739989, 3.88311640418]]
# 1. config of rpn network
RPN:
ENABLED: True
FIXED: False
# config of input
USE_INTENSITY: False
# config of bin-based loss
LOC_XZ_FINE: True
LOC_SCOPE: 3.0
LOC_BIN_SIZE: 0.5
NUM_HEAD_BIN: 12
# config of network structure
BACKBONE: pointnet2_msg
USE_BN: True
NUM_POINTS: 16384
SA_CONFIG:
NPOINTS: [4096, 1024, 256, 64]
RADIUS: [[0.1, 0.5], [0.5, 1.0], [1.0, 2.0], [2.0, 4.0]]
NSAMPLE: [[16, 32], [16, 32], [16, 32], [16, 32]]
MLPS: [[[16, 16, 32], [32, 32, 64]],
[[64, 64, 128], [64, 96, 128]],
[[128, 196, 256], [128, 196, 256]],
[[256, 256, 512], [256, 384, 512]]]
FP_MLPS: [[128, 128], [256, 256], [512, 512], [512, 512]]
CLS_FC: [128]
REG_FC: [128]
DP_RATIO: 0.5
# config of training
LOSS_CLS: SigmoidFocalLoss
FG_WEIGHT: 15
FOCAL_ALPHA: [0.25, 0.75]
FOCAL_GAMMA: 2.0
REG_LOSS_WEIGHT: [1.0, 1.0, 1.0, 1.0]
LOSS_WEIGHT: [1.0, 1.0]
NMS_TYPE: normal
# config of testing
SCORE_THRESH: 0.3
# 2. config of rcnn network
RCNN:
ENABLED: True
# config of input
ROI_SAMPLE_JIT: False
REG_AUG_METHOD: multiple # multiple, single, normal
ROI_FG_AUG_TIMES: 10
USE_RPN_FEATURES: True
USE_MASK: True
MASK_TYPE: seg
USE_INTENSITY: False
USE_DEPTH: True
USE_SEG_SCORE: False
POOL_EXTRA_WIDTH: 1.0
# config of bin-based loss
LOC_SCOPE: 1.5
LOC_BIN_SIZE: 0.5
NUM_HEAD_BIN: 9
LOC_Y_BY_BIN: False
LOC_Y_SCOPE: 0.5
LOC_Y_BIN_SIZE: 0.25
SIZE_RES_ON_ROI: False
# config of network structure
USE_BN: False
DP_RATIO: 0.0
BACKBONE: pointnet # pointnet
XYZ_UP_LAYER: [128, 128]
NUM_POINTS: 512
SA_CONFIG:
NPOINTS: [128, 32, -1]
RADIUS: [0.2, 0.4, 100]
NSAMPLE: [64, 64, 64]
MLPS: [[128, 128, 128],
[128, 128, 256],
[256, 256, 512]]
CLS_FC: [256, 256]
REG_FC: [256, 256]
# config of training
LOSS_CLS: BinaryCrossEntropy
FOCAL_ALPHA: [0.25, 0.75]
FOCAL_GAMMA: 2.0
CLS_WEIGHT: [1.0, 1.0, 1.0]
CLS_FG_THRESH: 0.6
CLS_BG_THRESH: 0.45
CLS_BG_THRESH_LO: 0.05
REG_FG_THRESH: 0.55
FG_RATIO: 0.5
ROI_PER_IMAGE: 64
HARD_BG_RATIO: 0.8
# config of testing
SCORE_THRESH: 0.3
NMS_THRESH: 0.1
# general training config
TRAIN:
SPLIT: train
VAL_SPLIT: smallval
LR: 0.002
LR_CLIP: 0.00001
LR_DECAY: 0.5
DECAY_STEP_LIST: [100, 150, 180, 200]
LR_WARMUP: True
WARMUP_MIN: 0.0002
WARMUP_EPOCH: 1
BN_MOMENTUM: 0.1
BN_DECAY: 0.5
BNM_CLIP: 0.01
BN_DECAY_STEP_LIST: [1000]
OPTIMIZER: adam # adam, adam_onecycle
WEIGHT_DECAY: 0.001 # L2 regularization
MOMENTUM: 0.9
MOMS: [0.95, 0.85]
DIV_FACTOR: 10.0
PCT_START: 0.4
GRAD_NORM_CLIP: 1.0
RPN_PRE_NMS_TOP_N: 9000
RPN_POST_NMS_TOP_N: 512
RPN_NMS_THRESH: 0.85
RPN_DISTANCE_BASED_PROPOSE: True
TEST:
SPLIT: val
RPN_PRE_NMS_TOP_N: 9000
RPN_POST_NMS_TOP_N: 100
RPN_NMS_THRESH: 0.8
RPN_DISTANCE_BASED_PROPOSE: True
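This config is consumed through `utils/config.py`, and individual keys can be overridden from the command line via `--set` (see `eval.py` below). A minimal sketch of how the config is loaded, based on the `load_config`/`set_config_from_list` calls used by the training and evaluation scripts:
```
from utils.config import cfg, load_config, set_config_from_list

# load the YAML config, then override keys the same way `--set` does
load_config('cfgs/default.yml')
set_config_from_list(['TEST.SPLIT', 'train_aug', 'TEST.RPN_NMS_THRESH', '0.85'])
print(cfg.RPN.NUM_POINTS, cfg.TEST.RPN_NMS_THRESH)
```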
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
echo "Downloading https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_velodyne.zip"
wget https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_velodyne.zip
echo "https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_image_2.zip"
wget https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_image_2.zip
echo "https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_calib.zip"
wget https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_calib.zip
echo "https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_label_2.zip"
wget https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_label_2.zip
echo "Decompressing data_object_velodyne.zip"
unzip data_object_velodyne.zip
echo "Decompressing data_object_image_2.zip"
unzip "data_object_image_2.zip"
echo "Decompressing data_object_calib.zip"
unzip data_object_calib.zip
echo "Decompressing data_object_label_2.zip"
unzip data_object_label_2.zip
echo "Download KITTI ImageSets"
wget https://paddlemodels.bj.bcebos.com/Paddle3D/pointrcnn_kitti_imagesets.tar
tar xf pointrcnn_kitti_imagesets.tar
mv ImageSets ..
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This code is based on https://github.com/sshaoshuai/PointRCNN/blob/master/lib/datasets/kitti_dataset.py
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import cv2
import numpy as np
import utils.calibration as calibration
from utils.object3d import get_objects_from_label
from PIL import Image
__all__ = ["KittiDataset"]
class KittiDataset(object):
def __init__(self, data_dir, split='train'):
assert split in ['train', 'train_aug', 'val', 'test'], "unknown split {}".format(split)
self.split = split
self.is_test = self.split == 'test'
self.imageset_dir = os.path.join(data_dir, 'KITTI', 'object', 'testing' if self.is_test else 'training')
split_dir = os.path.join(data_dir, 'KITTI', 'ImageSets', split + '.txt')
self.image_idx_list = [x.strip() for x in open(split_dir).readlines()]
self.num_sample = self.image_idx_list.__len__()
self.image_dir = os.path.join(self.imageset_dir, 'image_2')
self.lidar_dir = os.path.join(self.imageset_dir, 'velodyne')
self.calib_dir = os.path.join(self.imageset_dir, 'calib')
self.label_dir = os.path.join(self.imageset_dir, 'label_2')
self.plane_dir = os.path.join(self.imageset_dir, 'planes')
def get_image(self, idx):
img_file = os.path.join(self.image_dir, '%06d.png' % idx)
assert os.path.exists(img_file)
return cv2.imread(img_file) # (H, W, 3) BGR mode
def get_image_shape(self, idx):
img_file = os.path.join(self.image_dir, '%06d.png' % idx)
assert os.path.exists(img_file)
im = Image.open(img_file)
width, height = im.size
return height, width, 3
def get_lidar(self, idx):
lidar_file = os.path.join(self.lidar_dir, '%06d.bin' % idx)
assert os.path.exists(lidar_file)
return np.fromfile(lidar_file, dtype=np.float32).reshape(-1, 4)
def get_calib(self, idx):
calib_file = os.path.join(self.calib_dir, '%06d.txt' % idx)
assert os.path.exists(calib_file)
return calibration.Calibration(calib_file)
def get_label(self, idx):
label_file = os.path.join(self.label_dir, '%06d.txt' % idx)
assert os.path.exists(label_file)
# return kitti_utils.get_objects_from_label(label_file)
return get_objects_from_label(label_file)
def get_road_plane(self, idx):
plane_file = os.path.join(self.plane_dir, '%06d.txt' % idx)
with open(plane_file, 'r') as f:
lines = f.readlines()
lines = [float(i) for i in lines[3].split()]
plane = np.asarray(lines)
# Ensure normal is always facing up, this is in the rectified camera coordinate
if plane[1] > 0:
plane = -plane
norm = np.linalg.norm(plane[0:3])
plane = plane / norm
return plane
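For reference, a minimal usage sketch of `KittiDataset` (assumes the data layout from the README above; `calib.lidar_to_rect` is the same call the reader uses):
```
# minimal usage sketch, assuming the KITTI layout described in the README
dataset = KittiDataset(data_dir='./data', split='train')
print('samples:', dataset.num_sample)
idx = int(dataset.image_idx_list[0])
pts = dataset.get_lidar(idx)                  # (N, 4): x, y, z, intensity
calib = dataset.get_calib(idx)
pts_rect = calib.lidar_to_rect(pts[:, 0:3])   # to rectified camera coordinates
```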
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This code is based on https://github.com/sshaoshuai/PointRCNN/blob/master/lib/datasets/kitti_rcnn_dataset.py
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import logging
import multiprocessing
import numpy as np
import scipy
from scipy.spatial import Delaunay
try:
import cPickle as pickle
except:
import pickle
import pts_utils
import utils.cyops.kitti_utils as kitti_utils
import utils.cyops.roipool3d_utils as roipool3d_utils
from data.kitti_dataset import KittiDataset
from utils.config import cfg
from collections import OrderedDict
__all__ = ["KittiRCNNReader"]
logger = logging.getLogger(__name__)
def has_empty(data):
for d in data:
if isinstance(d, np.ndarray) and len(d) == 0:
return True
return False
def in_hull(p, hull):
"""
:param p: (N, K) test points
:param hull: (M, K) M corners of a box
:return (N) bool
"""
try:
if not isinstance(hull, Delaunay):
hull = Delaunay(hull)
flag = hull.find_simplex(p) >= 0
except scipy.spatial.qhull.QhullError:
logger.debug('Warning: not a hull.')
        flag = np.zeros(p.shape[0], dtype=np.bool_)
return flag
class KittiRCNNReader(KittiDataset):
def __init__(self, data_dir, npoints=16384, split='train', classes='Car', mode='TRAIN',
random_select=True, rcnn_training_roi_dir=None, rcnn_training_feature_dir=None,
rcnn_eval_roi_dir=None, rcnn_eval_feature_dir=None, gt_database_dir=None):
super(KittiRCNNReader, self).__init__(data_dir=data_dir, split=split)
if classes == 'Car':
self.classes = ('Background', 'Car')
aug_scene_data_dir = os.path.join(data_dir, 'KITTI', 'aug_scene')
elif classes == 'People':
self.classes = ('Background', 'Pedestrian', 'Cyclist')
elif classes == 'Pedestrian':
self.classes = ('Background', 'Pedestrian')
aug_scene_data_dir = os.path.join(data_dir, 'KITTI', 'aug_scene_ped')
elif classes == 'Cyclist':
self.classes = ('Background', 'Cyclist')
aug_scene_data_dir = os.path.join(data_dir, 'KITTI', 'aug_scene_cyclist')
else:
assert False, "Invalid classes: %s" % classes
self.num_classes = len(self.classes)
self.npoints = npoints
self.sample_id_list = []
self.random_select = random_select
        # both 'train_aug' and the other splits read augmented labels and points
        # from the same directories
        self.aug_label_dir = os.path.join(aug_scene_data_dir, 'training', 'aug_label')
        self.aug_pts_dir = os.path.join(aug_scene_data_dir, 'training', 'rectified_data')
# for rcnn training
self.rcnn_training_bbox_list = []
self.rpn_feature_list = {}
self.pos_bbox_list = []
self.neg_bbox_list = []
self.far_neg_bbox_list = []
self.rcnn_eval_roi_dir = rcnn_eval_roi_dir
self.rcnn_eval_feature_dir = rcnn_eval_feature_dir
self.rcnn_training_roi_dir = rcnn_training_roi_dir
self.rcnn_training_feature_dir = rcnn_training_feature_dir
self.gt_database = None
if not self.random_select:
logger.warning('random select is False')
assert mode in ['TRAIN', 'EVAL', 'TEST'], 'Invalid mode: %s' % mode
self.mode = mode
if cfg.RPN.ENABLED:
if gt_database_dir is not None:
self.gt_database = pickle.load(open(gt_database_dir, 'rb'))
if cfg.GT_AUG_HARD_RATIO > 0:
easy_list, hard_list = [], []
for k in range(self.gt_database.__len__()):
obj = self.gt_database[k]
if obj['points'].shape[0] > 100:
easy_list.append(obj)
else:
hard_list.append(obj)
self.gt_database = [easy_list, hard_list]
logger.info('Loading gt_database(easy(pt_num>100): %d, hard(pt_num<=100): %d) from %s'
% (len(easy_list), len(hard_list), gt_database_dir))
else:
logger.info('Loading gt_database(%d) from %s' % (len(self.gt_database), gt_database_dir))
if mode == 'TRAIN':
self.preprocess_rpn_training_data()
else:
self.sample_id_list = [int(sample_id) for sample_id in self.image_idx_list]
logger.info('Load testing samples from %s' % self.imageset_dir)
logger.info('Done: total test samples %d' % len(self.sample_id_list))
elif cfg.RCNN.ENABLED:
for idx in range(0, self.num_sample):
sample_id = int(self.image_idx_list[idx])
obj_list = self.filtrate_objects(self.get_label(sample_id))
if len(obj_list) == 0:
# logger.info('No gt classes: %06d' % sample_id)
continue
self.sample_id_list.append(sample_id)
logger.info('Done: filter %s results for rcnn training: %d / %d\n' %
(self.mode, len(self.sample_id_list), len(self.image_idx_list)))
def preprocess_rpn_training_data(self):
"""
Discard samples which don't have current classes, which will not be used for training.
Valid sample_id is stored in self.sample_id_list
"""
logger.info('Loading %s samples from %s ...' % (self.mode, self.label_dir))
for idx in range(0, self.num_sample):
sample_id = int(self.image_idx_list[idx])
obj_list = self.filtrate_objects(self.get_label(sample_id))
if len(obj_list) == 0:
logger.debug('No gt classes: %06d' % sample_id)
continue
self.sample_id_list.append(sample_id)
logger.info('Done: filter %s results: %d / %d\n' % (self.mode, len(self.sample_id_list),
len(self.image_idx_list)))
def get_label(self, idx):
if idx < 10000:
label_file = os.path.join(self.label_dir, '%06d.txt' % idx)
else:
label_file = os.path.join(self.aug_label_dir, '%06d.txt' % idx)
assert os.path.exists(label_file)
return kitti_utils.get_objects_from_label(label_file)
def get_image(self, idx):
return super(KittiRCNNReader, self).get_image(idx % 10000)
def get_image_shape(self, idx):
return super(KittiRCNNReader, self).get_image_shape(idx % 10000)
def get_calib(self, idx):
return super(KittiRCNNReader, self).get_calib(idx % 10000)
def get_road_plane(self, idx):
return super(KittiRCNNReader, self).get_road_plane(idx % 10000)
@staticmethod
def get_rpn_features(rpn_feature_dir, idx):
rpn_feature_file = os.path.join(rpn_feature_dir, '%06d.npy' % idx)
rpn_xyz_file = os.path.join(rpn_feature_dir, '%06d_xyz.npy' % idx)
rpn_intensity_file = os.path.join(rpn_feature_dir, '%06d_intensity.npy' % idx)
        if cfg.RCNN.USE_SEG_SCORE:
            rpn_seg_file = os.path.join(rpn_feature_dir, '%06d_rawscore.npy' % idx)
            rpn_seg_score = np.load(rpn_seg_file).reshape(-1)
            # squash raw scores to (0, 1) with a numpy sigmoid; the reference
            # implementation used torch.sigmoid, but torch is not a dependency here
            rpn_seg_score = 1. / (1. + np.exp(-rpn_seg_score))
        else:
            rpn_seg_file = os.path.join(rpn_feature_dir, '%06d_seg.npy' % idx)
            rpn_seg_score = np.load(rpn_seg_file).reshape(-1)
return np.load(rpn_xyz_file), np.load(rpn_feature_file), np.load(rpn_intensity_file).reshape(-1), rpn_seg_score
def filtrate_objects(self, obj_list):
"""
Discard objects which are not in self.classes (or its similar classes)
:param obj_list: list
:return: list
"""
type_whitelist = self.classes
if self.mode == 'TRAIN' and cfg.INCLUDE_SIMILAR_TYPE:
type_whitelist = list(self.classes)
if 'Car' in self.classes:
type_whitelist.append('Van')
if 'Pedestrian' in self.classes: # or 'Cyclist' in self.classes:
type_whitelist.append('Person_sitting')
valid_obj_list = []
for obj in obj_list:
if obj.cls_type not in type_whitelist: # rm Van, 20180928
continue
if self.mode == 'TRAIN' and cfg.PC_REDUCE_BY_RANGE and (self.check_pc_range(obj.pos) is False):
continue
valid_obj_list.append(obj)
return valid_obj_list
@staticmethod
def filtrate_dc_objects(obj_list):
valid_obj_list = []
for obj in obj_list:
if obj.cls_type in ['DontCare']:
continue
valid_obj_list.append(obj)
return valid_obj_list
@staticmethod
def check_pc_range(xyz):
"""
:param xyz: [x, y, z]
:return:
"""
x_range, y_range, z_range = cfg.PC_AREA_SCOPE
if (x_range[0] <= xyz[0] <= x_range[1]) and (y_range[0] <= xyz[1] <= y_range[1]) and \
(z_range[0] <= xyz[2] <= z_range[1]):
return True
return False
@staticmethod
def get_valid_flag(pts_rect, pts_img, pts_rect_depth, img_shape):
"""
Valid point should be in the image (and in the PC_AREA_SCOPE)
:param pts_rect:
:param pts_img:
:param pts_rect_depth:
:param img_shape:
:return:
"""
val_flag_1 = np.logical_and(pts_img[:, 0] >= 0, pts_img[:, 0] < img_shape[1])
val_flag_2 = np.logical_and(pts_img[:, 1] >= 0, pts_img[:, 1] < img_shape[0])
val_flag_merge = np.logical_and(val_flag_1, val_flag_2)
pts_valid_flag = np.logical_and(val_flag_merge, pts_rect_depth >= 0)
if cfg.PC_REDUCE_BY_RANGE:
x_range, y_range, z_range = cfg.PC_AREA_SCOPE
pts_x, pts_y, pts_z = pts_rect[:, 0], pts_rect[:, 1], pts_rect[:, 2]
range_flag = (pts_x >= x_range[0]) & (pts_x <= x_range[1]) \
& (pts_y >= y_range[0]) & (pts_y <= y_range[1]) \
& (pts_z >= z_range[0]) & (pts_z <= z_range[1])
pts_valid_flag = pts_valid_flag & range_flag
return pts_valid_flag
def get_rpn_sample(self, index):
sample_id = int(self.sample_id_list[index])
if sample_id < 10000:
calib = self.get_calib(sample_id)
# img = self.get_image(sample_id)
img_shape = self.get_image_shape(sample_id)
pts_lidar = self.get_lidar(sample_id)
# get valid point (projected points should be in image)
pts_rect = calib.lidar_to_rect(pts_lidar[:, 0:3])
pts_intensity = pts_lidar[:, 3]
else:
calib = self.get_calib(sample_id % 10000)
# img = self.get_image(sample_id % 10000)
img_shape = self.get_image_shape(sample_id % 10000)
pts_file = os.path.join(self.aug_pts_dir, '%06d.bin' % sample_id)
assert os.path.exists(pts_file), '%s' % pts_file
aug_pts = np.fromfile(pts_file, dtype=np.float32).reshape(-1, 4)
pts_rect, pts_intensity = aug_pts[:, 0:3], aug_pts[:, 3]
pts_img, pts_rect_depth = calib.rect_to_img(pts_rect)
pts_valid_flag = self.get_valid_flag(pts_rect, pts_img, pts_rect_depth, img_shape)
pts_rect = pts_rect[pts_valid_flag][:, 0:3]
pts_intensity = pts_intensity[pts_valid_flag]
if cfg.GT_AUG_ENABLED and self.mode == 'TRAIN':
# all labels for checking overlapping
all_gt_obj_list = self.filtrate_dc_objects(self.get_label(sample_id))
all_gt_boxes3d = kitti_utils.objs_to_boxes3d(all_gt_obj_list)
gt_aug_flag = False
if np.random.rand() < cfg.GT_AUG_APPLY_PROB:
# augment one scene
gt_aug_flag, pts_rect, pts_intensity, extra_gt_boxes3d, extra_gt_obj_list = \
self.apply_gt_aug_to_one_scene(sample_id, pts_rect, pts_intensity, all_gt_boxes3d)
# generate inputs
if self.mode == 'TRAIN' or self.random_select:
if self.npoints < len(pts_rect):
pts_depth = pts_rect[:, 2]
pts_near_flag = pts_depth < 40.0
far_idxs_choice = np.where(pts_near_flag == 0)[0]
near_idxs = np.where(pts_near_flag == 1)[0]
near_idxs_choice = np.random.choice(near_idxs, self.npoints - len(far_idxs_choice), replace=False)
choice = np.concatenate((near_idxs_choice, far_idxs_choice), axis=0) \
if len(far_idxs_choice) > 0 else near_idxs_choice
np.random.shuffle(choice)
else:
choice = np.arange(0, len(pts_rect), dtype=np.int32)
if self.npoints > len(pts_rect):
extra_choice = np.random.choice(choice, self.npoints - len(pts_rect), replace=False)
choice = np.concatenate((choice, extra_choice), axis=0)
np.random.shuffle(choice)
ret_pts_rect = pts_rect[choice, :]
ret_pts_intensity = pts_intensity[choice] - 0.5 # translate intensity to [-0.5, 0.5]
else:
ret_pts_rect = np.zeros((self.npoints, pts_rect.shape[1])).astype(pts_rect.dtype)
num_ = min(self.npoints, pts_rect.shape[0])
ret_pts_rect[:num_] = pts_rect[:num_]
ret_pts_intensity = pts_intensity - 0.5
pts_features = [ret_pts_intensity.reshape(-1, 1)]
ret_pts_features = np.concatenate(pts_features, axis=1) if pts_features.__len__() > 1 else pts_features[0]
sample_info = {'sample_id': sample_id, 'random_select': self.random_select}
if self.mode == 'TEST':
if cfg.RPN.USE_INTENSITY:
pts_input = np.concatenate((ret_pts_rect, ret_pts_features), axis=1) # (N, C)
else:
pts_input = ret_pts_rect
sample_info['pts_input'] = pts_input
sample_info['pts_rect'] = ret_pts_rect
sample_info['pts_features'] = ret_pts_features
return sample_info
gt_obj_list = self.filtrate_objects(self.get_label(sample_id))
if cfg.GT_AUG_ENABLED and self.mode == 'TRAIN' and gt_aug_flag:
gt_obj_list.extend(extra_gt_obj_list)
gt_boxes3d = kitti_utils.objs_to_boxes3d(gt_obj_list)
gt_alpha = np.zeros((gt_obj_list.__len__()), dtype=np.float32)
for k, obj in enumerate(gt_obj_list):
gt_alpha[k] = obj.alpha
# data augmentation
aug_pts_rect = ret_pts_rect.copy()
aug_gt_boxes3d = gt_boxes3d.copy()
if cfg.AUG_DATA and self.mode == 'TRAIN':
aug_pts_rect, aug_gt_boxes3d, aug_method = self.data_augmentation(aug_pts_rect, aug_gt_boxes3d, gt_alpha,
sample_id)
sample_info['aug_method'] = aug_method
# prepare input
if cfg.RPN.USE_INTENSITY:
pts_input = np.concatenate((aug_pts_rect, ret_pts_features), axis=1) # (N, C)
else:
pts_input = aug_pts_rect
if cfg.RPN.FIXED:
sample_info['pts_input'] = pts_input
sample_info['pts_rect'] = aug_pts_rect
sample_info['pts_features'] = ret_pts_features
sample_info['gt_boxes3d'] = aug_gt_boxes3d
return sample_info
if self.mode == 'EVAL' and aug_gt_boxes3d.shape[0] == 0:
aug_gt_boxes3d = np.zeros((1, aug_gt_boxes3d.shape[1]))
# generate training labels
rpn_cls_label, rpn_reg_label = self.generate_rpn_training_labels(aug_pts_rect, aug_gt_boxes3d)
sample_info['pts_input'] = pts_input
sample_info['pts_rect'] = aug_pts_rect
sample_info['pts_features'] = ret_pts_features
sample_info['rpn_cls_label'] = rpn_cls_label
sample_info['rpn_reg_label'] = rpn_reg_label
sample_info['gt_boxes3d'] = aug_gt_boxes3d
return sample_info
def apply_gt_aug_to_one_scene(self, sample_id, pts_rect, pts_intensity, all_gt_boxes3d):
"""
:param pts_rect: (N, 3)
        :param all_gt_boxes3d: (M2, 7)
:return:
"""
assert self.gt_database is not None
# extra_gt_num = np.random.randint(10, 15)
# try_times = 50
if cfg.GT_AUG_RAND_NUM:
extra_gt_num = np.random.randint(10, cfg.GT_EXTRA_NUM)
else:
extra_gt_num = cfg.GT_EXTRA_NUM
try_times = 100
cnt = 0
cur_gt_boxes3d = all_gt_boxes3d.copy()
cur_gt_boxes3d[:, 4] += 0.5 # TODO: consider different objects
cur_gt_boxes3d[:, 5] += 0.5 # enlarge new added box to avoid too nearby boxes
cur_gt_corners = kitti_utils.boxes3d_to_corners3d(cur_gt_boxes3d)
extra_gt_obj_list = []
extra_gt_boxes3d_list = []
new_pts_list, new_pts_intensity_list = [], []
src_pts_flag = np.ones(pts_rect.shape[0], dtype=np.int32)
road_plane = self.get_road_plane(sample_id)
a, b, c, d = road_plane
while try_times > 0:
if cnt > extra_gt_num:
break
try_times -= 1
if cfg.GT_AUG_HARD_RATIO > 0:
p = np.random.rand()
if p > cfg.GT_AUG_HARD_RATIO:
# use easy sample
rand_idx = np.random.randint(0, len(self.gt_database[0]))
new_gt_dict = self.gt_database[0][rand_idx]
else:
# use hard sample
rand_idx = np.random.randint(0, len(self.gt_database[1]))
new_gt_dict = self.gt_database[1][rand_idx]
else:
rand_idx = np.random.randint(0, self.gt_database.__len__())
new_gt_dict = self.gt_database[rand_idx]
new_gt_box3d = new_gt_dict['gt_box3d'].copy()
new_gt_points = new_gt_dict['points'].copy()
new_gt_intensity = new_gt_dict['intensity'].copy()
new_gt_obj = new_gt_dict['obj']
center = new_gt_box3d[0:3]
if cfg.PC_REDUCE_BY_RANGE and (self.check_pc_range(center) is False):
continue
if new_gt_points.__len__() < 5: # too few points
continue
# put it on the road plane
cur_height = (-d - a * center[0] - c * center[2]) / b
move_height = new_gt_box3d[1] - cur_height
new_gt_box3d[1] -= move_height
new_gt_points[:, 1] -= move_height
new_gt_obj.pos[1] -= move_height
new_enlarged_box3d = new_gt_box3d.copy()
new_enlarged_box3d[4] += 0.5
new_enlarged_box3d[5] += 0.5 # enlarge new added box to avoid too nearby boxes
cnt += 1
new_corners = kitti_utils.boxes3d_to_corners3d(new_enlarged_box3d.reshape(1, 7))
iou3d = kitti_utils.get_iou3d(new_corners, cur_gt_corners)
valid_flag = iou3d.max() < 1e-8
if not valid_flag:
continue
enlarged_box3d = new_gt_box3d.copy()
enlarged_box3d[3] += 2 # remove the points above and below the object
boxes_pts_mask_list = pts_utils.pts_in_boxes3d(pts_rect,
enlarged_box3d.reshape(1, 7))
pt_mask_flag = (boxes_pts_mask_list[0] == 1)
src_pts_flag[pt_mask_flag] = 0 # remove the original points which are inside the new box
new_pts_list.append(new_gt_points)
new_pts_intensity_list.append(new_gt_intensity)
cur_gt_boxes3d = np.concatenate((cur_gt_boxes3d, new_enlarged_box3d.reshape(1, 7)), axis=0)
cur_gt_corners = np.concatenate((cur_gt_corners, new_corners), axis=0)
extra_gt_boxes3d_list.append(new_gt_box3d.reshape(1, 7))
extra_gt_obj_list.append(new_gt_obj)
if new_pts_list.__len__() == 0:
return False, pts_rect, pts_intensity, None, None
extra_gt_boxes3d = np.concatenate(extra_gt_boxes3d_list, axis=0)
# remove original points and add new points
pts_rect = pts_rect[src_pts_flag == 1]
pts_intensity = pts_intensity[src_pts_flag == 1]
new_pts_rect = np.concatenate(new_pts_list, axis=0)
new_pts_intensity = np.concatenate(new_pts_intensity_list, axis=0)
pts_rect = np.concatenate((pts_rect, new_pts_rect), axis=0)
pts_intensity = np.concatenate((pts_intensity, new_pts_intensity), axis=0)
return True, pts_rect, pts_intensity, extra_gt_boxes3d, extra_gt_obj_list
def rotate_box3d_along_y(self, box3d, rot_angle):
old_x, old_z, ry = box3d[0], box3d[2], box3d[6]
old_beta = np.arctan2(old_z, old_x)
alpha = -np.sign(old_beta) * np.pi / 2 + old_beta + ry
box3d = kitti_utils.rotate_pc_along_y(box3d.reshape(1, 7), rot_angle=rot_angle)[0]
new_x, new_z = box3d[0], box3d[2]
new_beta = np.arctan2(new_z, new_x)
box3d[6] = np.sign(new_beta) * np.pi / 2 + alpha - new_beta
return box3d
def data_augmentation(self, aug_pts_rect, aug_gt_boxes3d, gt_alpha, sample_id=None, mustaug=False, stage=1):
"""
:param aug_pts_rect: (N, 3)
:param aug_gt_boxes3d: (N, 7)
:param gt_alpha: (N)
:return:
"""
aug_list = cfg.AUG_METHOD_LIST
aug_enable = 1 - np.random.rand(3)
if mustaug is True:
aug_enable[0] = -1
aug_enable[1] = -1
aug_method = []
if 'rotation' in aug_list and aug_enable[0] < cfg.AUG_METHOD_PROB[0]:
angle = np.random.uniform(-np.pi / cfg.AUG_ROT_RANGE, np.pi / cfg.AUG_ROT_RANGE)
aug_pts_rect = kitti_utils.rotate_pc_along_y(aug_pts_rect, rot_angle=angle)
if stage == 1:
# xyz change, hwl unchange
aug_gt_boxes3d = kitti_utils.rotate_pc_along_y(aug_gt_boxes3d, rot_angle=angle)
# calculate the ry after rotation
x, z = aug_gt_boxes3d[:, 0], aug_gt_boxes3d[:, 2]
beta = np.arctan2(z, x)
new_ry = np.sign(beta) * np.pi / 2 + gt_alpha - beta
aug_gt_boxes3d[:, 6] = new_ry # TODO: not in [-np.pi / 2, np.pi / 2]
elif stage == 2:
# for debug stage-2, this implementation has little float precision difference with the above one
assert aug_gt_boxes3d.shape[0] == 2
aug_gt_boxes3d[0] = self.rotate_box3d_along_y(aug_gt_boxes3d[0], angle)
aug_gt_boxes3d[1] = self.rotate_box3d_along_y(aug_gt_boxes3d[1], angle)
else:
raise NotImplementedError
aug_method.append(['rotation', angle])
if 'scaling' in aug_list and aug_enable[1] < cfg.AUG_METHOD_PROB[1]:
scale = np.random.uniform(0.95, 1.05)
aug_pts_rect = aug_pts_rect * scale
aug_gt_boxes3d[:, 0:6] = aug_gt_boxes3d[:, 0:6] * scale
aug_method.append(['scaling', scale])
if 'flip' in aug_list and aug_enable[2] < cfg.AUG_METHOD_PROB[2]:
# flip horizontal
aug_pts_rect[:, 0] = -aug_pts_rect[:, 0]
aug_gt_boxes3d[:, 0] = -aug_gt_boxes3d[:, 0]
# flip orientation: ry > 0: pi - ry, ry < 0: -pi - ry
if stage == 1:
aug_gt_boxes3d[:, 6] = np.sign(aug_gt_boxes3d[:, 6]) * np.pi - aug_gt_boxes3d[:, 6]
elif stage == 2:
assert aug_gt_boxes3d.shape[0] == 2
aug_gt_boxes3d[0, 6] = np.sign(aug_gt_boxes3d[0, 6]) * np.pi - aug_gt_boxes3d[0, 6]
aug_gt_boxes3d[1, 6] = np.sign(aug_gt_boxes3d[1, 6]) * np.pi - aug_gt_boxes3d[1, 6]
else:
raise NotImplementedError
aug_method.append('flip')
return aug_pts_rect, aug_gt_boxes3d, aug_method
@staticmethod
def generate_rpn_training_labels(pts_rect, gt_boxes3d):
cls_label = np.zeros((pts_rect.shape[0]), dtype=np.int32)
reg_label = np.zeros((pts_rect.shape[0], 7), dtype=np.float32) # dx, dy, dz, ry, h, w, l
gt_corners = kitti_utils.boxes3d_to_corners3d(gt_boxes3d, rotate=True)
extend_gt_boxes3d = kitti_utils.enlarge_box3d(gt_boxes3d, extra_width=0.2)
extend_gt_corners = kitti_utils.boxes3d_to_corners3d(extend_gt_boxes3d, rotate=True)
for k in range(gt_boxes3d.shape[0]):
box_corners = gt_corners[k]
fg_pt_flag = in_hull(pts_rect, box_corners)
fg_pts_rect = pts_rect[fg_pt_flag]
cls_label[fg_pt_flag] = 1
# enlarge the bbox3d, ignore nearby points
extend_box_corners = extend_gt_corners[k]
fg_enlarge_flag = in_hull(pts_rect, extend_box_corners)
ignore_flag = np.logical_xor(fg_pt_flag, fg_enlarge_flag)
cls_label[ignore_flag] = -1
# pixel offset of object center
center3d = gt_boxes3d[k][0:3].copy() # (x, y, z)
center3d[1] -= gt_boxes3d[k][3] / 2
reg_label[fg_pt_flag, 0:3] = center3d - fg_pts_rect # Now y is the true center of 3d box 20180928
# size and angle encoding
reg_label[fg_pt_flag, 3] = gt_boxes3d[k][3] # h
reg_label[fg_pt_flag, 4] = gt_boxes3d[k][4] # w
reg_label[fg_pt_flag, 5] = gt_boxes3d[k][5] # l
reg_label[fg_pt_flag, 6] = gt_boxes3d[k][6] # ry
return cls_label, reg_label
def get_rcnn_sample_jit(self, index):
sample_id = int(self.sample_id_list[index])
rpn_xyz, rpn_features, rpn_intensity, seg_mask = \
self.get_rpn_features(self.rcnn_training_feature_dir, sample_id)
# load rois and gt_boxes3d for this sample
roi_file = os.path.join(self.rcnn_training_roi_dir, '%06d.txt' % sample_id)
roi_obj_list = kitti_utils.get_objects_from_label(roi_file)
roi_boxes3d = kitti_utils.objs_to_boxes3d(roi_obj_list)
# roi_scores is not used currently
# roi_scores = kitti_utils.objs_to_scores(roi_obj_list)
gt_obj_list = self.filtrate_objects(self.get_label(sample_id))
gt_boxes3d = kitti_utils.objs_to_boxes3d(gt_obj_list)
sample_info = OrderedDict()
sample_info["sample_id"] = sample_id
sample_info['rpn_xyz'] = rpn_xyz
sample_info['rpn_features'] = rpn_features
sample_info['rpn_intensity'] = rpn_intensity
sample_info['seg_mask'] = seg_mask
sample_info['roi_boxes3d'] = roi_boxes3d
sample_info['pts_depth'] = np.linalg.norm(rpn_xyz, ord=2, axis=1)
sample_info['gt_boxes3d'] = gt_boxes3d
return sample_info
def sample_bg_inds(self, hard_bg_inds, easy_bg_inds, bg_rois_per_this_image):
if hard_bg_inds.size > 0 and easy_bg_inds.size > 0:
hard_bg_rois_num = int(bg_rois_per_this_image * cfg.RCNN.HARD_BG_RATIO)
easy_bg_rois_num = bg_rois_per_this_image - hard_bg_rois_num
# sampling hard bg
rand_num = np.floor(np.random.rand(hard_bg_rois_num) * hard_bg_inds.size).astype(np.int32)
hard_bg_inds = hard_bg_inds[rand_num]
# sampling easy bg
rand_num = np.floor(np.random.rand(easy_bg_rois_num) * easy_bg_inds.size).astype(np.int32)
easy_bg_inds = easy_bg_inds[rand_num]
bg_inds = np.concatenate([hard_bg_inds, easy_bg_inds], axis=0)
elif hard_bg_inds.size > 0 and easy_bg_inds.size == 0:
hard_bg_rois_num = bg_rois_per_this_image
# sampling hard bg
rand_num = np.floor(np.random.rand(hard_bg_rois_num) * hard_bg_inds.size).astype(np.int32)
bg_inds = hard_bg_inds[rand_num]
elif hard_bg_inds.size == 0 and easy_bg_inds.size > 0:
easy_bg_rois_num = bg_rois_per_this_image
# sampling easy bg
rand_num = np.floor(np.random.rand(easy_bg_rois_num) * easy_bg_inds.size).astype(np.int32)
bg_inds = easy_bg_inds[rand_num]
else:
raise NotImplementedError
return bg_inds
def aug_roi_by_noise_batch(self, roi_boxes3d, gt_boxes3d, aug_times=10):
"""
:param roi_boxes3d: (N, 7)
:param gt_boxes3d: (N, 7)
:return:
"""
iou_of_rois = np.zeros(roi_boxes3d.shape[0], dtype=np.float32)
for k in range(roi_boxes3d.__len__()):
temp_iou = cnt = 0
roi_box3d = roi_boxes3d[k]
gt_box3d = gt_boxes3d[k]
pos_thresh = min(cfg.RCNN.REG_FG_THRESH, cfg.RCNN.CLS_FG_THRESH)
gt_corners = kitti_utils.boxes3d_to_corners3d(gt_box3d.reshape(1, 7), True)
aug_box3d = roi_box3d
while temp_iou < pos_thresh and cnt < aug_times:
if np.random.rand() < 0.2:
aug_box3d = roi_box3d # p=0.2 to keep the original roi box
else:
aug_box3d = self.random_aug_box3d(roi_box3d)
aug_corners = kitti_utils.boxes3d_to_corners3d(aug_box3d.reshape(1, 7), True)
iou3d = kitti_utils.get_iou3d(aug_corners, gt_corners)
temp_iou = iou3d[0][0]
cnt += 1
roi_boxes3d[k] = aug_box3d
iou_of_rois[k] = temp_iou
return roi_boxes3d, iou_of_rois
@staticmethod
def canonical_transform_batch(pts_input, roi_boxes3d, gt_boxes3d):
"""
:param pts_input: (N, npoints, 3 + C)
:param roi_boxes3d: (N, 7)
:param gt_boxes3d: (N, 7)
:return:
"""
roi_ry = roi_boxes3d[:, 6] % (2 * np.pi) # 0 ~ 2pi
roi_center = roi_boxes3d[:, 0:3]
# shift to center
pts_input[:, :, [0, 1, 2]] = pts_input[:, :, [0, 1, 2]] - roi_center.reshape(-1, 1, 3)
gt_boxes3d_ct = np.copy(gt_boxes3d)
gt_boxes3d_ct[:, 0:3] = gt_boxes3d_ct[:, 0:3] - roi_center
# rotate to the direction of head
gt_boxes3d_ct = kitti_utils.rotate_pc_along_y_np(
gt_boxes3d_ct.reshape(-1, 1, 7),
roi_ry,
)
# TODO: check here
gt_boxes3d_ct = gt_boxes3d_ct.reshape(-1,7)
gt_boxes3d_ct[:, 6] = gt_boxes3d_ct[:, 6] - roi_ry
pts_input = kitti_utils.rotate_pc_along_y_np(
pts_input,
roi_ry
)
return pts_input, gt_boxes3d_ct
def get_rcnn_training_sample_batch(self, index):
sample_id = int(self.sample_id_list[index])
rpn_xyz, rpn_features, rpn_intensity, seg_mask = \
self.get_rpn_features(self.rcnn_training_feature_dir, sample_id)
# load rois and gt_boxes3d for this sample
roi_file = os.path.join(self.rcnn_training_roi_dir, '%06d.txt' % sample_id)
roi_obj_list = kitti_utils.get_objects_from_label(roi_file)
roi_boxes3d = kitti_utils.objs_to_boxes3d(roi_obj_list)
# roi_scores = kitti_utils.objs_to_scores(roi_obj_list)
gt_obj_list = self.filtrate_objects(self.get_label(sample_id))
gt_boxes3d = kitti_utils.objs_to_boxes3d(gt_obj_list)
# calculate original iou
iou3d = kitti_utils.get_iou3d(kitti_utils.boxes3d_to_corners3d(roi_boxes3d, True),
kitti_utils.boxes3d_to_corners3d(gt_boxes3d, True))
max_overlaps, gt_assignment = iou3d.max(axis=1), iou3d.argmax(axis=1)
max_iou_of_gt, roi_assignment = iou3d.max(axis=0), iou3d.argmax(axis=0)
roi_assignment = roi_assignment[max_iou_of_gt > 0].reshape(-1)
# sample fg, easy_bg, hard_bg
fg_rois_per_image = int(np.round(cfg.RCNN.FG_RATIO * cfg.RCNN.ROI_PER_IMAGE))
fg_thresh = min(cfg.RCNN.REG_FG_THRESH, cfg.RCNN.CLS_FG_THRESH)
fg_inds = np.nonzero(max_overlaps >= fg_thresh)[0]
fg_inds = np.concatenate((fg_inds, roi_assignment), axis=0) # consider the roi which has max_overlaps with gt as fg
easy_bg_inds = np.nonzero((max_overlaps < cfg.RCNN.CLS_BG_THRESH_LO))[0]
hard_bg_inds = np.nonzero((max_overlaps < cfg.RCNN.CLS_BG_THRESH) &
(max_overlaps >= cfg.RCNN.CLS_BG_THRESH_LO))[0]
fg_num_rois = fg_inds.size
bg_num_rois = hard_bg_inds.size + easy_bg_inds.size
if fg_num_rois > 0 and bg_num_rois > 0:
# sampling fg
fg_rois_per_this_image = min(fg_rois_per_image, fg_num_rois)
rand_num = np.random.permutation(fg_num_rois)
fg_inds = fg_inds[rand_num[:fg_rois_per_this_image]]
# sampling bg
bg_rois_per_this_image = cfg.RCNN.ROI_PER_IMAGE - fg_rois_per_this_image
bg_inds = self.sample_bg_inds(hard_bg_inds, easy_bg_inds, bg_rois_per_this_image)
elif fg_num_rois > 0 and bg_num_rois == 0:
# sampling fg
            # cast to integer indices (the reference implementation converted via torch.long)
            rand_num = np.floor(np.random.rand(cfg.RCNN.ROI_PER_IMAGE) * fg_num_rois).astype(np.int32)
            fg_inds = fg_inds[rand_num]
fg_rois_per_this_image = cfg.RCNN.ROI_PER_IMAGE
bg_rois_per_this_image = 0
elif bg_num_rois > 0 and fg_num_rois == 0:
# sampling bg
bg_rois_per_this_image = cfg.RCNN.ROI_PER_IMAGE
bg_inds = self.sample_bg_inds(hard_bg_inds, easy_bg_inds, bg_rois_per_this_image)
fg_rois_per_this_image = 0
        else:
            raise NotImplementedError
# augment the rois by noise
roi_list, roi_iou_list, roi_gt_list = [], [], []
if fg_rois_per_this_image > 0:
fg_rois_src = roi_boxes3d[fg_inds].copy()
gt_of_fg_rois = gt_boxes3d[gt_assignment[fg_inds]]
fg_rois, fg_iou3d = self.aug_roi_by_noise_batch(fg_rois_src, gt_of_fg_rois, aug_times=10)
roi_list.append(fg_rois)
roi_iou_list.append(fg_iou3d)
roi_gt_list.append(gt_of_fg_rois)
if bg_rois_per_this_image > 0:
bg_rois_src = roi_boxes3d[bg_inds].copy()
gt_of_bg_rois = gt_boxes3d[gt_assignment[bg_inds]]
bg_rois, bg_iou3d = self.aug_roi_by_noise_batch(bg_rois_src, gt_of_bg_rois, aug_times=1)
roi_list.append(bg_rois)
roi_iou_list.append(bg_iou3d)
roi_gt_list.append(gt_of_bg_rois)
rois = np.concatenate(roi_list, axis=0)
iou_of_rois = np.concatenate(roi_iou_list, axis=0)
gt_of_rois = np.concatenate(roi_gt_list, axis=0)
# collect extra features for point cloud pooling
if cfg.RCNN.USE_INTENSITY:
pts_extra_input_list = [rpn_intensity.reshape(-1, 1), seg_mask.reshape(-1, 1)]
else:
pts_extra_input_list = [seg_mask.reshape(-1, 1)]
if cfg.RCNN.USE_DEPTH:
pts_depth = (np.linalg.norm(rpn_xyz, ord=2, axis=1) / 70.0) - 0.5
pts_extra_input_list.append(pts_depth.reshape(-1, 1))
pts_extra_input = np.concatenate(pts_extra_input_list, axis=1)
# pts, pts_feature, boxes3d, pool_extra_width, sampled_pt_num
pts_input, pts_features, pts_empty_flag = roipool3d_utils.roipool3d_cpu(
rpn_xyz, rpn_features, rois, pts_extra_input,
cfg.RCNN.POOL_EXTRA_WIDTH,
sampled_pt_num=cfg.RCNN.NUM_POINTS,
#canonical_transform=False
)
# data augmentation
if cfg.AUG_DATA and self.mode == 'TRAIN':
for k in range(rois.__len__()):
aug_pts = pts_input[k, :, 0:3].copy()
aug_gt_box3d = gt_of_rois[k].copy()
aug_roi_box3d = rois[k].copy()
# calculate alpha by ry
temp_boxes3d = np.concatenate([aug_roi_box3d.reshape(1, 7), aug_gt_box3d.reshape(1, 7)], axis=0)
temp_x, temp_z, temp_ry = temp_boxes3d[:, 0], temp_boxes3d[:, 2], temp_boxes3d[:, 6]
temp_beta = np.arctan2(temp_z, temp_x).astype(np.float64)
temp_alpha = -np.sign(temp_beta) * np.pi / 2 + temp_beta + temp_ry
# data augmentation
aug_pts, aug_boxes3d, aug_method = self.data_augmentation(aug_pts, temp_boxes3d, temp_alpha,
mustaug=True, stage=2)
# assign to original data
pts_input[k, :, 0:3] = aug_pts
rois[k] = aug_boxes3d[0]
gt_of_rois[k] = aug_boxes3d[1]
valid_mask = (pts_empty_flag == 0).astype(np.int32)
# regression valid mask
reg_valid_mask = (iou_of_rois > cfg.RCNN.REG_FG_THRESH).astype(np.int32) & valid_mask
# classification label
cls_label = (iou_of_rois > cfg.RCNN.CLS_FG_THRESH).astype(np.int32)
invalid_mask = (iou_of_rois > cfg.RCNN.CLS_BG_THRESH) & (iou_of_rois < cfg.RCNN.CLS_FG_THRESH)
cls_label[invalid_mask] = -1
cls_label[valid_mask == 0] = -1
# canonical transform and sampling
pts_input_ct, gt_boxes3d_ct = self.canonical_transform_batch(pts_input, rois, gt_of_rois)
pts_input_ = np.concatenate((pts_input_ct, pts_features), axis=-1)
sample_info = OrderedDict()
sample_info['sample_id'] = sample_id
sample_info['pts_input'] = pts_input_
sample_info['pts_feature'] = pts_features
sample_info['roi_boxes3d'] = rois
sample_info['cls_label'] = cls_label
sample_info['reg_valid_mask'] = reg_valid_mask
sample_info['gt_boxes3d_ct'] = gt_boxes3d_ct
sample_info['gt_of_rois'] = gt_of_rois
return sample_info
@staticmethod
def random_aug_box3d(box3d):
"""
:param box3d: (7) [x, y, z, h, w, l, ry]
random shift, scale, orientation
"""
if cfg.RCNN.REG_AUG_METHOD == 'single':
pos_shift = (np.random.rand(3) - 0.5) # [-0.5 ~ 0.5]
hwl_scale = (np.random.rand(3) - 0.5) / (0.5 / 0.15) + 1.0 #
angle_rot = (np.random.rand(1) - 0.5) / (0.5 / (np.pi / 12)) # [-pi/12 ~ pi/12]
aug_box3d = np.concatenate([box3d[0:3] + pos_shift, box3d[3:6] * hwl_scale,
box3d[6:7] + angle_rot])
return aug_box3d
elif cfg.RCNN.REG_AUG_METHOD == 'multiple':
# pos_range, hwl_range, angle_range, mean_iou
range_config = [[0.2, 0.1, np.pi / 12, 0.7],
[0.3, 0.15, np.pi / 12, 0.6],
[0.5, 0.15, np.pi / 9, 0.5],
[0.8, 0.15, np.pi / 6, 0.3],
[1.0, 0.15, np.pi / 3, 0.2]]
idx = np.random.randint(len(range_config))
pos_shift = ((np.random.rand(3) - 0.5) / 0.5) * range_config[idx][0]
hwl_scale = ((np.random.rand(3) - 0.5) / 0.5) * range_config[idx][1] + 1.0
angle_rot = ((np.random.rand(1) - 0.5) / 0.5) * range_config[idx][2]
aug_box3d = np.concatenate([box3d[0:3] + pos_shift, box3d[3:6] * hwl_scale, box3d[6:7] + angle_rot])
return aug_box3d
elif cfg.RCNN.REG_AUG_METHOD == 'normal':
x_shift = np.random.normal(loc=0, scale=0.3)
y_shift = np.random.normal(loc=0, scale=0.2)
z_shift = np.random.normal(loc=0, scale=0.3)
h_shift = np.random.normal(loc=0, scale=0.25)
w_shift = np.random.normal(loc=0, scale=0.15)
l_shift = np.random.normal(loc=0, scale=0.5)
ry_shift = ((np.random.rand() - 0.5) / 0.5) * np.pi / 12
aug_box3d = np.array([box3d[0] + x_shift, box3d[1] + y_shift, box3d[2] + z_shift, box3d[3] + h_shift,
box3d[4] + w_shift, box3d[5] + l_shift, box3d[6] + ry_shift])
return aug_box3d
else:
raise NotImplementedError
def get_proposal_from_file(self, index):
sample_id = int(self.image_idx_list[index])
proposal_file = os.path.join(self.rcnn_eval_roi_dir, '%06d.txt' % sample_id)
roi_obj_list = kitti_utils.get_objects_from_label(proposal_file)
rpn_xyz, rpn_features, rpn_intensity, seg_mask = self.get_rpn_features(self.rcnn_eval_feature_dir, sample_id)
pts_rect, pts_rpn_features, pts_intensity = rpn_xyz, rpn_features, rpn_intensity
roi_box3d_list, roi_scores = [], []
for obj in roi_obj_list:
box3d = np.array([obj.pos[0], obj.pos[1], obj.pos[2], obj.h, obj.w, obj.l, obj.ry], dtype=np.float32)
roi_box3d_list.append(box3d.reshape(1, 7))
roi_scores.append(obj.score)
roi_boxes3d = np.concatenate(roi_box3d_list, axis=0) # (N, 7)
roi_scores = np.array(roi_scores, dtype=np.float32) # (N)
if cfg.RCNN.ROI_SAMPLE_JIT:
sample_dict = {'sample_id': sample_id,
'rpn_xyz': rpn_xyz,
'rpn_features': rpn_features,
'seg_mask': seg_mask,
'roi_boxes3d': roi_boxes3d,
'roi_scores': roi_scores,
'pts_depth': np.linalg.norm(rpn_xyz, ord=2, axis=1)}
if self.mode != 'TEST':
gt_obj_list = self.filtrate_objects(self.get_label(sample_id))
gt_boxes3d = kitti_utils.objs_to_boxes3d(gt_obj_list)
roi_corners = kitti_utils.boxes3d_to_corners3d(roi_boxes3d,True)
gt_corners = kitti_utils.boxes3d_to_corners3d(gt_boxes3d,True)
iou3d = kitti_utils.get_iou3d(roi_corners, gt_corners)
if gt_boxes3d.shape[0] > 0:
gt_iou = iou3d.max(axis=1)
else:
gt_iou = np.zeros(roi_boxes3d.shape[0]).astype(np.float32)
sample_dict['gt_boxes3d'] = gt_boxes3d
sample_dict['gt_iou'] = gt_iou
return sample_dict
if cfg.RCNN.USE_INTENSITY:
pts_extra_input_list = [pts_intensity.reshape(-1, 1), seg_mask.reshape(-1, 1)]
else:
pts_extra_input_list = [seg_mask.reshape(-1, 1)]
if cfg.RCNN.USE_DEPTH:
cur_depth = np.linalg.norm(pts_rect, axis=1, ord=2)
cur_depth_norm = (cur_depth / 70.0) - 0.5
pts_extra_input_list.append(cur_depth_norm.reshape(-1, 1))
pts_extra_input = np.concatenate(pts_extra_input_list, axis=1)
pts_input, pts_features, _ = roipool3d_utils.roipool3d_cpu(
pts_rect, pts_rpn_features, roi_boxes3d, pts_extra_input,
cfg.RCNN.POOL_EXTRA_WIDTH, sampled_pt_num=cfg.RCNN.NUM_POINTS,
canonical_transform=True
)
pts_input = np.concatenate((pts_input, pts_features), axis=-1)
sample_dict = OrderedDict()
sample_dict['sample_id'] = sample_id
sample_dict['pts_input'] = pts_input
sample_dict['pts_feature'] = pts_features
sample_dict['roi_boxes3d'] = roi_boxes3d
sample_dict['roi_scores'] = roi_scores
#sample_dict['roi_size'] = roi_boxes3d[:, 3:6]
if self.mode == 'TEST':
return sample_dict
gt_obj_list = self.filtrate_objects(self.get_label(sample_id))
gt_boxes3d = np.zeros((gt_obj_list.__len__(), 7), dtype=np.float32)
for k, obj in enumerate(gt_obj_list):
gt_boxes3d[k, 0:3], gt_boxes3d[k, 3], gt_boxes3d[k, 4], gt_boxes3d[k, 5], gt_boxes3d[k, 6] \
= obj.pos, obj.h, obj.w, obj.l, obj.ry
if gt_boxes3d.__len__() == 0:
gt_iou = np.zeros((roi_boxes3d.shape[0]), dtype=np.float32)
else:
roi_corners = kitti_utils.boxes3d_to_corners3d(roi_boxes3d,True)
gt_corners = kitti_utils.boxes3d_to_corners3d(gt_boxes3d,True)
iou3d = kitti_utils.get_iou3d(roi_corners, gt_corners)
gt_iou = iou3d.max(axis=1)
sample_dict['gt_iou'] = gt_iou
sample_dict['gt_boxes3d'] = gt_boxes3d
return sample_dict
def __len__(self):
if cfg.RPN.ENABLED:
return len(self.sample_id_list)
elif cfg.RCNN.ENABLED:
if self.mode == 'TRAIN':
return len(self.sample_id_list)
else:
return len(self.image_idx_list)
else:
raise NotImplementedError
def __getitem__(self, index):
if cfg.RPN.ENABLED:
return self.get_rpn_sample(index)
elif cfg.RCNN.ENABLED:
if self.mode == 'TRAIN':
if cfg.RCNN.ROI_SAMPLE_JIT:
return self.get_rcnn_sample_jit(index)
else:
return self.get_rcnn_training_sample_batch(index)
else:
return self.get_proposal_from_file(index)
else:
raise NotImplementedError
def padding_batch(self, batch_data, batch_size):
max_roi = 0
max_gt = 0
for k in range(batch_size):
# roi_boxes3d
max_roi = max(max_roi, batch_data[k][3].shape[0])
# gt_boxes3d
max_gt = max(max_gt, batch_data[k][-1].shape[0])
batch_roi_boxes3d = np.zeros((batch_size, max_roi, 7))
batch_gt_boxes3d = np.zeros((batch_size, max_gt, 7), dtype=np.float32)
for i, data in enumerate(batch_data):
roi_num = data[3].shape[0]
gt_num = data[-1].shape[0]
batch_roi_boxes3d[i,:roi_num,:] = data[3]
batch_gt_boxes3d[i,:gt_num,:] = data[-1]
new_batch = []
for i, data in enumerate(batch_data):
new_batch.append(data[:3])
# roi_boxes3d
new_batch[i].append(batch_roi_boxes3d[i])
# ...
new_batch[i].extend(data[4:7])
# gt_boxes3d
new_batch[i].append(batch_gt_boxes3d[i])
return new_batch
def padding_batch_eval(self, batch_data, batch_size):
max_pts = 0
max_feats = 0
max_roi = 0
max_score = 0
max_iou = 0
max_gt = 0
for k in range(batch_size):
# pts_input
max_pts = max(max_pts, batch_data[k][1].shape[0])
# pts_feature
max_feats = max(max_feats, batch_data[k][2].shape[0])
# roi_boxes3d
max_roi = max(max_roi, batch_data[k][3].shape[0])
# gt_iou
max_iou = max(max_iou, batch_data[k][-2].shape[0])
# gt_boxes3d
max_gt = max(max_gt, batch_data[k][-1].shape[0])
batch_pts_input = np.zeros((batch_size, max_pts, 512, 133), dtype=np.float32)
batch_pts_feat = np.zeros((batch_size, max_feats, 512, 128), dtype=np.float32)
batch_roi_boxes3d = np.zeros((batch_size, max_roi, 7), dtype=np.float32)
batch_gt_iou = np.zeros((batch_size, max_iou), dtype=np.float32)
batch_gt_boxes3d = np.zeros((batch_size, max_gt, 7), dtype=np.float32)
for i, data in enumerate(batch_data):
# num
pts_num = data[1].shape[0]
pts_feat_num = data[2].shape[0]
roi_num = data[3].shape[0]
iou_num = data[-2].shape[0]
gt_num = data[-1].shape[0]
# data
batch_pts_input[i, :pts_num, :, :] = data[1]
batch_pts_feat[i, :pts_feat_num, :, :] = data[2]
batch_roi_boxes3d[i,:roi_num,:] = data[3]
batch_gt_iou[i,:iou_num] = data[-2]
batch_gt_boxes3d[i,:gt_num,:] = data[-1]
new_batch = []
for i, data in enumerate(batch_data):
new_batch.append(data[:1])
new_batch[i].append(batch_pts_input[i])
new_batch[i].append(batch_pts_feat[i])
new_batch[i].append(batch_roi_boxes3d[i])
new_batch[i].append(data[4])
new_batch[i].append(batch_gt_iou[i])
new_batch[i].append(batch_gt_boxes3d[i])
return new_batch
def get_reader(self, batch_size, fields, drop_last=False):
def reader():
batch_out = []
idxs = np.arange(self.__len__())
if self.mode == 'TRAIN':
np.random.shuffle(idxs)
for idx in idxs:
sample_all = self.__getitem__(idx)
sample = [sample_all[f] for f in fields]
if has_empty(sample):
logger.info("sample field: %d has empty field"%len(sample))
continue
batch_out.append(sample)
if len(batch_out) >= batch_size:
if cfg.RPN.ENABLED:
yield batch_out
else:
if self.mode == 'TRAIN':
yield self.padding_batch(batch_out, batch_size)
elif self.mode == 'EVAL':
                            # batch_size should be 1 in rcnn_offline eval currently;
                            # if batch_size > 1, the batch should be padded as follows:
# yield self.padding_batch_eval(batch_out, batch_size)
yield batch_out
else:
logger.error("not only support train/eval padding")
batch_out = []
if not drop_last:
if len(batch_out) > 0:
yield batch_out
return reader
def get_multiprocess_reader(self, batch_size, fields, proc_num=8, max_queue_len=128, drop_last=False):
def read_to_queue(idxs, queue):
for idx in idxs:
sample_all = self.__getitem__(idx)
sample = [sample_all[f] for f in fields]
queue.put(sample)
queue.put(None)
def reader():
sample_num = self.__len__()
idxs = np.arange(self.__len__())
if self.mode == 'TRAIN':
np.random.shuffle(idxs)
proc_idxs = []
proc_sample_num = int(sample_num / proc_num)
start_idx = 0
for i in range(proc_num - 1):
proc_idxs.append(idxs[start_idx:start_idx + proc_sample_num])
start_idx += proc_sample_num
proc_idxs.append(idxs[start_idx:])
queue = multiprocessing.Queue(max_queue_len)
p_list = []
for i in range(proc_num):
p_list.append(multiprocessing.Process(
target=read_to_queue, args=(proc_idxs[i], queue,)))
p_list[-1].start()
finish_num = 0
batch_out = []
while finish_num < len(p_list):
sample = queue.get()
if sample is None:
finish_num += 1
else:
batch_out.append(sample)
if len(batch_out) == batch_size:
yield batch_out
batch_out = []
# join process
for p in p_list:
if p.is_alive():
p.join()
return reader
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import time
import shutil
import argparse
import logging
import multiprocessing
import numpy as np
from collections import OrderedDict
import paddle
import paddle.fluid as fluid
from models.point_rcnn import PointRCNN
from data.kitti_rcnn_reader import KittiRCNNReader
from utils.run_utils import *
from utils.config import cfg, load_config, set_config_from_list
from utils.metric_utils import calc_iou_recall, rpn_metric, rcnn_metric
logging.root.handlers = []
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT, stream=sys.stdout)
logger = logging.getLogger(__name__)
np.random.seed(1024) # use same seed
METRIC_PROC_NUM = 4
def parse_args():
parser = argparse.ArgumentParser(
"PointRCNN semantic segmentation train script")
parser.add_argument(
'--cfg',
type=str,
default='cfgs/default.yml',
help='specify the config for training')
parser.add_argument(
'--eval_mode',
type=str,
default='rpn',
required=True,
        help='specify the evaluation mode')
parser.add_argument(
'--batch_size',
type=int,
default=1,
help='evaluation batch size, default 1')
parser.add_argument(
'--ckpt_dir',
type=str,
default='checkpoints/199',
help='specify a ckpt directory to be evaluated if needed')
parser.add_argument(
'--data_dir',
type=str,
default='./data',
help='KITTI dataset root directory')
parser.add_argument(
'--output_dir',
type=str,
default='output',
help='output directory')
parser.add_argument(
'--save_rpn_feature',
action='store_true',
default=False,
        help='save features for separate rcnn training and evaluation')
parser.add_argument(
'--save_result',
action='store_true',
default=False,
help='save roi and refine result of evaluation')
parser.add_argument(
'--rcnn_eval_roi_dir',
type=str,
default=None,
help='specify the saved rois for rcnn evaluation when using rcnn_offline mode')
parser.add_argument(
'--rcnn_eval_feature_dir',
type=str,
default=None,
help='specify the saved features for rcnn evaluation when using rcnn_offline mode')
parser.add_argument(
'--log_interval',
type=int,
default=1,
help='mini-batch interval to log.')
parser.add_argument(
'--set',
dest='set_cfgs',
default=None,
nargs=argparse.REMAINDER,
help='set extra config keys if needed.')
args = parser.parse_args()
return args
def eval():
args = parse_args()
print_arguments(args)
# check whether the installed paddle is compiled with GPU
# PointRCNN model can only run on GPU
check_gpu(True)
load_config(args.cfg)
if args.set_cfgs is not None:
set_config_from_list(args.set_cfgs)
if not os.path.isdir(args.output_dir):
os.makedirs(args.output_dir)
if args.eval_mode == 'rpn':
cfg.RPN.ENABLED = True
cfg.RCNN.ENABLED = False
elif args.eval_mode == 'rcnn':
cfg.RCNN.ENABLED = True
cfg.RPN.ENABLED = cfg.RPN.FIXED = True
assert args.batch_size, "batch size must be 1 in rcnn evaluation"
elif args.eval_mode == 'rcnn_offline':
cfg.RCNN.ENABLED = True
cfg.RPN.ENABLED = False
assert args.batch_size, "batch size must be 1 in rcnn_offline evaluation"
else:
raise NotImplementedError("unkown eval mode: {}".format(args.eval_mode))
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
# build model
startup = fluid.Program()
eval_prog = fluid.Program()
with fluid.program_guard(eval_prog, startup):
with fluid.unique_name.guard():
eval_model = PointRCNN(cfg, args.batch_size, True, 'TEST')
eval_model.build()
eval_pyreader = eval_model.get_pyreader()
eval_feeds = eval_model.get_feeds()
eval_outputs = eval_model.get_outputs()
eval_prog = eval_prog.clone(True)
extra_keys = []
if args.eval_mode == 'rpn':
extra_keys.extend(['sample_id', 'rpn_cls_label', 'gt_boxes3d'])
if args.save_rpn_feature:
extra_keys.extend(['pts_rect', 'pts_features', 'pts_input',])
eval_keys, eval_values = parse_outputs(
eval_outputs, prog=eval_prog, extra_keys=extra_keys)
eval_compile_prog = fluid.compiler.CompiledProgram(
eval_prog).with_data_parallel()
exe.run(startup)
# load checkpoint
assert os.path.isdir(
args.ckpt_dir), "ckpt_dir {} not a directory".format(args.ckpt_dir)
def if_exist(var):
return os.path.exists(os.path.join(args.ckpt_dir, var.name))
fluid.io.load_vars(exe, args.ckpt_dir, eval_prog, predicate=if_exist)
kitti_feature_dir = os.path.join(args.output_dir, 'features')
kitti_output_dir = os.path.join(args.output_dir, 'detections', 'data')
seg_output_dir = os.path.join(args.output_dir, 'seg_result')
if args.save_rpn_feature:
if os.path.exists(kitti_feature_dir):
shutil.rmtree(kitti_feature_dir)
os.makedirs(kitti_feature_dir)
if os.path.exists(kitti_output_dir):
shutil.rmtree(kitti_output_dir)
os.makedirs(kitti_output_dir)
if os.path.exists(seg_output_dir):
shutil.rmtree(seg_output_dir)
os.makedirs(seg_output_dir)
    # must make sure these dirs exist
roi_output_dir = os.path.join('./result_dir', 'roi_result', 'data')
refine_output_dir = os.path.join('./result_dir', 'refine_result', 'data')
final_output_dir = os.path.join("./result_dir", 'final_result', 'data')
if not os.path.exists(final_output_dir):
os.makedirs(final_output_dir)
if args.save_result:
if not os.path.exists(roi_output_dir):
os.makedirs(roi_output_dir)
if not os.path.exists(refine_output_dir):
os.makedirs(refine_output_dir)
# get reader
kitti_rcnn_reader = KittiRCNNReader(data_dir=args.data_dir,
npoints=cfg.RPN.NUM_POINTS,
split=cfg.TEST.SPLIT,
mode='EVAL',
classes=cfg.CLASSES,
rcnn_eval_roi_dir=args.rcnn_eval_roi_dir,
rcnn_eval_feature_dir=args.rcnn_eval_feature_dir)
eval_reader = kitti_rcnn_reader.get_multiprocess_reader(args.batch_size, eval_feeds)
eval_pyreader.decorate_sample_list_generator(eval_reader, place)
thresh_list = [0.1, 0.3, 0.5, 0.7, 0.9]
queue = multiprocessing.Queue(128)
mgr = multiprocessing.Manager()
lock = multiprocessing.Lock()
mdict = mgr.dict()
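    # metric computation runs in METRIC_PROC_NUM worker processes that consume
    # fetched results from `queue`; counters are accumulated in the shared dict
    # `mdict` under `lock`, and 'exit_proc' counts workers that have finished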
if cfg.RPN.ENABLED:
mdict['exit_proc'] = 0
mdict['total_gt_bbox'] = 0
mdict['total_cnt'] = 0
mdict['total_rpn_iou'] = 0
for i in range(len(thresh_list)):
mdict['total_recalled_bbox_list_{}'.format(i)] = 0
p_list = []
for i in range(METRIC_PROC_NUM):
p_list.append(multiprocessing.Process(
target=rpn_metric,
args=(queue, mdict, lock, thresh_list, args.save_rpn_feature, kitti_feature_dir,
seg_output_dir, kitti_output_dir, kitti_rcnn_reader, cfg.CLASSES)))
p_list[-1].start()
if cfg.RCNN.ENABLED:
for i in range(len(thresh_list)):
mdict['total_recalled_bbox_list_{}'.format(i)] = 0
mdict['total_roi_recalled_bbox_list_{}'.format(i)] = 0
mdict['exit_proc'] = 0
mdict['total_cls_acc'] = 0
mdict['total_cls_acc_refined'] = 0
mdict['total_det_num'] = 0
mdict['total_gt_bbox'] = 0
p_list = []
for i in range(METRIC_PROC_NUM):
p_list.append(multiprocessing.Process(
target=rcnn_metric,
args=(queue, mdict, lock, thresh_list, kitti_rcnn_reader, roi_output_dir,
refine_output_dir, final_output_dir, args.save_result)
))
p_list[-1].start()
try:
eval_pyreader.start()
eval_iter = 0
start_time = time.time()
cur_time = time.time()
while True:
eval_outs = exe.run(eval_compile_prog, fetch_list=eval_values, return_numpy=False)
rets_dict = {k: (np.array(v), v.recursive_sequence_lengths())
for k, v in zip(eval_keys, eval_outs)}
run_time = time.time() - cur_time
cur_time = time.time()
queue.put(rets_dict)
eval_iter += 1
logger.info("[EVAL] iter {}, time: {:.2f}".format(
eval_iter, run_time))
except fluid.core.EOFException:
# terminate metric process
for i in range(METRIC_PROC_NUM):
queue.put(None)
while mdict['exit_proc'] < METRIC_PROC_NUM:
time.sleep(1)
for p in p_list:
if p.is_alive():
p.join()
end_time = time.time()
logger.info("[EVAL] total {} iter finished, average time: {:.2f}".format(
eval_iter, (end_time - start_time) / float(eval_iter)))
if cfg.RPN.ENABLED:
avg_rpn_iou = mdict['total_rpn_iou'] / max(len(kitti_rcnn_reader), 1.)
logger.info("average rpn iou: {:.3f}".format(avg_rpn_iou))
total_gt_bbox = float(max(mdict['total_gt_bbox'], 1.0))
for idx, thresh in enumerate(thresh_list):
recall = mdict['total_recalled_bbox_list_{}'.format(idx)] / total_gt_bbox
logger.info("total bbox recall(thresh={:.3f}): {} / {} = {:.3f}".format(
thresh, mdict['total_recalled_bbox_list_{}'.format(idx)], mdict['total_gt_bbox'], recall))
if cfg.RCNN.ENABLED:
cnt = float(max(eval_iter, 1.0))
avg_cls_acc = mdict['total_cls_acc'] / cnt
avg_cls_acc_refined = mdict['total_cls_acc_refined'] / cnt
avg_det_num = mdict['total_det_num'] / cnt
logger.info("avg_cls_acc: {}".format(avg_cls_acc))
logger.info("avg_cls_acc_refined: {}".format(avg_cls_acc_refined))
logger.info("avg_det_num: {}".format(avg_det_num))
total_gt_bbox = float(max(mdict['total_gt_bbox'], 1.0))
for idx, thresh in enumerate(thresh_list):
cur_roi_recall = mdict['total_roi_recalled_bbox_list_{}'.format(idx)] / total_gt_bbox
logger.info('total roi bbox recall(thresh=%.3f): %d / %d = %f' % (
thresh, mdict['total_roi_recalled_bbox_list_{}'.format(idx)], total_gt_bbox, cur_roi_recall))
for idx, thresh in enumerate(thresh_list):
cur_recall = mdict['total_recalled_bbox_list_{}'.format(idx)] / total_gt_bbox
logger.info('total bbox recall(thresh=%.2f) %d / %.2f = %.4f' % (
thresh, mdict['total_recalled_bbox_list_{}'.format(idx)], total_gt_bbox, cur_recall))
split_file = os.path.join('./data/KITTI', 'ImageSets', 'val.txt')
image_idx_list = [x.strip() for x in open(split_file).readlines()]
for k in range(image_idx_list.__len__()):
cur_file = os.path.join(final_output_dir, '%s.txt' % image_idx_list[k])
if not os.path.exists(cur_file):
with open(cur_file, 'w') as temp_f:
pass
        if sys.version_info >= (3, 6):
label_dir = os.path.join('./data/KITTI/object/training', 'label_2')
split_file = os.path.join('./data/KITTI', 'ImageSets', 'val.txt')
final_output_dir = os.path.join("./result_dir", 'final_result', 'data')
name_to_class = {'Car': 0, 'Pedestrian': 1, 'Cyclist': 2}
from tools.kitti_object_eval_python.evaluate import evaluate as kitti_evaluate
ap_result_str, ap_dict = kitti_evaluate(
label_dir, final_output_dir, label_split_file=split_file,
current_class=name_to_class["Car"])
logger.info("KITTI evaluate: {}, {}".format(ap_result_str, ap_dict))
else:
logger.info("KITTI mAP only support python version >= 3.6, users can "
"run 'python3 tools/kitti_eval.py' to evaluate KITTI mAP.")
finally:
eval_pyreader.reset()
if __name__ == "__main__":
eval()
../PointNet++/ext_op
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Constant
__all__ = ["get_reg_loss"]
def sigmoid_focal_loss(logits, labels, weights, gamma=2.0, alpha=0.25):
sce_loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, labels)
prob = fluid.layers.sigmoid(logits)
p_t = labels * prob + (1.0 - labels) * (1.0 - prob)
modulating_factor = fluid.layers.pow(1.0 - p_t, gamma)
alpha_weight_factor = labels * alpha + (1.0 - labels) * (1.0 - alpha)
return modulating_factor * alpha_weight_factor * sce_loss * weights
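# Per-element focal loss: FL = alpha_t * (1 - p_t)^gamma * CE(logits, labels),
# with p_t the predicted probability of the true class; callers pass `weights`
# to normalize by the number of foreground points.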
def get_reg_loss(pred_reg, reg_label, fg_mask, point_num, loc_scope,
loc_bin_size, num_head_bin, anchor_size,
get_xz_fine=True, get_y_by_bin=False, loc_y_scope=0.5,
loc_y_bin_size=0.25, get_ry_fine=False):
"""
Bin-based 3D bounding boxes regression loss. See https://arxiv.org/abs/1812.04244 for more details.
:param pred_reg: (N, C)
:param reg_label: (N, 7) [dx, dy, dz, h, w, l, ry]
:param loc_scope: constant
:param loc_bin_size: constant
:param num_head_bin: constant
:param anchor_size: (N, 3) or (3)
:param get_xz_fine:
:param get_y_by_bin:
:param loc_y_scope:
:param loc_y_bin_size:
:param get_ry_fine:
:return:
"""
fg_num = fluid.layers.cast(fluid.layers.reduce_sum(fg_mask), dtype=pred_reg.dtype)
fg_num = fluid.layers.clip(fg_num, min=1.0, max=point_num)
fg_scale = float(point_num) / fg_num
per_loc_bin_num = int(loc_scope / loc_bin_size) * 2
loc_y_bin_num = int(loc_y_scope / loc_y_bin_size) * 2
reg_loss_dict = {}
# xz localization loss
x_offset_label, y_offset_label, z_offset_label = reg_label[:, 0:1], reg_label[:, 1:2], reg_label[:, 2:3]
x_shift = fluid.layers.clip(x_offset_label + loc_scope, 0., loc_scope * 2 - 1e-3)
z_shift = fluid.layers.clip(z_offset_label + loc_scope, 0., loc_scope * 2 - 1e-3)
x_bin_label = fluid.layers.cast(x_shift / loc_bin_size, dtype='int64')
z_bin_label = fluid.layers.cast(z_shift / loc_bin_size, dtype='int64')
x_bin_l, x_bin_r = 0, per_loc_bin_num
z_bin_l, z_bin_r = per_loc_bin_num, per_loc_bin_num * 2
start_offset = z_bin_r
loss_x_bin = fluid.layers.softmax_with_cross_entropy(pred_reg[:, x_bin_l: x_bin_r], x_bin_label)
loss_x_bin = fluid.layers.reduce_mean(loss_x_bin * fg_mask) * fg_scale
loss_z_bin = fluid.layers.softmax_with_cross_entropy(pred_reg[:, z_bin_l: z_bin_r], z_bin_label)
loss_z_bin = fluid.layers.reduce_mean(loss_z_bin * fg_mask) * fg_scale
reg_loss_dict['loss_x_bin'] = loss_x_bin
reg_loss_dict['loss_z_bin'] = loss_z_bin
loc_loss = loss_x_bin + loss_z_bin
if get_xz_fine:
x_res_l, x_res_r = per_loc_bin_num * 2, per_loc_bin_num * 3
z_res_l, z_res_r = per_loc_bin_num * 3, per_loc_bin_num * 4
start_offset = z_res_r
x_res_label = x_shift - (fluid.layers.cast(x_bin_label, dtype=x_shift.dtype) * loc_bin_size + loc_bin_size / 2.)
z_res_label = z_shift - (fluid.layers.cast(z_bin_label, dtype=z_shift.dtype) * loc_bin_size + loc_bin_size / 2.)
x_res_norm_label = x_res_label / loc_bin_size
z_res_norm_label = z_res_label / loc_bin_size
x_bin_onehot = fluid.layers.one_hot(x_bin_label, depth=per_loc_bin_num)
z_bin_onehot = fluid.layers.one_hot(z_bin_label, depth=per_loc_bin_num)
loss_x_res = fluid.layers.smooth_l1(fluid.layers.reduce_sum(pred_reg[:, x_res_l: x_res_r] * x_bin_onehot, dim=1, keep_dim=True), x_res_norm_label)
loss_x_res = fluid.layers.reduce_mean(loss_x_res * fg_mask) * fg_scale
loss_z_res = fluid.layers.smooth_l1(fluid.layers.reduce_sum(pred_reg[:, z_res_l: z_res_r] * z_bin_onehot, dim=1, keep_dim=True), z_res_norm_label)
loss_z_res = fluid.layers.reduce_mean(loss_z_res * fg_mask) * fg_scale
reg_loss_dict['loss_x_res'] = loss_x_res
reg_loss_dict['loss_z_res'] = loss_z_res
loc_loss += loss_x_res + loss_z_res
# y localization loss
if get_y_by_bin:
y_bin_l, y_bin_r = start_offset, start_offset + loc_y_bin_num
y_res_l, y_res_r = y_bin_r, y_bin_r + loc_y_bin_num
start_offset = y_res_r
y_shift = fluid.layers.clip(y_offset_label + loc_y_scope, 0., loc_y_scope * 2 - 1e-3)
y_bin_label = fluid.layers.cast(y_shift / loc_y_bin_size, dtype='int64')
y_res_label = y_shift - (fluid.layers.cast(y_bin_label, dtype=y_shift.dtype) * loc_y_bin_size + loc_y_bin_size / 2.)
y_res_norm_label = y_res_label / loc_y_bin_size
        y_bin_onehot = fluid.layers.one_hot(y_bin_label, depth=loc_y_bin_num)
        loss_y_bin = fluid.layers.softmax_with_cross_entropy(pred_reg[:, y_bin_l: y_bin_r], y_bin_label)
loss_y_bin = fluid.layers.reduce_mean(loss_y_bin * fg_mask) * fg_scale
loss_y_res = fluid.layers.smooth_l1(fluid.layers.reduce_sum(pred_reg[:, y_res_l: y_res_r] * y_bin_onehot, dim=1, keep_dim=True), y_res_norm_label)
loss_y_res = fluid.layers.reduce_mean(loss_y_res * fg_mask) * fg_scale
reg_loss_dict['loss_y_bin'] = loss_y_bin
reg_loss_dict['loss_y_res'] = loss_y_res
loc_loss += loss_y_bin + loss_y_res
else:
y_offset_l, y_offset_r = start_offset, start_offset + 1
start_offset = y_offset_r
loss_y_offset = fluid.layers.smooth_l1(fluid.layers.reduce_sum(pred_reg[:, y_offset_l: y_offset_r], dim=1, keep_dim=True), y_offset_label)
loss_y_offset = fluid.layers.reduce_mean(loss_y_offset * fg_mask) * fg_scale
reg_loss_dict['loss_y_offset'] = loss_y_offset
loc_loss += loss_y_offset
# angle loss
ry_bin_l, ry_bin_r = start_offset, start_offset + num_head_bin
ry_res_l, ry_res_r = ry_bin_r, ry_bin_r + num_head_bin
ry_label = reg_label[:, 6:7]
if get_ry_fine:
# divide pi/2 into several bins
angle_per_class = (np.pi / 2) / num_head_bin
ry_label = ry_label % (2 * np.pi) # 0 ~ 2pi
opposite_flag = fluid.layers.logical_and(ry_label > np.pi * 0.5, ry_label < np.pi * 1.5)
opposite_flag = fluid.layers.cast(opposite_flag, dtype=ry_label.dtype)
shift_angle = (ry_label + opposite_flag * np.pi + np.pi * 0.5) % (2 * np.pi) # (0 ~ pi)
shift_angle.stop_gradient = True
shift_angle = fluid.layers.clip(shift_angle - np.pi * 0.25, min=1e-3, max=np.pi * 0.5 - 1e-3) # (0, pi/2)
# bin center is (5, 10, 15, ..., 85)
ry_bin_label = fluid.layers.cast(shift_angle / angle_per_class, dtype='int64')
ry_res_label = shift_angle - (fluid.layers.cast(ry_bin_label, dtype=shift_angle.dtype) * angle_per_class + angle_per_class / 2)
ry_res_norm_label = ry_res_label / (angle_per_class / 2)
else:
# divide 2pi into several bins
angle_per_class = (2 * np.pi) / num_head_bin
heading_angle = ry_label % (2 * np.pi) # 0 ~ 2pi
shift_angle = (heading_angle + angle_per_class / 2) % (2 * np.pi)
shift_angle.stop_gradient = True
ry_bin_label = fluid.layers.cast(shift_angle / angle_per_class, dtype='int64')
ry_res_label = shift_angle - (fluid.layers.cast(ry_bin_label, dtype=shift_angle.dtype) * angle_per_class + angle_per_class / 2)
ry_res_norm_label = ry_res_label / (angle_per_class / 2)
ry_bin_onehot = fluid.layers.one_hot(ry_bin_label, depth=num_head_bin)
loss_ry_bin = fluid.layers.softmax_with_cross_entropy(pred_reg[:, ry_bin_l:ry_bin_r], ry_bin_label)
loss_ry_bin = fluid.layers.reduce_mean(loss_ry_bin * fg_mask) * fg_scale
loss_ry_res = fluid.layers.smooth_l1(fluid.layers.reduce_sum(pred_reg[:, ry_res_l: ry_res_r] * ry_bin_onehot, dim=1, keep_dim=True), ry_res_norm_label)
loss_ry_res = fluid.layers.reduce_mean(loss_ry_res * fg_mask) * fg_scale
reg_loss_dict['loss_ry_bin'] = loss_ry_bin
reg_loss_dict['loss_ry_res'] = loss_ry_res
angle_loss = loss_ry_bin + loss_ry_res
# size loss
size_res_l, size_res_r = ry_res_r, ry_res_r + 3
assert pred_reg.shape[1] == size_res_r, '%d vs %d' % (pred_reg.shape[1], size_res_r)
anchor_size_var = fluid.layers.zeros(shape=[3], dtype=reg_label.dtype)
fluid.layers.assign(np.array(anchor_size).astype('float32'), anchor_size_var)
size_res_norm_label = (reg_label[:, 3:6] - anchor_size_var) / anchor_size_var
size_res_norm_label = fluid.layers.reshape(size_res_norm_label, shape=[-1, 1], inplace=True)
size_res_norm = pred_reg[:, size_res_l:size_res_r]
size_res_norm = fluid.layers.reshape(size_res_norm, shape=[-1, 1], inplace=True)
size_loss = fluid.layers.smooth_l1(size_res_norm, size_res_norm_label)
size_loss = fluid.layers.reduce_mean(fluid.layers.reshape(size_loss, [-1, 3]) * fg_mask) * fg_scale
# Total regression loss
reg_loss_dict['loss_loc'] = loc_loss
reg_loss_dict['loss_angle'] = angle_loss
reg_loss_dict['loss_size'] = size_loss
return loc_loss, angle_loss, size_loss, reg_loss_dict
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from collections import OrderedDict
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Constant
from models.rpn import RPN
from models.rcnn import RCNN
__all__ = ["PointRCNN"]
class PointRCNN(object):
def __init__(self, cfg, batch_size, use_xyz=True, mode='TRAIN', prog=None):
self.cfg = cfg
self.batch_size = batch_size
self.use_xyz = use_xyz
self.mode = mode
self.is_train = mode == 'TRAIN'
self.num_points = self.cfg.RPN.NUM_POINTS
self.prog = prog
self.inputs = None
self.pyreader = None
def build_inputs(self):
self.inputs = OrderedDict()
if self.cfg.RPN.ENABLED:
self.inputs['sample_id'] = fluid.layers.data(name='sample_id', shape=[1], dtype='int32')
self.inputs['pts_input'] = fluid.layers.data(name='pts_input', shape=[self.num_points, 3], dtype='float32')
self.inputs['pts_rect'] = fluid.layers.data(name='pts_rect', shape=[self.num_points, 3], dtype='float32')
self.inputs['pts_features'] = fluid.layers.data(name='pts_features', shape=[self.num_points, 1], dtype='float32')
self.inputs['rpn_cls_label'] = fluid.layers.data(name='rpn_cls_label', shape=[self.num_points], dtype='int32')
self.inputs['rpn_reg_label'] = fluid.layers.data(name='rpn_reg_label', shape=[self.num_points, 7], dtype='float32')
self.inputs['gt_boxes3d'] = fluid.layers.data(name='gt_boxes3d', shape=[7], lod_level=1, dtype='float32')
if self.cfg.RCNN.ENABLED:
if self.cfg.RCNN.ROI_SAMPLE_JIT:
self.inputs['sample_id'] = fluid.layers.data(name='sample_id', shape=[1], dtype='int32', append_batch_size=False)
self.inputs['rpn_xyz'] = fluid.layers.data(name='rpn_xyz', shape=[self.num_points, 3], dtype='float32', append_batch_size=False)
self.inputs['rpn_features'] = fluid.layers.data(name='rpn_features', shape=[self.num_points,128], dtype='float32', append_batch_size=False)
self.inputs['rpn_intensity'] = fluid.layers.data(name='rpn_intensity', shape=[self.num_points], dtype='float32', append_batch_size=False)
self.inputs['seg_mask'] = fluid.layers.data(name='seg_mask', shape=[self.num_points], dtype='float32', append_batch_size=False)
self.inputs['roi_boxes3d'] = fluid.layers.data(name='roi_boxes3d', shape=[-1, -1, 7], dtype='float32', append_batch_size=False, lod_level=0)
self.inputs['pts_depth'] = fluid.layers.data(name='pts_depth', shape=[self.num_points], dtype='float32', append_batch_size=False)
self.inputs['gt_boxes3d'] = fluid.layers.data(name='gt_boxes3d', shape=[-1, -1, 7], dtype='float32', append_batch_size=False, lod_level=0)
else:
self.inputs['sample_id'] = fluid.layers.data(name='sample_id', shape=[-1], dtype='int32', append_batch_size=False)
self.inputs['pts_input'] = fluid.layers.data(name='pts_input', shape=[-1,512,133], dtype='float32', append_batch_size=False)
self.inputs['pts_feature'] = fluid.layers.data(name='pts_feature', shape=[-1,512,128], dtype='float32', append_batch_size=False)
self.inputs['roi_boxes3d'] = fluid.layers.data(name='roi_boxes3d', shape=[-1,7], dtype='float32', append_batch_size=False)
if self.is_train:
self.inputs['cls_label'] = fluid.layers.data(name='cls_label', shape=[-1], dtype='float32', append_batch_size=False)
self.inputs['reg_valid_mask'] = fluid.layers.data(name='reg_valid_mask', shape=[-1], dtype='float32', append_batch_size=False)
self.inputs['gt_boxes3d_ct'] = fluid.layers.data(name='gt_boxes3d_ct', shape=[-1,7], dtype='float32', append_batch_size=False)
self.inputs['gt_of_rois'] = fluid.layers.data(name='gt_of_rois', shape=[-1,7], dtype='float32', append_batch_size=False)
else:
self.inputs['roi_scores'] = fluid.layers.data(name='roi_scores', shape=[-1,], dtype='float32', append_batch_size=False)
self.inputs['gt_iou'] = fluid.layers.data(name='gt_iou', shape=[-1], dtype='float32', append_batch_size=False)
self.inputs['gt_boxes3d'] = fluid.layers.data(name='gt_boxes3d', shape=[-1,-1,7], dtype='float32', append_batch_size=False, lod_level=0)
self.pyreader = fluid.io.PyReader(
feed_list=list(self.inputs.values()),
capacity=64,
use_double_buffer=True,
iterable=False)
def build(self):
self.build_inputs()
if self.cfg.RPN.ENABLED:
self.rpn = RPN(self.cfg, self.batch_size, self.use_xyz,
self.mode, self.prog)
self.rpn.build(self.inputs)
self.rpn_outputs = self.rpn.get_outputs()
self.outputs = self.rpn_outputs
if self.cfg.RCNN.ENABLED:
self.rcnn = RCNN(self.cfg, 1, self.batch_size, self.mode)
self.rcnn.build_model(self.inputs)
self.outputs = self.rcnn.get_outputs()
if self.mode == 'TRAIN':
if self.cfg.RPN.ENABLED:
self.outputs['rpn_loss'], self.outputs['rpn_loss_cls'], \
self.outputs['rpn_loss_reg'] = self.rpn.get_loss()
if self.cfg.RCNN.ENABLED:
self.outputs['rcnn_loss'], self.outputs['rcnn_loss_cls'], \
self.outputs['rcnn_loss_reg'] = self.rcnn.get_loss()
self.outputs['loss'] = self.outputs.get('rpn_loss', 0.) \
+ self.outputs.get('rcnn_loss', 0.)
def get_feeds(self):
return list(self.inputs.keys())
def get_outputs(self):
return self.outputs
def get_loss(self):
rpn_loss, _, _ = self.rpn.get_loss()
rcnn_loss, _, _ = self.rcnn.get_loss()
return rpn_loss + rcnn_loss
def get_pyreader(self):
return self.pyreader
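# Typical usage (a minimal sketch mirroring eval.py above):
#     model = PointRCNN(cfg, batch_size=1, use_xyz=True, mode='TEST')
#     model.build()
#     pyreader = model.get_pyreader()
#     outputs = model.get_outputs()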
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Contains PointNet++ utility functions.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Constant
from ext_op import *
__all__ = ["conv_bn", "pointnet_sa_module", "pointnet_fp_module", "MLP"]
def query_and_group(xyz, new_xyz, radius, nsample, features=None, use_xyz=True):
"""
Perform query_ball and group_points
Args:
        xyz (Variable): xyz coordinates features with shape [B, N, 3]
        new_xyz (Variable): centroids features with shape [B, npoint, 3]
        radius (float32): radius of ball
        nsample (int32): maximum number of gather features
        features (Variable): features with shape [B, C, N]
        use_xyz (bool): whether use xyz coordinates features
    Returns:
        out (Variable): features with shape [B, C + 3, npoint, nsample]
"""
idx = query_ball(xyz, new_xyz, radius, nsample)
idx.stop_gradient = True
xyz = fluid.layers.transpose(xyz,perm=[0, 2, 1])
grouped_xyz = group_points(xyz, idx)
expand_new_xyz = fluid.layers.unsqueeze(fluid.layers.transpose(new_xyz, perm=[0, 2, 1]), axes=[-1])
expand_new_xyz = fluid.layers.expand(expand_new_xyz, [1, 1, 1, grouped_xyz.shape[3]])
grouped_xyz -= expand_new_xyz
if features is not None:
grouped_features = group_points(features, idx)
return fluid.layers.concat([grouped_xyz, grouped_features], axis=1) \
if use_xyz else grouped_features
else:
assert use_xyz, "use_xyz should be True when features is None"
return grouped_xyz
def group_all(xyz, features=None, use_xyz=True):
"""
Group all xyz and features when npoint is None
See query_and_group
"""
xyz = fluid.layers.transpose(xyz,perm=[0, 2, 1])
grouped_xyz = fluid.layers.unsqueeze(xyz, axes=[2])
if features is not None:
grouped_features = fluid.layers.unsqueeze(features, axes=[2])
return fluid.layers.concat([grouped_xyz, grouped_features], axis=1) if use_xyz else grouped_features
else:
return grouped_xyz
def conv_bn(input, out_channels, bn=True, bn_momentum=0.95, act='relu', name=None):
param_attr = ParamAttr(name='{}_conv_weight'.format(name),)
bias_attr = ParamAttr(name='{}_conv_bias'.format(name)) \
if not bn else False
out = fluid.layers.conv2d(input,
num_filters=out_channels,
filter_size=1,
stride=1,
padding=0,
dilation=1,
param_attr=param_attr,
bias_attr=bias_attr,
act=act if not bn else None)
if bn:
bn_name = name + "_bn"
out = fluid.layers.batch_norm(out,
act=act,
momentum=bn_momentum,
param_attr=ParamAttr(name=bn_name + "_scale"),
bias_attr=ParamAttr(name=bn_name + "_offset"),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_var')
return out
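# With 1x1 kernels, conv_bn acts as a per-point shared fully connected layer
# over the channel dimension; MLP below simply stacks such layers.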
def MLP(features, out_channels_list, bn=True, bn_momentum=0.95, act='relu', name=None):
out = features
for i, out_channels in enumerate(out_channels_list):
out = conv_bn(out, out_channels, bn=bn, act=act, bn_momentum=bn_momentum, name=name + "_{}".format(i))
return out
def pointnet_sa_module(xyz,
npoint=None,
radiuss=[],
nsamples=[],
mlps=[],
feature=None,
bn=True,
bn_momentum=0.95,
use_xyz=True,
name=None):
"""
PointNet MSG(Multi-Scale Group) Set Abstraction Module.
Call with radiuss, nsamples, mlps as single element list for
SSG(Single-Scale Group).
Args:
        xyz (Variable): xyz coordinates features with shape [B, N, 3]
        radiuss ([float32]): list of radius of ball
        nsamples ([int32]): list of maximum number of gather features
        mlps ([[int32]]): list of out_channels_list
        feature (Variable): features with shape [B, C, N]
        bn (bool): whether perform batch norm after conv2d
        bn_momentum (float): momentum of batch norm
        use_xyz (bool): whether use xyz coordinates features
    Returns:
        new_xyz (Variable): centroids features with shape [B, npoint, 3]
        out (Variable): features with shape [B, \sum_i{mlps[i][-1]}, npoint]
"""
assert len(radiuss) == len(nsamples) == len(mlps), \
"radiuss, nsamples, mlps length should be same"
farthest_idx = farthest_point_sampling(xyz, npoint)
farthest_idx.stop_gradient = True
new_xyz = gather_point(xyz, farthest_idx) if npoint is not None else None
outs = []
for i, (radius, nsample, mlp) in enumerate(zip(radiuss, nsamples, mlps)):
out = query_and_group(xyz, new_xyz, radius, nsample, feature, use_xyz) if npoint is not None else group_all(xyz, feature, use_xyz)
out = MLP(out, mlp, bn=bn, bn_momentum=bn_momentum, name=name + '_mlp{}'.format(i))
out = fluid.layers.pool2d(out, pool_size=[1, out.shape[3]], pool_type='max')
out = fluid.layers.squeeze(out, axes=[-1])
outs.append(out)
out = fluid.layers.concat(outs, axis=1)
return (new_xyz, out)
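# Example with hypothetical shapes: a two-scale MSG SA layer
#     new_xyz, out = pointnet_sa_module(
#         xyz, npoint=1024, radiuss=[0.1, 0.5], nsamples=[16, 32],
#         mlps=[[16, 16, 32], [32, 32, 64]], feature=feat, name='sa_0')
# returns out with 32 + 64 channels (concatenation over the two scales).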
def pointnet_fp_module(unknown, known, unknown_feats, known_feats, mlp, bn=True, bn_momentum=0.95, name=None):
"""
PointNet Feature Propagation Module
Args:
        unknown (Variable): unknown xyz coordinates features with shape [B, N, 3]
        known (Variable): known xyz coordinates features with shape [B, M, 3]
unknown_feats (Variable): unknown features with shape [B, N, C1] to be propagated to
known_feats (Variable): known features with shape [B, M, C2] to be propagated from
mlp ([int32]): out_channels_list
bn (bool): whether perform batch norm after conv2d
Returns:
new_features (Variable): new features with shape [B, N, mlp[-1]]
"""
if known is None:
raise NotImplementedError("Not implement known as None currently.")
else:
dist, idx = three_nn(unknown, known, eps=0.)
dist.stop_gradient = True
idx.stop_gradient = True
dist = fluid.layers.sqrt(dist)
ones = fluid.layers.fill_constant_batch_size_like(dist, dist.shape, dist.dtype, 1)
        dist_recip = ones / (dist + 1e-8)  # 1.0 / dist
norm = fluid.layers.reduce_sum(dist_recip, dim=-1, keep_dim=True)
weight = dist_recip / norm
weight.stop_gradient = True
interp_feats = three_interp(known_feats, weight, idx)
new_features = interp_feats if unknown_feats is None else \
fluid.layers.concat([interp_feats, unknown_feats], axis=-1)
new_features = fluid.layers.transpose(new_features, perm=[0, 2, 1])
new_features = fluid.layers.unsqueeze(new_features, axes=[-1])
new_features = MLP(new_features, mlp, bn=bn, bn_momentum=bn_momentum, name=name + '_mlp')
new_features = fluid.layers.squeeze(new_features, axes=[-1])
new_features = fluid.layers.transpose(new_features, perm=[0, 2, 1])
return new_features
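# Interpolation uses inverse distance weighting over the 3 nearest neighbors:
#     w_i = (1 / d_i) / sum_j (1 / d_j)
# so closer known points contribute more to each propagated feature.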
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Contains the PointNet++ MSG backbone model.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Constant
from models.pointnet2_modules import *
__all__ = ["PointNet2MSG"]
class PointNet2MSG(object):
def __init__(self, cfg, xyz, feature=None, use_xyz=True):
self.cfg = cfg
self.xyz = xyz
self.feature = feature
self.use_xyz = use_xyz
self.model_config()
def model_config(self):
self.SA_confs = []
for i in range(self.cfg.RPN.SA_CONFIG.NPOINTS.__len__()):
self.SA_confs.append({
"npoint": self.cfg.RPN.SA_CONFIG.NPOINTS[i],
"radiuss": self.cfg.RPN.SA_CONFIG.RADIUS[i],
"nsamples": self.cfg.RPN.SA_CONFIG.NSAMPLE[i],
"mlps": self.cfg.RPN.SA_CONFIG.MLPS[i],
})
self.FP_confs = []
for i in range(self.cfg.RPN.FP_MLPS.__len__()):
self.FP_confs.append({"mlp": self.cfg.RPN.FP_MLPS[i]})
def build(self, bn_momentum=0.95):
xyzs, features = [self.xyz], [self.feature]
xyzi, featurei = self.xyz, self.feature
for i, SA_conf in enumerate(self.SA_confs):
xyzi, featurei = pointnet_sa_module(
xyz=xyzi,
feature=featurei,
bn_momentum=bn_momentum,
use_xyz=self.use_xyz,
name="sa_{}".format(i),
**SA_conf)
xyzs.append(xyzi)
features.append(fluid.layers.transpose(featurei, perm=[0, 2, 1]))
for i in range(-1, -(len(self.FP_confs) + 1), -1):
features[i - 1] = pointnet_fp_module(
unknown=xyzs[i - 1],
known=xyzs[i],
unknown_feats=features[i - 1],
known_feats=features[i],
bn_momentum=bn_momentum,
name="fp_{}".format(i + len(self.FP_confs)),
**self.FP_confs[i])
return xyzs[0], features[0]
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import sys
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Constant
from models.pointnet2_modules import MLP, pointnet_sa_module, conv_bn
from models.loss_utils import sigmoid_focal_loss, get_reg_loss
from utils.proposal_target import get_proposal_target_func
from utils.cyops.kitti_utils import rotate_pc_along_y
__all__ = ['RCNN']
class RCNN(object):
def __init__(self, cfg, num_classes, batch_size, mode='TRAIN', use_xyz=True, input_channels=0):
self.cfg = cfg
self.use_xyz = use_xyz
self.num_classes = num_classes
self.input_channels = input_channels
self.inputs = None
self.training = mode == 'TRAIN'
self.batch_size = batch_size
def create_tmp_var(self, name, dtype, shape):
return fluid.default_main_program().current_block().create_var(
name=name, dtype=dtype, shape=shape
)
def build_model(self, inputs):
self.inputs = inputs
if self.cfg.RCNN.ROI_SAMPLE_JIT:
if self.training:
proposal_target = get_proposal_target_func(self.cfg)
tmp_list = [
self.inputs['seg_mask'],
self.inputs['rpn_features'],
self.inputs['gt_boxes3d'],
self.inputs['rpn_xyz'],
self.inputs['pts_depth'],
self.inputs['roi_boxes3d'],
self.inputs['rpn_intensity'],
]
                out_name = ['reg_valid_mask', 'sampled_pts', 'roi_boxes3d', 'gt_of_rois', 'pts_feature', 'cls_label', 'gt_iou']
reg_valid_mask = self.create_tmp_var(name="reg_valid_mask",dtype='float32',shape=[-1,])
sampled_pts = self.create_tmp_var(name="sampled_pts",dtype='float32',shape=[-1, self.cfg.RCNN.NUM_POINTS, 3])
new_roi_boxes3d = self.create_tmp_var(name="new_roi_boxes3d",dtype='float32',shape=[-1, 7])
gt_of_rois = self.create_tmp_var(name="gt_of_rois", dtype='float32', shape=[-1,7])
pts_feature = self.create_tmp_var(name="pts_feature", dtype='float32',shape=[-1,512,130])
cls_label = self.create_tmp_var(name="cls_label",dtype='int64',shape=[-1])
gt_iou = self.create_tmp_var(name="gt_iou",dtype='float32',shape=[-1])
out_list = [reg_valid_mask, sampled_pts, new_roi_boxes3d, gt_of_rois, pts_feature, cls_label, gt_iou]
out = fluid.layers.py_func(func=proposal_target,x=tmp_list,out=out_list)
self.target_dict = {}
for i,item in enumerate(out):
self.target_dict[out_name[i]] = item
pts = fluid.layers.concat(input=[self.target_dict['sampled_pts'],self.target_dict['pts_feature']], axis=2)
self.debug = pts
self.target_dict['pts_input'] = pts
else:
rpn_xyz, rpn_features = inputs['rpn_xyz'], inputs['rpn_features']
batch_rois = inputs['roi_boxes3d']
rpn_intensity = inputs['rpn_intensity']
rpn_intensity = fluid.layers.unsqueeze(rpn_intensity,axes=[2])
seg_mask = fluid.layers.unsqueeze(inputs['seg_mask'],axes=[2])
if self.cfg.RCNN.USE_INTENSITY:
pts_extra_input_list = [rpn_intensity, seg_mask]
else:
pts_extra_input_list = [seg_mask]
if self.cfg.RCNN.USE_DEPTH:
                    pts_depth = inputs['pts_depth'] / 70.0 - 0.5
pts_depth = fluid.layers.unsqueeze(pts_depth,axes=[2])
pts_extra_input_list.append(pts_depth)
pts_extra_input = fluid.layers.concat(pts_extra_input_list, axis=2)
pts_feature = fluid.layers.concat([pts_extra_input, rpn_features],axis=2)
pooled_features, pooled_empty_flag = fluid.layers.roi_pool_3d(rpn_xyz,pts_feature,batch_rois,
self.cfg.RCNN.POOL_EXTRA_WIDTH,
sampled_pt_num=self.cfg.RCNN.NUM_POINTS)
# canonical transformation
batch_size = batch_rois.shape[0]
roi_center = batch_rois[:, :, 0:3]
tmp = pooled_features[:, :, :, 0:3] - fluid.layers.unsqueeze(roi_center,axes=[2])
pooled_features = fluid.layers.concat(input=[tmp,pooled_features[:,:,:,3:]],axis=3)
concat_list = []
for i in range(batch_size):
tmp = rotate_pc_along_y(pooled_features[i, :, :, 0:3],
batch_rois[i, :, 6])
concat = fluid.layers.concat([tmp,pooled_features[i,:,:,3:]],axis=-1)
concat = fluid.layers.unsqueeze(concat,axes=[0])
concat_list.append(concat)
pooled_features = fluid.layers.concat(concat_list,axis=0)
pts = fluid.layers.reshape(pooled_features,shape=[-1,pooled_features.shape[2],pooled_features.shape[3]])
else:
pts = inputs['pts_input']
self.target_dict = {}
self.target_dict['pts_input'] = inputs['pts_input']
self.target_dict['roi_boxes3d'] = inputs['roi_boxes3d']
if self.training:
self.target_dict['cls_label'] = inputs['cls_label']
self.target_dict['reg_valid_mask'] = inputs['reg_valid_mask']
self.target_dict['gt_of_rois'] = inputs['gt_boxes3d_ct']
xyz = pts[:,:,0:3]
feature = fluid.layers.transpose(pts[:,:,3:], [0,2,1]) if pts.shape[-1]>3 else None
if self.cfg.RCNN.USE_RPN_FEATURES:
self.rcnn_input_channel = 3 + int(self.cfg.RCNN.USE_INTENSITY) + \
int(self.cfg.RCNN.USE_MASK) + int(self.cfg.RCNN.USE_DEPTH)
c_out = self.cfg.RCNN.XYZ_UP_LAYER[-1]
xyz_input = pts[:,:,:self.rcnn_input_channel]
xyz_input = fluid.layers.transpose(xyz_input, [0,2,1])
xyz_input = fluid.layers.unsqueeze(xyz_input, axes=[3])
rpn_feature = pts[:,:,self.rcnn_input_channel:]
rpn_feature = fluid.layers.transpose(rpn_feature, [0,2,1])
rpn_feature = fluid.layers.unsqueeze(rpn_feature,axes=[3])
xyz_feature = MLP(
xyz_input,
out_channels_list=self.cfg.RCNN.XYZ_UP_LAYER,
bn=self.cfg.RCNN.USE_BN,
name="xyz_up_layer")
merged_feature = fluid.layers.concat([xyz_feature, rpn_feature],axis=1)
merged_feature = MLP(
merged_feature,
out_channels_list=[c_out],
bn=self.cfg.RCNN.USE_BN,
name="xyz_down_layer")
xyzs = [xyz]
features = [fluid.layers.squeeze(merged_feature,axes=[3])]
else:
xyzs = [xyz]
features = [feature]
# forward
xyzi, featurei = xyzs[-1], features[-1]
for k in range(len(self.cfg.RCNN.SA_CONFIG.NPOINTS)):
mlps = self.cfg.RCNN.SA_CONFIG.MLPS[k]
npoint = self.cfg.RCNN.SA_CONFIG.NPOINTS[k] if self.cfg.RCNN.SA_CONFIG.NPOINTS[k] != -1 else None
xyzi, featurei = pointnet_sa_module(
xyz=xyzi,
feature = featurei,
bn = self.cfg.RCNN.USE_BN,
use_xyz = self.use_xyz,
name = "sa_{}".format(k),
npoint = npoint,
mlps = [mlps],
radiuss = [self.cfg.RCNN.SA_CONFIG.RADIUS[k]],
nsamples = [self.cfg.RCNN.SA_CONFIG.NSAMPLE[k]]
)
xyzs.append(xyzi)
features.append(featurei)
head_in = features[-1]
head_in = fluid.layers.unsqueeze(head_in, axes=[2])
cls_out = head_in
reg_out = cls_out
for i in range(0, self.cfg.RCNN.CLS_FC.__len__()):
cls_out = conv_bn(cls_out, self.cfg.RCNN.CLS_FC[i], bn=self.cfg.RCNN.USE_BN, name='rcnn_cls_{}'.format(i))
if i == 0 and self.cfg.RCNN.DP_RATIO >= 0:
cls_out = fluid.layers.dropout(cls_out, self.cfg.RCNN.DP_RATIO, dropout_implementation="upscale_in_train")
cls_channel = 1 if self.num_classes == 2 else self.num_classes
cls_out = conv_bn(cls_out, cls_channel, act=None, name="cls_out", bn=self.cfg.RCNN.USE_BN)
self.cls_out = fluid.layers.squeeze(cls_out,axes=[1,3])
per_loc_bin_num = int(self.cfg.RCNN.LOC_SCOPE / self.cfg.RCNN.LOC_BIN_SIZE) * 2
loc_y_bin_num = int(self.cfg.RCNN.LOC_Y_SCOPE / self.cfg.RCNN.LOC_Y_BIN_SIZE) * 2
reg_channel = per_loc_bin_num * 4 + self.cfg.RCNN.NUM_HEAD_BIN * 2 + 3
reg_channel += (1 if not self.cfg.RCNN.LOC_Y_BY_BIN else loc_y_bin_num * 2)
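        # reg_channel layout matches get_reg_loss: x/z bin + residual channels
        # (per_loc_bin_num * 4), ry bin + residual (NUM_HEAD_BIN * 2), 3 size
        # residuals, plus either one y offset or y bin + residual channels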
for i in range(0, self.cfg.RCNN.REG_FC.__len__()):
reg_out = conv_bn(reg_out, self.cfg.RCNN.REG_FC[i], bn=self.cfg.RCNN.USE_BN, name='rcnn_reg_{}'.format(i))
if i == 0 and self.cfg.RCNN.DP_RATIO >= 0:
reg_out = fluid.layers.dropout(reg_out, self.cfg.RCNN.DP_RATIO, dropout_implementation="upscale_in_train")
reg_out = conv_bn(reg_out, reg_channel, act=None, name="reg_out", bn=self.cfg.RCNN.USE_BN)
self.reg_out = fluid.layers.squeeze(reg_out, axes=[2,3])
self.outputs = {
'rcnn_cls':self.cls_out,
'rcnn_reg':self.reg_out,
}
if self.training:
self.outputs.update(self.target_dict)
elif not self.training:
self.outputs['sample_id'] = inputs['sample_id']
self.outputs['pts_input'] = inputs['pts_input']
self.outputs['roi_boxes3d'] = inputs['roi_boxes3d']
self.outputs['roi_scores'] = inputs['roi_scores']
self.outputs['gt_iou'] = inputs['gt_iou']
self.outputs['gt_boxes3d'] = inputs['gt_boxes3d']
if self.cls_out.shape[1] == 1:
raw_scores = fluid.layers.reshape(self.cls_out, shape=[-1])
norm_scores = fluid.layers.sigmoid(raw_scores)
else:
norm_scores = fluid.layers.softmax(self.cls_out, axis=1)
self.outputs['norm_scores'] = norm_scores
def get_outputs(self):
return self.outputs
def get_loss(self):
        assert self.inputs is not None, \
            "please call build_model() first"
rcnn_cls_label = self.outputs['cls_label']
reg_valid_mask = self.outputs['reg_valid_mask']
roi_boxes3d = self.outputs['roi_boxes3d']
roi_size = roi_boxes3d[:, 3:6]
gt_boxes3d_ct = self.outputs['gt_of_rois']
pts_input = self.outputs['pts_input']
rcnn_cls = self.cls_out
rcnn_reg = self.reg_out
# RCNN classification loss
assert self.cfg.RCNN.LOSS_CLS in ["SigmoidFocalLoss", "BinaryCrossEntropy"], \
"unsupported RCNN cls loss type {}".format(self.cfg.RCNN.LOSS_CLS)
if self.cfg.RCNN.LOSS_CLS == "SigmoidFocalLoss":
cls_flat = fluid.layers.reshape(self.cls_out, shape=[-1])
cls_label_flat = fluid.layers.reshape(rcnn_cls_label, shape=[-1])
cls_label_flat = fluid.layers.cast(cls_label_flat, dtype=cls_flat.dtype)
cls_target = fluid.layers.cast(cls_label_flat>0, dtype=cls_flat.dtype)
cls_label_flat.stop_gradient = True
pos = fluid.layers.cast(cls_label_flat > 0, dtype=cls_flat.dtype)
pos.stop_gradient = True
pos_normalizer = fluid.layers.reduce_sum(pos)
cls_weights = fluid.layers.cast(cls_label_flat >= 0, dtype=cls_flat.dtype)
cls_weights = cls_weights / fluid.layers.clip(pos_normalizer, min=1.0, max=1e10)
cls_weights.stop_gradient = True
rcnn_loss_cls = sigmoid_focal_loss(cls_flat, cls_target, cls_weights)
rcnn_loss_cls = fluid.layers.reduce_sum(rcnn_loss_cls)
else: # BinaryCrossEntropy
cls_label = fluid.layers.reshape(rcnn_cls_label, shape=self.cls_out.shape)
cls_valid_mask = fluid.layers.cast(cls_label >= 0, dtype=self.cls_out.dtype)
cls_label = fluid.layers.cast(cls_label, dtype=self.cls_out.dtype)
cls_label.stop_gradient = True
rcnn_loss_cls = fluid.layers.sigmoid_cross_entropy_with_logits(self.cls_out, cls_label)
cls_mask_normalzer = fluid.layers.reduce_sum(cls_valid_mask)
rcnn_loss_cls = fluid.layers.reduce_sum(rcnn_loss_cls * cls_valid_mask) \
/ fluid.layers.clip(cls_mask_normalzer, min=1.0, max=1e10)
# RCNN regression loss
reg_out = self.reg_out
fg_mask = fluid.layers.cast(reg_valid_mask > 0, dtype=reg_out.dtype)
fg_mask.stop_gradient = True
gt_boxes3d_ct = fluid.layers.reshape(gt_boxes3d_ct, [-1,7])
all_anchor_size = roi_size
anchor_size = all_anchor_size[fg_mask] if self.cfg.RCNN.SIZE_RES_ON_ROI else self.cfg.CLS_MEAN_SIZE[0]
loc_loss, angle_loss, size_loss, loss_dict = get_reg_loss(
reg_out * fg_mask,
gt_boxes3d_ct,
fg_mask,
point_num=float(self.batch_size*64),
loc_scope=self.cfg.RCNN.LOC_SCOPE,
loc_bin_size=self.cfg.RCNN.LOC_BIN_SIZE,
num_head_bin=self.cfg.RCNN.NUM_HEAD_BIN,
anchor_size=anchor_size,
get_xz_fine=True,
get_y_by_bin=self.cfg.RCNN.LOC_Y_BY_BIN,
loc_y_scope=self.cfg.RCNN.LOC_Y_SCOPE,
loc_y_bin_size=self.cfg.RCNN.LOC_Y_BIN_SIZE,
get_ry_fine=True
)
rcnn_loss_reg = loc_loss + angle_loss + size_loss * 3
rcnn_loss = rcnn_loss_cls + rcnn_loss_reg
return rcnn_loss, rcnn_loss_cls, rcnn_loss_reg
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Normal, Constant
from utils.proposal_utils import get_proposal_func
from models.pointnet2_msg import PointNet2MSG
from models.pointnet2_modules import conv_bn
from models.loss_utils import sigmoid_focal_loss, get_reg_loss
__all__ = ["RPN"]
class RPN(object):
def __init__(self, cfg, batch_size, use_xyz=True, mode='TRAIN', prog=None):
self.cfg = cfg
self.batch_size = batch_size
self.use_xyz = use_xyz
self.mode = mode
self.is_train = mode == 'TRAIN'
self.inputs = None
self.prog = fluid.default_main_program() if prog is None else prog
def build(self, inputs):
        assert self.cfg.RPN.BACKBONE == 'pointnet2_msg', \
            "RPN backbone only supports pointnet2_msg"
self.inputs = inputs
self.outputs = {}
xyz = inputs["pts_input"]
        assert not self.cfg.RPN.USE_INTENSITY, \
            "RPN.USE_INTENSITY is not supported now"
feature = None
msg = PointNet2MSG(self.cfg, xyz, feature, self.use_xyz)
backbone_xyz, backbone_feature = msg.build()
self.outputs['backbone_xyz'] = backbone_xyz
self.outputs['backbone_feature'] = backbone_feature
backbone_feature = fluid.layers.transpose(backbone_feature, perm=[0, 2, 1])
cls_out = fluid.layers.unsqueeze(backbone_feature, axes=[-1])
reg_out = cls_out
# classification branch
for i in range(self.cfg.RPN.CLS_FC.__len__()):
cls_out = conv_bn(cls_out, self.cfg.RPN.CLS_FC[i], bn=self.cfg.RPN.USE_BN, name='rpn_cls_{}'.format(i))
if i == 0 and self.cfg.RPN.DP_RATIO > 0:
cls_out = fluid.layers.dropout(cls_out, self.cfg.RPN.DP_RATIO, dropout_implementation="upscale_in_train")
cls_out = fluid.layers.conv2d(cls_out,
num_filters=1,
filter_size=1,
stride=1,
padding=0,
dilation=1,
param_attr=ParamAttr(name='rpn_cls_out_conv_weight'),
bias_attr=ParamAttr(name='rpn_cls_out_conv_bias',
initializer=Constant(-np.log(99))))
cls_out = fluid.layers.squeeze(cls_out, axes=[1, 3])
self.outputs['rpn_cls'] = cls_out
# regression branch
per_loc_bin_num = int(self.cfg.RPN.LOC_SCOPE / self.cfg.RPN.LOC_BIN_SIZE) * 2
if self.cfg.RPN.LOC_XZ_FINE:
reg_channel = per_loc_bin_num * 4 + self.cfg.RPN.NUM_HEAD_BIN * 2 + 3
else:
reg_channel = per_loc_bin_num * 2 + self.cfg.RPN.NUM_HEAD_BIN * 2 + 3
reg_channel += 1 # reg y
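        # channel layout matches get_reg_loss: x/z bins (plus fine residuals
        # when LOC_XZ_FINE), ry bin + residual, 3 size residuals, one y offset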
for i in range(self.cfg.RPN.REG_FC.__len__()):
reg_out = conv_bn(reg_out, self.cfg.RPN.REG_FC[i], bn=self.cfg.RPN.USE_BN, name='rpn_reg_{}'.format(i))
if i == 0 and self.cfg.RPN.DP_RATIO > 0:
reg_out = fluid.layers.dropout(reg_out, self.cfg.RPN.DP_RATIO, dropout_implementation="upscale_in_train")
reg_out = fluid.layers.conv2d(reg_out,
num_filters=reg_channel,
filter_size=1,
stride=1,
padding=0,
dilation=1,
param_attr=ParamAttr(name='rpn_reg_out_conv_weight',
initializer=Normal(0., 0.001),),
bias_attr=ParamAttr(name='rpn_reg_out_conv_bias'))
reg_out = fluid.layers.squeeze(reg_out, axes=[3])
reg_out = fluid.layers.transpose(reg_out, [0, 2, 1])
self.outputs['rpn_reg'] = reg_out
if self.mode != 'TRAIN' or self.cfg.RCNN.ENABLED:
rpn_scores_row = cls_out
rpn_scores_norm = fluid.layers.sigmoid(rpn_scores_row)
seg_mask = fluid.layers.cast(rpn_scores_norm > self.cfg.RPN.SCORE_THRESH, dtype='float32')
pts_depth = fluid.layers.sqrt(fluid.layers.reduce_sum(backbone_xyz * backbone_xyz, dim=2))
proposal_func = get_proposal_func(self.cfg, self.mode)
proposal_input = fluid.layers.concat([fluid.layers.unsqueeze(rpn_scores_row, axes=[-1]),
backbone_xyz, reg_out], axis=-1)
proposal = self.prog.current_block().create_var(name='proposal',
shape=[-1, proposal_input.shape[1], 8],
dtype='float32')
fluid.layers.py_func(proposal_func, proposal_input, proposal)
rois, roi_scores_row = proposal[:, :, :7], proposal[:, :, -1]
self.outputs['rois'] = rois
self.outputs['roi_scores_row'] = roi_scores_row
self.outputs['seg_mask'] = seg_mask
self.outputs['pts_depth'] = pts_depth
def get_outputs(self):
return self.outputs
def get_loss(self):
assert self.inputs is not None, \
"please call build() first"
rpn_cls_label = self.inputs['rpn_cls_label']
rpn_reg_label = self.inputs['rpn_reg_label']
rpn_cls = self.outputs['rpn_cls']
rpn_reg = self.outputs['rpn_reg']
# RPN classification loss
assert self.cfg.RPN.LOSS_CLS == "SigmoidFocalLoss", \
"unsupported RPN cls loss type {}".format(self.cfg.RPN.LOSS_CLS)
cls_flat = fluid.layers.reshape(rpn_cls, shape=[-1])
cls_label_flat = fluid.layers.reshape(rpn_cls_label, shape=[-1])
cls_label_pos = fluid.layers.cast(cls_label_flat > 0, dtype=cls_flat.dtype)
pos_normalizer = fluid.layers.reduce_sum(cls_label_pos)
cls_weights = fluid.layers.cast(cls_label_flat >= 0, dtype=cls_flat.dtype)
cls_weights = cls_weights / fluid.layers.clip(pos_normalizer, min=1.0, max=1e10)
cls_weights.stop_gradient = True
cls_label_flat = fluid.layers.cast(cls_label_flat, dtype=cls_flat.dtype)
cls_label_flat.stop_gradient = True
rpn_loss_cls = sigmoid_focal_loss(cls_flat, cls_label_pos, cls_weights)
rpn_loss_cls = fluid.layers.reduce_sum(rpn_loss_cls)
# RPN regression loss
rpn_reg = fluid.layers.reshape(rpn_reg, [-1, rpn_reg.shape[-1]])
reg_label = fluid.layers.reshape(rpn_reg_label, [-1, rpn_reg_label.shape[-1]])
fg_mask = fluid.layers.cast(cls_label_flat > 0, dtype=rpn_reg.dtype)
fg_mask.stop_gradient = True
loc_loss, angle_loss, size_loss, loss_dict = get_reg_loss(
rpn_reg * fg_mask, reg_label, fg_mask,
float(self.batch_size * self.cfg.RPN.NUM_POINTS),
loc_scope=self.cfg.RPN.LOC_SCOPE,
loc_bin_size=self.cfg.RPN.LOC_BIN_SIZE,
num_head_bin=self.cfg.RPN.NUM_HEAD_BIN,
anchor_size=self.cfg.CLS_MEAN_SIZE[0],
get_xz_fine=self.cfg.RPN.LOC_XZ_FINE,
get_y_by_bin=False,
get_ry_fine=False)
rpn_loss_reg = loc_loss + angle_loss + size_loss * 3
self.rpn_loss = rpn_loss_cls * self.cfg.RPN.LOSS_WEIGHT[0] + rpn_loss_reg * self.cfg.RPN.LOSS_WEIGHT[1]
return self.rpn_loss, rpn_loss_cls, rpn_loss_reg
Cython
opencv-python
shapely
scikit-image
Numba
fire
"""
Generate augmented scenes using the ground-truth database
This code is based on https://github.com/sshaoshuai/PointRCNN/blob/master/tools/generate_aug_scene.py
"""
import os
import numpy as np
import pickle
import pts_utils
import utils.cyops.kitti_utils as kitti_utils
from utils.box_utils import boxes_iou3d
from utils import calibration as calib
from data.kitti_dataset import KittiDataset
import argparse
np.random.seed(1024)
parser = argparse.ArgumentParser()
parser.add_argument('--mode', type=str, default='generator')
parser.add_argument('--class_name', type=str, default='Car')
parser.add_argument('--data_dir', type=str, default='./data')
parser.add_argument('--save_dir', type=str, default='./data/KITTI/aug_scene/training')
parser.add_argument('--split', type=str, default='train')
parser.add_argument('--gt_database_dir', type=str, default='./data/gt_database/train_gt_database_3level_Car.pkl')
parser.add_argument('--include_similar', action='store_true', default=False)
parser.add_argument('--aug_times', type=int, default=4)
args = parser.parse_args()
PC_REDUCE_BY_RANGE = True
if args.class_name == 'Car':
PC_AREA_SCOPE = np.array([[-40, 40], [-1, 3], [0, 70.4]]) # x, y, z scope in rect camera coords
else:
PC_AREA_SCOPE = np.array([[-30, 30], [-1, 3], [0, 50]])
def log_print(info, fp=None):
print(info)
if fp is not None:
# print(info, file=fp)
fp.write(info+"\n")
def save_kitti_format(calib, bbox3d, obj_list, img_shape, save_fp):
corners3d = kitti_utils.boxes3d_to_corners3d(bbox3d)
img_boxes, _ = calib.corners3d_to_img_boxes(corners3d)
img_boxes[:, 0] = np.clip(img_boxes[:, 0], 0, img_shape[1] - 1)
img_boxes[:, 1] = np.clip(img_boxes[:, 1], 0, img_shape[0] - 1)
img_boxes[:, 2] = np.clip(img_boxes[:, 2], 0, img_shape[1] - 1)
img_boxes[:, 3] = np.clip(img_boxes[:, 3], 0, img_shape[0] - 1)
# Discard boxes that are larger than 80% of the image width OR height
img_boxes_w = img_boxes[:, 2] - img_boxes[:, 0]
img_boxes_h = img_boxes[:, 3] - img_boxes[:, 1]
box_valid_mask = np.logical_and(img_boxes_w < img_shape[1] * 0.8, img_boxes_h < img_shape[0] * 0.8)
for k in range(bbox3d.shape[0]):
if box_valid_mask[k] == 0:
continue
x, z, ry = bbox3d[k, 0], bbox3d[k, 2], bbox3d[k, 6]
beta = np.arctan2(z, x)
alpha = -np.sign(beta) * np.pi / 2 + beta + ry
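        # KITTI observation angle: alpha = ry + beta - sign(beta) * pi / 2,
        # where beta is the azimuth of the box center in rect camera coordinates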
save_fp.write('%s %.2f %d %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f\n' %
(args.class_name, obj_list[k].trucation, int(obj_list[k].occlusion), alpha, img_boxes[k, 0], img_boxes[k, 1],
img_boxes[k, 2], img_boxes[k, 3],
bbox3d[k, 3], bbox3d[k, 4], bbox3d[k, 5], bbox3d[k, 0], bbox3d[k, 1], bbox3d[k, 2],
bbox3d[k, 6]))
class AugSceneGenerator(KittiDataset):
def __init__(self, root_dir, gt_database=None, split='train', classes=args.class_name):
super(AugSceneGenerator, self).__init__(root_dir, split=split)
self.gt_database = None
if classes == 'Car':
self.classes = ('Background', 'Car')
elif classes == 'People':
self.classes = ('Background', 'Pedestrian', 'Cyclist')
elif classes == 'Pedestrian':
self.classes = ('Background', 'Pedestrian')
elif classes == 'Cyclist':
self.classes = ('Background', 'Cyclist')
else:
assert False, "Invalid classes: %s" % classes
self.gt_database = gt_database
def __len__(self):
raise NotImplementedError
def __getitem__(self, item):
raise NotImplementedError
def filtrate_dc_objects(self, obj_list):
valid_obj_list = []
for obj in obj_list:
if obj.cls_type in ['DontCare']:
continue
valid_obj_list.append(obj)
return valid_obj_list
def filtrate_objects(self, obj_list):
valid_obj_list = []
type_whitelist = self.classes
if args.include_similar:
type_whitelist = list(self.classes)
if 'Car' in self.classes:
type_whitelist.append('Van')
if 'Pedestrian' in self.classes or 'Cyclist' in self.classes:
type_whitelist.append('Person_sitting')
for obj in obj_list:
if obj.cls_type in type_whitelist:
valid_obj_list.append(obj)
return valid_obj_list
@staticmethod
def get_valid_flag(pts_rect, pts_img, pts_rect_depth, img_shape):
"""
Valid point should be in the image (and in the PC_AREA_SCOPE)
:param pts_rect:
:param pts_img:
:param pts_rect_depth:
:param img_shape:
:return:
"""
val_flag_1 = np.logical_and(pts_img[:, 0] >= 0, pts_img[:, 0] < img_shape[1])
val_flag_2 = np.logical_and(pts_img[:, 1] >= 0, pts_img[:, 1] < img_shape[0])
val_flag_merge = np.logical_and(val_flag_1, val_flag_2)
pts_valid_flag = np.logical_and(val_flag_merge, pts_rect_depth >= 0)
if PC_REDUCE_BY_RANGE:
x_range, y_range, z_range = PC_AREA_SCOPE
pts_x, pts_y, pts_z = pts_rect[:, 0], pts_rect[:, 1], pts_rect[:, 2]
range_flag = (pts_x >= x_range[0]) & (pts_x <= x_range[1]) \
& (pts_y >= y_range[0]) & (pts_y <= y_range[1]) \
& (pts_z >= z_range[0]) & (pts_z <= z_range[1])
pts_valid_flag = pts_valid_flag & range_flag
return pts_valid_flag
@staticmethod
def check_pc_range(xyz):
"""
:param xyz: [x, y, z]
:return:
"""
x_range, y_range, z_range = PC_AREA_SCOPE
if (x_range[0] <= xyz[0] <= x_range[1]) and (y_range[0] <= xyz[1] <= y_range[1]) and \
(z_range[0] <= xyz[2] <= z_range[1]):
return True
return False
def aug_one_scene(self, sample_id, pts_rect, pts_intensity, all_gt_boxes3d):
"""
:param pts_rect: (N, 3)
:param gt_boxes3d: (M1, 7)
:param all_gt_boxex3d: (M2, 7)
:return:
"""
assert self.gt_database is not None
extra_gt_num = np.random.randint(10, 15)
try_times = 50
cnt = 0
cur_gt_boxes3d = all_gt_boxes3d.copy()
cur_gt_boxes3d[:, 4] += 0.5
cur_gt_boxes3d[:, 5] += 0.5 # enlarge existing boxes so newly added objects keep some clearance
extra_gt_obj_list = []
extra_gt_boxes3d_list = []
new_pts_list, new_pts_intensity_list = [], []
src_pts_flag = np.ones(pts_rect.shape[0], dtype=np.int32)
road_plane = self.get_road_plane(sample_id)
a, b, c, d = road_plane
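# sample objects from the GT database and try to insert them into the scene:
# at most try_times attempts, keeping roughly extra_gt_num non-overlapping objects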
while try_times > 0:
try_times -= 1
rand_idx = np.random.randint(0, self.gt_database.__len__() - 1)
new_gt_dict = self.gt_database[rand_idx]
new_gt_box3d = new_gt_dict['gt_box3d'].copy()
new_gt_points = new_gt_dict['points'].copy()
new_gt_intensity = new_gt_dict['intensity'].copy()
new_gt_obj = new_gt_dict['obj']
center = new_gt_box3d[0:3]
if PC_REDUCE_BY_RANGE and (self.check_pc_range(center) is False):
continue
if cnt > extra_gt_num:
break
if new_gt_points.__len__() < 5: # too few points
continue
# put it on the road plane
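# the plane satisfies a*x + b*y + c*z + d = 0; solve for y to get the road
# height under the box center, then shift the box and its points onto it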
cur_height = (-d - a * center[0] - c * center[2]) / b
move_height = new_gt_box3d[1] - cur_height
new_gt_box3d[1] -= move_height
new_gt_points[:, 1] -= move_height
cnt += 1
iou3d = boxes_iou3d(new_gt_box3d.reshape(1, 7), cur_gt_boxes3d)
valid_flag = iou3d.max() < 1e-8
if not valid_flag:
continue
enlarged_box3d = new_gt_box3d.copy()
enlarged_box3d[3] += 2 # remove the points above and below the object
boxes_pts_mask_list = pts_utils.pts_in_boxes3d(pts_rect, enlarged_box3d.reshape(1, 7))
pt_mask_flag = (boxes_pts_mask_list[0] == 1)
src_pts_flag[pt_mask_flag] = 0 # remove the original points which are inside the new box
new_pts_list.append(new_gt_points)
new_pts_intensity_list.append(new_gt_intensity)
enlarged_box3d = new_gt_box3d.copy()
enlarged_box3d[4] += 0.5
enlarged_box3d[5] += 0.5 # enlarge the newly added box so later additions keep some clearance
cur_gt_boxes3d = np.concatenate((cur_gt_boxes3d, enlarged_box3d.reshape(1, 7)), axis=0)
extra_gt_boxes3d_list.append(new_gt_box3d.reshape(1, 7))
extra_gt_obj_list.append(new_gt_obj)
if new_pts_list.__len__() == 0:
return False, pts_rect, pts_intensity, None, None
extra_gt_boxes3d = np.concatenate(extra_gt_boxes3d_list, axis=0)
# remove original points and add new points
pts_rect = pts_rect[src_pts_flag == 1]
pts_intensity = pts_intensity[src_pts_flag == 1]
new_pts_rect = np.concatenate(new_pts_list, axis=0)
new_pts_intensity = np.concatenate(new_pts_intensity_list, axis=0)
pts_rect = np.concatenate((pts_rect, new_pts_rect), axis=0)
pts_intensity = np.concatenate((pts_intensity, new_pts_intensity), axis=0)
return True, pts_rect, pts_intensity, extra_gt_boxes3d, extra_gt_obj_list
def aug_one_epoch_scene(self, base_id, data_save_dir, label_save_dir, split_list, log_fp=None):
for idx, sample_id in enumerate(self.image_idx_list):
sample_id = int(sample_id)
print('process gt sample (%s, id=%06d)' % (args.split, sample_id))
pts_lidar = self.get_lidar(sample_id)
calib = self.get_calib(sample_id)
pts_rect = calib.lidar_to_rect(pts_lidar[:, 0:3])
pts_img, pts_rect_depth = calib.rect_to_img(pts_rect)
img_shape = self.get_image_shape(sample_id)
pts_valid_flag = self.get_valid_flag(pts_rect, pts_img, pts_rect_depth, img_shape)
pts_rect = pts_rect[pts_valid_flag][:, 0:3]
pts_intensity = pts_lidar[pts_valid_flag][:, 3]
# all labels for checking overlapping
all_obj_list = self.filtrate_dc_objects(self.get_label(sample_id))
all_gt_boxes3d = np.zeros((all_obj_list.__len__(), 7), dtype=np.float32)
for k, obj in enumerate(all_obj_list):
all_gt_boxes3d[k, 0:3], all_gt_boxes3d[k, 3], all_gt_boxes3d[k, 4], all_gt_boxes3d[k, 5], \
all_gt_boxes3d[k, 6] = obj.pos, obj.h, obj.w, obj.l, obj.ry
# gt_boxes3d of current label
obj_list = self.filtrate_objects(self.get_label(sample_id))
if args.class_name != 'Car' and obj_list.__len__() == 0:
continue
# augment one scene
aug_flag, pts_rect, pts_intensity, extra_gt_boxes3d, extra_gt_obj_list = \
self.aug_one_scene(sample_id, pts_rect, pts_intensity, all_gt_boxes3d)
# save augment result to file
pts_info = np.concatenate((pts_rect, pts_intensity.reshape(-1, 1)), axis=1)
bin_file = os.path.join(data_save_dir, '%06d.bin' % (base_id + sample_id))
pts_info.astype(np.float32).tofile(bin_file)
# save filtered original gt_boxes3d
label_save_file = os.path.join(label_save_dir, '%06d.txt' % (base_id + sample_id))
with open(label_save_file, 'w') as f:
for obj in obj_list:
f.write(obj.to_kitti_format() + '\n')
if aug_flag:
# augment successfully
save_kitti_format(calib, extra_gt_boxes3d, extra_gt_obj_list, img_shape=img_shape, save_fp=f)
else:
extra_gt_boxes3d = np.zeros((0, 7), dtype=np.float32)
log_print('Save to file (new_obj: %s): %s' % (extra_gt_boxes3d.__len__(), label_save_file), fp=log_fp)
split_list.append('%06d' % (base_id + sample_id))
def generate_aug_scene(self, aug_times, log_fp=None):
data_save_dir = os.path.join(args.save_dir, 'rectified_data')
label_save_dir = os.path.join(args.save_dir, 'aug_label')
if not os.path.isdir(data_save_dir):
os.makedirs(data_save_dir)
if not os.path.isdir(label_save_dir):
os.makedirs(label_save_dir)
split_file = os.path.join(args.save_dir, '%s_aug.txt' % args.split)
split_list = self.image_idx_list[:]
for epoch in range(aug_times):
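# offset augmented sample ids by 10000 per round so they never collide with the original KITTI ids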
base_id = (epoch + 1) * 10000
self.aug_one_epoch_scene(base_id, data_save_dir, label_save_dir, split_list, log_fp=log_fp)
with open(split_file, 'w') as f:
for idx, sample_id in enumerate(split_list):
f.write(str(sample_id) + '\n')
log_print('Save split file to %s' % split_file, fp=log_fp)
target_dir = os.path.join(args.data_dir, 'KITTI/ImageSets/')
os.system('cp %s %s' % (split_file, target_dir))
log_print('Copy split file from %s to %s' % (split_file, target_dir), fp=log_fp)
if __name__ == '__main__':
if not os.path.isdir(args.save_dir):
os.makedirs(args.save_dir)
info_file = os.path.join(args.save_dir, 'log_info.txt')
if args.mode == 'generator':
log_fp = open(info_file, 'w')
gt_database = pickle.load(open(args.gt_database_dir, 'rb'))
log_print('Loading gt_database(%d) from %s' % (gt_database.__len__(), args.gt_database_dir), fp=log_fp)
dataset = AugSceneGenerator(root_dir=args.data_dir, gt_database=gt_database, split=args.split)
dataset.generate_aug_scene(aug_times=args.aug_times, log_fp=log_fp)
log_fp.close()
else:
pass
"""
Generate GT database
This code is based on https://github.com/sshaoshuai/PointRCNN/blob/master/tools/generate_gt_database.py
"""
import os
import numpy as np
import pickle
from data.kitti_dataset import KittiDataset
import pts_utils
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--data_dir', type=str, default='./data')
parser.add_argument('--save_dir', type=str, default='./data/gt_database')
parser.add_argument('--class_name', type=str, default='Car')
parser.add_argument('--split', type=str, default='train')
args = parser.parse_args()
class GTDatabaseGenerator(KittiDataset):
def __init__(self, root_dir, split='train', classes=args.class_name):
super(GTDatabaseGenerator, self).__init__(root_dir, split=split)
self.gt_database = None
if classes == 'Car':
self.classes = ('Background', 'Car')
elif classes == 'People':
self.classes = ('Background', 'Pedestrian', 'Cyclist')
elif classes == 'Pedestrian':
self.classes = ('Background', 'Pedestrian')
elif classes == 'Cyclist':
self.classes = ('Background', 'Cyclist')
else:
assert False, "Invalid classes: %s" % classes
def __len__(self):
raise NotImplementedError
def __getitem__(self, item):
raise NotImplementedError
def filtrate_objects(self, obj_list):
valid_obj_list = []
for obj in obj_list:
if obj.cls_type not in self.classes:
continue
if obj.level_str not in ['Easy', 'Moderate', 'Hard']:
continue
valid_obj_list.append(obj)
return valid_obj_list
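# for every ground-truth box, crop the LiDAR points that fall inside it and store
# points, intensity, box and object label; the resulting pickle is the database
# consumed by AugSceneGenerator for scene augmentation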
def generate_gt_database(self):
gt_database = []
for idx, sample_id in enumerate(self.image_idx_list):
sample_id = int(sample_id)
print('process gt sample (id=%06d)' % sample_id)
pts_lidar = self.get_lidar(sample_id)
calib = self.get_calib(sample_id)
pts_rect = calib.lidar_to_rect(pts_lidar[:, 0:3])
pts_intensity = pts_lidar[:, 3]
obj_list = self.filtrate_objects(self.get_label(sample_id))
gt_boxes3d = np.zeros((obj_list.__len__(), 7), dtype=np.float32)
for k, obj in enumerate(obj_list):
gt_boxes3d[k, 0:3], gt_boxes3d[k, 3], gt_boxes3d[k, 4], gt_boxes3d[k, 5], gt_boxes3d[k, 6] \
= obj.pos, obj.h, obj.w, obj.l, obj.ry
if gt_boxes3d.__len__() == 0:
print('No gt object')
continue
boxes_pts_mask_list = pts_utils.pts_in_boxes3d(pts_rect, gt_boxes3d)
for k in range(boxes_pts_mask_list.shape[0]):
pt_mask_flag = (boxes_pts_mask_list[k] == 1)
cur_pts = pts_rect[pt_mask_flag].astype(np.float32)
cur_pts_intensity = pts_intensity[pt_mask_flag].astype(np.float32)
sample_dict = {'sample_id': sample_id,
'cls_type': obj_list[k].cls_type,
'gt_box3d': gt_boxes3d[k],
'points': cur_pts,
'intensity': cur_pts_intensity,
'obj': obj_list[k]}
gt_database.append(sample_dict)
save_file_name = os.path.join(args.save_dir, '%s_gt_database_3level_%s.pkl' % (args.split, self.classes[-1]))
with open(save_file_name, 'wb') as f:
pickle.dump(gt_database, f)
self.gt_database = gt_database
print('Saved ground truth database file to %s' % save_file_name)
if __name__ == '__main__':
dataset = GTDatabaseGenerator(root_dir=args.data_dir, split=args.split)
if not os.path.isdir(args.save_dir):
os.makedirs(args.save_dir)
dataset.generate_gt_database()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import argparse
def parse_args():
parser = argparse.ArgumentParser(
"KITTI mAP evaluation script")
parser.add_argument(
'--result_dir',
type=str,
default='./result_dir',
help='detection result directory to evaluate')
parser.add_argument(
'--data_dir',
type=str,
default='./data',
help='KITTI dataset root directory')
parser.add_argument(
'--split',
type=str,
default='val',
help='evaluation split, default val')
parser.add_argument(
'--class_name',
type=str,
default='Car',
help='evaluation class name, default Car')
args = parser.parse_args()
return args
def kitti_eval():
if sys.version_info < (3, 6):
print("KITTI mAP evaluation can only run with python3.6+")
sys.exit(1)
args = parse_args()
label_dir = os.path.join(args.data_dir, 'KITTI/object/training', 'label_2')
split_file = os.path.join(args.data_dir, 'KITTI/ImageSets',
'{}.txt'.format(args.split))
final_output_dir = os.path.join(args.result_dir, 'final_result', 'data')
name_to_class = {'Car': 0, 'Pedestrian': 1, 'Cyclist': 2}
from tools.kitti_object_eval_python.evaluate import evaluate as kitti_evaluate
ap_result_str, ap_dict = kitti_evaluate(
label_dir, final_output_dir, label_split_file=split_file,
current_class=name_to_class[args.class_name])
print("KITTI evaluate: ", ap_result_str, ap_dict)
if __name__ == "__main__":
kitti_eval()
MIT License
Copyright (c) 2018
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# kitti-object-eval-python
**NOTE**: This is borrowed from [traveller59/kitti-object-eval-python](https://github.com/traveller59/kitti-object-eval-python)
Fast KITTI object detection evaluation in Python (finishes evaluation in under 10 seconds), supporting 2D/BEV/3D/AOS metrics as well as COCO-style AP. When using the command line interface, numba needs some time to compile the JIT functions on first run.
## Dependencies
Only Python 3.6+ is supported; `numpy`, `skimage`, `numba` and `fire` are required. If you have Anaconda, just install `cudatoolkit` in Anaconda. Otherwise, please refer to this [page](https://github.com/numba/numba#custom-python-environments) to set up LLVM and CUDA for numba.
* Install by conda:
```
conda install -c numba cudatoolkit=x.x (8.0, 9.0, 9.1, depending on your environment)
```
## Usage
* commandline interface:
```
python evaluate.py evaluate --label_path=/path/to/your_gt_label_folder --result_path=/path/to/your_result_folder --label_split_file=/path/to/val.txt --current_class=0 --coco=False
```
* python interface:
```Python
import kitti_common as kitti
from eval import get_official_eval_result, get_coco_eval_result
def _read_imageset_file(path):
with open(path, 'r') as f:
lines = f.readlines()
return [int(line) for line in lines]
det_path = "/path/to/your_result_folder"
dt_annos = kitti.get_label_annos(det_path)
gt_path = "/path/to/your_gt_label_folder"
gt_split_file = "/path/to/val.txt" # from https://xiaozhichen.github.io/files/mv3d/imagesets.tar.gz
val_image_ids = _read_imageset_file(gt_split_file)
gt_annos = kitti.get_label_annos(gt_path, val_image_ids)
print(get_official_eval_result(gt_annos, dt_annos, 0)) # 6s in my computer
print(get_coco_eval_result(gt_annos, dt_annos, 0)) # 18s in my computer
```
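* Note: `current_class` takes an integer id following `0: Car, 1: Pedestrian, 2: Cyclist` (3: Van and 4: Person_sitting are also defined in `eval.py`).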
import numpy as np
import numba
import io as sysio
from tools.kitti_object_eval_python.rotate_iou import rotate_iou_gpu_eval
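# pick one score threshold per recall step of 1 / (num_sample_pts - 1);
# the default of 41 sample points matches the official KITTI protocol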
@numba.jit
def get_thresholds(scores: np.ndarray, num_gt, num_sample_pts=41):
scores.sort()
scores = scores[::-1]
current_recall = 0
thresholds = []
for i, score in enumerate(scores):
l_recall = (i + 1) / num_gt
if i < (len(scores) - 1):
r_recall = (i + 2) / num_gt
else:
r_recall = l_recall
if (((r_recall - current_recall) < (current_recall - l_recall))
and (i < (len(scores) - 1))):
continue
# recall = l_recall
thresholds.append(score)
current_recall += 1 / (num_sample_pts - 1.0)
return thresholds
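# ignored flags: 0 -> evaluate, 1 -> ignore without penalty (similar class,
# e.g. Van for Car, or exceeds the current difficulty), -1 -> discard (other class)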
def clean_data(gt_anno, dt_anno, current_class, difficulty):
CLASS_NAMES = ['car', 'pedestrian', 'cyclist']
MIN_HEIGHT = [40, 25, 25]
MAX_OCCLUSION = [0, 1, 2]
MAX_TRUNCATION = [0.15, 0.3, 0.5]
dc_bboxes, ignored_gt, ignored_dt = [], [], []
current_cls_name = CLASS_NAMES[current_class].lower()
num_gt = len(gt_anno["name"])
num_dt = len(dt_anno["name"])
num_valid_gt = 0
for i in range(num_gt):
bbox = gt_anno["bbox"][i]
gt_name = gt_anno["name"][i].lower()
height = bbox[3] - bbox[1]
valid_class = -1
if (gt_name == current_cls_name):
valid_class = 1
elif (current_cls_name == "Pedestrian".lower()
and "Person_sitting".lower() == gt_name):
valid_class = 0
elif (current_cls_name == "Car".lower() and "Van".lower() == gt_name):
valid_class = 0
else:
valid_class = -1
ignore = False
if ((gt_anno["occluded"][i] > MAX_OCCLUSION[difficulty])
or (gt_anno["truncated"][i] > MAX_TRUNCATION[difficulty])
or (height <= MIN_HEIGHT[difficulty])):
# if gt_anno["difficulty"][i] > difficulty or gt_anno["difficulty"][i] == -1:
ignore = True
if valid_class == 1 and not ignore:
ignored_gt.append(0)
num_valid_gt += 1
elif (valid_class == 0 or (ignore and (valid_class == 1))):
ignored_gt.append(1)
else:
ignored_gt.append(-1)
# for i in range(num_gt):
if gt_anno["name"][i] == "DontCare":
dc_bboxes.append(gt_anno["bbox"][i])
for i in range(num_dt):
if (dt_anno["name"][i].lower() == current_cls_name):
valid_class = 1
else:
valid_class = -1
height = abs(dt_anno["bbox"][i, 3] - dt_anno["bbox"][i, 1])
if height < MIN_HEIGHT[difficulty]:
ignored_dt.append(1)
elif valid_class == 1:
ignored_dt.append(0)
else:
ignored_dt.append(-1)
return num_valid_gt, ignored_gt, ignored_dt, dc_bboxes
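# criterion selects the denominator: -1 -> union (standard IoU),
# 0 -> area of the first box, 1 -> area of the query box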
@numba.jit(nopython=True)
def image_box_overlap(boxes, query_boxes, criterion=-1):
N = boxes.shape[0]
K = query_boxes.shape[0]
overlaps = np.zeros((N, K), dtype=boxes.dtype)
for k in range(K):
qbox_area = ((query_boxes[k, 2] - query_boxes[k, 0]) *
(query_boxes[k, 3] - query_boxes[k, 1]))
for n in range(N):
iw = (min(boxes[n, 2], query_boxes[k, 2]) -
max(boxes[n, 0], query_boxes[k, 0]))
if iw > 0:
ih = (min(boxes[n, 3], query_boxes[k, 3]) -
max(boxes[n, 1], query_boxes[k, 1]))
if ih > 0:
if criterion == -1:
ua = (
(boxes[n, 2] - boxes[n, 0]) *
(boxes[n, 3] - boxes[n, 1]) + qbox_area - iw * ih)
elif criterion == 0:
ua = ((boxes[n, 2] - boxes[n, 0]) *
(boxes[n, 3] - boxes[n, 1]))
elif criterion == 1:
ua = qbox_area
else:
ua = 1.0
overlaps[n, k] = iw * ih / ua
return overlaps
def bev_box_overlap(boxes, qboxes, criterion=-1):
riou = rotate_iou_gpu_eval(boxes, qboxes, criterion)
return riou
@numba.jit(nopython=True, parallel=True)
def d3_box_overlap_kernel(boxes, qboxes, rinc, criterion=-1):
# Only supports overlap in CAMERA coordinates, not LiDAR.
N, K = boxes.shape[0], qboxes.shape[0]
for i in range(N):
for j in range(K):
if rinc[i, j] > 0:
# iw = (min(boxes[i, 1] + boxes[i, 4], qboxes[j, 1] +
# qboxes[j, 4]) - max(boxes[i, 1], qboxes[j, 1]))
iw = (min(boxes[i, 1], qboxes[j, 1]) - max(
boxes[i, 1] - boxes[i, 4], qboxes[j, 1] - qboxes[j, 4]))
if iw > 0:
area1 = boxes[i, 3] * boxes[i, 4] * boxes[i, 5]
area2 = qboxes[j, 3] * qboxes[j, 4] * qboxes[j, 5]
inc = iw * rinc[i, j]
if criterion == -1:
ua = (area1 + area2 - inc)
elif criterion == 0:
ua = area1
elif criterion == 1:
ua = area2
else:
ua = inc
rinc[i, j] = inc / ua
else:
rinc[i, j] = 0.0
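# 3D overlap: rotate_iou_gpu_eval with criterion=2 returns rotated BEV intersection
# areas on the x-z ground plane; the kernel scales them by the overlap along the
# camera y (height) axis and rewrites rinc in place as 3D IoU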
def d3_box_overlap(boxes, qboxes, criterion=-1):
rinc = rotate_iou_gpu_eval(boxes[:, [0, 2, 3, 5, 6]],
qboxes[:, [0, 2, 3, 5, 6]], 2)
d3_box_overlap_kernel(boxes, qboxes, rinc, criterion)
return rinc
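# greedily match detections to ground truths under the min_overlap threshold and
# count tp/fp/fn; with compute_aos, the orientation similarity
# (1 + cos(alpha_gt - alpha_dt)) / 2 is accumulated for matched pairs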
@numba.jit(nopython=True)
def compute_statistics_jit(overlaps,
gt_datas,
dt_datas,
ignored_gt,
ignored_det,
dc_bboxes,
metric,
min_overlap,
thresh=0,
compute_fp=False,
compute_aos=False):
det_size = dt_datas.shape[0]
gt_size = gt_datas.shape[0]
dt_scores = dt_datas[:, -1]
dt_alphas = dt_datas[:, 4]
gt_alphas = gt_datas[:, 4]
dt_bboxes = dt_datas[:, :4]
gt_bboxes = gt_datas[:, :4]
assigned_detection = [False] * det_size
ignored_threshold = [False] * det_size
if compute_fp:
for i in range(det_size):
if (dt_scores[i] < thresh):
ignored_threshold[i] = True
NO_DETECTION = -10000000
tp, fp, fn, similarity = 0, 0, 0, 0
# thresholds = [0.0]
# delta = [0.0]
thresholds = np.zeros((gt_size, ))
thresh_idx = 0
delta = np.zeros((gt_size, ))
delta_idx = 0
for i in range(gt_size):
if ignored_gt[i] == -1:
continue
det_idx = -1
valid_detection = NO_DETECTION
max_overlap = 0
assigned_ignored_det = False
for j in range(det_size):
if (ignored_det[j] == -1):
continue
if (assigned_detection[j]):
continue
if (ignored_threshold[j]):
continue
overlap = overlaps[j, i]
dt_score = dt_scores[j]
if (not compute_fp and (overlap > min_overlap)
and dt_score > valid_detection):
det_idx = j
valid_detection = dt_score
elif (compute_fp and (overlap > min_overlap)
and (overlap > max_overlap or assigned_ignored_det)
and ignored_det[j] == 0):
max_overlap = overlap
det_idx = j
valid_detection = 1
assigned_ignored_det = False
elif (compute_fp and (overlap > min_overlap)
and (valid_detection == NO_DETECTION)
and ignored_det[j] == 1):
det_idx = j
valid_detection = 1
assigned_ignored_det = True
if (valid_detection == NO_DETECTION) and ignored_gt[i] == 0:
fn += 1
elif ((valid_detection != NO_DETECTION)
and (ignored_gt[i] == 1 or ignored_det[det_idx] == 1)):
assigned_detection[det_idx] = True
elif valid_detection != NO_DETECTION:
tp += 1
# thresholds.append(dt_scores[det_idx])
thresholds[thresh_idx] = dt_scores[det_idx]
thresh_idx += 1
if compute_aos:
# delta.append(gt_alphas[i] - dt_alphas[det_idx])
delta[delta_idx] = gt_alphas[i] - dt_alphas[det_idx]
delta_idx += 1
assigned_detection[det_idx] = True
if compute_fp:
for i in range(det_size):
if (not (assigned_detection[i] or ignored_det[i] == -1
or ignored_det[i] == 1 or ignored_threshold[i])):
fp += 1
nstuff = 0
if metric == 0:
overlaps_dt_dc = image_box_overlap(dt_bboxes, dc_bboxes, 0)
for i in range(dc_bboxes.shape[0]):
for j in range(det_size):
if (assigned_detection[j]):
continue
if (ignored_det[j] == -1 or ignored_det[j] == 1):
continue
if (ignored_threshold[j]):
continue
if overlaps_dt_dc[j, i] > min_overlap:
assigned_detection[j] = True
nstuff += 1
fp -= nstuff
if compute_aos:
tmp = np.zeros((fp + delta_idx, ))
# tmp = [0] * fp
for i in range(delta_idx):
tmp[i + fp] = (1.0 + np.cos(delta[i])) / 2.0
# tmp.append((1.0 + np.cos(delta[i])) / 2.0)
# assert len(tmp) == fp + tp
# assert len(delta) == tp
if tp > 0 or fp > 0:
similarity = np.sum(tmp)
else:
similarity = -1
return tp, fp, fn, similarity, thresholds[:thresh_idx]
def get_split_parts(num, num_part):
same_part = num // num_part
remain_num = num % num_part
if remain_num == 0:
return [same_part] * num_part
else:
return [same_part] * num_part + [remain_num]
@numba.jit(nopython=True)
def fused_compute_statistics(overlaps,
pr,
gt_nums,
dt_nums,
dc_nums,
gt_datas,
dt_datas,
dontcares,
ignored_gts,
ignored_dets,
metric,
min_overlap,
thresholds,
compute_aos=False):
gt_num = 0
dt_num = 0
dc_num = 0
for i in range(gt_nums.shape[0]):
for t, thresh in enumerate(thresholds):
overlap = overlaps[dt_num:dt_num + dt_nums[i],
gt_num:gt_num + gt_nums[i]]
gt_data = gt_datas[gt_num:gt_num + gt_nums[i]]
dt_data = dt_datas[dt_num:dt_num + dt_nums[i]]
ignored_gt = ignored_gts[gt_num:gt_num + gt_nums[i]]
ignored_det = ignored_dets[dt_num:dt_num + dt_nums[i]]
dontcare = dontcares[dc_num:dc_num + dc_nums[i]]
tp, fp, fn, similarity, _ = compute_statistics_jit(
overlap,
gt_data,
dt_data,
ignored_gt,
ignored_det,
dontcare,
metric,
min_overlap=min_overlap,
thresh=thresh,
compute_fp=True,
compute_aos=compute_aos)
pr[t, 0] += tp
pr[t, 1] += fp
pr[t, 2] += fn
if similarity != -1:
pr[t, 3] += similarity
gt_num += gt_nums[i]
dt_num += dt_nums[i]
dc_num += dc_nums[i]
def calculate_iou_partly(gt_annos, dt_annos, metric, num_parts=50):
"""fast iou algorithm. this function can be used independently to
do result analysis. Must be used in CAMERA coordinate system.
Args:
gt_annos: dict, must from get_label_annos() in kitti_common.py
dt_annos: dict, must from get_label_annos() in kitti_common.py
metric: eval type. 0: bbox, 1: bev, 2: 3d
num_parts: int. a parameter for fast calculate algorithm
"""
assert len(gt_annos) == len(dt_annos)
total_dt_num = np.stack([len(a["name"]) for a in dt_annos], 0)
total_gt_num = np.stack([len(a["name"]) for a in gt_annos], 0)
num_examples = len(gt_annos)
split_parts = get_split_parts(num_examples, num_parts)
parted_overlaps = []
example_idx = 0
for num_part in split_parts:
gt_annos_part = gt_annos[example_idx:example_idx + num_part]
dt_annos_part = dt_annos[example_idx:example_idx + num_part]
if metric == 0:
gt_boxes = np.concatenate([a["bbox"] for a in gt_annos_part], 0)
dt_boxes = np.concatenate([a["bbox"] for a in dt_annos_part], 0)
overlap_part = image_box_overlap(gt_boxes, dt_boxes)
elif metric == 1:
loc = np.concatenate(
[a["location"][:, [0, 2]] for a in gt_annos_part], 0)
dims = np.concatenate(
[a["dimensions"][:, [0, 2]] for a in gt_annos_part], 0)
rots = np.concatenate([a["rotation_y"] for a in gt_annos_part], 0)
gt_boxes = np.concatenate(
[loc, dims, rots[..., np.newaxis]], axis=1)
loc = np.concatenate(
[a["location"][:, [0, 2]] for a in dt_annos_part], 0)
dims = np.concatenate(
[a["dimensions"][:, [0, 2]] for a in dt_annos_part], 0)
rots = np.concatenate([a["rotation_y"] for a in dt_annos_part], 0)
dt_boxes = np.concatenate(
[loc, dims, rots[..., np.newaxis]], axis=1)
overlap_part = bev_box_overlap(gt_boxes, dt_boxes).astype(
np.float64)
elif metric == 2:
loc = np.concatenate([a["location"] for a in gt_annos_part], 0)
dims = np.concatenate([a["dimensions"] for a in gt_annos_part], 0)
rots = np.concatenate([a["rotation_y"] for a in gt_annos_part], 0)
gt_boxes = np.concatenate(
[loc, dims, rots[..., np.newaxis]], axis=1)
loc = np.concatenate([a["location"] for a in dt_annos_part], 0)
dims = np.concatenate([a["dimensions"] for a in dt_annos_part], 0)
rots = np.concatenate([a["rotation_y"] for a in dt_annos_part], 0)
dt_boxes = np.concatenate(
[loc, dims, rots[..., np.newaxis]], axis=1)
overlap_part = d3_box_overlap(gt_boxes, dt_boxes).astype(
np.float64)
else:
raise ValueError("unknown metric")
parted_overlaps.append(overlap_part)
example_idx += num_part
overlaps = []
example_idx = 0
for j, num_part in enumerate(split_parts):
gt_annos_part = gt_annos[example_idx:example_idx + num_part]
dt_annos_part = dt_annos[example_idx:example_idx + num_part]
gt_num_idx, dt_num_idx = 0, 0
for i in range(num_part):
gt_box_num = total_gt_num[example_idx + i]
dt_box_num = total_dt_num[example_idx + i]
overlaps.append(
parted_overlaps[j][gt_num_idx:gt_num_idx + gt_box_num,
dt_num_idx:dt_num_idx + dt_box_num])
gt_num_idx += gt_box_num
dt_num_idx += dt_box_num
example_idx += num_part
return overlaps, parted_overlaps, total_gt_num, total_dt_num
def _prepare_data(gt_annos, dt_annos, current_class, difficulty):
gt_datas_list = []
dt_datas_list = []
total_dc_num = []
ignored_gts, ignored_dets, dontcares = [], [], []
total_num_valid_gt = 0
for i in range(len(gt_annos)):
rets = clean_data(gt_annos[i], dt_annos[i], current_class, difficulty)
num_valid_gt, ignored_gt, ignored_det, dc_bboxes = rets
ignored_gts.append(np.array(ignored_gt, dtype=np.int64))
ignored_dets.append(np.array(ignored_det, dtype=np.int64))
if len(dc_bboxes) == 0:
dc_bboxes = np.zeros((0, 4)).astype(np.float64)
else:
dc_bboxes = np.stack(dc_bboxes, 0).astype(np.float64)
total_dc_num.append(dc_bboxes.shape[0])
dontcares.append(dc_bboxes)
total_num_valid_gt += num_valid_gt
gt_datas = np.concatenate(
[gt_annos[i]["bbox"], gt_annos[i]["alpha"][..., np.newaxis]], 1)
dt_datas = np.concatenate([
dt_annos[i]["bbox"], dt_annos[i]["alpha"][..., np.newaxis],
dt_annos[i]["score"][..., np.newaxis]
], 1)
gt_datas_list.append(gt_datas)
dt_datas_list.append(dt_datas)
total_dc_num = np.stack(total_dc_num, axis=0)
return (gt_datas_list, dt_datas_list, ignored_gts, ignored_dets, dontcares,
total_dc_num, total_num_valid_gt)
def eval_class(gt_annos,
dt_annos,
current_classes,
difficultys,
metric,
min_overlaps,
compute_aos=False,
num_parts=50):
"""Kitti eval. support 2d/bev/3d/aos eval. support 0.5:0.05:0.95 coco AP.
Args:
gt_annos: dict, must from get_label_annos() in kitti_common.py
dt_annos: dict, must from get_label_annos() in kitti_common.py
current_classes: list of int, 0: car, 1: pedestrian, 2: cyclist
difficultys: list of int. eval difficulty, 0: easy, 1: normal, 2: hard
metric: eval type. 0: bbox, 1: bev, 2: 3d
min_overlaps: float, min overlap. format: [num_overlap, metric, class].
num_parts: int. a parameter for fast calculate algorithm
Returns:
dict of recall, precision and aos
"""
assert len(gt_annos) == len(dt_annos)
num_examples = len(gt_annos)
split_parts = get_split_parts(num_examples, num_parts)
rets = calculate_iou_partly(dt_annos, gt_annos, metric, num_parts)
overlaps, parted_overlaps, total_dt_num, total_gt_num = rets
N_SAMPLE_PTS = 41
num_minoverlap = len(min_overlaps)
num_class = len(current_classes)
num_difficulty = len(difficultys)
precision = np.zeros(
[num_class, num_difficulty, num_minoverlap, N_SAMPLE_PTS])
recall = np.zeros(
[num_class, num_difficulty, num_minoverlap, N_SAMPLE_PTS])
aos = np.zeros([num_class, num_difficulty, num_minoverlap, N_SAMPLE_PTS])
for m, current_class in enumerate(current_classes):
for l, difficulty in enumerate(difficultys):
rets = _prepare_data(gt_annos, dt_annos, current_class, difficulty)
(gt_datas_list, dt_datas_list, ignored_gts, ignored_dets,
dontcares, total_dc_num, total_num_valid_gt) = rets
for k, min_overlap in enumerate(min_overlaps[:, metric, m]):
thresholdss = []
for i in range(len(gt_annos)):
rets = compute_statistics_jit(
overlaps[i],
gt_datas_list[i],
dt_datas_list[i],
ignored_gts[i],
ignored_dets[i],
dontcares[i],
metric,
min_overlap=min_overlap,
thresh=0.0,
compute_fp=False)
tp, fp, fn, similarity, thresholds = rets
thresholdss += thresholds.tolist()
thresholdss = np.array(thresholdss)
thresholds = get_thresholds(thresholdss, total_num_valid_gt)
thresholds = np.array(thresholds)
pr = np.zeros([len(thresholds), 4])
idx = 0
for j, num_part in enumerate(split_parts):
gt_datas_part = np.concatenate(
gt_datas_list[idx:idx + num_part], 0)
dt_datas_part = np.concatenate(
dt_datas_list[idx:idx + num_part], 0)
dc_datas_part = np.concatenate(
dontcares[idx:idx + num_part], 0)
ignored_dets_part = np.concatenate(
ignored_dets[idx:idx + num_part], 0)
ignored_gts_part = np.concatenate(
ignored_gts[idx:idx + num_part], 0)
fused_compute_statistics(
parted_overlaps[j],
pr,
total_gt_num[idx:idx + num_part],
total_dt_num[idx:idx + num_part],
total_dc_num[idx:idx + num_part],
gt_datas_part,
dt_datas_part,
dc_datas_part,
ignored_gts_part,
ignored_dets_part,
metric,
min_overlap=min_overlap,
thresholds=thresholds,
compute_aos=compute_aos)
idx += num_part
for i in range(len(thresholds)):
recall[m, l, k, i] = pr[i, 0] / (pr[i, 0] + pr[i, 2])
precision[m, l, k, i] = pr[i, 0] / (pr[i, 0] + pr[i, 1])
if compute_aos:
aos[m, l, k, i] = pr[i, 3] / (pr[i, 0] + pr[i, 1])
for i in range(len(thresholds)):
precision[m, l, k, i] = np.max(
precision[m, l, k, i:], axis=-1)
recall[m, l, k, i] = np.max(recall[m, l, k, i:], axis=-1)
if compute_aos:
aos[m, l, k, i] = np.max(aos[m, l, k, i:], axis=-1)
ret_dict = {
"recall": recall,
"precision": precision,
"orientation": aos,
}
return ret_dict
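# 11-point interpolated AP: average the precision at every 4th of the 41 recall
# sample points (indices 0, 4, ..., 40)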
def get_mAP(prec):
sums = 0
for i in range(0, prec.shape[-1], 4):
sums = sums + prec[..., i]
return sums / 11 * 100
def print_str(value, *arg, sstream=None):
if sstream is None:
sstream = sysio.StringIO()
sstream.truncate(0)
sstream.seek(0)
print(value, *arg, file=sstream)
return sstream.getvalue()
def do_eval(gt_annos,
dt_annos,
current_classes,
min_overlaps,
compute_aos=False):
# min_overlaps: [num_minoverlap, metric, num_class]
difficultys = [0, 1, 2]
ret = eval_class(gt_annos, dt_annos, current_classes, difficultys, 0,
min_overlaps, compute_aos)
# ret: [num_class, num_diff, num_minoverlap, num_sample_points]
mAP_bbox = get_mAP(ret["precision"])
mAP_aos = None
if compute_aos:
mAP_aos = get_mAP(ret["orientation"])
ret = eval_class(gt_annos, dt_annos, current_classes, difficultys, 1,
min_overlaps)
mAP_bev = get_mAP(ret["precision"])
ret = eval_class(gt_annos, dt_annos, current_classes, difficultys, 2,
min_overlaps)
mAP_3d = get_mAP(ret["precision"])
return mAP_bbox, mAP_bev, mAP_3d, mAP_aos
def do_coco_style_eval(gt_annos, dt_annos, current_classes, overlap_ranges,
compute_aos):
# overlap_ranges: [range, metric, num_class]
min_overlaps = np.zeros([10, *overlap_ranges.shape[1:]])
for i in range(overlap_ranges.shape[1]):
for j in range(overlap_ranges.shape[2]):
min_overlaps[:, i, j] = np.linspace(*overlap_ranges[:, i, j])
mAP_bbox, mAP_bev, mAP_3d, mAP_aos = do_eval(
gt_annos, dt_annos, current_classes, min_overlaps, compute_aos)
# ret: [num_class, num_diff, num_minoverlap]
mAP_bbox = mAP_bbox.mean(-1)
mAP_bev = mAP_bev.mean(-1)
mAP_3d = mAP_3d.mean(-1)
if mAP_aos is not None:
mAP_aos = mAP_aos.mean(-1)
return mAP_bbox, mAP_bev, mAP_3d, mAP_aos
def get_official_eval_result(gt_annos, dt_annos, current_classes):
overlap_0_7 = np.array([[0.7, 0.5, 0.5, 0.7,
0.5], [0.7, 0.5, 0.5, 0.7, 0.5],
[0.7, 0.5, 0.5, 0.7, 0.5]])
overlap_0_5 = np.array([[0.7, 0.5, 0.5, 0.7,
0.5], [0.5, 0.25, 0.25, 0.5, 0.25],
[0.5, 0.25, 0.25, 0.5, 0.25]])
min_overlaps = np.stack([overlap_0_7, overlap_0_5], axis=0) # [2, 3, 5]
class_to_name = {
0: 'Car',
1: 'Pedestrian',
2: 'Cyclist',
3: 'Van',
4: 'Person_sitting',
}
name_to_class = {v: n for n, v in class_to_name.items()}
if not isinstance(current_classes, (list, tuple)):
current_classes = [current_classes]
current_classes_int = []
for curcls in current_classes:
if isinstance(curcls, str):
current_classes_int.append(name_to_class[curcls])
else:
current_classes_int.append(curcls)
current_classes = current_classes_int
min_overlaps = min_overlaps[:, :, current_classes]
result = ''
# check whether alpha is valid
compute_aos = False
for anno in dt_annos:
if anno['alpha'].shape[0] != 0:
if anno['alpha'][0] != -10:
compute_aos = True
break
mAPbbox, mAPbev, mAP3d, mAPaos = do_eval(
gt_annos, dt_annos, current_classes, min_overlaps, compute_aos)
ret_dict = {}
for j, curcls in enumerate(current_classes):
# mAP threshold array: [num_minoverlap, metric, class]
# mAP result: [num_class, num_diff, num_minoverlap]
for i in range(min_overlaps.shape[0]):
result += print_str(
(f"{class_to_name[curcls]} "
"AP@{:.2f}, {:.2f}, {:.2f}:".format(*min_overlaps[i, :, j])))
result += print_str((f"bbox AP:{mAPbbox[j, 0, i]:.4f}, "
f"{mAPbbox[j, 1, i]:.4f}, "
f"{mAPbbox[j, 2, i]:.4f}"))
result += print_str((f"bev AP:{mAPbev[j, 0, i]:.4f}, "
f"{mAPbev[j, 1, i]:.4f}, "
f"{mAPbev[j, 2, i]:.4f}"))
result += print_str((f"3d AP:{mAP3d[j, 0, i]:.4f}, "
f"{mAP3d[j, 1, i]:.4f}, "
f"{mAP3d[j, 2, i]:.4f}"))
if compute_aos:
result += print_str((f"aos AP:{mAPaos[j, 0, i]:.2f}, "
f"{mAPaos[j, 1, i]:.2f}, "
f"{mAPaos[j, 2, i]:.2f}"))
ret_dict['Car_3d_easy'] = mAP3d[0, 0, 0]
ret_dict['Car_3d_moderate'] = mAP3d[0, 1, 0]
ret_dict['Car_3d_hard'] = mAP3d[0, 2, 0]
ret_dict['Car_bev_easy'] = mAPbev[0, 0, 0]
ret_dict['Car_bev_moderate'] = mAPbev[0, 1, 0]
ret_dict['Car_bev_hard'] = mAPbev[0, 2, 0]
ret_dict['Car_image_easy'] = mAPbbox[0, 0, 0]
ret_dict['Car_image_moderate'] = mAPbbox[0, 1, 0]
ret_dict['Car_image_hard'] = mAPbbox[0, 2, 0]
return result, ret_dict
def get_coco_eval_result(gt_annos, dt_annos, current_classes):
class_to_name = {
0: 'Car',
1: 'Pedestrian',
2: 'Cyclist',
3: 'Van',
4: 'Person_sitting',
}
class_to_range = {
0: [0.5, 0.95, 10],
1: [0.25, 0.7, 10],
2: [0.25, 0.7, 10],
3: [0.5, 0.95, 10],
4: [0.25, 0.7, 10],
}
name_to_class = {v: n for n, v in class_to_name.items()}
if not isinstance(current_classes, (list, tuple)):
current_classes = [current_classes]
current_classes_int = []
for curcls in current_classes:
if isinstance(curcls, str):
current_classes_int.append(name_to_class[curcls])
else:
current_classes_int.append(curcls)
current_classes = current_classes_int
overlap_ranges = np.zeros([3, 3, len(current_classes)])
for i, curcls in enumerate(current_classes):
overlap_ranges[:, :, i] = np.array(
class_to_range[curcls])[:, np.newaxis]
result = ''
# check whether alpha is valid
compute_aos = False
for anno in dt_annos:
if anno['alpha'].shape[0] != 0:
if anno['alpha'][0] != -10:
compute_aos = True
break
mAPbbox, mAPbev, mAP3d, mAPaos = do_coco_style_eval(
gt_annos, dt_annos, current_classes, overlap_ranges, compute_aos)
for j, curcls in enumerate(current_classes):
# mAP threshold array: [num_minoverlap, metric, class]
# mAP result: [num_class, num_diff, num_minoverlap]
o_range = np.array(class_to_range[curcls])[[0, 2, 1]]
o_range[1] = (o_range[2] - o_range[0]) / (o_range[1] - 1)
result += print_str((f"{class_to_name[curcls]} "
"coco AP@{:.2f}:{:.2f}:{:.2f}:".format(*o_range)))
result += print_str((f"bbox AP:{mAPbbox[j, 0]:.2f}, "
f"{mAPbbox[j, 1]:.2f}, "
f"{mAPbbox[j, 2]:.2f}"))
result += print_str((f"bev AP:{mAPbev[j, 0]:.2f}, "
f"{mAPbev[j, 1]:.2f}, "
f"{mAPbev[j, 2]:.2f}"))
result += print_str((f"3d AP:{mAP3d[j, 0]:.2f}, "
f"{mAP3d[j, 1]:.2f}, "
f"{mAP3d[j, 2]:.2f}"))
if compute_aos:
result += print_str((f"aos AP:{mAPaos[j, 0]:.2f}, "
f"{mAPaos[j, 1]:.2f}, "
f"{mAPaos[j, 2]:.2f}"))
return result
import time
import fire
import tools.kitti_object_eval_python.kitti_common as kitti
from tools.kitti_object_eval_python.eval import get_official_eval_result, get_coco_eval_result
def _read_imageset_file(path):
with open(path, 'r') as f:
lines = f.readlines()
return [int(line) for line in lines]
def evaluate(label_path,
result_path,
label_split_file,
current_class=0,
coco=False,
score_thresh=-1):
dt_annos = kitti.get_label_annos(result_path)
if score_thresh > 0:
dt_annos = kitti.filter_annos_low_score(dt_annos, score_thresh)
val_image_ids = _read_imageset_file(label_split_file)
gt_annos = kitti.get_label_annos(label_path, val_image_ids)
if coco:
return get_coco_eval_result(gt_annos, dt_annos, current_class)
else:
return get_official_eval_result(gt_annos, dt_annos, current_class)
if __name__ == '__main__':
fire.Fire()
import concurrent.futures as futures
import os
import pathlib
import re
from collections import OrderedDict
import numpy as np
from skimage import io
def get_image_index_str(img_idx):
return "{:06d}".format(img_idx)
def get_kitti_info_path(idx,
prefix,
info_type='image_2',
file_tail='.png',
training=True,
relative_path=True):
img_idx_str = get_image_index_str(idx)
img_idx_str += file_tail
prefix = pathlib.Path(prefix)
if training:
file_path = pathlib.Path('training') / info_type / img_idx_str
else:
file_path = pathlib.Path('testing') / info_type / img_idx_str
if not (prefix / file_path).exists():
raise ValueError("file not exist: {}".format(file_path))
if relative_path:
return str(file_path)
else:
return str(prefix / file_path)
def get_image_path(idx, prefix, training=True, relative_path=True):
return get_kitti_info_path(idx, prefix, 'image_2', '.png', training,
relative_path)
def get_label_path(idx, prefix, training=True, relative_path=True):
return get_kitti_info_path(idx, prefix, 'label_2', '.txt', training,
relative_path)
def get_velodyne_path(idx, prefix, training=True, relative_path=True):
return get_kitti_info_path(idx, prefix, 'velodyne', '.bin', training,
relative_path)
def get_calib_path(idx, prefix, training=True, relative_path=True):
return get_kitti_info_path(idx, prefix, 'calib', '.txt', training,
relative_path)
def _extend_matrix(mat):
mat = np.concatenate([mat, np.array([[0., 0., 0., 1.]])], axis=0)
return mat
def get_kitti_image_info(path,
training=True,
label_info=True,
velodyne=False,
calib=False,
image_ids=7481,
extend_matrix=True,
num_worker=8,
relative_path=True,
with_imageshape=True):
# image_infos = []
root_path = pathlib.Path(path)
if not isinstance(image_ids, list):
image_ids = list(range(image_ids))
def map_func(idx):
image_info = {'image_idx': idx}
annotations = None
if velodyne:
image_info['velodyne_path'] = get_velodyne_path(
idx, path, training, relative_path)
image_info['img_path'] = get_image_path(idx, path, training,
relative_path)
if with_imageshape:
img_path = image_info['img_path']
if relative_path:
img_path = str(root_path / img_path)
image_info['img_shape'] = np.array(
io.imread(img_path).shape[:2], dtype=np.int32)
if label_info:
label_path = get_label_path(idx, path, training, relative_path)
if relative_path:
label_path = str(root_path / label_path)
annotations = get_label_anno(label_path)
if calib:
calib_path = get_calib_path(
idx, path, training, relative_path=False)
with open(calib_path, 'r') as f:
lines = f.readlines()
P0 = np.array(
[float(info) for info in lines[0].split(' ')[1:13]]).reshape(
[3, 4])
P1 = np.array(
[float(info) for info in lines[1].split(' ')[1:13]]).reshape(
[3, 4])
P2 = np.array(
[float(info) for info in lines[2].split(' ')[1:13]]).reshape(
[3, 4])
P3 = np.array(
[float(info) for info in lines[3].split(' ')[1:13]]).reshape(
[3, 4])
if extend_matrix:
P0 = _extend_matrix(P0)
P1 = _extend_matrix(P1)
P2 = _extend_matrix(P2)
P3 = _extend_matrix(P3)
image_info['calib/P0'] = P0
image_info['calib/P1'] = P1
image_info['calib/P2'] = P2
image_info['calib/P3'] = P3
R0_rect = np.array([
float(info) for info in lines[4].split(' ')[1:10]
]).reshape([3, 3])
if extend_matrix:
rect_4x4 = np.zeros([4, 4], dtype=R0_rect.dtype)
rect_4x4[3, 3] = 1.
rect_4x4[:3, :3] = R0_rect
else:
rect_4x4 = R0_rect
image_info['calib/R0_rect'] = rect_4x4
Tr_velo_to_cam = np.array([
float(info) for info in lines[5].split(' ')[1:13]
]).reshape([3, 4])
Tr_imu_to_velo = np.array([
float(info) for info in lines[6].split(' ')[1:13]
]).reshape([3, 4])
if extend_matrix:
Tr_velo_to_cam = _extend_matrix(Tr_velo_to_cam)
Tr_imu_to_velo = _extend_matrix(Tr_imu_to_velo)
image_info['calib/Tr_velo_to_cam'] = Tr_velo_to_cam
image_info['calib/Tr_imu_to_velo'] = Tr_imu_to_velo
if annotations is not None:
image_info['annos'] = annotations
add_difficulty_to_annos(image_info)
return image_info
with futures.ThreadPoolExecutor(num_worker) as executor:
image_infos = executor.map(map_func, image_ids)
return list(image_infos)
def filter_kitti_anno(image_anno,
used_classes,
used_difficulty=None,
dontcare_iou=None):
if not isinstance(used_classes, (list, tuple)):
used_classes = [used_classes]
img_filtered_annotations = {}
relevant_annotation_indices = [
i for i, x in enumerate(image_anno['name']) if x in used_classes
]
for key in image_anno.keys():
img_filtered_annotations[key] = (
image_anno[key][relevant_annotation_indices])
if used_difficulty is not None:
relevant_annotation_indices = [
i for i, x in enumerate(img_filtered_annotations['difficulty'])
if x in used_difficulty
]
for key in image_anno.keys():
img_filtered_annotations[key] = (
img_filtered_annotations[key][relevant_annotation_indices])
if 'DontCare' in used_classes and dontcare_iou is not None:
dont_care_indices = [
i for i, x in enumerate(img_filtered_annotations['name'])
if x == 'DontCare'
]
# bounding box format [y_min, x_min, y_max, x_max]
all_boxes = img_filtered_annotations['bbox']
ious = iou(all_boxes, all_boxes[dont_care_indices])
# Remove all bounding boxes that overlap with a dontcare region.
if ious.size > 0:
boxes_to_remove = np.amax(ious, axis=1) > dontcare_iou
for key in image_anno.keys():
img_filtered_annotations[key] = (img_filtered_annotations[key][
np.logical_not(boxes_to_remove)])
return img_filtered_annotations
def filter_annos_low_score(image_annos, thresh):
new_image_annos = []
for anno in image_annos:
img_filtered_annotations = {}
relevant_annotation_indices = [
i for i, s in enumerate(anno['score']) if s >= thresh
]
for key in anno.keys():
img_filtered_annotations[key] = (
anno[key][relevant_annotation_indices])
new_image_annos.append(img_filtered_annotations)
return new_image_annos
def kitti_result_line(result_dict, precision=4):
prec_float = "{" + ":.{}f".format(precision) + "}"
res_line = []
all_field_default = OrderedDict([
('name', None),
('truncated', -1),
('occluded', -1),
('alpha', -10),
('bbox', None),
('dimensions', [-1, -1, -1]),
('location', [-1000, -1000, -1000]),
('rotation_y', -10),
('score', None),
])
res_dict = [(key, None) for key, val in all_field_default.items()]
res_dict = OrderedDict(res_dict)
for key, val in result_dict.items():
if all_field_default[key] is None and val is None:
raise ValueError("you must specify a value for {}".format(key))
res_dict[key] = val
for key, val in res_dict.items():
if key == 'name':
res_line.append(val)
elif key in ['truncated', 'alpha', 'rotation_y', 'score']:
if val is None:
res_line.append(str(all_field_default[key]))
else:
res_line.append(prec_float.format(val))
elif key == 'occluded':
if val is None:
res_line.append(str(all_field_default[key]))
else:
res_line.append('{}'.format(val))
elif key in ['bbox', 'dimensions', 'location']:
if val is None:
res_line += [str(v) for v in all_field_default[key]]
else:
res_line += [prec_float.format(v) for v in val]
else:
raise ValueError("unknown key. supported key:{}".format(
res_dict.keys()))
return ' '.join(res_line)
def add_difficulty_to_annos(info):
min_height = [40, 25,
25] # minimum height for evaluated groundtruth/detections
max_occlusion = [
0, 1, 2
] # maximum occlusion level of the groundtruth used for evaluation
max_trunc = [
0.15, 0.3, 0.5
] # maximum truncation level of the groundtruth used for evaluation
annos = info['annos']
dims = annos['dimensions'] # lhw format
bbox = annos['bbox']
height = bbox[:, 3] - bbox[:, 1]
occlusion = annos['occluded']
truncation = annos['truncated']
diff = []
easy_mask = np.ones((len(dims), ), dtype=np.bool)
moderate_mask = np.ones((len(dims), ), dtype=np.bool)
hard_mask = np.ones((len(dims), ), dtype=np.bool)
i = 0
for h, o, t in zip(height, occlusion, truncation):
if o > max_occlusion[0] or h <= min_height[0] or t > max_trunc[0]:
easy_mask[i] = False
if o > max_occlusion[1] or h <= min_height[1] or t > max_trunc[1]:
moderate_mask[i] = False
if o > max_occlusion[2] or h <= min_height[2] or t > max_trunc[2]:
hard_mask[i] = False
i += 1
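# assign each object the easiest difficulty level it satisfies;
# objects failing even the hard criteria get -1 (ignored)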
is_easy = easy_mask
is_moderate = np.logical_xor(easy_mask, moderate_mask)
is_hard = np.logical_xor(hard_mask, moderate_mask)
for i in range(len(dims)):
if is_easy[i]:
diff.append(0)
elif is_moderate[i]:
diff.append(1)
elif is_hard[i]:
diff.append(2)
else:
diff.append(-1)
annos["difficulty"] = np.array(diff, np.int32)
return diff
def get_label_anno(label_path):
annotations = {}
annotations.update({
'name': [],
'truncated': [],
'occluded': [],
'alpha': [],
'bbox': [],
'dimensions': [],
'location': [],
'rotation_y': []
})
with open(label_path, 'r') as f:
lines = f.readlines()
# if len(lines) == 0 or len(lines[0]) < 15:
# content = []
# else:
content = [line.strip().split(' ') for line in lines]
annotations['name'] = np.array([x[0] for x in content])
annotations['truncated'] = np.array([float(x[1]) for x in content])
annotations['occluded'] = np.array([int(x[2]) for x in content])
annotations['alpha'] = np.array([float(x[3]) for x in content])
annotations['bbox'] = np.array(
[[float(info) for info in x[4:8]] for x in content]).reshape(-1, 4)
# dimensions will convert hwl format to standard lhw(camera) format.
annotations['dimensions'] = np.array(
[[float(info) for info in x[8:11]] for x in content]).reshape(
-1, 3)[:, [2, 0, 1]]
annotations['location'] = np.array(
[[float(info) for info in x[11:14]] for x in content]).reshape(-1, 3)
annotations['rotation_y'] = np.array(
[float(x[14]) for x in content]).reshape(-1)
if len(content) != 0 and len(content[0]) == 16: # have score
annotations['score'] = np.array([float(x[15]) for x in content])
else:
annotations['score'] = np.zeros([len(annotations['bbox'])])
return annotations
def get_label_annos(label_folder, image_ids=None):
if image_ids is None:
filepaths = pathlib.Path(label_folder).glob('*.txt')
prog = re.compile(r'^\d{6}.txt$')
filepaths = filter(lambda f: prog.match(f.name), filepaths)
image_ids = [int(p.stem) for p in filepaths]
image_ids = sorted(image_ids)
if not isinstance(image_ids, list):
image_ids = list(range(image_ids))
annos = []
label_folder = pathlib.Path(label_folder)
for idx in image_ids:
image_idx = get_image_index_str(idx)
label_filename = label_folder / (image_idx + '.txt')
annos.append(get_label_anno(label_filename))
return annos
def area(boxes, add1=False):
"""Computes area of boxes.
Args:
boxes: Numpy array with shape [N, 4] holding N boxes
Returns:
a numpy array with shape [N*1] representing box areas
"""
if add1:
return (boxes[:, 2] - boxes[:, 0] + 1.0) * (
boxes[:, 3] - boxes[:, 1] + 1.0)
else:
return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
def intersection(boxes1, boxes2, add1=False):
"""Compute pairwise intersection areas between boxes.
Args:
boxes1: a numpy array with shape [N, 4] holding N boxes
boxes2: a numpy array with shape [M, 4] holding M boxes
Returns:
a numpy array with shape [N, M] representing pairwise intersection areas
"""
[y_min1, x_min1, y_max1, x_max1] = np.split(boxes1, 4, axis=1)
[y_min2, x_min2, y_max2, x_max2] = np.split(boxes2, 4, axis=1)
all_pairs_min_ymax = np.minimum(y_max1, np.transpose(y_max2))
all_pairs_max_ymin = np.maximum(y_min1, np.transpose(y_min2))
if add1:
all_pairs_min_ymax += 1.0
intersect_heights = np.maximum(
np.zeros(all_pairs_max_ymin.shape),
all_pairs_min_ymax - all_pairs_max_ymin)
all_pairs_min_xmax = np.minimum(x_max1, np.transpose(x_max2))
all_pairs_max_xmin = np.maximum(x_min1, np.transpose(x_min2))
if add1:
all_pairs_min_xmax += 1.0
intersect_widths = np.maximum(
np.zeros(all_pairs_max_xmin.shape),
all_pairs_min_xmax - all_pairs_max_xmin)
return intersect_heights * intersect_widths
def iou(boxes1, boxes2, add1=False):
"""Computes pairwise intersection-over-union between box collections.
Args:
boxes1: a numpy array with shape [N, 4] holding N boxes.
boxes2: a numpy array with shape [M, 4] holding M boxes.
Returns:
a numpy array with shape [N, M] representing pairwise iou scores.
"""
intersect = intersection(boxes1, boxes2, add1)
area1 = area(boxes1, add1)
area2 = area(boxes2, add1)
union = np.expand_dims(
area1, axis=1) + np.expand_dims(
area2, axis=0) - intersect
return intersect / union
#####################
# Based on https://github.com/hongzhenwang/RRPN-revise
# Licensed under The MIT License
# Author: yanyan, scrin@foxmail.com
#####################
import math
import numba
import numpy as np
from numba import cuda
@numba.jit(nopython=True)
def div_up(m, n):
return m // n + (m % n > 0)
@cuda.jit('(float32[:], float32[:], float32[:])', device=True, inline=True)
def trangle_area(a, b, c):
return ((a[0] - c[0]) * (b[1] - c[1]) - (a[1] - c[1]) *
(b[0] - c[0])) / 2.0
@cuda.jit('(float32[:], int32)', device=True, inline=True)
def area(int_pts, num_of_inter):
area_val = 0.0
for i in range(num_of_inter - 2):
area_val += abs(
trangle_area(int_pts[:2], int_pts[2 * i + 2:2 * i + 4],
int_pts[2 * i + 4:2 * i + 6]))
return area_val
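# order the intersection vertices angularly around their centroid; the key is a
# monotonic pseudo-angle built from the normalized x component (avoiding atan2),
# followed by an insertion sort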
@cuda.jit('(float32[:], int32)', device=True, inline=True)
def sort_vertex_in_convex_polygon(int_pts, num_of_inter):
if num_of_inter > 0:
center = cuda.local.array((2, ), dtype=numba.float32)
center[:] = 0.0
for i in range(num_of_inter):
center[0] += int_pts[2 * i]
center[1] += int_pts[2 * i + 1]
center[0] /= num_of_inter
center[1] /= num_of_inter
v = cuda.local.array((2, ), dtype=numba.float32)
vs = cuda.local.array((16, ), dtype=numba.float32)
for i in range(num_of_inter):
v[0] = int_pts[2 * i] - center[0]
v[1] = int_pts[2 * i + 1] - center[1]
d = math.sqrt(v[0] * v[0] + v[1] * v[1])
v[0] = v[0] / d
v[1] = v[1] / d
if v[1] < 0:
v[0] = -2 - v[0]
vs[i] = v[0]
j = 0
temp = 0
for i in range(1, num_of_inter):
if vs[i - 1] > vs[i]:
temp = vs[i]
tx = int_pts[2 * i]
ty = int_pts[2 * i + 1]
j = i
while j > 0 and vs[j - 1] > temp:
vs[j] = vs[j - 1]
int_pts[j * 2] = int_pts[j * 2 - 2]
int_pts[j * 2 + 1] = int_pts[j * 2 - 1]
j -= 1
vs[j] = temp
int_pts[j * 2] = tx
int_pts[j * 2 + 1] = ty
@cuda.jit(
'(float32[:], float32[:], int32, int32, float32[:])',
device=True,
inline=True)
def line_segment_intersection(pts1, pts2, i, j, temp_pts):
A = cuda.local.array((2, ), dtype=numba.float32)
B = cuda.local.array((2, ), dtype=numba.float32)
C = cuda.local.array((2, ), dtype=numba.float32)
D = cuda.local.array((2, ), dtype=numba.float32)
A[0] = pts1[2 * i]
A[1] = pts1[2 * i + 1]
B[0] = pts1[2 * ((i + 1) % 4)]
B[1] = pts1[2 * ((i + 1) % 4) + 1]
C[0] = pts2[2 * j]
C[1] = pts2[2 * j + 1]
D[0] = pts2[2 * ((j + 1) % 4)]
D[1] = pts2[2 * ((j + 1) % 4) + 1]
BA0 = B[0] - A[0]
BA1 = B[1] - A[1]
DA0 = D[0] - A[0]
CA0 = C[0] - A[0]
DA1 = D[1] - A[1]
CA1 = C[1] - A[1]
acd = DA1 * CA0 > CA1 * DA0
bcd = (D[1] - B[1]) * (C[0] - B[0]) > (C[1] - B[1]) * (D[0] - B[0])
if acd != bcd:
abc = CA1 * BA0 > BA1 * CA0
abd = DA1 * BA0 > BA1 * DA0
if abc != abd:
DC0 = D[0] - C[0]
DC1 = D[1] - C[1]
ABBA = A[0] * B[1] - B[0] * A[1]
CDDC = C[0] * D[1] - D[0] * C[1]
DH = BA1 * DC0 - BA0 * DC1
Dx = ABBA * DC0 - BA0 * CDDC
Dy = ABBA * DC1 - BA1 * CDDC
temp_pts[0] = Dx / DH
temp_pts[1] = Dy / DH
return True
return False
@cuda.jit(
'(float32[:], float32[:], int32, int32, float32[:])',
device=True,
inline=True)
def line_segment_intersection_v1(pts1, pts2, i, j, temp_pts):
a = cuda.local.array((2, ), dtype=numba.float32)
b = cuda.local.array((2, ), dtype=numba.float32)
c = cuda.local.array((2, ), dtype=numba.float32)
d = cuda.local.array((2, ), dtype=numba.float32)
a[0] = pts1[2 * i]
a[1] = pts1[2 * i + 1]
b[0] = pts1[2 * ((i + 1) % 4)]
b[1] = pts1[2 * ((i + 1) % 4) + 1]
c[0] = pts2[2 * j]
c[1] = pts2[2 * j + 1]
d[0] = pts2[2 * ((j + 1) % 4)]
d[1] = pts2[2 * ((j + 1) % 4) + 1]
area_abc = trangle_area(a, b, c)
area_abd = trangle_area(a, b, d)
if area_abc * area_abd >= 0:
return False
area_cda = trangle_area(c, d, a)
area_cdb = area_cda + area_abc - area_abd
if area_cda * area_cdb >= 0:
return False
t = area_cda / (area_abd - area_abc)
dx = t * (b[0] - a[0])
dy = t * (b[1] - a[1])
temp_pts[0] = a[0] + dx
temp_pts[1] = a[1] + dy
return True
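# a point P lies inside rectangle ABCD iff its projections onto two adjacent
# edges satisfy 0 <= AP.AB <= AB.AB and 0 <= AP.AD <= AD.AD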
@cuda.jit('(float32, float32, float32[:])', device=True, inline=True)
def point_in_quadrilateral(pt_x, pt_y, corners):
ab0 = corners[2] - corners[0]
ab1 = corners[3] - corners[1]
ad0 = corners[6] - corners[0]
ad1 = corners[7] - corners[1]
ap0 = pt_x - corners[0]
ap1 = pt_y - corners[1]
abab = ab0 * ab0 + ab1 * ab1
abap = ab0 * ap0 + ab1 * ap1
adad = ad0 * ad0 + ad1 * ad1
adap = ad0 * ap0 + ad1 * ap1
return abab >= abap and abap >= 0 and adad >= adap and adap >= 0
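# vertices of the intersection polygon: corners of either box lying inside the
# other, plus all pairwise edge-edge intersection points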
@cuda.jit('(float32[:], float32[:], float32[:])', device=True, inline=True)
def quadrilateral_intersection(pts1, pts2, int_pts):
num_of_inter = 0
for i in range(4):
if point_in_quadrilateral(pts1[2 * i], pts1[2 * i + 1], pts2):
int_pts[num_of_inter * 2] = pts1[2 * i]
int_pts[num_of_inter * 2 + 1] = pts1[2 * i + 1]
num_of_inter += 1
if point_in_quadrilateral(pts2[2 * i], pts2[2 * i + 1], pts1):
int_pts[num_of_inter * 2] = pts2[2 * i]
int_pts[num_of_inter * 2 + 1] = pts2[2 * i + 1]
num_of_inter += 1
temp_pts = cuda.local.array((2, ), dtype=numba.float32)
for i in range(4):
for j in range(4):
has_pts = line_segment_intersection(pts1, pts2, i, j, temp_pts)
if has_pts:
int_pts[num_of_inter * 2] = temp_pts[0]
int_pts[num_of_inter * 2 + 1] = temp_pts[1]
num_of_inter += 1
return num_of_inter
@cuda.jit('(float32[:], float32[:])', device=True, inline=True)
def rbbox_to_corners(corners, rbbox):
# generate clockwise corners and rotate them clockwise
angle = rbbox[4]
a_cos = math.cos(angle)
a_sin = math.sin(angle)
center_x = rbbox[0]
center_y = rbbox[1]
x_d = rbbox[2]
y_d = rbbox[3]
corners_x = cuda.local.array((4, ), dtype=numba.float32)
corners_y = cuda.local.array((4, ), dtype=numba.float32)
corners_x[0] = -x_d / 2
corners_x[1] = -x_d / 2
corners_x[2] = x_d / 2
corners_x[3] = x_d / 2
corners_y[0] = -y_d / 2
corners_y[1] = y_d / 2
corners_y[2] = y_d / 2
corners_y[3] = -y_d / 2
for i in range(4):
corners[2 * i] = a_cos * corners_x[i] + a_sin * corners_y[i] + center_x
corners[2 * i + 1] = -a_sin * corners_x[i] + a_cos * corners_y[i] + center_y
@cuda.jit('(float32[:], float32[:])', device=True, inline=True)
def inter(rbbox1, rbbox2):
corners1 = cuda.local.array((8, ), dtype=numba.float32)
corners2 = cuda.local.array((8, ), dtype=numba.float32)
intersection_corners = cuda.local.array((16, ), dtype=numba.float32)
rbbox_to_corners(corners1, rbbox1)
rbbox_to_corners(corners2, rbbox2)
num_intersection = quadrilateral_intersection(corners1, corners2,
intersection_corners)
sort_vertex_in_convex_polygon(intersection_corners, num_intersection)
# print(intersection_corners.reshape([-1, 2])[:num_intersection])
return area(intersection_corners, num_intersection)
@cuda.jit('(float32[:], float32[:], int32)', device=True, inline=True)
def devRotateIoUEval(rbox1, rbox2, criterion=-1):
area1 = rbox1[2] * rbox1[3]
area2 = rbox2[2] * rbox2[3]
area_inter = inter(rbox1, rbox2)
if criterion == -1:
return area_inter / (area1 + area2 - area_inter)
elif criterion == 0:
return area_inter / area1
elif criterion == 1:
return area_inter / area2
else:
return area_inter
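# each CUDA block processes a 64x64 tile of (box, query) pairs: 64 boxes and
# 64 query boxes are staged in shared memory (5 floats each), then each thread
# computes one box's IoU against every query in the tile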
@cuda.jit('(int64, int64, float32[:], float32[:], float32[:], int32)', fastmath=False)
def rotate_iou_kernel_eval(N, K, dev_boxes, dev_query_boxes, dev_iou, criterion=-1):
threadsPerBlock = 8 * 8
row_start = cuda.blockIdx.x
col_start = cuda.blockIdx.y
tx = cuda.threadIdx.x
row_size = min(N - row_start * threadsPerBlock, threadsPerBlock)
col_size = min(K - col_start * threadsPerBlock, threadsPerBlock)
block_boxes = cuda.shared.array(shape=(64 * 5, ), dtype=numba.float32)
block_qboxes = cuda.shared.array(shape=(64 * 5, ), dtype=numba.float32)
dev_query_box_idx = threadsPerBlock * col_start + tx
dev_box_idx = threadsPerBlock * row_start + tx
if (tx < col_size):
block_qboxes[tx * 5 + 0] = dev_query_boxes[dev_query_box_idx * 5 + 0]
block_qboxes[tx * 5 + 1] = dev_query_boxes[dev_query_box_idx * 5 + 1]
block_qboxes[tx * 5 + 2] = dev_query_boxes[dev_query_box_idx * 5 + 2]
block_qboxes[tx * 5 + 3] = dev_query_boxes[dev_query_box_idx * 5 + 3]
block_qboxes[tx * 5 + 4] = dev_query_boxes[dev_query_box_idx * 5 + 4]
if (tx < row_size):
block_boxes[tx * 5 + 0] = dev_boxes[dev_box_idx * 5 + 0]
block_boxes[tx * 5 + 1] = dev_boxes[dev_box_idx * 5 + 1]
block_boxes[tx * 5 + 2] = dev_boxes[dev_box_idx * 5 + 2]
block_boxes[tx * 5 + 3] = dev_boxes[dev_box_idx * 5 + 3]
block_boxes[tx * 5 + 4] = dev_boxes[dev_box_idx * 5 + 4]
cuda.syncthreads()
if tx < row_size:
for i in range(col_size):
offset = row_start * threadsPerBlock * K + col_start * threadsPerBlock + tx * K + i
dev_iou[offset] = devRotateIoUEval(block_qboxes[i * 5:i * 5 + 5],
block_boxes[tx * 5:tx * 5 + 5], criterion)
def rotate_iou_gpu_eval(boxes, query_boxes, criterion=-1, device_id=0):
"""rotated box iou running in gpu. 500x faster than cpu version
(take 5ms in one example with numba.cuda code).
convert from [this project](
https://github.com/hongzhenwang/RRPN-revise/tree/master/lib/rotation).
Args:
boxes (float tensor: [N, 5]): rbboxes. format: centers, dims,
angles(clockwise when positive)
query_boxes (float tensor: [K, 5]): [description]
device_id (int, optional): Defaults to 0. [description]
Returns:
[type]: [description]
"""
box_dtype = boxes.dtype
boxes = boxes.astype(np.float32)
query_boxes = query_boxes.astype(np.float32)
N = boxes.shape[0]
K = query_boxes.shape[0]
iou = np.zeros((N, K), dtype=np.float32)
if N == 0 or K == 0:
return iou
threadsPerBlock = 8 * 8
cuda.select_device(device_id)
blockspergrid = (div_up(N, threadsPerBlock), div_up(K, threadsPerBlock))
stream = cuda.stream()
with stream.auto_synchronize():
boxes_dev = cuda.to_device(boxes.reshape([-1]), stream)
query_boxes_dev = cuda.to_device(query_boxes.reshape([-1]), stream)
iou_dev = cuda.to_device(iou.reshape([-1]), stream)
rotate_iou_kernel_eval[blockspergrid, threadsPerBlock, stream](
N, K, boxes_dev, query_boxes_dev, iou_dev, criterion)
iou_dev.copy_to_host(iou.reshape([-1]), stream=stream)
return iou.astype(boxes.dtype)
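# Illustrative usage (a minimal sketch; the box values below are made up):
#   boxes = np.array([[0., 0., 4., 2., 0.]], dtype=np.float32)  # (cx, cy, w, h, angle)
#   query = np.array([[1., 0., 4., 2., 0.]], dtype=np.float32)
#   iou = rotate_iou_gpu_eval(boxes, query)  # -> (1, 1) matrix, here 3*2/(8+8-6) = 0.6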
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import sys
import time
import shutil
import argparse
import logging
import numpy as np
import paddle
import paddle.fluid as fluid
from models.point_rcnn import PointRCNN
from data.kitti_rcnn_reader import KittiRCNNReader
from utils.run_utils import *
from utils.config import cfg, load_config, set_config_from_list
from utils.optimizer import optimize
logging.root.handlers = []
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT, stream=sys.stdout)
logger = logging.getLogger(__name__)
def parse_args():
parser = argparse.ArgumentParser("PointRCNN semantic segmentation train script")
parser.add_argument(
'--cfg',
type=str,
default='cfgs/default.yml',
help='specify the config for training')
parser.add_argument(
'--train_mode',
type=str,
default='rpn',
required=True,
help='specify the training mode')
parser.add_argument(
'--batch_size',
type=int,
default=16,
required=True,
help='training batch size, default 16')
parser.add_argument(
'--epoch',
type=int,
default=200,
required=True,
help='epoch number. default 200.')
parser.add_argument(
'--save_dir',
type=str,
default='checkpoints',
        help='directory name to save training snapshots')
parser.add_argument(
'--resume',
type=str,
default=None,
help='path to resume training based on previous checkpoints. '
'None for not resuming any checkpoints.')
parser.add_argument(
'--resume_epoch',
type=int,
default=0,
help='resume epoch id')
parser.add_argument(
'--data_dir',
type=str,
default='./data',
help='KITTI dataset root directory')
parser.add_argument(
'--gt_database',
type=str,
default='data/gt_database/train_gt_database_3level_Car.pkl',
help='generated gt database for augmentation')
parser.add_argument(
'--rcnn_training_roi_dir',
type=str,
default=None,
help='specify the saved rois for rcnn training when using rcnn_offline mode')
parser.add_argument(
'--rcnn_training_feature_dir',
type=str,
default=None,
help='specify the saved features for rcnn training when using rcnn_offline mode')
parser.add_argument(
'--log_interval',
type=int,
default=1,
help='mini-batch interval to log.')
parser.add_argument(
'--set',
dest='set_cfgs',
default=None,
nargs=argparse.REMAINDER,
help='set extra config keys if needed.')
args = parser.parse_args()
return args
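# A typical invocation of this script (RPN stage, default config) might be:
#   python train.py --cfg cfgs/default.yml --train_mode rpn --batch_size 16 --epoch 200
# and extra config keys can be overridden at the end of the command, e.g.
#   --set TRAIN.LR 0.001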
def train():
args = parse_args()
print_arguments(args)
# check whether the installed paddle is compiled with GPU
# PointRCNN model can only run on GPU
check_gpu(True)
load_config(args.cfg)
if args.set_cfgs is not None:
set_config_from_list(args.set_cfgs)
if args.train_mode == 'rpn':
cfg.RPN.ENABLED = True
cfg.RCNN.ENABLED = False
elif args.train_mode == 'rcnn':
cfg.RCNN.ENABLED = True
cfg.RPN.ENABLED = cfg.RPN.FIXED = True
elif args.train_mode == 'rcnn_offline':
cfg.RCNN.ENABLED = True
cfg.RPN.ENABLED = False
else:
raise NotImplementedError("unknown train mode: {}".format(args.train_mode))
checkpoints_dir = os.path.join(args.save_dir, args.train_mode)
if not os.path.isdir(checkpoints_dir):
os.makedirs(checkpoints_dir)
kitti_rcnn_reader = KittiRCNNReader(data_dir=args.data_dir,
npoints=cfg.RPN.NUM_POINTS,
split=cfg.TRAIN.SPLIT,
mode='TRAIN',
classes=cfg.CLASSES,
rcnn_training_roi_dir=args.rcnn_training_roi_dir,
rcnn_training_feature_dir=args.rcnn_training_feature_dir,
gt_database_dir=args.gt_database)
num_samples = len(kitti_rcnn_reader)
steps_per_epoch = int(num_samples / args.batch_size)
logger.info("Total {} samples, {} batch per epoch.".format(num_samples, steps_per_epoch))
boundaries = [i * steps_per_epoch for i in cfg.TRAIN.DECAY_STEP_LIST]
values = [cfg.TRAIN.LR * (cfg.TRAIN.LR_DECAY ** i) for i in range(len(boundaries) + 1)]
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
# build model
startup = fluid.Program()
train_prog = fluid.Program()
with fluid.program_guard(train_prog, startup):
with fluid.unique_name.guard():
train_model = PointRCNN(cfg, args.batch_size, True, 'TRAIN')
train_model.build()
train_pyreader = train_model.get_pyreader()
train_feeds = train_model.get_feeds()
train_outputs = train_model.get_outputs()
train_loss = train_outputs['loss']
lr = optimize(train_loss,
learning_rate=cfg.TRAIN.LR,
warmup_factor=1. / cfg.TRAIN.DIV_FACTOR,
decay_factor=1e-5,
total_step=steps_per_epoch * args.epoch,
warmup_pct=cfg.TRAIN.PCT_START,
train_program=train_prog,
startup_prog=startup,
weight_decay=cfg.TRAIN.WEIGHT_DECAY,
clip_norm=cfg.TRAIN.GRAD_NORM_CLIP)
train_keys, train_values = parse_outputs(train_outputs, 'loss')
exe.run(startup)
if args.resume:
assert os.path.exists(args.resume), \
"Given resume weight dir {} not exist.".format(args.resume)
def if_exist(var):
logger.debug("{}: {}".format(var.name, os.path.exists(os.path.join(args.resume, var.name))))
return os.path.exists(os.path.join(args.resume, var.name))
fluid.io.load_vars(
exe, args.resume, predicate=if_exist, main_program=train_prog)
build_strategy = fluid.BuildStrategy()
build_strategy.memory_optimize = False
build_strategy.enable_inplace = False
build_strategy.fuse_all_optimizer_ops = False
train_compile_prog = fluid.compiler.CompiledProgram(
train_prog).with_data_parallel(loss_name=train_loss.name,
build_strategy=build_strategy)
def save_model(exe, prog, path):
if os.path.isdir(path):
shutil.rmtree(path)
logger.info("Save model to {}".format(path))
fluid.io.save_persistables(exe, path, prog)
# get reader
train_reader = kitti_rcnn_reader.get_multiprocess_reader(args.batch_size, train_feeds, drop_last=True)
train_pyreader.decorate_sample_list_generator(train_reader, place)
train_stat = Stat()
for epoch_id in range(args.resume_epoch, args.epoch):
try:
train_pyreader.start()
train_iter = 0
train_periods = []
while True:
cur_time = time.time()
train_outs = exe.run(train_compile_prog, fetch_list=train_values + [lr.name])
period = time.time() - cur_time
train_periods.append(period)
train_stat.update(train_keys, train_outs[:-1])
if train_iter % args.log_interval == 0:
log_str = ""
for name, values in zip(train_keys + ['learning_rate'], train_outs):
log_str += "{}: {:.6f}, ".format(name, np.mean(values))
logger.info("[TRAIN] Epoch {}, batch {}: {}time: {:.2f}".format(epoch_id, train_iter, log_str, period))
train_iter += 1
except fluid.core.EOFException:
logger.info("[TRAIN] Epoch {} finished, {}average time: {:.2f}".format(epoch_id, train_stat.get_mean_log(), np.mean(train_periods[2:])))
save_model(exe, train_prog, os.path.join(checkpoints_dir, str(epoch_id)))
train_stat.reset()
train_periods = []
finally:
train_pyreader.reset()
if __name__ == "__main__":
train()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
"""
Contains proposal functions
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import paddle.fluid as fluid
from utils.config import cfg
__all__ = ["boxes3d_to_bev", "box_overlap_rotate", "boxes3d_to_bev", "box_iou", "box_nms"]
def boxes3d_to_bev(boxes3d):
"""
Args:
boxes3d: [N, 7], (x, y, z, h, w, l, ry)
Return:
boxes_bev: [N, 5], (x1, y1, x2, y2, ry)
"""
boxes_bev = np.zeros((boxes3d.shape[0], 5), dtype='float32')
cu, cv = boxes3d[:, 0], boxes3d[:, 2]
half_l, half_w = boxes3d[:, 5] / 2, boxes3d[:, 4] / 2
boxes_bev[:, 0], boxes_bev[:, 1] = cu - half_l, cv - half_w
boxes_bev[:, 2], boxes_bev[:, 3] = cu + half_l, cv + half_w
boxes_bev[:, 4] = boxes3d[:, 6]
return boxes_bev
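# Worked example with illustrative numbers: a car box at x=1, z=10 with
# h=1.5, w=1.6, l=3.9 and ry=0.3, i.e. [1., 1.5, 10., 1.5, 1.6, 3.9, 0.3],
# maps to the BEV rectangle
#   [1 - 3.9/2, 10 - 1.6/2, 1 + 3.9/2, 10 + 1.6/2, 0.3] = [-0.95, 9.2, 2.95, 10.8, 0.3]
# so the (x, z) plane becomes the BEV plane and ry is carried through unchanged.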
def rotate_around_center(center, angle_cos, angle_sin, corners):
new_x = (corners[:, 0] - center[0]) * angle_cos + \
(corners[:, 1] - center[1]) * angle_sin + center[0]
new_y = -(corners[:, 0] - center[0]) * angle_sin + \
(corners[:, 1] - center[1]) * angle_cos + center[1]
return np.concatenate([new_x[:, np.newaxis], new_y[:, np.newaxis]], axis=-1)
def check_rect_cross(p1, p2, q1, q2):
return min(p1[0], p2[0]) <= max(q1[0], q2[0]) and \
min(q1[0], q2[0]) <= max(p1[0], p2[0]) and \
min(p1[1], p2[1]) <= max(q1[1], q2[1]) and \
min(q1[1], q2[1]) <= max(p1[1], p2[1])
def cross(p1, p2, p0):
    return (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])
def cross_area(a, b):
return a[0] * b[1] - a[1] * b[0]
def intersection(p1, p0, q1, q0):
if not check_rect_cross(p1, p0, q1, q0):
return None
s1 = cross(q0, p1, p0)
s2 = cross(p1, q1, p0)
s3 = cross(p0, q1, q0)
s4 = cross(q1, p1, q0)
if not (s1 * s2 > 0 and s3 * s4 > 0):
return None
s5 = cross(q1, p1, p0)
if np.abs(s5 - s1) > 1e-8:
return np.array([(s5 * q0[0] - s1 * q1[0]) / (s5 - s1),
(s5 * q0[1] - s1 * q1[1]) / (s5 - s1)], dtype='float32')
else:
        a0 = p0[1] - p1[1]
        b0 = p1[0] - p0[0]
        c0 = p0[0] * p1[1] - p1[0] * p0[1]
        a1 = q0[1] - q1[1]
        b1 = q1[0] - q0[0]
        c1 = q0[0] * q1[1] - q1[0] * q0[1]
D = a0 * b1 - a1 * b0
return np.array([(b0 * c1 - b1 * c0) / D, (a1 * c0 - a0 * c1) / D], dtype='float32')
def check_in_box2d(box, p):
center_x = (box[0] + box[2]) / 2.
center_y = (box[1] + box[3]) / 2.
angle_cos = np.cos(-box[4])
angle_sin = np.sin(-box[4])
rot_x = (p[0] - center_x) * angle_cos + (p[1] - center_y) * angle_sin + center_x
rot_y = -(p[0] - center_x) * angle_sin + (p[1] - center_y) * angle_cos + center_y
return rot_x > box[0] - 1e-5 and rot_x < box[2] + 1e-5 and \
rot_y > box[1] - 1e-5 and rot_y < box[3] + 1e-5
def point_cmp(a, b, center):
return np.arctan2(a[1] - center[1], a[0] - center[0]) > \
np.arctan2(b[1] - center[1], b[0] - center[0])
def box_overlap_rotate(cur_box, boxes):
"""
Calculate box overlap with rotate, box: [x1, y1, x2, y2, angle]
"""
areas = np.zeros((len(boxes), ), dtype='float32')
cur_center = [(cur_box[0] + cur_box[2]) / 2., (cur_box[1] + cur_box[3]) / 2.]
cur_corners = np.array([
[cur_box[0], cur_box[1]], # (x1, y1)
[cur_box[2], cur_box[1]], # (x2, y1)
[cur_box[2], cur_box[3]], # (x2, y2)
[cur_box[0], cur_box[3]], # (x1, y2)
[cur_box[0], cur_box[1]], # (x1, y1)
], dtype='float32')
cur_angle_cos = np.cos(cur_box[4])
cur_angle_sin = np.sin(cur_box[4])
cur_corners = rotate_around_center(cur_center, cur_angle_cos, cur_angle_sin, cur_corners)
for i, box in enumerate(boxes):
box_center = [(box[0] + box[2]) / 2., (box[1] + box[3]) / 2.]
box_corners = np.array([
[box[0], box[1]],
[box[2], box[1]],
[box[2], box[3]],
[box[0], box[3]],
[box[0], box[1]],
], dtype='float32')
box_angle_cos = np.cos(box[4])
box_angle_sin = np.sin(box[4])
box_corners = rotate_around_center(box_center, box_angle_cos, box_angle_sin, box_corners)
cross_points = np.zeros((16, 2), dtype='float32')
cnt = 0
# get intersection of lines
for j in range(4):
for k in range(4):
inters = intersection(cur_corners[j + 1], cur_corners[j],
box_corners[k + 1], box_corners[k])
if inters is not None:
cross_points[cnt, :] = inters
cnt += 1
# check corners
for l in range(4):
if check_in_box2d(cur_box, box_corners[l]):
cross_points[cnt, :] = box_corners[l]
cnt += 1
if check_in_box2d(box, cur_corners[l]):
cross_points[cnt, :] = cur_corners[l]
cnt += 1
if cnt > 0:
poly_center = np.sum(cross_points[:cnt, :], axis=0) / cnt
else:
poly_center = np.zeros((2,))
# sort the points of polygon
for j in range(cnt - 1):
for k in range(cnt - j - 1):
if point_cmp(cross_points[k], cross_points[k + 1], poly_center):
cross_points[k], cross_points[k + 1] = \
cross_points[k + 1].copy(), cross_points[k].copy()
# get the overlap areas
area = 0.
for j in range(cnt - 1):
area += cross_area(cross_points[j] - cross_points[0],
cross_points[j + 1] - cross_points[0])
areas[i] = np.abs(area) / 2.
return areas
def box_iou(cur_box, boxes, box_type='normal'):
cur_S = (cur_box[2] - cur_box[0]) * (cur_box[3] - cur_box[1])
boxes_S = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
if box_type == 'normal':
inter_x1 = np.maximum(cur_box[0], boxes[:, 0])
inter_y1 = np.maximum(cur_box[1], boxes[:, 1])
inter_x2 = np.minimum(cur_box[2], boxes[:, 2])
inter_y2 = np.minimum(cur_box[3], boxes[:, 3])
inter_w = np.maximum(inter_x2 - inter_x1, 0.)
inter_h = np.maximum(inter_y2 - inter_y1, 0.)
inter_area = inter_w * inter_h
elif box_type == 'rotate':
inter_area = box_overlap_rotate(cur_box, boxes)
else:
raise NotImplementedError
return inter_area / np.maximum(cur_S + boxes_S - inter_area, 1e-8)
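# Note for the 'rotate' branch above: cur_S and boxes_S are computed from the
# axis-aligned (x1, y1, x2, y2) extents, which is still exact because these
# boxes store an unrotated rectangle plus an angle, and rotation preserves area.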
def box_nms(boxes, scores, proposals, thresh, topk, nms_type='normal'):
assert nms_type in ['normal', 'rotate'], \
"unknown nms type {}".format(nms_type)
order = np.argsort(-scores)
boxes = boxes[order]
scores = scores[order]
proposals = proposals[order]
nmsed_scores = []
nmsed_proposals = []
cnt = 0
while boxes.shape[0]:
nmsed_scores.append(scores[0])
nmsed_proposals.append(proposals[0])
        cnt += 1
if cnt >= topk or boxes.shape[0] == 1:
break
iou = box_iou(boxes[0], boxes[1:], nms_type)
boxes = boxes[1:][iou < thresh]
scores = scores[1:][iou < thresh]
proposals = proposals[1:][iou < thresh]
return nmsed_scores, nmsed_proposals
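# Minimal usage sketch (the threshold/topk values below just mirror the
# TRAIN.RPN_* defaults in utils/config.py and are not mandated here):
#   nmsed_scores, nmsed_proposals = box_nms(
#       bev_boxes, scores, proposals, thresh=0.85, topk=2048, nms_type='normal')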
def box_nms_eval(boxes, scores, proposals, thresh, nms_type='rotate'):
assert nms_type in ['normal', 'rotate'], \
"unknown nms type {}".format(nms_type)
order = np.argsort(-scores)
boxes = boxes[order]
scores = scores[order]
proposals = proposals[order]
nmsed_scores = []
nmsed_proposals = []
while boxes.shape[0]:
nmsed_scores.append(scores[0])
nmsed_proposals.append(proposals[0])
iou = box_iou(boxes[0], boxes[1:], nms_type)
inds = iou < thresh
boxes = boxes[1:][inds]
scores = scores[1:][inds]
proposals = proposals[1:][inds]
nmsed_scores = np.asarray(nmsed_scores)
nmsed_proposals = np.asarray(nmsed_proposals)
return nmsed_scores, nmsed_proposals
def boxes_iou3d(boxes1, boxes2):
boxes1_bev = boxes3d_to_bev(boxes1)
boxes2_bev = boxes3d_to_bev(boxes2)
# bev overlap
overlaps_bev = np.zeros((boxes1_bev.shape[0], boxes2_bev.shape[0]))
for i in range(boxes1_bev.shape[0]):
overlaps_bev[i, :] = box_overlap_rotate(boxes1_bev[i], boxes2_bev)
# height overlap
boxes1_height_min = (boxes1[:, 1] - boxes1[:, 3]).reshape(-1, 1)
boxes1_height_max = boxes1[:, 1].reshape(-1, 1)
boxes2_height_min = (boxes2[:, 1] - boxes2[:, 3]).reshape(1, -1)
boxes2_height_max = boxes2[:, 1].reshape(1, -1)
max_of_min = np.maximum(boxes1_height_min, boxes2_height_min)
min_of_max = np.minimum(boxes1_height_max, boxes2_height_max)
overlaps_h = np.maximum(min_of_max - max_of_min, 0.)
# 3d iou
overlaps_3d = overlaps_bev * overlaps_h
vol_a = (boxes1[:, 3] * boxes1[:, 4] * boxes1[:, 5]).reshape(-1, 1)
vol_b = (boxes2[:, 3] * boxes2[:, 4] * boxes2[:, 5]).reshape(1, -1)
iou3d = overlaps_3d / np.maximum(vol_a + vol_b - overlaps_3d, 1e-7)
return iou3d
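# The height overlap above relies on the KITTI rect-camera convention: y points
# down and boxes3d[:, 1] is the bottom face of the box, so its top is y - h.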
"""
This code is borrowed from https://github.com/sshaoshuai/PointRCNN/blob/master/lib/utils/kitti_utils.py
"""
import numpy as np
import os
def get_calib_from_file(calib_file):
with open(calib_file) as f:
lines = f.readlines()
obj = lines[2].strip().split(' ')[1:]
P2 = np.array(obj, dtype=np.float32)
obj = lines[3].strip().split(' ')[1:]
P3 = np.array(obj, dtype=np.float32)
obj = lines[4].strip().split(' ')[1:]
R0 = np.array(obj, dtype=np.float32)
obj = lines[5].strip().split(' ')[1:]
Tr_velo_to_cam = np.array(obj, dtype=np.float32)
return {'P2': P2.reshape(3, 4),
'P3': P3.reshape(3, 4),
'R0': R0.reshape(3, 3),
'Tr_velo2cam': Tr_velo_to_cam.reshape(3, 4)}
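# Note on the hard-coded line indices above: a KITTI calib file lists, one
# matrix per line ("name: v0 v1 ..."), P0, P1, P2, P3, R0_rect and
# Tr_velo_to_cam, so lines[2]..lines[5] select P2, P3, R0 and Tr_velo_to_cam.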
class Calibration(object):
def __init__(self, calib_file):
if isinstance(calib_file, str):
calib = get_calib_from_file(calib_file)
else:
calib = calib_file
self.P2 = calib['P2'] # 3 x 4
self.R0 = calib['R0'] # 3 x 3
self.V2C = calib['Tr_velo2cam'] # 3 x 4
# Camera intrinsics and extrinsics
self.cu = self.P2[0, 2]
self.cv = self.P2[1, 2]
self.fu = self.P2[0, 0]
self.fv = self.P2[1, 1]
self.tx = self.P2[0, 3] / (-self.fu)
self.ty = self.P2[1, 3] / (-self.fv)
def cart_to_hom(self, pts):
"""
:param pts: (N, 3 or 2)
:return pts_hom: (N, 4 or 3)
"""
pts_hom = np.hstack((pts, np.ones((pts.shape[0], 1), dtype=np.float32)))
return pts_hom
def lidar_to_rect(self, pts_lidar):
"""
:param pts_lidar: (N, 3)
:return pts_rect: (N, 3)
"""
pts_lidar_hom = self.cart_to_hom(pts_lidar)
pts_rect = np.dot(pts_lidar_hom, np.dot(self.V2C.T, self.R0.T))
# pts_rect = reduce(np.dot, (pts_lidar_hom, self.V2C.T, self.R0.T))
return pts_rect
def rect_to_img(self, pts_rect):
"""
:param pts_rect: (N, 3)
:return pts_img: (N, 2)
"""
pts_rect_hom = self.cart_to_hom(pts_rect)
pts_2d_hom = np.dot(pts_rect_hom, self.P2.T)
pts_img = (pts_2d_hom[:, 0:2].T / pts_rect_hom[:, 2]).T # (N, 2)
pts_rect_depth = pts_2d_hom[:, 2] - self.P2.T[3, 2] # depth in rect camera coord
return pts_img, pts_rect_depth
def lidar_to_img(self, pts_lidar):
"""
:param pts_lidar: (N, 3)
:return pts_img: (N, 2)
"""
pts_rect = self.lidar_to_rect(pts_lidar)
pts_img, pts_depth = self.rect_to_img(pts_rect)
return pts_img, pts_depth
def img_to_rect(self, u, v, depth_rect):
"""
:param u: (N)
:param v: (N)
:param depth_rect: (N)
:return:
"""
x = ((u - self.cu) * depth_rect) / self.fu + self.tx
y = ((v - self.cv) * depth_rect) / self.fv + self.ty
pts_rect = np.concatenate((x.reshape(-1, 1), y.reshape(-1, 1), depth_rect.reshape(-1, 1)), axis=1)
return pts_rect
def depthmap_to_rect(self, depth_map):
"""
:param depth_map: (H, W), depth_map
:return:
"""
x_range = np.arange(0, depth_map.shape[1])
y_range = np.arange(0, depth_map.shape[0])
x_idxs, y_idxs = np.meshgrid(x_range, y_range)
x_idxs, y_idxs = x_idxs.reshape(-1), y_idxs.reshape(-1)
depth = depth_map[y_idxs, x_idxs]
pts_rect = self.img_to_rect(x_idxs, y_idxs, depth)
return pts_rect, x_idxs, y_idxs
def corners3d_to_img_boxes(self, corners3d):
"""
:param corners3d: (N, 8, 3) corners in rect coordinate
        :return: boxes: (N, 4) [x1, y1, x2, y2] in rgb coordinate
        :return: boxes_corner: (N, 8) [xi, yi] in rgb coordinate
"""
sample_num = corners3d.shape[0]
corners3d_hom = np.concatenate((corners3d, np.ones((sample_num, 8, 1))), axis=2) # (N, 8, 4)
img_pts = np.matmul(corners3d_hom, self.P2.T) # (N, 8, 3)
x, y = img_pts[:, :, 0] / img_pts[:, :, 2], img_pts[:, :, 1] / img_pts[:, :, 2]
x1, y1 = np.min(x, axis=1), np.min(y, axis=1)
x2, y2 = np.max(x, axis=1), np.max(y, axis=1)
boxes = np.concatenate((x1.reshape(-1, 1), y1.reshape(-1, 1), x2.reshape(-1, 1), y2.reshape(-1, 1)), axis=1)
boxes_corner = np.concatenate((x.reshape(-1, 8, 1), y.reshape(-1, 8, 1)), axis=2)
return boxes, boxes_corner
def camera_dis_to_rect(self, u, v, d):
"""
        Can only process valid u, v, d: u and v must lie within the image bounds (reprojection error about 0.02)
:param u: (N)
:param v: (N)
:param d: (N), the distance between camera and 3d points, d^2 = x^2 + y^2 + z^2
:return:
"""
assert self.fu == self.fv, '%.8f != %.8f' % (self.fu, self.fv)
fd = np.sqrt((u - self.cu)**2 + (v - self.cv)**2 + self.fu**2)
x = ((u - self.cu) * d) / fd + self.tx
y = ((v - self.cv) * d) / fd + self.ty
z = np.sqrt(d**2 - x**2 - y**2)
pts_rect = np.concatenate((x.reshape(-1, 1), y.reshape(-1, 1), z.reshape(-1, 1)), axis=1)
return pts_rect
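# Illustrative round trip (the calib path is hypothetical; pts_lidar is an
# (N, 3) array of lidar points):
#   calib = Calibration('data/KITTI/object/training/calib/000000.txt')
#   pts_img, pts_depth = calib.lidar_to_img(pts_lidar)             # (N, 2), (N,)
#   pts_rect = calib.img_to_rect(pts_img[:, 0], pts_img[:, 1], pts_depth)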
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
"""
This code is based on https://github.com/sshaoshuai/PointRCNN/blob/master/lib/config.py
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import yaml
import numpy as np
from ast import literal_eval
__all__ = ["load_config", "cfg"]
class AttrDict(dict):
def __init__(self, *args, **kwargs):
for arg in args:
for k, v in arg.items():
if isinstance(v, dict):
arg[k] = AttrDict(v)
else:
arg[k] = v
super(AttrDict, self).__init__(*args, **kwargs)
def __getattr__(self, name):
if name in self.__dict__:
return self.__dict__[name]
elif name in self:
return self[name]
else:
raise AttributeError(name)
def __setattr__(self, name, value):
if name in self.__dict__:
self.__dict__[name] = value
else:
self[name] = value
__C = AttrDict()
cfg = __C
# 0. basic config
__C.TAG = 'default'
__C.CLASSES = 'Car'
__C.INCLUDE_SIMILAR_TYPE = False
# config of augmentation
__C.AUG_DATA = True
__C.AUG_METHOD_LIST = ['rotation', 'scaling', 'flip']
__C.AUG_METHOD_PROB = [0.5, 0.5, 0.5]
__C.AUG_ROT_RANGE = 18
__C.GT_AUG_ENABLED = False
__C.GT_EXTRA_NUM = 15
__C.GT_AUG_RAND_NUM = False
__C.GT_AUG_APPLY_PROB = 0.75
__C.GT_AUG_HARD_RATIO = 0.6
__C.PC_REDUCE_BY_RANGE = True
__C.PC_AREA_SCOPE = np.array([[-40, 40],
[-1, 3],
[0, 70.4]]) # x, y, z scope in rect camera coords
__C.CLS_MEAN_SIZE = np.array([[1.52, 1.63, 3.88]], dtype=np.float32)
# 1. config of rpn network
__C.RPN = AttrDict()
__C.RPN.ENABLED = True
__C.RPN.FIXED = False
__C.RPN.USE_INTENSITY = True
# config of bin-based loss
__C.RPN.LOC_XZ_FINE = False
__C.RPN.LOC_SCOPE = 3.0
__C.RPN.LOC_BIN_SIZE = 0.5
__C.RPN.NUM_HEAD_BIN = 12
# config of network structure
__C.RPN.BACKBONE = 'pointnet2_msg'
__C.RPN.USE_BN = True
__C.RPN.NUM_POINTS = 16384
__C.RPN.SA_CONFIG = AttrDict()
__C.RPN.SA_CONFIG.NPOINTS = [4096, 1024, 256, 64]
__C.RPN.SA_CONFIG.RADIUS = [[0.1, 0.5], [0.5, 1.0], [1.0, 2.0], [2.0, 4.0]]
__C.RPN.SA_CONFIG.NSAMPLE = [[16, 32], [16, 32], [16, 32], [16, 32]]
__C.RPN.SA_CONFIG.MLPS = [[[16, 16, 32], [32, 32, 64]],
[[64, 64, 128], [64, 96, 128]],
[[128, 196, 256], [128, 196, 256]],
[[256, 256, 512], [256, 384, 512]]]
__C.RPN.FP_MLPS = [[128, 128], [256, 256], [512, 512], [512, 512]]
__C.RPN.CLS_FC = [128]
__C.RPN.REG_FC = [128]
__C.RPN.DP_RATIO = 0.5
# config of training
__C.RPN.LOSS_CLS = 'DiceLoss'
__C.RPN.FG_WEIGHT = 15
__C.RPN.FOCAL_ALPHA = [0.25, 0.75]
__C.RPN.FOCAL_GAMMA = 2.0
__C.RPN.REG_LOSS_WEIGHT = [1.0, 1.0, 1.0, 1.0]
__C.RPN.LOSS_WEIGHT = [1.0, 1.0]
__C.RPN.NMS_TYPE = 'normal' # normal, rotate
# config of testing
__C.RPN.SCORE_THRESH = 0.3
# 2. config of rcnn network
__C.RCNN = AttrDict()
__C.RCNN.ENABLED = False
# config of input
__C.RCNN.USE_RPN_FEATURES = True
__C.RCNN.USE_MASK = True
__C.RCNN.MASK_TYPE = 'seg'
__C.RCNN.USE_INTENSITY = False
__C.RCNN.USE_DEPTH = True
__C.RCNN.USE_SEG_SCORE = False
__C.RCNN.ROI_SAMPLE_JIT = False
__C.RCNN.ROI_FG_AUG_TIMES = 10
__C.RCNN.REG_AUG_METHOD = 'multiple' # multiple, single, normal
__C.RCNN.POOL_EXTRA_WIDTH = 1.0
# config of bin-based loss
__C.RCNN.LOC_SCOPE = 1.5
__C.RCNN.LOC_BIN_SIZE = 0.5
__C.RCNN.NUM_HEAD_BIN = 9
__C.RCNN.LOC_Y_BY_BIN = False
__C.RCNN.LOC_Y_SCOPE = 0.5
__C.RCNN.LOC_Y_BIN_SIZE = 0.25
__C.RCNN.SIZE_RES_ON_ROI = False
# config of network structure
__C.RCNN.USE_BN = False
__C.RCNN.DP_RATIO = 0.0
__C.RCNN.BACKBONE = 'pointnet' # pointnet, pointsift
__C.RCNN.XYZ_UP_LAYER = [128, 128]
__C.RCNN.NUM_POINTS = 512
__C.RCNN.SA_CONFIG = AttrDict()
__C.RCNN.SA_CONFIG.NPOINTS = [128, 32, -1]
__C.RCNN.SA_CONFIG.RADIUS = [0.2, 0.4, 100]
__C.RCNN.SA_CONFIG.NSAMPLE = [64, 64, 64]
__C.RCNN.SA_CONFIG.MLPS = [[128, 128, 128],
[128, 128, 256],
[256, 256, 512]]
__C.RCNN.CLS_FC = [256, 256]
__C.RCNN.REG_FC = [256, 256]
# config of training
__C.RCNN.LOSS_CLS = 'BinaryCrossEntropy'
__C.RCNN.FOCAL_ALPHA = [0.25, 0.75]
__C.RCNN.FOCAL_GAMMA = 2.0
__C.RCNN.CLS_WEIGHT = np.array([1.0, 1.0, 1.0], dtype=np.float32)
__C.RCNN.CLS_FG_THRESH = 0.6
__C.RCNN.CLS_BG_THRESH = 0.45
__C.RCNN.CLS_BG_THRESH_LO = 0.05
__C.RCNN.REG_FG_THRESH = 0.55
__C.RCNN.FG_RATIO = 0.5
__C.RCNN.ROI_PER_IMAGE = 64
__C.RCNN.HARD_BG_RATIO = 0.6
# config of testing
__C.RCNN.SCORE_THRESH = 0.3
__C.RCNN.NMS_THRESH = 0.1
# general training config
__C.TRAIN = AttrDict()
__C.TRAIN.SPLIT = 'train'
__C.TRAIN.VAL_SPLIT = 'smallval'
__C.TRAIN.LR = 0.002
__C.TRAIN.LR_CLIP = 0.00001
__C.TRAIN.LR_DECAY = 0.5
__C.TRAIN.DECAY_STEP_LIST = [50, 100, 150, 200, 250, 300]
__C.TRAIN.LR_WARMUP = False
__C.TRAIN.WARMUP_MIN = 0.0002
__C.TRAIN.WARMUP_EPOCH = 5
__C.TRAIN.BN_MOMENTUM = 0.9
__C.TRAIN.BN_DECAY = 0.5
__C.TRAIN.BNM_CLIP = 0.01
__C.TRAIN.BN_DECAY_STEP_LIST = [50, 100, 150, 200, 250, 300]
__C.TRAIN.OPTIMIZER = 'adam'
__C.TRAIN.WEIGHT_DECAY = 0.0 # "L2 regularization coeff [default: 0.0]"
__C.TRAIN.MOMENTUM = 0.9
__C.TRAIN.MOMS = [0.95, 0.85]
__C.TRAIN.DIV_FACTOR = 10.0
__C.TRAIN.PCT_START = 0.4
__C.TRAIN.GRAD_NORM_CLIP = 1.0
__C.TRAIN.RPN_PRE_NMS_TOP_N = 12000
__C.TRAIN.RPN_POST_NMS_TOP_N = 2048
__C.TRAIN.RPN_NMS_THRESH = 0.85
__C.TRAIN.RPN_DISTANCE_BASED_PROPOSE = True
__C.TEST = AttrDict()
__C.TEST.SPLIT = 'val'
__C.TEST.RPN_PRE_NMS_TOP_N = 9000
__C.TEST.RPN_POST_NMS_TOP_N = 300
__C.TEST.RPN_NMS_THRESH = 0.7
__C.TEST.RPN_DISTANCE_BASED_PROPOSE = True
def load_config(fname):
"""
Load config from yaml file and merge into global cfg
"""
with open(fname) as f:
yml_cfg = AttrDict(yaml.load(f.read(), Loader=yaml.Loader))
_merge_cfg_a_to_b(yml_cfg, __C)
def set_config_from_list(cfg_list):
assert len(cfg_list) % 2 == 0, "cfgs list length invalid"
for k, v in zip(cfg_list[0::2], cfg_list[1::2]):
key_list = k.split('.')
d = __C
for subkey in key_list[:-1]:
assert subkey in d
d = d[subkey]
subkey = key_list[-1]
assert subkey in d
try:
value = literal_eval(v)
        except (ValueError, SyntaxError):
            # v is a plain string literal
value = v
assert type(value) == type(d[subkey]), \
'type {} does not match original type {}'.format(type(value), type(d[subkey]))
d[subkey] = value
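# For example, `--set RPN.SCORE_THRESH 0.2 TRAIN.RPN_NMS_THRESH 0.8` walks each
# dotted key down the nested AttrDict and replaces the leaf value; literal_eval
# is what lets numbers and lists keep their proper types from the command line.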
def _merge_cfg_a_to_b(a, b):
assert isinstance(a, AttrDict), \
"unknown type {}".format(type(a))
for k, v in a.items():
assert k in b, "unknown key {}".format(k)
if type(v) is not type(b[k]):
if isinstance(b[k], np.ndarray):
b[k] = np.array(v, dtype=b[k].dtype)
else:
raise TypeError("Config type mismatch")
if isinstance(v, AttrDict):
_merge_cfg_a_to_b(v, b[k])
else:
b[k] = v
if __name__ == "__main__":
load_config("./cfgs/default.yml")
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import cython
from math import pi, cos, sin
import numpy as np
cimport numpy as np
cdef class Point:
cdef float x, y
def __cinit__(self, x, y):
self.x = x
self.y = y
def __add__(self, v):
if not isinstance(v, Point):
return NotImplemented
return Point(self.x + v.x, self.y + v.y)
def __sub__(self, v):
if not isinstance(v, Point):
return NotImplemented
return Point(self.x - v.x, self.y - v.y)
def cross(self, v):
if not isinstance(v, Point):
return NotImplemented
return self.x*v.y - self.y*v.x
cdef class Line:
cdef float a, b, c
# ax + by + c = 0
def __cinit__(self, v1, v2):
self.a = v2.y - v1.y
self.b = v1.x - v2.x
self.c = v2.cross(v1)
def __call__(self, p):
return self.a*p.x + self.b*p.y + self.c
def intersection(self, other):
if not isinstance(other, Line):
return NotImplemented
w = self.a*other.b - self.b*other.a
return Point(
(self.b*other.c - self.c*other.b)/w,
(self.c*other.a - self.a*other.c)/w
)
@cython.boundscheck(False)
@cython.wraparound(False)
def rectangle_vertices_(x1, y1, x2, y2, r):
cx = (x1 + x2) / 2
cy = (y1 + y2) / 2
angle = r
cr = cos(angle)
sr = sin(angle)
# rotate around center
return (
Point(
x=(x1-cx)*cr+(y1-cy)*sr+cx,
y=-(x1-cx)*sr+(y1-cy)*cr+cy
),
Point(
x=(x2-cx)*cr+(y1-cy)*sr+cx,
y=-(x2-cx)*sr+(y1-cy)*cr+cy
),
Point(
x=(x2-cx)*cr+(y2-cy)*sr+cx,
y=-(x2-cx)*sr+(y2-cy)*cr+cy
),
Point(
x=(x1-cx)*cr+(y2-cy)*sr+cx,
y=-(x1-cx)*sr+(y2-cy)*cr+cy
)
)
@cython.boundscheck(False)
@cython.wraparound(False)
def intersection_area(r1, r2):
    # r1 and r2 are in (x1, y1, x2, y2, rotation) representation
# First convert these into a sequence of vertices
rect1 = rectangle_vertices_(*r1)
rect2 = rectangle_vertices_(*r2)
# Use the vertices of the first rectangle as
# starting vertices of the intersection polygon.
intersection = rect1
# Loop over the edges of the second rectangle
for p, q in zip(rect2, rect2[1:] + rect2[:1]):
if len(intersection) <= 2:
break # No intersection
line = Line(p, q)
# Any point p with line(p) <= 0 is on the "inside" (or on the boundary),
# any point p with line(p) > 0 is on the "outside".
# Loop over the edges of the intersection polygon,
# and determine which part is inside and which is outside.
new_intersection = []
line_values = [line(t) for t in intersection]
for s, t, s_value, t_value in zip(
intersection, intersection[1:] + intersection[:1],
line_values, line_values[1:] + line_values[:1]):
if s_value <= 0:
new_intersection.append(s)
if s_value * t_value < 0:
# Points are on opposite sides.
# Add the intersection of the lines to new_intersection.
intersection_point = line.intersection(Line(s, t))
new_intersection.append(intersection_point)
intersection = new_intersection
# Calculate area
if len(intersection) <= 2:
return 0
return 0.5 * sum(p.x*q.y - p.y*q.x for p, q in zip(intersection, intersection[1:] + intersection[:1]))
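# The loop above is Sutherland-Hodgman clipping (rect1 clipped against each
# edge of rect2) and the final sum is the shoelace formula. Sanity check with
# axis-aligned boxes: a unit square against a copy shifted by 0.5 in x leaves
# a 0.5 x 1 strip, so intersection_area returns 0.5.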
def boxes3d_to_bev_(boxes3d):
"""
Args:
boxes3d: [N, 7], (x, y, z, h, w, l, ry)
Return:
boxes_bev: [N, 5], (x1, y1, x2, y2, ry)
"""
boxes_bev = np.zeros((boxes3d.shape[0], 5), dtype='float32')
cu, cv = boxes3d[:, 0], boxes3d[:, 2]
half_l, half_w = boxes3d[:, 5] / 2, boxes3d[:, 4] / 2
boxes_bev[:, 0], boxes_bev[:, 1] = cu - half_l, cv - half_w
boxes_bev[:, 2], boxes_bev[:, 3] = cu + half_l, cv + half_w
boxes_bev[:, 4] = boxes3d[:, 6]
return boxes_bev
def boxes_iou3d(boxes_a, boxes_b):
"""
:param boxes_a: (N, 7) [x, y, z, h, w, l, ry]
:param boxes_b: (M, 7) [x, y, z, h, w, l, ry]
:return:
ans_iou: (M, N)
"""
boxes_a_bev = boxes3d_to_bev_(boxes_a)
boxes_b_bev = boxes3d_to_bev_(boxes_b)
# bev overlap
num_a = boxes_a_bev.shape[0]
num_b = boxes_b_bev.shape[0]
overlaps_bev = np.zeros((num_a, num_b), dtype=np.float32)
for i in range(num_a):
for j in range(num_b):
overlaps_bev[i][j] = intersection_area(boxes_a_bev[i], boxes_b_bev[j])
# height overlap
boxes_a_height_min = (boxes_a[:, 1] - boxes_a[:, 3]).reshape(-1, 1)
boxes_a_height_max = boxes_a[:, 1].reshape(-1, 1)
boxes_b_height_min = (boxes_b[:, 1] - boxes_b[:, 3]).reshape(1, -1)
boxes_b_height_max = boxes_b[:, 1].reshape(1, -1)
max_of_min = np.maximum(boxes_a_height_min, boxes_b_height_min)
min_of_max = np.minimum(boxes_a_height_max, boxes_b_height_max)
overlaps_h = np.clip(min_of_max - max_of_min, a_min=0, a_max=np.inf)
# 3d iou
overlaps_3d = overlaps_bev * overlaps_h
vol_a = (boxes_a[:, 3] * boxes_a[:, 4] * boxes_a[:, 5]).reshape(-1, 1)
vol_b = (boxes_b[:, 3] * boxes_b[:, 4] * boxes_b[:, 5]).reshape(1, -1)
iou3d = overlaps_3d / np.clip(vol_a + vol_b - overlaps_3d, a_min=1e-7, a_max=np.inf)
return iou3d
#if __name__ == '__main__':
# # (x1, y1, x2, y2, rotation)
# r1 = (10, 15, 15, 10, 30)
# r2 = (15, 15, 20, 10, 0)
# print(intersection_area(r1, r2))
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import cython
import numpy as np
cimport numpy as np
@cython.boundscheck(False)
@cython.wraparound(False)
def pts_in_boxes3d(np.ndarray pts_rect, np.ndarray boxes3d):
"""
    :param pts_rect: (N, 3) in rect-camera coords
:param boxes3d: (M, 7)
:return: boxes_pts_mask_list: (M), list with [(N), (N), ..]
"""
cdef float MAX_DIS = 10.0
cdef np.ndarray boxes_pts_mask_list = np.zeros((boxes3d.shape[0], pts_rect.shape[0]), dtype='int32')
cdef int boxes3d_num = boxes3d.shape[0]
cdef int pts_rect_num = pts_rect.shape[0]
cdef float cx, by, cz, h, w, l, angle, cy, cosa, sina, x_rot, z_rot
    cdef float x, y, z
for i in range(boxes3d_num):
cx, by, cz, h, w, l, angle = boxes3d[i, :]
cy = by - h / 2.
cosa = np.cos(angle)
sina = np.sin(angle)
for j in range(pts_rect_num):
x, y, z = pts_rect[j, :]
if np.abs(x - cx) > MAX_DIS or np.abs(y - cy) > h / 2. or np.abs(z - cz) > MAX_DIS:
continue
x_rot = (x - cx) * cosa + (z - cz) * (-sina)
z_rot = (x - cx) * sina + (z - cz) * cosa
boxes_pts_mask_list[i, j] = int(x_rot >= -l / 2. and x_rot <= l / 2. and
z_rot >= -w / 2. and z_rot <= w / 2.)
return boxes_pts_mask_list
@cython.boundscheck(False)
@cython.wraparound(False)
def rotate_pc_along_y(np.ndarray pc, float rot_angle):
"""
params pc: (N, 3+C), (N, 3) is in the rectified camera coordinate
params rot_angle: rad scalar
Output pc: updated pc with XYZ rotated
"""
cosval = np.cos(rot_angle)
sinval = np.sin(rot_angle)
rotmat = np.array([[cosval, -sinval], [sinval, cosval]])
pc[:, [0, 2]] = np.dot(pc[:, [0, 2]], np.transpose(rotmat))
return pc
@cython.boundscheck(False)
@cython.wraparound(False)
def rotate_pc_along_y_np(np.ndarray pc, np.ndarray rot_angle):
"""
:param pc: (N, 512, 3 + C)
:param rot_angle: (N)
:return:
TODO: merge with rotate_pc_along_y_torch in bbox_transform.py
"""
cdef np.ndarray cosa, sina, raw_1, raw_2, R, pc_temp
cosa = np.cos(rot_angle).reshape(-1, 1)
sina = np.sin(rot_angle).reshape(-1, 1)
raw_1 = np.concatenate([cosa, -sina], axis=1)
raw_2 = np.concatenate([sina, cosa], axis=1)
    # (N, 2, 2)
R = np.concatenate((np.expand_dims(raw_1, axis=1), np.expand_dims(raw_2, axis=1)), axis=1)
pc_temp = pc[:, :, [0, 2]]
pc[:, :, [0, 2]] = np.matmul(pc_temp, R.transpose(0, 2, 1))
return pc
@cython.boundscheck(False)
@cython.wraparound(False)
def enlarge_box3d(np.ndarray boxes3d, float extra_width):
"""
:param boxes3d: (N, 7) [x, y, z, h, w, l, ry]
"""
cdef np.ndarray large_boxes3d
if isinstance(boxes3d, np.ndarray):
large_boxes3d = boxes3d.copy()
else:
large_boxes3d = boxes3d.clone()
large_boxes3d[:, 3:6] += extra_width * 2
large_boxes3d[:, 1] += extra_width
return large_boxes3d
@cython.boundscheck(False)
@cython.wraparound(False)
def boxes3d_to_corners3d(np.ndarray boxes3d, bint rotate=True):
"""
:param boxes3d: (N, 7) [x, y, z, h, w, l, ry]
:param rotate:
:return: corners3d: (N, 8, 3)
"""
cdef int boxes_num = boxes3d.shape[0]
cdef np.ndarray h, w, l
h, w, l = boxes3d[:, 3], boxes3d[:, 4], boxes3d[:, 5]
    cdef np.ndarray x_corners, y_corners, z_corners
x_corners = np.array([l / 2., l / 2., -l / 2., -l / 2., l / 2., l / 2., -l / 2., -l / 2.], dtype=np.float32).T # (N, 8)
z_corners = np.array([w / 2., -w / 2., -w / 2., w / 2., w / 2., -w / 2., -w / 2., w / 2.], dtype=np.float32).T # (N, 8)
y_corners = np.zeros((boxes_num, 8), dtype=np.float32)
y_corners[:, 4:8] = -h.reshape(boxes_num, 1).repeat(4, axis=1) # (N, 8)
cdef np.ndarray ry, zeros, ones, rot_list, R_list, temp_corners, rotated_corners
if rotate:
ry = boxes3d[:, 6]
zeros, ones = np.zeros(ry.size, dtype=np.float32), np.ones(ry.size, dtype=np.float32)
rot_list = np.array([[np.cos(ry), zeros, -np.sin(ry)],
[zeros, ones, zeros],
[np.sin(ry), zeros, np.cos(ry)]]) # (3, 3, N)
R_list = np.transpose(rot_list, (2, 0, 1)) # (N, 3, 3)
temp_corners = np.concatenate((x_corners.reshape(-1, 8, 1), y_corners.reshape(-1, 8, 1),
z_corners.reshape(-1, 8, 1)), axis=2) # (N, 8, 3)
rotated_corners = np.matmul(temp_corners, R_list) # (N, 8, 3)
x_corners, y_corners, z_corners = rotated_corners[:, :, 0], rotated_corners[:, :, 1], rotated_corners[:, :, 2]
cdef np.ndarray x_loc, y_loc, z_loc
x_loc, y_loc, z_loc = boxes3d[:, 0], boxes3d[:, 1], boxes3d[:, 2]
cdef np.ndarray x, y, z, corners
x = x_loc.reshape(-1, 1) + x_corners.reshape(-1, 8)
y = y_loc.reshape(-1, 1) + y_corners.reshape(-1, 8)
z = z_loc.reshape(-1, 1) + z_corners.reshape(-1, 8)
corners = np.concatenate((x.reshape(-1, 8, 1), y.reshape(-1, 8, 1), z.reshape(-1, 8, 1)), axis=2).astype(np.float32)
return corners
@cython.boundscheck(False)
@cython.wraparound(False)
def objs_to_boxes3d(obj_list):
cdef np.ndarray boxes3d = np.zeros((obj_list.__len__(), 7), dtype=np.float32)
cdef int k
for k, obj in enumerate(obj_list):
boxes3d[k, 0:3], boxes3d[k, 3], boxes3d[k, 4], boxes3d[k, 5], boxes3d[k, 6] \
= obj.pos, obj.h, obj.w, obj.l, obj.ry
return boxes3d
@cython.boundscheck(False)
@cython.wraparound(False)
def objs_to_scores(obj_list):
cdef np.ndarray scores = np.zeros((obj_list.__len__()), dtype=np.float32)
cdef int k
for k, obj in enumerate(obj_list):
scores[k] = obj.score
return scores
def get_iou3d(np.ndarray corners3d, np.ndarray query_corners3d, bint need_bev=False):
"""
:param corners3d: (N, 8, 3) in rect coords
:param query_corners3d: (M, 8, 3)
:return:
"""
from shapely.geometry import Polygon
A, B = corners3d, query_corners3d
N, M = A.shape[0], B.shape[0]
iou3d = np.zeros((N, M), dtype=np.float32)
iou_bev = np.zeros((N, M), dtype=np.float32)
# for height overlap, since y face down, use the negative y
min_h_a = -A[:, 0:4, 1].sum(axis=1) / 4.0
max_h_a = -A[:, 4:8, 1].sum(axis=1) / 4.0
min_h_b = -B[:, 0:4, 1].sum(axis=1) / 4.0
max_h_b = -B[:, 4:8, 1].sum(axis=1) / 4.0
for i in range(N):
for j in range(M):
max_of_min = np.max([min_h_a[i], min_h_b[j]])
min_of_max = np.min([max_h_a[i], max_h_b[j]])
h_overlap = np.max([0, min_of_max - max_of_min])
if h_overlap == 0:
continue
bottom_a, bottom_b = Polygon(A[i, 0:4, [0, 2]].T), Polygon(B[j, 0:4, [0, 2]].T)
if bottom_a.is_valid and bottom_b.is_valid:
                # validity check: a valid Polygon may not have overlapping exterior or interior rings
bottom_overlap = bottom_a.intersection(bottom_b).area
else:
bottom_overlap = 0.
overlap3d = bottom_overlap * h_overlap
union3d = bottom_a.area * (max_h_a[i] - min_h_a[i]) + bottom_b.area * (max_h_b[j] - min_h_b[j]) - overlap3d
iou3d[i][j] = overlap3d / union3d
iou_bev[i][j] = bottom_overlap / (bottom_a.area + bottom_b.area - bottom_overlap)
if need_bev:
return iou3d, iou_bev
return iou3d
def get_objects_from_label(label_file):
import utils.object3d as object3d
with open(label_file, 'r') as f:
lines = f.readlines()
objects = [object3d.Object3d(line) for line in lines]
return objects
@cython.boundscheck(False)
@cython.wraparound(False)
def _rotate_pc_along_y(np.ndarray pc, np.ndarray angle):
cdef np.ndarray cosa = np.cos(angle)
cosa=cosa.reshape(-1, 1)
cdef np.ndarray sina = np.sin(angle)
sina = sina.reshape(-1, 1)
cdef np.ndarray R = np.concatenate([cosa, -sina, sina, cosa], axis=-1)
R = R.reshape(-1, 2, 2)
cdef np.ndarray pc_temp = pc[:, [0, 2]]
pc_temp = pc_temp.reshape(-1, 1, 2)
cdef np.ndarray pc_temp_1 = np.matmul(pc_temp, R.transpose(0, 2, 1))
pc_temp_1 = pc_temp_1.reshape(-1, 2)
pc[:,[0,2]] = pc_temp_1
return pc
@cython.boundscheck(False)
@cython.wraparound(False)
def decode_bbox_target(
np.ndarray roi_box3d,
np.ndarray pred_reg,
np.ndarray anchor_size,
float loc_scope,
float loc_bin_size,
int num_head_bin,
bint get_xz_fine=True,
float loc_y_scope=0.5,
float loc_y_bin_size=0.25,
bint get_y_by_bin=False,
bint get_ry_fine=False):
cdef int per_loc_bin_num = int(loc_scope / loc_bin_size) * 2
cdef int loc_y_bin_num = int(loc_y_scope / loc_y_bin_size) * 2
# recover xz localization
cdef int x_bin_l = 0
cdef int x_bin_r = per_loc_bin_num
    cdef int z_bin_l = per_loc_bin_num
cdef int z_bin_r = per_loc_bin_num * 2
cdef int start_offset = z_bin_r
cdef np.ndarray x_bin = np.argmax(pred_reg[:, x_bin_l: x_bin_r], axis=1)
cdef np.ndarray z_bin = np.argmax(pred_reg[:, z_bin_l: z_bin_r], axis=1)
cdef np.ndarray pos_x = x_bin.astype('float32') * loc_bin_size + loc_bin_size / 2 - loc_scope
cdef np.ndarray pos_z = z_bin.astype('float32') * loc_bin_size + loc_bin_size / 2 - loc_scope
if get_xz_fine:
x_res_l, x_res_r = per_loc_bin_num * 2, per_loc_bin_num * 3
z_res_l, z_res_r = per_loc_bin_num * 3, per_loc_bin_num * 4
start_offset = z_res_r
x_res_norm = pred_reg[:, x_res_l:x_res_r][np.arange(len(x_bin)), x_bin]
z_res_norm = pred_reg[:, z_res_l:z_res_r][np.arange(len(z_bin)), z_bin]
x_res = x_res_norm * loc_bin_size
z_res = z_res_norm * loc_bin_size
pos_x += x_res
pos_z += z_res
# recover y localization
if get_y_by_bin:
y_bin_l, y_bin_r = start_offset, start_offset + loc_y_bin_num
y_res_l, y_res_r = y_bin_r, y_bin_r + loc_y_bin_num
start_offset = y_res_r
y_bin = np.argmax(pred_reg[:, y_bin_l: y_bin_r], axis=1)
y_res_norm = pred_reg[:, y_res_l:y_res_r][np.arange(len(y_bin)), y_bin]
y_res = y_res_norm * loc_y_bin_size
pos_y = y_bin.astype('float32') * loc_y_bin_size + loc_y_bin_size / 2 - loc_y_scope + y_res
pos_y = pos_y + np.array(roi_box3d[:, 1]).reshape(-1)
else:
y_offset_l, y_offset_r = start_offset, start_offset + 1
start_offset = y_offset_r
pos_y = np.array(roi_box3d[:, 1]) + np.array(pred_reg[:, y_offset_l])
pos_y = pos_y.reshape(-1)
# recover ry rotation
    cdef int ry_bin_l = start_offset
    cdef int ry_bin_r = start_offset + num_head_bin
    cdef int ry_res_l = ry_bin_r
    cdef int ry_res_r = ry_bin_r + num_head_bin
cdef np.ndarray ry_bin = np.argmax(pred_reg[:, ry_bin_l: ry_bin_r], axis=1)
cdef np.ndarray ry_res_norm = pred_reg[:, ry_res_l:ry_res_r][np.arange(len(ry_bin)), ry_bin]
if get_ry_fine:
# divide pi/2 into several bins
angle_per_class = (np.pi / 2) / num_head_bin
ry_res = ry_res_norm * (angle_per_class / 2)
ry = (ry_bin.astype('float32') * angle_per_class + angle_per_class / 2) + ry_res - np.pi / 4
else:
angle_per_class = (2 * np.pi) / num_head_bin
ry_res = ry_res_norm * (angle_per_class / 2)
# bin_center is (0, 30, 60, 90, 120, ..., 270, 300, 330)
ry = np.fmod(ry_bin.astype('float32') * angle_per_class + ry_res, 2 * np.pi)
ry[ry > np.pi] -= 2 * np.pi
# recover size
cdef int size_res_l = ry_res_r
cdef int size_res_r = ry_res_r + 3
assert size_res_r == pred_reg.shape[1]
cdef np.ndarray size_res_norm = pred_reg[:, size_res_l: size_res_r]
cdef np.ndarray hwl = size_res_norm * anchor_size + anchor_size
# shift to original coords
cdef np.ndarray roi_center = np.array(roi_box3d[:, 0:3])
cdef np.ndarray shift_ret_box3d = np.concatenate((
pos_x.reshape(-1, 1),
pos_y.reshape(-1, 1),
pos_z.reshape(-1, 1),
hwl, ry.reshape(-1, 1)), axis=1)
ret_box3d = shift_ret_box3d
if roi_box3d.shape[1] == 7:
roi_ry = np.array(roi_box3d[:, 6]).reshape(-1)
ret_box3d = _rotate_pc_along_y(np.array(shift_ret_box3d), -roi_ry)
ret_box3d[:, 6] += roi_ry
ret_box3d[:, [0, 2]] += roi_center[:, [0, 2]]
return ret_box3d
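# Channel layout of pred_reg decoded above, for the default Car RCNN head
# (LOC_SCOPE=1.5, LOC_BIN_SIZE=0.5 -> per_loc_bin_num=6, NUM_HEAD_BIN=9,
# get_xz_fine=True, get_y_by_bin=False):
#   [0:6) x_bin    [6:12) z_bin    [12:18) x_res    [18:24) z_res
#   [24] y offset
#   [25:34) ry_bin    [34:43) ry_res
#   [43:46) size residuals (h, w, l)  -> 46 channels in total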
"""
This code is borrowed from https://github.com/sshaoshuai/PointRCNN/blob/master/lib/utils/object3d.py
"""
import numpy as np
def cls_type_to_id(cls_type):
type_to_id = {'Car': 1, 'Pedestrian': 2, 'Cyclist': 3, 'Van': 4}
if cls_type not in type_to_id.keys():
return -1
return type_to_id[cls_type]
class Object3d(object):
def __init__(self, line):
label = line.strip().split(' ')
self.src = line
self.cls_type = label[0]
self.cls_id = cls_type_to_id(self.cls_type)
self.trucation = float(label[1])
self.occlusion = float(label[2]) # 0:fully visible 1:partly occluded 2:largely occluded 3:unknown
self.alpha = float(label[3])
self.box2d = np.array((float(label[4]), float(label[5]), float(label[6]), float(label[7])), dtype=np.float32)
self.h = float(label[8])
self.w = float(label[9])
self.l = float(label[10])
self.pos = np.array((float(label[11]), float(label[12]), float(label[13])), dtype=np.float32)
self.dis_to_cam = np.linalg.norm(self.pos)
self.ry = float(label[14])
self.score = float(label[15]) if label.__len__() == 16 else -1.0
self.level_str = None
self.level = self.get_obj_level()
def get_obj_level(self):
height = float(self.box2d[3]) - float(self.box2d[1]) + 1
if height >= 40 and self.trucation <= 0.15 and self.occlusion <= 0:
self.level_str = 'Easy'
return 1 # Easy
elif height >= 25 and self.trucation <= 0.3 and self.occlusion <= 1:
self.level_str = 'Moderate'
return 2 # Moderate
elif height >= 25 and self.trucation <= 0.5 and self.occlusion <= 2:
self.level_str = 'Hard'
return 3 # Hard
else:
self.level_str = 'UnKnown'
return 4
def generate_corners3d(self):
"""
generate corners3d representation for this object
:return corners_3d: (8, 3) corners of box3d in camera coord
"""
l, h, w = self.l, self.h, self.w
x_corners = [l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2]
y_corners = [0, 0, 0, 0, -h, -h, -h, -h]
z_corners = [w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2]
R = np.array([[np.cos(self.ry), 0, np.sin(self.ry)],
[0, 1, 0],
[-np.sin(self.ry), 0, np.cos(self.ry)]])
corners3d = np.vstack([x_corners, y_corners, z_corners]) # (3, 8)
corners3d = np.dot(R, corners3d).T
corners3d = corners3d + self.pos
return corners3d
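    # Corner ordering produced above: indices 0-3 form the bottom face (y = 0
    # before the self.pos shift) and 4-7 the top face (y = -h), which matches
    # the convention assumed by boxes3d_to_corners3d and get_iou3d.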
def to_bev_box2d(self, oblique=True, voxel_size=0.1):
"""
:param bev_shape: (2) for bev shape (h, w), => (y_max, x_max) in image
:param voxel_size: float, 0.1m
:param oblique:
:return: box2d (4, 2)/ (4) in image coordinate
"""
if oblique:
corners3d = self.generate_corners3d()
xz_corners = corners3d[0:4, [0, 2]]
box2d = np.zeros((4, 2), dtype=np.int32)
box2d[:, 0] = ((xz_corners[:, 0] - Object3d.MIN_XZ[0]) / voxel_size).astype(np.int32)
box2d[:, 1] = Object3d.BEV_SHAPE[0] - 1 - ((xz_corners[:, 1] - Object3d.MIN_XZ[1]) / voxel_size).astype(np.int32)
box2d[:, 0] = np.clip(box2d[:, 0], 0, Object3d.BEV_SHAPE[1])
box2d[:, 1] = np.clip(box2d[:, 1], 0, Object3d.BEV_SHAPE[0])
else:
box2d = np.zeros(4, dtype=np.int32)
# discrete_center = np.floor((self.pos / voxel_size)).astype(np.int32)
cu = np.floor((self.pos[0] - Object3d.MIN_XZ[0]) / voxel_size).astype(np.int32)
cv = Object3d.BEV_SHAPE[0] - 1 - ((self.pos[2] - Object3d.MIN_XZ[1]) / voxel_size).astype(np.int32)
half_l, half_w = int(self.l / voxel_size / 2), int(self.w / voxel_size / 2)
box2d[0], box2d[1] = cu - half_l, cv - half_w
box2d[2], box2d[3] = cu + half_l, cv + half_w
return box2d
def to_str(self):
print_str = '%s %.3f %.3f %.3f box2d: %s hwl: [%.3f %.3f %.3f] pos: %s ry: %.3f' \
% (self.cls_type, self.trucation, self.occlusion, self.alpha, self.box2d, self.h, self.w, self.l,
self.pos, self.ry)
return print_str
def to_kitti_format(self):
kitti_str = '%s %.2f %d %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f' \
% (self.cls_type, self.trucation, int(self.occlusion), self.alpha, self.box2d[0], self.box2d[1],
self.box2d[2], self.box2d[3], self.h, self.w, self.l, self.pos[0], self.pos[1], self.pos[2],
self.ry)
return kitti_str
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import numpy as np
cimport numpy as np
cimport cython
from libc.math cimport sin, cos
@cython.boundscheck(False)
@cython.wraparound(False)
cdef enlarge_box3d(np.ndarray boxes3d, float extra_width):
"""
:param boxes3d: (N, 7) [x, y, z, h, w, l, ry]
"""
if isinstance(boxes3d, np.ndarray):
large_boxes3d = boxes3d.copy()
else:
large_boxes3d = boxes3d.clone()
large_boxes3d[:, 3:6] += extra_width * 2
large_boxes3d[:, 1] += extra_width
return large_boxes3d
@cython.boundscheck(False)
@cython.wraparound(False)
cdef pt_in_box(float x, float y, float z, float cx, float bottom_y, float cz, float h, float w, float l, float angle):
    cdef float max_dis = 10.0
    cdef float cy = bottom_y - h / 2.0
    if ((abs(x - cx) > max_dis) or (abs(y - cy) > h / 2.0) or (abs(z - cz) > max_dis)):
return 0
cdef float cosa = cos(angle)
cdef float sina = sin(angle)
cdef float x_rot = (x - cx) * cosa + (z - cz) * (-sina)
cdef float z_rot = (x - cx) * sina + (z - cz) * cosa
cdef float flag = (x_rot >= -l / 2.0) and (x_rot <= l / 2.0) and (z_rot >= -w / 2.0) and (z_rot <= w / 2.0)
return flag
@cython.boundscheck(False)
@cython.wraparound(False)
cdef _rotate_pc_along_y(np.ndarray pc, float rot_angle):
"""
params pc: (N, 3+C), (N, 3) is in the rectified camera coordinate
params rot_angle: rad scalar
Output pc: updated pc with XYZ rotated
"""
cosval = np.cos(rot_angle)
sinval = np.sin(rot_angle)
rotmat = np.array([[cosval, -sinval], [sinval, cosval]])
pc[:, [0, 2]] = np.dot(pc[:, [0, 2]], np.transpose(rotmat))
return pc
@cython.boundscheck(False)
@cython.wraparound(False)
def roipool3d_cpu(
np.ndarray[float, ndim=2] pts,
np.ndarray[float, ndim=2] pts_feature,
np.ndarray[float, ndim=2] boxes3d,
np.ndarray[float, ndim=2] pts_extra_input,
    float pool_extra_width, int sampled_pt_num, int batch_size=1, bint canonical_transform=False):
cdef np.ndarray pts_feature_all = np.concatenate((pts_extra_input, pts_feature), axis=1)
cdef np.ndarray larged_boxes3d = enlarge_box3d(boxes3d.reshape(-1, 7), pool_extra_width).reshape(batch_size, -1, 7)
    cdef int pts_num = pts.shape[0]
cdef int boxes_num = boxes3d.shape[0]
cdef int feature_len = pts_feature_all.shape[1]
cdef np.ndarray pts_data = np.zeros((batch_size, boxes_num, sampled_pt_num, 3))
cdef np.ndarray features_data = np.zeros((batch_size, boxes_num, sampled_pt_num, feature_len))
cdef np.ndarray empty_flag_data = np.zeros((batch_size, boxes_num))
cdef int cnt = 0
cdef float cx = 0.
cdef float bottom_y = 0.
cdef float cz = 0.
cdef float h = 0.
cdef float w = 0.
cdef float l = 0.
cdef float ry = 0.
cdef float x = 0.
cdef float y = 0.
cdef float z = 0.
cdef np.ndarray x_i
cdef np.ndarray feat_i
cdef int bs
cdef int i
cdef int j
for bs in range(batch_size):
# boxes: 64,7
for i in range(boxes_num):
cnt = 0
# box
box = larged_boxes3d[bs][i]
cx = box[0]
bottom_y = box[1]
cz = box[2]
h = box[3]
w = box[4]
l = box[5]
ry = box[6]
# points: 16384,3
x_i = pts
# features: 16384, 128
feat_i = pts_feature_all
for j in range(pts_num):
x = x_i[j][0]
y = x_i[j][1]
z = x_i[j][2]
                cur_in_flag = pt_in_box(x, y, z, cx, bottom_y, cz, h, w, l, ry)
if cur_in_flag:
if cnt < sampled_pt_num:
pts_data[bs][i][cnt][:] = x_i[j]
features_data[bs][i][cnt][:] = feat_i[j]
cnt += 1
else:
break
if cnt == 0:
empty_flag_data[bs][i] = 1
elif (cnt < sampled_pt_num):
for k in range(cnt, sampled_pt_num):
pts_data[bs][i][k] = pts_data[bs][i][k % cnt]
features_data[bs][i][k] = features_data[bs][i][k % cnt]
pooled_pts = pts_data.astype("float32")[0]
pooled_features = features_data.astype('float32')[0]
pooled_empty_flag = empty_flag_data.astype('int64')[0]
cdef int extra_input_len = pts_extra_input.shape[1]
    pooled_pts = np.concatenate((pooled_pts, pooled_features[:, :, 0:extra_input_len]), axis=2)
    pooled_features = pooled_features[:, :, extra_input_len:]
if canonical_transform:
# Translate to the roi coordinates
roi_ry = boxes3d[:, 6] % (2 * np.pi) # 0~2pi
roi_center = boxes3d[:, 0:3]
# shift to center
pooled_pts[:, :, 0:3] = pooled_pts[:, :, 0:3] - roi_center[:, np.newaxis, :]
for k in range(pooled_pts.shape[0]):
pooled_pts[k] = _rotate_pc_along_y(pooled_pts[k], roi_ry[k])
return pooled_pts, pooled_features, pooled_empty_flag
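# Output shapes for the single-batch path taken above (batch_size == 1):
#   pooled_pts:        (boxes_num, sampled_pt_num, 3 + pts_extra_input.shape[1])
#   pooled_features:   (boxes_num, sampled_pt_num, pts_feature.shape[1])
#   pooled_empty_flag: (boxes_num,), 1 where a box contains no points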
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from Cython.Build import cythonize
from setuptools import Extension
from setuptools import setup
import numpy as np
_NP_INCLUDE_DIRS = np.get_include()
# Extension modules
ext_modules = [
Extension(
name='utils.cyops.roipool3d_utils',
sources=[
'utils/cyops/roipool3d_utils.pyx'
],
extra_compile_args=[
'-Wno-cpp'
],
include_dirs=[
_NP_INCLUDE_DIRS
]
),
Extension(
name='utils.cyops.iou3d_utils',
sources=[
'utils/cyops/iou3d_utils.pyx'
],
extra_compile_args=[
'-Wno-cpp'
],
include_dirs=[
_NP_INCLUDE_DIRS
]
),
Extension(
name='utils.cyops.kitti_utils',
sources=[
'utils/cyops/kitti_utils.pyx'
],
extra_compile_args=[
'-Wno-cpp'
],
include_dirs=[
_NP_INCLUDE_DIRS
]
),
]
setup(
name='pp_pointrcnn',
ext_modules=cythonize(ext_modules)
)
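A quick smoke test after building the extensions (a sketch; it assumes `python setup.py build_ext --inplace` was run from the PointRCNN root so the `utils.cyops.*` modules are importable):
```
import numpy as np
from utils.cyops import kitti_utils, iou3d_utils, roipool3d_utils

# a single degenerate box is enough to check the modules load and run
boxes = np.zeros((1, 7), dtype=np.float32)
corners = kitti_utils.boxes3d_to_corners3d(boxes)
print(corners.shape)  # expected: (1, 8, 3)
```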
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import logging
import numpy as np
import utils.cyops.kitti_utils as kitti_utils
from utils.config import cfg
from utils.box_utils import boxes_iou3d, box_nms_eval, boxes3d_to_bev
from utils.save_utils import save_rpn_feature, save_kitti_result, save_kitti_format
__all__ = ['calc_iou_recall', 'rpn_metric', 'rcnn_metric']
logging.root.handlers = []
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT, stream=sys.stdout)
logger = logging.getLogger(__name__)
def calc_iou_recall(rets, thresh_list):
rpn_cls_label = rets['rpn_cls_label'][0]
boxes3d = rets['rois'][0]
seg_mask = rets['seg_mask'][0]
sample_id = rets['sample_id'][0]
gt_boxes3d = rets['gt_boxes3d'][0]
gt_boxes3d_num = rets['gt_boxes3d'][1]
gt_box_idx = 0
recalled_bbox_list = [0] * len(thresh_list)
gt_box_num = 0
rpn_iou_sum = 0.
for i in range(len(gt_boxes3d_num)):
cur_rpn_cls_label = rpn_cls_label[i]
cur_boxes3d = boxes3d[i]
cur_seg_mask = seg_mask[i]
cur_sample_id = sample_id[i]
cur_gt_boxes3d = gt_boxes3d[gt_box_idx: gt_box_idx +
gt_boxes3d_num[0][i]]
gt_box_idx += gt_boxes3d_num[0][i]
k = len(cur_gt_boxes3d) - 1
while k >= 0 and np.sum(cur_gt_boxes3d[k]) == 0:
k -= 1
cur_gt_boxes3d = cur_gt_boxes3d[:k + 1]
if cur_gt_boxes3d.shape[0] > 0:
iou3d = boxes_iou3d(cur_boxes3d, cur_gt_boxes3d[:, 0:7])
gt_max_iou = iou3d.max(axis=0)
for idx, thresh in enumerate(thresh_list):
recalled_bbox_list[idx] += np.sum(gt_max_iou > thresh)
gt_box_num += len(cur_gt_boxes3d)
fg_mask = cur_rpn_cls_label > 0
correct = np.sum(np.logical_and(
cur_seg_mask == cur_rpn_cls_label, fg_mask))
union = np.sum(fg_mask) + np.sum(cur_seg_mask > 0) - correct
rpn_iou = float(correct) / max(float(union), 1.0)
rpn_iou_sum += rpn_iou
logger.debug('sample_id:{}, rpn_iou:{}, gt_box_num:{}, recalled_bbox_list:{}'.format(
cur_sample_id, rpn_iou, gt_box_num, str(recalled_bbox_list)))
return len(gt_boxes3d_num), gt_box_num, rpn_iou_sum, recalled_bbox_list
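The recall bookkeeping above reduces to thresholding the per-ground-truth maximum IoU. A small NumPy sketch with a made-up IoU matrix (rows are proposals, columns are ground-truth boxes):
```
import numpy as np

iou3d = np.array([[0.1, 0.8],
                  [0.6, 0.2],
                  [0.4, 0.9]])
gt_max_iou = iou3d.max(axis=0)  # best proposal IoU per gt: [0.6, 0.9]
thresh_list = [0.1, 0.3, 0.5, 0.7, 0.9]
recalled = [int(np.sum(gt_max_iou > t)) for t in thresh_list]
print(recalled)  # [2, 2, 2, 1, 0]
```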
def rpn_metric(queue, mdict, lock, thresh_list, is_save_rpn_feature, kitti_feature_dir,
seg_output_dir, kitti_output_dir, kitti_rcnn_reader, classes):
while True:
rets_dict = queue.get()
if rets_dict is None:
lock.acquire()
mdict['exit_proc'] += 1
lock.release()
return
cnt, gt_box_num, rpn_iou_sum, recalled_bbox_list = calc_iou_recall(
rets_dict, thresh_list)
lock.acquire()
mdict['total_cnt'] += cnt
mdict['total_gt_bbox'] += gt_box_num
mdict['total_rpn_iou'] += rpn_iou_sum
for i, bbox_num in enumerate(recalled_bbox_list):
mdict['total_recalled_bbox_list_{}'.format(i)] += bbox_num
logger.debug("rpn_metric: {}".format(str(mdict)))
lock.release()
if is_save_rpn_feature:
save_rpn_feature(rets_dict, kitti_feature_dir)
save_kitti_result(
rets_dict, seg_output_dir, kitti_output_dir, kitti_rcnn_reader, classes)
def rcnn_metric(queue, mdict, lock, thresh_list, kitti_rcnn_reader, roi_output_dir,
refine_output_dir, final_output_dir, is_save_result=False):
while True:
rets_dict = queue.get()
if rets_dict is None:
lock.acquire()
mdict['exit_proc'] += 1
lock.release()
return
for k,v in rets_dict.items():
rets_dict[k] = v[0]
rcnn_cls = rets_dict['rcnn_cls']
rcnn_reg = rets_dict['rcnn_reg']
roi_boxes3d = rets_dict['roi_boxes3d']
roi_scores = rets_dict['roi_scores']
# bounding box regression
anchor_size = cfg.CLS_MEAN_SIZE[0]
pred_boxes3d = kitti_utils.decode_bbox_target(
roi_boxes3d,
rcnn_reg,
anchor_size=np.array(anchor_size),
loc_scope=cfg.RCNN.LOC_SCOPE,
loc_bin_size=cfg.RCNN.LOC_BIN_SIZE,
num_head_bin=cfg.RCNN.NUM_HEAD_BIN,
get_xz_fine=True,
get_y_by_bin=cfg.RCNN.LOC_Y_BY_BIN,
loc_y_scope=cfg.RCNN.LOC_Y_SCOPE,
loc_y_bin_size=cfg.RCNN.LOC_Y_BIN_SIZE,
get_ry_fine=True
)
# scoring
if rcnn_cls.shape[1] == 1:
raw_scores = rcnn_cls.reshape(-1)
norm_scores = rets_dict['norm_scores']
pred_classes = norm_scores > cfg.RCNN.SCORE_THRESH
pred_classes = pred_classes.astype(np.float32)
else:
pred_classes = np.argmax(rcnn_cls, axis=1).reshape(-1)
# gather the score of the predicted class for each RoI
raw_scores = rcnn_cls[np.arange(rcnn_cls.shape[0]), pred_classes]
# evaluation
gt_iou = rets_dict['gt_iou']
gt_boxes3d = rets_dict['gt_boxes3d']
# recall
if gt_boxes3d.size > 0:
gt_num = gt_boxes3d.shape[1]
gt_boxes3d = gt_boxes3d.reshape((-1,7))
iou3d = boxes_iou3d(pred_boxes3d, gt_boxes3d)
gt_max_iou = iou3d.max(axis=0)
refined_iou = iou3d.max(axis=1)
recalled_num = (gt_max_iou > 0.7).sum()
roi_boxes3d = roi_boxes3d.reshape((-1,7))
iou3d_in = boxes_iou3d(roi_boxes3d, gt_boxes3d)
gt_max_iou_in = iou3d_in.max(axis=0)
lock.acquire()
mdict['total_gt_bbox'] += gt_num
for idx, thresh in enumerate(thresh_list):
recalled_bbox_num = (gt_max_iou > thresh).sum()
mdict['total_recalled_bbox_list_{}'.format(idx)] += recalled_bbox_num
for idx, thresh in enumerate(thresh_list):
roi_recalled_bbox_num = (gt_max_iou_in > thresh).sum()
mdict['total_roi_recalled_bbox_list_{}'.format(idx)] += roi_recalled_bbox_num
lock.release()
# classification accuracy
cls_label = gt_iou > cfg.RCNN.CLS_FG_THRESH
cls_label = cls_label.astype(np.float32)
cls_valid_mask = (gt_iou >= cfg.RCNN.CLS_FG_THRESH) | (gt_iou <= cfg.RCNN.CLS_BG_THRESH)
cls_valid_mask = cls_valid_mask.astype(np.float32)
cls_acc = (pred_classes == cls_label).astype(np.float32)
cls_acc = (cls_acc * cls_valid_mask).sum() / max(cls_valid_mask.sum(), 1.0) * 1.0
iou_thresh = 0.7 if cfg.CLASSES == 'Car' else 0.5
cls_label_refined = (gt_iou >= iou_thresh)
cls_label_refined = cls_label_refined.astype(np.float32)
cls_acc_refined = (pred_classes == cls_label_refined).astype(np.float32).sum() / max(cls_label_refined.shape[0], 1.0)
sample_id = rets_dict['sample_id']
image_shape = kitti_rcnn_reader.get_image_shape(sample_id)
if is_save_result:
roi_boxes3d_np = roi_boxes3d
pred_boxes3d_np = pred_boxes3d
calib = kitti_rcnn_reader.get_calib(sample_id)
save_kitti_format(sample_id, calib, roi_boxes3d_np, roi_output_dir, roi_scores, image_shape)
save_kitti_format(sample_id, calib, pred_boxes3d_np, refine_output_dir, raw_scores, image_shape)
inds = norm_scores > cfg.RCNN.SCORE_THRESH
if inds.astype(np.float32).sum() == 0:
logger.debug("The num of 'norm_scores > thresh' of sample {} is 0".format(sample_id))
continue
pred_boxes3d_selected = pred_boxes3d[inds]
raw_scores_selected = raw_scores[inds]
# NMS thresh
boxes_bev_selected = boxes3d_to_bev(pred_boxes3d_selected)
scores_selected, pred_boxes3d_selected = box_nms_eval(boxes_bev_selected, raw_scores_selected, pred_boxes3d_selected, cfg.RCNN.NMS_THRESH)
calib = kitti_rcnn_reader.get_calib(sample_id)
save_kitti_format(sample_id, calib, pred_boxes3d_selected, final_output_dir, scores_selected, image_shape)
lock.acquire()
mdict['total_det_num'] += pred_boxes3d_selected.shape[0]
mdict['total_cls_acc'] += cls_acc
mdict['total_cls_acc_refined'] += cls_acc_refined
lock.release()
logger.debug("rcnn_metric: {}".format(str(mdict)))
"""
This code is borrow from https://github.com/sshaoshuai/PointRCNN/blob/master/lib/utils/object3d.py
"""
import numpy as np
def cls_type_to_id(cls_type):
type_to_id = {'Car': 1, 'Pedestrian': 2, 'Cyclist': 3, 'Van': 4}
if cls_type not in type_to_id.keys():
return -1
return type_to_id[cls_type]
def get_objects_from_label(label_file):
with open(label_file, 'r') as f:
lines = f.readlines()
objects = [Object3d(line) for line in lines]
return objects
class Object3d(object):
def __init__(self, line):
label = line.strip().split(' ')
self.src = line
self.cls_type = label[0]
self.cls_id = cls_type_to_id(self.cls_type)
self.trucation = float(label[1]) # KITTI truncation ratio in [0, 1]
self.occlusion = float(label[2]) # 0:fully visible 1:partly occluded 2:largely occluded 3:unknown
self.alpha = float(label[3])
self.box2d = np.array((float(label[4]), float(label[5]), float(label[6]), float(label[7])), dtype=np.float32)
self.h = float(label[8])
self.w = float(label[9])
self.l = float(label[10])
self.pos = np.array((float(label[11]), float(label[12]), float(label[13])), dtype=np.float32)
self.dis_to_cam = np.linalg.norm(self.pos)
self.ry = float(label[14])
self.score = float(label[15]) if len(label) == 16 else -1.0
self.level_str = None
self.level = self.get_obj_level()
def get_obj_level(self):
height = float(self.box2d[3]) - float(self.box2d[1]) + 1
if height >= 40 and self.trucation <= 0.15 and self.occlusion <= 0:
self.level_str = 'Easy'
return 1 # Easy
elif height >= 25 and self.trucation <= 0.3 and self.occlusion <= 1:
self.level_str = 'Moderate'
return 2 # Moderate
elif height >= 25 and self.trucation <= 0.5 and self.occlusion <= 2:
self.level_str = 'Hard'
return 3 # Hard
else:
self.level_str = 'UnKnown'
return 4
def generate_corners3d(self):
"""
generate corners3d representation for this object
:return corners_3d: (8, 3) corners of box3d in camera coord
"""
l, h, w = self.l, self.h, self.w
x_corners = [l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2]
y_corners = [0, 0, 0, 0, -h, -h, -h, -h]
z_corners = [w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2]
R = np.array([[np.cos(self.ry), 0, np.sin(self.ry)],
[0, 1, 0],
[-np.sin(self.ry), 0, np.cos(self.ry)]])
corners3d = np.vstack([x_corners, y_corners, z_corners]) # (3, 8)
corners3d = np.dot(R, corners3d).T
corners3d = corners3d + self.pos
return corners3d
def to_bev_box2d(self, oblique=True, voxel_size=0.1):
"""
:param oblique: bool, use the rotated box corners if True, else an axis-aligned box
:param voxel_size: float, BEV grid resolution in meters (0.1m)
:return: box2d (4, 2) / (4,) in BEV image coordinates
Note: Object3d.MIN_XZ and Object3d.BEV_SHAPE ((h, w) of the BEV image,
i.e. (y_max, x_max)) must be set as class attributes by the caller
before this method is used.
"""
if oblique:
corners3d = self.generate_corners3d()
xz_corners = corners3d[0:4, [0, 2]]
box2d = np.zeros((4, 2), dtype=np.int32)
box2d[:, 0] = ((xz_corners[:, 0] - Object3d.MIN_XZ[0]) / voxel_size).astype(np.int32)
box2d[:, 1] = Object3d.BEV_SHAPE[0] - 1 - ((xz_corners[:, 1] - Object3d.MIN_XZ[1]) / voxel_size).astype(np.int32)
box2d[:, 0] = np.clip(box2d[:, 0], 0, Object3d.BEV_SHAPE[1])
box2d[:, 1] = np.clip(box2d[:, 1], 0, Object3d.BEV_SHAPE[0])
else:
box2d = np.zeros(4, dtype=np.int32)
# discrete_center = np.floor((self.pos / voxel_size)).astype(np.int32)
cu = np.floor((self.pos[0] - Object3d.MIN_XZ[0]) / voxel_size).astype(np.int32)
cv = Object3d.BEV_SHAPE[0] - 1 - ((self.pos[2] - Object3d.MIN_XZ[1]) / voxel_size).astype(np.int32)
half_l, half_w = int(self.l / voxel_size / 2), int(self.w / voxel_size / 2)
box2d[0], box2d[1] = cu - half_l, cv - half_w
box2d[2], box2d[3] = cu + half_l, cv + half_w
return box2d
def to_str(self):
print_str = '%s %.3f %.3f %.3f box2d: %s hwl: [%.3f %.3f %.3f] pos: %s ry: %.3f' \
% (self.cls_type, self.trucation, self.occlusion, self.alpha, self.box2d, self.h, self.w, self.l,
self.pos, self.ry)
return print_str
def to_kitti_format(self):
kitti_str = '%s %.2f %d %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f' \
% (self.cls_type, self.trucation, int(self.occlusion), self.alpha, self.box2d[0], self.box2d[1],
self.box2d[2], self.box2d[3], self.h, self.w, self.l, self.pos[0], self.pos[1], self.pos[2],
self.ry)
return kitti_str
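`generate_corners3d` places the corners relative to the KITTI bottom-center origin (the camera y axis points down, so the top face sits at -h before translation) and rotates them around y. A worked sketch for an axis-aligned box, bypassing the label parser only for the demo:
```
import numpy as np

obj = Object3d.__new__(Object3d)  # skip __init__; set only what the method reads
obj.l, obj.h, obj.w = 4.0, 1.5, 2.0
obj.ry = 0.0
obj.pos = np.array([10.0, 1.0, 20.0], dtype=np.float32)
corners = obj.generate_corners3d()
print(corners.shape)        # (8, 3)
print(corners[:, 1].min())  # -0.5, i.e. pos_y - h (top face)
```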
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Optimization and learning rate scheduling."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.layers.learning_rate_scheduler as lr_scheduler
from paddle.fluid.layers import control_flow
import logging
logger = logging.getLogger(__name__)
def cosine_warmup_decay(learning_rate, betas, warmup_factor, decay_factor,
total_step, warmup_pct):
def annealing_cos(start, end, pct):
"Cosine anneal from `start` to `end` as pct goes from 0.0 to 1.0."
cos_out = fluid.layers.cos(pct * np.pi) + 1.
return cos_out * (start - end) / 2. + end
warmup_start_lr = learning_rate * warmup_factor
decay_end_lr = learning_rate * decay_factor
warmup_step = total_step * warmup_pct
global_step = lr_scheduler._decay_step_counter()
lr = fluid.layers.create_global_var(
shape=[1],
value=float(learning_rate),
dtype='float32',
persistable=True,
name="learning_rate")
beta1 = fluid.layers.create_global_var(
shape=[1],
value=float(betas[0]),
dtype='float32',
persistable=True,
name="beta1")
warmup_step_var = fluid.layers.fill_constant(
shape=[1], dtype='float32', value=float(warmup_step), force_cpu=True)
with control_flow.Switch() as switch:
with switch.case(global_step < warmup_step_var):
cur_lr = annealing_cos(warmup_start_lr, learning_rate,
global_step / warmup_step_var)
fluid.layers.assign(cur_lr, lr)
cur_beta1 = annealing_cos(betas[0], betas[1],
global_step / warmup_step_var)
fluid.layers.assign(cur_beta1, beta1)
with switch.case(global_step >= warmup_step_var):
cur_lr = annealing_cos(learning_rate, decay_end_lr,
(global_step - warmup_step_var) / (total_step - warmup_step))
fluid.layers.assign(cur_lr, lr)
cur_beta1 = annealing_cos(betas[1], betas[0],
(global_step - warmup_step_var) / (total_step - warmup_step))
fluid.layers.assign(cur_beta1, beta1)
return lr, beta1
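The schedule anneals the learning rate from warmup_start_lr up to learning_rate over the warmup span, then down to decay_end_lr, with beta1 following the mirrored curve. A plain-NumPy sketch of the same curve, using made-up hyperparameters, handy for checking what the graph version should produce:
```
import numpy as np

def annealing_cos_np(start, end, pct):
    return (np.cos(pct * np.pi) + 1.) * (start - end) / 2. + end

lr, warmup_factor, decay_factor = 0.002, 0.1, 1e-5
total_step, warmup_pct = 1000, 0.3
warmup_step = total_step * warmup_pct
for step in [0, 150, 300, 1000]:
    if step < warmup_step:
        cur = annealing_cos_np(lr * warmup_factor, lr, step / warmup_step)
    else:
        cur = annealing_cos_np(lr, lr * decay_factor,
                               (step - warmup_step) / (total_step - warmup_step))
    print(step, cur)  # 0.0002 -> 0.002 -> 2e-08
```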
def optimize(loss,
learning_rate,
warmup_factor,
decay_factor,
total_step,
warmup_pct,
train_program,
startup_prog,
weight_decay,
clip_norm,
beta1=[0.95, 0.85],
beta2=0.99,
scheduler='cosine_warmup_decay'):
scheduled_lr = None
if scheduler == 'cosine_warmup_decay':
scheduled_lr, scheduled_beta1 = cosine_warmup_decay(learning_rate, beta1, warmup_factor,
decay_factor, total_step,
warmup_pct)
else:
raise ValueError("Unknown learning rate scheduler, should be "
"'cosine_warmup_decay'")
optimizer = fluid.optimizer.Adam(learning_rate=scheduled_lr,
beta1=scheduled_beta1,
beta2=beta2)
fluid.clip.set_gradient_clip(
clip=fluid.clip.GradientClipByGlobalNorm(clip_norm=clip_norm))
param_list = dict()
if weight_decay > 0:
for param in train_program.global_block().all_parameters():
param_list[param.name] = param * 1.0
param_list[param.name].stop_gradient = True
_, param_grads = optimizer.minimize(loss)
if weight_decay > 0:
for param, grad in param_grads:
with param.block.program._optimized_guard(
[param, grad]), fluid.framework.name_scope("weight_decay"):
updated_param = param - param_list[
param.name] * weight_decay * scheduled_lr
fluid.layers.assign(output=param, input=updated_param)
return scheduled_lr
import numpy as np
from utils.cyops import kitti_utils, roipool3d_utils, iou3d_utils
CLOSE_RANDOM = False  # set True to disable random sampling for deterministic debugging
def get_proposal_target_func(cfg, mode='TRAIN'):
def sample_rois_for_rcnn(roi_boxes3d, gt_boxes3d):
"""
:param roi_boxes3d: (B, M, 7)
:param gt_boxes3d: (B, N, 8) [x, y, z, h, w, l, ry, cls]
:return
batch_rois: (B, N, 7)
batch_gt_of_rois: (B, N, 8)
batch_roi_iou: (B, N)
"""
batch_size = roi_boxes3d.shape[0]
#batch_size = 1
fg_rois_per_image = int(np.round(cfg.RCNN.FG_RATIO * cfg.RCNN.ROI_PER_IMAGE))
batch_rois = np.zeros((batch_size, cfg.RCNN.ROI_PER_IMAGE, 7))
batch_gt_of_rois = np.zeros((batch_size, cfg.RCNN.ROI_PER_IMAGE, 7))
batch_roi_iou = np.zeros((batch_size, cfg.RCNN.ROI_PER_IMAGE))
for idx in range(batch_size):
cur_roi, cur_gt = roi_boxes3d[idx], gt_boxes3d[idx]
k = cur_gt.shape[0] - 1
while cur_gt[k].sum() == 0:
k -= 1
cur_gt = cur_gt[:k + 1]
# include gt boxes in the candidate rois
iou3d = iou3d_utils.boxes_iou3d(cur_roi, cur_gt[:, 0:7]) # (M, N)
max_overlaps = np.max(iou3d, axis=1)
gt_assignment = np.argmax(iou3d, axis=1)
# sample fg, easy_bg, hard_bg
fg_thresh = min(cfg.RCNN.REG_FG_THRESH, cfg.RCNN.CLS_FG_THRESH)
fg_inds = np.where(max_overlaps >= fg_thresh)[0].reshape(-1)
# TODO: this will mix the fg and bg when CLS_BG_THRESH_LO < iou < CLS_BG_THRESH
# fg_inds = torch.cat((fg_inds, roi_assignment), dim=0) # consider the roi which has max_iou with gt as fg
easy_bg_inds = np.where(max_overlaps < cfg.RCNN.CLS_BG_THRESH_LO)[0].reshape(-1)
hard_bg_inds = np.where((max_overlaps < cfg.RCNN.CLS_BG_THRESH) & (max_overlaps >= cfg.RCNN.CLS_BG_THRESH_LO))[0].reshape(-1)
fg_num_rois = fg_inds.shape[0]
bg_num_rois = hard_bg_inds.shape[0] + easy_bg_inds.shape[0]
if fg_num_rois > 0 and bg_num_rois > 0:
# sampling fg
fg_rois_per_this_image = min(fg_rois_per_image, fg_num_rois)
if CLOSE_RANDOM:
fg_inds = fg_inds[:fg_rois_per_this_image]
else:
rand_num = np.random.permutation(fg_num_rois)
fg_inds = fg_inds[rand_num[:fg_rois_per_this_image]]
# sampling bg
bg_rois_per_this_image = cfg.RCNN.ROI_PER_IMAGE - fg_rois_per_this_image
bg_inds = sample_bg_inds(hard_bg_inds, easy_bg_inds, bg_rois_per_this_image)
elif fg_num_rois > 0 and bg_num_rois == 0:
# sampling fg
rand_num = np.floor(np.random.rand(cfg.RCNN.ROI_PER_IMAGE) * fg_num_rois).astype(np.int64)
fg_inds = fg_inds[rand_num]
fg_rois_per_this_image = cfg.RCNN.ROI_PER_IMAGE
bg_rois_per_this_image = 0
elif bg_num_rois > 0 and fg_num_rois == 0:
# sampling bg
bg_rois_per_this_image = cfg.RCNN.ROI_PER_IMAGE
bg_inds = sample_bg_inds(hard_bg_inds, easy_bg_inds, bg_rois_per_this_image)
fg_rois_per_this_image = 0
else:
raise NotImplementedError("no foreground or background RoIs to sample")
# augment the rois by noise
roi_list, roi_iou_list, roi_gt_list = [], [], []
if fg_rois_per_this_image > 0:
fg_rois_src = cur_roi[fg_inds]
gt_of_fg_rois = cur_gt[gt_assignment[fg_inds]]
iou3d_src = max_overlaps[fg_inds]
fg_rois, fg_iou3d = aug_roi_by_noise(
fg_rois_src, gt_of_fg_rois, iou3d_src, aug_times=cfg.RCNN.ROI_FG_AUG_TIMES)
roi_list.append(fg_rois)
roi_iou_list.append(fg_iou3d)
roi_gt_list.append(gt_of_fg_rois)
if bg_rois_per_this_image > 0:
bg_rois_src = cur_roi[bg_inds]
gt_of_bg_rois = cur_gt[gt_assignment[bg_inds]]
iou3d_src = max_overlaps[bg_inds]
aug_times = 1 if cfg.RCNN.ROI_FG_AUG_TIMES > 0 else 0
bg_rois, bg_iou3d = aug_roi_by_noise(
bg_rois_src, gt_of_bg_rois, iou3d_src, aug_times=aug_times)
roi_list.append(bg_rois)
roi_iou_list.append(bg_iou3d)
roi_gt_list.append(gt_of_bg_rois)
rois = np.concatenate(roi_list, axis=0)
iou_of_rois = np.concatenate(roi_iou_list, axis=0)
gt_of_rois = np.concatenate(roi_gt_list, axis=0)
batch_rois[idx] = rois
batch_gt_of_rois[idx] = gt_of_rois
batch_roi_iou[idx] = iou_of_rois
return batch_rois, batch_gt_of_rois, batch_roi_iou
def sample_bg_inds(hard_bg_inds, easy_bg_inds, bg_rois_per_this_image):
if hard_bg_inds.shape[0] > 0 and easy_bg_inds.shape[0] > 0:
hard_bg_rois_num = int(bg_rois_per_this_image * cfg.RCNN.HARD_BG_RATIO)
easy_bg_rois_num = bg_rois_per_this_image - hard_bg_rois_num
# sampling hard bg
if CLOSE_RANDOM:
rand_idx = list(np.arange(0,hard_bg_inds.shape[0]))*hard_bg_rois_num
rand_idx = rand_idx[:hard_bg_rois_num]
else:
rand_idx = np.random.randint(low=0, high=hard_bg_inds.shape[0], size=(hard_bg_rois_num,))
hard_bg_inds = hard_bg_inds[rand_idx]
# sampling easy bg
if CLOSE_RANDOM:
rand_idx = list(np.arange(0,easy_bg_inds.shape[0]))*easy_bg_rois_num
rand_idx = rand_idx[:easy_bg_rois_num]
else:
rand_idx = np.random.randint(low=0, high=easy_bg_inds.shape[0], size=(easy_bg_rois_num,))
easy_bg_inds = easy_bg_inds[rand_idx]
bg_inds = np.concatenate([hard_bg_inds, easy_bg_inds], axis=0)
elif hard_bg_inds.shape[0] > 0 and easy_bg_inds.shape[0] == 0:
hard_bg_rois_num = bg_rois_per_this_image
# sampling hard bg
rand_idx = np.random.randint(low=0, high=hard_bg_inds.shape[0], size=(hard_bg_rois_num,))
bg_inds = hard_bg_inds[rand_idx]
elif hard_bg_inds.shape[0] == 0 and easy_bg_inds.shape[0] > 0:
easy_bg_rois_num = bg_rois_per_this_image
# sampling easy bg
rand_idx = np.random.randint(low=0, high=easy_bg_inds.shape[0], size=(easy_bg_rois_num,))
bg_inds = easy_bg_inds[rand_idx]
else:
raise NotImplementedError
return bg_inds
def aug_roi_by_noise(roi_boxes3d, gt_boxes3d, iou3d_src, aug_times=10):
iou_of_rois = np.zeros(roi_boxes3d.shape[0]).astype(gt_boxes3d.dtype)
pos_thresh = min(cfg.RCNN.REG_FG_THRESH, cfg.RCNN.CLS_FG_THRESH)
for k in range(roi_boxes3d.shape[0]):
temp_iou = cnt = 0
roi_box3d = roi_boxes3d[k]
gt_box3d = gt_boxes3d[k].reshape(1, 7)
aug_box3d = roi_box3d
keep = True
while temp_iou < pos_thresh and cnt < aug_times:
if True:  # np.random.rand() < 0.2 -- the random-jitter branch is disabled, so the original roi box is always kept
aug_box3d = roi_box3d
keep = True
else:
aug_box3d = random_aug_box3d(roi_box3d)
keep = False
aug_box3d = aug_box3d.reshape((1, 7))
iou3d = iou3d_utils.boxes_iou3d(aug_box3d, gt_box3d)
temp_iou = iou3d[0][0]
cnt += 1
roi_boxes3d[k] = aug_box3d.reshape(-1)
if cnt == 0 or keep:
iou_of_rois[k] = iou3d_src[k]
else:
iou_of_rois[k] = temp_iou
return roi_boxes3d, iou_of_rois
def random_aug_box3d(box3d):
"""
:param box3d: (7) [x, y, z, h, w, l, ry]
random shift, scale, orientation
"""
if cfg.RCNN.REG_AUG_METHOD == 'single':
pos_shift = (np.random.rand(3) - 0.5)  # [-0.5, 0.5]
hwl_scale = (np.random.rand(3) - 0.5) / (0.5 / 0.15) + 1.0  # [0.85, 1.15]
angle_rot = (np.random.rand(1) - 0.5) / (0.5 / (np.pi / 12))  # [-pi/12, pi/12]
aug_box3d = np.concatenate([box3d[0:3] + pos_shift, box3d[3:6] * hwl_scale, box3d[6:7] + angle_rot], axis=0)
return aug_box3d
elif cfg.RCNN.REG_AUG_METHOD == 'multiple':
# pos_range, hwl_range, angle_range, mean_iou
range_config = [[0.2, 0.1, np.pi / 12, 0.7],
[0.3, 0.15, np.pi / 12, 0.6],
[0.5, 0.15, np.pi / 9, 0.5],
[0.8, 0.15, np.pi / 6, 0.3],
[1.0, 0.15, np.pi / 3, 0.2]]
idx = np.random.randint(low=0, high=len(range_config), size=(1,))[0]
pos_shift = ((np.random.rand(3) - 0.5) / 0.5) * range_config[idx][0]
hwl_scale = ((np.random.rand(3) - 0.5) / 0.5) * range_config[idx][1] + 1.0
angle_rot = ((np.random.rand(1) - 0.5) / 0.5) * range_config[idx][2]
aug_box3d = np.concatenate([box3d[0:3] + pos_shift, box3d[3:6] * hwl_scale, box3d[6:7] + angle_rot], axis=0)
return aug_box3d
elif cfg.RCNN.REG_AUG_METHOD == 'normal':
x_shift = np.random.normal(loc=0, scale=0.3)
y_shift = np.random.normal(loc=0, scale=0.2)
z_shift = np.random.normal(loc=0, scale=0.3)
h_shift = np.random.normal(loc=0, scale=0.25)
w_shift = np.random.normal(loc=0, scale=0.15)
l_shift = np.random.normal(loc=0, scale=0.5)
ry_shift = ((np.random.rand() - 0.5) / 0.5) * np.pi / 12
aug_box3d = np.array([box3d[0] + x_shift, box3d[1] + y_shift, box3d[2] + z_shift, box3d[3] + h_shift,
box3d[4] + w_shift, box3d[5] + l_shift, box3d[6] + ry_shift], dtype=np.float32)
aug_box3d = aug_box3d.astype(box3d.dtype)
return aug_box3d
else:
raise NotImplementedError
def data_augmentation(pts, rois, gt_of_rois):
"""
:param pts: (B, M, 512, 3)
:param rois: (B, M, 7)
:param gt_of_rois: (B, M, 7)
:return:
"""
batch_size, boxes_num = pts.shape[0], pts.shape[1]
# rotation augmentation
angles = ((np.random.rand(batch_size, boxes_num) - 0.5) / 0.5) * (np.pi / cfg.AUG_ROT_RANGE)  # symmetric range [-pi/AUG_ROT_RANGE, pi/AUG_ROT_RANGE]
# calculate gt alpha from gt_of_rois
temp_x, temp_z, temp_ry = gt_of_rois[:, :, 0], gt_of_rois[:, :, 2], gt_of_rois[:, :, 6]
temp_beta = np.arctan2(temp_z, temp_x)
gt_alpha = -np.sign(temp_beta) * np.pi / 2 + temp_beta + temp_ry # (B, M)
temp_x, temp_z, temp_ry = rois[:, :, 0], rois[:, :, 2], rois[:, :, 6]
temp_beta = np.arctan2(temp_z, temp_x)
roi_alpha = -np.sign(temp_beta) * np.pi / 2 + temp_beta + temp_ry # (B, M)
for k in range(batch_size):
pts[k] = kitti_utils.rotate_pc_along_y_np(pts[k], angles[k])
gt_of_rois[k] = np.squeeze(kitti_utils.rotate_pc_along_y_np(
np.expand_dims(gt_of_rois[k], axis=1), angles[k]), axis=1)
rois[k] = np.squeeze(kitti_utils.rotate_pc_along_y_np(
np.expand_dims(rois[k], axis=1), angles[k]),axis=1)
# calculate the ry after rotation
temp_x, temp_z = gt_of_rois[:, :, 0], gt_of_rois[:, :, 2]
temp_beta = np.arctan2(temp_z, temp_x)
gt_of_rois[:, :, 6] = np.sign(temp_beta) * np.pi / 2 + gt_alpha - temp_beta
temp_x, temp_z = rois[:, :, 0], rois[:, :, 2]
temp_beta = np.arctan2(temp_z, temp_x)
rois[:, :, 6] = np.sign(temp_beta) * np.pi / 2 + roi_alpha - temp_beta
# scaling augmentation
scales = 1 + ((np.random.rand(batch_size, boxes_num) - 0.5) / 0.5) * 0.05
pts = pts * np.expand_dims(np.expand_dims(scales, axis=2), axis=3)
gt_of_rois[:, :, 0:6] = gt_of_rois[:, :, 0:6] * np.expand_dims(scales, axis=2)
rois[:, :, 0:6] = rois[:, :, 0:6] * np.expand_dims(scales, axis=2)
# flip augmentation
flip_flag = np.sign(np.random.rand(batch_size, boxes_num) - 0.5)
pts[:, :, :, 0] = pts[:, :, :, 0] * np.expand_dims(flip_flag, axis=2)
gt_of_rois[:, :, 0] = gt_of_rois[:, :, 0] * flip_flag
# flip orientation: ry > 0: pi - ry, ry < 0: -pi - ry
src_ry = gt_of_rois[:, :, 6]
ry = (flip_flag == 1).astype(np.float32) * src_ry + (flip_flag == -1).astype(np.float32) * (np.sign(src_ry) * np.pi - src_ry)
gt_of_rois[:, :, 6] = ry
rois[:, :, 0] = rois[:, :, 0] * flip_flag
# flip orientation: ry > 0: pi - ry, ry < 0: -pi - ry
src_ry = rois[:, :, 6]
ry = (flip_flag == 1).astype(np.float32) * src_ry + (flip_flag == -1).astype(np.float32) * (np.sign(src_ry) * np.pi - src_ry)
rois[:, :, 6] = ry
return pts, rois, gt_of_rois
def generate_proposal_target(seg_mask, rpn_features, gt_boxes3d, rpn_xyz, pts_depth, roi_boxes3d, rpn_intensity):
seg_mask = np.array(seg_mask)
features = np.array(rpn_features)
gt_boxes3d = np.array(gt_boxes3d)
rpn_xyz = np.array(rpn_xyz)
pts_depth = np.array(pts_depth)
roi_boxes3d = np.array(roi_boxes3d)
rpn_intensity = np.array(rpn_intensity)
batch_rois, batch_gt_of_rois, batch_roi_iou = sample_rois_for_rcnn(roi_boxes3d, gt_boxes3d)
if cfg.RCNN.USE_INTENSITY:
pts_extra_input_list = [np.expand_dims(rpn_intensity, axis=2),
np.expand_dims(seg_mask, axis=2)]
else:
pts_extra_input_list = [np.expand_dims(seg_mask, axis=2)]
if cfg.RCNN.USE_DEPTH:
pts_depth = pts_depth / 70.0 - 0.5
pts_extra_input_list.append(np.expand_dims(pts_depth, axis=2))
pts_extra_input = np.concatenate(pts_extra_input_list, axis=2)
# point cloud pooling
pts_feature = np.concatenate((pts_extra_input, rpn_features), axis=2)
batch_rois = batch_rois.astype(np.float32)
pooled_features, pooled_empty_flag = roipool3d_utils.roipool3d_gpu(
rpn_xyz, pts_feature, batch_rois, cfg.RCNN.POOL_EXTRA_WIDTH,
sampled_pt_num=cfg.RCNN.NUM_POINTS
)
sampled_pts, sampled_features = pooled_features[:, :, :, 0:3], pooled_features[:, :, :, 3:]
# data augmentation
if cfg.AUG_DATA:
# data augmentation
sampled_pts, batch_rois, batch_gt_of_rois = \
data_augmentation(sampled_pts, batch_rois, batch_gt_of_rois)
# canonical transformation
batch_size = batch_rois.shape[0]
roi_ry = batch_rois[:, :, 6] % (2 * np.pi)
roi_center = batch_rois[:, :, 0:3]
sampled_pts = sampled_pts - np.expand_dims(roi_center, axis=2) # (B, M, 512, 3)
batch_gt_of_rois[:, :, 0:3] = batch_gt_of_rois[:, :, 0:3] - roi_center
batch_gt_of_rois[:, :, 6] = batch_gt_of_rois[:, :, 6] - roi_ry
for k in range(batch_size):
sampled_pts[k] = kitti_utils.rotate_pc_along_y_np(sampled_pts[k], batch_rois[k, :, 6])
batch_gt_of_rois[k] = np.squeeze(kitti_utils.rotate_pc_along_y_np(
np.expand_dims(batch_gt_of_rois[k], axis=1), roi_ry[k]), axis=1)
# regression valid mask
valid_mask = (pooled_empty_flag == 0)
reg_valid_mask = ((batch_roi_iou > cfg.RCNN.REG_FG_THRESH) & valid_mask).astype(np.float32)
# classification label
batch_cls_label = (batch_roi_iou > cfg.RCNN.CLS_FG_THRESH).astype(np.int64)
invalid_mask = (batch_roi_iou > cfg.RCNN.CLS_BG_THRESH) & (batch_roi_iou < cfg.RCNN.CLS_FG_THRESH)
batch_cls_label[valid_mask == 0] = -1
batch_cls_label[invalid_mask > 0] = -1
output_dict = {'sampled_pts': sampled_pts.reshape(-1, cfg.RCNN.NUM_POINTS, 3).astype(np.float32),
'pts_feature': sampled_features.reshape(-1, cfg.RCNN.NUM_POINTS, sampled_features.shape[3]).astype(np.float32),
'cls_label': batch_cls_label.reshape(-1),
'reg_valid_mask': reg_valid_mask.reshape(-1).astype(np.float32),
'gt_of_rois': batch_gt_of_rois.reshape(-1, 7).astype(np.float32),
'gt_iou': batch_roi_iou.reshape(-1).astype(np.float32),
'roi_boxes3d': batch_rois.reshape(-1, 7).astype(np.float32)}
# dict preserves insertion order on Python 3.7+, so callers receive the
# outputs in the order inserted above
return output_dict.values()
return generate_proposal_target
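The canonical transformation above re-expresses each RoI's pooled points in the RoI's local frame: subtract the RoI center, then rotate by the RoI heading around y. A minimal sketch for a single RoI, following the same rotation convention as rotate_pc_along_y in this codebase (names are illustrative):
```
import numpy as np

def canonical_transform_np(pts, roi_box):
    # pts: (N, 3); roi_box: (7,) as [x, y, z, h, w, l, ry]
    ry = roi_box[6] % (2 * np.pi)
    local = pts - roi_box[0:3]                 # shift to the RoI center
    cosa, sina = np.cos(ry), np.sin(ry)
    R = np.array([[cosa, -sina], [sina, cosa]])
    local[:, [0, 2]] = local[:, [0, 2]] @ R.T  # rotate in the x-z plane
    return local
```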
if __name__ == "__main__":
input_dict = {}
input_dict['roi_boxes3d'] = np.load("models/rpn_data/roi_boxes3d.npy")
input_dict['gt_boxes3d'] = np.load("models/rpn_data/gt_boxes3d.npy")
input_dict['rpn_xyz'] = np.load("models/rpn_data/rpn_xyz.npy")
input_dict['rpn_features'] = np.load("models/rpn_data/rpn_features.npy")
input_dict['rpn_intensity'] = np.load("models/rpn_data/rpn_intensity.npy")
input_dict['seg_mask'] = np.load("models/rpn_data/seg_mask.npy")
input_dict['pts_depth'] = np.load("models/rpn_data/pts_depth.npy")
for k, v in input_dict.items():
print(k, v.shape, np.sum(np.abs(v)))
input_dict[k] = np.expand_dims(v, axis=0)
from utils.config import cfg
cfg.RPN.LOC_XZ_FINE = True
cfg.TEST.RPN_DISTANCE_BASED_PROPOSE = False
cfg.RPN.NMS_TYPE = 'rotate'
proposal_target_func = get_proposal_target_func(cfg)
out_dict = proposal_target_func(input_dict['seg_mask'],input_dict['rpn_features'],input_dict['gt_boxes3d'],
input_dict['rpn_xyz'],input_dict['pts_depth'],input_dict['roi_boxes3d'],input_dict['rpn_intensity'])
# generate_proposal_target returns a dict values view, so iterate by position
for i, v in enumerate(out_dict):
print("output {}: shape {}".format(i, v.shape))
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Contains proposal functions
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import paddle.fluid as fluid
import utils.box_utils as box_utils
from utils.config import cfg
__all__ = ["get_proposal_func"]
def get_proposal_func(cfg, mode='TRAIN'):
def decode_bbox_target(roi_box3d, pred_reg, anchor_size, loc_scope,
loc_bin_size, num_head_bin, get_xz_fine=True,
loc_y_scope=0.5, loc_y_bin_size=0.25,
get_y_by_bin=False, get_ry_fine=False):
per_loc_bin_num = int(loc_scope / loc_bin_size) * 2
loc_y_bin_num = int(loc_y_scope / loc_y_bin_size) * 2
# recover xz localization
x_bin_l, x_bin_r = 0, per_loc_bin_num
z_bin_l, z_bin_r = per_loc_bin_num, per_loc_bin_num * 2
start_offset = z_bin_r
x_bin = np.argmax(pred_reg[:, x_bin_l: x_bin_r], axis=1)
z_bin = np.argmax(pred_reg[:, z_bin_l: z_bin_r], axis=1)
pos_x = x_bin.astype('float32') * loc_bin_size + loc_bin_size / 2 - loc_scope
pos_z = z_bin.astype('float32') * loc_bin_size + loc_bin_size / 2 - loc_scope
if get_xz_fine:
x_res_l, x_res_r = per_loc_bin_num * 2, per_loc_bin_num * 3
z_res_l, z_res_r = per_loc_bin_num * 3, per_loc_bin_num * 4
start_offset = z_res_r
x_res_norm = pred_reg[:, x_res_l:x_res_r][np.arange(len(x_bin)), x_bin]
z_res_norm = pred_reg[:, z_res_l:z_res_r][np.arange(len(z_bin)), z_bin]
x_res = x_res_norm * loc_bin_size
z_res = z_res_norm * loc_bin_size
pos_x += x_res
pos_z += z_res
# recover y localization
if get_y_by_bin:
y_bin_l, y_bin_r = start_offset, start_offset + loc_y_bin_num
y_res_l, y_res_r = y_bin_r, y_bin_r + loc_y_bin_num
start_offset = y_res_r
y_bin = np.argmax(pred_reg[:, y_bin_l: y_bin_r], axis=1)
y_res_norm = pred_reg[:, y_res_l:y_res_r][np.arange(len(y_bin)), y_bin]
y_res = y_res_norm * loc_y_bin_size
pos_y = y_bin.astype('float32') * loc_y_bin_size + loc_y_bin_size / 2 - loc_y_scope + y_res
pos_y = pos_y + np.array(roi_box3d[:, 1]).reshape(-1)
else:
y_offset_l, y_offset_r = start_offset, start_offset + 1
start_offset = y_offset_r
pos_y = np.array(roi_box3d[:, 1]) + np.array(pred_reg[:, y_offset_l])
pos_y = pos_y.reshape(-1)
# recover ry rotation
ry_bin_l, ry_bin_r = start_offset, start_offset + num_head_bin
ry_res_l, ry_res_r = ry_bin_r, ry_bin_r + num_head_bin
ry_bin = np.argmax(pred_reg[:, ry_bin_l: ry_bin_r], axis=1)
ry_res_norm = pred_reg[:, ry_res_l:ry_res_r][np.arange(len(ry_bin)), ry_bin]
if get_ry_fine:
# divide pi/2 into several bins
angle_per_class = (np.pi / 2) / num_head_bin
ry_res = ry_res_norm * (angle_per_class / 2)
ry = (ry_bin.astype('float32') * angle_per_class + angle_per_class / 2) + ry_res - np.pi / 4
else:
angle_per_class = (2 * np.pi) / num_head_bin
ry_res = ry_res_norm * (angle_per_class / 2)
# bin_center is (0, 30, 60, 90, 120, ..., 270, 300, 330)
ry = np.fmod(ry_bin.astype('float32') * angle_per_class + ry_res, 2 * np.pi)
ry[ry > np.pi] -= 2 * np.pi
# recover size
size_res_l, size_res_r = ry_res_r, ry_res_r + 3
assert size_res_r == pred_reg.shape[1]
size_res_norm = pred_reg[:, size_res_l: size_res_r]
hwl = size_res_norm * anchor_size + anchor_size
def rotate_pc_along_y(pc, angle):
cosa = np.cos(angle).reshape(-1, 1)
sina = np.sin(angle).reshape(-1, 1)
R = np.concatenate([cosa, -sina, sina, cosa], axis=-1).reshape(-1, 2, 2)
pc_temp = pc[:, [0, 2]].reshape(-1, 1, 2)
pc[:, [0, 2]] = np.matmul(pc_temp, R.transpose(0, 2, 1)).reshape(-1, 2)
return pc
# shift to original coords
roi_center = np.array(roi_box3d[:, 0:3])
shift_ret_box3d = np.concatenate((
pos_x.reshape(-1, 1),
pos_y.reshape(-1, 1),
pos_z.reshape(-1, 1),
hwl, ry.reshape(-1, 1)), axis=1)
ret_box3d = shift_ret_box3d
if roi_box3d.shape[1] == 7:
roi_ry = np.array(roi_box3d[:, 6]).reshape(-1)
ret_box3d = rotate_pc_along_y(np.array(shift_ret_box3d), -roi_ry)
ret_box3d[:, 6] += roi_ry
ret_box3d[:, [0, 2]] += roi_center[:, [0, 2]]
return ret_box3d
def distance_based_proposal(scores, proposals, sorted_idxs):
nms_range_list = [0, 40.0, 80.0]
pre_tot_top_n = cfg[mode].RPN_PRE_NMS_TOP_N
pre_top_n_list = [0, int(pre_tot_top_n * 0.7), pre_tot_top_n - int(pre_tot_top_n * 0.7)]
post_tot_top_n = cfg[mode].RPN_POST_NMS_TOP_N
post_top_n_list = [0, int(post_tot_top_n * 0.7), post_tot_top_n - int(post_tot_top_n * 0.7)]
batch_size = scores.shape[0]
ret_proposals = np.zeros((batch_size, cfg[mode].RPN_POST_NMS_TOP_N, 7), dtype='float32')
ret_scores = np.zeros((batch_size, cfg[mode].RPN_POST_NMS_TOP_N, 1), dtype='float32')
for b, (score, proposal, sorted_idx) in enumerate(zip(scores, proposals, sorted_idxs)):
# sort by score
score_ord = score[sorted_idx]
proposal_ord = proposal[sorted_idx]
dist = proposal_ord[:, 2]
first_mask = (dist > nms_range_list[0]) & (dist <= nms_range_list[1])
scores_single_list, proposals_single_list = [], []
for i in range(1, len(nms_range_list)):
# get proposal distance mask
dist_mask = ((dist > nms_range_list[i - 1]) & (dist <= nms_range_list[i]))
if dist_mask.sum() != 0:
# this area has points, reduce by mask
cur_scores = score_ord[dist_mask]
cur_proposals = proposal_ord[dist_mask]
# fetch pre nms top K
cur_scores = cur_scores[:pre_top_n_list[i]]
cur_proposals = cur_proposals[:pre_top_n_list[i]]
else:
assert i == 2, '%d' % i
# this area doesn't have any points, so use rois of first area
cur_scores = score_ord[first_mask]
cur_proposals = proposal_ord[first_mask]
# fetch top K of first area
cur_scores = cur_scores[pre_top_n_list[i - 1]:][:pre_top_n_list[i]]
cur_proposals = cur_proposals[pre_top_n_list[i - 1]:][:pre_top_n_list[i]]
# oriented nms
boxes_bev = box_utils.boxes3d_to_bev(cur_proposals)
s_scores, s_proposals = box_utils.box_nms(
boxes_bev, cur_scores, cur_proposals,
cfg[mode].RPN_NMS_THRESH, post_top_n_list[i],
cfg.RPN.NMS_TYPE)
if len(s_scores) > 0:
scores_single_list.append(s_scores)
proposals_single_list.append(s_proposals)
scores_single = np.concatenate(scores_single_list, axis=0)
proposals_single = np.concatenate(proposals_single_list, axis=0)
prop_num = proposals_single.shape[0]
ret_scores[b, :prop_num, 0] = scores_single
ret_proposals[b, :prop_num] = proposals_single
# ret_proposals.tofile("proposal.data")
# ret_scores.tofile("score.data")
return np.concatenate([ret_proposals, ret_scores], axis=-1)
def score_based_proposal(scores, proposals, sorted_idxs):
batch_size = scores.shape[0]
ret_proposals = np.zeros((batch_size, cfg[mode].RPN_POST_NMS_TOP_N, 7), dtype='float32')
ret_scores = np.zeros((batch_size, cfg[mode].RPN_POST_NMS_TOP_N, 1), dtype='float32')
for b, (score, proposal, sorted_idx) in enumerate(zip(scores, proposals, sorted_idxs)):
# sort by score
score_ord = score[sorted_idx]
proposal_ord = proposal[sorted_idx]
# pre nms top K
cur_scores = score_ord[:cfg[mode].RPN_PRE_NMS_TOP_N]
cur_proposals = proposal_ord[:cfg[mode].RPN_PRE_NMS_TOP_N]
boxes_bev = box_utils.boxes3d_to_bev(cur_proposals)
s_scores, s_proposals = box_utils.box_nms(
boxes_bev, cur_scores, cur_proposals,
cfg[mode].RPN_NMS_THRESH,
cfg[mode].RPN_POST_NMS_TOP_N,
'rotate')
prop_num = len(s_proposals)
ret_scores[b, :prop_num, 0] = s_scores
ret_proposals[b, :prop_num] = s_proposals
# ret_proposals.tofile("proposal.data")
# ret_scores.tofile("score.data")
return np.concatenate([ret_proposals, ret_scores], axis=-1)
def generate_proposal(x):
rpn_scores = np.array(x[:, :, 0])[:, :, 0]
roi_box3d = x[:, :, 1:4]
pred_reg = x[:, :, 4:]
proposals = decode_bbox_target(
np.array(roi_box3d).reshape(-1, roi_box3d.shape()[-1]),
np.array(pred_reg).reshape(-1, pred_reg.shape()[-1]),
anchor_size=np.array(cfg.CLS_MEAN_SIZE[0], dtype='float32'),
loc_scope=cfg.RPN.LOC_SCOPE,
loc_bin_size=cfg.RPN.LOC_BIN_SIZE,
num_head_bin=cfg.RPN.NUM_HEAD_BIN,
get_xz_fine=cfg.RPN.LOC_XZ_FINE,
get_y_by_bin=False,
get_ry_fine=False)
proposals[:, 1] += proposals[:, 3] / 2  # shift y from box center to the bottom center (KITTI convention)
proposals = proposals.reshape(rpn_scores.shape[0], -1, proposals.shape[-1])
sorted_idxs = np.argsort(-rpn_scores, axis=-1)
if cfg.TEST.RPN_DISTANCE_BASED_PROPOSE:
ret = distance_based_proposal(rpn_scores, proposals, sorted_idxs)
else:
ret = score_based_proposal(rpn_scores, proposals, sorted_idxs)
return ret
return generate_proposal
if __name__ == "__main__":
np.random.seed(3333)
x_np = np.random.random((4, 256, 84)).astype('float32')
from config import cfg
cfg.RPN.LOC_XZ_FINE = True
# cfg.TEST.RPN_DISTANCE_BASED_PROPOSE = False
# cfg.RPN.NMS_TYPE = 'rotate'
proposal_func = get_proposal_func(cfg)
x = fluid.layers.data(name="x", shape=[256, 84], dtype='float32')
proposal = fluid.default_main_program().current_block().create_var(
name="proposal", dtype='float32', shape=[256, 7])
fluid.layers.py_func(proposal_func, x, proposal)
loss = fluid.layers.reduce_mean(proposal)
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
ret = exe.run(fetch_list=[proposal.name, loss.name], feed={'x': x_np})
print(ret)
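decode_bbox_target recovers each coordinate from a classified bin plus a regressed residual within that bin. The arithmetic for x alone, as a worked NumPy example (the bin scores and the residual are made up):
```
import numpy as np

loc_scope, loc_bin_size = 3.0, 0.5
per_loc_bin_num = int(loc_scope / loc_bin_size) * 2  # 12 bins covering [-3, 3)
x_bin_scores = np.zeros(per_loc_bin_num)
x_bin_scores[7] = 1.0                                # pretend bin 7 won
x_bin = np.argmax(x_bin_scores)
pos_x = x_bin * loc_bin_size + loc_bin_size / 2 - loc_scope  # bin center: 0.75
x_res_norm = 0.2                                     # regressed, in bin units
pos_x += x_res_norm * loc_bin_size                   # 0.75 + 0.1 = 0.85
print(pos_x)
```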
cmake_minimum_required(VERSION 2.8.12)
project(pts_utils)
add_subdirectory(pybind11)
pybind11_add_module(pts_utils pts_utils.cpp)
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <math.h>
namespace py = pybind11;
int pt_in_box3d(float x, float y, float z, float cx, float cy, float cz, float h, float w, float l, float cosa, float sina) {
if ((fabsf(x - cx) > 10.) || (fabsf(y - cy) > h / 2.0) || (fabsf(z - cz) > 10.)){
return 0;
}
float x_rot = (x - cx) * cosa + (z - cz) * (-sina);
float z_rot = (x - cx) * sina + (z - cz) * cosa;
int in_flag = static_cast<int>((x_rot >= -l / 2.0) & (x_rot <= l / 2.0) & (z_rot >= -w / 2.0) & (z_rot <= w / 2.0));
return in_flag;
}
py::array_t<int> pts_in_boxes3d(py::array_t<float> pts, py::array_t<float> boxes) {
py::buffer_info pts_buf = pts.request(), boxes_buf = boxes.request();
if (pts_buf.ndim != 2 || boxes_buf.ndim != 2) {
throw std::runtime_error("Number of dimensions must be 2");
}
if (pts_buf.shape[1] != 3) {
throw std::runtime_error("pts 2nd dimension must be 3");
}
if (boxes_buf.shape[1] != 7) {
throw std::runtime_error("boxes 2nd dimension must be 7");
}
auto pts_num = pts_buf.shape[0];
auto boxes_num = boxes_buf.shape[0];
auto mask = py::array_t<int>(pts_num * boxes_num);
py::buffer_info mask_buf = mask.request();
float *pts_ptr = (float *) pts_buf.ptr,
*boxes_ptr = (float *) boxes_buf.ptr;
int *mask_ptr = (int *) mask_buf.ptr;
for (ssize_t i = 0; i < boxes_num; i++) {
float cx = boxes_ptr[i * 7];
float cy = boxes_ptr[i * 7 + 1] - boxes_ptr[i * 7 + 3] / 2.;
float cz = boxes_ptr[i * 7 + 2];
float h = boxes_ptr[i * 7 + 3];
float w = boxes_ptr[i * 7 + 4];
float l = boxes_ptr[i * 7 + 5];
float angle = boxes_ptr[i * 7 + 6];
float cosa = cosf(angle);
float sina = sinf(angle);
for (ssize_t j = 0; j < pts_num; j++) {
mask_ptr[i * pts_num + j] = pt_in_box3d(pts_ptr[j * 3], pts_ptr[j * 3 + 1], pts_ptr[j * 3 + 2], cx, cy, cz, h, w, l, cosa, sina);
}
}
mask.resize({boxes_num, pts_num});
return mask;
}
PYBIND11_MODULE(pts_utils, m) {
m.def("pts_in_boxes3d", &pts_in_boxes3d, "Calculate mask for whether points in boxes3d");
}
from setuptools import setup
from setuptools import Extension
setup(
name='pts_utils',
ext_modules = [Extension(
name='pts_utils',
sources=['pts_utils.cpp'],
include_dirs=[r'../../pybind11/include'],
extra_compile_args=['-std=c++11']
)],
)
import numpy as np
import pts_utils
a = np.random.random((16384, 3)).astype('float32')
b = np.random.random((64, 7)).astype('float32')
c = pts_utils.pts_in_boxes3d(a, b)
print(a, b, c, c.shape, np.sum(c))
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Contains common utility functions.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import six
import logging
import numpy as np
import paddle.fluid as fluid
__all__ = ["check_gpu", "print_arguments", "parse_outputs", "Stat"]
logger = logging.getLogger(__name__)
def check_gpu(use_gpu):
"""
Log error and exit when set use_gpu=True in paddlepaddle
cpu version.
"""
err = "Config use_gpu cannot be set as True while you are " \
"using paddlepaddle cpu version ! \nPlease try: \n" \
"\t1. Install paddlepaddle-gpu to run model on GPU \n" \
"\t2. Set --use_gpu=False to run model on CPU"
try:
if use_gpu and not fluid.is_compiled_with_cuda():
logger.error(err)
sys.exit(1)
except Exception:
pass
def print_arguments(args):
"""Print argparse's arguments.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
parser.add_argument("name", default="Jonh", type=str, help="User name.")
args = parser.parse_args()
print_arguments(args)
:param args: Input argparse.Namespace for printing.
:type args: argparse.Namespace
"""
logger.info("----------- Configuration Arguments -----------")
for arg, value in sorted(six.iteritems(vars(args))):
logger.info("%s: %s" % (arg, value))
logger.info("------------------------------------------------")
def parse_outputs(outputs, filter_key=None, extra_keys=None, prog=None):
keys, values = [], []
for k, v in outputs.items():
if filter_key is not None and k.find(filter_key) < 0:
continue
keys.append(k)
v.persistable = True
values.append(v.name)
if prog is not None and extra_keys is not None:
for k in extra_keys:
try:
v = fluid.framework._get_var(k, prog)
keys.append(k)
v.persistable = True
values.append(v.name)
except Exception:
pass
return keys, values
class Stat(object):
def __init__(self):
self.stats = {}
def update(self, keys, values):
for k, v in zip(keys, values):
if k not in self.stats:
self.stats[k] = []
self.stats[k].append(v)
def reset(self):
self.stats = {}
def get_mean_log(self):
log = ""
for k, v in self.stats.items():
log += "avg_{}: {:.4f}, ".format(k, np.mean(v))
return log
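Stat accumulates fetched values per key and reports running means; a short usage sketch:
```
stat = Stat()
stat.update(['loss', 'rpn_iou'], [0.8, 0.5])
stat.update(['loss', 'rpn_iou'], [0.6, 0.7])
print(stat.get_mean_log())  # avg_loss: 0.7000, avg_rpn_iou: 0.6000,
stat.reset()
```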
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import numpy as np
from utils.config import cfg
from utils import calibration as calib
import utils.cyops.kitti_utils as kitti_utils
__all__ = ['save_rpn_feature', 'save_kitti_result', 'save_kitti_format']
def save_rpn_feature(rets, kitti_features_dir):
"""
save rpn features for RCNN offline training
"""
sample_id = rets['sample_id'][0]
backbone_xyz = rets['backbone_xyz'][0]
backbone_feature = rets['backbone_feature'][0]
pts_features = rets['pts_features'][0]
seg_mask = rets['seg_mask'][0]
rpn_cls = rets['rpn_cls'][0]
for i in range(len(sample_id)):
pts_intensity = pts_features[i, :, 0]
s_id = sample_id[i, 0]
output_file = os.path.join(kitti_features_dir, '%06d.npy' % s_id)
xyz_file = os.path.join(kitti_features_dir, '%06d_xyz.npy' % s_id)
seg_file = os.path.join(kitti_features_dir, '%06d_seg.npy' % s_id)
intensity_file = os.path.join(
kitti_features_dir, '%06d_intensity.npy' % s_id)
np.save(output_file, backbone_feature[i])
np.save(xyz_file, backbone_xyz[i])
np.save(seg_file, seg_mask[i])
np.save(intensity_file, pts_intensity)
rpn_scores_raw_file = os.path.join(
kitti_features_dir, '%06d_rawscore.npy' % s_id)
np.save(rpn_scores_raw_file, rpn_cls[i])
def save_kitti_result(rets, seg_output_dir, kitti_output_dir, reader, classes):
sample_id = rets['sample_id'][0]
roi_scores_row = rets['roi_scores_row'][0]
bboxes3d = rets['rois'][0]
pts_rect = rets['pts_rect'][0]
seg_mask = rets['seg_mask'][0]
rpn_cls_label = rets['rpn_cls_label'][0]
gt_boxes3d = rets['gt_boxes3d'][0]
gt_boxes3d_num = rets['gt_boxes3d'][1]
for i in range(len(sample_id)):
s_id = sample_id[i, 0]
seg_result_data = np.concatenate((pts_rect[i].reshape(-1, 3),
rpn_cls_label[i].reshape(-1, 1),
seg_mask[i].reshape(-1, 1)),
axis=1).astype('float16')
seg_output_file = os.path.join(seg_output_dir, '%06d.npy' % s_id)
np.save(seg_output_file, seg_result_data)
scores = roi_scores_row[i, :]
bbox3d = bboxes3d[i, :]
img_shape = reader.get_image_shape(s_id)
calib = reader.get_calib(s_id)
corners3d = kitti_utils.boxes3d_to_corners3d(bbox3d)
img_boxes, _ = calib.corners3d_to_img_boxes(corners3d)
img_boxes[:, 0] = np.clip(img_boxes[:, 0], 0, img_shape[1] - 1)
img_boxes[:, 1] = np.clip(img_boxes[:, 1], 0, img_shape[0] - 1)
img_boxes[:, 2] = np.clip(img_boxes[:, 2], 0, img_shape[1] - 1)
img_boxes[:, 3] = np.clip(img_boxes[:, 3], 0, img_shape[0] - 1)
img_boxes_w = img_boxes[:, 2] - img_boxes[:, 0]
img_boxes_h = img_boxes[:, 3] - img_boxes[:, 1]
box_valid_mask = np.logical_and(
img_boxes_w < img_shape[1] * 0.8, img_boxes_h < img_shape[0] * 0.8)
kitti_output_file = os.path.join(kitti_output_dir, '%06d.txt' % s_id)
with open(kitti_output_file, 'w') as f:
for k in range(bbox3d.shape[0]):
if box_valid_mask[k] == 0:
continue
x, z, ry = bbox3d[k, 0], bbox3d[k, 2], bbox3d[k, 6]
beta = np.arctan2(z, x)
alpha = -np.sign(beta) * np.pi / 2 + beta + ry
f.write('{} -1 -1 {:.4f} {:.4f} {:.4f} {:.4f} {:.4f} {:.4f} {:.4f} {:.4f} {:.4f} {:.4f} {:.4f} {:.4f} {:.4f}\n'.format(
classes, alpha, img_boxes[k, 0], img_boxes[k, 1], img_boxes[k, 2], img_boxes[k, 3],
bbox3d[k, 3], bbox3d[k, 4], bbox3d[k, 5], bbox3d[k, 0], bbox3d[k, 1], bbox3d[k, 2],
bbox3d[k, 6], scores[k]))
def save_kitti_format(sample_id, calib, bbox3d, kitti_output_dir, scores, img_shape):
corners3d = kitti_utils.boxes3d_to_corners3d(bbox3d)
img_boxes, _ = calib.corners3d_to_img_boxes(corners3d)
img_boxes[:, 0] = np.clip(img_boxes[:, 0], 0, img_shape[1] - 1)
img_boxes[:, 1] = np.clip(img_boxes[:, 1], 0, img_shape[0] - 1)
img_boxes[:, 2] = np.clip(img_boxes[:, 2], 0, img_shape[1] - 1)
img_boxes[:, 3] = np.clip(img_boxes[:, 3], 0, img_shape[0] - 1)
img_boxes_w = img_boxes[:, 2] - img_boxes[:, 0]
img_boxes_h = img_boxes[:, 3] - img_boxes[:, 1]
box_valid_mask = np.logical_and(img_boxes_w < img_shape[1] * 0.8, img_boxes_h < img_shape[0] * 0.8)
kitti_output_file = os.path.join(kitti_output_dir, '%06d.txt' % sample_id)
with open(kitti_output_file, 'w') as f:
for k in range(bbox3d.shape[0]):
if box_valid_mask[k] == 0:
continue
x, z, ry = bbox3d[k, 0], bbox3d[k, 2], bbox3d[k, 6]
beta = np.arctan2(z, x)
alpha = -np.sign(beta) * np.pi / 2 + beta + ry
f.write('%s -1 -1 %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f %.4f\n' %
(cfg.CLASSES, alpha, img_boxes[k, 0], img_boxes[k, 1], img_boxes[k, 2], img_boxes[k, 3],
bbox3d[k, 3], bbox3d[k, 4], bbox3d[k, 5], bbox3d[k, 0], bbox3d[k, 1], bbox3d[k, 2],
bbox3d[k, 6], scores[k]))
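Both writers convert the heading ry into the KITTI observation angle alpha through beta = arctan2(z, x). A worked check of that formula with illustrative numbers:
```
import numpy as np

x, z, ry = 10.0, 10.0, np.pi / 2
beta = np.arctan2(z, x)  # pi/4, the viewing angle toward the box
alpha = -np.sign(beta) * np.pi / 2 + beta + ry
print(alpha)  # pi/4 ~= 0.7854
```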