Commit 8f6106cb authored by littletomatodonkey, committed by GitHub

add industrial code and doc (#5493)

* add industrial code and doc
Parent 1df0f648
# Industrial-Grade Model Development Tutorial
PaddlePaddle is an open-source deep learning platform born from industrial practice, dedicated to making the innovation and application of deep learning easier. Developing an industrial-grade model mainly consists of the following three steps.
<div align="center">
<img src="images/intrstrial_sota_model_pipeline.png" width = "800" />
</div>
Specifically:
* For the paper reproduction process and methodology, please refer to the [Paper Reproduction Guide](./article-implementation/ArticleReproduction_CV.md)
* For methods of optimizing model speed and accuracy, please refer to the [Industrial SOTA Model Optimization Guide](./pp-series/README.md)
* For developing and testing the full training-to-inference (TIPC) pipeline, please refer to the [TIPC Development Documentation](./tipc/README.md)
# [Deep High-Resolution Representation Learning for Human Pose Estimation (CVPR 2019)](https://arxiv.org/abs/1902.09212)
## 1 Introduction
This is the PaddlePaddle implementation of [Deep High-Resolution Representation Learning for Human Pose Estimation](https://arxiv.org/abs/1902.09212).
In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset.
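To make the core idea concrete, below is a minimal, illustrative sketch of two parallel resolution branches with repeated multi-scale fusion. This is not the HRNet implementation in this repo; the layer names and channel sizes are invented for the example.
```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class ToyFusionStage(nn.Layer):
    """One stage: a high-res and a low-res branch, followed by mutual fusion."""
    def __init__(self, ch_high=32, ch_low=64):
        super().__init__()
        self.high = nn.Conv2D(ch_high, ch_high, 3, padding=1)  # keeps resolution
        self.low = nn.Conv2D(ch_low, ch_low, 3, padding=1)
        self.high2low = nn.Conv2D(ch_high, ch_low, 3, stride=2, padding=1)  # downsample
        self.low2high = nn.Conv2D(ch_low, ch_high, 1)  # match channels, then upsample

    def forward(self, x_high, x_low):
        h, l = F.relu(self.high(x_high)), F.relu(self.low(x_low))
        # fusion: each branch receives information from the other at every stage
        l = l + self.high2low(h)
        h = h + F.interpolate(self.low2high(l), size=h.shape[2:], mode='bilinear')
        return h, l

x_high, x_low = paddle.randn([1, 32, 64, 48]), paddle.randn([1, 64, 32, 24])
stage = ToyFusionStage()
for _ in range(3):  # repeated fusion; the high-resolution path is never discarded
    x_high, x_low = stage(x_high, x_low)
print(x_high.shape)  # [1, 32, 64, 48]: high-resolution representation maintained
```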
## 2 How to use
### 2.1 Environment
### Requirements:
- PaddlePaddle 2.2
- 64-bit OS
- Python 3 (3.5.1+/3.6/3.7/3.8/3.9), 64 bit
- pip/pip3 (9.0.1+), 64 bit
- CUDA >= 10.1
- cuDNN >= 7.6
### Installation
#### 1. Install PaddlePaddle
```bash
# CUDA10.1
python -m pip install paddlepaddle-gpu==2.2.0.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```
- For quick installation with other CUDA versions or environments, please refer to the [PaddlePaddle Quick Installation document](https://www.paddlepaddle.org.cn/install/quick)
- For other installation methods, such as conda or compiling from source, please refer to the [installation document](https://www.paddlepaddle.org.cn/documentation/docs/en/install/index_en.html)
Please make sure that your PaddlePaddle is installed successfully and the version is not lower than the required version. Use the following command to verify.
```bash
# check that paddle is installed correctly
python -c "import paddle; paddle.utils.run_check()"
# confirm the installed paddle version
python -c "import paddle; print(paddle.__version__)"
```
**Note**
1. If you want to run PaddleDetection on multiple GPUs, please install NCCL first.
#### 2. Clone this repo. We'll refer to the cloned directory as ${POSE_ROOT}.
#### 3. Install dependencies:
```bash
pip install -r requirements.txt
```
#### 4. Initialize the output (trained model output) and log (TensorBoard log) directories:
```bash
mkdir output
mkdir log
```
Your directory tree should look like this:
```
${POSE_ROOT}
├── config
├── dataset
├── figures
├── lib
├── log
├── output
├── tools
├── README.md
└── requirements.txt
```
### 2.2 Data preparation
#### COCO Data Download
- The COCO dataset can be downloaded automatically by the script below. The dataset is large, so the download may take a while.
```bash
# automatically download the coco dataset
python dataset/download_coco.py
```
After the script finishes, the COCO dataset files are organized as follows:
```
>>cd dataset
>>tree
├── annotations
│ ├── instances_train2017.json
│ ├── instances_val2017.json
│ | ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
│ | ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
│ | ...
| ...
```
- If you have already downloaded the COCO dataset, organize the files according to the structure above.
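Before launching training, you can sanity-check the layout with a small convenience script (not part of the repo), assuming the default `dataset/coco` location and the `person_keypoints_*` annotation files used by the configs:
```python
from pathlib import Path

coco = Path("dataset/coco")
expected = [
    "annotations/person_keypoints_train2017.json",  # used by TrainDataset
    "annotations/person_keypoints_val2017.json",    # used by EvalDataset
    "train2017",
    "val2017",
]
for rel in expected:
    path = coco / rel
    status = "ok" if path.exists() else "MISSING"
    print(f"{status:8s} {path}")
```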
### 2.3 Training & Evaluation & Inference
We provide scripts for training, evaluation, and inference, with various features controlled by the configuration files.
```bash
# training on single-GPU
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/hrnet_w32_256x192.yml
# training on multi-GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/hrnet_w32_256x192.yml
# GPU evaluation
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/hrnet_w32_256x192.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams
# Inference
python tools/infer.py -c configs/hrnet_w32_256x192.yml --infer_img=dataset/test_image/hrnet_demo.jpg -o weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams
# training with distillation
python tools/train.py -c configs/lite_hrnet_30_256x192_coco.yml --distill_config=./configs/hrnet_w32_256x192_teacher.yml
# training with PACT quantization on single-GPU
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/lite_hrnet_30_256x192_coco_pact.yml
# training with PACT quantization on multi-GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/lite_hrnet_30_256x192_coco_pact.yml
# GPU evaluation with PACT quantization
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/lite_hrnet_30_256x192_coco_pact.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/lite_hrnet_30_256x192_coco_pact.pdparams
# Inference with PACT quantization
python tools/infer.py -c configs/lite_hrnet_30_256x192_coco_pact.yml --infer_img=dataset/test_image/hrnet_demo.jpg -o weights=https://paddledet.bj.bcebos.com/models/keypoint/lite_hrnet_30_256x192_coco_pact.pdparams
```
## 3 Results
COCO Dataset
| Model | Input Size | AP(coco val) | Model Download | Config File |
| :---------------- | -------- | :----------: | :----------------------------------------------------------: | ----------------------------------------------------------- |
| HRNet-w32 | 256x192 | 76.9 | [hrnet_w32_256x192.pdparams](https://paddle-model-ecology.bj.bcebos.com/model/hrnet_pose/hrnet_w32_256x192.pdparams) | [config](./configs/hrnet_w32_256x192.yml) |
| LiteHRNet-30 | 256x192 | 69.4 | [lite_hrnet_30_256x192_coco.pdparams](https://paddle-model-ecology.bj.bcebos.com/model/hrnet_pose/lite_hrnet_30_256x192_coco.pdparams) | [config](./configs/lite_hrnet_30_256x192_coco.yml) |
| LiteHRNet-30-PACT | 256x192 | 68.9 | [lite_hrnet_30_256x192_coco_pact.pdparams](https://paddle-model-ecology.bj.bcebos.com/model/hrnet_pose/lite_hrnet_30_256x192_coco_pact.pdparams) | [config](./configs/lite_hrnet_30_256x192_coco_pact.yml) |
| LiteHRNet-30-Distillation | 256x192 | 69.9 | [lite_hrnet_30_256x192_coco_dist.pdparams](https://paddle-model-ecology.bj.bcebos.com/model/hrnet_pose/lite_hrnet_30_256x192_coco_dist.pdparams) | [config](./configs/lite_hrnet_30_256x192_coco_pact.yml) |
![](/dataset/test_image/hrnet_demo.jpg)
![](/deploy/output/hrnet_demo_vis.jpg)
## Citation
```
@inproceedings{sun2019deep,
  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
  author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang},
  booktitle={CVPR},
  year={2019}
}
```
use_gpu: true
log_iter: 5
save_dir: output
snapshot_epoch: 10
weights: output/hrnet_w32_256x192/model_final
epoch: 210
num_joints: &num_joints 17
pixel_std: &pixel_std 200
metric: KeyPointTopDownCOCOEval
num_classes: 1
train_height: &train_height 256
train_width: &train_width 192
trainsize: &trainsize [*train_width, *train_height]
hmsize: &hmsize [48, 64]
flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]
#####model
architecture: TopDownHRNet
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/Trunc_HRNet_W32_C_pretrained.pdparams
TopDownHRNet:
backbone: HRNet
post_process: HRNetPostProcess
flip_perm: *flip_perm
num_joints: *num_joints
width: &width 32
loss: KeyPointMSELoss
use_dark: False
HRNet:
width: *width
freeze_at: -1
freeze_norm: false
return_idx: [0]
KeyPointMSELoss:
use_target_weight: true
#####optimizer
LearningRate:
base_lr: 0.0005
schedulers:
- !PiecewiseDecay
milestones: [170, 200]
gamma: 0.1
- !LinearWarmup
start_factor: 0.001
steps: 1000
OptimizerBuilder:
optimizer:
type: Adam
regularizer:
factor: 0.0
type: L2
#####data
TrainDataset:
!KeypointTopDownCocoDataset
image_dir: train2017
anno_path: annotations/person_keypoints_train2017.json
dataset_dir: dataset/coco
num_joints: *num_joints
trainsize: *trainsize
pixel_std: *pixel_std
use_gt_bbox: True
EvalDataset:
!KeypointTopDownCocoDataset
image_dir: val2017
anno_path: annotations/person_keypoints_val2017.json
dataset_dir: dataset/coco
bbox_file: bbox.json
num_joints: *num_joints
trainsize: *trainsize
pixel_std: *pixel_std
use_gt_bbox: True
image_thre: 0.0
TestDataset:
!ImageFolder
anno_path: dataset/coco/keypoint_imagelist.txt
worker_num: 2
global_mean: &global_mean [0.485, 0.456, 0.406]
global_std: &global_std [0.229, 0.224, 0.225]
TrainReader:
sample_transforms:
- RandomFlipHalfBodyTransform:
scale: 0.5
rot: 40
num_joints_half_body: 8
prob_half_body: 0.3
pixel_std: *pixel_std
trainsize: *trainsize
upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
flip_pairs: *flip_perm
- TopDownAffine:
trainsize: *trainsize
- ToHeatmapsTopDown:
hmsize: *hmsize
sigma: 2
batch_transforms:
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 64
shuffle: true
drop_last: false
EvalReader:
sample_transforms:
- TopDownAffine:
trainsize: *trainsize
batch_transforms:
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 16
TestReader:
inputs_def:
image_shape: [3, *train_height, *train_width]
sample_transforms:
- Decode: {}
- TopDownEvalAffine:
trainsize: *trainsize
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 1
fuse_normalize: false # whether to fuse the normalize layer into the model when exporting
pretrain_weights:
weights: "https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams"
num_joints: &num_joints 17
pixel_std: &pixel_std 200
metric: KeyPointTopDownCOCOEval
train_height: &train_height 256
train_width: &train_width 192
trainsize: &trainsize [*train_width, *train_height]
hmsize: &hmsize [48, 64]
flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]
# distillation config and loss
freeze_parameters: True
distill_loss:
name: DistMSELoss
weight: 1.0
key: output
# model
architecture: TopDownHRNet
TopDownHRNet:
backbone: HRNet
post_process: HRNetPostProcess
flip_perm: *flip_perm
num_joints: *num_joints
width: &width 32
loss: KeyPointMSELoss
use_dark: False
HRNet:
width: *width
freeze_at: -1
freeze_norm: false
return_idx: [0]
KeyPointMSELoss:
use_target_weight: true
use_gpu: true
log_iter: 5
save_dir: output
snapshot_epoch: 10
weights: output/lite_hrnet_30_256x192_coco/model_final
epoch: 210
num_joints: &num_joints 17
pixel_std: &pixel_std 200
metric: KeyPointTopDownCOCOEval
num_classes: 1
train_height: &train_height 256
train_width: &train_width 192
trainsize: &trainsize [*train_width, *train_height]
hmsize: &hmsize [48, 64]
flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]
#####model
architecture: TopDownHRNet
TopDownHRNet:
backbone: LiteHRNet
post_process: HRNetPostProcess
flip_perm: *flip_perm
num_joints: *num_joints
width: &width 40
loss: KeyPointMSELoss
use_dark: false
LiteHRNet:
network_type: lite_30
freeze_at: -1
freeze_norm: false
return_idx: [0]
KeyPointMSELoss:
use_target_weight: true
loss_scale: 1.0
#####optimizer
LearningRate:
base_lr: 0.002
schedulers:
- !PiecewiseDecay
milestones: [170, 200]
gamma: 0.1
- !LinearWarmup
start_factor: 0.001
steps: 500
OptimizerBuilder:
optimizer:
type: Adam
regularizer:
factor: 0.0
type: L2
#####data
TrainDataset:
!KeypointTopDownCocoDataset
image_dir: train2017
anno_path: annotations/person_keypoints_train2017.json
dataset_dir: dataset/coco
num_joints: *num_joints
trainsize: *trainsize
pixel_std: *pixel_std
use_gt_bbox: True
EvalDataset:
!KeypointTopDownCocoDataset
image_dir: val2017
anno_path: annotations/person_keypoints_val2017.json
dataset_dir: dataset/coco
num_joints: *num_joints
trainsize: *trainsize
pixel_std: *pixel_std
use_gt_bbox: True
image_thre: 0.0
TestDataset:
!ImageFolder
anno_path: dataset/coco/keypoint_imagelist.txt
worker_num: 4
global_mean: &global_mean [0.485, 0.456, 0.406]
global_std: &global_std [0.229, 0.224, 0.225]
TrainReader:
sample_transforms:
- RandomFlipHalfBodyTransform:
scale: 0.25
rot: 30
num_joints_half_body: 8
prob_half_body: 0.3
pixel_std: *pixel_std
trainsize: *trainsize
upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
flip_pairs: *flip_perm
- TopDownAffine:
trainsize: *trainsize
- ToHeatmapsTopDown:
hmsize: *hmsize
sigma: 2
batch_transforms:
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 64
shuffle: true
drop_last: false
EvalReader:
sample_transforms:
- TopDownAffine:
trainsize: *trainsize
batch_transforms:
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 16
TestReader:
inputs_def:
image_shape: [3, *train_height, *train_width]
sample_transforms:
- Decode: {}
- TopDownEvalAffine:
trainsize: *trainsize
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 1
use_gpu: true
log_iter: 5
save_dir: output
snapshot_epoch: 10
weights: output/lite_hrnet_30_256x192_coco/model_final
epoch: 50
num_joints: &num_joints 17
pixel_std: &pixel_std 200
metric: KeyPointTopDownCOCOEval
num_classes: 1
train_height: &train_height 256
train_width: &train_width 192
trainsize: &trainsize [*train_width, *train_height]
hmsize: &hmsize [48, 64]
flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_256x192_coco.pdparams
slim: QAT
QAT:
quant_config: {
'activation_preprocess_type': 'PACT',
'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max',
'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9,
'quantizable_layer_type': ['Conv2D', 'Linear']}
print_model: True
architecture: TopDownHRNet
TopDownHRNet:
backbone: LiteHRNet
post_process: HRNetPostProcess
flip_perm: *flip_perm
num_joints: *num_joints
width: &width 40
loss: KeyPointMSELoss
use_dark: false
LiteHRNet:
network_type: lite_30
freeze_at: -1
freeze_norm: false
return_idx: [0]
KeyPointMSELoss:
use_target_weight: true
loss_scale: 1.0
# optimizer
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
milestones: [40, 45]
gamma: 0.1
- !LinearWarmup
start_factor: 0.001
steps: 500
OptimizerBuilder:
optimizer:
type: Adam
regularizer:
factor: 0.0
type: L2
#####data
TrainDataset:
!KeypointTopDownCocoDataset
image_dir: train2017
anno_path: annotations/person_keypoints_train2017.json
dataset_dir: dataset/coco
num_joints: *num_joints
trainsize: *trainsize
pixel_std: *pixel_std
use_gt_bbox: True
EvalDataset:
!KeypointTopDownCocoDataset
image_dir: val2017
anno_path: annotations/person_keypoints_val2017.json
dataset_dir: dataset/coco
num_joints: *num_joints
trainsize: *trainsize
pixel_std: *pixel_std
use_gt_bbox: True
image_thre: 0.0
TestDataset:
!ImageFolder
anno_path: dataset/coco/keypoint_imagelist.txt
worker_num: 4
global_mean: &global_mean [0.485, 0.456, 0.406]
global_std: &global_std [0.229, 0.224, 0.225]
TrainReader:
sample_transforms:
- RandomFlipHalfBodyTransform:
scale: 0.25
rot: 30
num_joints_half_body: 8
prob_half_body: 0.3
pixel_std: *pixel_std
trainsize: *trainsize
upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
flip_pairs: *flip_perm
- TopDownAffine:
trainsize: *trainsize
- ToHeatmapsTopDown:
hmsize: *hmsize
sigma: 2
batch_transforms:
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 64
shuffle: true
drop_last: false
EvalReader:
sample_transforms:
- TopDownAffine:
trainsize: *trainsize
batch_transforms:
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 16
TestReader:
inputs_def:
image_shape: [3, *train_height, *train_width]
sample_transforms:
- Decode: {}
- TopDownEvalAffine:
trainsize: *trainsize
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 1
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os.path as osp
import logging
# add the python path of PaddleDetection to sys.path
parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3)))
if parent_path not in sys.path:
sys.path.append(parent_path)
from ppdet.utils.download import download_dataset
logging.basicConfig(level=logging.INFO)
download_path = osp.split(osp.realpath(sys.argv[0]))[0]
download_dataset(download_path, 'coco')
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import logging
import paddle
import paddle.inference as paddle_infer
from pathlib import Path
CUR_DIR = os.path.dirname(os.path.abspath(__file__))
LOG_PATH_ROOT = f"{CUR_DIR}/../../output"
class PaddleInferBenchmark(object):
def __init__(self,
config,
model_info: dict={},
data_info: dict={},
perf_info: dict={},
resource_info: dict={},
**kwargs):
"""
Construct PaddleInferBenchmark Class to format logs.
args:
config(paddle.inference.Config): paddle inference config
        model_info(dict): basic model info
            {'model_name': 'resnet50',
             'precision': 'fp32'}
        data_info(dict): input data info
            {'batch_size': 1,
             'shape': '3,224,224',
             'data_num': 1000}
        perf_info(dict): performance result
            {'preprocess_time_s': 1.0,
             'inference_time_s': 2.0,
             'postprocess_time_s': 1.0,
             'total_time_s': 4.0}
        resource_info(dict): cpu and gpu resources
            {'cpu_rss': 100,
             'gpu_rss': 100,
             'gpu_util': 60}
"""
# PaddleInferBenchmark Log Version
self.log_version = "1.0.3"
# Paddle Version
self.paddle_version = paddle.__version__
self.paddle_commit = paddle.__git_commit__
paddle_infer_info = paddle_infer.get_version()
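        # get_version() returns a multi-line version string; the text after the
        # last ': ' is taken as the branch name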
self.paddle_branch = paddle_infer_info.strip().split(': ')[-1]
# model info
self.model_info = model_info
# data info
self.data_info = data_info
# perf info
self.perf_info = perf_info
try:
# required value
self.model_name = model_info['model_name']
self.precision = model_info['precision']
self.batch_size = data_info['batch_size']
self.shape = data_info['shape']
self.data_num = data_info['data_num']
self.inference_time_s = round(perf_info['inference_time_s'], 4)
        except KeyError:
            self.print_help()
            raise ValueError(
                "Wrong arguments, please check the input arguments and their types")
self.preprocess_time_s = perf_info.get('preprocess_time_s', 0)
self.postprocess_time_s = perf_info.get('postprocess_time_s', 0)
self.total_time_s = perf_info.get('total_time_s', 0)
self.inference_time_s_90 = perf_info.get("inference_time_s_90", "")
self.inference_time_s_99 = perf_info.get("inference_time_s_99", "")
self.succ_rate = perf_info.get("succ_rate", "")
self.qps = perf_info.get("qps", "")
# conf info
self.config_status = self.parse_config(config)
# mem info
if isinstance(resource_info, dict):
self.cpu_rss_mb = int(resource_info.get('cpu_rss_mb', 0))
self.cpu_vms_mb = int(resource_info.get('cpu_vms_mb', 0))
self.cpu_shared_mb = int(resource_info.get('cpu_shared_mb', 0))
self.cpu_dirty_mb = int(resource_info.get('cpu_dirty_mb', 0))
self.cpu_util = round(resource_info.get('cpu_util', 0), 2)
self.gpu_rss_mb = int(resource_info.get('gpu_rss_mb', 0))
self.gpu_util = round(resource_info.get('gpu_util', 0), 2)
self.gpu_mem_util = round(resource_info.get('gpu_mem_util', 0), 2)
else:
self.cpu_rss_mb = 0
self.cpu_vms_mb = 0
self.cpu_shared_mb = 0
self.cpu_dirty_mb = 0
self.cpu_util = 0
self.gpu_rss_mb = 0
self.gpu_util = 0
self.gpu_mem_util = 0
# init benchmark logger
self.benchmark_logger()
def benchmark_logger(self):
"""
benchmark logger
"""
# remove other logging handler
for handler in logging.root.handlers[:]:
logging.root.removeHandler(handler)
# Init logger
FORMAT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
log_output = f"{LOG_PATH_ROOT}/{self.model_name}.log"
Path(f"{LOG_PATH_ROOT}").mkdir(parents=True, exist_ok=True)
logging.basicConfig(
level=logging.INFO,
format=FORMAT,
handlers=[
logging.FileHandler(
filename=log_output, mode='w'),
logging.StreamHandler(),
])
self.logger = logging.getLogger(__name__)
self.logger.info(
f"Paddle Inference benchmark log will be saved to {log_output}")
def parse_config(self, config) -> dict:
"""
parse paddle predictor config
args:
config(paddle.inference.Config): paddle inference config
return:
config_status(dict): dict style config info
"""
        config_status = {}
        if isinstance(config, paddle_infer.Config):
config_status['runtime_device'] = "gpu" if config.use_gpu(
) else "cpu"
config_status['ir_optim'] = config.ir_optim()
config_status['enable_tensorrt'] = config.tensorrt_engine_enabled()
config_status['precision'] = self.precision
config_status['enable_mkldnn'] = config.mkldnn_enabled()
config_status[
'cpu_math_library_num_threads'] = config.cpu_math_library_num_threads(
)
elif isinstance(config, dict):
config_status['runtime_device'] = config.get('runtime_device', "")
config_status['ir_optim'] = config.get('ir_optim', "")
config_status['enable_tensorrt'] = config.get('enable_tensorrt',
"")
config_status['precision'] = config.get('precision', "")
config_status['enable_mkldnn'] = config.get('enable_mkldnn', "")
config_status['cpu_math_library_num_threads'] = config.get(
'cpu_math_library_num_threads', "")
else:
self.print_help()
raise ValueError(
"Set argument config wrong, please check input argument and its type"
)
return config_status
def report(self, identifier=None):
"""
print log report
args:
identifier(string): identify log
"""
if identifier:
identifier = f"[{identifier}]"
else:
identifier = ""
self.logger.info("\n")
self.logger.info(
"---------------------- Paddle info ----------------------")
self.logger.info(f"{identifier} paddle_version: {self.paddle_version}")
self.logger.info(f"{identifier} paddle_commit: {self.paddle_commit}")
self.logger.info(f"{identifier} paddle_branch: {self.paddle_branch}")
self.logger.info(f"{identifier} log_api_version: {self.log_version}")
self.logger.info(
"----------------------- Conf info -----------------------")
self.logger.info(
f"{identifier} runtime_device: {self.config_status['runtime_device']}"
)
self.logger.info(
f"{identifier} ir_optim: {self.config_status['ir_optim']}")
self.logger.info(f"{identifier} enable_memory_optim: {True}")
self.logger.info(
f"{identifier} enable_tensorrt: {self.config_status['enable_tensorrt']}"
)
self.logger.info(
f"{identifier} enable_mkldnn: {self.config_status['enable_mkldnn']}"
)
self.logger.info(
f"{identifier} cpu_math_library_num_threads: {self.config_status['cpu_math_library_num_threads']}"
)
self.logger.info(
"----------------------- Model info ----------------------")
self.logger.info(f"{identifier} model_name: {self.model_name}")
self.logger.info(f"{identifier} precision: {self.precision}")
self.logger.info(
"----------------------- Data info -----------------------")
self.logger.info(f"{identifier} batch_size: {self.batch_size}")
self.logger.info(f"{identifier} input_shape: {self.shape}")
self.logger.info(f"{identifier} data_num: {self.data_num}")
self.logger.info(
"----------------------- Perf info -----------------------")
self.logger.info(
f"{identifier} cpu_rss(MB): {self.cpu_rss_mb}, cpu_vms: {self.cpu_vms_mb}, cpu_shared_mb: {self.cpu_shared_mb}, cpu_dirty_mb: {self.cpu_dirty_mb}, cpu_util: {self.cpu_util}%"
)
self.logger.info(
f"{identifier} gpu_rss(MB): {self.gpu_rss_mb}, gpu_util: {self.gpu_util}%, gpu_mem_util: {self.gpu_mem_util}%"
)
self.logger.info(
f"{identifier} total time spent(s): {self.total_time_s}")
self.logger.info(
f"{identifier} preprocess_time(ms): {round(self.preprocess_time_s*1000, 1)}, inference_time(ms): {round(self.inference_time_s*1000, 1)}, postprocess_time(ms): {round(self.postprocess_time_s*1000, 1)}"
)
if self.inference_time_s_90:
            self.logger.info(
f"{identifier} 90%_cost: {self.inference_time_s_90}, 99%_cost: {self.inference_time_s_99}, succ_rate: {self.succ_rate}"
)
if self.qps:
self.logger.info(f"{identifier} QPS: {self.qps}")
def print_help(self):
"""
print function help
"""
print("""Usage:
==== Print inference benchmark logs. ====
config = paddle.inference.Config()
        model_info = {'model_name': 'resnet50',
                      'precision': 'fp32'}
        data_info = {'batch_size': 1,
                     'shape': '3,224,224',
                     'data_num': 1000}
        perf_info = {'preprocess_time_s': 1.0,
                     'inference_time_s': 2.0,
                     'postprocess_time_s': 1.0,
                     'total_time_s': 4.0}
        resource_info = {'cpu_rss_mb': 100,
                         'gpu_rss_mb': 100,
                         'gpu_util': 60}
log = PaddleInferBenchmark(config, model_info, data_info, perf_info, resource_info)
log('Test')
""")
def __call__(self, identifier=None):
"""
__call__
args:
identifier(string): identify log
"""
self.report(identifier)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import yaml
import glob
from functools import reduce
import cv2
import numpy as np
import math
import paddle
from paddle.inference import Config
from paddle.inference import create_predictor
from benchmark_utils import PaddleInferBenchmark
from preprocess import preprocess, Resize, NormalizeImage, Permute, PadStride, WarpAffine, TopDownEvalAffine, expand_crop
from postprocess import HRNetPostProcess
from visualize import draw_pose
from utils import argsparser, Timer, get_current_memory_mb
class Detector(object):
"""
Args:
pred_config (object): config of model, defined by `Config(model_dir)`
model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU
run_mode (str): mode of running(paddle/trt_fp32/trt_fp16)
        batch_size (int): batch size for inference
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
trt_opt_shape (int): opt shape for dynamic shape in trt
        trt_calib_mode (bool): If the model is produced by TRT offline quantization
            calibration, trt_calib_mode needs to be set to True
cpu_threads (int): cpu threads
enable_mkldnn (bool): whether to open MKLDNN
"""
def __init__(self,
pred_config,
model_dir,
device='CPU',
run_mode='paddle',
batch_size=1,
trt_min_shape=1,
trt_max_shape=1280,
trt_opt_shape=640,
trt_calib_mode=False,
cpu_threads=1,
enable_mkldnn=False,
use_dark=True):
self.pred_config = pred_config
self.predictor, self.config = load_predictor(
model_dir,
run_mode=run_mode,
batch_size=batch_size,
min_subgraph_size=self.pred_config.min_subgraph_size,
device=device,
use_dynamic_shape=self.pred_config.use_dynamic_shape,
trt_min_shape=trt_min_shape,
trt_max_shape=trt_max_shape,
trt_opt_shape=trt_opt_shape,
trt_calib_mode=trt_calib_mode,
cpu_threads=cpu_threads,
enable_mkldnn=enable_mkldnn)
self.det_times = Timer()
self.cpu_mem, self.gpu_mem, self.gpu_util = 0, 0, 0
self.use_dark = use_dark
def preprocess(self, image_list):
preprocess_ops = []
for op_info in self.pred_config.preprocess_infos:
new_op_info = op_info.copy()
op_type = new_op_info.pop('type')
preprocess_ops.append(eval(op_type)(**new_op_info))
input_im_lst = []
input_im_info_lst = []
for im_path in image_list:
im, im_info = preprocess(im_path, preprocess_ops)
input_im_lst.append(im)
input_im_info_lst.append(im_info)
inputs = create_inputs(input_im_lst, input_im_info_lst)
return inputs
def postprocess(self, np_boxes, inputs, threshold=0.5):
# postprocess output of predictor
results = {}
imshape = inputs['im_shape'][:, ::-1]
center = np.round(imshape / 2.)
scale = imshape / 200.
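        # top-down convention: treat the whole image as the person box, with the
        # center at the image center and scale = size / pixel_std (200), matching
        # the training configs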
postprocess = HRNetPostProcess(use_dark=self.use_dark)
results['keypoint'] = postprocess(np_boxes, center, scale)
return results
def predict(self, image_list, threshold=0.5, repeats=1, add_timer=True):
'''
Args:
            image_list (list): list of images
            threshold (float): score threshold for predictions
            repeats (int): number of repeats for prediction
            add_timer (bool): whether to record timing during prediction
        Returns:
            results (dict): 'keypoint' holds a tuple of the keypoint array
                (np.ndarray of shape [N, num_joints, 3], each row [x, y, score])
                and the mean confidence score per instance
'''
# preprocess
if add_timer:
self.det_times.preprocess_time_s.start()
inputs = self.preprocess(image_list)
np_boxes = None
input_names = self.predictor.get_input_names()
for i in range(len(input_names)):
input_tensor = self.predictor.get_input_handle(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
if add_timer:
self.det_times.preprocess_time_s.end()
self.det_times.inference_time_s.start()
# model prediction
for i in range(repeats):
self.predictor.run()
output_names = self.predictor.get_output_names()
boxes_tensor = self.predictor.get_output_handle(output_names[0])
np_boxes = boxes_tensor.copy_to_cpu()
if add_timer:
self.det_times.inference_time_s.end(repeats=repeats)
self.det_times.postprocess_time_s.start()
# postprocess
results = self.postprocess(np_boxes, inputs, threshold=threshold)
if add_timer:
self.det_times.postprocess_time_s.end()
self.det_times.img_num += len(image_list)
return results
def get_timer(self):
return self.det_times
def create_inputs(imgs, im_info):
"""generate input for different model type
Args:
imgs (list(numpy)): list of images (np.ndarray)
im_info (list(dict)): list of image info
Returns:
inputs (dict): input of model
"""
inputs = {}
inputs['image'] = np.stack(imgs, axis=0)
im_shape = []
for e in im_info:
im_shape.append(np.array((e['im_shape'])).astype('float32'))
inputs['im_shape'] = np.stack(im_shape, axis=0)
return inputs
class PredictConfig():
"""set config of preprocess, postprocess and visualize
Args:
model_dir (str): root path of model.yml
"""
def __init__(self, model_dir):
# parsing Yaml config for Preprocess
deploy_file = os.path.join(model_dir, 'infer_cfg.yml')
with open(deploy_file) as f:
yml_conf = yaml.safe_load(f)
self.arch = yml_conf['arch']
self.preprocess_infos = yml_conf['Preprocess']
self.min_subgraph_size = yml_conf['min_subgraph_size']
self.labels = yml_conf['label_list']
self.use_dynamic_shape = yml_conf['use_dynamic_shape']
self.print_config()
def print_config(self):
print('----------- Model Configuration -----------')
print('%s: %s' % ('Model Arch', self.arch))
print('%s: ' % ('Transform Order'))
for op_info in self.preprocess_infos:
print('--%s: %s' % ('transform op', op_info['type']))
print('--------------------------------------------')
def load_predictor(model_dir,
run_mode='paddle',
batch_size=1,
device='CPU',
min_subgraph_size=3,
use_dynamic_shape=False,
trt_min_shape=1,
trt_max_shape=1280,
trt_opt_shape=640,
trt_calib_mode=False,
cpu_threads=1,
enable_mkldnn=False):
"""set AnalysisConfig, generate AnalysisPredictor
Args:
model_dir (str): root path of __model__ and __params__
device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU
run_mode (str): mode of running(paddle/trt_fp32/trt_fp16/trt_int8)
use_dynamic_shape (bool): use dynamic shape or not
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
trt_opt_shape (int): opt shape for dynamic shape in trt
        trt_calib_mode (bool): If the model is produced by TRT offline quantization
            calibration, trt_calib_mode needs to be set to True
Returns:
predictor (PaddlePredictor): AnalysisPredictor
Raises:
ValueError: predict by TensorRT need device == 'GPU'.
"""
if device != 'GPU' and run_mode != 'paddle':
raise ValueError(
"Predict by TensorRT mode: {}, expect device=='GPU', but device == {}"
.format(run_mode, device))
config = Config(
os.path.join(model_dir, 'model.pdmodel'),
os.path.join(model_dir, 'model.pdiparams'))
if device == 'GPU':
# initial GPU memory(M), device ID
config.enable_use_gpu(200, 0)
# optimize graph and fuse op
config.switch_ir_optim(True)
elif device == 'XPU':
config.enable_lite_engine()
config.enable_xpu(10 * 1024 * 1024)
else:
config.disable_gpu()
config.set_cpu_math_library_num_threads(cpu_threads)
if enable_mkldnn:
try:
# cache 10 different shapes for mkldnn to avoid memory leak
config.set_mkldnn_cache_capacity(10)
config.enable_mkldnn()
except Exception as e:
print(
"The current environment does not support `mkldnn`, so disable mkldnn."
)
pass
precision_map = {
'trt_int8': Config.Precision.Int8,
'trt_fp32': Config.Precision.Float32,
'trt_fp16': Config.Precision.Half
}
if run_mode in precision_map.keys():
config.enable_tensorrt_engine(
workspace_size=1 << 25,
max_batch_size=batch_size,
min_subgraph_size=min_subgraph_size,
precision_mode=precision_map[run_mode],
use_static=False,
use_calib_mode=trt_calib_mode)
if use_dynamic_shape:
min_input_shape = {
'image': [batch_size, 3, trt_min_shape, trt_min_shape]
}
max_input_shape = {
'image': [batch_size, 3, trt_max_shape, trt_max_shape]
}
opt_input_shape = {
'image': [batch_size, 3, trt_opt_shape, trt_opt_shape]
}
config.set_trt_dynamic_shape_info(min_input_shape, max_input_shape,
opt_input_shape)
print('trt set dynamic shape done!')
# disable print log when predict
config.disable_glog_info()
# enable shared memory
config.enable_memory_optim()
# disable feed, fetch OP, needed by zero_copy_run
config.switch_use_feed_fetch_ops(False)
predictor = create_predictor(config)
return predictor, config
def get_test_images(infer_dir, infer_img):
"""
Get image path list in TEST mode
"""
assert infer_img is not None or infer_dir is not None, \
"--infer_img or --infer_dir should be set"
assert infer_img is None or os.path.isfile(infer_img), \
"{} is not a file".format(infer_img)
assert infer_dir is None or os.path.isdir(infer_dir), \
"{} is not a directory".format(infer_dir)
# infer_img has a higher priority
if infer_img and os.path.isfile(infer_img):
return [infer_img]
images = set()
infer_dir = os.path.abspath(infer_dir)
assert os.path.isdir(infer_dir), \
"infer_dir {} is not a directory".format(infer_dir)
exts = ['jpg', 'jpeg', 'png', 'bmp']
exts += [ext.upper() for ext in exts]
for ext in exts:
images.update(glob.glob('{}/*.{}'.format(infer_dir, ext)))
images = list(images)
assert len(images) > 0, "no image found in {}".format(infer_dir)
print("Found {} inference images in total.".format(len(images)))
return images
def print_arguments(args):
print('----------- Running Arguments -----------')
for arg, value in sorted(vars(args).items()):
print('%s: %s' % (arg, value))
print('------------------------------------------')
def predict_image(detector, image_list, batch_size=1):
for i, img_file in enumerate(image_list):
        if FLAGS.run_benchmark:
            # warmup
            detector.predict(
                [img_file], FLAGS.threshold, repeats=10, add_timer=False)
            # run benchmark
            detector.predict(
                [img_file], FLAGS.threshold, repeats=10, add_timer=True)
cm, gm, gu = get_current_memory_mb()
detector.cpu_mem += cm
detector.gpu_mem += gm
detector.gpu_util += gu
print('Test iter {}'.format(i))
else:
            results = detector.predict([img_file], FLAGS.threshold)
draw_pose(
img_file,
results,
visual_thread=FLAGS.threshold,
save_dir=FLAGS.output_dir)
def predict_video(detector, camera_id):
video_out_name = 'output.mp4'
if camera_id != -1:
capture = cv2.VideoCapture(camera_id)
else:
capture = cv2.VideoCapture(FLAGS.video_file)
video_out_name = os.path.split(FLAGS.video_file)[-1]
# Get Video info : resolution, fps, frame count
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(capture.get(cv2.CAP_PROP_FPS))
frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
print("fps: %d, frame_count: %d" % (fps, frame_count))
if not os.path.exists(FLAGS.output_dir):
os.makedirs(FLAGS.output_dir)
out_path = os.path.join(FLAGS.output_dir, video_out_name)
fourcc = cv2.VideoWriter_fourcc(* 'mp4v')
writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
index = 1
while (1):
ret, frame = capture.read()
if not ret:
break
print('detect frame: %d' % (index))
index += 1
results = detector.predict([frame], FLAGS.threshold)
im = draw_pose(
frame, results, visual_thread=FLAGS.threshold, returnimg=True)
writer.write(im)
if camera_id != -1:
cv2.imshow('Mask Detection', im)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
writer.release()
def main():
pred_config = PredictConfig(FLAGS.model_dir)
detector = Detector(
pred_config,
FLAGS.model_dir,
device=FLAGS.device,
run_mode=FLAGS.run_mode,
batch_size=FLAGS.batch_size,
trt_min_shape=FLAGS.trt_min_shape,
trt_max_shape=FLAGS.trt_max_shape,
trt_opt_shape=FLAGS.trt_opt_shape,
trt_calib_mode=FLAGS.trt_calib_mode,
cpu_threads=FLAGS.cpu_threads,
enable_mkldnn=FLAGS.enable_mkldnn,
use_dark=FLAGS.use_dark)
# predict from video file or camera video stream
if FLAGS.video_file is not None or FLAGS.camera_id != -1:
predict_video(detector, FLAGS.camera_id)
else:
# predict from image
img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
predict_image(detector, img_list)
if not FLAGS.run_benchmark:
detector.det_times.info(average=True)
else:
mems = {
'cpu_rss_mb': detector.cpu_mem / len(img_list),
'gpu_rss_mb': detector.gpu_mem / len(img_list),
'gpu_util': detector.gpu_util * 100 / len(img_list)
}
perf_info = detector.det_times.report(average=True)
model_dir = FLAGS.model_dir
mode = FLAGS.run_mode
model_info = {
'model_name': model_dir.strip('/').split('/')[-1],
'precision': mode.split('_')[-1]
}
data_info = {
'batch_size': 1,
'shape': "dynamic_shape",
'data_num': perf_info['img_num']
}
det_log = PaddleInferBenchmark(detector.config, model_info,
data_info, perf_info, mems)
det_log('Det')
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
print_arguments(FLAGS)
FLAGS.device = FLAGS.device.upper()
assert FLAGS.device in ['CPU', 'GPU', 'XPU'
], "device should be CPU, GPU or XPU"
assert not FLAGS.use_gpu, "use_gpu has been deprecated, please use --device"
main()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import functools
import logging
import os
import sys
import paddle.distributed as dist
__all__ = ['setup_logger']
logger_initialized = []
def setup_logger(name="ppdet", output=None):
"""
Initialize logger and set its verbosity level to INFO.
Args:
output (str): a file name or a directory to save log. If None, will not save log file.
If ends with ".txt" or ".log", assumed to be a file name.
Otherwise, logs will be saved to `output/log.txt`.
name (str): the root module name of this logger
Returns:
logging.Logger: a logger
"""
logger = logging.getLogger(name)
if name in logger_initialized:
return logger
logger.setLevel(logging.INFO)
logger.propagate = False
formatter = logging.Formatter(
"[%(asctime)s] %(name)s %(levelname)s: %(message)s",
datefmt="%m/%d %H:%M:%S")
# stdout logging: master only
local_rank = dist.get_rank()
if local_rank == 0:
ch = logging.StreamHandler(stream=sys.stdout)
ch.setLevel(logging.DEBUG)
ch.setFormatter(formatter)
logger.addHandler(ch)
# file logging: all workers
if output is not None:
if output.endswith(".txt") or output.endswith(".log"):
filename = output
else:
filename = os.path.join(output, "log.txt")
if local_rank > 0:
filename = filename + ".rank{}".format(local_rank)
        os.makedirs(os.path.dirname(filename), exist_ok=True)
fh = logging.FileHandler(filename, mode='a')
fh.setLevel(logging.DEBUG)
fh.setFormatter(logging.Formatter())
logger.addHandler(fh)
logger_initialized.append(name)
return logger
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from scipy.optimize import linear_sum_assignment
from collections import abc, defaultdict
import cv2
import numpy as np
import math
import paddle
import paddle.nn as nn
from preprocess import get_affine_mat_kernel, get_affine_transform
class HRNetPostProcess(object):
def __init__(self, use_dark=True):
self.use_dark = use_dark
def flip_back(self, output_flipped, matched_parts):
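        # mirror the heatmaps horizontally, then swap each left/right joint pair
        # so the flipped prediction aligns with the original joint order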
assert output_flipped.ndim == 4,\
'output_flipped should be [batch_size, num_joints, height, width]'
output_flipped = output_flipped[:, :, :, ::-1]
for pair in matched_parts:
tmp = output_flipped[:, pair[0], :, :].copy()
output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :]
output_flipped[:, pair[1], :, :] = tmp
return output_flipped
def get_max_preds(self, heatmaps):
"""get predictions from score maps
Args:
heatmaps: numpy.ndarray([batch_size, num_joints, height, width])
Returns:
preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
maxvals: numpy.ndarray([batch_size, num_joints, 2]), the maximum confidence of the keypoints
"""
assert isinstance(heatmaps,
np.ndarray), 'heatmaps should be numpy.ndarray'
assert heatmaps.ndim == 4, 'batch_images should be 4-ndim'
batch_size = heatmaps.shape[0]
num_joints = heatmaps.shape[1]
width = heatmaps.shape[3]
heatmaps_reshaped = heatmaps.reshape((batch_size, num_joints, -1))
idx = np.argmax(heatmaps_reshaped, 2)
maxvals = np.amax(heatmaps_reshaped, 2)
maxvals = maxvals.reshape((batch_size, num_joints, 1))
idx = idx.reshape((batch_size, num_joints, 1))
preds = np.tile(idx, (1, 1, 2)).astype(np.float32)
preds[:, :, 0] = (preds[:, :, 0]) % width
preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)
pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
pred_mask = pred_mask.astype(np.float32)
preds *= pred_mask
return preds, maxvals
def gaussian_blur(self, heatmap, kernel):
border = (kernel - 1) // 2
batch_size = heatmap.shape[0]
num_joints = heatmap.shape[1]
height = heatmap.shape[2]
width = heatmap.shape[3]
for i in range(batch_size):
for j in range(num_joints):
origin_max = np.max(heatmap[i, j])
dr = np.zeros((height + 2 * border, width + 2 * border))
dr[border:-border, border:-border] = heatmap[i, j].copy()
dr = cv2.GaussianBlur(dr, (kernel, kernel), 0)
heatmap[i, j] = dr[border:-border, border:-border].copy()
heatmap[i, j] *= origin_max / np.max(heatmap[i, j])
return heatmap
def dark_parse(self, hm, coord):
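        # DARK-style refinement: one Newton step on the (log-)heatmap around the
        # argmax, offset = -Hessian^-1 * gradient, with derivatives estimated by
        # finite differences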
heatmap_height = hm.shape[0]
heatmap_width = hm.shape[1]
px = int(coord[0])
py = int(coord[1])
if 1 < px < heatmap_width - 2 and 1 < py < heatmap_height - 2:
dx = 0.5 * (hm[py][px + 1] - hm[py][px - 1])
dy = 0.5 * (hm[py + 1][px] - hm[py - 1][px])
dxx = 0.25 * (hm[py][px + 2] - 2 * hm[py][px] + hm[py][px - 2])
dxy = 0.25 * (hm[py+1][px+1] - hm[py-1][px+1] - hm[py+1][px-1] \
+ hm[py-1][px-1])
dyy = 0.25 * (
hm[py + 2 * 1][px] - 2 * hm[py][px] + hm[py - 2 * 1][px])
derivative = np.matrix([[dx], [dy]])
hessian = np.matrix([[dxx, dxy], [dxy, dyy]])
if dxx * dyy - dxy**2 != 0:
hessianinv = hessian.I
offset = -hessianinv * derivative
offset = np.squeeze(np.array(offset.T), axis=0)
coord += offset
return coord
def dark_postprocess(self, hm, coords, kernelsize):
"""
refer to https://github.com/ilovepose/DarkPose/lib/core/inference.py
"""
hm = self.gaussian_blur(hm, kernelsize)
hm = np.maximum(hm, 1e-10)
hm = np.log(hm)
for n in range(coords.shape[0]):
for p in range(coords.shape[1]):
coords[n, p] = self.dark_parse(hm[n][p], coords[n][p])
return coords
def get_final_preds(self, heatmaps, center, scale, kernelsize=3):
"""the highest heatvalue location with a quarter offset in the
direction from the highest response to the second highest response.
Args:
heatmaps (numpy.ndarray): The predicted heatmaps
center (numpy.ndarray): The boxes center
scale (numpy.ndarray): The scale factor
Returns:
preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints
"""
coords, maxvals = self.get_max_preds(heatmaps)
heatmap_height = heatmaps.shape[2]
heatmap_width = heatmaps.shape[3]
if self.use_dark:
coords = self.dark_postprocess(heatmaps, coords, kernelsize)
else:
for n in range(coords.shape[0]):
for p in range(coords.shape[1]):
hm = heatmaps[n][p]
px = int(math.floor(coords[n][p][0] + 0.5))
py = int(math.floor(coords[n][p][1] + 0.5))
if 1 < px < heatmap_width - 1 and 1 < py < heatmap_height - 1:
diff = np.array([
hm[py][px + 1] - hm[py][px - 1],
hm[py + 1][px] - hm[py - 1][px]
])
coords[n][p] += np.sign(diff) * .25
preds = coords.copy()
# Transform back
for i in range(coords.shape[0]):
preds[i] = transform_preds(coords[i], center[i], scale[i],
[heatmap_width, heatmap_height])
return preds, maxvals
def __call__(self, output, center, scale):
preds, maxvals = self.get_final_preds(output, center, scale)
return np.concatenate(
(preds, maxvals), axis=-1), np.mean(
maxvals, axis=1)
def transform_preds(coords, center, scale, output_size):
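    # map heatmap-space coordinates back to the original image with the inverse
    # affine transform; scale * 200 undoes the pixel_std normalization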
target_coords = np.zeros(coords.shape)
trans = get_affine_transform(center, scale * 200, 0, output_size, inv=1)
for p in range(coords.shape[0]):
target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans)
return target_coords
def affine_transform(pt, t):
new_pt = np.array([pt[0], pt[1], 1.]).T
new_pt = np.dot(t, new_pt)
return new_pt[:2]
def translate_to_ori_images(keypoint_result, batch_records):
kpts, scores = keypoint_result['keypoint']
kpts[..., 0] += batch_records[:, 0:1]
kpts[..., 1] += batch_records[:, 1:2]
return kpts, scores
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import cv2
import numpy as np
def decode_image(im_file, im_info):
"""read rgb image
Args:
im_file (str|np.ndarray): input can be image path or np.ndarray
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
if isinstance(im_file, str):
with open(im_file, 'rb') as f:
im_read = f.read()
data = np.frombuffer(im_read, dtype='uint8')
im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
else:
im = im_file
im_info['im_shape'] = np.array(im.shape[:2], dtype=np.float32)
im_info['scale_factor'] = np.array([1., 1.], dtype=np.float32)
return im, im_info
class Resize(object):
"""resize image by target_size and max_size
Args:
target_size (int): the target size of image
keep_ratio (bool): whether keep_ratio or not, default true
interp (int): method of resize
"""
def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR):
if isinstance(target_size, int):
target_size = [target_size, target_size]
self.target_size = target_size
self.keep_ratio = keep_ratio
self.interp = interp
def __call__(self, im, im_info):
"""
Args:
im (np.ndarray): image (np.ndarray)
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
assert len(self.target_size) == 2
assert self.target_size[0] > 0 and self.target_size[1] > 0
im_channel = im.shape[2]
im_scale_y, im_scale_x = self.generate_scale(im)
im = cv2.resize(
im,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=self.interp)
im_info['im_shape'] = np.array(im.shape[:2]).astype('float32')
im_info['scale_factor'] = np.array(
[im_scale_y, im_scale_x]).astype('float32')
return im, im_info
def generate_scale(self, im):
"""
Args:
im (np.ndarray): image (np.ndarray)
Returns:
im_scale_x: the resize ratio of X
im_scale_y: the resize ratio of Y
"""
origin_shape = im.shape[:2]
im_c = im.shape[2]
if self.keep_ratio:
im_size_min = np.min(origin_shape)
im_size_max = np.max(origin_shape)
target_size_min = np.min(self.target_size)
target_size_max = np.max(self.target_size)
im_scale = float(target_size_min) / float(im_size_min)
if np.round(im_scale * im_size_max) > target_size_max:
im_scale = float(target_size_max) / float(im_size_max)
im_scale_x = im_scale
im_scale_y = im_scale
else:
resize_h, resize_w = self.target_size
im_scale_y = resize_h / float(origin_shape[0])
im_scale_x = resize_w / float(origin_shape[1])
return im_scale_y, im_scale_x
class NormalizeImage(object):
"""normalize image
Args:
mean (list): im - mean
std (list): im / std
is_scale (bool): whether need im / 255
is_channel_first (bool): if True: image shape is CHW, else: HWC
"""
def __init__(self, mean, std, is_scale=True):
self.mean = mean
self.std = std
self.is_scale = is_scale
def __call__(self, im, im_info):
"""
Args:
im (np.ndarray): image (np.ndarray)
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
im = im.astype(np.float32, copy=False)
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
if self.is_scale:
im = im / 255.0
im -= mean
im /= std
return im, im_info
class Permute(object):
"""permute image
Args:
to_bgr (bool): whether convert RGB to BGR
channel_first (bool): whether convert HWC to CHW
"""
def __init__(self, ):
super(Permute, self).__init__()
def __call__(self, im, im_info):
"""
Args:
im (np.ndarray): image (np.ndarray)
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
im = im.transpose((2, 0, 1)).copy()
return im, im_info
class PadStride(object):
""" padding image for model with FPN, instead PadBatch(pad_to_stride) in original config
Args:
stride (bool): model with FPN need image shape % stride == 0
"""
def __init__(self, stride=0):
self.coarsest_stride = stride
def __call__(self, im, im_info):
"""
Args:
im (np.ndarray): image (np.ndarray)
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
coarsest_stride = self.coarsest_stride
if coarsest_stride <= 0:
return im, im_info
im_c, im_h, im_w = im.shape
pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride)
pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride)
padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32)
padding_im[:, :im_h, :im_w] = im
return padding_im, im_info
class WarpAffine(object):
"""Warp affine the image
"""
def __init__(self,
keep_res=False,
pad=31,
input_h=512,
input_w=512,
scale=0.4,
shift=0.1):
self.keep_res = keep_res
self.pad = pad
self.input_h = input_h
self.input_w = input_w
self.scale = scale
self.shift = shift
def __call__(self, im, im_info):
"""
Args:
im (np.ndarray): image (np.ndarray)
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
img = cv2.cvtColor(im, cv2.COLOR_RGB2BGR)
h, w = img.shape[:2]
if self.keep_res:
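            # bit trick: (x | self.pad) + 1 rounds x up to the next multiple of
            # self.pad + 1 (32 by default), keeping the resolution stride friendly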
input_h = (h | self.pad) + 1
input_w = (w | self.pad) + 1
s = np.array([input_w, input_h], dtype=np.float32)
c = np.array([w // 2, h // 2], dtype=np.float32)
else:
s = max(h, w) * 1.0
input_h, input_w = self.input_h, self.input_w
c = np.array([w / 2., h / 2.], dtype=np.float32)
trans_input = get_affine_transform(c, s, 0, [input_w, input_h])
img = cv2.resize(img, (w, h))
inp = cv2.warpAffine(
img, trans_input, (input_w, input_h), flags=cv2.INTER_LINEAR)
return inp, im_info
class EvalAffine(object):
def __init__(self, size, stride=64):
super(EvalAffine, self).__init__()
self.size = size
self.stride = stride
def __call__(self, image, im_info):
s = self.size
h, w, _ = image.shape
trans, size_resized = get_affine_mat_kernel(h, w, s, inv=False)
image_resized = cv2.warpAffine(image, trans, size_resized)
return image_resized, im_info
def get_affine_mat_kernel(h, w, s, inv=False):
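    # resize the short side to s and round the long side up to a multiple of 64
    # (stride friendly), then build the matching affine matrix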
if w < h:
w_ = s
h_ = int(np.ceil((s / w * h) / 64.) * 64)
scale_w = w
scale_h = h_ / w_ * w
else:
h_ = s
w_ = int(np.ceil((s / h * w) / 64.) * 64)
scale_h = h
scale_w = w_ / h_ * h
center = np.array([np.round(w / 2.), np.round(h / 2.)])
size_resized = (w_, h_)
trans = get_affine_transform(
center, np.array([scale_w, scale_h]), 0, size_resized, inv=inv)
return trans, size_resized
def get_affine_transform(center,
input_size,
rot,
output_size,
shift=(0., 0.),
inv=False):
"""Get the affine transform matrix, given the center/scale/rot/output_size.
Args:
center (np.ndarray[2, ]): Center of the bounding box (x, y).
scale (np.ndarray[2, ]): Scale of the bounding box
wrt [width, height].
rot (float): Rotation angle (degree).
output_size (np.ndarray[2, ]): Size of the destination heatmaps.
shift (0-100%): Shift translation ratio wrt the width/height.
Default (0., 0.).
inv (bool): Option to inverse the affine transform direction.
(inv=False: src->dst or inv=True: dst->src)
Returns:
np.ndarray: The transform matrix.
"""
assert len(center) == 2
assert len(output_size) == 2
assert len(shift) == 2
if not isinstance(input_size, (np.ndarray, list)):
input_size = np.array([input_size, input_size], dtype=np.float32)
scale_tmp = input_size
shift = np.array(shift)
src_w = scale_tmp[0]
dst_w = output_size[0]
dst_h = output_size[1]
rot_rad = np.pi * rot / 180
src_dir = rotate_point([0., src_w * -0.5], rot_rad)
dst_dir = np.array([0., dst_w * -0.5])
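    # three corresponding point pairs (the box center, a rotated point above it,
    # and a third obtained by a 90-degree rotation) uniquely determine the affine
    # matrix, which cv2.getAffineTransform solves for below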
src = np.zeros((3, 2), dtype=np.float32)
src[0, :] = center + scale_tmp * shift
src[1, :] = center + src_dir + scale_tmp * shift
src[2, :] = _get_3rd_point(src[0, :], src[1, :])
dst = np.zeros((3, 2), dtype=np.float32)
dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir
dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :])
if inv:
trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
else:
trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))
return trans
def get_warp_matrix(theta, size_input, size_dst, size_target):
"""This code is based on
https://github.com/open-mmlab/mmpose/blob/master/mmpose/core/post_processing/post_transforms.py
Calculate the transformation matrix under the constraint of unbiased.
Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased
Data Processing for Human Pose Estimation (CVPR 2020).
Args:
theta (float): Rotation angle in degrees.
size_input (np.ndarray): Size of input image [w, h].
size_dst (np.ndarray): Size of output image [w, h].
size_target (np.ndarray): Size of ROI in input plane [w, h].
Returns:
matrix (np.ndarray): A matrix for transformation.
"""
theta = np.deg2rad(theta)
matrix = np.zeros((2, 3), dtype=np.float32)
scale_x = size_dst[0] / size_target[0]
scale_y = size_dst[1] / size_target[1]
matrix[0, 0] = np.cos(theta) * scale_x
matrix[0, 1] = -np.sin(theta) * scale_x
matrix[0, 2] = scale_x * (
-0.5 * size_input[0] * np.cos(theta) + 0.5 * size_input[1] *
np.sin(theta) + 0.5 * size_target[0])
matrix[1, 0] = np.sin(theta) * scale_y
matrix[1, 1] = np.cos(theta) * scale_y
matrix[1, 2] = scale_y * (
-0.5 * size_input[0] * np.sin(theta) - 0.5 * size_input[1] *
np.cos(theta) + 0.5 * size_target[1])
return matrix
def rotate_point(pt, angle_rad):
"""Rotate a point by an angle.
Args:
pt (list[float]): 2 dimensional point to be rotated
angle_rad (float): rotation angle by radian
Returns:
list[float]: Rotated point.
"""
assert len(pt) == 2
sn, cs = np.sin(angle_rad), np.cos(angle_rad)
new_x = pt[0] * cs - pt[1] * sn
new_y = pt[0] * sn + pt[1] * cs
rotated_pt = [new_x, new_y]
return rotated_pt
def _get_3rd_point(a, b):
"""To calculate the affine matrix, three pairs of points are required. This
function is used to get the 3rd point, given 2D points a & b.
The 3rd point is defined by rotating vector `a - b` by 90 degrees
anticlockwise, using b as the rotation center.
Args:
a (np.ndarray): point(x,y)
b (np.ndarray): point(x,y)
Returns:
np.ndarray: The 3rd point.
"""
assert len(a) == 2
assert len(b) == 2
direction = a - b
third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32)
return third_pt
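# Illustrative check (not part of the original source): with a = (1, 0) and
# b = (0, 0), the vector a - b rotated 90 degrees anticlockwise about b
# yields (0, 1):
#   _get_3rd_point(np.array([1., 0.]), np.array([0., 0.]))  # -> [0., 1.]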
class TopDownEvalAffine(object):
"""apply affine transform to image and coords
Args:
trainsize (list): [w, h], the standard size used to train
use_udp (bool): whether to use Unbiased Data Processing.
        records (dict): dict containing the image and coords
    Returns:
        records (dict): the image and coords after the transform
"""
def __init__(self, trainsize, use_udp=False):
self.trainsize = trainsize
self.use_udp = use_udp
def __call__(self, image, im_info):
rot = 0
imshape = im_info['im_shape'][::-1]
center = im_info['center'] if 'center' in im_info else imshape / 2.
scale = im_info['scale'] if 'scale' in im_info else imshape
if self.use_udp:
trans = get_warp_matrix(
rot, center * 2.0,
[self.trainsize[0] - 1.0, self.trainsize[1] - 1.0], scale)
image = cv2.warpAffine(
image,
trans, (int(self.trainsize[0]), int(self.trainsize[1])),
flags=cv2.INTER_LINEAR)
else:
trans = get_affine_transform(center, scale, rot, self.trainsize)
image = cv2.warpAffine(
image,
trans, (int(self.trainsize[0]), int(self.trainsize[1])),
flags=cv2.INTER_LINEAR)
return image, im_info
def expand_crop(images, rect, expand_ratio=0.3):
imgh, imgw, c = images.shape
label, conf, xmin, ymin, xmax, ymax = [int(x) for x in rect.tolist()]
if label != 0:
return None, None, None
org_rect = [xmin, ymin, xmax, ymax]
h_half = (ymax - ymin) * (1 + expand_ratio) / 2.
w_half = (xmax - xmin) * (1 + expand_ratio) / 2.
if h_half > w_half * 4 / 3:
w_half = h_half * 0.75
center = [(ymin + ymax) / 2., (xmin + xmax) / 2.]
ymin = max(0, int(center[0] - h_half))
ymax = min(imgh - 1, int(center[0] + h_half))
xmin = max(0, int(center[1] - w_half))
xmax = min(imgw - 1, int(center[1] + w_half))
return images[ymin:ymax, xmin:xmax, :], [xmin, ymin, xmax, ymax], org_rect
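# A minimal usage sketch (illustrative only): `rect` follows the detector
# output layout [label, confidence, xmin, ymin, xmax, ymax]; label 0 is the
# person class, and the box is expanded by expand_ratio on each side:
def _demo_expand_crop():
    frame = np.zeros((480, 640, 3), dtype=np.uint8)  # hypothetical frame
    rect = np.array([0, 1, 200, 100, 300, 300], dtype=np.float32)
    crop, new_rect, org_rect = expand_crop(frame, rect, expand_ratio=0.3)
    return crop.shape, new_rect, org_rect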
def preprocess(im, preprocess_ops):
# process image by preprocess_ops
im_info = {
'scale_factor': np.array(
[1., 1.], dtype=np.float32),
'im_shape': None,
}
im, im_info = decode_image(im, im_info)
for operator in preprocess_ops:
im, im_info = operator(im, im_info)
return im, im_info
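# A minimal usage sketch (illustrative only), assuming `decode_image`
# (referenced above) accepts an image file path; 'demo.jpg' is a
# hypothetical placeholder:
def _demo_preprocess():
    ops = [EvalAffine(size=512)]
    im, im_info = preprocess('demo.jpg', ops)
    return im.shape, im_info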
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import os
import ast
import argparse
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--model_dir",
type=str,
default=None,
help=("Directory include:'model.pdiparams', 'model.pdmodel', "
"'infer_cfg.yml', created by tools/export_model.py."),
required=True)
parser.add_argument(
"--image_file", type=str, default=None, help="Path of image file.")
parser.add_argument(
"--image_dir",
type=str,
default=None,
help="Dir of image file, `image_file` has a higher priority.")
parser.add_argument(
"--batch_size", type=int, default=1, help="batch_size for inference.")
parser.add_argument(
"--video_file",
type=str,
default=None,
help="Path of video file, `video_file` or `camera_id` has a highest priority."
)
parser.add_argument(
"--camera_id",
type=int,
default=-1,
help="device id of camera to predict.")
parser.add_argument(
"--threshold", type=float, default=0.5, help="Threshold of score.")
parser.add_argument(
"--output_dir",
type=str,
default="output",
help="Directory of output visualization files.")
parser.add_argument(
"--run_mode",
type=str,
default='paddle',
help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)")
parser.add_argument(
"--device",
type=str,
default='cpu',
help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU."
)
parser.add_argument(
"--use_gpu",
type=ast.literal_eval,
default=False,
help="Deprecated, please use `--device`.")
parser.add_argument(
"--run_benchmark",
type=ast.literal_eval,
default=False,
help="Whether to predict a image_file repeatedly for benchmark")
parser.add_argument(
"--enable_mkldnn",
type=ast.literal_eval,
default=False,
help="Whether use mkldnn with CPU.")
parser.add_argument(
"--cpu_threads", type=int, default=1, help="Num of threads with CPU.")
parser.add_argument(
"--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.")
parser.add_argument(
"--trt_max_shape",
type=int,
default=1280,
help="max_shape for TensorRT.")
parser.add_argument(
"--trt_opt_shape",
type=int,
default=640,
help="opt_shape for TensorRT.")
parser.add_argument(
"--trt_calib_mode",
type=bool,
default=False,
help="If the model is produced by TRT offline quantitative "
"calibration, trt_calib_mode need to set True.")
parser.add_argument(
'--save_images',
action='store_true',
help='Save visualization image results.')
parser.add_argument(
'--use_dark',
type=bool,
default=True,
        help='whether to use DarkPose post-processing for more accurate keypoint predictions')
return parser
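# A minimal usage sketch (illustrative only; the model directory below is a
# hypothetical placeholder):
def _demo_argsparser():
    parser = argsparser()
    args = parser.parse_args(
        ['--model_dir', 'output_inference/hrnet', '--device', 'GPU'])
    return args.device, args.batch_size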
class Times(object):
def __init__(self):
self.time = 0.
# start time
self.st = 0.
# end time
self.et = 0.
def start(self):
self.st = time.time()
def end(self, repeats=1, accumulative=True):
self.et = time.time()
if accumulative:
self.time += (self.et - self.st) / repeats
else:
self.time = (self.et - self.st) / repeats
def reset(self):
self.time = 0.
self.st = 0.
self.et = 0.
def value(self):
return round(self.time, 4)
class Timer(Times):
def __init__(self):
super(Timer, self).__init__()
self.preprocess_time_s = Times()
self.inference_time_s = Times()
self.postprocess_time_s = Times()
self.img_num = 0
def info(self, average=False):
total_time = self.preprocess_time_s.value(
) + self.inference_time_s.value() + self.postprocess_time_s.value()
total_time = round(total_time, 4)
print("------------------ Inference Time Info ----------------------")
print("total_time(ms): {}, img_num: {}".format(total_time * 1000,
self.img_num))
preprocess_time = round(
self.preprocess_time_s.value() / max(1, self.img_num),
4) if average else self.preprocess_time_s.value()
postprocess_time = round(
self.postprocess_time_s.value() / max(1, self.img_num),
4) if average else self.postprocess_time_s.value()
inference_time = round(self.inference_time_s.value() /
max(1, self.img_num),
4) if average else self.inference_time_s.value()
average_latency = total_time / max(1, self.img_num)
qps = 0
if total_time > 0:
qps = 1 / average_latency
print("average latency time(ms): {:.2f}, QPS: {:2f}".format(
average_latency * 1000, qps))
print(
"preprocess_time(ms): {:.2f}, inference_time(ms): {:.2f}, postprocess_time(ms): {:.2f}".
format(preprocess_time * 1000, inference_time * 1000,
postprocess_time * 1000))
def report(self, average=False):
dic = {}
dic['preprocess_time_s'] = round(
self.preprocess_time_s.value() / max(1, self.img_num),
4) if average else self.preprocess_time_s.value()
dic['postprocess_time_s'] = round(
self.postprocess_time_s.value() / max(1, self.img_num),
4) if average else self.postprocess_time_s.value()
dic['inference_time_s'] = round(
self.inference_time_s.value() / max(1, self.img_num),
4) if average else self.inference_time_s.value()
dic['img_num'] = self.img_num
total_time = self.preprocess_time_s.value(
) + self.inference_time_s.value() + self.postprocess_time_s.value()
dic['total_time_s'] = round(total_time, 4)
return dic
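# A minimal usage sketch (illustrative only) of the accumulating timers:
def _demo_timer():
    timer = Timer()
    timer.preprocess_time_s.start()
    time.sleep(0.01)  # stand-in for real preprocessing work
    timer.preprocess_time_s.end()
    timer.img_num += 1
    timer.info(average=True)
    return timer.report(average=True)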
def get_current_memory_mb():
"""
    Obtain the CPU and GPU memory usage of the current process.
    Note that this function itself is time-consuming.
"""
import pynvml
import psutil
import GPUtil
gpu_id = int(os.environ.get('CUDA_VISIBLE_DEVICES', 0))
pid = os.getpid()
p = psutil.Process(pid)
info = p.memory_full_info()
cpu_mem = info.uss / 1024. / 1024.
gpu_mem = 0
gpu_percent = 0
gpus = GPUtil.getGPUs()
if gpu_id is not None and len(gpus) > 0:
gpu_percent = gpus[gpu_id].load
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
gpu_mem = meminfo.used / 1024. / 1024.
return round(cpu_mem, 4), round(gpu_mem, 4), round(gpu_percent, 4)
# coding: utf-8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import division
import os
import cv2
import numpy as np
from PIL import Image, ImageDraw, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
import math
from lib.utils.logger import setup_logger
logger = setup_logger(__name__)
def get_color(idx):
idx = idx * 3
color = ((37 * idx) % 255, (17 * idx) % 255, (29 * idx) % 255)
return color
def draw_pose(imgfile,
results,
visual_thread=0.6,
save_name='pose.jpg',
save_dir='output',
returnimg=False,
ids=None):
try:
import matplotlib.pyplot as plt
import matplotlib
plt.switch_backend('agg')
except Exception as e:
        logger.error('Matplotlib not found, please install matplotlib, '
                     'for example: `pip install matplotlib`.')
raise e
skeletons, scores = results['keypoint']
skeletons = np.array(skeletons)
kpt_nums = 17
if len(skeletons) > 0:
kpt_nums = skeletons.shape[1]
if kpt_nums == 17: #plot coco keypoint
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7),
(6, 8), (7, 9), (8, 10), (5, 11), (6, 12), (11, 13), (12, 14),
(13, 15), (14, 16), (11, 12)]
else: #plot mpii keypoint
EDGES = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 6), (3, 6), (6, 7),
(7, 8), (8, 9), (10, 11), (11, 12), (13, 14), (14, 15),
(8, 12), (8, 13)]
NUM_EDGES = len(EDGES)
colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0], \
[0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], \
[170, 0, 255], [255, 0, 255], [255, 0, 170], [255, 0, 85]]
cmap = matplotlib.cm.get_cmap('hsv')
plt.figure()
img = cv2.imread(imgfile) if type(imgfile) == str else imgfile
color_set = results['colors'] if 'colors' in results else None
if 'bbox' in results and ids is None:
bboxs = results['bbox']
for j, rect in enumerate(bboxs):
xmin, ymin, xmax, ymax = rect
color = colors[0] if color_set is None else colors[color_set[j] %
len(colors)]
cv2.rectangle(img, (xmin, ymin), (xmax, ymax), color, 1)
canvas = img.copy()
for i in range(kpt_nums):
for j in range(len(skeletons)):
if skeletons[j][i, 2] < visual_thread:
continue
if ids is None:
color = colors[i] if color_set is None else colors[color_set[j]
%
len(colors)]
else:
color = get_color(ids[j])
cv2.circle(
canvas,
tuple(skeletons[j][i, 0:2].astype('int32')),
2,
color,
thickness=-1)
to_plot = cv2.addWeighted(img, 0.3, canvas, 0.7, 0)
fig = matplotlib.pyplot.gcf()
stickwidth = 2
for i in range(NUM_EDGES):
for j in range(len(skeletons)):
edge = EDGES[i]
if skeletons[j][edge[0], 2] < visual_thread or skeletons[j][edge[
1], 2] < visual_thread:
continue
cur_canvas = canvas.copy()
X = [skeletons[j][edge[0], 1], skeletons[j][edge[1], 1]]
Y = [skeletons[j][edge[0], 0], skeletons[j][edge[1], 0]]
mX = np.mean(X)
mY = np.mean(Y)
length = ((X[0] - X[1])**2 + (Y[0] - Y[1])**2)**0.5
angle = math.degrees(math.atan2(X[0] - X[1], Y[0] - Y[1]))
polygon = cv2.ellipse2Poly((int(mY), int(mX)),
(int(length / 2), stickwidth),
int(angle), 0, 360, 1)
if ids is None:
color = colors[i] if color_set is None else colors[color_set[j]
%
len(colors)]
else:
color = get_color(ids[j])
cv2.fillConvexPoly(cur_canvas, polygon, color)
canvas = cv2.addWeighted(canvas, 0.4, cur_canvas, 0.6, 0)
if returnimg:
return canvas
save_name = os.path.join(
save_dir, os.path.splitext(os.path.basename(imgfile))[0] + '_vis.jpg')
plt.imsave(save_name, canvas[:, :, ::-1])
print("keypoint visualize image saved to: " + save_name)
plt.close()
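# A minimal usage sketch (illustrative only, not part of the original module).
# results['keypoint'] is a pair (skeletons, scores): skeletons has shape
# [num_person, 17, 3] holding (x, y, score) per COCO keypoint:
def _demo_draw_pose():
    skeletons = np.zeros((1, 17, 3), dtype=np.float32)
    skeletons[0, :, 0] = np.linspace(50, 150, 17)  # x coordinates
    skeletons[0, :, 1] = np.linspace(50, 200, 17)  # y coordinates
    skeletons[0, :, 2] = 1.0  # scores above the default visual_thread
    results = {'keypoint': (skeletons, np.ones((1, )))}
    canvas = np.full((256, 256, 3), 255, dtype=np.uint8)
    return draw_pose(canvas, results, returnimg=True)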
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import lib.utils
import lib.models
import lib.metrics
import lib.dataset
import lib.core
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import callbacks
from . import optimizer
from . import trainer
from .callbacks import *
from .optimizer import *
from .trainer import *
__all__ = callbacks.__all__ \
+ optimizer.__all__ + trainer.__all__
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import datetime
import six
import copy
import json
import paddle
import paddle.distributed as dist
from lib.utils.checkpoint import save_model
from lib.metrics.coco_utils import get_infer_results
from lib.utils.logger import setup_logger
logger = setup_logger('hrnet')
__all__ = [
'Callback', 'ComposeCallback', 'LogPrinter', 'Checkpointer',
'VisualDLWriter'
]
class Callback(object):
def __init__(self, model):
self.model = model
def on_step_begin(self, status):
pass
def on_step_end(self, status):
pass
def on_epoch_begin(self, status):
pass
def on_epoch_end(self, status):
pass
def on_train_begin(self, status):
pass
def on_train_end(self, status):
pass
class ComposeCallback(object):
def __init__(self, callbacks):
callbacks = [c for c in list(callbacks) if c is not None]
for c in callbacks:
assert isinstance(
c, Callback), "callback should be subclass of Callback"
self._callbacks = callbacks
def on_step_begin(self, status):
for c in self._callbacks:
c.on_step_begin(status)
def on_step_end(self, status):
for c in self._callbacks:
c.on_step_end(status)
def on_epoch_begin(self, status):
for c in self._callbacks:
c.on_epoch_begin(status)
def on_epoch_end(self, status):
for c in self._callbacks:
c.on_epoch_end(status)
def on_train_begin(self, status):
for c in self._callbacks:
c.on_train_begin(status)
def on_train_end(self, status):
for c in self._callbacks:
c.on_train_end(status)
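# A minimal sketch (illustrative only) of extending the callback mechanism;
# instances like this can be passed to Trainer.register_callbacks (defined
# later in this repo):
class _DemoStepCounter(Callback):
    def __init__(self, model):
        super(_DemoStepCounter, self).__init__(model)
        self.train_steps = 0

    def on_step_end(self, status):
        if status.get('mode') == 'train':
            self.train_steps += 1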
class LogPrinter(Callback):
def __init__(self, model):
super(LogPrinter, self).__init__(model)
def on_step_end(self, status):
if dist.get_world_size() < 2 or dist.get_rank() == 0:
mode = status['mode']
if mode == 'train':
epoch_id = status['epoch_id']
step_id = status['step_id']
steps_per_epoch = status['steps_per_epoch']
training_staus = status['training_staus']
batch_time = status['batch_time']
data_time = status['data_time']
epoches = self.model.cfg.epoch
batch_size = self.model.cfg['{}Reader'.format(mode.capitalize(
))]['batch_size']
logs = training_staus.log()
space_fmt = ':' + str(len(str(steps_per_epoch))) + 'd'
if step_id % self.model.cfg.log_iter == 0:
eta_steps = (epoches - epoch_id
) * steps_per_epoch - step_id
eta_sec = eta_steps * batch_time.global_avg
eta_str = str(datetime.timedelta(seconds=int(eta_sec)))
ips = float(batch_size) / batch_time.avg
fmt = ' '.join([
'Epoch: [{}]',
'[{' + space_fmt + '}/{}]',
'learning_rate: {lr:.6f}',
'{meters}',
'eta: {eta}',
'batch_cost: {btime}',
'data_cost: {dtime}',
'ips: {ips:.4f} images/s',
])
fmt = fmt.format(
epoch_id,
step_id,
steps_per_epoch,
lr=status['learning_rate'],
meters=logs,
eta=eta_str,
btime=str(batch_time),
dtime=str(data_time),
ips=ips)
logger.info(fmt)
if mode == 'eval':
step_id = status['step_id']
if step_id % 100 == 0:
logger.info("Eval iter: {}".format(step_id))
def on_epoch_end(self, status):
if dist.get_world_size() < 2 or dist.get_rank() == 0:
mode = status['mode']
if mode == 'eval':
sample_num = status['sample_num']
cost_time = status['cost_time']
                logger.info('Total sample number: {}, average FPS: {}'.format(
sample_num, sample_num / cost_time))
class Checkpointer(Callback):
def __init__(self, model):
super(Checkpointer, self).__init__(model)
cfg = self.model.cfg
self.best_ap = 0.
self.save_dir = os.path.join(self.model.cfg.save_dir,
self.model.cfg.filename)
if hasattr(self.model.model, 'student_model'):
self.weight = self.model.model.student_model
else:
self.weight = self.model.model
def on_epoch_end(self, status):
        # Checkpointer is only invoked during training
mode = status['mode']
epoch_id = status['epoch_id']
weight = None
save_name = None
if dist.get_world_size() < 2 or dist.get_rank() == 0:
if mode == 'train':
end_epoch = self.model.cfg.epoch
if (
epoch_id + 1
) % self.model.cfg.snapshot_epoch == 0 or epoch_id == end_epoch - 1:
save_name = str(
epoch_id
) if epoch_id != end_epoch - 1 else "model_final"
weight = self.weight
elif mode == 'eval':
if 'save_best_model' in status and status['save_best_model']:
for metric in self.model._metrics:
map_res = metric.get_results()
if 'bbox' in map_res:
key = 'bbox'
elif 'keypoint' in map_res:
key = 'keypoint'
else:
key = 'mask'
if key not in map_res:
logger.warning("Evaluation results empty, this may be due to " \
"training iterations being too few or not " \
"loading the correct weights.")
return
if map_res[key][0] > self.best_ap:
self.best_ap = map_res[key][0]
save_name = 'best_model'
weight = self.weight
logger.info("Best test {} ap is {:0.3f}.".format(
key, self.best_ap))
if weight:
save_model(weight, self.model.optimizer, self.save_dir,
save_name, epoch_id + 1)
class VisualDLWriter(Callback):
"""
    Use VisualDL to log scalar data or images
"""
def __init__(self, model):
super(VisualDLWriter, self).__init__(model)
assert six.PY3, "VisualDL requires Python >= 3.5"
try:
from visualdl import LogWriter
except Exception as e:
            logger.error('visualdl not found, please install visualdl, '
                         'for example: `pip install visualdl`.')
raise e
self.vdl_writer = LogWriter(
model.cfg.get('vdl_log_dir', 'vdl_log_dir/scalar'))
self.vdl_loss_step = 0
self.vdl_mAP_step = 0
self.vdl_image_step = 0
self.vdl_image_frame = 0
def on_step_end(self, status):
mode = status['mode']
if dist.get_world_size() < 2 or dist.get_rank() == 0:
if mode == 'train':
training_staus = status['training_staus']
for loss_name, loss_value in training_staus.get().items():
self.vdl_writer.add_scalar(loss_name, loss_value,
self.vdl_loss_step)
self.vdl_loss_step += 1
elif mode == 'test':
ori_image = status['original_image']
result_image = status['result_image']
self.vdl_writer.add_image(
"original/frame_{}".format(self.vdl_image_frame),
ori_image, self.vdl_image_step)
self.vdl_writer.add_image(
"result/frame_{}".format(self.vdl_image_frame),
result_image, self.vdl_image_step)
self.vdl_image_step += 1
# each frame can display ten pictures at most.
if self.vdl_image_step % 10 == 0:
self.vdl_image_step = 0
self.vdl_image_frame += 1
def on_epoch_end(self, status):
mode = status['mode']
if dist.get_world_size() < 2 or dist.get_rank() == 0:
if mode == 'eval':
for metric in self.model._metrics:
for key, map_value in metric.get_results().items():
self.vdl_writer.add_scalar("{}-mAP".format(key),
map_value[0],
self.vdl_mAP_step)
self.vdl_mAP_step += 1
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import yaml
from collections import OrderedDict
import paddle
from lib.dataset.category import get_categories
from lib.utils.logger import setup_logger
logger = setup_logger('hrnet')
# Global dictionary
TRT_MIN_SUBGRAPH = {'HRNet': 3, }
def _prune_input_spec(input_spec, program, targets):
# try to prune static program to figure out pruned input spec
# so we perform following operations in static mode
paddle.enable_static()
pruned_input_spec = [{}]
program = program.clone()
program = program._prune(targets=targets)
global_block = program.global_block()
for name, spec in input_spec[0].items():
try:
v = global_block.var(name)
pruned_input_spec[0][name] = spec
except Exception:
pass
paddle.disable_static()
return pruned_input_spec
def _parse_reader(reader_cfg, dataset_cfg, metric, arch, image_shape):
preprocess_list = []
anno_file = dataset_cfg.get_anno()
clsid2catid, catid2name = get_categories(metric, anno_file, arch)
label_list = [str(cat) for cat in catid2name.values()]
fuse_normalize = reader_cfg.get('fuse_normalize', False)
sample_transforms = reader_cfg['sample_transforms']
for st in sample_transforms[1:]:
for key, value in st.items():
p = {'type': key}
if key == 'Resize':
if int(image_shape[1]) != -1:
value['target_size'] = image_shape[1:]
if fuse_normalize and key == 'NormalizeImage':
continue
p.update(value)
preprocess_list.append(p)
return preprocess_list, label_list
def _parse_tracker(tracker_cfg):
tracker_params = {}
for k, v in tracker_cfg.items():
tracker_params.update({k: v})
return tracker_params
def _dump_infer_config(config, path, image_shape, model):
arch_state = False
from lib.utils.config.yaml_helpers import setup_orderdict
setup_orderdict()
use_dynamic_shape = True if image_shape[2] == -1 else False
infer_cfg = OrderedDict({
'mode': 'fluid',
'draw_threshold': 0.5,
'metric': config['metric'],
'use_dynamic_shape': use_dynamic_shape
})
infer_arch = config['architecture']
for arch, min_subgraph_size in TRT_MIN_SUBGRAPH.items():
if arch in infer_arch:
infer_cfg['arch'] = arch
infer_cfg['min_subgraph_size'] = min_subgraph_size
arch_state = True
break
if not arch_state:
logger.error(
            'Architecture: {} is not supported for exporting model now.\n'.
            format(infer_arch) +
            'Please set TRT_MIN_SUBGRAPH in lib/core/export_utils.py')
os._exit(0)
label_arch = 'keypoint_arch'
reader_cfg = config['TestReader']
dataset_cfg = config['TestDataset']
infer_cfg['Preprocess'], infer_cfg['label_list'] = _parse_reader(
reader_cfg, dataset_cfg, config['metric'], label_arch, image_shape[1:])
    with open(path, 'w') as f:
        yaml.dump(infer_cfg, f)
logger.info("Export inference config file to {}".format(
os.path.join(path)))
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import paddle
import paddle.nn as nn
import paddle.optimizer as optimizer
import paddle.regularizer as regularizer
from lib.utils.workspace import register, serializable
__all__ = ['LearningRate', 'OptimizerBuilder']
from ..utils.logger import setup_logger
logger = setup_logger(__name__)
@serializable
class PiecewiseDecay(object):
"""
Multi step learning rate decay
Args:
        gamma (float | list): decay factor(s)
        milestones (list): epochs at which to decay the learning rate
"""
def __init__(self,
gamma=[0.1, 0.01],
milestones=[8, 11],
values=None,
use_warmup=True):
super(PiecewiseDecay, self).__init__()
if type(gamma) is not list:
self.gamma = []
for i in range(len(milestones)):
self.gamma.append(gamma / 10**i)
else:
self.gamma = gamma
self.milestones = milestones
self.values = values
self.use_warmup = use_warmup
def __call__(self,
base_lr=None,
boundary=None,
value=None,
step_per_epoch=None):
if boundary is not None and self.use_warmup:
boundary.extend([int(step_per_epoch) * i for i in self.milestones])
else:
# do not use LinearWarmup
boundary = [int(step_per_epoch) * i for i in self.milestones]
value = [base_lr] # during step[0, boundary[0]] is base_lr
        # self.values is set directly in the config
if self.values is not None:
assert len(self.milestones) + 1 == len(self.values)
return optimizer.lr.PiecewiseDecay(boundary, self.values)
# value is computed by self.gamma
value = value if value is not None else [base_lr]
for i in self.gamma:
value.append(base_lr * i)
return optimizer.lr.PiecewiseDecay(boundary, value)
@serializable
class LinearWarmup(object):
"""
Warm up learning rate linearly
Args:
steps (int): warm up steps
start_factor (float): initial learning rate factor
"""
def __init__(self, steps=500, start_factor=1. / 3):
super(LinearWarmup, self).__init__()
self.steps = steps
self.start_factor = start_factor
def __call__(self, base_lr, step_per_epoch):
boundary = []
value = []
for i in range(self.steps + 1):
if self.steps > 0:
alpha = i / self.steps
factor = self.start_factor * (1 - alpha) + alpha
lr = base_lr * factor
value.append(lr)
if i > 0:
boundary.append(i)
return boundary, value
@register
class LearningRate(object):
"""
Learning Rate configuration
Args:
base_lr (float): base learning rate
schedulers (list): learning rate schedulers
"""
__category__ = 'optim'
def __init__(self,
base_lr=0.01,
schedulers=[PiecewiseDecay(), LinearWarmup()]):
super(LearningRate, self).__init__()
self.base_lr = base_lr
self.schedulers = schedulers
def __call__(self, step_per_epoch):
assert len(self.schedulers) >= 1
if not self.schedulers[0].use_warmup:
return self.schedulers[0](base_lr=self.base_lr,
step_per_epoch=step_per_epoch)
# TODO: split warmup & decay
# warmup
boundary, value = self.schedulers[1](self.base_lr, step_per_epoch)
# decay
decay_lr = self.schedulers[0](self.base_lr, boundary, value,
step_per_epoch)
return decay_lr
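# A minimal usage sketch (illustrative only): 500 linear warmup steps followed
# by 10x decays at epochs 8 and 11, assuming 1000 steps per epoch. Direct
# instantiation outside the config system is assumed to work here:
def _demo_learning_rate():
    lr_cfg = LearningRate(
        base_lr=0.05,
        schedulers=[PiecewiseDecay(gamma=0.1, milestones=[8, 11]),
                    LinearWarmup(steps=500)])
    # returns a paddle.optimizer.lr.PiecewiseDecay scheduler
    return lr_cfg(step_per_epoch=1000)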
@register
class OptimizerBuilder():
"""
Build optimizer handles
Args:
        regularizer (object): a `Regularizer` instance
optimizer (object): an `Optimizer` instance
"""
__category__ = 'optim'
def __init__(self,
clip_grad_by_norm=None,
regularizer={'type': 'L2',
'factor': .0001},
optimizer={'type': 'Momentum',
'momentum': .9}):
self.clip_grad_by_norm = clip_grad_by_norm
self.regularizer = regularizer
self.optimizer = optimizer
def __call__(self, learning_rate, model=None):
if not isinstance(model, (list, tuple)):
model = [model]
if self.clip_grad_by_norm is not None:
grad_clip = nn.ClipGradByGlobalNorm(
clip_norm=self.clip_grad_by_norm)
else:
grad_clip = None
if self.regularizer and self.regularizer != 'None':
reg_type = self.regularizer['type'] + 'Decay'
reg_factor = self.regularizer['factor']
regularization = getattr(regularizer, reg_type)(reg_factor)
else:
regularization = None
optim_args = self.optimizer.copy()
optim_type = optim_args['type']
del optim_args['type']
optim_args['weight_decay'] = regularization
op = getattr(optimizer, optim_type)
params = []
for m in model:
if m is not None:
params.extend(m.parameters())
return op(learning_rate=learning_rate,
parameters=params,
grad_clip=grad_clip,
**optim_args)
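# A minimal usage sketch (illustrative only) pairing the builder defaults
# (Momentum + L2 weight decay) with a toy layer:
def _demo_optimizer_builder():
    model = paddle.nn.Linear(10, 2)
    lr_scheduler = LearningRate(base_lr=0.05)(step_per_epoch=1000)
    return OptimizerBuilder()(lr_scheduler, model)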
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import copy
import time
import numpy as np
from PIL import Image, ImageOps, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
import paddle
import paddle.distributed as dist
from paddle.distributed import fleet
from paddle import amp
from paddle.static import InputSpec
from lib.utils.workspace import create
from lib.utils.checkpoint import load_weight, load_pretrain_weight
from lib.utils.visualizer import visualize_results, save_result
from lib.metrics.coco_utils import get_infer_results
from lib.metrics import KeyPointTopDownCOCOEval
from lib.dataset.category import get_categories
import lib.utils.stats as stats
from .callbacks import Callback, ComposeCallback, LogPrinter, Checkpointer, VisualDLWriter
from .export_utils import _dump_infer_config, _prune_input_spec
from lib.utils.logger import setup_logger
logger = setup_logger('hrnet.pose')
__all__ = ['Trainer']
class Trainer(object):
def __init__(self, cfg, mode='train'):
self.cfg = cfg
assert mode.lower() in ['train', 'eval', 'test'], \
"mode should be 'train', 'eval' or 'test'"
self.mode = mode.lower()
self.optimizer = None
# init distillation config
self.distill_model = None
self.distill_loss = None
# build data loader
self.dataset = cfg['{}Dataset'.format(self.mode.capitalize())]
if self.mode == 'train':
self.loader = create('{}Reader'.format(self.mode.capitalize()))(
self.dataset, cfg.worker_num)
self.model = create(cfg.architecture)
        # normalize params for deploy
self.model.load_meanstd(cfg['TestReader']['sample_transforms'])
# EvalDataset build with BatchSampler to evaluate in single device
if self.mode == 'eval':
self._eval_batch_sampler = paddle.io.BatchSampler(
self.dataset, batch_size=self.cfg.EvalReader['batch_size'])
self.loader = create('{}Reader'.format(self.mode.capitalize()))(
self.dataset, cfg.worker_num, self._eval_batch_sampler)
# TestDataset build after user set images, skip loader creation here
self._nranks = dist.get_world_size()
self._local_rank = dist.get_rank()
self.status = {}
self.start_epoch = 0
self.end_epoch = 0 if 'epoch' not in cfg else cfg.epoch
# initial default callbacks
self._init_callbacks()
# initial default metrics
self._init_metrics()
self._reset_metrics()
def _init_callbacks(self):
if self.mode == 'train':
self._callbacks = [LogPrinter(self), Checkpointer(self)]
if self.cfg.get('use_vdl', False):
self._callbacks.append(VisualDLWriter(self))
self._compose_callback = ComposeCallback(self._callbacks)
elif self.mode == 'eval':
self._callbacks = [LogPrinter(self)]
self._compose_callback = ComposeCallback(self._callbacks)
elif self.mode == 'test' and self.cfg.get('use_vdl', False):
self._callbacks = [VisualDLWriter(self)]
self._compose_callback = ComposeCallback(self._callbacks)
else:
self._callbacks = []
self._compose_callback = None
def _init_metrics(self, validate=False):
if self.mode == 'test' or (self.mode == 'train' and not validate):
self._metrics = []
return
if self.cfg.metric == 'KeyPointTopDownCOCOEval':
eval_dataset = self.cfg['EvalDataset']
eval_dataset.check_or_download_dataset()
anno_file = eval_dataset.get_anno()
save_prediction_only = self.cfg.get('save_prediction_only', False)
self._metrics = [
KeyPointTopDownCOCOEval(
anno_file,
len(eval_dataset),
self.cfg.num_joints,
self.cfg.save_dir,
save_prediction_only=save_prediction_only)
]
else:
logger.warning("Metric not support for metric type {}".format(
self.cfg.metric))
self._metrics = []
def init_optimizer(self, ):
# build optimizer in train mode
if self.mode == 'train':
steps_per_epoch = len(self.loader)
self.lr = create('LearningRate')(steps_per_epoch)
self.optimizer = create('OptimizerBuilder')(
self.lr, [self.model, self.distill_model])
def _reset_metrics(self):
for metric in self._metrics:
metric.reset()
def register_callbacks(self, callbacks):
callbacks = [c for c in list(callbacks) if c is not None]
for c in callbacks:
assert isinstance(c, Callback), \
"metrics shoule be instances of subclass of Metric"
self._callbacks.extend(callbacks)
self._compose_callback = ComposeCallback(self._callbacks)
def register_metrics(self, metrics):
metrics = [m for m in list(metrics) if m is not None]
self._metrics.extend(metrics)
def load_weights(self, weights, model=None):
self.start_epoch = 0
if model is None:
model = self.model
        load_pretrain_weight(model, weights)
logger.debug("Load weights {} to start training".format(weights))
def train(self, validate=False):
assert self.mode == 'train', "Model not in 'train' mode"
Init_mark = False
model = self.model
if self._nranks > 1:
model = paddle.DataParallel(
self.model,
find_unused_parameters=self.cfg.get("find_unused_parameters",
False))
self.status.update({
'epoch_id': self.start_epoch,
'step_id': 0,
'steps_per_epoch': len(self.loader)
})
self.status['batch_time'] = stats.SmoothedValue(
self.cfg.log_iter, fmt='{avg:.4f}')
self.status['data_time'] = stats.SmoothedValue(
self.cfg.log_iter, fmt='{avg:.4f}')
self.status['training_staus'] = stats.TrainingStats(self.cfg.log_iter)
self._compose_callback.on_train_begin(self.status)
for epoch_id in range(self.start_epoch, self.cfg.epoch):
self.status['mode'] = 'train'
self.status['epoch_id'] = epoch_id
self._compose_callback.on_epoch_begin(self.status)
self.loader.dataset.set_epoch(epoch_id)
model.train()
iter_tic = time.time()
for step_id, data in enumerate(self.loader):
self.status['data_time'].update(time.time() - iter_tic)
self.status['step_id'] = step_id
self._compose_callback.on_step_begin(self.status)
data['epoch_id'] = epoch_id
# model forward
outputs = model(data)
if self.distill_model is not None:
teacher_outputs = self.distill_model(data)
distill_loss = self.distill_loss(outputs, teacher_outputs,
data)
loss = outputs['loss'] + teacher_outputs[
"loss"] + distill_loss
else:
loss = outputs['loss']
# model backward
loss.backward()
self.optimizer.step()
curr_lr = self.optimizer.get_lr()
self.lr.step()
self.optimizer.clear_grad()
self.status['learning_rate'] = curr_lr
if self._nranks < 2 or self._local_rank == 0:
loss_dict = {"loss": outputs['loss']}
if self.distill_model is not None:
loss_dict.update({
"loss_student": outputs['loss'],
"loss_teacher": teacher_outputs["loss"],
"loss_distill": distill_loss,
"loss": loss
})
self.status['training_staus'].update(loss_dict)
self.status['batch_time'].update(time.time() - iter_tic)
self._compose_callback.on_step_end(self.status)
iter_tic = time.time()
self._compose_callback.on_epoch_end(self.status)
if validate and self._local_rank == 0 \
and ((epoch_id + 1) % self.cfg.snapshot_epoch == 0 \
or epoch_id == self.end_epoch - 1):
print("begin to eval...")
if not hasattr(self, '_eval_loader'):
# build evaluation dataset and loader
self._eval_dataset = self.cfg.EvalDataset
self._eval_batch_sampler = \
paddle.io.BatchSampler(
self._eval_dataset,
batch_size=self.cfg.EvalReader['batch_size'])
self._eval_loader = create('EvalReader')(
self._eval_dataset,
self.cfg.worker_num,
batch_sampler=self._eval_batch_sampler)
# if validation in training is enabled, metrics should be re-init
# Init_mark makes sure this code will only execute once
                if validate and not Init_mark:
Init_mark = True
self._init_metrics(validate=validate)
self._reset_metrics()
with paddle.no_grad():
self.status['save_best_model'] = True
self._eval_with_loader(self._eval_loader)
self._compose_callback.on_train_end(self.status)
def _eval_with_loader(self, loader):
sample_num = 0
tic = time.time()
self._compose_callback.on_epoch_begin(self.status)
self.status['mode'] = 'eval'
self.model.eval()
for step_id, data in enumerate(loader):
self.status['step_id'] = step_id
self._compose_callback.on_step_begin(self.status)
# forward
outs = self.model(data)
# update metrics
for metric in self._metrics:
metric.update(data, outs)
sample_num += data['im_id'].numpy().shape[0]
self._compose_callback.on_step_end(self.status)
self.status['sample_num'] = sample_num
self.status['cost_time'] = time.time() - tic
# accumulate metric to log out
for metric in self._metrics:
metric.accumulate()
metric.log()
self._compose_callback.on_epoch_end(self.status)
        # reset metric states, as metrics may be computed multiple times
self._reset_metrics()
def evaluate(self):
with paddle.no_grad():
self._eval_with_loader(self.loader)
def predict(self,
images,
draw_threshold=0.5,
output_dir='output',
save_txt=False):
self.dataset.set_images(images)
loader = create('TestReader')(self.dataset, 0)
imid2path = self.dataset.get_imid2path()
anno_file = self.dataset.get_anno()
clsid2catid, catid2name = get_categories(
self.cfg.metric, anno_file=anno_file)
# Run Infer
self.status['mode'] = 'test'
self.model.eval()
results = []
for step_id, data in enumerate(loader):
self.status['step_id'] = step_id
# forward
outs = self.model(data)
for key in ['im_shape', 'scale_factor', 'im_id']:
outs[key] = data[key]
for key, value in outs.items():
if hasattr(value, 'numpy'):
outs[key] = value.numpy()
results.append(outs)
for outs in results:
batch_res = get_infer_results(outs, clsid2catid)
bbox_num = outs['bbox_num']
start = 0
for i, im_id in enumerate(outs['im_id']):
image_path = imid2path[int(im_id)]
image = Image.open(image_path).convert('RGB')
image = ImageOps.exif_transpose(image)
self.status['original_image'] = np.array(image.copy())
end = start + bbox_num[i]
bbox_res = batch_res['bbox'][start:end] \
if 'bbox' in batch_res else None
keypoint_res = batch_res['keypoint'][start:end] \
if 'keypoint' in batch_res else None
image = visualize_results(image, bbox_res, keypoint_res,
int(im_id), catid2name,
draw_threshold)
self.status['result_image'] = np.array(image.copy())
if self._compose_callback:
self._compose_callback.on_step_end(self.status)
# save image with detection
save_name = self._get_save_image_name(output_dir, image_path)
logger.info("Detection bbox results save in {}".format(
save_name))
image.save(save_name, quality=95)
if save_txt:
save_path = os.path.splitext(save_name)[0] + '.txt'
                    # build a separate dict for the txt dump; avoid shadowing
                    # the outer `results` list that is being iterated
                    txt_results = {}
                    txt_results["im_id"] = im_id
                    if bbox_res:
                        txt_results["bbox_res"] = bbox_res
                    if keypoint_res:
                        txt_results["keypoint_res"] = keypoint_res
                    save_result(save_path, txt_results, catid2name, draw_threshold)
start = end
def _get_save_image_name(self, output_dir, image_path):
"""
Get save image name from source image path.
"""
if not os.path.exists(output_dir):
os.makedirs(output_dir)
image_name = os.path.split(image_path)[-1]
name, ext = os.path.splitext(image_name)
return os.path.join(output_dir, "{}".format(name)) + ext
def _get_infer_cfg_and_input_spec(self, save_dir, prune_input=True):
image_shape = [3, -1, -1]
im_shape = [None, 2]
scale_factor = [None, 2]
test_reader_name = 'TestReader'
if 'inputs_def' in self.cfg[test_reader_name]:
inputs_def = self.cfg[test_reader_name]['inputs_def']
            image_shape = inputs_def.get('image_shape', image_shape)
# set image_shape=[None, 3, -1, -1] as default
image_shape = [None] + image_shape
if hasattr(self.model, 'deploy'):
self.model.deploy = True
# Save infer cfg
_dump_infer_config(self.cfg,
os.path.join(save_dir, 'infer_cfg.yml'),
image_shape, self.model)
input_spec = [{
"image": InputSpec(
shape=image_shape, name='image'),
"im_shape": InputSpec(
shape=im_shape, name='im_shape'),
"scale_factor": InputSpec(
shape=scale_factor, name='scale_factor')
}]
if prune_input:
static_model = paddle.jit.to_static(
self.model, input_spec=input_spec)
            # NOTE: dy2st does not prune the program, but jit.save prunes the
            # input spec, so prune it here and save with the pruned input spec
pruned_input_spec = _prune_input_spec(
input_spec, static_model.forward.main_program,
static_model.forward.outputs)
else:
static_model = None
pruned_input_spec = input_spec
return static_model, pruned_input_spec
def export(self, output_dir='output_inference'):
self.model.eval()
model_name = os.path.splitext(os.path.split(self.cfg.filename)[-1])[0]
save_dir = os.path.join(output_dir, model_name)
if not os.path.exists(save_dir):
os.makedirs(save_dir)
static_model, pruned_input_spec = self._get_infer_cfg_and_input_spec(
save_dir)
# save model
if 'slim' not in self.cfg:
paddle.jit.save(
static_model,
os.path.join(save_dir, 'model'),
input_spec=pruned_input_spec)
else:
self.cfg.slim.save_quantized_model(
self.model,
os.path.join(save_dir, 'model'),
input_spec=pruned_input_spec)
logger.info("Export model and saved in {}".format(save_dir))
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import category
from . import dataset
from . import keypoint_coco
from . import reader
from . import transform
from .category import *
from .dataset import *
from .keypoint_coco import *
from .reader import *
from .transform import *
__all__ = category.__all__ + dataset.__all__ + keypoint_coco.__all__ \
+ reader.__all__ + transform.__all__
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from lib.utils.logger import setup_logger
logger = setup_logger(__name__)
__all__ = ['get_categories']
def get_categories(metric_type, anno_file=None, arch=None):
"""
Get class id to category id map and category id
to category name map from annotation file.
Args:
        metric_type (str): metric type; currently supports
            'KeyPointTopDownCOCOEval' and 'KeyPointTopDownMPIIEval'.
        anno_file (str): annotation file path
"""
if arch == 'keypoint_arch':
return (None, {'id': 'keypoint'})
if metric_type.lower() == 'keypointtopdowncocoeval' or metric_type.lower(
) == 'keypointtopdownmpiieval':
return (None, {'id': 'keypoint'})
else:
raise ValueError("unknown metric type {}".format(metric_type))
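# For the keypoint metrics supported here the mapping degenerates to a single
# 'keypoint' pseudo-category, e.g.:
#   get_categories('KeyPointTopDownCOCOEval')  # -> (None, {'id': 'keypoint'})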
def _mot_category(category='pedestrian'):
"""
Get class id to category id map and category id
to category name map of mot dataset
"""
label_map = {category: 0}
label_map = sorted(label_map.items(), key=lambda x: x[1])
cats = [l[0] for l in label_map]
clsid2catid = {i: i for i in range(len(cats))}
catid2name = {i: name for i, name in enumerate(cats)}
return clsid2catid, catid2name
def _coco17_category():
"""
Get class id to category id map and category id
to category name map of COCO2017 dataset
"""
clsid2catid = {
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
10: 10,
11: 11,
12: 13,
13: 14,
14: 15,
15: 16,
16: 17,
17: 18,
18: 19,
19: 20,
20: 21,
21: 22,
22: 23,
23: 24,
24: 25,
25: 27,
26: 28,
27: 31,
28: 32,
29: 33,
30: 34,
31: 35,
32: 36,
33: 37,
34: 38,
35: 39,
36: 40,
37: 41,
38: 42,
39: 43,
40: 44,
41: 46,
42: 47,
43: 48,
44: 49,
45: 50,
46: 51,
47: 52,
48: 53,
49: 54,
50: 55,
51: 56,
52: 57,
53: 58,
54: 59,
55: 60,
56: 61,
57: 62,
58: 63,
59: 64,
60: 65,
61: 67,
62: 70,
63: 72,
64: 73,
65: 74,
66: 75,
67: 76,
68: 77,
69: 78,
70: 79,
71: 80,
72: 81,
73: 82,
74: 84,
75: 85,
76: 86,
77: 87,
78: 88,
79: 89,
80: 90
}
catid2name = {
0: 'background',
1: 'person',
2: 'bicycle',
3: 'car',
4: 'motorcycle',
5: 'airplane',
6: 'bus',
7: 'train',
8: 'truck',
9: 'boat',
10: 'traffic light',
11: 'fire hydrant',
13: 'stop sign',
14: 'parking meter',
15: 'bench',
16: 'bird',
17: 'cat',
18: 'dog',
19: 'horse',
20: 'sheep',
21: 'cow',
22: 'elephant',
23: 'bear',
24: 'zebra',
25: 'giraffe',
27: 'backpack',
28: 'umbrella',
31: 'handbag',
32: 'tie',
33: 'suitcase',
34: 'frisbee',
35: 'skis',
36: 'snowboard',
37: 'sports ball',
38: 'kite',
39: 'baseball bat',
40: 'baseball glove',
41: 'skateboard',
42: 'surfboard',
43: 'tennis racket',
44: 'bottle',
46: 'wine glass',
47: 'cup',
48: 'fork',
49: 'knife',
50: 'spoon',
51: 'bowl',
52: 'banana',
53: 'apple',
54: 'sandwich',
55: 'orange',
56: 'broccoli',
57: 'carrot',
58: 'hot dog',
59: 'pizza',
60: 'donut',
61: 'cake',
62: 'chair',
63: 'couch',
64: 'potted plant',
65: 'bed',
67: 'dining table',
70: 'toilet',
72: 'tv',
73: 'laptop',
74: 'mouse',
75: 'remote',
76: 'keyboard',
77: 'cell phone',
78: 'microwave',
79: 'oven',
80: 'toaster',
81: 'sink',
82: 'refrigerator',
84: 'book',
85: 'clock',
86: 'vase',
87: 'scissors',
88: 'teddy bear',
89: 'hair drier',
90: 'toothbrush'
}
clsid2catid = {k - 1: v for k, v in clsid2catid.items()}
catid2name.pop(0)
return clsid2catid, catid2name
def _dota_category():
"""
Get class id to category id map and category id
to category name map of dota dataset
"""
catid2name = {
0: 'background',
1: 'plane',
2: 'baseball-diamond',
3: 'bridge',
4: 'ground-track-field',
5: 'small-vehicle',
6: 'large-vehicle',
7: 'ship',
8: 'tennis-court',
9: 'basketball-court',
10: 'storage-tank',
11: 'soccer-ball-field',
12: 'roundabout',
13: 'harbor',
14: 'swimming-pool',
15: 'helicopter'
}
catid2name.pop(0)
clsid2catid = {i: i + 1 for i in range(len(catid2name))}
return clsid2catid, catid2name
def _oid19_category():
clsid2catid = {k: k + 1 for k in range(500)}
catid2name = {
0: "background",
1: "Infant bed",
2: "Rose",
3: "Flag",
4: "Flashlight",
5: "Sea turtle",
6: "Camera",
7: "Animal",
8: "Glove",
9: "Crocodile",
10: "Cattle",
11: "House",
12: "Guacamole",
13: "Penguin",
14: "Vehicle registration plate",
15: "Bench",
16: "Ladybug",
17: "Human nose",
18: "Watermelon",
19: "Flute",
20: "Butterfly",
21: "Washing machine",
22: "Raccoon",
23: "Segway",
24: "Taco",
25: "Jellyfish",
26: "Cake",
27: "Pen",
28: "Cannon",
29: "Bread",
30: "Tree",
31: "Shellfish",
32: "Bed",
33: "Hamster",
34: "Hat",
35: "Toaster",
36: "Sombrero",
37: "Tiara",
38: "Bowl",
39: "Dragonfly",
40: "Moths and butterflies",
41: "Antelope",
42: "Vegetable",
43: "Torch",
44: "Building",
45: "Power plugs and sockets",
46: "Blender",
47: "Billiard table",
48: "Cutting board",
49: "Bronze sculpture",
50: "Turtle",
51: "Broccoli",
52: "Tiger",
53: "Mirror",
54: "Bear",
55: "Zucchini",
56: "Dress",
57: "Volleyball",
58: "Guitar",
59: "Reptile",
60: "Golf cart",
61: "Tart",
62: "Fedora",
63: "Carnivore",
64: "Car",
65: "Lighthouse",
66: "Coffeemaker",
67: "Food processor",
68: "Truck",
69: "Bookcase",
70: "Surfboard",
71: "Footwear",
72: "Bench",
73: "Necklace",
74: "Flower",
75: "Radish",
76: "Marine mammal",
77: "Frying pan",
78: "Tap",
79: "Peach",
80: "Knife",
81: "Handbag",
82: "Laptop",
83: "Tent",
84: "Ambulance",
85: "Christmas tree",
86: "Eagle",
87: "Limousine",
88: "Kitchen & dining room table",
89: "Polar bear",
90: "Tower",
91: "Football",
92: "Willow",
93: "Human head",
94: "Stop sign",
95: "Banana",
96: "Mixer",
97: "Binoculars",
98: "Dessert",
99: "Bee",
100: "Chair",
101: "Wood-burning stove",
102: "Flowerpot",
103: "Beaker",
104: "Oyster",
105: "Woodpecker",
106: "Harp",
107: "Bathtub",
108: "Wall clock",
109: "Sports uniform",
110: "Rhinoceros",
111: "Beehive",
112: "Cupboard",
113: "Chicken",
114: "Man",
115: "Blue jay",
116: "Cucumber",
117: "Balloon",
118: "Kite",
119: "Fireplace",
120: "Lantern",
121: "Missile",
122: "Book",
123: "Spoon",
124: "Grapefruit",
125: "Squirrel",
126: "Orange",
127: "Coat",
128: "Punching bag",
129: "Zebra",
130: "Billboard",
131: "Bicycle",
132: "Door handle",
133: "Mechanical fan",
134: "Ring binder",
135: "Table",
136: "Parrot",
137: "Sock",
138: "Vase",
139: "Weapon",
140: "Shotgun",
141: "Glasses",
142: "Seahorse",
143: "Belt",
144: "Watercraft",
145: "Window",
146: "Giraffe",
147: "Lion",
148: "Tire",
149: "Vehicle",
150: "Canoe",
151: "Tie",
152: "Shelf",
153: "Picture frame",
154: "Printer",
155: "Human leg",
156: "Boat",
157: "Slow cooker",
158: "Croissant",
159: "Candle",
160: "Pancake",
161: "Pillow",
162: "Coin",
163: "Stretcher",
164: "Sandal",
165: "Woman",
166: "Stairs",
167: "Harpsichord",
168: "Stool",
169: "Bus",
170: "Suitcase",
171: "Human mouth",
172: "Juice",
173: "Skull",
174: "Door",
175: "Violin",
176: "Chopsticks",
177: "Digital clock",
178: "Sunflower",
179: "Leopard",
180: "Bell pepper",
181: "Harbor seal",
182: "Snake",
183: "Sewing machine",
184: "Goose",
185: "Helicopter",
186: "Seat belt",
187: "Coffee cup",
188: "Microwave oven",
189: "Hot dog",
190: "Countertop",
191: "Serving tray",
192: "Dog bed",
193: "Beer",
194: "Sunglasses",
195: "Golf ball",
196: "Waffle",
197: "Palm tree",
198: "Trumpet",
199: "Ruler",
200: "Helmet",
201: "Ladder",
202: "Office building",
203: "Tablet computer",
204: "Toilet paper",
205: "Pomegranate",
206: "Skirt",
207: "Gas stove",
208: "Cookie",
209: "Cart",
210: "Raven",
211: "Egg",
212: "Burrito",
213: "Goat",
214: "Kitchen knife",
215: "Skateboard",
216: "Salt and pepper shakers",
217: "Lynx",
218: "Boot",
219: "Platter",
220: "Ski",
221: "Swimwear",
222: "Swimming pool",
223: "Drinking straw",
224: "Wrench",
225: "Drum",
226: "Ant",
227: "Human ear",
228: "Headphones",
229: "Fountain",
230: "Bird",
231: "Jeans",
232: "Television",
233: "Crab",
234: "Microphone",
235: "Home appliance",
236: "Snowplow",
237: "Beetle",
238: "Artichoke",
239: "Jet ski",
240: "Stationary bicycle",
241: "Human hair",
242: "Brown bear",
243: "Starfish",
244: "Fork",
245: "Lobster",
246: "Corded phone",
247: "Drink",
248: "Saucer",
249: "Carrot",
250: "Insect",
251: "Clock",
252: "Castle",
253: "Tennis racket",
254: "Ceiling fan",
255: "Asparagus",
256: "Jaguar",
257: "Musical instrument",
258: "Train",
259: "Cat",
260: "Rifle",
261: "Dumbbell",
262: "Mobile phone",
263: "Taxi",
264: "Shower",
265: "Pitcher",
266: "Lemon",
267: "Invertebrate",
268: "Turkey",
269: "High heels",
270: "Bust",
271: "Elephant",
272: "Scarf",
273: "Barrel",
274: "Trombone",
275: "Pumpkin",
276: "Box",
277: "Tomato",
278: "Frog",
279: "Bidet",
280: "Human face",
281: "Houseplant",
282: "Van",
283: "Shark",
284: "Ice cream",
285: "Swim cap",
286: "Falcon",
287: "Ostrich",
288: "Handgun",
289: "Whiteboard",
290: "Lizard",
291: "Pasta",
292: "Snowmobile",
293: "Light bulb",
294: "Window blind",
295: "Muffin",
296: "Pretzel",
297: "Computer monitor",
298: "Horn",
299: "Furniture",
300: "Sandwich",
301: "Fox",
302: "Convenience store",
303: "Fish",
304: "Fruit",
305: "Earrings",
306: "Curtain",
307: "Grape",
308: "Sofa bed",
309: "Horse",
310: "Luggage and bags",
311: "Desk",
312: "Crutch",
313: "Bicycle helmet",
314: "Tick",
315: "Airplane",
316: "Canary",
317: "Spatula",
318: "Watch",
319: "Lily",
320: "Kitchen appliance",
321: "Filing cabinet",
322: "Aircraft",
323: "Cake stand",
324: "Candy",
325: "Sink",
326: "Mouse",
327: "Wine",
328: "Wheelchair",
329: "Goldfish",
330: "Refrigerator",
331: "French fries",
332: "Drawer",
333: "Treadmill",
334: "Picnic basket",
335: "Dice",
336: "Cabbage",
337: "Football helmet",
338: "Pig",
339: "Person",
340: "Shorts",
341: "Gondola",
342: "Honeycomb",
343: "Doughnut",
344: "Chest of drawers",
345: "Land vehicle",
346: "Bat",
347: "Monkey",
348: "Dagger",
349: "Tableware",
350: "Human foot",
351: "Mug",
352: "Alarm clock",
353: "Pressure cooker",
354: "Human hand",
355: "Tortoise",
356: "Baseball glove",
357: "Sword",
358: "Pear",
359: "Miniskirt",
360: "Traffic sign",
361: "Girl",
362: "Roller skates",
363: "Dinosaur",
364: "Porch",
365: "Human beard",
366: "Submarine sandwich",
367: "Screwdriver",
368: "Strawberry",
369: "Wine glass",
370: "Seafood",
371: "Racket",
372: "Wheel",
373: "Sea lion",
374: "Toy",
375: "Tea",
376: "Tennis ball",
377: "Waste container",
378: "Mule",
379: "Cricket ball",
380: "Pineapple",
381: "Coconut",
382: "Doll",
383: "Coffee table",
384: "Snowman",
385: "Lavender",
386: "Shrimp",
387: "Maple",
388: "Cowboy hat",
389: "Goggles",
390: "Rugby ball",
391: "Caterpillar",
392: "Poster",
393: "Rocket",
394: "Organ",
395: "Saxophone",
396: "Traffic light",
397: "Cocktail",
398: "Plastic bag",
399: "Squash",
400: "Mushroom",
401: "Hamburger",
402: "Light switch",
403: "Parachute",
404: "Teddy bear",
405: "Winter melon",
406: "Deer",
407: "Musical keyboard",
408: "Plumbing fixture",
409: "Scoreboard",
410: "Baseball bat",
411: "Envelope",
412: "Adhesive tape",
413: "Briefcase",
414: "Paddle",
415: "Bow and arrow",
416: "Telephone",
417: "Sheep",
418: "Jacket",
419: "Boy",
420: "Pizza",
421: "Otter",
422: "Office supplies",
423: "Couch",
424: "Cello",
425: "Bull",
426: "Camel",
427: "Ball",
428: "Duck",
429: "Whale",
430: "Shirt",
431: "Tank",
432: "Motorcycle",
433: "Accordion",
434: "Owl",
435: "Porcupine",
436: "Sun hat",
437: "Nail",
438: "Scissors",
439: "Swan",
440: "Lamp",
441: "Crown",
442: "Piano",
443: "Sculpture",
444: "Cheetah",
445: "Oboe",
446: "Tin can",
447: "Mango",
448: "Tripod",
449: "Oven",
450: "Mouse",
451: "Barge",
452: "Coffee",
453: "Snowboard",
454: "Common fig",
455: "Salad",
456: "Marine invertebrates",
457: "Umbrella",
458: "Kangaroo",
459: "Human arm",
460: "Measuring cup",
461: "Snail",
462: "Loveseat",
463: "Suit",
464: "Teapot",
465: "Bottle",
466: "Alpaca",
467: "Kettle",
468: "Trousers",
469: "Popcorn",
470: "Centipede",
471: "Spider",
472: "Sparrow",
473: "Plate",
474: "Bagel",
475: "Personal care",
476: "Apple",
477: "Brassiere",
478: "Bathroom cabinet",
479: "studio couch",
480: "Computer keyboard",
481: "Table tennis racket",
482: "Sushi",
483: "Cabinetry",
484: "Street light",
485: "Towel",
486: "Nightstand",
487: "Rabbit",
488: "Dolphin",
489: "Dog",
490: "Jug",
491: "Wok",
492: "Fire hydrant",
493: "Human eye",
494: "Skyscraper",
495: "Backpack",
496: "Potato",
497: "Paper towel",
498: "Lifejacket",
499: "Bicycle wheel",
500: "Toilet",
}
return clsid2catid, catid2name
def _visdrone_category():
clsid2catid = {i: i for i in range(10)}
catid2name = {
0: 'pedestrian',
1: 'people',
2: 'bicycle',
3: 'car',
4: 'van',
5: 'truck',
6: 'tricycle',
7: 'awning-tricycle',
8: 'bus',
9: 'motor'
}
return clsid2catid, catid2name
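# --- Usage sketch (illustrative, not part of the original file). Each
# *_category() helper returns clsid2catid, which maps the model's contiguous
# class indices to dataset category ids, and catid2name, which maps category
# ids to human-readable names. A minimal lookup:
#   clsid2catid, catid2name = _visdrone_category()
#   catid = clsid2catid[3]        # -> 3 (identity mapping for VisDrone)
#   catid2name[catid]             # -> 'car'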
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import numpy as np
try:
from collections.abc import Sequence
except Exception:
from collections import Sequence
from paddle.io import Dataset
import copy
from lib.utils.workspace import register, serializable
from lib.utils.download import get_dataset_path
__all__ = ['DetDataset', 'ImageFolder']
@serializable
class DetDataset(Dataset):
"""
Load detection dataset.
Args:
dataset_dir (str): root directory for dataset.
image_dir (str): directory for images.
anno_path (str): annotation file path.
data_fields (list): key name of data dictionary, at least have 'image'.
sample_num (int): number of samples to load, -1 means all.
use_default_label (bool): whether to load default label list.
"""
def __init__(self,
dataset_dir=None,
image_dir=None,
anno_path=None,
data_fields=['image'],
sample_num=-1,
use_default_label=None,
**kwargs):
super(DetDataset, self).__init__()
self.dataset_dir = dataset_dir if dataset_dir is not None else ''
self.anno_path = anno_path
self.image_dir = image_dir if image_dir is not None else ''
self.data_fields = data_fields
self.sample_num = sample_num
self.use_default_label = use_default_label
self._epoch = 0
self._curr_iter = 0
def __len__(self):
return len(self.roidbs)
def __getitem__(self, idx):
# data batch
roidb = copy.deepcopy(self.roidbs[idx])
if self.mixup_epoch == 0 or self._epoch < self.mixup_epoch:
n = len(self.roidbs)
idx = np.random.randint(n)
roidb = [roidb, copy.deepcopy(self.roidbs[idx])]
elif self.cutmix_epoch == 0 or self._epoch < self.cutmix_epoch:
n = len(self.roidbs)
idx = np.random.randint(n)
roidb = [roidb, copy.deepcopy(self.roidbs[idx])]
elif self.mosaic_epoch == 0 or self._epoch < self.mosaic_epoch:
n = len(self.roidbs)
roidb = [roidb, ] + [
copy.deepcopy(self.roidbs[np.random.randint(n)])
for _ in range(3)
]
if isinstance(roidb, Sequence):
for r in roidb:
r['curr_iter'] = self._curr_iter
else:
roidb['curr_iter'] = self._curr_iter
self._curr_iter += 1
return self.transform(roidb)
def check_or_download_dataset(self):
self.dataset_dir = get_dataset_path(self.dataset_dir, self.anno_path,
self.image_dir)
def set_kwargs(self, **kwargs):
self.mixup_epoch = kwargs.get('mixup_epoch', -1)
self.cutmix_epoch = kwargs.get('cutmix_epoch', -1)
self.mosaic_epoch = kwargs.get('mosaic_epoch', -1)
def set_transform(self, transform):
self.transform = transform
def set_epoch(self, epoch_id):
self._epoch = epoch_id
def parse_dataset(self):
raise NotImplementedError(
"Need to implement parse_dataset method of Dataset")
def get_anno(self):
if self.anno_path is None:
return
return os.path.join(self.dataset_dir, self.anno_path)
def _is_valid_file(f, extensions=('.jpg', '.jpeg', '.png', '.bmp')):
return f.lower().endswith(extensions)
def _make_dataset(dir):
dir = os.path.expanduser(dir)
if not os.path.isdir(dir):
raise ValueError('{} should be a dir'.format(dir))
images = []
for root, _, fnames in sorted(os.walk(dir, followlinks=True)):
for fname in sorted(fnames):
path = os.path.join(root, fname)
if _is_valid_file(path):
images.append(path)
return images
@register
@serializable
class ImageFolder(DetDataset):
def __init__(self,
dataset_dir=None,
image_dir=None,
anno_path=None,
sample_num=-1,
use_default_label=None,
**kwargs):
super(ImageFolder, self).__init__(
dataset_dir,
image_dir,
anno_path,
sample_num=sample_num,
use_default_label=use_default_label)
self._imid2path = {}
self.roidbs = None
self.sample_num = sample_num
def check_or_download_dataset(self):
if self.dataset_dir:
# NOTE: ImageFolder is only used for prediction, in
# infer mode, image_dir is set by set_images
# so we only check anno_path here
self.dataset_dir = get_dataset_path(self.dataset_dir,
self.anno_path, None)
def parse_dataset(self):
if not self.roidbs:
self.roidbs = self._load_images()
def _parse(self):
image_dir = self.image_dir
if not isinstance(image_dir, Sequence):
image_dir = [image_dir]
images = []
for im_dir in image_dir:
if os.path.isdir(im_dir):
im_dir = os.path.join(self.dataset_dir, im_dir)
images.extend(_make_dataset(im_dir))
elif os.path.isfile(im_dir) and _is_valid_file(im_dir):
images.append(im_dir)
return images
def _load_images(self):
images = self._parse()
ct = 0
records = []
for image in images:
assert image != '' and os.path.isfile(image), \
"Image {} not found".format(image)
if self.sample_num > 0 and ct >= self.sample_num:
break
rec = {'im_id': np.array([ct]), 'im_file': image}
self._imid2path[ct] = image
ct += 1
records.append(rec)
assert len(records) > 0, "No image file found"
return records
def get_imid2path(self):
return self._imid2path
def set_images(self, images):
self.image_dir = images
self.roidbs = self._load_images()
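# --- Usage sketch (illustrative; 'path/to/image.jpg' is a placeholder).
# ImageFolder is intended for inference: point it at a list of image files
# (or a directory) via set_images(), then query the id-to-path mapping:
#   dataset = ImageFolder()
#   dataset.set_images(['path/to/image.jpg'])
#   dataset.get_imid2path()       # -> {0: 'path/to/image.jpg'}
# To iterate over samples, a transform pipeline must first be attached with
# dataset.set_transform(...).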
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import cv2
import numpy as np
import json
import copy
import pycocotools
from pycocotools.coco import COCO
from .dataset import DetDataset
from lib.utils.workspace import register, serializable
__all__ = ['KeypointTopDownBaseDataset', 'KeypointTopDownCocoDataset']
@serializable
class KeypointTopDownBaseDataset(DetDataset):
"""Base class for top_down datasets.
All datasets should subclass it.
All subclasses should overwrite:
Methods:`_get_db`
Args:
dataset_dir (str): Root path to the dataset.
image_dir (str): Path to a directory where images are held.
anno_path (str): Relative path to the annotation file.
num_joints (int): number of keypoints
transform (composed(operators)): A sequence of data transforms.
"""
def __init__(self,
dataset_dir,
image_dir,
anno_path,
num_joints,
transform=[]):
super().__init__(dataset_dir, image_dir, anno_path)
self.image_info = {}
self.ann_info = {}
self.img_prefix = os.path.join(dataset_dir, image_dir)
self.transform = transform
self.ann_info['num_joints'] = num_joints
self.db = []
def __len__(self):
"""Get dataset length."""
return len(self.db)
def _get_db(self):
"""Get a sample"""
raise NotImplementedError
def __getitem__(self, idx):
"""Prepare sample for training given the index."""
records = copy.deepcopy(self.db[idx])
records['image'] = cv2.imread(records['image_file'], cv2.IMREAD_COLOR |
cv2.IMREAD_IGNORE_ORIENTATION)
records['image'] = cv2.cvtColor(records['image'], cv2.COLOR_BGR2RGB)
records['score'] = records['score'] if 'score' in records else 1
records = self.transform(records)
return records
@register
@serializable
class KeypointTopDownCocoDataset(KeypointTopDownBaseDataset):
"""COCO dataset for top-down pose estimation. Adapted from
https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
Copyright (c) Microsoft, under the MIT License.
The dataset loads raw features and apply specified transforms
to return a dict containing the image tensors and other information.
COCO keypoint indexes:
0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
Args:
dataset_dir (str): Root path to the dataset.
image_dir (str): Path to a directory where images are held.
anno_path (str): Relative path to the annotation file.
num_joints (int): Number of keypoints
trainsize (list): [w, h], the target image size
transform (composed(operators)): A sequence of data transforms.
bbox_file (str): Path to a detection bbox file
Default: None.
use_gt_bbox (bool): Whether to use ground truth bbox
Default: True.
pixel_std (int): The pixel std of the scale
Default: 200.
image_thre (float): The threshold to filter the detection box
Default: 0.0.
"""
def __init__(self,
dataset_dir,
image_dir,
anno_path,
num_joints,
trainsize,
transform=[],
bbox_file=None,
use_gt_bbox=True,
pixel_std=200,
image_thre=0.0):
super().__init__(dataset_dir, image_dir, anno_path, num_joints,
transform)
self.bbox_file = bbox_file
self.use_gt_bbox = use_gt_bbox
self.trainsize = trainsize
self.pixel_std = pixel_std
self.image_thre = image_thre
self.dataset_name = 'coco'
def parse_dataset(self):
if self.use_gt_bbox:
self.db = self._load_coco_keypoint_annotations()
else:
self.db = self._load_coco_person_detection_results()
def _load_coco_keypoint_annotations(self):
coco = COCO(self.get_anno())
img_ids = coco.getImgIds()
gt_db = []
for index in img_ids:
im_ann = coco.loadImgs(index)[0]
width = im_ann['width']
height = im_ann['height']
file_name = im_ann['file_name']
im_id = int(im_ann["id"])
annIds = coco.getAnnIds(imgIds=index, iscrowd=False)
objs = coco.loadAnns(annIds)
valid_objs = []
for obj in objs:
x, y, w, h = obj['bbox']
x1 = np.max((0, x))
y1 = np.max((0, y))
x2 = np.min((width - 1, x1 + np.max((0, w - 1))))
y2 = np.min((height - 1, y1 + np.max((0, h - 1))))
if obj['area'] > 0 and x2 >= x1 and y2 >= y1:
obj['clean_bbox'] = [x1, y1, x2 - x1, y2 - y1]
valid_objs.append(obj)
objs = valid_objs
rec = []
for obj in objs:
if max(obj['keypoints']) == 0:
continue
joints = np.zeros(
(self.ann_info['num_joints'], 3), dtype=np.float32)
joints_vis = np.zeros(
(self.ann_info['num_joints'], 3), dtype=np.float32)
for ipt in range(self.ann_info['num_joints']):
joints[ipt, 0] = obj['keypoints'][ipt * 3 + 0]
joints[ipt, 1] = obj['keypoints'][ipt * 3 + 1]
joints[ipt, 2] = 0
t_vis = obj['keypoints'][ipt * 3 + 2]
if t_vis > 1:
t_vis = 1
joints_vis[ipt, 0] = t_vis
joints_vis[ipt, 1] = t_vis
joints_vis[ipt, 2] = 0
center, scale = self._box2cs(obj['clean_bbox'][:4])
rec.append({
'image_file': os.path.join(self.img_prefix, file_name),
'center': center,
'scale': scale,
'joints': joints,
'joints_vis': joints_vis,
'im_id': im_id,
})
gt_db.extend(rec)
return gt_db
def _box2cs(self, box):
x, y, w, h = box[:4]
center = np.zeros((2), dtype=np.float32)
center[0] = x + w * 0.5
center[1] = y + h * 0.5
aspect_ratio = self.trainsize[0] * 1.0 / self.trainsize[1]
if w > aspect_ratio * h:
h = w * 1.0 / aspect_ratio
elif w < aspect_ratio * h:
w = h * aspect_ratio
scale = np.array(
[w * 1.0 / self.pixel_std, h * 1.0 / self.pixel_std],
dtype=np.float32)
if center[0] != -1:
scale = scale * 1.25
return center, scale
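# --- Worked example for _box2cs (illustrative numbers, assuming
# trainsize=[192, 256] as (w, h) and the default pixel_std=200). The box is
# padded to the training aspect ratio (192/256 = 0.75), divided by
# pixel_std, then enlarged by 1.25x:
#   box = [50, 40, 60, 100]
#   center = (50 + 60 * 0.5, 40 + 100 * 0.5)      # -> (80.0, 90.0)
#   # w (60) < 0.75 * h (75), so w is widened to 75
#   scale = [75 / 200 * 1.25, 100 / 200 * 1.25]   # -> [0.46875, 0.625]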
def _load_coco_person_detection_results(self):
all_boxes = None
bbox_file_path = os.path.join(self.dataset_dir, self.bbox_file)
with open(bbox_file_path, 'r') as f:
all_boxes = json.load(f)
if not all_boxes:
print('=> Failed to load %s!' % bbox_file_path)
return None
kpt_db = []
for n_img in range(0, len(all_boxes)):
det_res = all_boxes[n_img]
if det_res['category_id'] != 1:
continue
file_name = det_res[
'filename'] if 'filename' in det_res else '%012d.jpg' % det_res[
'image_id']
img_name = os.path.join(self.img_prefix, file_name)
box = det_res['bbox']
score = det_res['score']
im_id = int(det_res['image_id'])
if score < self.image_thre:
continue
center, scale = self._box2cs(box)
joints = np.zeros((self.ann_info['num_joints'], 3), dtype=np.float32)
joints_vis = np.ones(
(self.ann_info['num_joints'], 3), dtype=np.float32)
kpt_db.append({
'image_file': img_name,
'im_id': im_id,
'center': center,
'scale': scale,
'score': score,
'joints': joints,
'joints_vis': joints_vis,
})
return kpt_db
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import traceback
import six
import sys
import numpy as np
from paddle.io import DataLoader, DistributedBatchSampler
from paddle.fluid.dataloader.collate import default_collate_fn
from lib.utils.workspace import register
from . import transform
from lib.utils.logger import setup_logger
logger = setup_logger('reader')
MAIN_PID = os.getpid()
__all__ = [
'Compose', 'BatchCompose', 'BaseDataLoader', 'TrainReader', 'EvalReader',
'TestReader'
]
class Compose(object):
def __init__(self, transforms, num_classes=80):
self.transforms = transforms
self.transforms_cls = []
for t in self.transforms:
for k, v in t.items():
op_cls = getattr(transform, k)
f = op_cls(**v)
if hasattr(f, 'num_classes'):
f.num_classes = num_classes
self.transforms_cls.append(f)
def __call__(self, data):
for f in self.transforms_cls:
try:
data = f(data)
except Exception as e:
stack_info = traceback.format_exc()
logger.warning("fail to map sample transform [{}] "
"with error: {} and stack:\n{}".format(
f, e, str(stack_info)))
raise e
return data
class BatchCompose(Compose):
def __init__(self, transforms, num_classes=80, collate_batch=True):
super(BatchCompose, self).__init__(transforms, num_classes)
self.collate_batch = collate_batch
def __call__(self, data):
for f in self.transforms_cls:
try:
data = f(data)
except Exception as e:
stack_info = traceback.format_exc()
logger.warning("fail to map batch transform [{}] "
"with error: {} and stack:\n{}".format(
f, e, str(stack_info)))
raise e
# remove keys which are not needed by the model
extra_key = ['h', 'w', 'flipped']
for k in extra_key:
for sample in data:
if k in sample:
sample.pop(k)
# batch data; if a user-defined batch function is needed,
# it is used here
if self.collate_batch:
batch_data = default_collate_fn(data)
else:
batch_data = {}
for k in data[0].keys():
tmp_data = []
for i in range(len(data)):
tmp_data.append(data[i][k])
if 'gt_' not in k and 'is_crowd' not in k and 'difficult' not in k:
tmp_data = np.stack(tmp_data, axis=0)
batch_data[k] = tmp_data
return batch_data
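# --- Behavior note (illustrative). With collate_batch=False, fields whose
# keys contain 'gt_', 'is_crowd' or 'difficult' stay as Python lists, since
# their lengths differ across samples, while all other fields are stacked
# into batched ndarrays:
#   data = [{'image': np.zeros((3, 2, 2)), 'gt_bbox': np.zeros((5, 4))},
#           {'image': np.zeros((3, 2, 2)), 'gt_bbox': np.zeros((2, 4))}]
#   # -> batch_data['image'].shape == (2, 3, 2, 2)
#   # -> batch_data['gt_bbox'] is a list of two arrays, (5, 4) and (2, 4)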
SIZE_UNIT = ['K', 'M', 'G', 'T']
SHM_QUERY_CMD = 'df -h'
SHM_KEY = 'shm'
SHM_DEFAULT_MOUNT = '/dev/shm'
def _parse_size_in_M(size_str):
num, unit = size_str[:-1], size_str[-1]
assert unit in SIZE_UNIT, \
"unknown shm size unit {}".format(unit)
return float(num) * \
(1024 ** (SIZE_UNIT.index(unit) - 1))
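# --- Worked example (illustrative). SIZE_UNIT.index(unit) - 1 selects the
# power of 1024 relative to megabytes:
#   _parse_size_in_M('512M')   # -> 512.0     (512 * 1024 ** 0)
#   _parse_size_in_M('4G')     # -> 4096.0    (4 * 1024 ** 1)
#   _parse_size_in_M('1T')     # -> 1048576.0 (1 * 1024 ** 2)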
def _get_shared_memory_size_in_M():
try:
df_infos = os.popen(SHM_QUERY_CMD).readlines()
except Exception:
return None
else:
shm_infos = []
for df_info in df_infos:
info = df_info.strip()
if info.find(SHM_KEY) >= 0:
shm_infos.append(info.split())
if len(shm_infos) == 0:
return None
elif len(shm_infos) == 1:
return _parse_size_in_M(shm_infos[0][3])
else:
default_mount_infos = [
si for si in shm_infos if si[-1] == SHM_DEFAULT_MOUNT
]
if default_mount_infos:
return _parse_size_in_M(default_mount_infos[0][3])
else:
return max([_parse_size_in_M(si[3]) for si in shm_infos])
class BaseDataLoader(object):
"""
Base DataLoader implementation for detection models
Args:
sample_transforms (list): a list of transforms to perform
on each sample
batch_transforms (list): a list of transforms to perform
on batch
batch_size (int): batch size for batch collating, default 1.
shuffle (bool): whether to shuffle samples
drop_last (bool): whether to drop the last incomplete batch,
default False
num_classes (int): class number of dataset, default 80
collate_batch (bool): whether to collate batch in dataloader.
If set to True, the samples will be collated into batches
according to the batch size. Otherwise, the ground-truth fields
will not be collated, which is used when the number of
ground-truths differs across samples.
use_shared_memory (bool): whether to use shared memory to
accelerate data loading; enable this only if you are sure
that the shared memory size of your OS is larger than the
memory cost of the model's input data. Note that shared
memory will be automatically disabled if the available shared
memory is less than 1G, which is not enough for detection
models. Default False.
"""
def __init__(self,
sample_transforms=[],
batch_transforms=[],
batch_size=1,
shuffle=False,
drop_last=False,
num_classes=80,
collate_batch=True,
use_shared_memory=False,
**kwargs):
# sample transform
self._sample_transforms = Compose(
sample_transforms, num_classes=num_classes)
# batch transform
self._batch_transforms = BatchCompose(batch_transforms, num_classes,
collate_batch)
self.batch_size = batch_size
self.shuffle = shuffle
self.drop_last = drop_last
self.use_shared_memory = use_shared_memory
self.kwargs = kwargs
def __call__(self,
dataset,
worker_num,
batch_sampler=None,
return_list=False):
self.dataset = dataset
self.dataset.check_or_download_dataset()
self.dataset.parse_dataset()
# get data
self.dataset.set_transform(self._sample_transforms)
# set kwargs
self.dataset.set_kwargs(**self.kwargs)
# batch sampler
if batch_sampler is None:
self._batch_sampler = DistributedBatchSampler(
self.dataset,
batch_size=self.batch_size,
shuffle=self.shuffle,
drop_last=self.drop_last)
else:
self._batch_sampler = batch_sampler
# DataLoader does not start sub-processes on Windows and macOS,
# so there is no need to use shared memory
use_shared_memory = self.use_shared_memory and \
sys.platform not in ['win32', 'darwin']
# check whether the shared memory size is bigger than 1G (1024M)
if use_shared_memory:
shm_size = _get_shared_memory_size_in_M()
if shm_size is not None and shm_size < 1024.:
logger.warning("Shared memory size is less than 1G, "
"disable shared_memory in DataLoader")
use_shared_memory = False
self.dataloader = DataLoader(
dataset=self.dataset,
batch_sampler=self._batch_sampler,
collate_fn=self._batch_transforms,
num_workers=worker_num,
return_list=return_list,
use_shared_memory=use_shared_memory)
self.loader = iter(self.dataloader)
return self
def __len__(self):
return len(self._batch_sampler)
def __iter__(self):
return self
def __next__(self):
try:
return next(self.loader)
except StopIteration:
self.loader = iter(self.dataloader)
six.reraise(*sys.exc_info())
def next(self):
# python2 compatibility
return self.__next__()
@register
class TrainReader(BaseDataLoader):
__shared__ = ['num_classes']
def __init__(self,
sample_transforms=[],
batch_transforms=[],
batch_size=1,
shuffle=True,
drop_last=True,
num_classes=80,
collate_batch=True,
**kwargs):
super(TrainReader, self).__init__(sample_transforms, batch_transforms,
batch_size, shuffle, drop_last,
num_classes, collate_batch, **kwargs)
@register
class EvalReader(BaseDataLoader):
__shared__ = ['num_classes']
def __init__(self,
sample_transforms=[],
batch_transforms=[],
batch_size=1,
shuffle=False,
drop_last=True,
num_classes=80,
**kwargs):
super(EvalReader, self).__init__(sample_transforms, batch_transforms,
batch_size, shuffle, drop_last,
num_classes, **kwargs)
@register
class TestReader(BaseDataLoader):
__shared__ = ['num_classes']
def __init__(self,
sample_transforms=[],
batch_transforms=[],
batch_size=1,
shuffle=False,
drop_last=False,
num_classes=80,
**kwargs):
super(TestReader, self).__init__(sample_transforms, batch_transforms,
batch_size, shuffle, drop_last,
num_classes, **kwargs)
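# --- Usage sketch (illustrative; the operator name 'Decode' is an
# assumption in the style of PaddleDetection configs, not necessarily
# present in this repo's transform module):
#   loader = TrainReader(
#       sample_transforms=[{'Decode': {}}],
#       batch_size=2,
#       shuffle=True)
#   train_loader = loader(dataset, worker_num=2)  # dataset: a DetDataset
#   # for batch in train_loader: ...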
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import operators
from . import keypoint_operators
from .operators import *
from .keypoint_operators import *
__all__ = []
__all__ += registered_ops
__all__ += keypoint_operators.__all__
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Reference:
# https://github.com/tensorflow/tpu/blob/master/models/official/detection/utils/autoaugment_utils.py
"""AutoAugment util file."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import inspect
import math
from PIL import Image, ImageEnhance
import numpy as np
import cv2
from copy import deepcopy
# This signifies the max integer that the controller RNN could predict for the
# augmentation scheme.
_MAX_LEVEL = 10.
# Represents an invalid bounding box that is used for checking for padding
# lists of bounding box coordinates for a few augmentation operations
_INVALID_BOX = [[-1.0, -1.0, -1.0, -1.0]]
def policy_v0():
"""Autoaugment policy that was used in AutoAugment Detection Paper."""
# Each tuple is an augmentation operation of the form
# (operation, probability, magnitude). Each element in policy is a
# sub-policy that will be applied sequentially on the image.
policy = [
[('TranslateX_BBox', 0.6, 4), ('Equalize', 0.8, 10)],
[('TranslateY_Only_BBoxes', 0.2, 2), ('Cutout', 0.8, 8)],
[('Sharpness', 0.0, 8), ('ShearX_BBox', 0.4, 0)],
[('ShearY_BBox', 1.0, 2), ('TranslateY_Only_BBoxes', 0.6, 6)],
[('Rotate_BBox', 0.6, 10), ('Color', 1.0, 6)],
]
return policy
def policy_v1():
"""Autoaugment policy that was used in AutoAugment Detection Paper."""
# Each tuple is an augmentation operation of the form
# (operation, probability, magnitude). Each element in policy is a
# sub-policy that will be applied sequentially on the image.
policy = [
[('TranslateX_BBox', 0.6, 4), ('Equalize', 0.8, 10)],
[('TranslateY_Only_BBoxes', 0.2, 2), ('Cutout', 0.8, 8)],
[('Sharpness', 0.0, 8), ('ShearX_BBox', 0.4, 0)],
[('ShearY_BBox', 1.0, 2), ('TranslateY_Only_BBoxes', 0.6, 6)],
[('Rotate_BBox', 0.6, 10), ('Color', 1.0, 6)],
[('Color', 0.0, 0), ('ShearX_Only_BBoxes', 0.8, 4)],
[('ShearY_Only_BBoxes', 0.8, 2), ('Flip_Only_BBoxes', 0.0, 10)],
[('Equalize', 0.6, 10), ('TranslateX_BBox', 0.2, 2)],
[('Color', 1.0, 10), ('TranslateY_Only_BBoxes', 0.4, 6)],
[('Rotate_BBox', 0.8, 10), ('Contrast', 0.0, 10)],
[('Cutout', 0.2, 2), ('Brightness', 0.8, 10)],
[('Color', 1.0, 6), ('Equalize', 1.0, 2)],
[('Cutout_Only_BBoxes', 0.4, 6), ('TranslateY_Only_BBoxes', 0.8, 2)],
[('Color', 0.2, 8), ('Rotate_BBox', 0.8, 10)],
[('Sharpness', 0.4, 4), ('TranslateY_Only_BBoxes', 0.0, 4)],
[('Sharpness', 1.0, 4), ('SolarizeAdd', 0.4, 4)],
[('Rotate_BBox', 1.0, 8), ('Sharpness', 0.2, 8)],
[('ShearY_BBox', 0.6, 10), ('Equalize_Only_BBoxes', 0.6, 8)],
[('ShearX_BBox', 0.2, 6), ('TranslateY_Only_BBoxes', 0.2, 10)],
[('SolarizeAdd', 0.6, 8), ('Brightness', 0.8, 10)],
]
return policy
def policy_vtest():
"""Autoaugment test policy for debugging."""
# Each tuple is an augmentation operation of the form
# (operation, probability, magnitude). Each element in policy is a
# sub-policy that will be applied sequentially on the image.
policy = [[('TranslateX_BBox', 1.0, 4), ('Equalize', 1.0, 10)], ]
return policy
def policy_v2():
"""Additional policy that performs well on object detection."""
# Each tuple is an augmentation operation of the form
# (operation, probability, magnitude). Each element in policy is a
# sub-policy that will be applied sequentially on the image.
policy = [
[('Color', 0.0, 6), ('Cutout', 0.6, 8), ('Sharpness', 0.4, 8)],
[('Rotate_BBox', 0.4, 8), ('Sharpness', 0.4, 2),
('Rotate_BBox', 0.8, 10)],
[('TranslateY_BBox', 1.0, 8), ('AutoContrast', 0.8, 2)],
[('AutoContrast', 0.4, 6), ('ShearX_BBox', 0.8, 8),
('Brightness', 0.0, 10)],
[('SolarizeAdd', 0.2, 6), ('Contrast', 0.0, 10),
('AutoContrast', 0.6, 0)],
[('Cutout', 0.2, 0), ('Solarize', 0.8, 8), ('Color', 1.0, 4)],
[('TranslateY_BBox', 0.0, 4), ('Equalize', 0.6, 8),
('Solarize', 0.0, 10)],
[('TranslateY_BBox', 0.2, 2), ('ShearY_BBox', 0.8, 8),
('Rotate_BBox', 0.8, 8)],
[('Cutout', 0.8, 8), ('Brightness', 0.8, 8), ('Cutout', 0.2, 2)],
[('Color', 0.8, 4), ('TranslateY_BBox', 1.0, 6),
('Rotate_BBox', 0.6, 6)],
[('Rotate_BBox', 0.6, 10), ('BBox_Cutout', 1.0, 4),
('Cutout', 0.2, 8)],
[('Rotate_BBox', 0.0, 0), ('Equalize', 0.6, 6),
('ShearY_BBox', 0.6, 8)],
[('Brightness', 0.8, 8), ('AutoContrast', 0.4, 2),
('Brightness', 0.2, 2)],
[('TranslateY_BBox', 0.4, 8), ('Solarize', 0.4, 6),
('SolarizeAdd', 0.2, 10)],
[('Contrast', 1.0, 10), ('SolarizeAdd', 0.2, 8), ('Equalize', 0.2, 4)],
]
return policy
def policy_v3():
""""Additional policy that performs well on object detection."""
# Each tuple is an augmentation operation of the form
# (operation, probability, magnitude). Each element in policy is a
# sub-policy that will be applied sequentially on the image.
policy = [
[('Posterize', 0.8, 2), ('TranslateX_BBox', 1.0, 8)],
[('BBox_Cutout', 0.2, 10), ('Sharpness', 1.0, 8)],
[('Rotate_BBox', 0.6, 8), ('Rotate_BBox', 0.8, 10)],
[('Equalize', 0.8, 10), ('AutoContrast', 0.2, 10)],
[('SolarizeAdd', 0.2, 2), ('TranslateY_BBox', 0.2, 8)],
[('Sharpness', 0.0, 2), ('Color', 0.4, 8)],
[('Equalize', 1.0, 8), ('TranslateY_BBox', 1.0, 8)],
[('Posterize', 0.6, 2), ('Rotate_BBox', 0.0, 10)],
[('AutoContrast', 0.6, 0), ('Rotate_BBox', 1.0, 6)],
[('Equalize', 0.0, 4), ('Cutout', 0.8, 10)],
[('Brightness', 1.0, 2), ('TranslateY_BBox', 1.0, 6)],
[('Contrast', 0.0, 2), ('ShearY_BBox', 0.8, 0)],
[('AutoContrast', 0.8, 10), ('Contrast', 0.2, 10)],
[('Rotate_BBox', 1.0, 10), ('Cutout', 1.0, 10)],
[('SolarizeAdd', 0.8, 6), ('Equalize', 0.8, 8)],
]
return policy
def _equal(val1, val2, eps=1e-8):
return abs(val1 - val2) <= eps
def blend(image1, image2, factor):
"""Blend image1 and image2 using 'factor'.
Factor can be above 0.0. A value of 0.0 means only image1 is used.
A value of 1.0 means only image2 is used. A value between 0.0 and
1.0 means we linearly interpolate the pixel values between the two
images. A value greater than 1.0 "extrapolates" the difference
between the two pixel values, and we clip the results to values
between 0 and 255.
Args:
image1: An image Tensor of type uint8.
image2: An image Tensor of type uint8.
factor: A floating point value above 0.0.
Returns:
A blended image Tensor of type uint8.
"""
if factor == 0.0:
return image1
if factor == 1.0:
return image2
image1 = image1.astype(np.float32)
image2 = image2.astype(np.float32)
difference = image2 - image1
scaled = factor * difference
# Do addition in float.
temp = image1 + scaled
# Interpolate
if factor > 0.0 and factor < 1.0:
# Interpolation means we always stay within 0 and 255.
return temp.astype(np.uint8)
# Extrapolate:
#
# We need to clip and then cast.
return np.clip(temp, a_min=0, a_max=255).astype(np.uint8)
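# --- Worked example (illustrative). With factor in (0, 1) blend()
# interpolates; outside that range it extrapolates and clips to [0, 255]:
#   a = np.full((1, 1, 3), 100, dtype=np.uint8)
#   b = np.full((1, 1, 3), 200, dtype=np.uint8)
#   blend(a, b, 0.5)   # -> pixels of value 150 (midpoint)
#   blend(a, b, 2.0)   # -> 100 + 2 * (200 - 100) = 300, clipped to 255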
def cutout(image, pad_size, replace=0):
"""Apply cutout (https://arxiv.org/abs/1708.04552) to image.
This operation applies a (2*pad_size x 2*pad_size) mask of zeros to
a random location within `img`. The pixel values filled in will be of the
value `replace`. The location where the mask will be applied is chosen
uniformly at random over the whole image.
Args:
image: An image Tensor of type uint8.
pad_size: Specifies half the side length of the zero mask that is
applied to the image. The mask will be of size
(2*pad_size x 2*pad_size).
replace: What pixel value to fill in the image in the area that has
the cutout mask applied to it.
Returns:
An image Tensor that is of type uint8.
Example:
img = cv2.imread("/home/vis/gry/train/img_data/test.jpg", cv2.IMREAD_COLOR)
new_img = cutout(img, pad_size=50, replace=0)
"""
image_height, image_width = image.shape[0], image.shape[1]
cutout_center_height = np.random.randint(low=0, high=image_height)
cutout_center_width = np.random.randint(low=0, high=image_width)
lower_pad = np.maximum(0, cutout_center_height - pad_size)
upper_pad = np.maximum(0, image_height - cutout_center_height - pad_size)
left_pad = np.maximum(0, cutout_center_width - pad_size)
right_pad = np.maximum(0, image_width - cutout_center_width - pad_size)
cutout_shape = [
image_height - (lower_pad + upper_pad),
image_width - (left_pad + right_pad)
]
padding_dims = [[lower_pad, upper_pad], [left_pad, right_pad]]
mask = np.pad(np.zeros(
cutout_shape, dtype=image.dtype),
padding_dims,
'constant',
constant_values=1)
mask = np.expand_dims(mask, -1)
mask = np.tile(mask, [1, 1, 3])
image = np.where(
np.equal(mask, 0),
np.ones_like(
image, dtype=image.dtype) * replace,
image)
return image.astype(np.uint8)
def solarize(image, threshold=128):
# For each pixel in the image, select the pixel
# if the value is less than the threshold.
# Otherwise, subtract 255 from the pixel.
return np.where(image < threshold, image, 255 - image)
def solarize_add(image, addition=0, threshold=128):
# For each pixel in the image less than threshold
# we add 'addition' amount to it and then clip the
# pixel value to be between 0 and 255. The value
# of 'addition' is between -128 and 128.
added_image = image.astype(np.int64) + addition
added_image = np.clip(added_image, a_min=0, a_max=255).astype(np.uint8)
return np.where(image < threshold, added_image, image)
def color(image, factor):
"""use cv2 to deal"""
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
degenerate = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
return blend(degenerate, image, factor)
# refer to https://github.com/4uiiurz1/pytorch-auto-augment/blob/024b2eac4140c38df8342f09998e307234cafc80/auto_augment.py#L197
def contrast(img, factor):
img = ImageEnhance.Contrast(Image.fromarray(img)).enhance(factor)
return np.array(img)
def brightness(image, factor):
"""Equivalent of PIL Brightness."""
degenerate = np.zeros_like(image)
return blend(degenerate, image, factor)
def posterize(image, bits):
"""Equivalent of PIL Posterize."""
shift = 8 - bits
return np.left_shift(np.right_shift(image, shift), shift)
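# --- Worked example (illustrative). posterize() keeps only the top `bits`
# bits of each channel by shifting the low bits away and back:
#   posterize(np.uint8(173), bits=4)   # 0b10101101 -> 0b10100000 == 160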
def rotate(image, degrees, replace):
"""Rotates the image by degrees either clockwise or counterclockwise.
Args:
image: An image Tensor of type uint8.
degrees: Float, a scalar angle in degrees to rotate all images by. If
degrees is positive the image will be rotated clockwise otherwise it will
be rotated counterclockwise.
replace: A one or three value 1D tensor to fill empty pixels caused by
the rotate operation.
Returns:
The rotated version of image.
"""
image = wrap(image)
image = Image.fromarray(image)
image = image.rotate(degrees)
image = np.array(image, dtype=np.uint8)
return unwrap(image, replace)
def random_shift_bbox(image,
bbox,
pixel_scaling,
replace,
new_min_bbox_coords=None):
"""Move the bbox and the image content to a slightly new random location.
Args:
image: 3D uint8 Tensor.
bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
of type float that represents the normalized coordinates between 0 and 1.
The potential values for the new min corner of the bbox will be between
[old_min - pixel_scaling * bbox_height/2,
old_min + pixel_scaling * bbox_height/2].
pixel_scaling: A float between 0 and 1 that specifies the pixel range
that the new bbox location will be sampled from.
replace: A one or three value 1D tensor to fill empty pixels.
new_min_bbox_coords: If not None, then this is a tuple that specifies the
(min_y, min_x) coordinates of the new bbox. Normally this is randomly
specified, but this allows it to be manually set. The coordinates are
the absolute coordinates between 0 and image height/width and are int32.
Returns:
The new image that will have the shifted bbox location in it along with
the new bbox that contains the new coordinates.
"""
# Obtains image height and width and create helper clip functions.
image_height, image_width = image.shape[0], image.shape[1]
image_height = float(image_height)
image_width = float(image_width)
def clip_y(val):
return np.clip(val, a_min=0, a_max=image_height - 1).astype(np.int32)
def clip_x(val):
return np.clip(val, a_min=0, a_max=image_width - 1).astype(np.int32)
# Convert bbox to pixel coordinates.
min_y = int(image_height * bbox[0])
min_x = int(image_width * bbox[1])
max_y = clip_y(image_height * bbox[2])
max_x = clip_x(image_width * bbox[3])
bbox_height, bbox_width = (max_y - min_y + 1, max_x - min_x + 1)
image_height = int(image_height)
image_width = int(image_width)
# Select the new min/max bbox ranges that are used for sampling the
# new min x/y coordinates of the shifted bbox.
minval_y = clip_y(min_y - np.int32(pixel_scaling * float(bbox_height) /
2.0))
maxval_y = clip_y(min_y + np.int32(pixel_scaling * float(bbox_height) /
2.0))
minval_x = clip_x(min_x - np.int32(pixel_scaling * float(bbox_width) /
2.0))
maxval_x = clip_x(min_x + np.int32(pixel_scaling * float(bbox_width) /
2.0))
# Sample and calculate the new unclipped min/max coordinates of the new bbox.
if new_min_bbox_coords is None:
unclipped_new_min_y = np.random.randint(
low=minval_y, high=maxval_y, dtype=np.int32)
unclipped_new_min_x = np.random.randint(
low=minval_x, high=maxval_x, dtype=np.int32)
else:
unclipped_new_min_y, unclipped_new_min_x = (
clip_y(new_min_bbox_coords[0]), clip_x(new_min_bbox_coords[1]))
unclipped_new_max_y = unclipped_new_min_y + bbox_height - 1
unclipped_new_max_x = unclipped_new_min_x + bbox_width - 1
# Determine if any of the new bbox was shifted outside the current image.
# This is used for determining if any of the original bbox content should be
# discarded.
new_min_y, new_min_x, new_max_y, new_max_x = (
clip_y(unclipped_new_min_y), clip_x(unclipped_new_min_x),
clip_y(unclipped_new_max_y), clip_x(unclipped_new_max_x))
shifted_min_y = (new_min_y - unclipped_new_min_y) + min_y
shifted_max_y = max_y - (unclipped_new_max_y - new_max_y)
shifted_min_x = (new_min_x - unclipped_new_min_x) + min_x
shifted_max_x = max_x - (unclipped_new_max_x - new_max_x)
# Create the new bbox tensor by converting pixel integer values to floats.
new_bbox = np.stack([
float(new_min_y) / float(image_height), float(new_min_x) /
float(image_width), float(new_max_y) / float(image_height),
float(new_max_x) / float(image_width)
])
# Copy the contents in the bbox and fill the old bbox location
# with the replace value.
bbox_content = image[shifted_min_y:shifted_max_y + 1, shifted_min_x:
shifted_max_x + 1, :]
def mask_and_add_image(min_y_, min_x_, max_y_, max_x_, mask,
content_tensor, image_):
"""Applies mask to bbox region in image then adds content_tensor to it."""
mask = np.pad(mask, [[min_y_, (image_height - 1) - max_y_],
[min_x_, (image_width - 1) - max_x_], [0, 0]],
'constant',
constant_values=1)
content_tensor = np.pad(content_tensor,
[[min_y_, (image_height - 1) - max_y_],
[min_x_, (image_width - 1) - max_x_], [0, 0]],
'constant',
constant_values=0)
return image_ * mask + content_tensor
# Zero out original bbox location.
mask = np.zeros_like(image)[min_y:max_y + 1, min_x:max_x + 1, :]
grey_tensor = np.zeros_like(mask) + replace[0]
image = mask_and_add_image(min_y, min_x, max_y, max_x, mask, grey_tensor,
image)
# Fill in bbox content to new bbox location.
mask = np.zeros_like(bbox_content)
image = mask_and_add_image(new_min_y, new_min_x, new_max_y, new_max_x,
mask, bbox_content, image)
return image.astype(np.uint8), new_bbox
def _clip_bbox(min_y, min_x, max_y, max_x):
"""Clip bounding box coordinates between 0 and 1.
Args:
min_y: Normalized bbox coordinate of type float between 0 and 1.
min_x: Normalized bbox coordinate of type float between 0 and 1.
max_y: Normalized bbox coordinate of type float between 0 and 1.
max_x: Normalized bbox coordinate of type float between 0 and 1.
Returns:
Clipped coordinate values between 0 and 1.
"""
min_y = np.clip(min_y, a_min=0, a_max=1.0)
min_x = np.clip(min_x, a_min=0, a_max=1.0)
max_y = np.clip(max_y, a_min=0, a_max=1.0)
max_x = np.clip(max_x, a_min=0, a_max=1.0)
return min_y, min_x, max_y, max_x
def _check_bbox_area(min_y, min_x, max_y, max_x, delta=0.05):
"""Adjusts bbox coordinates to make sure the area is > 0.
Args:
min_y: Normalized bbox coordinate of type float between 0 and 1.
min_x: Normalized bbox coordinate of type float between 0 and 1.
max_y: Normalized bbox coordinate of type float between 0 and 1.
max_x: Normalized bbox coordinate of type float between 0 and 1.
delta: Float, this is used to create a gap of size 2 * delta between
bbox min/max coordinates that are the same on the boundary.
This prevents the bbox from having an area of zero.
Returns:
Tuple of new bbox coordinates between 0 and 1 that will now have a
guaranteed area > 0.
"""
height = max_y - min_y
width = max_x - min_x
def _adjust_bbox_boundaries(min_coord, max_coord):
# Make sure max is never 0 and min is never 1.
max_coord = np.maximum(max_coord, 0.0 + delta)
min_coord = np.minimum(min_coord, 1.0 - delta)
return min_coord, max_coord
if _equal(height, 0):
min_y, max_y = _adjust_bbox_boundaries(min_y, max_y)
if _equal(width, 0):
min_x, max_x = _adjust_bbox_boundaries(min_x, max_x)
return min_y, min_x, max_y, max_x
def _scale_bbox_only_op_probability(prob):
"""Reduce the probability of the bbox-only operation.
Probability is reduced so that we do not distort the content of too many
bounding boxes that are close to each other. The value of 3.0 is a
hyperparameter chosen when designing the autoaugment algorithm and found
empirically to work well.
Args:
prob: Float that is the probability of applying the bbox-only operation.
Returns:
Reduced probability.
"""
return prob / 3.0
def _apply_bbox_augmentation(image, bbox, augmentation_func, *args):
"""Applies augmentation_func to the subsection of image indicated by bbox.
Args:
image: 3D uint8 Tensor.
bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
of type float that represents the normalized coordinates between 0 and 1.
augmentation_func: Augmentation function that will be applied to the
subsection of image.
*args: Additional parameters that will be passed into augmentation_func
when it is called.
Returns:
A modified version of image, where the bbox location in the image will
have `augmentation_func` applied to it.
"""
image_height = image.shape[0]
image_width = image.shape[1]
min_y = int(image_height * bbox[0])
min_x = int(image_width * bbox[1])
max_y = int(image_height * bbox[2])
max_x = int(image_width * bbox[3])
# Clip to be sure the max values do not fall out of range.
max_y = np.minimum(max_y, image_height - 1)
max_x = np.minimum(max_x, image_width - 1)
# Get the sub-tensor that is the image within the bounding box region.
bbox_content = image[min_y:max_y + 1, min_x:max_x + 1, :]
# Apply the augmentation function to the bbox portion of the image.
augmented_bbox_content = augmentation_func(bbox_content, *args)
# Pad the augmented_bbox_content and the mask to match the shape of original
# image.
augmented_bbox_content = np.pad(
augmented_bbox_content, [[min_y, (image_height - 1) - max_y],
[min_x, (image_width - 1) - max_x], [0, 0]],
'constant',
constant_values=0)  # pad with zeros so pixels outside the bbox are preserved
# Create a mask that will be used to zero out a part of the original image.
mask_tensor = np.zeros_like(bbox_content)
mask_tensor = np.pad(mask_tensor,
[[min_y, (image_height - 1) - max_y],
[min_x, (image_width - 1) - max_x], [0, 0]],
'constant',
constant_values=1)
# Replace the old bbox content with the new augmented content.
image = image * mask_tensor + augmented_bbox_content
return image.astype(np.uint8)
def _concat_bbox(bbox, bboxes):
"""Helper function that concates bbox to bboxes along the first dimension."""
# Note if all elements in bboxes are -1 (_INVALID_BOX), then this means
# we discard bboxes and start the bboxes Tensor with the current bbox.
bboxes_sum_check = np.sum(bboxes)
bbox = np.expand_dims(bbox, 0)
# This check will be true when it is an _INVALID_BOX
if _equal(bboxes_sum_check, -4):
bboxes = bbox
else:
bboxes = np.concatenate([bboxes, bbox], 0)
return bboxes
def _apply_bbox_augmentation_wrapper(image, bbox, new_bboxes, prob,
augmentation_func, func_changes_bbox,
*args):
"""Applies _apply_bbox_augmentation with probability prob.
Args:
image: 3D uint8 Tensor.
bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
of type float that represents the normalized coordinates between 0 and 1.
new_bboxes: 2D Tensor that is a list of the bboxes in the image after they
have been altered by aug_func. These will only be changed when
func_changes_bbox is set to true. Each bbox has 4 elements
(min_y, min_x, max_y, max_x) of type float that are the normalized
bbox coordinates between 0 and 1.
prob: Float that is the probability of applying _apply_bbox_augmentation.
augmentation_func: Augmentation function that will be applied to the
subsection of image.
func_changes_bbox: Boolean. Does augmentation_func return bbox in addition
to image.
*args: Additional parameters that will be passed into augmentation_func
when it is called.
Returns:
A tuple. The first element is a modified version of image, where the bbox
location in the image will have augmentation_func applied to it if it is
chosen to be called with probability `prob`. The second element is a
Tensor of Tensors of length 4 that will contain the altered bbox after
applying augmentation_func.
"""
should_apply_op = (np.random.rand() + prob >= 1)
if func_changes_bbox:
if should_apply_op:
augmented_image, bbox = augmentation_func(image, bbox, *args)
else:
augmented_image, bbox = (image, bbox)
else:
if should_apply_op:
augmented_image = _apply_bbox_augmentation(
image, bbox, augmentation_func, *args)
else:
augmented_image = image
new_bboxes = _concat_bbox(bbox, new_bboxes)
return augmented_image.astype(np.uint8), new_bboxes
def _apply_multi_bbox_augmentation(image, bboxes, prob, aug_func,
func_changes_bbox, *args):
"""Applies aug_func to the image for each bbox in bboxes.
Args:
image: 3D uint8 Tensor.
bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox
has 4 elements (min_y, min_x, max_y, max_x) of type float.
prob: Float that is the probability of applying aug_func to a specific
bounding box within the image.
aug_func: Augmentation function that will be applied to the
subsections of image indicated by the bbox values in bboxes.
func_changes_bbox: Boolean. Does augmentation_func return bbox in addition
to image.
*args: Additional parameters that will be passed into augmentation_func
when it is called.
Returns:
A modified version of image, where each bbox location in the image will
have augmentation_func applied to it if it is chosen to be called with
probability prob independently across all bboxes. Also the final
bboxes are returned that will be unchanged if func_changes_bbox is set to
false and if true, the new altered ones will be returned.
"""
# Will keep track of the new altered bboxes after aug_func is repeatedly
# applied. The -1 values are a dummy value and this first Tensor will be
# removed upon appending the first real bbox.
new_bboxes = np.array(_INVALID_BOX)
# If the bboxes are empty, then just give it _INVALID_BOX. The result
# will be thrown away.
bboxes = np.array((_INVALID_BOX)) if bboxes.size == 0 else bboxes
assert bboxes.shape[1] == 4, "bboxes.shape[1] must be 4"
# pylint:disable=g-long-lambda
# pylint:disable=line-too-long
wrapped_aug_func = lambda _image, bbox, _new_bboxes: _apply_bbox_augmentation_wrapper(_image, bbox, _new_bboxes, prob, aug_func, func_changes_bbox, *args)
# pylint:enable=g-long-lambda
# pylint:enable=line-too-long
# Setup the while_loop.
num_bboxes = bboxes.shape[0] # We loop until we go over all bboxes.
idx = 0 # Counter for the while loop.
# Conditional function when to end the loop once we go over all bboxes
# images_and_bboxes contain (_image, _new_bboxes)
def cond(_idx, _images_and_bboxes):
return _idx < num_bboxes
# The TF reference implementation shuffles the bboxes here so that the
# augmentation order is not deterministic when aug_func does not change
# them. We cannot shuffle in this port because the bbox array carries no
# class information at this point, so the original order is preserved.
loop_bboxes = deepcopy(bboxes)
# Main function of while_loop where we repeatedly apply augmentation on the
# bboxes in the image.
# pylint:disable=g-long-lambda
body = lambda _idx, _images_and_bboxes: [
_idx + 1, wrapped_aug_func(_images_and_bboxes[0],
loop_bboxes[_idx],
_images_and_bboxes[1])]
while (cond(idx, (image, new_bboxes))):
idx, (image, new_bboxes) = body(idx, (image, new_bboxes))
# Either return the altered bboxes or the original ones depending on if
# we altered them in anyway.
if func_changes_bbox:
final_bboxes = new_bboxes
else:
final_bboxes = bboxes
return image, final_bboxes
def _apply_multi_bbox_augmentation_wrapper(image, bboxes, prob, aug_func,
func_changes_bbox, *args):
"""Checks to be sure num bboxes > 0 before calling inner function."""
num_bboxes = len(bboxes)
new_image = deepcopy(image)
new_bboxes = deepcopy(bboxes)
if num_bboxes != 0:
new_image, new_bboxes = _apply_multi_bbox_augmentation(
new_image, new_bboxes, prob, aug_func, func_changes_bbox, *args)
return new_image, new_bboxes
def rotate_only_bboxes(image, bboxes, prob, degrees, replace):
"""Apply rotate to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(
image, bboxes, prob, rotate, func_changes_bbox, degrees, replace)
def shear_x_only_bboxes(image, bboxes, prob, level, replace):
"""Apply shear_x to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(
image, bboxes, prob, shear_x, func_changes_bbox, level, replace)
def shear_y_only_bboxes(image, bboxes, prob, level, replace):
"""Apply shear_y to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(
image, bboxes, prob, shear_y, func_changes_bbox, level, replace)
def translate_x_only_bboxes(image, bboxes, prob, pixels, replace):
"""Apply translate_x to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(
image, bboxes, prob, translate_x, func_changes_bbox, pixels, replace)
def translate_y_only_bboxes(image, bboxes, prob, pixels, replace):
"""Apply translate_y to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(
image, bboxes, prob, translate_y, func_changes_bbox, pixels, replace)
def flip_only_bboxes(image, bboxes, prob):
"""Apply flip_lr to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(image, bboxes, prob,
np.fliplr, func_changes_bbox)
def solarize_only_bboxes(image, bboxes, prob, threshold):
"""Apply solarize to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(
image, bboxes, prob, solarize, func_changes_bbox, threshold)
def equalize_only_bboxes(image, bboxes, prob):
"""Apply equalize to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(image, bboxes, prob,
equalize, func_changes_bbox)
def cutout_only_bboxes(image, bboxes, prob, pad_size, replace):
"""Apply cutout to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(
image, bboxes, prob, cutout, func_changes_bbox, pad_size, replace)
def _rotate_bbox(bbox, image_height, image_width, degrees):
"""Rotates the bbox coordinated by degrees.
Args:
bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
of type float that represents the normalized coordinates between 0 and 1.
image_height: Int, height of the image.
image_width: Int, width of the image.
degrees: Float, a scalar angle in degrees to rotate all images by. If
degrees is positive the image will be rotated clockwise otherwise it will
be rotated counterclockwise.
Returns:
A tensor of the same shape as bbox, but now with the rotated coordinates.
"""
image_height, image_width = (float(image_height), float(image_width))
# Convert from degrees to radians.
degrees_to_radians = math.pi / 180.0
radians = degrees * degrees_to_radians
# Translate the bbox to the center of the image and turn the normalized 0-1
# coordinates to absolute pixel locations.
# Y coordinates are made negative as the y axis of images goes down with
# increasing pixel values, so we negate to make sure x axis and y axis points
# are in the traditionally positive direction.
min_y = -int(image_height * (bbox[0] - 0.5))
min_x = int(image_width * (bbox[1] - 0.5))
max_y = -int(image_height * (bbox[2] - 0.5))
max_x = int(image_width * (bbox[3] - 0.5))
coordinates = np.stack([[min_y, min_x], [min_y, max_x], [max_y, min_x],
[max_y, max_x]]).astype(np.float32)
# Rotate the coordinates according to the rotation matrix clockwise if
# radians is positive, else negative
rotation_matrix = np.stack([[math.cos(radians), math.sin(radians)],
[-math.sin(radians), math.cos(radians)]])
new_coords = np.matmul(rotation_matrix,
np.transpose(coordinates)).astype(np.int32)
# Find min/max values and convert them back to normalized 0-1 floats.
min_y = -(float(np.max(new_coords[0, :])) / image_height - 0.5)
min_x = float(np.min(new_coords[1, :])) / image_width + 0.5
max_y = -(float(np.min(new_coords[0, :])) / image_height - 0.5)
max_x = float(np.max(new_coords[1, :])) / image_width + 0.5
    # Clip the bboxes to be sure they fall between [0, 1].
min_y, min_x, max_y, max_x = _clip_bbox(min_y, min_x, max_y, max_x)
min_y, min_x, max_y, max_x = _check_bbox_area(min_y, min_x, max_y, max_x)
return np.stack([min_y, min_x, max_y, max_x])
def rotate_with_bboxes(image, bboxes, degrees, replace):
# Rotate the image.
image = rotate(image, degrees, replace)
# Convert bbox coordinates to pixel values.
image_height, image_width = image.shape[:2]
# pylint:disable=g-long-lambda
wrapped_rotate_bbox = lambda bbox: _rotate_bbox(bbox, image_height, image_width, degrees)
# pylint:enable=g-long-lambda
new_bboxes = np.zeros_like(bboxes)
for idx in range(len(bboxes)):
new_bboxes[idx] = wrapped_rotate_bbox(bboxes[idx])
return image, new_bboxes
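# --- Illustrative usage sketch (added for documentation, not part of the
# original pipeline): rotating an image together with its normalized boxes.
# The random image and box values are placeholders; `rotate` is the image-only
# op defined earlier in this module.
def _demo_rotate_with_bboxes():
    image = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)
    bboxes = np.array([[0.25, 0.25, 0.75, 0.75]], dtype=np.float32)
    # Empty pixels exposed by the rotation are filled with gray (128).
    return rotate_with_bboxes(image, bboxes, degrees=15.0,
                              replace=[128, 128, 128])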
def translate_x(image, pixels, replace):
"""Equivalent of PIL Translate in X dimension."""
image = Image.fromarray(wrap(image))
image = image.transform(image.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0))
return unwrap(np.array(image), replace)
def translate_y(image, pixels, replace):
"""Equivalent of PIL Translate in Y dimension."""
image = Image.fromarray(wrap(image))
image = image.transform(image.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels))
return unwrap(np.array(image), replace)
def _shift_bbox(bbox, image_height, image_width, pixels, shift_horizontal):
"""Shifts the bbox coordinates by pixels.
Args:
bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
of type float that represents the normalized coordinates between 0 and 1.
image_height: Int, height of the image.
image_width: Int, width of the image.
pixels: An int. How many pixels to shift the bbox.
shift_horizontal: Boolean. If true then shift in X dimension else shift in
Y dimension.
Returns:
A tensor of the same shape as bbox, but now with the shifted coordinates.
"""
pixels = int(pixels)
# Convert bbox to integer pixel locations.
min_y = int(float(image_height) * bbox[0])
min_x = int(float(image_width) * bbox[1])
max_y = int(float(image_height) * bbox[2])
max_x = int(float(image_width) * bbox[3])
if shift_horizontal:
min_x = np.maximum(0, min_x - pixels)
max_x = np.minimum(image_width, max_x - pixels)
else:
min_y = np.maximum(0, min_y - pixels)
max_y = np.minimum(image_height, max_y - pixels)
# Convert bbox back to floats.
min_y = float(min_y) / float(image_height)
min_x = float(min_x) / float(image_width)
max_y = float(max_y) / float(image_height)
max_x = float(max_x) / float(image_width)
    # Clip the bboxes to be sure they fall between [0, 1].
min_y, min_x, max_y, max_x = _clip_bbox(min_y, min_x, max_y, max_x)
min_y, min_x, max_y, max_x = _check_bbox_area(min_y, min_x, max_y, max_x)
return np.stack([min_y, min_x, max_y, max_x])
def translate_bbox(image, bboxes, pixels, replace, shift_horizontal):
"""Equivalent of PIL Translate in X/Y dimension that shifts image and bbox.
Args:
image: 3D uint8 Tensor.
bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox
has 4 elements (min_y, min_x, max_y, max_x) of type float with values
between [0, 1].
pixels: An int. How many pixels to shift the image and bboxes
replace: A one or three value 1D tensor to fill empty pixels.
shift_horizontal: Boolean. If true then shift in X dimension else shift in
Y dimension.
Returns:
A tuple containing a 3D uint8 Tensor that will be the result of translating
image by pixels. The second element of the tuple is bboxes, where now
the coordinates will be shifted to reflect the shifted image.
"""
if shift_horizontal:
image = translate_x(image, pixels, replace)
else:
image = translate_y(image, pixels, replace)
# Convert bbox coordinates to pixel values.
image_height, image_width = image.shape[0], image.shape[1]
# pylint:disable=g-long-lambda
wrapped_shift_bbox = lambda bbox: _shift_bbox(bbox, image_height, image_width, pixels, shift_horizontal)
# pylint:enable=g-long-lambda
new_bboxes = deepcopy(bboxes)
num_bboxes = len(bboxes)
for idx in range(num_bboxes):
new_bboxes[idx] = wrapped_shift_bbox(bboxes[idx])
return image.astype(np.uint8), new_bboxes
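# --- Illustrative usage sketch (not part of the original pipeline): a
# horizontal translation shifts the image content and the normalized boxes
# together; the inputs below are placeholders.
def _demo_translate_bbox():
    image = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)
    bboxes = np.array([[0.2, 0.2, 0.8, 0.8]], dtype=np.float32)
    return translate_bbox(image, bboxes, pixels=20,
                          replace=[128, 128, 128], shift_horizontal=True)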
def shear_x(image, level, replace):
"""Equivalent of PIL Shearing in X dimension."""
# Shear parallel to x axis is a projective transform
# with a matrix form of:
# [1 level
# 0 1].
image = Image.fromarray(wrap(image))
image = image.transform(image.size, Image.AFFINE, (1, level, 0, 0, 1, 0))
return unwrap(np.array(image), replace)
def shear_y(image, level, replace):
"""Equivalent of PIL Shearing in Y dimension."""
# Shear parallel to y axis is a projective transform
# with a matrix form of:
# [1 0
# level 1].
image = Image.fromarray(wrap(image))
image = image.transform(image.size, Image.AFFINE, (1, 0, 0, level, 1, 0))
return unwrap(np.array(image), replace)
def _shear_bbox(bbox, image_height, image_width, level, shear_horizontal):
"""Shifts the bbox according to how the image was sheared.
Args:
bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
of type float that represents the normalized coordinates between 0 and 1.
image_height: Int, height of the image.
        image_width: Int, width of the image.
level: Float. How much to shear the image.
shear_horizontal: If true then shear in X dimension else shear in
the Y dimension.
Returns:
A tensor of the same shape as bbox, but now with the shifted coordinates.
"""
image_height, image_width = (float(image_height), float(image_width))
# Change bbox coordinates to be pixels.
min_y = int(image_height * bbox[0])
min_x = int(image_width * bbox[1])
max_y = int(image_height * bbox[2])
max_x = int(image_width * bbox[3])
coordinates = np.stack(
[[min_y, min_x], [min_y, max_x], [max_y, min_x], [max_y, max_x]])
coordinates = coordinates.astype(np.float32)
# Shear the coordinates according to the translation matrix.
if shear_horizontal:
translation_matrix = np.stack([[1, 0], [-level, 1]])
else:
translation_matrix = np.stack([[1, -level], [0, 1]])
translation_matrix = translation_matrix.astype(np.float32)
new_coords = np.matmul(translation_matrix,
np.transpose(coordinates)).astype(np.int32)
# Find min/max values and convert them back to floats.
min_y = float(np.min(new_coords[0, :])) / image_height
min_x = float(np.min(new_coords[1, :])) / image_width
max_y = float(np.max(new_coords[0, :])) / image_height
max_x = float(np.max(new_coords[1, :])) / image_width
    # Clip the bboxes to be sure they fall between [0, 1].
min_y, min_x, max_y, max_x = _clip_bbox(min_y, min_x, max_y, max_x)
min_y, min_x, max_y, max_x = _check_bbox_area(min_y, min_x, max_y, max_x)
return np.stack([min_y, min_x, max_y, max_x])
def shear_with_bboxes(image, bboxes, level, replace, shear_horizontal):
"""Applies Shear Transformation to the image and shifts the bboxes.
Args:
image: 3D uint8 Tensor.
bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox
has 4 elements (min_y, min_x, max_y, max_x) of type float with values
between [0, 1].
level: Float. How much to shear the image. This value will be between
-0.3 to 0.3.
replace: A one or three value 1D tensor to fill empty pixels.
shear_horizontal: Boolean. If true then shear in X dimension else shear in
the Y dimension.
Returns:
A tuple containing a 3D uint8 Tensor that will be the result of shearing
image by level. The second element of the tuple is bboxes, where now
the coordinates will be shifted to reflect the sheared image.
"""
if shear_horizontal:
image = shear_x(image, level, replace)
else:
image = shear_y(image, level, replace)
# Convert bbox coordinates to pixel values.
image_height, image_width = image.shape[:2]
# pylint:disable=g-long-lambda
wrapped_shear_bbox = lambda bbox: _shear_bbox(bbox, image_height, image_width, level, shear_horizontal)
# pylint:enable=g-long-lambda
new_bboxes = deepcopy(bboxes)
num_bboxes = len(bboxes)
for idx in range(num_bboxes):
new_bboxes[idx] = wrapped_shear_bbox(bboxes[idx])
return image.astype(np.uint8), new_bboxes
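# --- Illustrative usage sketch (not part of the original pipeline): shear
# levels are expected in roughly [-0.3, 0.3], as noted in the docstring above.
def _demo_shear_with_bboxes():
    image = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)
    bboxes = np.array([[0.2, 0.2, 0.8, 0.8]], dtype=np.float32)
    return shear_with_bboxes(image, bboxes, level=0.2,
                             replace=[128, 128, 128], shear_horizontal=True)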
def autocontrast(image):
"""Implements Autocontrast function from PIL.
Args:
image: A 3D uint8 tensor.
Returns:
The image after it has had autocontrast applied to it and will be of type
uint8.
"""
def scale_channel(image):
"""Scale the 2D image using the autocontrast rule."""
# A possibly cheaper version can be done using cumsum/unique_with_counts
        # over the histogram values, rather than iterating over the entire image
        # to compute mins and maxes.
lo = float(np.min(image))
hi = float(np.max(image))
# Scale the image, making the lowest value 0 and the highest value 255.
def scale_values(im):
scale = 255.0 / (hi - lo)
offset = -lo * scale
im = im.astype(np.float32) * scale + offset
            im = np.clip(im, a_min=0, a_max=255.0)
            return im.astype(np.uint8)
result = scale_values(image) if hi > lo else image
return result
# Assumes RGB for now. Scales each channel independently
# and then stacks the result.
s1 = scale_channel(image[:, :, 0])
s2 = scale_channel(image[:, :, 1])
s3 = scale_channel(image[:, :, 2])
image = np.stack([s1, s2, s3], 2)
return image
def sharpness(image, factor):
"""Implements Sharpness function from PIL."""
orig_image = image
image = image.astype(np.float32)
# Make image 4D for conv operation.
# SMOOTH PIL Kernel.
kernel = np.array(
[[1, 1, 1], [1, 5, 1], [1, 1, 1]], dtype=np.float32) / 13.
result = cv2.filter2D(image, -1, kernel).astype(np.uint8)
# Blend the final result.
return blend(result, orig_image, factor)
def equalize(image):
"""Implements Equalize function from PIL using."""
def scale_channel(im, c):
"""Scale the data in the channel to implement equalize."""
im = im[:, :, c].astype(np.int32)
# Compute the histogram of the image channel.
histo, _ = np.histogram(im, range=[0, 255], bins=256)
# For the purposes of computing the step, filter out the nonzeros.
nonzero = np.where(np.not_equal(histo, 0))
nonzero_histo = np.reshape(np.take(histo, nonzero), [-1])
step = (np.sum(nonzero_histo) - nonzero_histo[-1]) // 255
def build_lut(histo, step):
# Compute the cumulative sum, shifting by step // 2
# and then normalization by step.
lut = (np.cumsum(histo) + (step // 2)) // step
# Shift lut, prepending with 0.
lut = np.concatenate([[0], lut[:-1]], 0)
# Clip the counts to be in range. This is done
# in the C code for image.point.
return np.clip(lut, a_min=0, a_max=255).astype(np.uint8)
# If step is zero, return the original image. Otherwise, build
# lut from the full histogram and step and then index from it.
if step == 0:
result = im
else:
result = np.take(build_lut(histo, step), im)
return result.astype(np.uint8)
# Assumes RGB for now. Scales each channel independently
# and then stacks the result.
s1 = scale_channel(image, 0)
s2 = scale_channel(image, 1)
s3 = scale_channel(image, 2)
image = np.stack([s1, s2, s3], 2)
return image
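# --- Illustrative usage sketch (not part of the original pipeline): both
# pixel-level ops take a uint8 RGB image and return one of the same shape.
def _demo_pixel_ops():
    image = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
    stretched = autocontrast(image)  # per-channel min/max stretch to [0, 255]
    flattened = equalize(image)      # per-channel histogram equalization
    return stretched, flattened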
def wrap(image):
"""Returns 'image' with an extra channel set to all 1s."""
shape = image.shape
extended_channel = 255 * np.ones([shape[0], shape[1], 1], image.dtype)
extended = np.concatenate([image, extended_channel], 2).astype(image.dtype)
return extended
def unwrap(image, replace):
"""Unwraps an image produced by wrap.
Where there is a 0 in the last channel for every spatial position,
the rest of the three channels in that spatial dimension are grayed
(set to 128). Operations like translate and shear on a wrapped
Tensor will leave 0s in empty locations. Some transformations look
at the intensity of values to do preprocessing, and we want these
empty pixels to assume the 'average' value, rather than pure black.
Args:
image: A 3D Image Tensor with 4 channels.
replace: A one or three value 1D tensor to fill empty pixels.
Returns:
image: A 3D image Tensor with 3 channels.
"""
image_shape = image.shape
# Flatten the spatial dimensions.
flattened_image = np.reshape(image, [-1, image_shape[2]])
# Find all pixels where the last channel is zero.
alpha_channel = flattened_image[:, 3]
replace = np.concatenate([replace, np.ones([1], image.dtype)], 0)
# Where they are zero, fill them in with 'replace'.
alpha_channel = np.reshape(alpha_channel, (-1, 1))
alpha_channel = np.tile(alpha_channel, reps=(1, flattened_image.shape[1]))
flattened_image = np.where(
np.equal(alpha_channel, 0),
np.ones_like(
flattened_image, dtype=image.dtype) * replace,
flattened_image)
image = np.reshape(flattened_image, image_shape)
image = image[:, :, :3]
return image.astype(np.uint8)
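# --- Illustrative sketch (not part of the original pipeline) of the
# wrap/unwrap round trip: wrap adds an all-255 alpha channel, geometric ops
# leave zeros where no source pixel lands, and unwrap fills those pixels
# with `replace`.
def _demo_wrap_unwrap():
    image = np.random.randint(0, 255, (4, 4, 3), dtype=np.uint8)
    wrapped = wrap(image)  # shape (4, 4, 4)
    wrapped[0, 0] = 0      # simulate an empty pixel left by a transform
    restored = unwrap(wrapped, np.array([128, 128, 128], dtype=np.uint8))
    return restored        # pixel (0, 0) is now the replace color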
def _cutout_inside_bbox(image, bbox, pad_fraction):
"""Generates cutout mask and the mean pixel value of the bbox.
First a location is randomly chosen within the image as the center where the
cutout mask will be applied. Note this can be towards the boundaries of the
image, so the full cutout mask may not be applied.
Args:
image: 3D uint8 Tensor.
bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
of type float that represents the normalized coordinates between 0 and 1.
        pad_fraction: Float that specifies how large the cutout mask should be in
            reference to the size of the original bbox. If pad_fraction is 0.25,
then the cutout mask will be of shape
(0.25 * bbox height, 0.25 * bbox width).
Returns:
        A tuple. First element is a tensor of the same shape as image where each
element is either a 1 or 0 that is used to determine where the image
will have cutout applied. The second element is the mean of the pixels
in the image where the bbox is located.
mask value: [0,1]
"""
image_height, image_width = image.shape[0], image.shape[1]
# Transform from shape [1, 4] to [4].
bbox = np.squeeze(bbox)
min_y = int(float(image_height) * bbox[0])
min_x = int(float(image_width) * bbox[1])
max_y = int(float(image_height) * bbox[2])
max_x = int(float(image_width) * bbox[3])
# Calculate the mean pixel values in the bounding box, which will be used
# to fill the cutout region.
mean = np.mean(image[min_y:max_y + 1, min_x:max_x + 1], axis=(0, 1))
    # Cutout mask will be of size pad_size_height * 2 by pad_size_width * 2 if the
# region lies entirely within the bbox.
box_height = max_y - min_y + 1
box_width = max_x - min_x + 1
pad_size_height = int(pad_fraction * (box_height / 2))
pad_size_width = int(pad_fraction * (box_width / 2))
# Sample the center location in the image where the zero mask will be applied.
cutout_center_height = np.random.randint(min_y, max_y + 1, dtype=np.int32)
cutout_center_width = np.random.randint(min_x, max_x + 1, dtype=np.int32)
lower_pad = np.maximum(0, cutout_center_height - pad_size_height)
upper_pad = np.maximum(
0, image_height - cutout_center_height - pad_size_height)
left_pad = np.maximum(0, cutout_center_width - pad_size_width)
right_pad = np.maximum(0,
image_width - cutout_center_width - pad_size_width)
cutout_shape = [
image_height - (lower_pad + upper_pad),
image_width - (left_pad + right_pad)
]
padding_dims = [[lower_pad, upper_pad], [left_pad, right_pad]]
mask = np.pad(np.zeros(
cutout_shape, dtype=image.dtype),
padding_dims,
'constant',
constant_values=1)
mask = np.expand_dims(mask, 2)
mask = np.tile(mask, [1, 1, 3])
return mask, mean
def bbox_cutout(image, bboxes, pad_fraction, replace_with_mean):
"""Applies cutout to the image according to bbox information.
    This is a cutout variant that uses bbox information to make more informed
decisions on where to place the cutout mask.
Args:
image: 3D uint8 Tensor.
bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox
has 4 elements (min_y, min_x, max_y, max_x) of type float with values
between [0, 1].
        pad_fraction: Float that specifies how large the cutout mask should be in
            reference to the size of the original bbox. If pad_fraction is 0.25,
then the cutout mask will be of shape
(0.25 * bbox height, 0.25 * bbox width).
        replace_with_mean: Boolean that specifies what value should be filled in
where the cutout mask is applied. Since the incoming image will be of
uint8 and will not have had any mean normalization applied, by default
we set the value to be 128. If replace_with_mean is True then we find
the mean pixel values across the channel dimension and use those to fill
in where the cutout mask is applied.
Returns:
A tuple. First element is a tensor of the same shape as image that has
cutout applied to it. Second element is the bboxes that were passed in
that will be unchanged.
"""
def apply_bbox_cutout(image, bboxes, pad_fraction):
"""Applies cutout to a single bounding box within image."""
# Choose a single bounding box to apply cutout to.
random_index = np.random.randint(0, bboxes.shape[0], dtype=np.int32)
# Select the corresponding bbox and apply cutout.
chosen_bbox = np.take(bboxes, random_index, axis=0)
mask, mean = _cutout_inside_bbox(image, chosen_bbox, pad_fraction)
# When applying cutout we either set the pixel value to 128 or to the mean
# value inside the bbox.
replace = mean if replace_with_mean else [128] * 3
# Apply the cutout mask to the image. Where the mask is 0 we fill it with
# `replace`.
image = np.where(
np.equal(mask, 0),
np.ones_like(
image, dtype=image.dtype) * replace,
image).astype(image.dtype)
return image
# Check to see if there are boxes, if so then apply boxcutout.
if len(bboxes) != 0:
image = apply_bbox_cutout(image, bboxes, pad_fraction)
return image, bboxes
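# --- Illustrative usage sketch (not part of the original pipeline): cutout
# is applied inside one randomly chosen box; the boxes themselves are
# returned unchanged.
def _demo_bbox_cutout():
    image = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)
    bboxes = np.array([[0.2, 0.2, 0.8, 0.8]], dtype=np.float32)
    return bbox_cutout(image, bboxes, pad_fraction=0.25,
                       replace_with_mean=False)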
NAME_TO_FUNC = {
'AutoContrast': autocontrast,
'Equalize': equalize,
'Posterize': posterize,
'Solarize': solarize,
'SolarizeAdd': solarize_add,
'Color': color,
'Contrast': contrast,
'Brightness': brightness,
'Sharpness': sharpness,
'Cutout': cutout,
'BBox_Cutout': bbox_cutout,
'Rotate_BBox': rotate_with_bboxes,
# pylint:disable=g-long-lambda
'TranslateX_BBox': lambda image, bboxes, pixels, replace: translate_bbox(
image, bboxes, pixels, replace, shift_horizontal=True),
'TranslateY_BBox': lambda image, bboxes, pixels, replace: translate_bbox(
image, bboxes, pixels, replace, shift_horizontal=False),
'ShearX_BBox': lambda image, bboxes, level, replace: shear_with_bboxes(
image, bboxes, level, replace, shear_horizontal=True),
'ShearY_BBox': lambda image, bboxes, level, replace: shear_with_bboxes(
image, bboxes, level, replace, shear_horizontal=False),
# pylint:enable=g-long-lambda
'Rotate_Only_BBoxes': rotate_only_bboxes,
'ShearX_Only_BBoxes': shear_x_only_bboxes,
'ShearY_Only_BBoxes': shear_y_only_bboxes,
'TranslateX_Only_BBoxes': translate_x_only_bboxes,
'TranslateY_Only_BBoxes': translate_y_only_bboxes,
'Flip_Only_BBoxes': flip_only_bboxes,
'Solarize_Only_BBoxes': solarize_only_bboxes,
'Equalize_Only_BBoxes': equalize_only_bboxes,
'Cutout_Only_BBoxes': cutout_only_bboxes,
}
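# --- Illustrative sketch (not part of the original pipeline) of how the
# registry is consumed: a policy op name resolves to a callable that takes
# the image, the boxes, and the op's own arguments.
def _demo_name_to_func():
    func = NAME_TO_FUNC['Equalize_Only_BBoxes']
    image = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
    bboxes = np.array([[0.1, 0.1, 0.9, 0.9]], dtype=np.float32)
    # prob is rescaled internally by _scale_bbox_only_op_probability.
    return func(image, bboxes, 1.0)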
def _randomly_negate_tensor(tensor):
"""With 50% prob turn the tensor negative."""
should_flip = np.floor(np.random.rand() + 0.5) >= 1
final_tensor = tensor if should_flip else -tensor
return final_tensor
def _rotate_level_to_arg(level):
level = (level / _MAX_LEVEL) * 30.
level = _randomly_negate_tensor(level)
return (level, )
def _shrink_level_to_arg(level):
"""Converts level to ratio by which we shrink the image content."""
if level == 0:
return (1.0, ) # if level is zero, do not shrink the image
# Maximum shrinking ratio is 2.9.
level = 2. / (_MAX_LEVEL / level) + 0.9
return (level, )
def _enhance_level_to_arg(level):
return ((level / _MAX_LEVEL) * 1.8 + 0.1, )
def _shear_level_to_arg(level):
level = (level / _MAX_LEVEL) * 0.3
# Flip level to negative with 50% chance.
level = _randomly_negate_tensor(level)
return (level, )
def _translate_level_to_arg(level, translate_const):
level = (level / _MAX_LEVEL) * float(translate_const)
# Flip level to negative with 50% chance.
level = _randomly_negate_tensor(level)
return (level, )
def _bbox_cutout_level_to_arg(level, hparams):
cutout_pad_fraction = (
level / _MAX_LEVEL) * 0.75 # hparams.cutout_max_pad_fraction
return (cutout_pad_fraction,
False) # hparams.cutout_bbox_replace_with_mean
def level_to_arg(hparams):
return {
'AutoContrast': lambda level: (),
'Equalize': lambda level: (),
'Posterize': lambda level: (int((level / _MAX_LEVEL) * 4), ),
'Solarize': lambda level: (int((level / _MAX_LEVEL) * 256), ),
'SolarizeAdd': lambda level: (int((level / _MAX_LEVEL) * 110), ),
'Color': _enhance_level_to_arg,
'Contrast': _enhance_level_to_arg,
'Brightness': _enhance_level_to_arg,
'Sharpness': _enhance_level_to_arg,
'Cutout':
lambda level: (int((level / _MAX_LEVEL) * 100), ), # hparams.cutout_const=100
# pylint:disable=g-long-lambda
'BBox_Cutout': lambda level: _bbox_cutout_level_to_arg(level, hparams),
'TranslateX_BBox':
lambda level: _translate_level_to_arg(level, 250), # hparams.translate_const=250
'TranslateY_BBox':
        lambda level: _translate_level_to_arg(level, 250),  # hparams.translate_const=250
# pylint:enable=g-long-lambda
'ShearX_BBox': _shear_level_to_arg,
'ShearY_BBox': _shear_level_to_arg,
'Rotate_BBox': _rotate_level_to_arg,
'Rotate_Only_BBoxes': _rotate_level_to_arg,
'ShearX_Only_BBoxes': _shear_level_to_arg,
'ShearY_Only_BBoxes': _shear_level_to_arg,
# pylint:disable=g-long-lambda
'TranslateX_Only_BBoxes':
lambda level: _translate_level_to_arg(level, 120), # hparams.translate_bbox_const
'TranslateY_Only_BBoxes':
lambda level: _translate_level_to_arg(level, 120), # hparams.translate_bbox_const
# pylint:enable=g-long-lambda
'Flip_Only_BBoxes': lambda level: (),
'Solarize_Only_BBoxes':
lambda level: (int((level / _MAX_LEVEL) * 256), ),
'Equalize_Only_BBoxes': lambda level: (),
# pylint:disable=g-long-lambda
'Cutout_Only_BBoxes':
lambda level: (int((level / _MAX_LEVEL) * 50), ), # hparams.cutout_bbox_const
# pylint:enable=g-long-lambda
}
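# --- Illustrative sketch (not part of the original pipeline): converting a
# policy level into op-specific arguments; assumes the usual _MAX_LEVEL of 10.
def _demo_level_to_arg():
    args = level_to_arg(hparams={})['Solarize'](5)
    return args  # (128,) when _MAX_LEVEL == 10: int((5 / 10) * 256)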
def bbox_wrapper(func):
"""Adds a bboxes function argument to func and returns unchanged bboxes."""
def wrapper(images, bboxes, *args, **kwargs):
return (func(images, *args, **kwargs), bboxes)
return wrapper
def _parse_policy_info(name, prob, level, replace_value, augmentation_hparams):
"""Return the function that corresponds to `name` and update `level` param."""
func = NAME_TO_FUNC[name]
args = level_to_arg(augmentation_hparams)[name](level)
# Check to see if prob is passed into function. This is used for operations
# where we alter bboxes independently.
# pytype:disable=wrong-arg-types
if 'prob' in inspect.getfullargspec(func)[0]:
args = tuple([prob] + list(args))
# pytype:enable=wrong-arg-types
# Add in replace arg if it is required for the function that is being called.
if 'replace' in inspect.getfullargspec(func)[0]:
# Make sure replace is the final argument
assert 'replace' == inspect.getfullargspec(func)[0][-1]
args = tuple(list(args) + [replace_value])
# Add bboxes as the second positional argument for the function if it does
# not already exist.
if 'bboxes' not in inspect.getfullargspec(func)[0]:
func = bbox_wrapper(func)
return (func, prob, args)
def _apply_func_with_prob(func, image, args, prob, bboxes):
"""Apply `func` to image w/ `args` as input with probability `prob`."""
assert isinstance(args, tuple)
assert 'bboxes' == inspect.getfullargspec(func)[0][1]
# If prob is a function argument, then this randomness is being handled
# inside the function, so make sure it is always called.
if 'prob' in inspect.getfullargspec(func)[0]:
prob = 1.0
# Apply the function with probability `prob`.
should_apply_op = np.floor(np.random.rand() + 0.5) >= 1
if should_apply_op:
augmented_image, augmented_bboxes = func(image, bboxes, *args)
else:
augmented_image, augmented_bboxes = (image, bboxes)
return augmented_image, augmented_bboxes
def select_and_apply_random_policy(policies, image, bboxes):
"""Select a random policy from `policies` and apply it to `image`."""
policy_to_select = np.random.randint(0, len(policies), dtype=np.int32)
# policy_to_select = 6 # for test
for (i, policy) in enumerate(policies):
if i == policy_to_select:
image, bboxes = policy(image, bboxes)
return (image, bboxes)
def build_and_apply_nas_policy(policies, image, bboxes, augmentation_hparams):
"""Build a policy from the given policies passed in and apply to image.
Args:
policies: list of lists of tuples in the form `(func, prob, level)`, `func`
is a string name of the augmentation function, `prob` is the probability
of applying the `func` operation, `level` is the input argument for
`func`.
image: numpy array that the resulting policy will be applied to.
        bboxes: 2D Tensor of the bboxes in the image, normalized between [0, 1].
augmentation_hparams: Hparams associated with the NAS learned policy.
Returns:
A version of image that now has data augmentation applied to it based on
        the `policies` passed into the function. Additionally, returns bboxes if
a value for them is passed in that is not None
"""
replace_value = [128, 128, 128]
    # func is the string name of the augmentation function, prob is the
    # probability of applying the operation and level is the parameter
    # associated with the function.
    # tf_policies are functions that take in an image and return an augmented
    # image.
tf_policies = []
for policy in policies:
tf_policy = []
# Link string name to the correct python function and make sure the correct
# argument is passed into that function.
for policy_info in policy:
policy_info = list(
policy_info) + [replace_value, augmentation_hparams]
tf_policy.append(_parse_policy_info(*policy_info))
        # Now build the tf policy that will apply the augmentation procedure
# on image.
def make_final_policy(tf_policy_):
def final_policy(image_, bboxes_):
for func, prob, args in tf_policy_:
image_, bboxes_ = _apply_func_with_prob(func, image_, args,
prob, bboxes_)
return image_, bboxes_
return final_policy
tf_policies.append(make_final_policy(tf_policy))
augmented_images, augmented_bboxes = select_and_apply_random_policy(
tf_policies, image, bboxes)
# If no bounding boxes were specified, then just return the images.
return (augmented_images, augmented_bboxes)
# TODO(barretzoph): Add in ArXiv link once paper is out.
def distort_image_with_autoaugment(image, bboxes, augmentation_name):
"""Applies the AutoAugment policy to `image` and `bboxes`.
Args:
image: `Tensor` of shape [height, width, 3] representing an image.
bboxes: `Tensor` of shape [N, 4] representing ground truth boxes that are
normalized between [0, 1].
augmentation_name: The name of the AutoAugment policy to use. The available
options are `v0`, `v1`, `v2`, `v3` and `test`. `v0` is the policy used for
all of the results in the paper and was found to achieve the best results
on the COCO dataset. `v1`, `v2` and `v3` are additional good policies
found on the COCO dataset that have slight variation in what operations
were used during the search procedure along with how many operations are
applied in parallel to a single image (2 vs 3).
Returns:
A tuple containing the augmented versions of `image` and `bboxes`.
"""
available_policies = {
'v0': policy_v0,
'v1': policy_v1,
'v2': policy_v2,
'v3': policy_v3,
'test': policy_vtest
}
if augmentation_name not in available_policies:
raise ValueError('Invalid augmentation_name: {}'.format(
augmentation_name))
policy = available_policies[augmentation_name]()
augmentation_hparams = {}
return build_and_apply_nas_policy(policy, image, bboxes,
augmentation_hparams)
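# --- Illustrative usage sketch (not part of the original pipeline): apply
# the `v0` detection policy to a random image with one normalized box.
def _demo_autoaugment():
    image = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)
    bboxes = np.array([[0.2, 0.3, 0.7, 0.8]], dtype=np.float32)
    return distort_image_with_autoaugment(image, bboxes, 'v0')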
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
try:
from collections.abc import Sequence
except Exception:
from collections import Sequence
import cv2
import numpy as np
import math
import copy
from lib.utils.keypoint_utils import warp_affine_joints, get_affine_transform, affine_transform, get_warp_matrix
from lib.utils.workspace import serializable
from lib.utils.logger import setup_logger
logger = setup_logger(__name__)
registered_ops = []
__all__ = [
'RandomFlipHalfBodyTransform',
'TopDownAffine',
'ToHeatmapsTopDown',
'TopDownEvalAffine',
]
def register_keypointop(cls):
return serializable(cls)
@register_keypointop
class RandomFlipHalfBodyTransform(object):
"""apply data augment to image and coords
to achieve the flip, scale, rotate and half body transform effect for training image
Args:
trainsize (list):[w, h], Image target size
upper_body_ids (list): The upper body joint ids
flip_pairs (list): The left-right joints exchange order list
pixel_std (int): The pixel std of the scale
scale (float): The scale factor to transform the image
rot (int): The rotate factor to transform the image
num_joints_half_body (int): The joints threshold of the half body transform
prob_half_body (float): The threshold of the half body transform
flip (bool): Whether to flip the image
Returns:
        records (dict): contain the image and coords after transformation
"""
def __init__(self,
trainsize,
upper_body_ids,
flip_pairs,
pixel_std,
scale=0.35,
rot=40,
num_joints_half_body=8,
prob_half_body=0.3,
flip=True,
rot_prob=0.6):
super(RandomFlipHalfBodyTransform, self).__init__()
self.trainsize = trainsize
self.upper_body_ids = upper_body_ids
self.flip_pairs = flip_pairs
self.pixel_std = pixel_std
self.scale = scale
self.rot = rot
self.num_joints_half_body = num_joints_half_body
self.prob_half_body = prob_half_body
self.flip = flip
self.aspect_ratio = trainsize[0] * 1.0 / trainsize[1]
self.rot_prob = rot_prob
def halfbody_transform(self, joints, joints_vis):
upper_joints = []
lower_joints = []
for joint_id in range(joints.shape[0]):
if joints_vis[joint_id][0] > 0:
if joint_id in self.upper_body_ids:
upper_joints.append(joints[joint_id])
else:
lower_joints.append(joints[joint_id])
if np.random.randn() < 0.5 and len(upper_joints) > 2:
selected_joints = upper_joints
else:
selected_joints = lower_joints if len(
lower_joints) > 2 else upper_joints
if len(selected_joints) < 2:
return None, None
selected_joints = np.array(selected_joints, dtype=np.float32)
center = selected_joints.mean(axis=0)[:2]
left_top = np.amin(selected_joints, axis=0)
right_bottom = np.amax(selected_joints, axis=0)
w = right_bottom[0] - left_top[0]
h = right_bottom[1] - left_top[1]
if w > self.aspect_ratio * h:
h = w * 1.0 / self.aspect_ratio
elif w < self.aspect_ratio * h:
w = h * self.aspect_ratio
scale = np.array(
[w * 1.0 / self.pixel_std, h * 1.0 / self.pixel_std],
dtype=np.float32)
scale = scale * 1.5
return center, scale
def flip_joints(self, joints, joints_vis, width, matched_parts):
joints[:, 0] = width - joints[:, 0] - 1
for pair in matched_parts:
joints[pair[0], :], joints[pair[1], :] = \
joints[pair[1], :], joints[pair[0], :].copy()
joints_vis[pair[0], :], joints_vis[pair[1], :] = \
joints_vis[pair[1], :], joints_vis[pair[0], :].copy()
return joints * joints_vis, joints_vis
def __call__(self, records):
image = records['image']
joints = records['joints']
joints_vis = records['joints_vis']
c = records['center']
s = records['scale']
r = 0
if (np.sum(joints_vis[:, 0]) > self.num_joints_half_body and
np.random.rand() < self.prob_half_body):
c_half_body, s_half_body = self.halfbody_transform(joints,
joints_vis)
if c_half_body is not None and s_half_body is not None:
c, s = c_half_body, s_half_body
sf = self.scale
rf = self.rot
s = s * np.clip(np.random.randn() * sf + 1, 1 - sf, 1 + sf)
r = np.clip(np.random.randn() * rf, -rf * 2,
rf * 2) if np.random.random() <= self.rot_prob else 0
if self.flip and np.random.random() <= 0.5:
image = image[:, ::-1, :]
joints, joints_vis = self.flip_joints(
joints, joints_vis, image.shape[1], self.flip_pairs)
c[0] = image.shape[1] - c[0] - 1
records['image'] = image
records['joints'] = joints
records['joints_vis'] = joints_vis
records['center'] = c
records['scale'] = s
records['rotate'] = r
return records
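# --- Illustrative usage sketch (not part of the original pipeline). The
# COCO-style 17-joint setup below (flip_pairs, upper_body_ids, pixel_std) is
# an assumed example configuration, not taken from the repository configs.
def _demo_random_flip_half_body():
    transform = RandomFlipHalfBodyTransform(
        trainsize=[192, 256],
        upper_body_ids=list(range(11)),
        flip_pairs=[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12],
                    [13, 14], [15, 16]],
        pixel_std=200)
    records = {
        'image': np.zeros((256, 192, 3), dtype=np.uint8),
        'joints': np.zeros((17, 3), dtype=np.float32),
        'joints_vis': np.ones((17, 3), dtype=np.float32),
        'center': np.array([96.0, 128.0]),
        'scale': np.array([1.0, 1.0]),
    }
    return transform(records)  # adds records['rotate'] for TopDownAffine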
@register_keypointop
class TopDownAffine(object):
"""apply affine transform to image and coords
Args:
trainsize (list): [w, h], the standard size used to train
use_udp (bool): whether to use Unbiased Data Processing.
records(dict): the dict contained the image and coords
Returns:
        records (dict): contain the image and coords after transformation
"""
def __init__(self, trainsize, use_udp=False):
self.trainsize = trainsize
self.use_udp = use_udp
def __call__(self, records):
image = records['image']
joints = records['joints']
joints_vis = records['joints_vis']
rot = records['rotate'] if "rotate" in records else 0
if self.use_udp:
trans = get_warp_matrix(
rot, records['center'] * 2.0,
[self.trainsize[0] - 1.0, self.trainsize[1] - 1.0],
records['scale'] * 200.0)
image = cv2.warpAffine(
image,
trans, (int(self.trainsize[0]), int(self.trainsize[1])),
flags=cv2.INTER_LINEAR)
joints[:, 0:2] = warp_affine_joints(joints[:, 0:2].copy(), trans)
else:
trans = get_affine_transform(records['center'], records['scale'] *
200, rot, self.trainsize)
image = cv2.warpAffine(
image,
trans, (int(self.trainsize[0]), int(self.trainsize[1])),
flags=cv2.INTER_LINEAR)
for i in range(joints.shape[0]):
if joints_vis[i, 0] > 0.0:
joints[i, 0:2] = affine_transform(joints[i, 0:2], trans)
records['image'] = image
records['joints'] = joints
return records
@register_keypointop
class TopDownEvalAffine(object):
"""apply affine transform to image and coords
Args:
trainsize (list): [w, h], the standard size used to train
use_udp (bool): whether to use Unbiased Data Processing.
records(dict): the dict contained the image and coords
Returns:
        records (dict): contain the image and coords after transformation
"""
def __init__(self, trainsize, use_udp=False):
self.trainsize = trainsize
self.use_udp = use_udp
def __call__(self, records):
image = records['image']
rot = 0
imshape = records['im_shape'][::-1]
center = imshape / 2.
scale = imshape
if self.use_udp:
trans = get_warp_matrix(
rot, center * 2.0,
[self.trainsize[0] - 1.0, self.trainsize[1] - 1.0], scale)
image = cv2.warpAffine(
image,
trans, (int(self.trainsize[0]), int(self.trainsize[1])),
flags=cv2.INTER_LINEAR)
else:
trans = get_affine_transform(center, scale, rot, self.trainsize)
image = cv2.warpAffine(
image,
trans, (int(self.trainsize[0]), int(self.trainsize[1])),
flags=cv2.INTER_LINEAR)
records['image'] = image
return records
@register_keypointop
class ToHeatmapsTopDown(object):
"""to generate the gaussin heatmaps of keypoint for heatmap loss
Args:
hmsize (list): [w, h] output heatmap's size
sigma (float): the std of gaussin kernel genereted
records(dict): the dict contained the image and coords
Returns:
records (dict): contain the heatmaps used to heatmaploss
"""
def __init__(self, hmsize, sigma):
super(ToHeatmapsTopDown, self).__init__()
self.hmsize = np.array(hmsize)
self.sigma = sigma
def __call__(self, records):
joints = records['joints']
joints_vis = records['joints_vis']
num_joints = joints.shape[0]
image_size = np.array(
[records['image'].shape[1], records['image'].shape[0]])
target_weight = np.ones((num_joints, 1), dtype=np.float32)
target_weight[:, 0] = joints_vis[:, 0]
target = np.zeros(
(num_joints, self.hmsize[1], self.hmsize[0]), dtype=np.float32)
tmp_size = self.sigma * 3
feat_stride = image_size / self.hmsize
for joint_id in range(num_joints):
mu_x = int(joints[joint_id][0] / feat_stride[0] + 0.5)
mu_y = int(joints[joint_id][1] / feat_stride[1] + 0.5)
# Check that any part of the gaussian is in-bounds
ul = [int(mu_x - tmp_size), int(mu_y - tmp_size)]
br = [int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)]
if ul[0] >= self.hmsize[0] or ul[1] >= self.hmsize[1] or br[
0] < 0 or br[1] < 0:
                # If not, zero out the target weight and skip this joint
target_weight[joint_id] = 0
continue
            # Generate gaussian
size = 2 * tmp_size + 1
x = np.arange(0, size, 1, np.float32)
y = x[:, np.newaxis]
x0 = y0 = size // 2
# The gaussian is not normalized, we want the center value to equal 1
g = np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * self.sigma**2))
# Usable gaussian range
g_x = max(0, -ul[0]), min(br[0], self.hmsize[0]) - ul[0]
g_y = max(0, -ul[1]), min(br[1], self.hmsize[1]) - ul[1]
# Image range
img_x = max(0, ul[0]), min(br[0], self.hmsize[0])
img_y = max(0, ul[1]), min(br[1], self.hmsize[1])
v = target_weight[joint_id]
if v > 0.5:
target[joint_id][img_y[0]:img_y[1], img_x[0]:img_x[1]] = g[g_y[
0]:g_y[1], g_x[0]:g_x[1]]
records['target'] = target
records['target_weight'] = target_weight
del records['joints'], records['joints_vis']
return records
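# --- Illustrative usage sketch (not part of the original pipeline): with a
# 192x256 input and a 48x64 heatmap, the feature stride is 4 on both axes.
def _demo_to_heatmaps():
    op = ToHeatmapsTopDown(hmsize=[48, 64], sigma=2)
    records = {
        'image': np.zeros((256, 192, 3), dtype=np.uint8),
        'joints': np.tile(np.array([96.0, 128.0, 0.0]), (17, 1)),
        'joints_vis': np.ones((17, 3), dtype=np.float32),
    }
    records = op(records)
    return records['target'].shape  # (17, 64, 48)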
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
try:
from collections.abc import Sequence
except Exception:
from collections import Sequence
from numbers import Number, Integral
import uuid
import random
import math
import numpy as np
import os
import copy
import logging
import cv2
import traceback
from PIL import Image, ImageDraw
import pickle
import threading
MUTEX = threading.Lock()
from lib.utils.workspace import serializable
from lib.utils.logger import setup_logger
logger = setup_logger(__name__)
registered_ops = []
def register_op(cls):
registered_ops.append(cls.__name__)
if not hasattr(BaseOperator, cls.__name__):
setattr(BaseOperator, cls.__name__, cls)
else:
raise KeyError("The {} class has been registered.".format(
cls.__name__))
return serializable(cls)
class BboxError(ValueError):
pass
class ImageError(ValueError):
pass
class Compose(object):
def __init__(self, transforms, num_classes=80):
self.transforms = transforms
self.transforms_cls = []
for t in self.transforms:
for k, v in t.items():
op_cls = getattr(transform, k)
f = op_cls(**v)
if hasattr(f, 'num_classes'):
f.num_classes = num_classes
self.transforms_cls.append(f)
def __call__(self, data):
for f in self.transforms_cls:
try:
data = f(data)
except Exception as e:
stack_info = traceback.format_exc()
logger.warning("fail to map sample transform [{}] "
"with error: {} and stack:\n{}".format(
f, e, str(stack_info)))
raise e
return data
class BaseOperator(object):
def __init__(self, name=None):
if name is None:
name = self.__class__.__name__
self._id = name + '_' + str(uuid.uuid4())[-6:]
def apply(self, sample, context=None):
""" Process a sample.
Args:
sample (dict): a dict of sample, eg: {'image':xx, 'label': xxx}
context (dict): info about this sample processing
Returns:
result (dict): a processed sample
"""
return sample
def __call__(self, sample, context=None):
""" Process a sample.
Args:
sample (dict): a dict of sample, eg: {'image':xx, 'label': xxx}
context (dict): info about this sample processing
Returns:
result (dict): a processed sample
"""
if isinstance(sample, Sequence):
for i in range(len(sample)):
sample[i] = self.apply(sample[i], context)
else:
sample = self.apply(sample, context)
return sample
def __str__(self):
return str(self._id)
@register_op
class Decode(BaseOperator):
def __init__(self):
""" Transform the image data to numpy format following the rgb format
"""
super(Decode, self).__init__()
def apply(self, sample, context=None):
""" load image if 'im_file' field is not empty but 'image' is"""
if 'image' not in sample:
with open(sample['im_file'], 'rb') as f:
sample['image'] = f.read()
sample.pop('im_file')
im = sample['image']
data = np.frombuffer(im, dtype='uint8')
im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode
if 'keep_ori_im' in sample and sample['keep_ori_im']:
sample['ori_image'] = im
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
sample['image'] = im
if 'h' not in sample:
sample['h'] = im.shape[0]
elif sample['h'] != im.shape[0]:
logger.warning(
"The actual image height: {} is not equal to the "
"height: {} in annotation, and update sample['h'] by actual "
"image height.".format(im.shape[0], sample['h']))
sample['h'] = im.shape[0]
if 'w' not in sample:
sample['w'] = im.shape[1]
elif sample['w'] != im.shape[1]:
logger.warning(
"The actual image width: {} is not equal to the "
"width: {} in annotation, and update sample['w'] by actual "
"image width.".format(im.shape[1], sample['w']))
sample['w'] = im.shape[1]
sample['im_shape'] = np.array(im.shape[:2], dtype=np.float32)
sample['scale_factor'] = np.array([1., 1.], dtype=np.float32)
return sample
def _make_dirs(dirname):
try:
from pathlib import Path
except ImportError:
from pathlib2 import Path
Path(dirname).mkdir(exist_ok=True)
@register_op
class Permute(BaseOperator):
def __init__(self):
"""
Change the channel to be (C, H, W)
"""
super(Permute, self).__init__()
def apply(self, sample, context=None):
im = sample['image']
im = im.transpose((2, 0, 1))
sample['image'] = im
return sample
@register_op
class NormalizeImage(BaseOperator):
def __init__(self,
mean=[0.485, 0.456, 0.406],
std=[1, 1, 1],
is_scale=True):
"""
Args:
mean (list): the pixel mean
            std (list): the pixel standard deviation
"""
super(NormalizeImage, self).__init__()
self.mean = mean
self.std = std
self.is_scale = is_scale
if not (isinstance(self.mean, list) and isinstance(self.std, list) and
isinstance(self.is_scale, bool)):
raise TypeError("{}: input type is invalid.".format(self))
from functools import reduce
if reduce(lambda x, y: x * y, self.std) == 0:
raise ValueError('{}: std is invalid!'.format(self))
def apply(self, sample, context=None):
"""Normalize the image.
Operators:
        1. (optional) Scale the image to [0, 1]
        2. Subtract the mean from each pixel and divide by std
"""
im = sample['image']
im = im.astype(np.float32, copy=False)
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
if self.is_scale:
im = im / 255.0
im -= mean
im /= std
sample['image'] = im
return sample
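# --- Illustrative usage sketch (not part of the original pipeline): the
# typical ordering is Decode -> NormalizeImage -> Permute; here a random
# array stands in for a decoded image.
def _demo_preprocess():
    sample = {'image': np.random.randint(0, 255, (256, 192, 3), dtype=np.uint8)}
    sample = NormalizeImage(mean=[0.485, 0.456, 0.406],
                            std=[0.229, 0.224, 0.225])(sample)
    sample = Permute()(sample)
    return sample['image'].shape  # (3, 256, 192)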
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import keypoint_metrics
from . import coco_utils
from . import json_results
from . import map_utils
from .keypoint_metrics import *
from .coco_utils import *
from .json_results import *
from .map_utils import *
__all__ = keypoint_metrics.__all__ + coco_utils.__all__ + json_results.__all__ + map_utils.__all__
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import numpy as np
import itertools
from .json_results import get_det_res, get_det_poly_res, get_seg_res, get_solov2_segm_res, get_keypoint_res
from .map_utils import draw_pr_curve
from lib.utils.logger import setup_logger
logger = setup_logger(__name__)
__all__ = ['get_infer_results', 'cocoapi_eval', 'json_eval_results']
def get_infer_results(outs, catid, bias=0):
"""
Get result at the stage of inference.
    The output format is a dictionary containing the bbox or mask results.
For example, bbox result is a list and each element contains
image_id, category_id, bbox and score.
"""
if outs is None or len(outs) == 0:
raise ValueError(
            'The number of valid detection results is zero. Please use a reasonable model and check the input data.'
)
im_id = outs['im_id']
infer_res = {}
if 'bbox' in outs:
if len(outs['bbox']) > 0 and len(outs['bbox'][0]) > 6:
infer_res['bbox'] = get_det_poly_res(
outs['bbox'], outs['bbox_num'], im_id, catid, bias=bias)
else:
infer_res['bbox'] = get_det_res(
outs['bbox'], outs['bbox_num'], im_id, catid, bias=bias)
if 'mask' in outs:
# mask post process
infer_res['mask'] = get_seg_res(outs['mask'], outs['bbox'],
outs['bbox_num'], im_id, catid)
if 'segm' in outs:
infer_res['segm'] = get_solov2_segm_res(outs, im_id, catid)
if 'keypoint' in outs:
infer_res['keypoint'] = get_keypoint_res(outs, im_id)
outs['bbox_num'] = [len(infer_res['keypoint'])]
return infer_res
def cocoapi_eval(jsonfile,
style,
coco_gt=None,
anno_file=None,
max_dets=(100, 300, 1000),
classwise=False,
sigmas=None,
use_area=True):
"""
Args:
jsonfile (str): Evaluation json file, eg: bbox.json, mask.json.
style (str): COCOeval style, can be `bbox` , `segm` , `proposal`, `keypoints` and `keypoints_crowd`.
        coco_gt (COCO): a loaded COCO ground-truth object; if None, it is
            built from anno_file, eg: coco_gt = COCO(anno_file)
anno_file (str): COCO annotations file.
max_dets (tuple): COCO evaluation maxDets.
classwise (bool): Whether per-category AP and draw P-R Curve or not.
sigmas (nparray): keypoint labelling sigmas.
use_area (bool): If gt annotations (eg. CrowdPose, AIC)
do not have 'area', please set use_area=False.
"""
    assert coco_gt is not None or anno_file is not None
if style == 'keypoints_crowd':
#please install xtcocotools==1.6
from xtcocotools.coco import COCO
from xtcocotools.cocoeval import COCOeval
else:
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
    if coco_gt is None:
coco_gt = COCO(anno_file)
logger.info("Start evaluate...")
coco_dt = coco_gt.loadRes(jsonfile)
if style == 'proposal':
coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
coco_eval.params.useCats = 0
coco_eval.params.maxDets = list(max_dets)
elif style == 'keypoints_crowd':
coco_eval = COCOeval(coco_gt, coco_dt, style, sigmas, use_area)
else:
coco_eval = COCOeval(coco_gt, coco_dt, style)
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
if classwise:
# Compute per-category AP and PR curve
try:
from terminaltables import AsciiTable
except Exception as e:
logger.error(
                'terminaltables not found, please install terminaltables. '
'for example: `pip install terminaltables`.')
raise e
precisions = coco_eval.eval['precision']
cat_ids = coco_gt.getCatIds()
# precision: (iou, recall, cls, area range, max dets)
assert len(cat_ids) == precisions.shape[2]
results_per_category = []
for idx, catId in enumerate(cat_ids):
# area range index 0: all area ranges
# max dets index -1: typically 100 per image
nm = coco_gt.loadCats(catId)[0]
precision = precisions[:, :, idx, 0, -1]
precision = precision[precision > -1]
if precision.size:
ap = np.mean(precision)
else:
ap = float('nan')
results_per_category.append(
(str(nm["name"]), '{:0.3f}'.format(float(ap))))
pr_array = precisions[0, :, idx, 0, 2]
recall_array = np.arange(0.0, 1.01, 0.01)
draw_pr_curve(
pr_array,
recall_array,
out_dir=style + '_pr_curve',
file_name='{}_precision_recall_curve.jpg'.format(nm["name"]))
num_columns = min(6, len(results_per_category) * 2)
results_flatten = list(itertools.chain(*results_per_category))
headers = ['category', 'AP'] * (num_columns // 2)
results_2d = itertools.zip_longest(
* [results_flatten[i::num_columns] for i in range(num_columns)])
table_data = [headers]
table_data += [result for result in results_2d]
table = AsciiTable(table_data)
logger.info('Per-category of {} AP: \n{}'.format(style, table.table))
logger.info("per-category PR curve has output to {} folder.".format(
style + '_pr_curve'))
# flush coco evaluation result
sys.stdout.flush()
return coco_eval.stats
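# --- Illustrative usage sketch (not part of the original pipeline): the two
# file paths below are placeholders and must exist on disk for the call to run.
def _demo_cocoapi_eval():
    return cocoapi_eval(
        'keypoints_results.json',
        'keypoints',
        anno_file='annotations/person_keypoints_val2017.json')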
def json_eval_results(metric, json_directory, dataset):
"""
    COCO API evaluation with already existing proposal.json, bbox.json or mask.json files
"""
assert metric == 'COCO'
anno_file = dataset.get_anno()
json_file_list = ['proposal.json', 'bbox.json', 'mask.json']
if json_directory:
assert os.path.exists(
json_directory), "The json directory:{} does not exist".format(
json_directory)
for k, v in enumerate(json_file_list):
json_file_list[k] = os.path.join(str(json_directory), v)
coco_eval_style = ['proposal', 'bbox', 'segm']
for i, v_json in enumerate(json_file_list):
if os.path.exists(v_json):
cocoapi_eval(v_json, coco_eval_style[i], anno_file=anno_file)
else:
logger.info("{} not exists!".format(v_json))
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import six
import numpy as np
__all__ = [
'get_det_res', 'get_det_poly_res', 'get_seg_res', 'get_solov2_segm_res',
'get_keypoint_res'
]
def get_det_res(bboxes, bbox_nums, image_id, label_to_cat_id_map, bias=0):
det_res = []
k = 0
for i in range(len(bbox_nums)):
cur_image_id = int(image_id[i][0])
det_nums = bbox_nums[i]
for j in range(det_nums):
dt = bboxes[k]
k = k + 1
num_id, score, xmin, ymin, xmax, ymax = dt.tolist()
if int(num_id) < 0:
continue
category_id = label_to_cat_id_map[int(num_id)]
w = xmax - xmin + bias
h = ymax - ymin + bias
bbox = [xmin, ymin, w, h]
dt_res = {
'image_id': cur_image_id,
'category_id': category_id,
'bbox': bbox,
'score': score
}
det_res.append(dt_res)
return det_res
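# --- Illustrative sketch (not part of the original pipeline): one detection
# (class index 0, score 0.9, corner box) converted to COCO xywh format.
def _demo_get_det_res():
    bboxes = np.array([[0., 0.9, 10., 20., 110., 220.]], dtype=np.float32)
    res = get_det_res(bboxes, bbox_nums=[1], image_id=np.array([[42]]),
                      label_to_cat_id_map={0: 1})
    return res  # [{'image_id': 42, 'category_id': 1, 'bbox': [10., 20., 100., 200.], ...}]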
def get_det_poly_res(bboxes, bbox_nums, image_id, label_to_cat_id_map, bias=0):
det_res = []
k = 0
for i in range(len(bbox_nums)):
cur_image_id = int(image_id[i][0])
det_nums = bbox_nums[i]
for j in range(det_nums):
dt = bboxes[k]
k = k + 1
num_id, score, x1, y1, x2, y2, x3, y3, x4, y4 = dt.tolist()
if int(num_id) < 0:
continue
category_id = label_to_cat_id_map[int(num_id)]
rbox = [x1, y1, x2, y2, x3, y3, x4, y4]
dt_res = {
'image_id': cur_image_id,
'category_id': category_id,
'bbox': rbox,
'score': score
}
det_res.append(dt_res)
return det_res
def get_seg_res(masks, bboxes, mask_nums, image_id, label_to_cat_id_map):
import pycocotools.mask as mask_util
seg_res = []
k = 0
for i in range(len(mask_nums)):
cur_image_id = int(image_id[i][0])
det_nums = mask_nums[i]
for j in range(det_nums):
mask = masks[k].astype(np.uint8)
score = float(bboxes[k][1])
label = int(bboxes[k][0])
k = k + 1
if label == -1:
continue
cat_id = label_to_cat_id_map[label]
rle = mask_util.encode(
np.array(
mask[:, :, None], order="F", dtype="uint8"))[0]
if six.PY3:
if 'counts' in rle:
rle['counts'] = rle['counts'].decode("utf8")
sg_res = {
'image_id': cur_image_id,
'category_id': cat_id,
'segmentation': rle,
'score': score
}
seg_res.append(sg_res)
return seg_res
def get_solov2_segm_res(results, image_id, num_id_to_cat_id_map):
import pycocotools.mask as mask_util
segm_res = []
# for each batch
segms = results['segm'].astype(np.uint8)
clsid_labels = results['cate_label']
clsid_scores = results['cate_score']
lengths = segms.shape[0]
im_id = int(image_id[0][0])
if lengths == 0 or segms is None:
return None
# for each sample
    for i in range(lengths):
clsid = int(clsid_labels[i])
catid = num_id_to_cat_id_map[clsid]
score = float(clsid_scores[i])
mask = segms[i]
segm = mask_util.encode(np.array(mask[:, :, np.newaxis], order='F'))[0]
segm['counts'] = segm['counts'].decode('utf8')
coco_res = {
'image_id': im_id,
'category_id': catid,
'segmentation': segm,
'score': score
}
segm_res.append(coco_res)
return segm_res
def get_keypoint_res(results, im_id):
anns = []
preds = results['keypoint']
for idx in range(im_id.shape[0]):
image_id = im_id[idx].item()
kpts, scores = preds[idx]
for kpt, score in zip(kpts, scores):
kpt = kpt.flatten()
ann = {
'image_id': image_id,
'category_id': 1, # XXX hard code
'keypoints': kpt.tolist(),
'score': float(score)
}
x = kpt[0::3]
y = kpt[1::3]
x0, x1, y0, y1 = np.min(x).item(), np.max(x).item(), np.min(
y).item(), np.max(y).item()
ann['area'] = (x1 - x0) * (y1 - y0)
ann['bbox'] = [x0, y0, x1 - x0, y1 - y0]
anns.append(ann)
return anns
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import json
from collections import defaultdict, OrderedDict
import numpy as np
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
from scipy.io import loadmat, savemat
from lib.utils.keypoint_utils import oks_nms
from lib.utils.logger import setup_logger
logger = setup_logger(__name__)
__all__ = ['KeyPointTopDownCOCOEval']
class KeyPointTopDownCOCOEval(object):
'''
Adapted from
https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
Copyright (c) Microsoft, under the MIT License.
'''
def __init__(self,
anno_file,
num_samples,
num_joints,
output_eval,
iou_type='keypoints',
in_vis_thre=0.2,
oks_thre=0.9,
save_prediction_only=False):
super(KeyPointTopDownCOCOEval, self).__init__()
self.coco = COCO(anno_file)
self.num_samples = num_samples
self.num_joints = num_joints
self.iou_type = iou_type
self.in_vis_thre = in_vis_thre
self.oks_thre = oks_thre
self.output_eval = output_eval
self.res_file = os.path.join(output_eval, "keypoints_results.json")
self.save_prediction_only = save_prediction_only
self.reset()
def reset(self):
self.results = {
'all_preds': np.zeros(
(self.num_samples, self.num_joints, 3), dtype=np.float32),
'all_boxes': np.zeros((self.num_samples, 6)),
'image_path': []
}
self.eval_results = {}
self.idx = 0
def update(self, inputs, outputs):
kpts, _ = outputs['keypoint'][0]
num_images = inputs['image'].shape[0]
self.results['all_preds'][self.idx:self.idx + num_images, :, 0:
3] = kpts[:, :, 0:3]
self.results['all_boxes'][self.idx:self.idx + num_images, 0:
2] = inputs['center'].numpy()[:, 0:2]
self.results['all_boxes'][self.idx:self.idx + num_images, 2:
4] = inputs['scale'].numpy()[:, 0:2]
self.results['all_boxes'][self.idx:self.idx + num_images, 4] = np.prod(
inputs['scale'].numpy() * 200, 1)
self.results['all_boxes'][self.idx:self.idx + num_images,
5] = np.squeeze(inputs['score'].numpy())
self.results['image_path'].extend(inputs['im_id'].numpy())
self.idx += num_images
def _write_coco_keypoint_results(self, keypoints):
data_pack = [{
'cat_id': 1,
'cls': 'person',
'ann_type': 'keypoints',
'keypoints': keypoints
}]
results = self._coco_keypoint_results_one_category_kernel(data_pack[0])
if not os.path.exists(self.output_eval):
os.makedirs(self.output_eval)
with open(self.res_file, 'w') as f:
json.dump(results, f, sort_keys=True, indent=4)
logger.info(f'The keypoint result is saved to {self.res_file}.')
try:
json.load(open(self.res_file))
except Exception:
content = []
with open(self.res_file, 'r') as f:
for line in f:
content.append(line)
content[-1] = ']'
with open(self.res_file, 'w') as f:
for c in content:
f.write(c)
def _coco_keypoint_results_one_category_kernel(self, data_pack):
cat_id = data_pack['cat_id']
keypoints = data_pack['keypoints']
cat_results = []
for img_kpts in keypoints:
if len(img_kpts) == 0:
continue
_key_points = np.array(
[img_kpts[k]['keypoints'] for k in range(len(img_kpts))])
_key_points = _key_points.reshape(_key_points.shape[0], -1)
result = [{
'image_id': img_kpts[k]['image'],
'category_id': cat_id,
'keypoints': _key_points[k].tolist(),
'score': img_kpts[k]['score'],
'center': list(img_kpts[k]['center']),
'scale': list(img_kpts[k]['scale'])
} for k in range(len(img_kpts))]
cat_results.extend(result)
return cat_results
def get_final_results(self, preds, all_boxes, img_path):
_kpts = []
for idx, kpt in enumerate(preds):
_kpts.append({
'keypoints': kpt,
'center': all_boxes[idx][0:2],
'scale': all_boxes[idx][2:4],
'area': all_boxes[idx][4],
'score': all_boxes[idx][5],
'image': int(img_path[idx])
})
# image x person x (keypoints)
kpts = defaultdict(list)
for kpt in _kpts:
kpts[kpt['image']].append(kpt)
# rescoring and oks nms
num_joints = preds.shape[1]
in_vis_thre = self.in_vis_thre
oks_thre = self.oks_thre
oks_nmsed_kpts = []
for img in kpts.keys():
img_kpts = kpts[img]
for n_p in img_kpts:
box_score = n_p['score']
kpt_score = 0
valid_num = 0
for n_jt in range(0, num_joints):
t_s = n_p['keypoints'][n_jt][2]
if t_s > in_vis_thre:
kpt_score = kpt_score + t_s
valid_num = valid_num + 1
if valid_num != 0:
kpt_score = kpt_score / valid_num
# rescoring
n_p['score'] = kpt_score * box_score
keep = oks_nms([img_kpts[i] for i in range(len(img_kpts))],
oks_thre)
if len(keep) == 0:
oks_nmsed_kpts.append(img_kpts)
else:
oks_nmsed_kpts.append([img_kpts[_keep] for _keep in keep])
self._write_coco_keypoint_results(oks_nmsed_kpts)
def accumulate(self):
self.get_final_results(self.results['all_preds'],
self.results['all_boxes'],
self.results['image_path'])
if self.save_prediction_only:
            logger.info(f'The keypoint result is saved to {self.res_file} '
                        'and the mAP will not be evaluated.')
return
coco_dt = self.coco.loadRes(self.res_file)
coco_eval = COCOeval(self.coco, coco_dt, 'keypoints')
coco_eval.params.useSegm = None
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
keypoint_stats = []
for ind in range(len(coco_eval.stats)):
keypoint_stats.append((coco_eval.stats[ind]))
self.eval_results['keypoint'] = keypoint_stats
def log(self):
if self.save_prediction_only:
return
stats_names = [
'AP', 'Ap .5', 'AP .75', 'AP (M)', 'AP (L)', 'AR', 'AR .5',
'AR .75', 'AR (M)', 'AR (L)'
]
num_values = len(stats_names)
print(' '.join(['| {}'.format(name) for name in stats_names]) + ' |')
        print('|---' * num_values + '|')
print(' '.join([
'| {:.3f}'.format(value) for value in self.eval_results['keypoint']
]) + ' |')
def get_results(self):
return self.eval_results
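# Illustrative sketch (not part of the original file): how the evaluator
# above is typically driven in an eval loop. `metric` is an instance of
# KeyPointTopDownCOCOEval; `loader` and `model` are hypothetical placeholders.
def _example_keypoint_eval_loop(metric, loader, model):
    for inputs in loader:
        outputs = model(inputs)  # outputs['keypoint'] comes from the model
        metric.update(inputs, outputs)  # buffer predictions batch by batch
    metric.accumulate()  # run rescoring, OKS NMS, write json, then COCOeval
    metric.log()  # print the AP/AR markdown table
    return metric.get_results()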
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import sys
import math
import numpy as np
import itertools
import paddle
from lib.utils.logger import setup_logger
logger = setup_logger(__name__)
__all__ = [
'draw_pr_curve', 'bbox_area', 'jaccard_overlap', 'prune_zero_padding',
'DetectionMAP', 'ap_per_class', 'compute_ap', 'get_best_begin_point_single'
]
def cal_line_length(point1, point2):
    return math.sqrt(
        math.pow(point1[0] - point2[0], 2) + math.pow(point1[1] - point2[1],
                                                      2))
def get_best_begin_point_single(coordinate):
x1, y1, x2, y2, x3, y3, x4, y4 = coordinate
xmin = min(x1, x2, x3, x4)
ymin = min(y1, y2, y3, y4)
xmax = max(x1, x2, x3, x4)
ymax = max(y1, y2, y3, y4)
combinate = [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
[[x4, y4], [x1, y1], [x2, y2], [x3, y3]],
[[x3, y3], [x4, y4], [x1, y1], [x2, y2]],
[[x2, y2], [x3, y3], [x4, y4], [x1, y1]]]
dst_coordinate = [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]]
force = 100000000.0
force_flag = 0
for i in range(4):
temp_force = cal_line_length(combinate[i][0], dst_coordinate[0]) \
+ cal_line_length(combinate[i][1], dst_coordinate[1]) \
+ cal_line_length(combinate[i][2], dst_coordinate[2]) \
+ cal_line_length(combinate[i][3], dst_coordinate[3])
if temp_force < force:
force = temp_force
force_flag = i
return np.array(combinate[force_flag]).reshape(8)
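# Minimal sketch (added for illustration, not in the original file): the
# function above rotates the 4 polygon vertices so the ordering starts
# closest to the top-left corner of the enclosing axis-aligned box.
def _example_best_begin_point():
    # a unit square listed starting from its bottom-right corner
    poly = [1., 1., 0., 1., 0., 0., 1., 0.]
    reordered = get_best_begin_point_single(poly)
    # the reordering starts near (xmin, ymin): [0, 0, 1, 0, 1, 1, 0, 1]
    return reordered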
def poly2rbox(polys):
"""
poly:[x0,y0,x1,y1,x2,y2,x3,y3]
to
rotated_boxes:[x_ctr,y_ctr,w,h,angle]
"""
rotated_boxes = []
for poly in polys:
poly = np.array(poly[:8], dtype=np.float32)
pt1 = (poly[0], poly[1])
pt2 = (poly[2], poly[3])
pt3 = (poly[4], poly[5])
pt4 = (poly[6], poly[7])
        edge1 = np.sqrt((pt1[0] - pt2[0])**2 + (pt1[1] - pt2[1])**2)
        edge2 = np.sqrt((pt2[0] - pt3[0])**2 + (pt2[1] - pt3[1])**2)
width = max(edge1, edge2)
height = min(edge1, edge2)
rbox_angle = 0
if edge1 > edge2:
rbox_angle = np.arctan2(
float(pt2[1] - pt1[1]), float(pt2[0] - pt1[0]))
        else:
            rbox_angle = np.arctan2(
                float(pt4[1] - pt1[1]), float(pt4[0] - pt1[0]))
def norm_angle(angle, range=[-np.pi / 4, np.pi]):
return (angle - range[0]) % range[1] + range[0]
rbox_angle = norm_angle(rbox_angle)
x_ctr = float(pt1[0] + pt3[0]) / 2
y_ctr = float(pt1[1] + pt3[1]) / 2
rotated_box = np.array([x_ctr, y_ctr, width, height, rbox_angle])
rotated_boxes.append(rotated_box)
ret_rotated_boxes = np.array(rotated_boxes)
assert ret_rotated_boxes.shape[1] == 5
return ret_rotated_boxes
def rbox2poly_np(rrects):
"""
rrect:[x_ctr,y_ctr,w,h,angle]
to
poly:[x0,y0,x1,y1,x2,y2,x3,y3]
"""
polys = []
for i in range(rrects.shape[0]):
rrect = rrects[i]
# x_ctr, y_ctr, width, height, angle = rrect[:5]
x_ctr = rrect[0]
y_ctr = rrect[1]
width = rrect[2]
height = rrect[3]
angle = rrect[4]
tl_x, tl_y, br_x, br_y = -width / 2, -height / 2, width / 2, height / 2
rect = np.array([[tl_x, br_x, br_x, tl_x], [tl_y, tl_y, br_y, br_y]])
R = np.array([[np.cos(angle), -np.sin(angle)],
[np.sin(angle), np.cos(angle)]])
poly = R.dot(rect)
x0, x1, x2, x3 = poly[0, :4] + x_ctr
y0, y1, y2, y3 = poly[1, :4] + y_ctr
poly = np.array([x0, y0, x1, y1, x2, y2, x3, y3], dtype=np.float32)
poly = get_best_begin_point_single(poly)
polys.append(poly)
polys = np.array(polys)
return polys
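# Illustrative round-trip sketch (not part of the original file): poly2rbox
# and rbox2poly_np are approximate inverses; a horizontal 4x2 rectangle
# centered at (2, 1) survives the conversion up to vertex ordering.
def _example_rbox_roundtrip():
    poly = np.array([[0., 0., 4., 0., 4., 2., 0., 2.]])
    rboxes = poly2rbox(poly)           # -> [[2., 1., 4., 2., 0.]]
    polys_back = rbox2poly_np(rboxes)  # -> the same rectangle, reordered
    return rboxes, polys_back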
def draw_pr_curve(precision,
recall,
iou=0.5,
out_dir='pr_curve',
file_name='precision_recall_curve.jpg'):
if not os.path.exists(out_dir):
os.makedirs(out_dir)
output_path = os.path.join(out_dir, file_name)
try:
import matplotlib.pyplot as plt
except Exception as e:
        logger.error('Matplotlib not found, please install matplotlib, '
                     'for example: `pip install matplotlib`.')
raise e
plt.cla()
plt.figure('P-R Curve')
plt.title('Precision/Recall Curve(IoU={})'.format(iou))
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.grid(True)
plt.plot(recall, precision)
plt.savefig(output_path)
def bbox_area(bbox, is_bbox_normalized):
"""
Calculate area of a bounding box
"""
norm = 1. - float(is_bbox_normalized)
width = bbox[2] - bbox[0] + norm
height = bbox[3] - bbox[1] + norm
return width * height
def jaccard_overlap(pred, gt, is_bbox_normalized=False):
"""
Calculate jaccard overlap ratio between two bounding box
"""
if pred[0] >= gt[2] or pred[2] <= gt[0] or \
pred[1] >= gt[3] or pred[3] <= gt[1]:
return 0.
inter_xmin = max(pred[0], gt[0])
inter_ymin = max(pred[1], gt[1])
inter_xmax = min(pred[2], gt[2])
inter_ymax = min(pred[3], gt[3])
inter_size = bbox_area([inter_xmin, inter_ymin, inter_xmax, inter_ymax],
is_bbox_normalized)
pred_size = bbox_area(pred, is_bbox_normalized)
gt_size = bbox_area(gt, is_bbox_normalized)
overlap = float(inter_size) / (pred_size + gt_size - inter_size)
return overlap
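# Small numeric sketch (illustration only): two unit-area boxes overlapping
# by half give IoU = 0.5 / (1 + 1 - 0.5) = 1/3.
def _example_jaccard_overlap():
    pred = [0., 0., 1., 1.]
    gt = [0.5, 0., 1.5, 1.]
    return jaccard_overlap(pred, gt, is_bbox_normalized=True)  # ~0.3333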
def calc_rbox_iou(pred, gt_rbox):
"""
calc iou between rotated bbox
"""
# calc iou of bounding box for speedup
pred = np.array(pred, np.float32).reshape(-1, 8)
pred = pred.reshape(-1, 2)
gt_poly = rbox2poly_np(np.array(gt_rbox).reshape(-1, 5))[0]
gt_poly = gt_poly.reshape(-1, 2)
pred_rect = [
np.min(pred[:, 0]), np.min(pred[:, 1]), np.max(pred[:, 0]),
np.max(pred[:, 1])
]
gt_rect = [
np.min(gt_poly[:, 0]), np.min(gt_poly[:, 1]), np.max(gt_poly[:, 0]),
np.max(gt_poly[:, 1])
]
iou = jaccard_overlap(pred_rect, gt_rect, False)
if iou <= 0:
return iou
# calc rbox iou
    pred = np.array(pred, np.float32).reshape(-1, 8)
    pred_rbox = poly2rbox(pred)
    pred_rbox = pred_rbox.reshape(-1, 5)
try:
from rbox_iou_ops import rbox_iou
except Exception as e:
print("import custom_ops error, try install rbox_iou_ops " \
"following ppdet/ext_op/README.md", e)
sys.stdout.flush()
sys.exit(-1)
gt_rbox = np.array(gt_rbox, np.float32).reshape(-1, 5)
pd_gt_rbox = paddle.to_tensor(gt_rbox, dtype='float32')
pd_pred_rbox = paddle.to_tensor(pred_rbox, dtype='float32')
iou = rbox_iou(pd_gt_rbox, pd_pred_rbox)
iou = iou.numpy()
return iou[0][0]
def prune_zero_padding(gt_box, gt_label, difficult=None):
valid_cnt = 0
for i in range(len(gt_box)):
if gt_box[i, 0] == 0 and gt_box[i, 1] == 0 and \
gt_box[i, 2] == 0 and gt_box[i, 3] == 0:
break
valid_cnt += 1
return (gt_box[:valid_cnt], gt_label[:valid_cnt], difficult[:valid_cnt]
if difficult is not None else None)
class DetectionMAP(object):
"""
Calculate detection mean average precision.
Currently support two types: 11point and integral
Args:
class_num (int): The class number.
overlap_thresh (float): The threshold of overlap
ratio between prediction bounding box and
ground truth bounding box for deciding
true/false positive. Default 0.5.
map_type (str): Calculation method of mean average
precision, currently support '11point' and
'integral'. Default '11point'.
        is_bbox_normalized (bool): Whether bounding boxes
            are normalized to range [0, 1]. Default False.
        evaluate_difficult (bool): Whether to evaluate
            difficult bounding boxes. Default False.
        catid2name (dict): Mapping between category id and category name.
        classwise (bool): Whether to compute per-category AP
            and draw the P-R curve.
"""
def __init__(self,
class_num,
overlap_thresh=0.5,
map_type='11point',
is_bbox_normalized=False,
evaluate_difficult=False,
catid2name=None,
classwise=False):
self.class_num = class_num
self.overlap_thresh = overlap_thresh
assert map_type in ['11point', 'integral'], \
"map_type currently only support '11point' "\
"and 'integral'"
self.map_type = map_type
self.is_bbox_normalized = is_bbox_normalized
self.evaluate_difficult = evaluate_difficult
self.classwise = classwise
self.classes = []
for cname in catid2name.values():
self.classes.append(cname)
self.reset()
def update(self, bbox, score, label, gt_box, gt_label, difficult=None):
"""
Update metric statics from given prediction and ground
truth infomations.
"""
if difficult is None:
difficult = np.zeros_like(gt_label)
# record class gt count
for gtl, diff in zip(gt_label, difficult):
if self.evaluate_difficult or int(diff) == 0:
self.class_gt_counts[int(np.array(gtl))] += 1
# record class score positive
visited = [False] * len(gt_label)
for b, s, l in zip(bbox, score, label):
pred = b.tolist() if isinstance(b, np.ndarray) else b
max_idx = -1
max_overlap = -1.0
for i, gl in enumerate(gt_label):
if int(gl) == int(l):
if len(gt_box[i]) == 5:
overlap = calc_rbox_iou(pred, gt_box[i])
else:
overlap = jaccard_overlap(pred, gt_box[i],
self.is_bbox_normalized)
if overlap > max_overlap:
max_overlap = overlap
max_idx = i
if max_overlap > self.overlap_thresh:
if self.evaluate_difficult or \
int(np.array(difficult[max_idx])) == 0:
if not visited[max_idx]:
self.class_score_poss[int(l)].append([s, 1.0])
visited[max_idx] = True
else:
self.class_score_poss[int(l)].append([s, 0.0])
else:
self.class_score_poss[int(l)].append([s, 0.0])
def reset(self):
"""
Reset metric statics
"""
self.class_score_poss = [[] for _ in range(self.class_num)]
self.class_gt_counts = [0] * self.class_num
self.mAP = 0.0
def accumulate(self):
"""
Accumulate metric results and calculate mAP
"""
mAP = 0.
valid_cnt = 0
eval_results = []
for score_pos, count in zip(self.class_score_poss,
self.class_gt_counts):
if count == 0: continue
if len(score_pos) == 0:
valid_cnt += 1
continue
accum_tp_list, accum_fp_list = \
self._get_tp_fp_accum(score_pos)
precision = []
recall = []
for ac_tp, ac_fp in zip(accum_tp_list, accum_fp_list):
precision.append(float(ac_tp) / (ac_tp + ac_fp))
recall.append(float(ac_tp) / count)
one_class_ap = 0.0
if self.map_type == '11point':
max_precisions = [0.] * 11
start_idx = len(precision) - 1
for j in range(10, -1, -1):
for i in range(start_idx, -1, -1):
if recall[i] < float(j) / 10.:
start_idx = i
if j > 0:
max_precisions[j - 1] = max_precisions[j]
break
else:
if max_precisions[j] < precision[i]:
max_precisions[j] = precision[i]
one_class_ap = sum(max_precisions) / 11.
mAP += one_class_ap
valid_cnt += 1
            elif self.map_type == 'integral':
                prev_recall = 0.
for i in range(len(precision)):
recall_gap = math.fabs(recall[i] - prev_recall)
if recall_gap > 1e-6:
one_class_ap += precision[i] * recall_gap
prev_recall = recall[i]
mAP += one_class_ap
valid_cnt += 1
else:
logger.error("Unspported mAP type {}".format(self.map_type))
sys.exit(1)
eval_results.append({
'class': self.classes[valid_cnt - 1],
'ap': one_class_ap,
'precision': precision,
'recall': recall,
})
self.eval_results = eval_results
self.mAP = mAP / float(valid_cnt) if valid_cnt > 0 else mAP
def get_map(self):
"""
Get mAP result
"""
if self.mAP is None:
logger.error("mAP is not calculated.")
if self.classwise:
# Compute per-category AP and PR curve
try:
from terminaltables import AsciiTable
except Exception as e:
                logger.error(
                    'terminaltables not found, please install terminaltables, '
                    'for example: `pip install terminaltables`.')
raise e
results_per_category = []
for eval_result in self.eval_results:
results_per_category.append(
(str(eval_result['class']),
'{:0.3f}'.format(float(eval_result['ap']))))
draw_pr_curve(
eval_result['precision'],
eval_result['recall'],
out_dir='voc_pr_curve',
file_name='{}_precision_recall_curve.jpg'.format(
eval_result['class']))
num_columns = min(6, len(results_per_category) * 2)
results_flatten = list(itertools.chain(*results_per_category))
headers = ['category', 'AP'] * (num_columns // 2)
results_2d = itertools.zip_longest(* [
results_flatten[i::num_columns] for i in range(num_columns)
])
table_data = [headers]
table_data += [result for result in results_2d]
table = AsciiTable(table_data)
            logger.info('Per-category VOC AP: \n{}'.format(table.table))
            logger.info(
                'Per-category PR curves have been saved to the voc_pr_curve folder.')
return self.mAP
def _get_tp_fp_accum(self, score_pos_list):
"""
Calculate accumulating true/false positive results from
[score, pos] records
"""
sorted_list = sorted(score_pos_list, key=lambda s: s[0], reverse=True)
accum_tp = 0
accum_fp = 0
accum_tp_list = []
accum_fp_list = []
for (score, pos) in sorted_list:
accum_tp += int(pos)
accum_tp_list.append(accum_tp)
accum_fp += 1 - int(pos)
accum_fp_list.append(accum_fp)
return accum_tp_list, accum_fp_list
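# Illustrative sketch (not in the original file) of driving DetectionMAP on
# one image; the bbox/score/label arrays are hypothetical model outputs and
# the class names are placeholders.
def _example_detection_map():
    metric = DetectionMAP(
        class_num=2,
        overlap_thresh=0.5,
        map_type='11point',
        catid2name={0: 'cat', 1: 'dog'})
    bbox = np.array([[0., 0., 10., 10.]])  # one detection, IoU > 0.5 with GT
    score = np.array([0.9])
    label = np.array([0])
    gt_box = np.array([[1., 1., 10., 10.]])
    gt_label = np.array([0])
    metric.update(bbox, score, label, gt_box, gt_label)
    metric.accumulate()
    return metric.get_map()  # -> 1.0 for this single true positive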
def ap_per_class(tp, conf, pred_cls, target_cls):
"""
Computes the average precision, given the recall and precision curves.
Method originally from https://github.com/rafaelpadilla/Object-Detection-Metrics.
Args:
tp (list): True positives.
conf (list): Objectness value from 0-1.
pred_cls (list): Predicted object classes.
target_cls (list): Target object classes.
"""
tp, conf, pred_cls, target_cls = np.array(tp), np.array(conf), np.array(
pred_cls), np.array(target_cls)
# Sort by objectness
i = np.argsort(-conf)
tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]
# Find unique classes
unique_classes = np.unique(np.concatenate((pred_cls, target_cls), 0))
# Create Precision-Recall curve and compute AP for each class
ap, p, r = [], [], []
for c in unique_classes:
i = pred_cls == c
n_gt = sum(target_cls == c) # Number of ground truth objects
n_p = sum(i) # Number of predicted objects
if (n_p == 0) and (n_gt == 0):
continue
elif (n_p == 0) or (n_gt == 0):
ap.append(0)
r.append(0)
p.append(0)
else:
# Accumulate FPs and TPs
fpc = np.cumsum(1 - tp[i])
tpc = np.cumsum(tp[i])
# Recall
recall_curve = tpc / (n_gt + 1e-16)
r.append(tpc[-1] / (n_gt + 1e-16))
# Precision
precision_curve = tpc / (tpc + fpc)
p.append(tpc[-1] / (tpc[-1] + fpc[-1]))
# AP from recall-precision curve
ap.append(compute_ap(recall_curve, precision_curve))
return np.array(ap), unique_classes.astype('int32'), np.array(r), np.array(
p)
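# Numeric sketch (illustration only): one class, two detections sorted by
# confidence, the first a TP and the second an FP, against a single GT box.
def _example_ap_per_class():
    tp = [1, 0]
    conf = [0.9, 0.8]
    pred_cls = [0, 0]
    target_cls = [0]
    ap, classes, r, p = ap_per_class(tp, conf, pred_cls, target_cls)
    # recall -> 1.0, final precision -> 0.5, AP -> 1.0 (the PR envelope
    # reaches precision 1.0 before the FP is added)
    return ap, classes, r, p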
def compute_ap(recall, precision):
"""
Computes the average precision, given the recall and precision curves.
Code originally from https://github.com/rbgirshick/py-faster-rcnn.
Args:
recall (list): The recall curve.
precision (list): The precision curve.
Returns:
The average precision as computed in py-faster-rcnn.
"""
# correct AP calculation
# first append sentinel values at the end
mrec = np.concatenate(([0.], recall, [1.]))
mpre = np.concatenate(([0.], precision, [0.]))
# compute the precision envelope
for i in range(mpre.size - 1, 0, -1):
mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
# to calculate area under PR curve, look for points
# where X axis (recall) changes value
i = np.where(mrec[1:] != mrec[:-1])[0]
# and sum (\Delta recall) * prec
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
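# Tiny numeric sketch (illustration only): a detector that reaches recall 1.0
# with precision 1.0 everywhere yields AP = 1.0 under the envelope + area rule
# implemented above.
def _example_compute_ap():
    recall = [0.5, 1.0]
    precision = [1.0, 1.0]
    return compute_ap(recall, precision)  # -> 1.0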
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import hrnet
from . import lite_hrnet
from . import keypoint_hrnet
from . import loss
from .hrnet import *
from .keypoint_hrnet import *
from .loss import *
from .lite_hrnet import *
__all__ = hrnet.__all__ + keypoint_hrnet.__all__ \
+ loss.__all__
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn import AdaptiveAvgPool2D, Linear
from paddle.regularizer import L2Decay
from paddle import ParamAttr
from paddle.nn.initializer import Normal, Uniform
from collections import namedtuple
from numbers import Integral
import math
from lib.utils.workspace import register
__all__ = ['HRNet']
class ConvNormLayer(nn.Layer):
def __init__(self,
ch_in,
ch_out,
filter_size,
stride=1,
norm_type='bn',
norm_groups=32,
use_dcn=False,
norm_decay=0.,
freeze_norm=False,
act=None,
name=None):
super(ConvNormLayer, self).__init__()
assert norm_type in ['bn', 'sync_bn', 'gn']
self.act = act
self.conv = nn.Conv2D(
in_channels=ch_in,
out_channels=ch_out,
kernel_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=1,
weight_attr=ParamAttr(initializer=Normal(
mean=0., std=0.01)),
bias_attr=False)
norm_lr = 0. if freeze_norm else 1.
param_attr = ParamAttr(
learning_rate=norm_lr, regularizer=L2Decay(norm_decay))
bias_attr = ParamAttr(
learning_rate=norm_lr, regularizer=L2Decay(norm_decay))
global_stats = True if freeze_norm else False
if norm_type in ['bn', 'sync_bn']:
self.norm = nn.BatchNorm(
ch_out,
param_attr=param_attr,
bias_attr=bias_attr,
use_global_stats=global_stats)
elif norm_type == 'gn':
self.norm = nn.GroupNorm(
num_groups=norm_groups,
num_channels=ch_out,
weight_attr=param_attr,
bias_attr=bias_attr)
norm_params = self.norm.parameters()
if freeze_norm:
for param in norm_params:
param.stop_gradient = True
def forward(self, inputs):
out = self.conv(inputs)
out = self.norm(out)
if self.act == 'relu':
out = F.relu(out)
return out
class ShapeSpec(
namedtuple("_ShapeSpec", ["channels", "height", "width", "stride"])):
def __new__(cls, channels=None, height=None, width=None, stride=None):
return super(ShapeSpec, cls).__new__(cls, channels, height, width,
stride)
class Layer1(nn.Layer):
def __init__(self,
num_channels,
has_se=False,
norm_decay=0.,
freeze_norm=True,
name=None):
super(Layer1, self).__init__()
self.bottleneck_block_list = []
for i in range(4):
bottleneck_block = self.add_sublayer(
"block_{}_{}".format(name, i + 1),
BottleneckBlock(
num_channels=num_channels if i == 0 else 256,
num_filters=64,
has_se=has_se,
stride=1,
downsample=True if i == 0 else False,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name + '_' + str(i + 1)))
self.bottleneck_block_list.append(bottleneck_block)
def forward(self, input):
conv = input
for block_func in self.bottleneck_block_list:
conv = block_func(conv)
return conv
class TransitionLayer(nn.Layer):
def __init__(self,
in_channels,
out_channels,
norm_decay=0.,
freeze_norm=True,
name=None):
super(TransitionLayer, self).__init__()
num_in = len(in_channels)
num_out = len(out_channels)
out = []
self.conv_bn_func_list = []
for i in range(num_out):
residual = None
if i < num_in:
if in_channels[i] != out_channels[i]:
residual = self.add_sublayer(
"transition_{}_layer_{}".format(name, i + 1),
ConvNormLayer(
ch_in=in_channels[i],
ch_out=out_channels[i],
filter_size=3,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act='relu',
name=name + '_layer_' + str(i + 1)))
else:
residual = self.add_sublayer(
"transition_{}_layer_{}".format(name, i + 1),
ConvNormLayer(
ch_in=in_channels[-1],
ch_out=out_channels[i],
filter_size=3,
stride=2,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act='relu',
name=name + '_layer_' + str(i + 1)))
self.conv_bn_func_list.append(residual)
def forward(self, input):
outs = []
for idx, conv_bn_func in enumerate(self.conv_bn_func_list):
if conv_bn_func is None:
outs.append(input[idx])
else:
if idx < len(input):
outs.append(conv_bn_func(input[idx]))
else:
outs.append(conv_bn_func(input[-1]))
return outs
class Branches(nn.Layer):
def __init__(self,
block_num,
in_channels,
out_channels,
has_se=False,
norm_decay=0.,
freeze_norm=True,
name=None):
super(Branches, self).__init__()
self.basic_block_list = []
for i in range(len(out_channels)):
self.basic_block_list.append([])
for j in range(block_num):
in_ch = in_channels[i] if j == 0 else out_channels[i]
basic_block_func = self.add_sublayer(
"bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1),
BasicBlock(
num_channels=in_ch,
num_filters=out_channels[i],
has_se=has_se,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name + '_branch_layer_' + str(i + 1) + '_' +
str(j + 1)))
self.basic_block_list[i].append(basic_block_func)
def forward(self, inputs):
outs = []
for idx, input in enumerate(inputs):
conv = input
basic_block_list = self.basic_block_list[idx]
for basic_block_func in basic_block_list:
conv = basic_block_func(conv)
outs.append(conv)
return outs
class BottleneckBlock(nn.Layer):
def __init__(self,
num_channels,
num_filters,
has_se,
stride=1,
downsample=False,
norm_decay=0.,
freeze_norm=True,
name=None):
super(BottleneckBlock, self).__init__()
self.has_se = has_se
self.downsample = downsample
self.conv1 = ConvNormLayer(
ch_in=num_channels,
ch_out=num_filters,
filter_size=1,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act="relu",
name=name + "_conv1")
self.conv2 = ConvNormLayer(
ch_in=num_filters,
ch_out=num_filters,
filter_size=3,
stride=stride,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act="relu",
name=name + "_conv2")
self.conv3 = ConvNormLayer(
ch_in=num_filters,
ch_out=num_filters * 4,
filter_size=1,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act=None,
name=name + "_conv3")
if self.downsample:
self.conv_down = ConvNormLayer(
ch_in=num_channels,
ch_out=num_filters * 4,
filter_size=1,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act=None,
name=name + "_downsample")
if self.has_se:
self.se = SELayer(
num_channels=num_filters * 4,
num_filters=num_filters * 4,
reduction_ratio=16,
name='fc' + name)
def forward(self, input):
residual = input
conv1 = self.conv1(input)
conv2 = self.conv2(conv1)
conv3 = self.conv3(conv2)
if self.downsample:
residual = self.conv_down(input)
if self.has_se:
conv3 = self.se(conv3)
y = paddle.add(x=residual, y=conv3)
y = F.relu(y)
return y
class BasicBlock(nn.Layer):
def __init__(self,
num_channels,
num_filters,
stride=1,
has_se=False,
downsample=False,
norm_decay=0.,
freeze_norm=True,
name=None):
super(BasicBlock, self).__init__()
self.has_se = has_se
self.downsample = downsample
self.conv1 = ConvNormLayer(
ch_in=num_channels,
ch_out=num_filters,
filter_size=3,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
stride=stride,
act="relu",
name=name + "_conv1")
self.conv2 = ConvNormLayer(
ch_in=num_filters,
ch_out=num_filters,
filter_size=3,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
stride=1,
act=None,
name=name + "_conv2")
if self.downsample:
self.conv_down = ConvNormLayer(
ch_in=num_channels,
ch_out=num_filters * 4,
filter_size=1,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act=None,
name=name + "_downsample")
if self.has_se:
self.se = SELayer(
num_channels=num_filters,
num_filters=num_filters,
reduction_ratio=16,
name='fc' + name)
def forward(self, input):
residual = input
conv1 = self.conv1(input)
conv2 = self.conv2(conv1)
if self.downsample:
residual = self.conv_down(input)
if self.has_se:
conv2 = self.se(conv2)
y = paddle.add(x=residual, y=conv2)
y = F.relu(y)
return y
class SELayer(nn.Layer):
def __init__(self, num_channels, num_filters, reduction_ratio, name=None):
super(SELayer, self).__init__()
self.pool2d_gap = AdaptiveAvgPool2D(1)
self._num_channels = num_channels
med_ch = int(num_channels / reduction_ratio)
stdv = 1.0 / math.sqrt(num_channels * 1.0)
self.squeeze = Linear(
num_channels,
med_ch,
weight_attr=ParamAttr(initializer=Uniform(-stdv, stdv)))
stdv = 1.0 / math.sqrt(med_ch * 1.0)
self.excitation = Linear(
med_ch,
num_filters,
weight_attr=ParamAttr(initializer=Uniform(-stdv, stdv)))
def forward(self, input):
pool = self.pool2d_gap(input)
pool = paddle.squeeze(pool, axis=[2, 3])
squeeze = self.squeeze(pool)
squeeze = F.relu(squeeze)
excitation = self.excitation(squeeze)
excitation = F.sigmoid(excitation)
excitation = paddle.unsqueeze(excitation, axis=[2, 3])
out = input * excitation
return out
class Stage(nn.Layer):
def __init__(self,
num_channels,
num_modules,
num_filters,
has_se=False,
norm_decay=0.,
freeze_norm=True,
multi_scale_output=True,
name=None):
super(Stage, self).__init__()
self._num_modules = num_modules
self.stage_func_list = []
for i in range(num_modules):
if i == num_modules - 1 and not multi_scale_output:
stage_func = self.add_sublayer(
"stage_{}_{}".format(name, i + 1),
HighResolutionModule(
num_channels=num_channels,
num_filters=num_filters,
has_se=has_se,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
multi_scale_output=False,
name=name + '_' + str(i + 1)))
else:
stage_func = self.add_sublayer(
"stage_{}_{}".format(name, i + 1),
HighResolutionModule(
num_channels=num_channels,
num_filters=num_filters,
has_se=has_se,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name + '_' + str(i + 1)))
self.stage_func_list.append(stage_func)
def forward(self, input):
out = input
for idx in range(self._num_modules):
out = self.stage_func_list[idx](out)
return out
class HighResolutionModule(nn.Layer):
def __init__(self,
num_channels,
num_filters,
has_se=False,
multi_scale_output=True,
norm_decay=0.,
freeze_norm=True,
name=None):
super(HighResolutionModule, self).__init__()
self.branches_func = Branches(
block_num=4,
in_channels=num_channels,
out_channels=num_filters,
has_se=has_se,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name)
self.fuse_func = FuseLayers(
in_channels=num_filters,
out_channels=num_filters,
multi_scale_output=multi_scale_output,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name)
def forward(self, input):
out = self.branches_func(input)
out = self.fuse_func(out)
return out
class FuseLayers(nn.Layer):
def __init__(self,
in_channels,
out_channels,
multi_scale_output=True,
norm_decay=0.,
freeze_norm=True,
name=None):
super(FuseLayers, self).__init__()
self._actual_ch = len(in_channels) if multi_scale_output else 1
self._in_channels = in_channels
self.residual_func_list = []
for i in range(self._actual_ch):
for j in range(len(in_channels)):
residual_func = None
if j > i:
residual_func = self.add_sublayer(
"residual_{}_layer_{}_{}".format(name, i + 1, j + 1),
ConvNormLayer(
ch_in=in_channels[j],
ch_out=out_channels[i],
filter_size=1,
stride=1,
act=None,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name + '_layer_' + str(i + 1) + '_' +
str(j + 1)))
self.residual_func_list.append(residual_func)
elif j < i:
pre_num_filters = in_channels[j]
for k in range(i - j):
if k == i - j - 1:
residual_func = self.add_sublayer(
"residual_{}_layer_{}_{}_{}".format(
name, i + 1, j + 1, k + 1),
ConvNormLayer(
ch_in=pre_num_filters,
ch_out=out_channels[i],
filter_size=3,
stride=2,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act=None,
name=name + '_layer_' + str(i + 1) + '_' +
str(j + 1) + '_' + str(k + 1)))
pre_num_filters = out_channels[i]
else:
residual_func = self.add_sublayer(
"residual_{}_layer_{}_{}_{}".format(
name, i + 1, j + 1, k + 1),
ConvNormLayer(
ch_in=pre_num_filters,
ch_out=out_channels[j],
filter_size=3,
stride=2,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act="relu",
name=name + '_layer_' + str(i + 1) + '_' +
str(j + 1) + '_' + str(k + 1)))
pre_num_filters = out_channels[j]
self.residual_func_list.append(residual_func)
def forward(self, input):
outs = []
residual_func_idx = 0
for i in range(self._actual_ch):
residual = input[i]
for j in range(len(self._in_channels)):
if j > i:
y = self.residual_func_list[residual_func_idx](input[j])
residual_func_idx += 1
y = F.interpolate(y, scale_factor=2**(j - i))
residual = paddle.add(x=residual, y=y)
elif j < i:
y = input[j]
for k in range(i - j):
y = self.residual_func_list[residual_func_idx](y)
residual_func_idx += 1
residual = paddle.add(x=residual, y=y)
residual = F.relu(residual)
outs.append(residual)
return outs
@register
class HRNet(nn.Layer):
"""
HRNet, see https://arxiv.org/abs/1908.07919
Args:
width (int): the width of HRNet
has_se (bool): whether to add SE block for each stage
freeze_at (int): the stage to freeze
freeze_norm (bool): whether to freeze norm in HRNet
norm_decay (float): weight decay for normalization layer weights
return_idx (List): the stage to return
upsample (bool): whether to upsample and concat the backbone feats
"""
def __init__(self,
width=18,
has_se=False,
freeze_at=0,
freeze_norm=True,
norm_decay=0.,
return_idx=[0, 1, 2, 3],
upsample=False):
super(HRNet, self).__init__()
self.width = width
self.has_se = has_se
if isinstance(return_idx, Integral):
return_idx = [return_idx]
assert len(return_idx) > 0, "need one or more return index"
self.freeze_at = freeze_at
self.return_idx = return_idx
self.upsample = upsample
self.channels = {
18: [[18, 36], [18, 36, 72], [18, 36, 72, 144]],
30: [[30, 60], [30, 60, 120], [30, 60, 120, 240]],
32: [[32, 64], [32, 64, 128], [32, 64, 128, 256]],
40: [[40, 80], [40, 80, 160], [40, 80, 160, 320]],
44: [[44, 88], [44, 88, 176], [44, 88, 176, 352]],
48: [[48, 96], [48, 96, 192], [48, 96, 192, 384]],
60: [[60, 120], [60, 120, 240], [60, 120, 240, 480]],
64: [[64, 128], [64, 128, 256], [64, 128, 256, 512]]
}
channels_2, channels_3, channels_4 = self.channels[width]
num_modules_2, num_modules_3, num_modules_4 = 1, 4, 3
self._out_channels = [sum(channels_4)] if self.upsample else channels_4
self._out_strides = [4] if self.upsample else [4, 8, 16, 32]
self.conv_layer1_1 = ConvNormLayer(
ch_in=3,
ch_out=64,
filter_size=3,
stride=2,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act='relu',
name="layer1_1")
self.conv_layer1_2 = ConvNormLayer(
ch_in=64,
ch_out=64,
filter_size=3,
stride=2,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act='relu',
name="layer1_2")
self.la1 = Layer1(
num_channels=64,
has_se=has_se,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="layer2")
self.tr1 = TransitionLayer(
in_channels=[256],
out_channels=channels_2,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="tr1")
self.st2 = Stage(
num_channels=channels_2,
num_modules=num_modules_2,
num_filters=channels_2,
has_se=self.has_se,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="st2")
self.tr2 = TransitionLayer(
in_channels=channels_2,
out_channels=channels_3,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="tr2")
self.st3 = Stage(
num_channels=channels_3,
num_modules=num_modules_3,
num_filters=channels_3,
has_se=self.has_se,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="st3")
self.tr3 = TransitionLayer(
in_channels=channels_3,
out_channels=channels_4,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="tr3")
self.st4 = Stage(
num_channels=channels_4,
num_modules=num_modules_4,
num_filters=channels_4,
has_se=self.has_se,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
multi_scale_output=len(return_idx) > 1,
name="st4")
def forward(self, inputs):
x = inputs['image']
conv1 = self.conv_layer1_1(x)
conv2 = self.conv_layer1_2(conv1)
la1 = self.la1(conv2)
tr1 = self.tr1([la1])
st2 = self.st2(tr1)
tr2 = self.tr2(st2)
st3 = self.st3(tr2)
tr3 = self.tr3(st3)
st4 = self.st4(tr3)
if self.upsample:
# Upsampling
x0_h, x0_w = st4[0].shape[2:4]
x1 = F.upsample(st4[1], size=(x0_h, x0_w), mode='bilinear')
x2 = F.upsample(st4[2], size=(x0_h, x0_w), mode='bilinear')
x3 = F.upsample(st4[3], size=(x0_h, x0_w), mode='bilinear')
x = paddle.concat([st4[0], x1, x2, x3], 1)
return x
res = []
for i, layer in enumerate(st4):
if i == self.freeze_at:
layer.stop_gradient = True
if i in self.return_idx:
res.append(layer)
return res
def out_shape(self):
if self.upsample:
self.return_idx = [0]
return [
ShapeSpec(
channels=self._out_channels[i], stride=self._out_strides[i])
for i in self.return_idx
]
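# Minimal construction sketch (illustration only, not part of the original
# file): HRNet-W32 as a keypoint backbone returning only the highest-resolution
# branch. The 256x192 input shape is an assumption matching the top-down pose
# pipeline.
def _example_hrnet_forward():
    net = HRNet(width=32, freeze_norm=False, return_idx=[0])
    x = paddle.rand([1, 3, 256, 192])
    feats = net({'image': x})
    return feats[0].shape  # [1, 32, 64, 48] (stride-4 feature map)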
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
import paddle.nn as nn
from paddle.nn.initializer import Normal, Constant
import numpy as np
import math
import cv2
from ..utils.keypoint_utils import transform_preds
from ..utils.workspace import register, create
__all__ = ['TopDownHRNet']
class BaseArch(nn.Layer):
def __init__(self):
super(BaseArch, self).__init__()
self.inputs = {}
self.fuse_norm = False
def load_meanstd(self, cfg_transform):
self.scale = 1.
self.mean = paddle.to_tensor([0.485, 0.456, 0.406]).reshape(
(1, 3, 1, 1))
self.std = paddle.to_tensor([0.229, 0.224, 0.225]).reshape(
(1, 3, 1, 1))
for item in cfg_transform:
if 'NormalizeImage' in item:
self.mean = paddle.to_tensor(item['NormalizeImage'][
'mean']).reshape((1, 3, 1, 1))
self.std = paddle.to_tensor(item['NormalizeImage'][
'std']).reshape((1, 3, 1, 1))
if item['NormalizeImage'].get('is_scale', True):
self.scale = 1. / 255.
break
def forward(self, inputs):
if self.fuse_norm:
image = inputs['image']
self.inputs['image'] = (image * self.scale - self.mean) / self.std
self.inputs['im_shape'] = inputs['im_shape']
self.inputs['scale_factor'] = inputs['scale_factor']
else:
self.inputs = inputs
self.model_arch()
if self.training:
out = self.get_loss()
else:
out = self.get_pred()
return out
def build_inputs(self, data, input_def):
inputs = {}
for i, k in enumerate(input_def):
inputs[k] = data[i]
return inputs
def model_arch(self, ):
pass
def get_loss(self, ):
raise NotImplementedError("Should implement get_loss method!")
def get_pred(self, ):
raise NotImplementedError("Should implement get_pred method!")
def Conv2d(in_channels,
out_channels,
kernel_size,
stride=1,
padding=0,
dilation=1,
groups=1,
bias=True,
weight_init=Normal(std=0.001),
bias_init=Constant(0.)):
weight_attr = paddle.framework.ParamAttr(initializer=weight_init)
if bias:
bias_attr = paddle.framework.ParamAttr(initializer=bias_init)
else:
bias_attr = False
conv = nn.Conv2D(
in_channels,
out_channels,
kernel_size,
stride,
padding,
dilation,
groups,
weight_attr=weight_attr,
bias_attr=bias_attr)
return conv
@register
class TopDownHRNet(BaseArch):
__category__ = 'architecture'
__inject__ = ['loss']
def __init__(self,
width,
num_joints,
backbone='HRNet',
loss='KeyPointMSELoss',
post_process='HRNetPostProcess',
flip_perm=None,
flip=True,
shift_heatmap=True,
use_dark=True):
"""
HRNet network, see https://arxiv.org/abs/1902.09212
Args:
backbone (nn.Layer): backbone instance
post_process (object): `HRNetPostProcess` instance
flip_perm (list): The left-right joints exchange order list
            use_dark (bool): Whether to use DARK in post processing
"""
super(TopDownHRNet, self).__init__()
self.backbone = backbone
self.post_process = HRNetPostProcess(use_dark)
self.loss = loss
self.flip_perm = flip_perm
self.flip = flip
self.final_conv = Conv2d(width, num_joints, 1, 1, 0, bias=True)
self.shift_heatmap = shift_heatmap
self.deploy = False
@classmethod
def from_config(cls, cfg, *args, **kwargs):
# backbone
backbone = create(cfg['backbone'])
return {'backbone': backbone, }
def _forward(self):
output = dict()
feats = self.backbone(self.inputs)
output["feats"] = feats
hrnet_outputs = self.final_conv(feats[0])
output["output"] = hrnet_outputs
if self.training:
loss = self.loss(hrnet_outputs, self.inputs)
output["loss"] = loss
return output
elif self.deploy:
outshape = hrnet_outputs.shape
max_idx = paddle.argmax(
hrnet_outputs.reshape(
(outshape[0], outshape[1], outshape[2] * outshape[3])),
axis=-1)
return hrnet_outputs, max_idx
else:
if self.flip:
self.inputs['image'] = self.inputs['image'].flip([3])
feats = self.backbone(self.inputs)
output_flipped = self.final_conv(feats[0])
output_flipped = self.flip_back(output_flipped.numpy(),
self.flip_perm)
output_flipped = paddle.to_tensor(output_flipped.copy())
if self.shift_heatmap:
output_flipped[:, :, :, 1:] = output_flipped.clone(
)[:, :, :, 0:-1]
hrnet_outputs = (hrnet_outputs + output_flipped) * 0.5
imshape = (self.inputs['im_shape'].numpy()
)[:, ::-1] if 'im_shape' in self.inputs else None
center = self.inputs['center'].numpy(
) if 'center' in self.inputs else np.round(imshape / 2.)
scale = self.inputs['scale'].numpy(
) if 'scale' in self.inputs else imshape / 200.
outputs = self.post_process(hrnet_outputs, center, scale)
return outputs
def get_loss(self):
return self._forward()
def get_pred(self):
res_lst = self._forward()
outputs = {'keypoint': res_lst}
return outputs
def flip_back(self, output_flipped, matched_parts):
assert output_flipped.ndim == 4,\
'output_flipped should be [batch_size, num_joints, height, width]'
output_flipped = output_flipped[:, :, :, ::-1]
for pair in matched_parts:
tmp = output_flipped[:, pair[0], :, :].copy()
output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :]
output_flipped[:, pair[1], :, :] = tmp
return output_flipped
class HRNetPostProcess(object):
def __init__(self, use_dark=True):
self.use_dark = use_dark
def get_max_preds(self, heatmaps):
'''get predictions from score maps
Args:
heatmaps: numpy.ndarray([batch_size, num_joints, height, width])
Returns:
preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
maxvals: numpy.ndarray([batch_size, num_joints, 2]), the maximum confidence of the keypoints
'''
assert isinstance(heatmaps,
np.ndarray), 'heatmaps should be numpy.ndarray'
assert heatmaps.ndim == 4, 'batch_images should be 4-ndim'
batch_size = heatmaps.shape[0]
num_joints = heatmaps.shape[1]
width = heatmaps.shape[3]
heatmaps_reshaped = heatmaps.reshape((batch_size, num_joints, -1))
idx = np.argmax(heatmaps_reshaped, 2)
maxvals = np.amax(heatmaps_reshaped, 2)
maxvals = maxvals.reshape((batch_size, num_joints, 1))
idx = idx.reshape((batch_size, num_joints, 1))
preds = np.tile(idx, (1, 1, 2)).astype(np.float32)
preds[:, :, 0] = (preds[:, :, 0]) % width
preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)
pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
pred_mask = pred_mask.astype(np.float32)
preds *= pred_mask
return preds, maxvals
def gaussian_blur(self, heatmap, kernel):
border = (kernel - 1) // 2
batch_size = heatmap.shape[0]
num_joints = heatmap.shape[1]
height = heatmap.shape[2]
width = heatmap.shape[3]
for i in range(batch_size):
for j in range(num_joints):
origin_max = np.max(heatmap[i, j])
dr = np.zeros((height + 2 * border, width + 2 * border))
dr[border:-border, border:-border] = heatmap[i, j].copy()
dr = cv2.GaussianBlur(dr, (kernel, kernel), 0)
heatmap[i, j] = dr[border:-border, border:-border].copy()
heatmap[i, j] *= origin_max / np.max(heatmap[i, j])
return heatmap
def dark_parse(self, hm, coord):
heatmap_height = hm.shape[0]
heatmap_width = hm.shape[1]
px = int(coord[0])
py = int(coord[1])
if 1 < px < heatmap_width - 2 and 1 < py < heatmap_height - 2:
dx = 0.5 * (hm[py][px + 1] - hm[py][px - 1])
dy = 0.5 * (hm[py + 1][px] - hm[py - 1][px])
dxx = 0.25 * (hm[py][px + 2] - 2 * hm[py][px] + hm[py][px - 2])
dxy = 0.25 * (hm[py+1][px+1] - hm[py-1][px+1] - hm[py+1][px-1] \
+ hm[py-1][px-1])
dyy = 0.25 * (
hm[py + 2 * 1][px] - 2 * hm[py][px] + hm[py - 2 * 1][px])
derivative = np.matrix([[dx], [dy]])
hessian = np.matrix([[dxx, dxy], [dxy, dyy]])
if dxx * dyy - dxy**2 != 0:
hessianinv = hessian.I
offset = -hessianinv * derivative
offset = np.squeeze(np.array(offset.T), axis=0)
coord += offset
return coord
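    # The offset computed above is one Newton step on the (log-)heatmap:
    # with gradient D = [dx, dy] and Hessian H = [[dxx, dxy], [dxy, dyy]]
    # estimated by finite differences, the refined coordinate is
    # coord - H^{-1} D, following the DARK Taylor-expansion derivation.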
def dark_postprocess(self, hm, coords, kernelsize):
        '''DARK postprocessing, from Zhang et al., "Distribution-Aware
        Coordinate Representation for Human Pose Estimation" (CVPR 2020).
        '''
hm = self.gaussian_blur(hm, kernelsize)
hm = np.maximum(hm, 1e-10)
hm = np.log(hm)
for n in range(coords.shape[0]):
for p in range(coords.shape[1]):
coords[n, p] = self.dark_parse(hm[n][p], coords[n][p])
return coords
def get_final_preds(self, heatmaps, center, scale, kernelsize=3):
"""the highest heatvalue location with a quarter offset in the
direction from the highest response to the second highest response.
Args:
heatmaps (numpy.ndarray): The predicted heatmaps
center (numpy.ndarray): The boxes center
scale (numpy.ndarray): The scale factor
Returns:
preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints
"""
coords, maxvals = self.get_max_preds(heatmaps)
heatmap_height = heatmaps.shape[2]
heatmap_width = heatmaps.shape[3]
if self.use_dark:
coords = self.dark_postprocess(heatmaps, coords, kernelsize)
else:
for n in range(coords.shape[0]):
for p in range(coords.shape[1]):
hm = heatmaps[n][p]
px = int(math.floor(coords[n][p][0] + 0.5))
py = int(math.floor(coords[n][p][1] + 0.5))
if 1 < px < heatmap_width - 1 and 1 < py < heatmap_height - 1:
diff = np.array([
hm[py][px + 1] - hm[py][px - 1],
hm[py + 1][px] - hm[py - 1][px]
])
coords[n][p] += np.sign(diff) * .25
preds = coords.copy()
# Transform back
for i in range(coords.shape[0]):
preds[i] = transform_preds(coords[i], center[i], scale[i],
[heatmap_width, heatmap_height])
return preds, maxvals
def __call__(self, output, center, scale):
preds, maxvals = self.get_final_preds(output.numpy(), center, scale)
outputs = [[
np.concatenate(
(preds, maxvals), axis=-1), np.mean(
maxvals, axis=1)
]]
return outputs
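# Illustrative sketch (not part of the original file): get_max_preds picks
# the argmax location per joint heatmap; a single hot pixel at (x=3, y=2)
# comes back as coordinates [3., 2.] with its confidence value.
def _example_get_max_preds():
    heatmaps = np.zeros((1, 1, 4, 5), dtype=np.float32)
    heatmaps[0, 0, 2, 3] = 1.0
    post = HRNetPostProcess(use_dark=False)
    preds, maxvals = post.get_max_preds(heatmaps)
    return preds, maxvals  # preds[0, 0] == [3., 2.], maxvals[0, 0] == [1.]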
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from numbers import Integral
from paddle import ParamAttr
from paddle.regularizer import L2Decay
from paddle.nn.initializer import Normal, Constant
from lib.utils.workspace import register
from .hrnet import ShapeSpec
__all__ = ['LiteHRNet']
def channel_shuffle(x, groups):
batch_size, num_channels, height, width = x.shape[0:4]
assert num_channels % groups == 0, 'num_channels should be divisible by groups'
channels_per_group = num_channels // groups
x = paddle.reshape(
x=x, shape=[batch_size, groups, channels_per_group, height, width])
x = paddle.transpose(x=x, perm=[0, 2, 1, 3, 4])
x = paddle.reshape(x=x, shape=[batch_size, num_channels, height, width])
return x
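# Shape-level sketch (illustration only): shuffling with groups=2 keeps the
# tensor shape and interleaves the two channel halves, so information can
# flow across branches after a grouped convolution.
def _example_channel_shuffle():
    x = paddle.arange(8, dtype='float32').reshape([1, 8, 1, 1])
    y = channel_shuffle(x, groups=2)
    # channel order becomes [0, 4, 1, 5, 2, 6, 3, 7]
    return y.reshape([8])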
class ConvNormLayer(nn.Layer):
def __init__(self,
ch_in,
ch_out,
filter_size,
stride=1,
groups=1,
norm_type=None,
norm_groups=32,
norm_decay=0.,
freeze_norm=False,
act=None):
super(ConvNormLayer, self).__init__()
self.act = act
norm_lr = 0. if freeze_norm else 1.
if norm_type is not None:
assert norm_type in ['bn', 'sync_bn', 'gn'],\
"norm_type should be one of ['bn', 'sync_bn', 'gn'], but got {}".format(norm_type)
param_attr = ParamAttr(
initializer=Constant(1.0),
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay), )
bias_attr = ParamAttr(
learning_rate=norm_lr, regularizer=L2Decay(norm_decay))
global_stats = True if freeze_norm else False
if norm_type in ['bn', 'sync_bn']:
self.norm = nn.BatchNorm(
ch_out,
param_attr=param_attr,
bias_attr=bias_attr,
use_global_stats=global_stats, )
elif norm_type == 'gn':
self.norm = nn.GroupNorm(
num_groups=norm_groups,
num_channels=ch_out,
weight_attr=param_attr,
bias_attr=bias_attr)
norm_params = self.norm.parameters()
if freeze_norm:
for param in norm_params:
param.stop_gradient = True
conv_bias_attr = False
else:
conv_bias_attr = True
self.norm = None
self.conv = nn.Conv2D(
in_channels=ch_in,
out_channels=ch_out,
kernel_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups,
weight_attr=ParamAttr(initializer=Normal(
mean=0., std=0.001)),
bias_attr=conv_bias_attr)
def forward(self, inputs):
out = self.conv(inputs)
if self.norm is not None:
out = self.norm(out)
if self.act == 'relu':
out = F.relu(out)
elif self.act == 'sigmoid':
out = F.sigmoid(out)
return out
class DepthWiseSeparableConvNormLayer(nn.Layer):
def __init__(self,
ch_in,
ch_out,
filter_size,
stride=1,
dw_norm_type=None,
pw_norm_type=None,
norm_decay=0.,
freeze_norm=False,
dw_act=None,
pw_act=None):
super(DepthWiseSeparableConvNormLayer, self).__init__()
self.depthwise_conv = ConvNormLayer(
ch_in=ch_in,
ch_out=ch_in,
filter_size=filter_size,
stride=stride,
groups=ch_in,
norm_type=dw_norm_type,
act=dw_act,
norm_decay=norm_decay,
freeze_norm=freeze_norm, )
self.pointwise_conv = ConvNormLayer(
ch_in=ch_in,
ch_out=ch_out,
filter_size=1,
stride=1,
norm_type=pw_norm_type,
act=pw_act,
norm_decay=norm_decay,
freeze_norm=freeze_norm, )
def forward(self, x):
x = self.depthwise_conv(x)
x = self.pointwise_conv(x)
return x
class CrossResolutionWeightingModule(nn.Layer):
def __init__(self,
channels,
ratio=16,
norm_type='bn',
freeze_norm=False,
norm_decay=0.):
super(CrossResolutionWeightingModule, self).__init__()
self.channels = channels
total_channel = sum(channels)
self.conv1 = ConvNormLayer(
ch_in=total_channel,
ch_out=total_channel // ratio,
filter_size=1,
stride=1,
norm_type=norm_type,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay)
self.conv2 = ConvNormLayer(
ch_in=total_channel // ratio,
ch_out=total_channel,
filter_size=1,
stride=1,
norm_type=norm_type,
act='sigmoid',
freeze_norm=freeze_norm,
norm_decay=norm_decay)
def forward(self, x):
mini_size = x[-1].shape[-2:]
out = [F.adaptive_avg_pool2d(s, mini_size) for s in x[:-1]] + [x[-1]]
out = paddle.concat(out, 1)
out = self.conv1(out)
out = self.conv2(out)
out = paddle.split(out, self.channels, 1)
out = [
s * F.interpolate(
a, s.shape[-2:], mode='nearest') for s, a in zip(x, out)
]
return out
class SpatialWeightingModule(nn.Layer):
def __init__(self, in_channel, ratio=16, freeze_norm=False, norm_decay=0.):
super(SpatialWeightingModule, self).__init__()
self.global_avgpooling = nn.AdaptiveAvgPool2D(1)
self.conv1 = ConvNormLayer(
ch_in=in_channel,
ch_out=in_channel // ratio,
filter_size=1,
stride=1,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay)
self.conv2 = ConvNormLayer(
ch_in=in_channel // ratio,
ch_out=in_channel,
filter_size=1,
stride=1,
act='sigmoid',
freeze_norm=freeze_norm,
norm_decay=norm_decay)
def forward(self, x):
out = self.global_avgpooling(x)
out = self.conv1(out)
out = self.conv2(out)
return x * out
class ConditionalChannelWeightingBlock(nn.Layer):
def __init__(self,
in_channels,
stride,
reduce_ratio,
norm_type='bn',
freeze_norm=False,
norm_decay=0.):
super(ConditionalChannelWeightingBlock, self).__init__()
assert stride in [1, 2]
branch_channels = [channel // 2 for channel in in_channels]
self.cross_resolution_weighting = CrossResolutionWeightingModule(
branch_channels,
ratio=reduce_ratio,
norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay)
self.depthwise_convs = nn.LayerList([
ConvNormLayer(
channel,
channel,
filter_size=3,
stride=stride,
groups=channel,
norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay) for channel in branch_channels
])
self.spatial_weighting = nn.LayerList([
SpatialWeightingModule(
channel,
ratio=4,
freeze_norm=freeze_norm,
norm_decay=norm_decay) for channel in branch_channels
])
def forward(self, x):
x = [s.chunk(2, axis=1) for s in x]
x1 = [s[0] for s in x]
x2 = [s[1] for s in x]
x2 = self.cross_resolution_weighting(x2)
x2 = [dw(s) for s, dw in zip(x2, self.depthwise_convs)]
x2 = [sw(s) for s, sw in zip(x2, self.spatial_weighting)]
out = [paddle.concat([s1, s2], axis=1) for s1, s2 in zip(x1, x2)]
out = [channel_shuffle(s, groups=2) for s in out]
return out
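# Shape sketch (illustration only, not part of the original file): the
# conditional channel weighting block operates on a list of multi-resolution
# tensors and preserves their shapes when stride=1; the channel counts below
# are arbitrary example values.
def _example_ccw_block():
    block = ConditionalChannelWeightingBlock(
        in_channels=[40, 80], stride=1, reduce_ratio=8)
    x = [paddle.rand([1, 40, 32, 24]), paddle.rand([1, 80, 16, 12])]
    y = block(x)
    return [t.shape for t in y]  # [[1, 40, 32, 24], [1, 80, 16, 12]]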
class ShuffleUnit(nn.Layer):
def __init__(self,
in_channel,
out_channel,
stride,
norm_type='bn',
freeze_norm=False,
norm_decay=0.):
super(ShuffleUnit, self).__init__()
branch_channel = out_channel // 2
self.stride = stride
if self.stride == 1:
assert in_channel == branch_channel * 2,\
"when stride=1, in_channel {} should equal to branch_channel*2 {}".format(in_channel, branch_channel * 2)
if stride > 1:
self.branch1 = nn.Sequential(
ConvNormLayer(
ch_in=in_channel,
ch_out=in_channel,
filter_size=3,
stride=self.stride,
groups=in_channel,
norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay),
ConvNormLayer(
ch_in=in_channel,
ch_out=branch_channel,
filter_size=1,
stride=1,
norm_type=norm_type,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay), )
self.branch2 = nn.Sequential(
ConvNormLayer(
ch_in=branch_channel if stride == 1 else in_channel,
ch_out=branch_channel,
filter_size=1,
stride=1,
norm_type=norm_type,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay),
ConvNormLayer(
ch_in=branch_channel,
ch_out=branch_channel,
filter_size=3,
stride=self.stride,
groups=branch_channel,
norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay),
ConvNormLayer(
ch_in=branch_channel,
ch_out=branch_channel,
filter_size=1,
stride=1,
norm_type=norm_type,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay), )
def forward(self, x):
if self.stride > 1:
x1 = self.branch1(x)
x2 = self.branch2(x)
else:
x1, x2 = x.chunk(2, axis=1)
x2 = self.branch2(x2)
out = paddle.concat([x1, x2], axis=1)
out = channel_shuffle(out, groups=2)
return out
class IterativeHead(nn.Layer):
def __init__(self,
in_channels,
norm_type='bn',
freeze_norm=False,
norm_decay=0.):
super(IterativeHead, self).__init__()
num_branches = len(in_channels)
self.in_channels = in_channels[::-1]
projects = []
for i in range(num_branches):
if i != num_branches - 1:
projects.append(
DepthWiseSeparableConvNormLayer(
ch_in=self.in_channels[i],
ch_out=self.in_channels[i + 1],
filter_size=3,
stride=1,
dw_act=None,
pw_act='relu',
dw_norm_type=norm_type,
pw_norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay))
else:
projects.append(
DepthWiseSeparableConvNormLayer(
ch_in=self.in_channels[i],
ch_out=self.in_channels[i],
filter_size=3,
stride=1,
dw_act=None,
pw_act='relu',
dw_norm_type=norm_type,
pw_norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay))
self.projects = nn.LayerList(projects)
def forward(self, x):
x = x[::-1]
y = []
last_x = None
for i, s in enumerate(x):
if last_x is not None:
last_x = F.interpolate(
last_x,
size=s.shape[-2:],
mode='bilinear',
align_corners=True)
s = s + last_x
s = self.projects[i](s)
y.append(s)
last_x = s
return y[::-1]
class Stem(nn.Layer):
def __init__(self,
in_channel,
stem_channel,
out_channel,
expand_ratio,
norm_type='bn',
freeze_norm=False,
norm_decay=0.):
super(Stem, self).__init__()
self.conv1 = ConvNormLayer(
in_channel,
stem_channel,
filter_size=3,
stride=2,
norm_type=norm_type,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay)
mid_channel = int(round(stem_channel * expand_ratio))
branch_channel = stem_channel // 2
if stem_channel == out_channel:
inc_channel = out_channel - branch_channel
else:
inc_channel = out_channel - stem_channel
self.branch1 = nn.Sequential(
ConvNormLayer(
ch_in=branch_channel,
ch_out=branch_channel,
filter_size=3,
stride=2,
groups=branch_channel,
norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay),
ConvNormLayer(
ch_in=branch_channel,
ch_out=inc_channel,
filter_size=1,
stride=1,
norm_type=norm_type,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay), )
self.expand_conv = ConvNormLayer(
ch_in=branch_channel,
ch_out=mid_channel,
filter_size=1,
stride=1,
norm_type=norm_type,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay)
self.depthwise_conv = ConvNormLayer(
ch_in=mid_channel,
ch_out=mid_channel,
filter_size=3,
stride=2,
groups=mid_channel,
norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay)
self.linear_conv = ConvNormLayer(
ch_in=mid_channel,
ch_out=branch_channel
if stem_channel == out_channel else stem_channel,
filter_size=1,
stride=1,
norm_type=norm_type,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay)
def forward(self, x):
x = self.conv1(x)
x1, x2 = x.chunk(2, axis=1)
x1 = self.branch1(x1)
x2 = self.expand_conv(x2)
x2 = self.depthwise_conv(x2)
x2 = self.linear_conv(x2)
out = paddle.concat([x1, x2], axis=1)
out = channel_shuffle(out, groups=2)
return out
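# Shape sketch (illustration only, not part of the original file): the
# Lite-HRNet stem downsamples by 4x overall (two stride-2 convolutions);
# the channel settings below are arbitrary example values.
def _example_stem():
    stem = Stem(in_channel=3, stem_channel=32, out_channel=32, expand_ratio=1)
    y = stem(paddle.rand([1, 3, 64, 64]))
    return y.shape  # [1, 32, 16, 16]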
class LiteHRNetModule(nn.Layer):
def __init__(self,
num_branches,
num_blocks,
in_channels,
reduce_ratio,
module_type,
multiscale_output=False,
with_fuse=True,
norm_type='bn',
freeze_norm=False,
norm_decay=0.):
super(LiteHRNetModule, self).__init__()
assert num_branches == len(in_channels),\
"num_branches {} should equal to num_in_channels {}".format(num_branches, len(in_channels))
assert module_type in ['LITE', 'NAIVE'],\
"module_type should be one of ['LITE', 'NAIVE']"
self.num_branches = num_branches
self.in_channels = in_channels
self.multiscale_output = multiscale_output
self.with_fuse = with_fuse
        # note: the norm type is fixed to 'bn' here regardless of the
        # `norm_type` argument
        self.norm_type = 'bn'
self.module_type = module_type
if self.module_type == 'LITE':
self.layers = self._make_weighting_blocks(
num_blocks,
reduce_ratio,
freeze_norm=freeze_norm,
norm_decay=norm_decay)
elif self.module_type == 'NAIVE':
self.layers = self._make_naive_branches(
num_branches,
num_blocks,
freeze_norm=freeze_norm,
norm_decay=norm_decay)
if self.with_fuse:
self.fuse_layers = self._make_fuse_layers(
freeze_norm=freeze_norm, norm_decay=norm_decay)
self.relu = nn.ReLU()
def _make_weighting_blocks(self,
num_blocks,
reduce_ratio,
stride=1,
freeze_norm=False,
norm_decay=0.):
layers = []
for i in range(num_blocks):
layers.append(
ConditionalChannelWeightingBlock(
self.in_channels,
stride=stride,
reduce_ratio=reduce_ratio,
norm_type=self.norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay))
return nn.Sequential(*layers)
def _make_naive_branches(self,
num_branches,
num_blocks,
freeze_norm=False,
norm_decay=0.):
branches = []
for branch_idx in range(num_branches):
layers = []
for i in range(num_blocks):
layers.append(
ShuffleUnit(
self.in_channels[branch_idx],
self.in_channels[branch_idx],
stride=1,
norm_type=self.norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay))
branches.append(nn.Sequential(*layers))
return nn.LayerList(branches)
def _make_fuse_layers(self, freeze_norm=False, norm_decay=0.):
if self.num_branches == 1:
return None
fuse_layers = []
num_out_branches = self.num_branches if self.multiscale_output else 1
for i in range(num_out_branches):
fuse_layer = []
for j in range(self.num_branches):
if j > i:
fuse_layer.append(
nn.Sequential(
nn.Conv2D(
self.in_channels[j],
self.in_channels[i],
kernel_size=1,
stride=1,
padding=0,
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(self.in_channels[i]),
nn.Upsample(
scale_factor=2**(j - i), mode='nearest')))
elif j == i:
fuse_layer.append(None)
else:
conv_downsamples = []
for k in range(i - j):
if k == i - j - 1:
conv_downsamples.append(
nn.Sequential(
nn.Conv2D(
self.in_channels[j],
self.in_channels[j],
kernel_size=3,
stride=2,
padding=1,
groups=self.in_channels[j],
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(self.in_channels[j]),
nn.Conv2D(
self.in_channels[j],
self.in_channels[i],
kernel_size=1,
stride=1,
padding=0,
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(self.in_channels[i])))
else:
conv_downsamples.append(
nn.Sequential(
nn.Conv2D(
self.in_channels[j],
self.in_channels[j],
kernel_size=3,
stride=2,
padding=1,
groups=self.in_channels[j],
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(self.in_channels[j]),
nn.Conv2D(
self.in_channels[j],
self.in_channels[j],
kernel_size=1,
stride=1,
padding=0,
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(self.in_channels[j]),
nn.ReLU()))
fuse_layer.append(nn.Sequential(*conv_downsamples))
fuse_layers.append(nn.LayerList(fuse_layer))
return nn.LayerList(fuse_layers)
def forward(self, x):
if self.num_branches == 1:
return [self.layers[0](x[0])]
if self.module_type == 'LITE':
out = self.layers(x)
elif self.module_type == 'NAIVE':
for i in range(self.num_branches):
x[i] = self.layers[i](x[i])
out = x
if self.with_fuse:
out_fuse = []
for i in range(len(self.fuse_layers)):
                # start from branch 0, fused to branch i's resolution when i > 0
                y = out[0] if i == 0 else self.fuse_layers[i][0](out[0])
                for j in range(self.num_branches):
                    if j == 0:
                        # note: branch 0 is accumulated twice here (y += y),
                        # following the reference implementation
                        y += y
elif i == j:
y += out[j]
else:
y += self.fuse_layers[i][j](out[j])
if i == 0:
out[i] = y
out_fuse.append(self.relu(y))
out = out_fuse
elif not self.multiscale_output:
out = [out[0]]
return out
@register
class LiteHRNet(nn.Layer):
"""
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},year={2021}
}
Args:
network_type (str): the network_type should be one of ["lite_18", "lite_30", "naive", "wider_naive"],
"naive": Simply combining the shuffle block in ShuffleNet and the highresolution design pattern in HRNet.
"wider_naive": Naive network with wider channels in each block.
"lite_18": Lite-HRNet-18, which replaces the pointwise convolution in a shuffle block by conditional channel weighting.
"lite_30": Lite-HRNet-30, with more blocks compared with Lite-HRNet-18.
freeze_at (int): the stage to freeze
freeze_norm (bool): whether to freeze norm in HRNet
norm_decay (float): weight decay for normalization layer weights
return_idx (List): the stage to return
"""
def __init__(self,
network_type,
freeze_at=0,
freeze_norm=True,
norm_decay=0.,
return_idx=[0, 1, 2, 3]):
super(LiteHRNet, self).__init__()
if isinstance(return_idx, Integral):
return_idx = [return_idx]
assert network_type in ["lite_18", "lite_30", "naive", "wider_naive"],\
"the network_type should be one of [lite_18, lite_30, naive, wider_naive]"
assert len(return_idx) > 0, "need one or more return index"
self.freeze_at = freeze_at
self.freeze_norm = freeze_norm
self.norm_decay = norm_decay
self.return_idx = return_idx
self.norm_type = 'bn'
self.module_configs = {
"lite_18": {
"num_modules": [2, 4, 2],
"num_branches": [2, 3, 4],
"num_blocks": [2, 2, 2],
"module_type": ["LITE", "LITE", "LITE"],
"reduce_ratios": [8, 8, 8],
"num_channels": [[40, 80], [40, 80, 160], [40, 80, 160, 320]],
},
"lite_30": {
"num_modules": [3, 8, 3],
"num_branches": [2, 3, 4],
"num_blocks": [2, 2, 2],
"module_type": ["LITE", "LITE", "LITE"],
"reduce_ratios": [8, 8, 8],
"num_channels": [[40, 80], [40, 80, 160], [40, 80, 160, 320]],
},
"naive": {
"num_modules": [2, 4, 2],
"num_branches": [2, 3, 4],
"num_blocks": [2, 2, 2],
"module_type": ["NAIVE", "NAIVE", "NAIVE"],
"reduce_ratios": [1, 1, 1],
"num_channels": [[30, 60], [30, 60, 120], [30, 60, 120, 240]],
},
"wider_naive": {
"num_modules": [2, 4, 2],
"num_branches": [2, 3, 4],
"num_blocks": [2, 2, 2],
"module_type": ["NAIVE", "NAIVE", "NAIVE"],
"reduce_ratios": [1, 1, 1],
"num_channels": [[40, 80], [40, 80, 160], [40, 80, 160, 320]],
},
}
self.stages_config = self.module_configs[network_type]
self.stem = Stem(3, 32, 32, 1)
num_channels_pre_layer = [32]
for stage_idx in range(3):
num_channels = self.stages_config["num_channels"][stage_idx]
setattr(self, 'transition{}'.format(stage_idx),
self._make_transition_layer(num_channels_pre_layer,
num_channels, self.freeze_norm,
self.norm_decay))
stage, num_channels_pre_layer = self._make_stage(
self.stages_config, stage_idx, num_channels, True,
self.freeze_norm, self.norm_decay)
setattr(self, 'stage{}'.format(stage_idx), stage)
self.head_layer = IterativeHead(num_channels_pre_layer, 'bn',
self.freeze_norm, self.norm_decay)
def _make_transition_layer(self,
num_channels_pre_layer,
num_channels_cur_layer,
freeze_norm=False,
norm_decay=0.):
num_branches_pre = len(num_channels_pre_layer)
num_branches_cur = len(num_channels_cur_layer)
transition_layers = []
for i in range(num_branches_cur):
if i < num_branches_pre:
if num_channels_cur_layer[i] != num_channels_pre_layer[i]:
transition_layers.append(
nn.Sequential(
nn.Conv2D(
num_channels_pre_layer[i],
num_channels_pre_layer[i],
kernel_size=3,
stride=1,
padding=1,
groups=num_channels_pre_layer[i],
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(num_channels_pre_layer[i]),
nn.Conv2D(
num_channels_pre_layer[i],
num_channels_cur_layer[i],
kernel_size=1,
stride=1,
padding=0,
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(num_channels_cur_layer[i]),
nn.ReLU()))
else:
transition_layers.append(None)
else:
conv_downsamples = []
for j in range(i + 1 - num_branches_pre):
conv_downsamples.append(
nn.Sequential(
nn.Conv2D(
num_channels_pre_layer[-1],
num_channels_pre_layer[-1],
groups=num_channels_pre_layer[-1],
kernel_size=3,
stride=2,
padding=1,
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(num_channels_pre_layer[-1]),
nn.Conv2D(
num_channels_pre_layer[-1],
num_channels_cur_layer[i]
if j == i - num_branches_pre else
num_channels_pre_layer[-1],
kernel_size=1,
stride=1,
padding=0,
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(num_channels_cur_layer[i]
if j == i - num_branches_pre else
num_channels_pre_layer[-1]),
nn.ReLU()))
transition_layers.append(nn.Sequential(*conv_downsamples))
return nn.LayerList(transition_layers)
def _make_stage(self,
stages_config,
stage_idx,
in_channels,
multiscale_output,
freeze_norm=False,
norm_decay=0.):
num_modules = stages_config["num_modules"][stage_idx]
num_branches = stages_config["num_branches"][stage_idx]
num_blocks = stages_config["num_blocks"][stage_idx]
reduce_ratio = stages_config['reduce_ratios'][stage_idx]
module_type = stages_config['module_type'][stage_idx]
modules = []
for i in range(num_modules):
if not multiscale_output and i == num_modules - 1:
reset_multiscale_output = False
else:
reset_multiscale_output = True
modules.append(
LiteHRNetModule(
num_branches,
num_blocks,
in_channels,
reduce_ratio,
module_type,
multiscale_output=reset_multiscale_output,
with_fuse=True,
freeze_norm=freeze_norm,
norm_decay=norm_decay))
in_channels = modules[-1].in_channels
return nn.Sequential(*modules), in_channels
def forward(self, inputs):
x = inputs['image']
x = self.stem(x)
y_list = [x]
for stage_idx in range(3):
x_list = []
transition = getattr(self, 'transition{}'.format(stage_idx))
for j in range(self.stages_config["num_branches"][stage_idx]):
if transition[j] is not None:
if j >= len(y_list):
x_list.append(transition[j](y_list[-1]))
else:
x_list.append(transition[j](y_list[j]))
else:
x_list.append(y_list[j])
y_list = getattr(self, 'stage{}'.format(stage_idx))(x_list)
x = self.head_layer(y_list)
res = []
for i, layer in enumerate(x):
if i == self.freeze_at:
layer.stop_gradient = True
if i in self.return_idx:
res.append(layer)
return res
def out_shape(self):
return [
ShapeSpec(
channels=self._out_channels[i], stride=self._out_strides[i])
for i in self.return_idx
]
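# ---------------------------------------------------------------------------
# Usage sketch (illustrative only, not part of the library): build a
# Lite-HRNet backbone and run a forward pass on a random image. The backbone
# expects a dict with an 'image' key, as in forward() above; the shapes and
# arguments below are arbitrary examples.
# ---------------------------------------------------------------------------
if __name__ == '__main__':
    model = LiteHRNet(network_type='lite_18', freeze_at=0, return_idx=[0])
    feats = model({'image': paddle.rand([1, 3, 256, 192])})
    print([f.shape for f in feats])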
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from itertools import cycle, islice
from collections import abc
import paddle
import paddle.nn as nn
from lib.utils.workspace import register, serializable
__all__ = ['KeyPointMSELoss']
@register
@serializable
class KeyPointMSELoss(nn.Layer):
def __init__(self, use_target_weight=True, loss_scale=0.5):
"""
KeyPointMSELoss layer
Args:
use_target_weight (bool): whether to use target weight
"""
super(KeyPointMSELoss, self).__init__()
self.criterion = nn.MSELoss(reduction='mean')
self.use_target_weight = use_target_weight
self.loss_scale = loss_scale
def forward(self, output, records):
target = records['target']
target_weight = records['target_weight']
batch_size = output.shape[0]
num_joints = output.shape[1]
heatmaps_pred = output.reshape(
(batch_size, num_joints, -1)).split(num_joints, 1)
heatmaps_gt = target.reshape(
(batch_size, num_joints, -1)).split(num_joints, 1)
loss = 0
for idx in range(num_joints):
heatmap_pred = heatmaps_pred[idx].squeeze()
heatmap_gt = heatmaps_gt[idx].squeeze()
if self.use_target_weight:
loss += self.loss_scale * self.criterion(
heatmap_pred.multiply(target_weight[:, idx]),
heatmap_gt.multiply(target_weight[:, idx]))
else:
loss += self.loss_scale * self.criterion(heatmap_pred,
heatmap_gt)
loss = loss / num_joints
return loss
@register
@serializable
class DistMSELoss(nn.Layer):
def __init__(self,
use_target_weight=True,
loss_scale=0.5,
key=None,
                 weight=1.0):
        """
        MSE distillation loss between student and teacher heatmaps.
        Args:
            use_target_weight (bool): whether to use target weight
            loss_scale (float): scale factor applied to each joint's MSE loss
            key (str): if set, index into the student/teacher output dicts
            weight (float): global weight of this loss term
        """
        super().__init__()
self.criterion = nn.MSELoss(reduction='mean')
self.use_target_weight = use_target_weight
self.loss_scale = loss_scale
self.key = key
self.weight = weight
def forward(self, student_out, teacher_out, records):
if self.key is not None:
student_out = student_out[self.key]
teacher_out = teacher_out[self.key]
target_weight = records['target_weight']
batch_size = student_out.shape[0]
num_joints = student_out.shape[1]
heatmaps_pred = student_out.reshape(
(batch_size, num_joints, -1)).split(num_joints, 1)
heatmaps_gt = teacher_out.reshape(
(batch_size, num_joints, -1)).split(num_joints, 1)
loss = 0
for idx in range(num_joints):
heatmap_pred = heatmaps_pred[idx].squeeze()
heatmap_gt = heatmaps_gt[idx].squeeze()
if self.use_target_weight:
loss += self.loss_scale * self.criterion(
heatmap_pred.multiply(target_weight[:, idx]),
heatmap_gt.multiply(target_weight[:, idx]))
else:
loss += self.loss_scale * self.criterion(heatmap_pred,
heatmap_gt)
loss = loss / num_joints * self.weight
return loss
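# ---------------------------------------------------------------------------
# Usage sketch (illustrative only): compute the keypoint MSE loss for a
# batch of 2 images with 17 joints and 64x48 heatmaps; all values are random
# and the shapes are assumptions for the demo.
# ---------------------------------------------------------------------------
if __name__ == '__main__':
    output = paddle.rand([2, 17, 64, 48])
    records = {
        'target': paddle.rand([2, 17, 64, 48]),
        'target_weight': paddle.ones([2, 17, 1]),
    }
    loss = KeyPointMSELoss(use_target_weight=True)(output, records)
    print(float(loss))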
from . import quant
from .quant import *
import yaml
from lib.utils.workspace import load_config, create
from lib.utils.checkpoint import load_pretrain_weight
def build_slim_model(cfg, mode='train'):
assert cfg.slim == 'QAT', 'Only QAT is supported now'
model = create(cfg.architecture)
if mode == 'train':
load_pretrain_weight(model, cfg.pretrain_weights)
slim = create(cfg.slim)
cfg['slim_type'] = cfg.slim
# TODO: fix quant export model in framework.
if mode == 'test' and cfg.slim == 'QAT':
slim.quant_config['activation_preprocess_type'] = None
cfg['model'] = slim(model)
cfg['slim'] = slim
if mode != 'train':
load_pretrain_weight(cfg['model'], cfg.weights)
return cfg
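# Usage sketch (illustrative; the config path and fields are hypothetical:
# the YAML must define `architecture`, `slim: QAT`, `pretrain_weights` and,
# for eval/export, `weights`):
#   cfg = load_config('configs/lite_hrnet_qat.yml')
#   cfg = build_slim_model(cfg, mode='train')
#   model = cfg['model']   # the QAT-wrapped model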
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle.utils import try_import
from lib.utils.workspace import register, serializable
from lib.utils.logger import setup_logger
logger = setup_logger(__name__)
@register
@serializable
class QAT(object):
def __init__(self, quant_config, print_model):
super(QAT, self).__init__()
self.quant_config = quant_config
self.print_model = print_model
def __call__(self, model):
paddleslim = try_import('paddleslim')
self.quanter = paddleslim.dygraph.quant.QAT(config=self.quant_config)
if self.print_model:
logger.info("Model before quant:")
logger.info(model)
self.quanter.quantize(model)
if self.print_model:
logger.info("Quantized model:")
logger.info(model)
return model
def save_quantized_model(self, layer, path, input_spec=None, **config):
self.quanter.save_quantized_model(
model=layer, path=path, input_spec=input_spec, **config)
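# Usage sketch (illustrative; the quant_config keys follow PaddleSlim's
# dygraph QAT options and are assumptions here, not fixed by this repo):
#   quanter = QAT(
#       quant_config={
#           'weight_quantize_type': 'channel_wise_abs_max',
#           'activation_quantize_type': 'moving_average_abs_max',
#           'weight_bits': 8,
#           'activation_bits': 8,
#       },
#       print_model=False)
#   model = quanter(model)   # quantizes in place and returns the model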
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import check
from . import checkpoint
from . import cli
from . import download
from . import env
from . import logger
from . import stats
from . import visualizer
from . import workspace
from . import config
from . import keypoint_utils
from .workspace import *
from .visualizer import *
from .cli import *
from .download import *
from .env import *
from .logger import *
from .stats import *
from .checkpoint import *
from .check import *
from .config import *
from .keypoint_utils import *
__all__ = workspace.__all__ + visualizer.__all__ + cli.__all__ \
+ download.__all__ + env.__all__ + logger.__all__ \
+ stats.__all__ + checkpoint.__all__ + check.__all__ \
+ config.__all__ + keypoint_utils.__all__
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import paddle
import six
import paddle.version as fluid_version
from .logger import setup_logger
logger = setup_logger(__name__)
__all__ = ['check_gpu', 'check_version', 'check_config']
def check_gpu(use_gpu):
"""
Log error and exit when set use_gpu=true in paddlepaddle
cpu version.
"""
err = "Config use_gpu cannot be set as true while you are " \
"using paddlepaddle cpu version ! \nPlease try: \n" \
"\t1. Install paddlepaddle-gpu to run model on GPU \n" \
"\t2. Set use_gpu as false in config file to run " \
"model on CPU"
try:
if use_gpu and not paddle.is_compiled_with_cuda():
logger.error(err)
sys.exit(1)
except Exception as e:
pass
def check_version(version='2.0'):
"""
Log error and exit when the installed version of paddlepaddle is
not satisfied.
"""
err = "PaddlePaddle version {} or higher is required, " \
"or a suitable develop version is satisfied as well. \n" \
"Please make sure the version is good with your code.".format(version)
version_installed = [
fluid_version.major, fluid_version.minor, fluid_version.patch,
fluid_version.rc
]
if version_installed == ['0', '0', '0', '0']:
return
version_split = version.split('.')
length = min(len(version_installed), len(version_split))
for i in six.moves.range(length):
if version_installed[i] > version_split[i]:
return
if version_installed[i] < version_split[i]:
raise Exception(err)
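# Worked example (illustrative): with paddle 2.2.0 installed,
# version_installed is ['2', '2', '0', '0']; against version='2.0' the loop
# first compares '2' with '2' (equal, continue), then finds '2' > '0' and
# returns early, so the check passes.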
def check_config(cfg):
"""
Check the correctness of the configuration file. Log error and exit
when Config is not compliant.
"""
err = "'{}' not specified in config file. Please set it in config file."
check_list = ['architecture', 'num_classes']
try:
for var in check_list:
if not var in cfg:
logger.error(err.format(var))
sys.exit(1)
except Exception as e:
pass
if 'log_iter' not in cfg:
cfg.log_iter = 20
return cfg
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import errno
import os
import time
import numpy as np
import paddle
import paddle.nn as nn
from .download import get_weights_path
from .logger import setup_logger
logger = setup_logger(__name__)
__all__ = [
'is_url', 'load_weight', 'match_state_dict', 'load_pretrain_weight',
'save_model'
]
def is_url(path):
"""
Whether path is URL.
Args:
path (string): URL string or not.
"""
return path.startswith('http://') \
or path.startswith('https://') \
or path.startswith('ppdet://')
def _get_unique_endpoints(trainer_endpoints):
    # Sort so that every rank sees the endpoints in the same order,
    # even if the environment variables differ across cards
trainer_endpoints.sort()
ips = set()
unique_endpoints = set()
for endpoint in trainer_endpoints:
ip = endpoint.split(":")[0]
if ip in ips:
continue
ips.add(ip)
unique_endpoints.add(endpoint)
logger.info("unique_endpoints {}".format(unique_endpoints))
return unique_endpoints
def _strip_postfix(path):
path, ext = os.path.splitext(path)
assert ext in ['', '.pdparams', '.pdopt', '.pdmodel'], \
"Unknown postfix {} from weights".format(ext)
return path
def load_weight(model, weight, optimizer=None):
if is_url(weight):
weight = get_weights_path(weight)
path = _strip_postfix(weight)
pdparam_path = path + '.pdparams'
if not os.path.exists(pdparam_path):
raise ValueError("Model pretrain path {} does not "
"exists.".format(pdparam_path))
param_state_dict = paddle.load(pdparam_path)
model_dict = model.state_dict()
model_weight = {}
incorrect_keys = 0
for key in model_dict.keys():
if key in param_state_dict.keys():
model_weight[key] = param_state_dict[key]
else:
logger.info('Unmatched key: {}'.format(key))
incorrect_keys += 1
assert incorrect_keys == 0, "Load weight {} incorrectly, \
{} keys unmatched, please check again.".format(weight,
incorrect_keys)
logger.info('Finish resuming model weights: {}'.format(pdparam_path))
model.set_dict(model_weight)
last_epoch = 0
if optimizer is not None and os.path.exists(path + '.pdopt'):
optim_state_dict = paddle.load(path + '.pdopt')
        # work around a resume bug; may be fixed in a later paddle release
for key in optimizer.state_dict().keys():
if not key in optim_state_dict.keys():
optim_state_dict[key] = optimizer.state_dict()[key]
if 'last_epoch' in optim_state_dict:
last_epoch = optim_state_dict.pop('last_epoch')
optimizer.set_state_dict(optim_state_dict)
return last_epoch
def match_state_dict(model_state_dict, weight_state_dict):
"""
Match between the model state dict and pretrained weight state dict.
Return the matched state dict.
    The method assumes that every name in the pretrained weight state dict is
    a suffix of a name in the model state dict (e.g. once the prefix
    'backbone.' is stripped), so the candidate weight keys are collected for
    each model key and the one with the longest match is selected. For
    example, for the model key 'backbone.res2.res2a.branch2a.conv.weight' and
    the pretrained weight keys 'res2.res2a.branch2a.conv.weight' and
    'branch2a.conv.weight', the former is matched to the model key.
"""
model_keys = sorted(model_state_dict.keys())
weight_keys = sorted(weight_state_dict.keys())
def match(a, b):
if a.startswith('backbone.res5'):
            # In Faster RCNN, res5 pretrained weights have the prefix
            # 'backbone.', while the corresponding model weights have a
            # different prefix, 'bbox_head.'
b = b[9:]
return a == b or a.endswith("." + b)
match_matrix = np.zeros([len(model_keys), len(weight_keys)])
for i, m_k in enumerate(model_keys):
for j, w_k in enumerate(weight_keys):
if match(m_k, w_k):
match_matrix[i, j] = len(w_k)
max_id = match_matrix.argmax(1)
max_len = match_matrix.max(1)
max_id[max_len == 0] = -1
    # collect pretrained weight keys that were not matched to any model key
    matched_weight_ids = set(int(idx) for idx in max_id if idx != -1)
    not_load_weight_name = [
        weight_keys[idx] for idx in range(len(weight_keys))
        if idx not in matched_weight_ids
    ]
    if len(not_load_weight_name) > 0:
        logger.info('{} in pretrained weight is not used in the model, '
                    'and will not be loaded'.format(not_load_weight_name))
matched_keys = {}
result_state_dict = {}
for model_id, weight_id in enumerate(max_id):
if weight_id == -1:
continue
model_key = model_keys[model_id]
weight_key = weight_keys[weight_id]
weight_value = weight_state_dict[weight_key]
model_value_shape = list(model_state_dict[model_key].shape)
if list(weight_value.shape) != model_value_shape:
logger.info(
'The shape {} in pretrained weight {} is unmatched with '
'the shape {} in model {}. And the weight {} will not be '
'loaded'.format(weight_value.shape, weight_key,
model_value_shape, model_key, weight_key))
continue
assert model_key not in result_state_dict
result_state_dict[model_key] = weight_value
if weight_key in matched_keys:
            raise ValueError('Ambiguous weight {} loaded, it matches at least '
'{} and {} in the model'.format(
weight_key, model_key, matched_keys[
weight_key]))
matched_keys[weight_key] = model_key
return result_state_dict
def load_pretrain_weight(model, pretrain_weight):
if is_url(pretrain_weight):
pretrain_weight = get_weights_path(pretrain_weight)
path = _strip_postfix(pretrain_weight)
if not (os.path.isdir(path) or os.path.isfile(path) or
os.path.exists(path + '.pdparams')):
raise ValueError("Model pretrain path `{}` does not exists. "
"If you don't want to load pretrain model, "
"please delete `pretrain_weights` field in "
"config file.".format(path))
model_dict = model.state_dict()
weights_path = path + '.pdparams'
param_state_dict = paddle.load(weights_path)
param_state_dict = match_state_dict(model_dict, param_state_dict)
model.set_dict(param_state_dict)
logger.info('Finish loading model weights: {}'.format(weights_path))
def save_model(model, optimizer, save_dir, save_name, last_epoch):
"""
save model into disk.
Args:
model (paddle.nn.Layer): the Layer instalce to save parameters.
optimizer (paddle.optimizer.Optimizer): the Optimizer instance to
save optimizer states.
save_dir (str): the directory to be saved.
save_name (str): the path to be saved.
last_epoch (int): the epoch index.
"""
if paddle.distributed.get_rank() != 0:
return
if not os.path.exists(save_dir):
os.makedirs(save_dir)
save_path = os.path.join(save_dir, save_name)
if isinstance(model, nn.Layer):
paddle.save(model.state_dict(), save_path + ".pdparams")
else:
        assert isinstance(
            model, dict), 'model is not an instance of nn.Layer or dict'
paddle.save(model, save_path + ".pdparams")
state_dict = optimizer.state_dict()
state_dict['last_epoch'] = last_epoch
paddle.save(state_dict, save_path + ".pdopt")
logger.info("Save checkpoint: {}".format(save_dir))
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from argparse import ArgumentParser, RawDescriptionHelpFormatter
import yaml
import re
from .workspace import get_registered_modules, dump_value
__all__ = ['ColorTTY', 'ArgsParser']
class ColorTTY(object):
def __init__(self):
super(ColorTTY, self).__init__()
self.colors = ['red', 'green', 'yellow', 'blue', 'magenta', 'cyan']
def __getattr__(self, attr):
if attr in self.colors:
color = self.colors.index(attr) + 31
            def color_message(message):
                return "\033[{}m{}\033[0m".format(color, message)
setattr(self, attr, color_message)
return color_message
def bold(self, message):
return self.with_code('01', message)
def with_code(self, code, message):
return "[{}m{}".format(code, message)
class ArgsParser(ArgumentParser):
def __init__(self):
super(ArgsParser, self).__init__(
formatter_class=RawDescriptionHelpFormatter)
self.add_argument("-c", "--config", help="configuration file to use")
self.add_argument(
"-o", "--opt", nargs='*', help="set configuration options")
def parse_args(self, argv=None):
args = super(ArgsParser, self).parse_args(argv)
assert args.config is not None, \
"Please specify --config=configure_file_path."
args.opt = self._parse_opt(args.opt)
return args
def _parse_opt(self, opts):
config = {}
if not opts:
return config
for s in opts:
s = s.strip()
k, v = s.split('=', 1)
if '.' not in k:
config[k] = yaml.load(v, Loader=yaml.Loader)
else:
keys = k.split('.')
if keys[0] not in config:
config[keys[0]] = {}
cur = config[keys[0]]
for idx, key in enumerate(keys[1:]):
if idx == len(keys) - 2:
cur[key] = yaml.load(v, Loader=yaml.Loader)
else:
cur[key] = {}
cur = cur[key]
return config
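# Example (illustrative): passing `-o use_gpu=true TrainReader.batch_size=4`
# yields {'use_gpu': True, 'TrainReader': {'batch_size': 4}}; dotted keys
# are expanded into nested dicts and values are parsed as YAML.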
def print_total_cfg(config):
modules = get_registered_modules()
color_tty = ColorTTY()
green = '___{}___'.format(color_tty.colors.index('green') + 31)
styled = {}
for key in config.keys():
if not config[key]: # empty schema
continue
if key not in modules and not hasattr(config[key], '__dict__'):
styled[key] = config[key]
continue
elif key in modules:
module = modules[key]
else:
type_name = type(config[key]).__name__
if type_name in modules:
module = modules[type_name].copy()
module.update({
k: v
for k, v in config[key].__dict__.items()
if k in module.schema
})
key += " ({})".format(type_name)
default = module.find_default_keys()
missing = module.find_missing_keys()
mismatch = module.find_mismatch_keys()
extra = module.find_extra_keys()
dep_missing = []
for dep in module.inject:
if isinstance(module[dep], str) and module[dep] != '<value>':
if module[dep] not in modules: # not a valid module
dep_missing.append(dep)
else:
dep_mod = modules[module[dep]]
# empty dict but mandatory
if not dep_mod and dep_mod.mandatory():
dep_missing.append(dep)
override = list(
set(module.keys()) - set(default) - set(extra) - set(dep_missing))
replacement = {}
for name in set(override + default + extra + mismatch + missing):
new_name = name
if name in missing:
value = "<missing>"
else:
value = module[name]
if name in extra:
value = dump_value(value) + " <extraneous>"
elif name in mismatch:
value = dump_value(value) + " <type mismatch>"
elif name in dep_missing:
value = dump_value(value) + " <module config missing>"
elif name in override and value != '<missing>':
mark = green
new_name = mark + name
replacement[new_name] = value
styled[key] = replacement
buffer = yaml.dump(styled, default_flow_style=False, default_style='')
buffer = (re.sub(r"<missing>", r"[31m<missing>[0m", buffer))
buffer = (re.sub(r"<extraneous>", r"[33m<extraneous>[0m", buffer))
buffer = (re.sub(r"<type mismatch>", r"[31m<type mismatch>[0m", buffer))
buffer = (re.sub(r"<module config missing>",
r"[31m<module config missing>[0m", buffer))
buffer = re.sub(r"___(\d+)___(.*?):", r"[\1m\2[0m:", buffer)
print(buffer)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import yaml_helpers
from .yaml_helpers import *
__all__ = yaml_helpers.__all__
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
import inspect
import importlib
import re
try:
from docstring_parser import parse as doc_parse
except Exception:
def doc_parse(*args):
pass
try:
from typeguard import check_type
except Exception:
def check_type(*args):
pass
__all__ = ['SchemaValue', 'SchemaDict', 'SharedConfig', 'extract_schema']
class SchemaValue(object):
def __init__(self, name, doc='', type=None):
super(SchemaValue, self).__init__()
self.name = name
self.doc = doc
self.type = type
def set_default(self, value):
self.default = value
def has_default(self):
return hasattr(self, 'default')
class SchemaDict(dict):
def __init__(self, **kwargs):
super().__init__()
self.schema = {}
self.strict = False
self.doc = ""
self.update(kwargs)
def __setitem__(self, key, value):
# XXX also update regular dict to SchemaDict??
if isinstance(value, dict) and key in self and isinstance(self[key],
SchemaDict):
self[key].update(value)
else:
super().__setitem__(key, value)
def __missing__(self, key):
if self.has_default(key):
return self.schema[key].default
elif key in self.schema:
return self.schema[key]
else:
raise KeyError(key)
def copy(self):
newone = SchemaDict()
newone.__dict__.update(self.__dict__)
newone.update(self)
return newone
def set_schema(self, key, value):
assert isinstance(value, SchemaValue)
self.schema[key] = value
def set_strict(self, strict):
self.strict = strict
def has_default(self, key):
return key in self.schema and self.schema[key].has_default()
def is_default(self, key):
if not self.has_default(key):
return False
if hasattr(self[key], '__dict__'):
return True
else:
return key not in self or self[key] == self.schema[key].default
def find_default_keys(self):
return [
k for k in list(self.keys()) + list(self.schema.keys())
if self.is_default(k)
]
def mandatory(self):
return any([k for k in self.schema.keys() if not self.has_default(k)])
def find_missing_keys(self):
missing = [
k for k in self.schema.keys()
if k not in self and not self.has_default(k)
]
placeholders = [k for k in self if self[k] in ('<missing>', '<value>')]
return missing + placeholders
def find_extra_keys(self):
return list(set(self.keys()) - set(self.schema.keys()))
def find_mismatch_keys(self):
mismatch_keys = []
for arg in self.schema.values():
if arg.type is not None:
try:
check_type("{}.{}".format(self.name, arg.name),
self[arg.name], arg.type)
except Exception:
mismatch_keys.append(arg.name)
return mismatch_keys
def validate(self):
missing_keys = self.find_missing_keys()
if missing_keys:
raise ValueError("Missing param for class<{}>: {}".format(
self.name, ", ".join(missing_keys)))
extra_keys = self.find_extra_keys()
if extra_keys and self.strict:
raise ValueError("Extraneous param for class<{}>: {}".format(
self.name, ", ".join(extra_keys)))
mismatch_keys = self.find_mismatch_keys()
if mismatch_keys:
raise TypeError("Wrong param type for class<{}>: {}".format(
self.name, ", ".join(mismatch_keys)))
class SharedConfig(object):
"""
Representation class for `__shared__` annotations, which work as follows:
- if `key` is set for the module in config file, its value will take
precedence
- if `key` is not set for the module but present in the config file, its
value will be used
- otherwise, use the provided `default_value` as fallback
Args:
key: config[key] will be injected
default_value: fallback value
"""
def __init__(self, key, default_value=None):
super(SharedConfig, self).__init__()
self.key = key
self.default_value = default_value
def extract_schema(cls):
"""
Extract schema from a given class
Args:
cls (type): Class from which to extract.
Returns:
schema (SchemaDict): Extracted schema.
"""
ctor = cls.__init__
# python 2 compatibility
if hasattr(inspect, 'getfullargspec'):
argspec = inspect.getfullargspec(ctor)
annotations = argspec.annotations
has_kwargs = argspec.varkw is not None
    else:
        argspec = inspect.getargspec(ctor)
        # python 2 type hinting workaround, see pep-3107
        # however, since `typeguard` does not support python 2, type checking
        # is still python 3 only for now
        annotations = getattr(ctor, '__annotations__', {})
        has_kwargs = argspec.keywords is not None
names = [arg for arg in argspec.args if arg != 'self']
defaults = argspec.defaults
num_defaults = argspec.defaults is not None and len(argspec.defaults) or 0
num_required = len(names) - num_defaults
docs = cls.__doc__
if docs is None and getattr(cls, '__category__', None) == 'op':
docs = cls.__call__.__doc__
try:
docstring = doc_parse(docs)
except Exception:
docstring = None
if docstring is None:
comments = {}
else:
comments = {}
for p in docstring.params:
match_obj = re.match('^([a-zA-Z_]+[a-zA-Z_0-9]*).*', p.arg_name)
if match_obj is not None:
comments[match_obj.group(1)] = p.description
schema = SchemaDict()
schema.name = cls.__name__
schema.doc = ""
if docs is not None:
start_pos = docs[0] == '\n' and 1 or 0
schema.doc = docs[start_pos:].split("\n")[0].strip()
# XXX handle paddle's weird doc convention
if '**' == schema.doc[:2] and '**' == schema.doc[-2:]:
schema.doc = schema.doc[2:-2].strip()
schema.category = hasattr(cls, '__category__') and getattr(
cls, '__category__') or 'module'
schema.strict = not has_kwargs
schema.pymodule = importlib.import_module(cls.__module__)
schema.inject = getattr(cls, '__inject__', [])
schema.shared = getattr(cls, '__shared__', [])
for idx, name in enumerate(names):
comment = name in comments and comments[name] or name
if name in schema.inject:
type_ = None
else:
type_ = name in annotations and annotations[name] or None
value_schema = SchemaValue(name, comment, type_)
if name in schema.shared:
assert idx >= num_required, "shared config must have default value"
default = defaults[idx - num_required]
value_schema.set_default(SharedConfig(name, default))
elif idx >= num_required:
default = defaults[idx - num_required]
value_schema.set_default(default)
schema.set_schema(name, value_schema)
return schema
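# ---------------------------------------------------------------------------
# Sketch (illustrative only): extract the schema of a toy class. `Toy` is a
# made-up example, not part of the library.
# ---------------------------------------------------------------------------
if __name__ == '__main__':
    class Toy(object):
        """A toy module."""

        def __init__(self, width=32, depth=2):
            self.width = width
            self.depth = depth

    schema = extract_schema(Toy)
    print(schema.name, schema.doc, sorted(schema.schema.keys()))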
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import importlib
import inspect
import yaml
from .schema import SharedConfig
__all__ = ['serializable', 'Callable']
def represent_dictionary_order(self, dict_data):
return self.represent_mapping('tag:yaml.org,2002:map', dict_data.items())
def setup_orderdict():
from collections import OrderedDict
yaml.add_representer(OrderedDict, represent_dictionary_order)
def _make_python_constructor(cls):
def python_constructor(loader, node):
if isinstance(node, yaml.SequenceNode):
args = loader.construct_sequence(node, deep=True)
return cls(*args)
else:
kwargs = loader.construct_mapping(node, deep=True)
try:
return cls(**kwargs)
except Exception as ex:
print("Error when construct {} instance from yaml config".
format(cls.__name__))
raise ex
return python_constructor
def _make_python_representer(cls):
    # python 2 compatibility
    if hasattr(inspect, 'getfullargspec'):
        argspec = inspect.getfullargspec(cls)
    else:
        argspec = inspect.getargspec(cls.__init__)
argnames = [arg for arg in argspec.args if arg != 'self']
def python_representer(dumper, obj):
if argnames:
data = {name: getattr(obj, name) for name in argnames}
else:
data = obj.__dict__
if '_id' in data:
del data['_id']
return dumper.represent_mapping(u'!{}'.format(cls.__name__), data)
return python_representer
def serializable(cls):
"""
Add loader and dumper for given class, which must be
"trivially serializable"
Args:
cls: class to be serialized
Returns: cls
"""
yaml.add_constructor(u'!{}'.format(cls.__name__),
_make_python_constructor(cls))
yaml.add_representer(cls, _make_python_representer(cls))
return cls
yaml.add_representer(SharedConfig,
lambda d, o: d.represent_data(o.default_value))
@serializable
class Callable(object):
"""
Helper to be used in Yaml for creating arbitrary class objects
Args:
full_type (str): the full module path to target function
"""
def __init__(self, full_type, args=[], kwargs={}):
super(Callable, self).__init__()
self.full_type = full_type
self.args = args
self.kwargs = kwargs
def __call__(self):
if '.' in self.full_type:
idx = self.full_type.rfind('.')
module = importlib.import_module(self.full_type[:idx])
func_name = self.full_type[idx + 1:]
else:
try:
module = importlib.import_module('builtins')
except Exception:
module = importlib.import_module('__builtin__')
func_name = self.full_type
func = getattr(module, func_name)
return func(*self.args, **self.kwargs)
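# ---------------------------------------------------------------------------
# Sketch (illustrative only): Callable resolving a builtin by its dotted
# path; in a YAML config this would be written with the `!Callable` tag.
# ---------------------------------------------------------------------------
if __name__ == '__main__':
    make_range = Callable('builtins.range', args=[3])
    print(list(make_range()))   # -> [0, 1, 2]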
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import os.path as osp
import sys
import yaml
import time
import shutil
import requests
import tqdm
import hashlib
import base64
import binascii
import tarfile
import zipfile
from paddle.utils.download import _get_unique_endpoints
from .logger import setup_logger
logger = setup_logger(__name__)
__all__ = [
'get_weights_path',
'get_dataset_path',
'get_config_path',
'download_dataset',
]
WEIGHTS_HOME = osp.expanduser("~/.cache/paddle/weights")
DATASET_HOME = osp.expanduser("~/.cache/paddle/dataset")
CONFIGS_HOME = osp.expanduser("~/.cache/paddle/configs")
# dict of {dataset_name: (download_info, sub_dirs)}
# download info: [(url, md5sum)]
DATASETS = {
'coco': ([
(
'http://images.cocodataset.org/zips/train2017.zip',
'cced6f7f71b7629ddf16f17bbcfab6b2', ),
(
'http://images.cocodataset.org/zips/val2017.zip',
'442b8da7639aecaf257c1dceb8ba8c80', ),
(
'http://images.cocodataset.org/annotations/annotations_trainval2017.zip',
'f4bbac642086de4f52a3fdda2de5fa2c', ),
], ["annotations", "train2017", "val2017"]),
}
DOWNLOAD_RETRY_LIMIT = 3
PPDET_WEIGHTS_DOWNLOAD_URL_PREFIX = 'https://paddledet.bj.bcebos.com/'
def parse_url(url):
url = url.replace("ppdet://", PPDET_WEIGHTS_DOWNLOAD_URL_PREFIX)
return url
def get_weights_path(url):
"""Get weights path from WEIGHTS_HOME, if not exists,
download it from url.
"""
url = parse_url(url)
path, _ = get_path(url, WEIGHTS_HOME)
return path
def get_config_path(url):
"""Get weights path from CONFIGS_HOME, if not exists,
download it from url.
"""
url = parse_url(url)
path = map_path(url, CONFIGS_HOME, path_depth=2)
if os.path.isfile(path):
return path
# config file not found, try download
# 1. clear configs directory
if osp.isdir(CONFIGS_HOME):
shutil.rmtree(CONFIGS_HOME)
# 2. get url
try:
from ppdet import __version__ as version
except ImportError:
version = None
cfg_url = "ppdet://configs/{}/configs.tar".format(version) \
if version else "ppdet://configs/configs.tar"
cfg_url = parse_url(cfg_url)
# 3. download and decompress
cfg_fullname = _download_dist(cfg_url, osp.dirname(CONFIGS_HOME))
_decompress_dist(cfg_fullname)
# 4. check config file existing
if os.path.isfile(path):
return path
else:
logger.error("Get config {} failed after download, please contact us on " \
"https://github.com/PaddlePaddle/PaddleDetection/issues".format(path))
sys.exit(1)
def get_dataset_path(path, annotation, image_dir):
"""
If path exists, return path.
Otherwise, get dataset path from DATASET_HOME, if not exists,
download it.
"""
if _dataset_exists(path, annotation, image_dir):
return path
logger.info(
"Dataset {} is not valid for reason above, try searching {} or "
"downloading dataset...".format(osp.realpath(path), DATASET_HOME))
data_name = os.path.split(path.strip().lower())[-1]
for name, dataset in DATASETS.items():
if data_name == name:
logger.debug("Parse dataset_dir {} as dataset "
"{}".format(path, name))
if name == 'objects365':
raise NotImplementedError(
"Dataset {} is not valid for download automatically. "
"Please apply and download the dataset from "
"https://www.objects365.org/download.html".format(name))
data_dir = osp.join(DATASET_HOME, name)
if name == 'mot':
if osp.exists(path) or osp.exists(data_dir):
return data_dir
else:
raise NotImplementedError(
"Dataset {} is not valid for download automatically. "
"Please apply and download the dataset following docs/tutorials/PrepareMOTDataSet.md".
format(name))
if name == "spine_coco":
if _dataset_exists(data_dir, annotation, image_dir):
return data_dir
# For voc, only check dir VOCdevkit/VOC2012, VOCdevkit/VOC2007
if name in ['voc', 'fruit', 'roadsign_voc']:
exists = True
for sub_dir in dataset[1]:
check_dir = osp.join(data_dir, sub_dir)
if osp.exists(check_dir):
logger.info("Found {}".format(check_dir))
else:
exists = False
if exists:
return data_dir
# voc exist is checked above, voc is not exist here
check_exist = name != 'voc' and name != 'fruit' and name != 'roadsign_voc'
for url, md5sum in dataset[0]:
get_path(url, data_dir, md5sum, check_exist)
# voc should create list after download
if name == 'voc':
create_voc_list(data_dir)
return data_dir
# not match any dataset in DATASETS
raise ValueError(
"Dataset {} is not valid and cannot parse dataset type "
"'{}' for automaticly downloading, which only supports "
"'voc' , 'coco', 'wider_face', 'fruit', 'roadsign_voc' and 'mot' currently".
format(path, osp.split(path)[-1]))
def map_path(url, root_dir, path_depth=1):
# parse path after download to decompress under root_dir
assert path_depth > 0, "path_depth should be a positive integer"
dirname = url
for _ in range(path_depth):
dirname = osp.dirname(dirname)
fpath = osp.relpath(url, dirname)
zip_formats = ['.zip', '.tar', '.gz']
for zip_format in zip_formats:
fpath = fpath.replace(zip_format, '')
return osp.join(root_dir, fpath)
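# Worked example (illustrative): with path_depth=1,
# map_path('https://host/models/model.tar', '/root/.cache') strips one URL
# level and the archive suffix, giving '/root/.cache/model'.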
def get_path(url, root_dir, md5sum=None, check_exist=True):
""" Download from given url to root_dir.
if file or directory specified by url is exists under
root_dir, return the path directly, otherwise download
from url and decompress it, return the path.
url (str): download url
root_dir (str): root dir for downloading, it should be
WEIGHTS_HOME or DATASET_HOME
md5sum (str): md5 sum of download package
"""
# parse path after download to decompress under root_dir
fullpath = map_path(url, root_dir)
    # For some archives the decompressed directory name differs from the
    # archive file name; rename using the following map
decompress_name_map = {
"VOCtrainval_11-May-2012": "VOCdevkit/VOC2012",
"VOCtrainval_06-Nov-2007": "VOCdevkit/VOC2007",
"VOCtest_06-Nov-2007": "VOCdevkit/VOC2007",
"annotations_trainval": "annotations"
}
for k, v in decompress_name_map.items():
if fullpath.find(k) >= 0:
fullpath = osp.join(osp.split(fullpath)[0], v)
if osp.exists(fullpath) and check_exist:
if not osp.isfile(fullpath) or \
_check_exist_file_md5(fullpath, md5sum, url):
logger.debug("Found {}".format(fullpath))
return fullpath, True
else:
os.remove(fullpath)
fullname = _download_dist(url, root_dir, md5sum)
    # weights in the new format (postfix '.pdparams') and yml files
    # do not need to be decompressed
if osp.splitext(fullname)[-1] not in ['.pdparams', '.yml']:
_decompress_dist(fullname)
return fullpath, False
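# Usage sketch (illustrative; the URL is hypothetical): download a weights
# file into WEIGHTS_HOME, returning its local path and whether it already
# existed.
#   path, existed = get_path(
#       'https://paddledet.bj.bcebos.com/models/some_model.pdparams',
#       WEIGHTS_HOME)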
def download_dataset(path, dataset=None):
if dataset not in DATASETS.keys():
logger.error("Unknown dataset {}, it should be "
"{}".format(dataset, DATASETS.keys()))
return
dataset_info = DATASETS[dataset][0]
for info in dataset_info:
get_path(info[0], path, info[1], False)
logger.debug("Download dataset {} finished.".format(dataset))
def _dataset_exists(path, annotation, image_dir):
"""
Check if user define dataset exists
"""
if not osp.exists(path):
logger.warning("Config dataset_dir {} is not exits, "
"dataset config is not valid".format(path))
return False
if annotation:
annotation_path = osp.join(path, annotation)
if not osp.isfile(annotation_path):
logger.warning("Config annotation {} is not a "
"file, dataset config is not "
"valid".format(annotation_path))
return False
if image_dir:
image_path = osp.join(path, image_dir)
if not osp.isdir(image_path):
logger.warning("Config image_dir {} is not a "
"directory, dataset config is not "
"valid".format(image_path))
return False
return True
def _download(url, path, md5sum=None):
"""
Download from url, save to path.
url (str): download url
path (str): download to given path
"""
if not osp.exists(path):
os.makedirs(path)
fname = osp.split(url)[-1]
fullname = osp.join(path, fname)
retry_cnt = 0
while not (osp.exists(fullname) and _check_exist_file_md5(fullname, md5sum,
url)):
if retry_cnt < DOWNLOAD_RETRY_LIMIT:
retry_cnt += 1
else:
raise RuntimeError("Download from {} failed. "
"Retry limit reached".format(url))
logger.info("Downloading {} from {}".format(fname, url))
        # NOTE: windows path join may introduce '\\', which is invalid in a url
if sys.platform == "win32":
url = url.replace('\\', '/')
req = requests.get(url, stream=True)
if req.status_code != 200:
raise RuntimeError("Downloading from {} failed with code "
"{}!".format(url, req.status_code))
        # To guard against interrupted downloads, download to
        # tmp_fullname first, then move tmp_fullname to fullname
        # after the download finishes
tmp_fullname = fullname + "_tmp"
total_size = req.headers.get('content-length')
with open(tmp_fullname, 'wb') as f:
if total_size:
for chunk in tqdm.tqdm(
req.iter_content(chunk_size=1024),
total=(int(total_size) + 1023) // 1024,
unit='KB'):
f.write(chunk)
else:
for chunk in req.iter_content(chunk_size=1024):
if chunk:
f.write(chunk)
shutil.move(tmp_fullname, fullname)
return fullname
def _download_dist(url, path, md5sum=None):
env = os.environ
if 'PADDLE_TRAINERS_NUM' in env and 'PADDLE_TRAINER_ID' in env:
trainer_id = int(env['PADDLE_TRAINER_ID'])
num_trainers = int(env['PADDLE_TRAINERS_NUM'])
if num_trainers <= 1:
return _download(url, path, md5sum)
else:
fname = osp.split(url)[-1]
fullname = osp.join(path, fname)
lock_path = fullname + '.download.lock'
if not osp.isdir(path):
os.makedirs(path)
if not osp.exists(fullname):
from paddle.distributed import ParallelEnv
unique_endpoints = _get_unique_endpoints(ParallelEnv()
.trainer_endpoints[:])
with open(lock_path, 'w'): # touch
os.utime(lock_path, None)
if ParallelEnv().current_endpoint in unique_endpoints:
_download(url, path, md5sum)
os.remove(lock_path)
else:
while os.path.exists(lock_path):
time.sleep(0.5)
return fullname
else:
return _download(url, path, md5sum)
def _check_exist_file_md5(filename, md5sum, url):
    # if md5sum is None and the file to check is a weights file,
    # read the md5sum from the url and check it; otherwise check md5sum directly
return _md5check_from_url(filename, url) if md5sum is None \
and filename.endswith('pdparams') \
else _md5check(filename, md5sum)
def _md5check_from_url(filename, url):
    # For weights at bcebos URLs, the MD5 value is carried
    # in the 'content-md5' response header
req = requests.get(url, stream=True)
content_md5 = req.headers.get('content-md5')
req.close()
if not content_md5 or _md5check(
filename,
binascii.hexlify(base64.b64decode(content_md5.strip('"'))).decode(
)):
return True
else:
return False
def _md5check(fullname, md5sum=None):
if md5sum is None:
return True
logger.debug("File {} md5 checking...".format(fullname))
md5 = hashlib.md5()
with open(fullname, 'rb') as f:
for chunk in iter(lambda: f.read(4096), b""):
md5.update(chunk)
calc_md5sum = md5.hexdigest()
if calc_md5sum != md5sum:
logger.warning("File {} md5 check failed, {}(calc) != "
"{}(base)".format(fullname, calc_md5sum, md5sum))
return False
return True
def _decompress(fname):
"""
Decompress for zip and tar file
"""
logger.info("Decompressing {}...".format(fname))
# For protecting decompressing interupted,
# decompress to fpath_tmp directory firstly, if decompress
# successed, move decompress files to fpath and delete
# fpath_tmp and remove download compress file.
fpath = osp.split(fname)[0]
fpath_tmp = osp.join(fpath, 'tmp')
if osp.isdir(fpath_tmp):
shutil.rmtree(fpath_tmp)
os.makedirs(fpath_tmp)
if fname.find('tar') >= 0:
with tarfile.open(fname) as tf:
tf.extractall(path=fpath_tmp)
elif fname.find('zip') >= 0:
with zipfile.ZipFile(fname) as zf:
zf.extractall(path=fpath_tmp)
elif fname.find('.txt') >= 0:
return
else:
raise TypeError("Unsupport compress file type {}".format(fname))
for f in os.listdir(fpath_tmp):
src_dir = osp.join(fpath_tmp, f)
dst_dir = osp.join(fpath, f)
_move_and_merge_tree(src_dir, dst_dir)
shutil.rmtree(fpath_tmp)
os.remove(fname)
def _decompress_dist(fname):
env = os.environ
if 'PADDLE_TRAINERS_NUM' in env and 'PADDLE_TRAINER_ID' in env:
trainer_id = int(env['PADDLE_TRAINER_ID'])
num_trainers = int(env['PADDLE_TRAINERS_NUM'])
if num_trainers <= 1:
_decompress(fname)
else:
lock_path = fname + '.decompress.lock'
from paddle.distributed import ParallelEnv
unique_endpoints = _get_unique_endpoints(ParallelEnv()
.trainer_endpoints[:])
            # NOTE(dkp): _decompress_dist is always performed after
            # _download_dist; in _download_dist the sub-trainers wait for
            # the download lock file to be released by sleeping. If
            # decompression is very fast and finishes within the sleeping
            # gap (e.g. on tiny datasets such as coco_ce or spine_coco),
            # the main trainer may finish decompressing and release the
            # lock file too early. So we only create the lock file in the
            # main trainer, and all sub-trainers wait 1s (twice the
            # sleeping gap) for the main trainer to create it; this
            # waiting time keeps all trainer pipelines in order.
            # **change this if you have a more elegant method**
if ParallelEnv().current_endpoint in unique_endpoints:
with open(lock_path, 'w'): # touch
os.utime(lock_path, None)
_decompress(fname)
os.remove(lock_path)
else:
time.sleep(1)
while os.path.exists(lock_path):
time.sleep(0.5)
else:
_decompress(fname)
def _move_and_merge_tree(src, dst):
"""
Move src directory to dst, if dst is already exists,
merge src to dst
"""
if not osp.exists(dst):
shutil.move(src, dst)
elif osp.isfile(src):
shutil.move(src, dst)
else:
for fp in os.listdir(src):
src_fp = osp.join(src, fp)
dst_fp = osp.join(dst, fp)
if osp.isdir(src_fp):
if osp.isdir(dst_fp):
_move_and_merge_tree(src_fp, dst_fp)
else:
shutil.move(src_fp, dst_fp)
elif osp.isfile(src_fp) and \
not osp.isfile(dst_fp):
shutil.move(src_fp, dst_fp)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import random
import numpy as np
import paddle
from paddle.distributed import fleet
__all__ = ['init_parallel_env', 'set_random_seed', 'init_fleet_env']
def init_fleet_env(find_unused_parameters=False):
strategy = fleet.DistributedStrategy()
strategy.find_unused_parameters = find_unused_parameters
fleet.init(is_collective=True, strategy=strategy)
def init_parallel_env():
env = os.environ
dist = 'PADDLE_TRAINER_ID' in env and 'PADDLE_TRAINERS_NUM' in env
if dist:
trainer_id = int(env['PADDLE_TRAINER_ID'])
local_seed = (99 + trainer_id)
random.seed(local_seed)
np.random.seed(local_seed)
paddle.distributed.init_parallel_env()
def set_random_seed(seed):
paddle.seed(seed)
random.seed(seed)
np.random.seed(seed)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import cv2
import numpy as np
__all__ = [
'get_affine_mat_kernel', 'get_affine_transform', 'get_warp_matrix',
'rotate_point', 'transpred', 'warp_affine_joints', 'affine_transform',
'transform_preds', 'oks_iou', 'oks_nms', 'rescore', 'soft_oks_nms'
]
def get_affine_mat_kernel(h, w, s, inv=False):
if w < h:
w_ = s
h_ = int(np.ceil((s / w * h) / 64.) * 64)
scale_w = w
scale_h = h_ / w_ * w
else:
h_ = s
w_ = int(np.ceil((s / h * w) / 64.) * 64)
scale_h = h
scale_w = w_ / h_ * h
center = np.array([np.round(w / 2.), np.round(h / 2.)])
size_resized = (w_, h_)
trans = get_affine_transform(
center, np.array([scale_w, scale_h]), 0, size_resized, inv=inv)
return trans, size_resized
def get_affine_transform(center,
input_size,
rot,
output_size,
shift=(0., 0.),
inv=False):
"""Get the affine transform matrix, given the center/scale/rot/output_size.
Args:
center (np.ndarray[2, ]): Center of the bounding box (x, y).
input_size (np.ndarray[2, ]): Size of input feature (width, height).
rot (float): Rotation angle (degree).
output_size (np.ndarray[2, ]): Size of the destination heatmaps.
shift (0-100%): Shift translation ratio wrt the width/height.
Default (0., 0.).
inv (bool): Option to inverse the affine transform direction.
(inv=False: src->dst or inv=True: dst->src)
Returns:
np.ndarray: The transform matrix.
"""
assert len(center) == 2
assert len(output_size) == 2
assert len(shift) == 2
if not isinstance(input_size, (np.ndarray, list)):
input_size = np.array([input_size, input_size], dtype=np.float32)
scale_tmp = input_size
shift = np.array(shift)
src_w = scale_tmp[0]
dst_w = output_size[0]
dst_h = output_size[1]
rot_rad = np.pi * rot / 180
src_dir = rotate_point([0., src_w * -0.5], rot_rad)
dst_dir = np.array([0., dst_w * -0.5])
src = np.zeros((3, 2), dtype=np.float32)
src[0, :] = center + scale_tmp * shift
src[1, :] = center + src_dir + scale_tmp * shift
src[2, :] = _get_3rd_point(src[0, :], src[1, :])
dst = np.zeros((3, 2), dtype=np.float32)
dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir
dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :])
if inv:
trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
else:
trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))
return trans
def get_warp_matrix(theta, size_input, size_dst, size_target):
"""Calculate the transformation matrix under the constraint of unbiased.
Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased
Data Processing for Human Pose Estimation (CVPR 2020).
Args:
theta (float): Rotation angle in degrees.
size_input (np.ndarray): Size of input image [w, h].
size_dst (np.ndarray): Size of output image [w, h].
size_target (np.ndarray): Size of ROI in input plane [w, h].
Returns:
matrix (np.ndarray): A matrix for transformation.
"""
theta = np.deg2rad(theta)
matrix = np.zeros((2, 3), dtype=np.float32)
scale_x = size_dst[0] / size_target[0]
scale_y = size_dst[1] / size_target[1]
matrix[0, 0] = np.cos(theta) * scale_x
matrix[0, 1] = -np.sin(theta) * scale_x
matrix[0, 2] = scale_x * (
-0.5 * size_input[0] * np.cos(theta) + 0.5 * size_input[1] *
np.sin(theta) + 0.5 * size_target[0])
matrix[1, 0] = np.sin(theta) * scale_y
matrix[1, 1] = np.cos(theta) * scale_y
matrix[1, 2] = scale_y * (
-0.5 * size_input[0] * np.sin(theta) - 0.5 * size_input[1] *
np.cos(theta) + 0.5 * size_target[1])
return matrix
def _get_3rd_point(a, b):
"""To calculate the affine matrix, three pairs of points are required. This
function is used to get the 3rd point, given 2D points a & b.
The 3rd point is defined by rotating vector `a - b` by 90 degrees
anticlockwise, using b as the rotation center.
Args:
a (np.ndarray): point(x,y)
b (np.ndarray): point(x,y)
Returns:
np.ndarray: The 3rd point.
"""
assert len(
a) == 2, 'input of _get_3rd_point should be point with length of 2'
assert len(
b) == 2, 'input of _get_3rd_point should be point with length of 2'
direction = a - b
third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32)
return third_pt
def rotate_point(pt, angle_rad):
"""Rotate a point by an angle.
Args:
pt (list[float]): 2 dimensional point to be rotated
angle_rad (float): rotation angle by radian
Returns:
list[float]: Rotated point.
"""
assert len(pt) == 2
sn, cs = np.sin(angle_rad), np.cos(angle_rad)
new_x = pt[0] * cs - pt[1] * sn
new_y = pt[0] * sn + pt[1] * cs
rotated_pt = [new_x, new_y]
return rotated_pt
def transpred(kpts, h, w, s):
trans, _ = get_affine_mat_kernel(h, w, s, inv=True)
return warp_affine_joints(kpts[..., :2].copy(), trans)
def warp_affine_joints(joints, mat):
"""Apply affine transformation defined by the transform matrix on the
joints.
Args:
joints (np.ndarray[..., 2]): Origin coordinate of joints.
mat (np.ndarray[3, 2]): The affine matrix.
Returns:
matrix (np.ndarray[..., 2]): Result coordinate of joints.
"""
joints = np.array(joints)
shape = joints.shape
joints = joints.reshape(-1, 2)
return np.dot(np.concatenate(
(joints, joints[:, 0:1] * 0 + 1), axis=1),
mat.T).reshape(shape)
def affine_transform(pt, t):
new_pt = np.array([pt[0], pt[1], 1.]).T
new_pt = np.dot(t, new_pt)
return new_pt[:2]
def transform_preds(coords, center, scale, output_size):
target_coords = np.zeros(coords.shape)
trans = get_affine_transform(center, scale * 200, 0, output_size, inv=1)
for p in range(coords.shape[0]):
target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans)
return target_coords
def oks_iou(g, d, a_g, a_d, sigmas=None, in_vis_thre=None):
if not isinstance(sigmas, np.ndarray):
sigmas = np.array([
.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07,
.87, .87, .89, .89
]) / 10.0
vars = (sigmas * 2)**2
xg = g[0::3]
yg = g[1::3]
vg = g[2::3]
ious = np.zeros((d.shape[0]))
for n_d in range(0, d.shape[0]):
xd = d[n_d, 0::3]
yd = d[n_d, 1::3]
vd = d[n_d, 2::3]
dx = xd - xg
dy = yd - yg
e = (dx**2 + dy**2) / vars / ((a_g + a_d[n_d]) / 2 + np.spacing(1)) / 2
if in_vis_thre is not None:
            # keep keypoints visible in both gt and det (elementwise AND);
            # `list(...) and list(...)` would only apply the second condition
            ind = np.logical_and(vg > in_vis_thre, vd > in_vis_thre)
e = e[ind]
ious[n_d] = np.sum(np.exp(-e)) / e.shape[0] if e.shape[0] != 0 else 0.0
return ious
def oks_nms(kpts_db, thresh, sigmas=None, in_vis_thre=None):
"""greedily select boxes with high confidence and overlap with current maximum <= thresh
rule out overlap >= thresh
Args:
kpts_db (list): The predicted keypoints within the image
thresh (float): The threshold to select the boxes
sigmas (np.array): The variance to calculate the oks iou
Default: None
in_vis_thre (float): The threshold to select the high confidence boxes
Default: None
Return:
keep (list): indexes to keep
"""
if len(kpts_db) == 0:
return []
scores = np.array([kpts_db[i]['score'] for i in range(len(kpts_db))])
kpts = np.array(
[kpts_db[i]['keypoints'].flatten() for i in range(len(kpts_db))])
areas = np.array([kpts_db[i]['area'] for i in range(len(kpts_db))])
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
oks_ovr = oks_iou(kpts[i], kpts[order[1:]], areas[i], areas[order[1:]],
sigmas, in_vis_thre)
inds = np.where(oks_ovr <= thresh)[0]
order = order[inds + 1]
return keep
def rescore(overlap, scores, thresh, type='gaussian'):
assert overlap.shape[0] == scores.shape[0]
if type == 'linear':
inds = np.where(overlap >= thresh)[0]
scores[inds] = scores[inds] * (1 - overlap[inds])
else:
scores = scores * np.exp(-overlap**2 / thresh)
return scores
def soft_oks_nms(kpts_db, thresh, sigmas=None, in_vis_thre=None):
"""greedily select boxes with high confidence and overlap with current maximum <= thresh
rule out overlap >= thresh
Args:
kpts_db (list): The predicted keypoints within the image
thresh (float): The threshold to select the boxes
sigmas (np.array): The variance to calculate the oks iou
Default: None
in_vis_thre (float): The threshold to select the high confidence boxes
Default: None
Return:
keep (list): indexes to keep
"""
if len(kpts_db) == 0:
return []
scores = np.array([kpts_db[i]['score'] for i in range(len(kpts_db))])
kpts = np.array(
[kpts_db[i]['keypoints'].flatten() for i in range(len(kpts_db))])
areas = np.array([kpts_db[i]['area'] for i in range(len(kpts_db))])
order = scores.argsort()[::-1]
scores = scores[order]
# max_dets = order.size
max_dets = 20
keep = np.zeros(max_dets, dtype=np.intp)
keep_cnt = 0
while order.size > 0 and keep_cnt < max_dets:
i = order[0]
oks_ovr = oks_iou(kpts[i], kpts[order[1:]], areas[i], areas[order[1:]],
sigmas, in_vis_thre)
order = order[1:]
scores = rescore(oks_ovr, scores[1:], thresh)
tmp = scores.argsort()[::-1]
order = order[tmp]
scores = scores[tmp]
keep[keep_cnt] = i
keep_cnt += 1
keep = keep[:keep_cnt]
return keep
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import os
import sys
import paddle.distributed as dist
__all__ = ['setup_logger']
logger_initialized = []
def setup_logger(name="ppdet", output=None):
"""
Initialize logger and set its verbosity level to INFO.
Args:
output (str): a file name or a directory to save log. If None, will not save log file.
If ends with ".txt" or ".log", assumed to be a file name.
Otherwise, logs will be saved to `output/log.txt`.
name (str): the root module name of this logger
Returns:
logging.Logger: a logger
"""
logger = logging.getLogger(name)
if name in logger_initialized:
return logger
logger.setLevel(logging.INFO)
logger.propagate = False
formatter = logging.Formatter(
"[%(asctime)s] %(name)s %(levelname)s: %(message)s",
datefmt="%m/%d %H:%M:%S")
# stdout logging: master only
local_rank = dist.get_rank()
if local_rank == 0:
ch = logging.StreamHandler(stream=sys.stdout)
ch.setLevel(logging.DEBUG)
ch.setFormatter(formatter)
logger.addHandler(ch)
# file logging: all workers
if output is not None:
if output.endswith(".txt") or output.endswith(".log"):
filename = output
else:
filename = os.path.join(output, "log.txt")
if local_rank > 0:
filename = filename + ".rank{}".format(local_rank)
        os.makedirs(os.path.dirname(filename), exist_ok=True)
fh = logging.FileHandler(filename, mode='a')
fh.setLevel(logging.DEBUG)
fh.setFormatter(logging.Formatter())
logger.addHandler(fh)
logger_initialized.append(name)
return logger
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import collections
import numpy as np
__all__ = ['SmoothedValue', 'TrainingStats']
class SmoothedValue(object):
"""Track a series of values and provide access to smoothed values over a
window or the global series average.
"""
def __init__(self, window_size=20, fmt=None):
if fmt is None:
fmt = "{median:.4f} ({avg:.4f})"
self.deque = collections.deque(maxlen=window_size)
self.fmt = fmt
self.total = 0.
self.count = 0
def update(self, value, n=1):
self.deque.append(value)
self.count += n
self.total += value * n
@property
def median(self):
return np.median(self.deque)
@property
def avg(self):
return np.mean(self.deque)
@property
def max(self):
return np.max(self.deque)
@property
def value(self):
return self.deque[-1]
@property
def global_avg(self):
return self.total / self.count
def __str__(self):
return self.fmt.format(
median=self.median, avg=self.avg, max=self.max, value=self.value)
class TrainingStats(object):
def __init__(self, window_size, delimiter=' '):
self.meters = None
self.window_size = window_size
self.delimiter = delimiter
def update(self, stats):
if self.meters is None:
self.meters = {
k: SmoothedValue(self.window_size)
for k in stats.keys()
}
for k, v in self.meters.items():
v.update(stats[k].numpy())
def get(self, extras=None):
stats = collections.OrderedDict()
if extras:
for k, v in extras.items():
stats[k] = v
for k, v in self.meters.items():
stats[k] = format(v.median, '.6f')
return stats
def log(self, extras=None):
d = self.get(extras)
strs = []
for k, v in d.items():
strs.append("{}: {}".format(k, str(v)))
return self.delimiter.join(strs)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
from PIL import Image, ImageDraw
import cv2
import math
from .logger import setup_logger
logger = setup_logger(__name__)
__all__ = ['colormap', 'visualize_results']
def colormap(rgb=False):
"""
Get colormap
The code of this function is copied from https://github.com/facebookresearch/Detectron/blob/main/detectron/utils/colormap.py
"""
color_list = np.array([
0.000, 0.447, 0.741, 0.850, 0.325, 0.098, 0.929, 0.694, 0.125, 0.494,
0.184, 0.556, 0.466, 0.674, 0.188, 0.301, 0.745, 0.933, 0.635, 0.078,
0.184, 0.300, 0.300, 0.300, 0.600, 0.600, 0.600, 1.000, 0.000, 0.000,
1.000, 0.500, 0.000, 0.749, 0.749, 0.000, 0.000, 1.000, 0.000, 0.000,
0.000, 1.000, 0.667, 0.000, 1.000, 0.333, 0.333, 0.000, 0.333, 0.667,
0.000, 0.333, 1.000, 0.000, 0.667, 0.333, 0.000, 0.667, 0.667, 0.000,
0.667, 1.000, 0.000, 1.000, 0.333, 0.000, 1.000, 0.667, 0.000, 1.000,
1.000, 0.000, 0.000, 0.333, 0.500, 0.000, 0.667, 0.500, 0.000, 1.000,
0.500, 0.333, 0.000, 0.500, 0.333, 0.333, 0.500, 0.333, 0.667, 0.500,
0.333, 1.000, 0.500, 0.667, 0.000, 0.500, 0.667, 0.333, 0.500, 0.667,
0.667, 0.500, 0.667, 1.000, 0.500, 1.000, 0.000, 0.500, 1.000, 0.333,
0.500, 1.000, 0.667, 0.500, 1.000, 1.000, 0.500, 0.000, 0.333, 1.000,
0.000, 0.667, 1.000, 0.000, 1.000, 1.000, 0.333, 0.000, 1.000, 0.333,
0.333, 1.000, 0.333, 0.667, 1.000, 0.333, 1.000, 1.000, 0.667, 0.000,
1.000, 0.667, 0.333, 1.000, 0.667, 0.667, 1.000, 0.667, 1.000, 1.000,
1.000, 0.000, 1.000, 1.000, 0.333, 1.000, 1.000, 0.667, 1.000, 0.167,
0.000, 0.000, 0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000,
0.000, 0.833, 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.167, 0.000,
0.000, 0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000, 0.000,
0.833, 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.167, 0.000, 0.000,
0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000, 0.000, 0.833,
0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.143, 0.143, 0.143, 0.286,
0.286, 0.286, 0.429, 0.429, 0.429, 0.571, 0.571, 0.571, 0.714, 0.714,
0.714, 0.857, 0.857, 0.857, 1.000, 1.000, 1.000
]).astype(np.float32)
color_list = color_list.reshape((-1, 3)) * 255
if not rgb:
color_list = color_list[:, ::-1]
return color_list
def visualize_results(image,
bbox_res,
keypoint_res,
im_id,
catid2name,
threshold=0.5):
"""
Visualize bbox and mask results
"""
if bbox_res is not None:
image = draw_bbox(image, im_id, catid2name, bbox_res, threshold)
if keypoint_res is not None:
image = draw_pose(image, keypoint_res, threshold)
return image
def draw_bbox(image, im_id, catid2name, bboxes, threshold):
"""
Draw bbox on image
"""
draw = ImageDraw.Draw(image)
catid2color = {}
color_list = colormap(rgb=True)[:40]
for dt in np.array(bboxes):
if im_id != dt['image_id']:
continue
catid, bbox, score = dt['category_id'], dt['bbox'], dt['score']
if score < threshold:
continue
if catid not in catid2color:
idx = np.random.randint(len(color_list))
catid2color[catid] = color_list[idx]
color = tuple(catid2color[catid])
# draw bbox
if len(bbox) == 4:
# draw bbox
xmin, ymin, w, h = bbox
xmax = xmin + w
ymax = ymin + h
draw.line(
[(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
(xmin, ymin)],
width=2,
fill=color)
elif len(bbox) == 8:
x1, y1, x2, y2, x3, y3, x4, y4 = bbox
draw.line(
[(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x1, y1)],
width=2,
fill=color)
xmin = min(x1, x2, x3, x4)
ymin = min(y1, y2, y3, y4)
        else:
            logger.error('the shape of bbox must be [M, 4] or [M, 8]!')
            continue
# draw label
text = "{} {:.2f}".format(catid2name[catid], score)
tw, th = draw.textsize(text)
draw.rectangle(
[(xmin + 1, ymin - th), (xmin + tw + 1, ymin)], fill=color)
draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255))
return image
def save_result(save_path, results, catid2name, threshold):
"""
save result as txt
"""
img_id = int(results["im_id"])
with open(save_path, 'w') as f:
if "bbox_res" in results:
for dt in results["bbox_res"]:
catid, bbox, score = dt['category_id'], dt['bbox'], dt['score']
if score < threshold:
continue
# each bbox result as a line
# for rbox: classname score x1 y1 x2 y2 x3 y3 x4 y4
# for bbox: classname score x1 y1 w h
bbox_pred = '{} {} '.format(catid2name[catid],
score) + ' '.join(
[str(e) for e in bbox])
f.write(bbox_pred + '\n')
elif "keypoint_res" in results:
for dt in results["keypoint_res"]:
kpts = dt['keypoints']
scores = dt['score']
keypoint_pred = [img_id, scores, kpts]
print(keypoint_pred, file=f)
else:
print("No valid results found, skip txt save")
def draw_pose(image,
results,
visual_thread=0.6,
save_name='pose.jpg',
save_dir='output',
returnimg=False,
ids=None):
try:
import matplotlib.pyplot as plt
import matplotlib
plt.switch_backend('agg')
except Exception as e:
        logger.error('Matplotlib not found, please install matplotlib, '
                     'for example: `pip install matplotlib`.')
raise e
skeletons = np.array([item['keypoints'] for item in results])
kpt_nums = 17
if len(skeletons) > 0:
kpt_nums = int(skeletons.shape[1] / 3)
skeletons = skeletons.reshape(-1, kpt_nums, 3)
if kpt_nums == 17: #plot coco keypoint
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7),
(6, 8), (7, 9), (8, 10), (5, 11), (6, 12), (11, 13), (12, 14),
(13, 15), (14, 16), (11, 12)]
else: #plot mpii keypoint
EDGES = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 6), (3, 6), (6, 7),
(7, 8), (8, 9), (10, 11), (11, 12), (13, 14), (14, 15),
(8, 12), (8, 13)]
NUM_EDGES = len(EDGES)
colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0], \
[0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], \
[170, 0, 255], [255, 0, 255], [255, 0, 170], [255, 0, 85]]
cmap = matplotlib.cm.get_cmap('hsv')
plt.figure()
img = np.array(image).astype('float32')
color_set = results['colors'] if 'colors' in results else None
if 'bbox' in results and ids is None:
bboxs = results['bbox']
for j, rect in enumerate(bboxs):
xmin, ymin, xmax, ymax = rect
color = colors[0] if color_set is None else colors[color_set[j] %
len(colors)]
cv2.rectangle(img, (xmin, ymin), (xmax, ymax), color, 1)
canvas = img.copy()
for i in range(kpt_nums):
for j in range(len(skeletons)):
if skeletons[j][i, 2] < visual_thread:
continue
if ids is None:
color = colors[i] if color_set is None else colors[color_set[j]
%
len(colors)]
else:
color = get_color(ids[j])
cv2.circle(
canvas,
tuple(skeletons[j][i, 0:2].astype('int32')),
2,
color,
thickness=-1)
to_plot = cv2.addWeighted(img, 0.3, canvas, 0.7, 0)
fig = matplotlib.pyplot.gcf()
stickwidth = 2
for i in range(NUM_EDGES):
for j in range(len(skeletons)):
edge = EDGES[i]
if skeletons[j][edge[0], 2] < visual_thread or skeletons[j][edge[
1], 2] < visual_thread:
continue
cur_canvas = canvas.copy()
X = [skeletons[j][edge[0], 1], skeletons[j][edge[1], 1]]
Y = [skeletons[j][edge[0], 0], skeletons[j][edge[1], 0]]
mX = np.mean(X)
mY = np.mean(Y)
length = ((X[0] - X[1])**2 + (Y[0] - Y[1])**2)**0.5
angle = math.degrees(math.atan2(X[0] - X[1], Y[0] - Y[1]))
polygon = cv2.ellipse2Poly((int(mY), int(mX)),
(int(length / 2), stickwidth),
int(angle), 0, 360, 1)
if ids is None:
color = colors[i] if color_set is None else colors[color_set[j]
%
len(colors)]
else:
color = get_color(ids[j])
cv2.fillConvexPoly(cur_canvas, polygon, color)
canvas = cv2.addWeighted(canvas, 0.4, cur_canvas, 0.6, 0)
image = Image.fromarray(canvas.astype('uint8'))
plt.close()
return image
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
import importlib
import os
import sys
import yaml
import collections
try:
collectionsAbc = collections.abc
except AttributeError:
collectionsAbc = collections
from .config.schema import SchemaDict, SharedConfig, extract_schema
from .config.yaml_helpers import serializable
__all__ = [
'global_config',
'load_config',
'merge_config',
'get_registered_modules',
'create',
'register',
'serializable',
'dump_value',
]
def dump_value(value):
# XXX this is hackish, but collections.abc is not available in python 2
if hasattr(value, '__dict__') or isinstance(value, (dict, tuple, list)):
value = yaml.dump(value, default_flow_style=True)
value = value.replace('\n', '')
value = value.replace('...', '')
return "'{}'".format(value)
else:
# primitive types
return str(value)
class AttrDict(dict):
"""Single level attribute dict, NOT recursive"""
def __init__(self, **kwargs):
super(AttrDict, self).__init__()
super(AttrDict, self).update(kwargs)
def __getattr__(self, key):
if key in self:
return self[key]
raise AttributeError("object has no attribute '{}'".format(key))
global_config = AttrDict()
def load_config(file_path):
"""
Load config from file.
Args:
file_path (str): Path of the config file to be loaded.
Returns: global config
"""
_, ext = os.path.splitext(file_path)
assert ext in ['.yml', '.yaml'], "only support yaml files for now"
# load config from file and merge into global config
with open(file_path) as f:
cfg = yaml.load(f, Loader=yaml.Loader)
cfg['filename'] = os.path.splitext(os.path.split(file_path)[-1])[0]
merge_config(cfg)
return global_config
def dict_merge(dct, merge_dct):
""" Recursive dict merge. Inspired by :meth:``dict.update()``, instead of
updating only top-level keys, dict_merge recurses down into dicts nested
to an arbitrary depth, updating keys. The ``merge_dct`` is merged into
``dct``.
Args:
dct: dict onto which the merge is executed
merge_dct: dct merged into dct
Returns: dct
"""
for k, v in merge_dct.items():
if (k in dct and isinstance(dct[k], dict) and
isinstance(merge_dct[k], collectionsAbc.Mapping)):
dict_merge(dct[k], merge_dct[k])
else:
dct[k] = merge_dct[k]
return dct
def merge_config(config, another_cfg=None):
"""
Merge config into global config or another_cfg.
Args:
config (dict): Config to be merged.
Returns: global config
"""
global global_config
dct = another_cfg or global_config
return dict_merge(dct, config)
def get_registered_modules():
return {
k: v
for k, v in global_config.items() if isinstance(v, SchemaDict)
}
def make_partial(cls):
op_module = importlib.import_module(cls.__op__.__module__)
op = getattr(op_module, cls.__op__.__name__)
cls.__category__ = getattr(cls, '__category__', None) or 'op'
def partial_apply(self, *args, **kwargs):
kwargs_ = self.__dict__.copy()
kwargs_.update(kwargs)
return op(*args, **kwargs_)
if getattr(cls, '__append_doc__', True): # XXX should default to True?
if sys.version_info[0] > 2:
cls.__doc__ = "Wrapper for `{}` OP".format(op.__name__)
cls.__init__.__doc__ = op.__doc__
cls.__call__ = partial_apply
cls.__call__.__doc__ = op.__doc__
else:
# XXX work around for python 2
partial_apply.__doc__ = op.__doc__
cls.__call__ = partial_apply
return cls
def register(cls):
"""
Register a given module class.
Args:
cls (type): Module class to be registered.
Returns: cls
"""
if cls.__name__ in global_config:
raise ValueError("Module class already registered: {}".format(
cls.__name__))
if hasattr(cls, '__op__'):
cls = make_partial(cls)
global_config[cls.__name__] = extract_schema(cls)
return cls
def create(cls_or_name, **kwargs):
"""
Create an instance of given module class.
Args:
cls_or_name (type or str): Class of which to create instance.
Returns: instance of type `cls_or_name`
"""
assert type(cls_or_name) in [type, str
], "should be a class or name of a class"
name = type(cls_or_name) == str and cls_or_name or cls_or_name.__name__
assert name in global_config and \
isinstance(global_config[name], SchemaDict), \
"the module {} is not registered".format(name)
config = global_config[name]
cls = getattr(config.pymodule, name)
cls_kwargs = {}
cls_kwargs.update(global_config[name])
    # parse `shared` annotation of registered modules
if getattr(config, 'shared', None):
for k in config.shared:
target_key = config[k]
shared_conf = config.schema[k].default
assert isinstance(shared_conf, SharedConfig)
if target_key is not None and not isinstance(target_key,
SharedConfig):
continue # value is given for the module
elif shared_conf.key in global_config:
# `key` is present in config
cls_kwargs[k] = global_config[shared_conf.key]
else:
cls_kwargs[k] = shared_conf.default_value
    # parse `inject` annotation of registered modules
if getattr(cls, 'from_config', None):
cls_kwargs.update(cls.from_config(config, **kwargs))
if getattr(config, 'inject', None):
for k in config.inject:
target_key = config[k]
# optional dependency
if target_key is None:
continue
if isinstance(target_key, dict) or hasattr(target_key, '__dict__'):
if 'name' not in target_key.keys():
continue
inject_name = str(target_key['name'])
if inject_name not in global_config:
                    raise ValueError(
                        "Missing injection name {}; check its name in the cfg file".
                        format(k))
target = global_config[inject_name]
for i, v in target_key.items():
if i == 'name':
continue
target[i] = v
if isinstance(target, SchemaDict):
cls_kwargs[k] = create(inject_name)
elif isinstance(target_key, str):
if target_key not in global_config:
raise ValueError("Missing injection config:", target_key)
target = global_config[target_key]
if isinstance(target, SchemaDict):
cls_kwargs[k] = create(target_key)
elif hasattr(target, '__dict__'): # serialized object
cls_kwargs[k] = target
else:
raise ValueError("Unsupported injection type:", target_key)
# prevent modification of global config values of reference types
# (e.g., list, dict) from within the created module instances
#kwargs = copy.deepcopy(kwargs)
return cls(**cls_kwargs)
tqdm
typeguard ; python_version >= '3.4'
visualdl>=2.1.0 ; python_version <= '3.7'
opencv-python
PyYAML
shapely
scipy
terminaltables
Cython
pycocotools
#xtcocotools==1.6 #only for crowdpose
setuptools>=42.0.0
lap
scikit-learn
motmetrics
openpyxl
cython_bbox
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
# add python path of PaddleDetection to sys.path
parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
sys.path.insert(0, parent_path)
# ignore warning log
import warnings
warnings.filterwarnings('ignore')
import paddle
from lib.utils.workspace import load_config, merge_config
from lib.utils.check import check_gpu, check_version, check_config
from lib.slim import build_slim_model
from lib.utils.cli import ArgsParser
from lib.core.trainer import Trainer
from lib.utils.env import init_parallel_env
from lib.metrics.coco_utils import json_eval_results
from lib.utils.logger import setup_logger
logger = setup_logger('eval')
def parse_args():
parser = ArgsParser()
parser.add_argument(
"--output_eval",
default=None,
type=str,
help="Evaluation directory, default is current directory.")
args = parser.parse_args()
return args
def run(FLAGS, cfg):
# init parallel environment if nranks > 1
init_parallel_env()
# build trainer
trainer = Trainer(cfg, mode='eval')
# load weights
trainer.load_weights(cfg.weights)
    # evaluation
trainer.evaluate()
def main():
FLAGS = parse_args()
cfg = load_config(FLAGS.config)
cfg['output_eval'] = FLAGS.output_eval
merge_config(FLAGS.opt)
if cfg.use_gpu:
paddle.set_device('gpu')
else:
paddle.set_device('cpu')
if 'slim' in cfg:
cfg = build_slim_model(cfg, mode='eval')
check_config(cfg)
check_gpu(cfg.use_gpu)
check_version()
run(FLAGS, cfg)
if __name__ == '__main__':
main()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
# add python path of PaddleDetection to sys.path
parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
sys.path.insert(0, parent_path)
# ignore warning log
import warnings
warnings.filterwarnings('ignore')
import glob
import paddle
from lib.utils.workspace import load_config, merge_config
from lib.slim import build_slim_model
from lib.core.trainer import Trainer
from lib.utils.check import check_gpu, check_version, check_config
from lib.utils.cli import ArgsParser
from lib.utils.logger import setup_logger
logger = setup_logger('train')
def parse_args():
parser = ArgsParser()
parser.add_argument(
"--infer_dir",
type=str,
default=None,
help="Directory for images to perform inference on.")
parser.add_argument(
"--infer_img",
type=str,
default=None,
help="Image path, has higher priority over --infer_dir")
parser.add_argument(
"--output_dir",
type=str,
default="output",
help="Directory for storing the output visualization files.")
parser.add_argument(
"--draw_threshold",
type=float,
default=0.5,
help="Threshold to reserve the result for visualization.")
parser.add_argument(
"--use_vdl",
type=bool,
default=False,
help="Whether to record the data to VisualDL.")
parser.add_argument(
'--vdl_log_dir',
type=str,
default="vdl_log_dir/image",
help='VisualDL logging directory for image.')
parser.add_argument(
"--save_txt",
type=bool,
default=False,
help="Whether to save inference result in txt.")
args = parser.parse_args()
return args
def get_test_images(infer_dir, infer_img):
"""
Get image path list in TEST mode
"""
assert infer_img is not None or infer_dir is not None, \
"--infer_img or --infer_dir should be set"
assert infer_img is None or os.path.isfile(infer_img), \
"{} is not a file".format(infer_img)
assert infer_dir is None or os.path.isdir(infer_dir), \
"{} is not a directory".format(infer_dir)
# infer_img has a higher priority
if infer_img and os.path.isfile(infer_img):
return [infer_img]
images = set()
infer_dir = os.path.abspath(infer_dir)
assert os.path.isdir(infer_dir), \
"infer_dir {} is not a directory".format(infer_dir)
exts = ['jpg', 'jpeg', 'png', 'bmp']
exts += [ext.upper() for ext in exts]
for ext in exts:
images.update(glob.glob('{}/*.{}'.format(infer_dir, ext)))
images = list(images)
assert len(images) > 0, "no image found in {}".format(infer_dir)
logger.info("Found {} inference images in total.".format(len(images)))
return images
def run(FLAGS, cfg):
# build trainer
trainer = Trainer(cfg, mode='test')
# load weights
trainer.load_weights(cfg.weights)
# get inference images
images = get_test_images(FLAGS.infer_dir, FLAGS.infer_img)
# inference
trainer.predict(
images,
draw_threshold=FLAGS.draw_threshold,
output_dir=FLAGS.output_dir,
save_txt=FLAGS.save_txt)
def main():
FLAGS = parse_args()
cfg = load_config(FLAGS.config)
cfg['use_vdl'] = FLAGS.use_vdl
cfg['vdl_log_dir'] = FLAGS.vdl_log_dir
merge_config(FLAGS.opt)
if cfg.use_gpu:
paddle.set_device('gpu')
else:
paddle.set_device('cpu')
if 'slim' in cfg:
cfg = build_slim_model(cfg, mode='eval')
check_config(cfg)
check_gpu(cfg.use_gpu)
check_version()
run(FLAGS, cfg)
if __name__ == '__main__':
main()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
# add python path of PaddleDetection to sys.path
parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
sys.path.insert(0, parent_path)
# ignore warning log
import warnings
warnings.filterwarnings('ignore')
import copy
import paddle
from lib.utils.workspace import load_config
from lib.utils.env import init_parallel_env, set_random_seed, init_fleet_env
from lib.utils.logger import setup_logger
from lib.utils.checkpoint import load_pretrain_weight
from lib.utils.cli import ArgsParser
from lib.utils.check import check_config, check_gpu, check_version
from lib.core.trainer import Trainer
from lib.utils.workspace import create
from lib.models.loss import DistMSELoss
from lib.slim import build_slim_model
logger = setup_logger('train')
def build_teacher_model(config):
model = create(config.architecture)
if config.get('pretrain_weights', None):
load_pretrain_weight(model, config.pretrain_weights)
logger.debug("Load weights {} to start training".format(
config.pretrain_weights))
if config.get('weights', None):
load_pretrain_weight(model, config.weights)
logger.debug("Load weights {} to start training".format(
config.weights))
if config.get("freeze_parameters", True):
for param in model.parameters():
param.trainable = False
model.train()
return model
def build_distill_loss(config):
loss_config = copy.deepcopy(config["distill_loss"])
name = loss_config.pop("name")
dist_loss_class = eval(name)(**loss_config)
return dist_loss_class
def parse_args():
parser = ArgsParser()
parser.add_argument(
"--eval",
action='store_true',
default=False,
help="Whether to perform evaluation in train")
parser.add_argument(
"--distill_config",
default=None,
type=str,
help="Configuration file of model distillation.")
parser.add_argument(
"--enable_ce",
type=bool,
default=False,
help="If set True, enable continuous evaluation job."
"This flag is only used for internal test.")
parser.add_argument(
"--use_vdl",
type=bool,
default=False,
help="whether to record the data to VisualDL.")
parser.add_argument(
'--vdl_log_dir',
type=str,
default="vdl_log_dir/scalar",
help='VisualDL logging directory for scalar.')
args = parser.parse_args()
return args
def run(FLAGS, cfg):
init_parallel_env()
if FLAGS.enable_ce:
set_random_seed(0)
# build trainer
trainer = Trainer(cfg, mode='train')
# load weights
if 'pretrain_weights' in cfg and cfg.pretrain_weights:
trainer.load_weights(cfg.pretrain_weights)
# init config
if FLAGS.distill_config is not None:
distill_config = load_config(FLAGS.distill_config)
trainer.distill_model = build_teacher_model(distill_config)
trainer.distill_loss = build_distill_loss(distill_config)
trainer.init_optimizer()
# training
trainer.train(FLAGS.eval)
def main():
FLAGS = parse_args()
cfg = load_config(FLAGS.config)
cfg['eval'] = FLAGS.eval
cfg['enable_ce'] = FLAGS.enable_ce
cfg['distill_config'] = FLAGS.distill_config
cfg['use_vdl'] = FLAGS.use_vdl
cfg['vdl_log_dir'] = FLAGS.vdl_log_dir
if cfg.use_gpu:
paddle.set_device('gpu')
else:
paddle.set_device('cpu')
if 'slim' in cfg:
cfg = build_slim_model(cfg)
check_config(cfg)
check_gpu(cfg.use_gpu)
check_version()
run(FLAGS, cfg)
if __name__ == "__main__":
main()
# PaddlePaddle Training and Inference Pipeline Criterion (TIPC) Development Documentation

## 1. Introduction to TIPC

Beyond basic model training and prediction, PaddlePaddle also provides high-performance inference and deployment tools for multiple devices and platforms. The Training and Inference Pipeline Criterion (TIPC) aims to build a bridge from academic research to industrial deployment, making models easier to use in a wider range of scenarios.

<div align="center">
<img src="images/tipc_guide.png" width="800">
</div>

## 2. Development Documentation for Different Environments and Training/Inference Modes

- [Linux GPU/CPU basic training and inference development documentation](./train_infer_python/README.md)
- More training modes
  - [Linux GPU multi-node multi-GPU training and inference development documentation](./train_fleet_infer_python/README.md)
  - [Linux GPU mixed-precision training and inference development documentation](./train_amp_infer_python/README.md)
- More deployment modes
  - [Linux GPU/CPU Python serving deployment development documentation](./serving_python/README.md)
  - [Linux GPU/CPU C++ serving deployment development documentation](./serving_cpp/README.md)
  - [Linux GPU/CPU C++ inference development documentation](./infer_cpp/README.md)
  - [Paddle.js deployment development documentation](./paddlejs/README.md)
  - [Paddle2ONNX development documentation](./paddle2onnx/README.md)
  - ARM CPU deployment development documentation (coming soon)
  - OpenCL ARM GPU deployment development documentation (coming soon)
  - Metal ARM GPU deployment development documentation (coming soon)
  - Jetson deployment development documentation (coming soon)
  - XPU deployment development documentation (coming soon)
- More training environments
  - Linux XPU2 basic training and inference development documentation (coming soon)
  - Linux DCU basic training and inference development documentation (coming soon)
  - Linux NPU basic training and inference development documentation (coming soon)
  - Windows GPU basic training and inference development documentation (coming soon)
  - macOS CPU basic training and inference development documentation (coming soon)

# Industrial SOTA Model Optimization Guide

## 1. Background

In deep-learning-based vision tasks, the SOTA models in many papers require heavy computation and have long inference times. In real deployment, the model should be as small and as fast as possible, so it has to be made lightweight: the goal is to speed up inference significantly while losing as little accuracy as possible, and finally to build an industrial-grade SOTA model.

Model optimization covers both speed and accuracy; it includes general schemes that apply across tasks as well as task-specific ones. The overall structure is shown below.

<div align="center">
<img src="images/lite_model_framework.png" width = "1000" />
</div>

## 2. General Model Lightweighting Methods

General model lightweighting consists of three parts: a lightweight backbone, knowledge distillation, and model quantization. Together they turn the reproduced model into a lightweight one.

<div align="center">
<img src="images/general_lite_model_pipeline.png" width = "300" />
</div>

For more details, see: [General Model Lightweighting Guide](general_lite_model_optimization.md)

## 3. Task-Specific Model Optimization

CV tasks cover a wide range of applications. The following introduces more optimization strategies by direction; they can be consulted when tuning the accuracy of a lightweight model.

### 3.1 Image Classification

* The paper [Bag of Tricks for Image Classification with Convolutional Neural Networks](https://arxiv.org/abs/1812.01187) describes improvements to the ResNet50 series, including an improved `bottleneck structure`, a `cosine learning-rate schedule`, and `Mixup data augmentation`, raising the accuracy of the ResNet series on the ImageNet1k dataset from 76.5% to 79.2%. For implementation details, see [ResNet50_vd.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml).

### 3.2 Object Detection

* PP-YOLOv2 optimizes the data, the model structure, and the training strategy, and its accuracy finally surpasses SOTA models such as YOLOv5. Specifically, Mixup and AutoAugment are introduced for data augmentation; the SPP and PAN feature-fusion modules and the Mish activation function are introduced into the model structure; the IoU aware loss is added to the loss function; Matrix NMS is introduced in post-processing; and during training, EMA keeps an exponential moving average of the model weights to improve convergence. For more PP-YOLOv2 optimization tricks, see: [Detection model optimization: PP-YOLOv2](./det_ppyolov2_optimization.md)

### 3.3 Image Segmentation

### 3.4 Video Recognition

### 3.5 Text Detection and Recognition

### 3.6 Generative Adversarial Networks
# PP-YOLOv2 Optimization Tricks

Object detection is one of the most widely used tasks in computer vision. Besides detecting retail goods, pedestrians, and vehicles, it has many applications in industrial production, such as quality inspection, equipment inspection, and factory safety monitoring. During the pandemic, detection was also used for face-mask detection and COVID-19 screening. At the same time, the overall detection pipeline is fairly complex and needs task-specific adjustment, so continuously iterating on detection models has great practical value.

In 2021 Baidu PaddlePaddle released the industry-leading detection model [PP-YOLOv2](https://arxiv.org/abs/2104.10419), an optimized version of [YOLOv3](https://arxiv.org/abs/1804.02767) that improves accuracy while introducing as little extra computation as possible. PP-YOLOv2 (R50) reaches 49.5% mAP on the COCO 2017 dataset and runs at 68.9 FPS with a 640x640 input, reaching 106.5 FPS with TensorRT acceleration. PP-YOLOv2 (R101) reaches 50.3% mAP; compared with the best YOLOv5 model at the time, it improves accuracy by 1.3% at the same inference speed and speeds up inference by 15.9% at the same accuracy. This chapter focuses on optimization tricks for object detection and walks through the optimization journey of the PP-YOLOv2 model.
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/232ff7ff4918482399c5f9276dc18bb4151918c4552143ca976e46cf10935a9b" width='800'/>
</div>
## 1. Optimization Approach for Object Detection

Algorithm optimization can be divided into three parts. First, analyze the target scenario in detail and fully understand the model requirements, such as model size and accuracy/speed targets. An effective up-front analysis helps set a clear, reasonable optimization goal and guides efficient research and iteration afterwards, avoiding the situation where many optimization methods are tried but the final requirements still cannot be met. Concretely, research and experiments can be applied to three modules: the data module, the model structure, and the training strategy. This methodology applies to deep learning model optimization in general; the following takes object detection as an example and expands on each of the three parts.
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/25bfeaa9206e406aa0c406f0dcfabb85c20409b51f08401488f4007612d71fc6" width='800'/>
</div>
### 1.1 Data module

The data module is arguably the most important part of deep learning. In industrial applications, data is usually collected in-house and is much smaller than open-source datasets, so high-quality annotation and continuous iteration on the data are powerful levers for model optimization. For data collection, a small amount of carefully annotated data beats a large amount of coarsely annotated or unlabeled data; it is also essential to define clear annotation standards and to cover as many scenarios as possible. When the budget allows, more data is always better. In academic research, one usually iterates on a fixed public dataset, and the data module is optimized mainly through augmentation, e.g. color jitter, flipping, random expansion, random cropping, and, in recent years, [MixUp](https://paddlepedia.readthedocs.io/en/latest/tutorials/computer_vision/image_augmentation/ImageAugment.html#mixup), [AutoAugment](https://paddlepedia.readthedocs.io/en/latest/tutorials/computer_vision/image_augmentation/ImageAugment.html#autoaugment), and Mosaic. Different augmentation methods can be combined to improve generalization. Note, however, that excessive augmentation can reduce the model's ability to learn and can make the data loading module slow enough to hurt training iteration efficiency; in addition, in object detection some augmentation methods change the coordinates of the ground-truth boxes, which must be adjusted accordingly.
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/d153ee4cc94b4a4f8f2ebf86b76d18ab3b60ec6855ae4c4da2db66da1c4a79c5" width='800'/>
<center><br>AutoAugment data augmentation<br></center>
</div>
### 1.2 Model structure

There is a set of general optimization schemes on the model-structure side, such as loss-function and feature-extraction improvements: losses like focal loss and IoU loss improve accuracy without affecting inference speed, and [SPP](https://arxiv.org/abs/1406.4729) strengthens multi-scale features at almost no extra inference cost.

Beyond that, targeted optimization should be built on a clear goal. Cloud and edge deployment call for different structural choices, as shown in the table below.

| Scenario | Characteristics | Suggested model structure |
|:---------:|:------------------:|:------------:|
| Cloud | ample compute; effect comes first | Prefer ResNet-series backbones, and introduce a small number of deformable convolutions, which add little computation while improving feature extraction. |
| Edge | lower compute and power budget than the cloud; limited memory | Prefer lightweight backbones such as the [MobileNet](https://arxiv.org/abs/1704.04861) series; replace the more expensive convolutions with [depthwise separable convolutions](https://paddlepedia.readthedocs.io/en/latest/tutorials/CNN/convolution_operator/Separable_Convolution.html?highlight=%E6%B7%B1%E5%BA%A6%E5%8F%AF%E5%88%86%E7%A6%BB%E5%8D%B7%E7%A7%AF#id4), and replace [transposed convolutions](https://paddlepedia.readthedocs.io/en/latest/tutorials/CNN/convolution_operator/Transpose_Convolution.html?highlight=%E5%8F%8D%E5%8D%B7%E7%A7%AF) with interpolation. |
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/86c783f67ecf439e9eab67f145bc7321b2828f68ef9f44efaff0e6be25a9985f" width='800'/>
</div>
In addition, some practical deployment difficulties require corresponding structural adjustments. For small-object detection, backbones with large receptive fields such as [HRNet](https://arxiv.org/pdf/1904.04514.pdf) and [DLA](https://arxiv.org/pdf/1707.06484.pdf) can be chosen, and ordinary convolutions can be replaced with [dilated convolutions](https://paddlepedia.readthedocs.io/en/latest/tutorials/CNN/convolution_operator/Dilated_Convolution.html?highlight=%E7%A9%BA%E6%B4%9E%E5%8D%B7%E7%A7%AF). For long-tailed data, the loss function can down-weight the frequent classes and up-weight the rare ones.

### 1.3 Training strategy

As different optimization modules are introduced during iteration, training can become unstable, so the training strategy needs to be improved to stabilize training and improve convergence. None of these adjustments costs anything at inference time. Examples include tuning the optimizer and learning rate; synchronized batch normalization, which aggregates batch statistics across GPUs so the network sees more input information; and EMA (Exponential Moving Average), which updates a moving average of the parameters to reduce the impact of outliers. In practical scenarios with limited data, transfer learning from a model pre-trained on COCO can substantially improve accuracy. The strategies in this module also generalize to other computer-vision tasks.

The tricks above combine many algorithmic structures and optimization modules, which requires a lot of engineering, and the tricks must be composed with one another, a real challenge for modular code design. PaddleDetection, an end-to-end object-detection development kit from PaddlePaddle, is built for exactly this.

The following sections explain, with code, how to use PaddleDetection to optimize YOLOv3 step by step into the industry-SOTA PP-YOLOv2 model.

## 2. PP-YOLO Optimization and Code Practice

The YOLO family has long enjoyed high adoption and attention thanks to its excellent cost/performance trade-off, and research extending it keeps growing: [YOLOv4](https://arxiv.org/abs/2004.10934), YOLOv5, and Megvii's [YOLOX](https://arxiv.org/abs/2107.08430) all integrate state-of-the-art computer-vision tricks and substantially improve YOLO detection performance. With its self-developed detection framework PaddleDetection, Baidu PaddlePaddle carefully optimized YOLOv3 to improve accuracy with as little extra computation as possible, releasing the high-accuracy, low-latency PP-YOLOv2 model. The following covers data augmentation, the backbone, the neck & head structure, the loss function, post-processing, and the training strategy.

### 2.1 Data augmentation

PP-YOLOv2 uses a large number of data augmentation methods, explained one by one below.

#### 2.1.1 MixUp

[MixUp](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/data/transform/operators.py#L1574) linearly interpolates images and labels with a random weight. In detection, the label fields gt_bbox, gt_class, and is_crowd are simply concatenated, while gt_score is weighted and summed. MixUp improves the network's spatial robustness; the interpolation weight follows a Beta distribution:
$$
\widetilde x = \lambda x_i + (1 - \lambda)x_j,\\
\widetilde y = \lambda y_i + (1 - \lambda)y_j \\
\lambda\in[0,1]
$$
As illustrated below, two arbitrary images are blended with these weights and used as the input.
<div align=center>
<img src="https://raw.githubusercontent.com/mls1999725/pictures/master/Mixup.png" alt="Mixup" style="zoom: 80%;"/>
</div>
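To make the interpolation concrete, the following is a minimal NumPy sketch of image-level MixUp for detection, assuming uint8 HWC images and [N, 4] box arrays; `mixup_detection` is an illustrative name, not PaddleDetection's operator.

```python
import numpy as np

def mixup_detection(img1, boxes1, img2, boxes2, alpha=1.5, beta=1.5):
    # interpolation weight follows a Beta distribution
    lam = np.random.beta(alpha, beta)
    # paste both images onto a canvas large enough for either of them
    h = max(img1.shape[0], img2.shape[0])
    w = max(img1.shape[1], img2.shape[1])
    mixed = np.zeros((h, w, 3), dtype=np.float32)
    mixed[:img1.shape[0], :img1.shape[1]] += img1.astype(np.float32) * lam
    mixed[:img2.shape[0], :img2.shape[1]] += img2.astype(np.float32) * (1. - lam)
    # gt boxes are concatenated; gt_score is weighted by the same factor
    boxes = np.concatenate([boxes1, boxes2], axis=0)
    scores = np.concatenate([np.full(len(boxes1), lam, dtype=np.float32),
                             np.full(len(boxes2), 1. - lam, dtype=np.float32)])
    return mixed.astype(np.uint8), boxes, scores
```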
#### 2.1.2 RandomDistort

[RandomDistort](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/data/transform/operators.py#L329) randomly perturbs pixel content with a certain probability, covering hue, saturation, contrast, and brightness.

#### 2.1.3 RandomExpand

[RandomExpand](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/data/transform/operators.py#L875) works as follows (see the sketch after this list):

- Randomly pick an expansion ratio (expansion happens only when the ratio is greater than 1).
- Compute the size of the expanded image.
- Create an image filled with the configured fill value and paste the original image at a random location on it.
- Recompute the ground-truth box coordinates according to where the original image was pasted.
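A minimal NumPy sketch of the expansion step under the same assumptions (the real operator also handles per-channel fill values and an application probability); `random_expand` is an illustrative name.

```python
import numpy as np

def random_expand(img, boxes, max_ratio=4.0, fill=127):
    # expansion only makes sense for ratios greater than 1
    ratio = np.random.uniform(1.0, max_ratio)
    h, w, c = img.shape
    oh, ow = int(h * ratio), int(w * ratio)
    # canvas filled with the configured value; paste the image at a random offset
    off_x = np.random.randint(0, ow - w + 1)
    off_y = np.random.randint(0, oh - h + 1)
    canvas = np.full((oh, ow, c), fill, dtype=img.dtype)
    canvas[off_y:off_y + h, off_x:off_x + w, :] = img
    # shift the gt boxes ([N, 4] in x1, y1, x2, y2) by the paste offset
    boxes = boxes + np.array([off_x, off_y, off_x, off_y], dtype=boxes.dtype)
    return canvas, boxes
```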
#### 2.1.4 RandomCrop

[RandomCrop](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/data/transform/operators.py#L1182) works as follows:

- If allow_no_crop is True, append 'no_crop' to thresholds.
- Randomly shuffle thresholds.
- For each element thresh in thresholds: (1) if the current thresh is 'no_crop', return the original image and annotations; (2) randomly sample values from aspect_ratio and scaling and compute the height, width, and top-left corner of a candidate crop region; (3) compute the IoU between the ground-truth boxes and the candidate crop region, and if every ground-truth IoU is below thresh, go back to step (2); (4) if cover_all_box is True and any ground-truth IoU is below thresh, also go back to step (2); (5) keep the ground-truth boxes that lie inside the candidate crop region, and if no valid box remains, go back to step (2), otherwise accept the crop.
- Convert the coordinates of the valid ground-truth boxes to be relative to the crop region.

#### 2.1.5 RandomFlip

[RandomFlip](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/data/transform/operators.py#L487) uses a random value to decide whether to flip the image and the ground-truth boxes.

All of the augmentations above are configured in [ppyolov2_reader.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/ppyolov2_reader.yml#L5).
### 2.2 Backbone

Unlike YOLOv3's DarkNet53, PP-YOLOv2 uses the stronger ResNet50vd-DCN as its backbone, which can be viewed as two parts: ResNet50vd and DCN. ResNet50vd denotes the ResNet-D network with 50 convolutional layers. The ResNet structure is shown below:
<div align=center>
<img src="https://raw.githubusercontent.com/mls1999725/pictures/master/ResNet-A.png" alt="ResNet-A" style="zoom: 50%;"/>
</div>
Since ResNet was proposed in 2015, its structure has been continuously refined by the community; after the B, C, and D revisions, the latest ResNetvd structure improves accuracy significantly with essentially no extra computation. The first convolutional layer of ResNetvd consists of three 3x3 convolutions with strides 2, 1, 1, replacing the 7x7 convolution in the figure above and deepening the network with roughly the same number of parameters. On top of ResNet-B, ResNet-D adds a stride-2 2x2 average-pooling layer to the downsampling module and changes the following convolution's stride to 1, avoiding discarding input information. The evolution of the B, C, and D variants is illustrated below:
<div align=center>
<img src="https://raw.githubusercontent.com/mls1999725/pictures/master/resnet结构.png" alt="resnet结构" style="zoom: 50%;"/>
</div>
A reference implementation of the ResNetvd downsampling module: [code link](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/backbones/resnet.py#L265)

For how to use ResNetvd, see the [ResNetvd config](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml#L13):
```
ResNet:
  depth: 50             # ResNet depth
  variant: d            # ResNet variant; d means ResNetvd
  return_idx: [1, 2, 3] # stages whose feature maps are returned
  dcn_v2_stages: [3]    # stages where deformable convolution is used
  freeze_at: -1         # stages whose parameters are frozen
  freeze_norm: false    # whether to freeze the normalization layers
  norm_decay: 0.        # weight decay applied to the normalization layers
```
Repeated experiments show that using ResNet50vd as the backbone improves detection accuracy by 1%-2% over the original ResNet with essentially unchanged inference speed. DCN (Deformable Convolution) adds a learnable offset to every element of the convolution kernel, so the kernel can adjust its receptive field during learning, extract image features better, and thereby improve detection accuracy, at the cost of some extra computation. After many attempts, adding deformable convolutions only to the last stage of ResNet proved to be the best trade-off between the extra computation introduced and the accuracy gained.

The [implementation](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/layers.py#L41) of deformable convolution is as follows:
```python
# imports added so the snippet is self-contained
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle import ParamAttr
from paddle.nn.initializer import Constant
from paddle.regularizer import L2Decay
from paddle.vision.ops import DeformConv2D

class DeformableConvV2(nn.Layer):
    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,
                 stride=1,
                 padding=0,
                 dilation=1,
                 groups=1,
                 weight_attr=None,
                 bias_attr=None,
                 lr_scale=1,
                 regularizer=None,
                 skip_quant=False,
                 dcn_bias_regularizer=L2Decay(0.),
                 dcn_bias_lr_scale=2.):
        super().__init__()
        self.offset_channel = 2 * kernel_size**2
        self.mask_channel = kernel_size**2
        # bias of the offset/mask conv uses a scaled learning rate
        offset_bias_attr = ParamAttr(
            initializer=Constant(0.),
            learning_rate=lr_scale,
            regularizer=regularizer)
self.conv_offset = nn.Conv2D(
in_channels,
3 * kernel_size**2,
kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2,
weight_attr=ParamAttr(initializer=Constant(0.0)),
bias_attr=offset_bias_attr)
if bias_attr:
# in FCOS-DCN head, specifically need learning_rate and regularizer
dcn_bias_attr = ParamAttr(
initializer=Constant(value=0),
regularizer=dcn_bias_regularizer,
learning_rate=dcn_bias_lr_scale)
else:
# in ResNet backbone, do not need bias
dcn_bias_attr = False
self.conv_dcn = DeformConv2D(
in_channels,
out_channels,
kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 * dilation,
dilation=dilation,
groups=groups,
weight_attr=weight_attr,
bias_attr=dcn_bias_attr)
def forward(self, x):
offset_mask = self.conv_offset(x)
offset, mask = paddle.split(
offset_mask,
num_or_sections=[self.offset_channel, self.mask_channel],
axis=1)
mask = F.sigmoid(mask)
y = self.conv_dcn(x, offset, mask=mask)
return y
```
### 2.3 Neck & head structure

PP-YOLOv2 strengthens the neck with the PAN and SPP structures. [PAN (Path Aggregation Network)](https://arxiv.org/abs/1803.01534), a variant of [FPN](https://arxiv.org/abs/1612.03144), aggregates feature information along both a top-down and a bottom-up path for better feature extraction. The structure is shown below, where C3, C4, C5 are three feature levels corresponding to strides (8, 16, 32); the Detection Block uses a CSP connection, corresponding to ppdet's [PPYOLODetBlockCSP module](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/necks/yolo_fpn.py#L359).
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/eeae465462484a6a9797f779434ef721cb9882eb10374b34be43956360691521" width='600'/>
</div>
SPP, proposed in [Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition](https://arxiv.org/pdf/1406.4729.pdf), extracts pooled features at several scales through pooling windows of different sizes and concatenates them as the output feature, effectively enlarging the receptive field; it is a widely used feature-extraction improvement. PP-YOLOv2 uses three pooling windows of sizes (5, 9, 13); the pooled features are concatenated and followed by a convolution, see the [SPP module](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/necks/yolo_fpn.py#L114). SPP is inserted at the [middle of the first PAN block](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/necks/yolo_fpn.py#L903).
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/439a29465efc4867ac4edc70d17d3ac9aa124d719b364170b08422a685045745" width='600'/>
</div>
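For reference, here is a minimal Paddle sketch of the SPP idea described above: stride-1 max pools with window sizes (5, 9, 13), concatenation with the input, then a fusion convolution. It is a simplified stand-in for, not a copy of, the linked module.

```python
import paddle
import paddle.nn as nn

class SimpleSPP(nn.Layer):
    def __init__(self, channels, pool_sizes=(5, 9, 13)):
        super().__init__()
        # stride-1 pools with same padding keep the spatial size unchanged
        self.pools = nn.LayerList([
            nn.MaxPool2D(kernel_size=k, stride=1, padding=k // 2)
            for k in pool_sizes
        ])
        # 1x1 conv fuses the concatenated multi-scale features
        self.conv = nn.Conv2D(channels * (len(pool_sizes) + 1), channels, 1)

    def forward(self, x):
        feats = [x] + [pool(x) for pool in self.pools]
        return self.conv(paddle.concat(feats, axis=1))

# x = paddle.rand([1, 512, 20, 20]); SimpleSPP(512)(x).shape -> [1, 512, 20, 20]
```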
In addition, the PP-YOLOv2 neck introduces the [Mish](https://arxiv.org/pdf/1908.08681.pdf) activation function, defined as:
$$
mish(x) = x \ast tanh(ln(1+e^x))
$$
The [implementation](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/ops.py#L43) of Mish is:
```python
import paddle
import paddle.nn.functional as F

def mish(x):
    # softplus(x) = ln(1 + e^x), so this computes x * tanh(ln(1 + e^x))
    return x * paddle.tanh(F.softplus(x))
```
For how to use the PAN module in PP-YOLOv2, see [neck: PPYOLOPAN](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml#L9):
```
PPYOLOPAN:
  act: "mish"        # mish is used by default
  conv_block_num: 2  # number of conv blocks in each PAN block
  drop_block: true   # whether to use DropBlock (see the training-strategy section)
  block_size: 3      # DropBlock block size
  keep_prob: 0.9     # DropBlock keep probability
  spp: true          # whether to use SPP
```
The PP-YOLOv2 head predicts on the 3 feature scales output by PAN, using a structure similar to [YOLOv3](https://pjreddie.com/media/files/papers/YOLOv3.pdf): a convolution encodes the final feature, and the output is a 4-D tensor [n, c, h, w] for batch size, channels, height, and width. Here c has the form anchor_num * (4 + 1 + 1 + num_classes), where anchor_num is the number of anchors per position (3 in PP-YOLOv2), 4 stands for the bbox attributes (center and size), the first 1 for objectness, the second 1 for iou_aware (see the loss-function section), and num_classes is the number of classes (80 for COCO).
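As a plain arithmetic check of the channel layout described above (not repo code):

```python
anchor_num = 3     # anchors per position in PP-YOLOv2
num_classes = 80   # COCO
# per anchor: 4 box attributes + 1 objectness + 1 iou_aware channel
c = anchor_num * (4 + 1 + 1 + num_classes)
print(c)  # 258 output channels per prediction feature map
```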
For usage, see [yolo_head: YOLOv3Head](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml#L28):
```
YOLOv3Head:
  # there are 9 anchors, grouped by the anchor_masks indices into 3 scales:
  # [6, 7, 8] go with the stride-32 prediction feature
  # [3, 4, 5] go with the stride-16 prediction feature
  # [0, 1, 2] go with the stride-8 prediction feature
  anchors: [[10, 13], [16, 30], [33, 23],
            [30, 61], [62, 45], [59, 119],
            [116, 90], [156, 198], [373, 326]]
  anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
  loss: YOLOv3Loss        # loss type; see the loss-function section
  iou_aware: true         # whether to use iou_aware
  iou_aware_factor: 0.5   # iou_aware factor
```
### 2.4 Loss function

PP-YOLOv2 uses IoU Loss and IoU Aware Loss to improve localization accuracy. IoU Loss directly optimizes the IoU between the predicted and ground-truth boxes, improving box quality. IoU Aware Loss supervises the model to predict that IoU, and the learned IoU then participates in NMS as a localization confidence.

For detection, IoU is the usual evaluation metric: the larger the IoU between a predicted box and the ground truth, the closer the prediction and the higher its quality. Following the "what you see is what you get" idea, PP-YOLOv2 uses IoU Loss to directly optimize the IoU between the model's predicted boxes and the ground-truth boxes. The loss is:
$$
L_{iou}=1 - iou^2
$$
The [implementation](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/losses/iou_loss.py#L56) of IoU Loss is:
```python
iou = bbox_iou(
pbox, gbox, giou=self.giou, diou=self.diou, ciou=self.ciou)
if self.loss_square:
loss_iou = 1 - iou * iou
else:
loss_iou = 1 - iou
loss_iou = loss_iou * self.loss_weight
```
PP-YOLOv2 adds an extra channel that learns the IoU between the predicted box and the ground truth, supervised by IoU Aware Loss. At inference time, the IoU predicted by this channel is used as one of the scoring factors, which helps keep high-IoU predictions from being suppressed and thus improves accuracy. IoU Aware Loss is a binary cross-entropy loss:
$$
L_{iou\_aware} = -(iou * log(ioup) + (1 - iou) * log(1 - ioup))
$$
The [implementation](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/losses/iou_aware_loss.py#L41) of IoU Aware Loss is:
```python
iou = bbox_iou(
pbox, gbox, giou=self.giou, diou=self.diou, ciou=self.ciou)
iou.stop_gradient = True
loss_iou_aware = F.binary_cross_entropy_with_logits(
ioup, iou, reduction='none')
loss_iou_aware = loss_iou_aware * self.loss_weight
```
### 2.5 Post-processing optimization

For post-processing, PP-YOLOv2 uses Matrix NMS and Grid Sensitive. Matrix NMS is a parallelized way to compute [Soft NMS](https://paddlepedia.readthedocs.io/en/latest/tutorials/computer_vision/object_detection/SoftNMS.html?highlight=Soft%20NMS); Grid Sensitive handles box centers that fall exactly on grid lines.

Grid Sensitive is an optimization introduced by YOLOv4. As shown below, YOLO-series models use a sigmoid to predict the center offset relative to the top-left corner of a grid cell.
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/2c34ba7dc29b41feb09455c71ffe444f0d06c733b7384ec7bf8f56357fd6375c", width='400'/>
</div>
However, when the center lies on a grid line, the sigmoid struggles to predict it. A scale and shift are therefore applied to the prediction so the decoded box center can fit ground-truth boxes whose centers fall exactly on grid lines. Grid Sensitive is expressed as:
$$
x = scale * \sigma(x) - 0.5 * (scale - 1.) \\
y = scale * \sigma(y) - 0.5 * (scale - 1.)
$$
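A minimal NumPy sketch of decoding center offsets with Grid Sensitive; the function name and the scale value (PP-YOLO uses a value slightly above 1, e.g. 1.05) are illustrative here.

```python
import numpy as np

def decode_center(tx, ty, grid_x, grid_y, stride, scale=1.05):
    # sigmoid outputs are stretched by `scale` and re-centered, so the
    # decoded offset can actually reach 0 and 1 (centers on grid lines)
    sx = scale / (1. + np.exp(-tx)) - 0.5 * (scale - 1.)
    sy = scale / (1. + np.exp(-ty)) - 0.5 * (scale - 1.)
    return (grid_x + sx) * stride, (grid_y + sy) * stride
```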
Matrix NMS computes the IoU between every pair of boxes with a single matrix operation, implementing Soft NMS in parallel; it improves accuracy without slowing down inference. Matrix NMS is implemented in the PaddlePaddle framework as the [Matrix NMS OP](https://github.com/PaddlePaddle/Paddle/blob/release/2.1/paddle/fluid/operators/detection/matrix_nms_op.cc#L169) and wrapped in PaddleDetection as the [Matrix NMS API](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/layers.py#L426).

For usage, see [post process: MatrixNMS](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml#L59):
```
nms:
  name: MatrixNMS       # NMS type; MultiClass NMS and Matrix NMS are supported
  keep_top_k: 100       # maximum number of output boxes
  score_threshold: 0.01 # score threshold applied before NMS
  post_threshold: 0.01  # score threshold applied after NMS
  nms_top_k: -1         # maximum number of boxes kept after score filtering, before NMS
  background_label: -1  # background class id
```
### 2.6 Training strategy

During training, PP-YOLOv2 uses synchronized batch normalization, EMA (Exponential Moving Average), and DropBlock to improve convergence and generalization.

BN (Batch Normalization) is the standard normalization method for training convolutional networks; it speeds up convergence and mitigates vanishing gradients. BN needs the mean and variance of the samples, and the larger the batch size, the more accurate these statistics. In multi-GPU training, samples are split evenly across GPUs; with plain BN each GPU normalizes with statistics computed from its own shard, whereas SyncBN synchronizes the sample information across all GPUs to compute a single mean and variance that every GPU then uses. Replacing BN with SyncBN therefore yields more accurate statistics and better model performance. The SyncBN [implementation](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/backbones/resnet.py#L104) is:
```python
if norm_type == 'sync_bn':
self.norm = nn.SyncBatchNorm(
ch_out, weight_attr=param_attr, bias_attr=bias_attr)
else:
self.norm = nn.BatchNorm(
ch_out,
act=None,
param_attr=param_attr,
bias_attr=bias_attr,
use_global_stats=global_stats)
```
For usage, see [SyncBN](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml#L3):
```
norm_type: sync_bn
```
EMA replaces the parameters with their average over a recent period. Compared with using the raw updates directly, the moving average makes parameter learning smoother and effectively shields the parameters from outlier updates, improving convergence. EMA consists of an update step and a correction step: the update step maintains the exponential moving average $\theta$ of the model weights, and the correction step divides by $(1 - decay^t)$ to correct the bias toward the initial value.

The EMA update is (with $w_t$ denoting the model weights at step $t$):
$$
\theta_0 = 0 \\
\theta_t = decay * \theta_{t - 1} + (1 - decay) * w_t
$$
The EMA correction is:
$$
\tilde{\theta_t} = \frac{\theta_t}{1 - decay^t}
$$
The EMA [implementation](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/optimizer.py#L261) is:
```python
def update(self, model):
if self.use_thres_step:
decay = min(self.decay, (1 + self.step) / (10 + self.step))
else:
decay = self.decay
self._decay = decay
model_dict = model.state_dict()
for k, v in self.state_dict.items():
v = decay * v + (1 - decay) * model_dict[k]
v.stop_gradient = True
self.state_dict[k] = v
self.step += 1
def apply(self):
if self.step == 0:
return self.state_dict
state_dict = dict()
for k, v in self.state_dict.items():
v = v / (1 - self._decay**self.step)
v.stop_gradient = True
state_dict[k] = v
return state_dict
```
For usage, see [EMA](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml#L4):
```
use_ema: true
ema_decay: 0.9998
```
Similar to [Dropout](https://paddlepedia.readthedocs.io/en/latest/tutorials/deep_learning/model_tuning/regularization/dropout.html?highlight=Dropout), DropBlock is a regularization method against overfitting. Because neighboring points of a convolutional feature map carry closely related semantic information, dropping individual feature points is usually ineffective for detection. DropBlock therefore drops a contiguous region rather than single points, which suits object detection better and improves the network's generalization, as shown in (c) below.
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/cf257e09a8164b19bf0e6adc0eabbce0123917146c624e90a0e10f68ea38bb4b", width='600'/>
</div>
The DropBlock [implementation](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/necks/yolo_fpn.py#L196) is:
```python
# gamma为每个特征点被选为Drop种子点的概率,由keep_prob与block_size推导得到
gamma = (1. - self.keep_prob) / (self.block_size**2)
if self.data_format == 'NCHW':
    shape = x.shape[2:]
else:
    shape = x.shape[1:3]
for s in shape:
    gamma *= s / (s - self.block_size + 1)

# 按概率gamma随机采样种子点
matrix = paddle.cast(paddle.rand(x.shape, x.dtype) < gamma, x.dtype)
# 用max_pool2d将每个种子点扩展为block_size x block_size的连续区域
mask_inv = F.max_pool2d(
    matrix,
    self.block_size,
    stride=1,
    padding=self.block_size // 2,
    data_format=self.data_format)
mask = 1. - mask_inv
# Drop掉被选中的区域,并按保留元素的比例对输出做缩放,保持激活的期望不变
y = x * mask * (mask.numel() / mask.sum())
```
以上就是PP-YOLOv2模型优化的全部技巧。优化期间也实验过大量没有正向效果的方法,这些方法可能并不适用于YOLO系列的模型结构或训练策略,[PP-YOLOv2](https://arxiv.org/abs/2104.10419)论文中汇总了其中一部分,这里不再详细展开。下面分享PP-YOLOv2在实际应用中的使用技巧和模型调优经验。
## 3. 调参经验
### 3.1 配置合理的学习率
PaddleDetection提供的[学习率配置](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/optimizer_365e.yml)是使用8张GPU、每张卡batch_size为12时对应的学习率(base_lr=0.005)。如果在实际训练时使用了其他的GPU卡数或batch_size,需要相应调整学习率设置,否则可能会出现模型训练出nan的情况。调整方法为:学习率与总batch_size(即卡数乘以每张卡的batch_size)成正比,下表举例说明。
| GPU卡数 | 每张卡batch_size | 总batch_size | 对应学习率 |
| -------- | -------- | -------- | -------- |
| 8 | 12 | 96 | 0.005 |
| 1 | 12 | 12 | 0.000625 |
| 8 | 6 | 48 | 0.0025 |
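按照这一线性缩放规则,可以用如下的小函数完成学习率换算(`scaled_lr`为示意用的辅助函数,并非PaddleDetection的接口):
```python
def scaled_lr(gpus, bs_per_gpu, base_lr=0.005, base_total_bs=96):
    # 学习率与总batch_size(卡数 x 单卡batch_size)成正比
    return base_lr * (gpus * bs_per_gpu) / base_total_bs

print(scaled_lr(1, 12))  # 0.000625
print(scaled_lr(8, 6))   # 0.0025
```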
### 3.2 在资源允许的情况下增大batch_size
在多个目标检测任务的优化过程中发现,仅增大reader中的batch_size就有助于提升模型收敛效果。
### 3.3 调整gradient clip
在PP-YOLOv2中,[clip_grad_by_norm](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/optimizer_365e.yml#L15)被设置为35,以防止模型训练时出现梯度爆炸;对于自定义任务,如果出现了梯度爆炸,可以尝试调整梯度裁剪的阈值。
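在自定义训练代码中,可以通过飞桨优化器的`grad_clip`参数实现同样的梯度裁剪。下面是一个最小示例(这里以`ClipGradByGlobalNorm`为例,PaddleDetection内部实际使用的裁剪方式以其源码为准):
```python
import paddle

model = paddle.nn.Linear(10, 2)
# 按全局L2范数裁剪梯度,阈值与PP-YOLOv2配置中的35保持一致
clip = paddle.nn.ClipGradByGlobalNorm(clip_norm=35.0)
opt = paddle.optimizer.Momentum(
    learning_rate=0.005, parameters=model.parameters(), grad_clip=clip)
```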
# 模型轻量化指南
## 1. 简介
本文主要关注模型轻量化过程中的通用优化方案,需要完成下面三个部分的内容,将复现的模型打造为轻量化模型。
<div align="center">
<img src="images/general_lite_model_pipeline.png" width = "400" />
</div>
对于一个任务,满足下面两个条件之一,即可认为核验通过。
1. `模型大小``模型精度``模型速度`三个方面达到要求(该要求与任务相关,比如在某一目标检测任务中,要求模型大小(模型动转静导出的`pdiparams``pdmodel`文件大小之和)**20M**以内,模型精度与论文中大模型精度差距在**5%**以内,同时CPU上预测速度在**100ms**以内)。
2. 在满足模型大小与速度的情况下,截止日期之前提交的模型中,模型精度最高。
## 2. 具体内容
### 2.1 更换骨干网络
#### 2.1.1 简介
视觉任务中,模型骨干网络直接影响模型大小和预测速度。
大部分论文都是基于相对较大的骨干网络进行实验,如VGG、ResNet等(模型存储大小为100M量级),本部分希望通过更换骨干网络(模型存储大小为10M量级),让模型轻量化,方便实际部署过程。
#### 2.1.2 具体内容
使用`PP-LCNet_x2_5` (`~30M`)、`MobileNetV3_large_x1_0` (`~20M`)、`MobileNetV3_small_x1_0` (`~10M`),或者针对该任务设计的轻量级骨干网络(如关键点检测任务中的lite_HRNet),替代原始任务中的骨干网络,训练模型。
#### 2.1.3 操作步骤
PaddleClas提供了便于下游任务使用的骨干网络以及调用接口,支持网络截断、返回网络中间层输出和修改网络中间层的功能。只需要安装whl包,便可以在自己的任务中使用骨干网络。
如果只需要使用常用的feature map,安装paddleclas的whl包后,即可直接调用骨干网络。
首先需要安装paddleclas whl包。
```bash
pip install https://paddle-model-ecology.bj.bcebos.com/whl/paddleclas-0.0.0-py3-none-any.whl
```
如果希望提取中间层特征进行训练,使用方法如下。
```python
import paddle
import paddleclas
# PPLCNet_x2_5
model = paddleclas.PPLCNet_x2_5(pretrained=True, return_stages=True)
# MobileNetV3_large
# model = paddleclas.MobileNetV3_large_x1_0(pretrained=True, return_stages=True)
# MobileNetV3_small
# model = paddleclas.MobileNetV3_small_x1_0(pretrained=True, return_stages=True)
x = paddle.rand([1, 3, 224, 224])
y = model(x)
for key in y:
print(key, y[key].shape)
```
最终会同时返回logits以及中间层特征,可以根据自己的任务,选择合适分辨率的特征图进行训练。
以PP-LCNet为例,输出信息与特征图分辨率如下所示。在检测任务中,一般抽取出`blocks3`、`blocks4`、`blocks5`、`blocks6`这4个特征图,即可用于下游任务的训练。
```
logits [1, 1000]
blocks2 [1, 80, 112, 112]
blocks3 [1, 160, 56, 56]
blocks4 [1, 320, 28, 28]
blocks5 [1, 640, 14, 14]
blocks6 [1, 1280, 7, 7]
```
更多关于该接口的功能介绍和使用可以参考[theseus_layer使用教程](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/advanced_tutorials/theseus_layer.md)
在某些任务中,需要对骨干网络的batch norm等参数状态进行修改(比如freeze norm或者stop grad等),此时建议直接拷贝骨干网络代码,修改后添加到自己的项目中。飞桨骨干网络代码地址:[常用backbone参考链接](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/arch/backbone/legendary_models)。
#### 2.1.4 实战
关键点检测任务对高低层特征融合的要求较高,这里使用`lite_hrnet`网络作为该任务的轻量化骨干网络:[lite_hrnet.py](HRNet-Keypoint/lib/models/lite_hrnet.py)
训练方法如下所示。
```shell
python tools/train.py -c configs/lite_hrnet_30_256x192_coco.yml
```
COCO数据集上结果对比如下所示。
| Model | Input Size | AP(coco val) | Model Download | Model size | Config File |
| :---------- | -------- | :--------: |:--------: | :----------: | ----------- |
| HRNet-w32 | 256x192 | 76.9 | [hrnet_w32_256x192.pdparams](https://paddle-model-ecology.bj.bcebos.com/model/hrnet_pose/hrnet_w32_256x192.pdparams) | 165M | [config](./configs/hrnet_w32_256x192.yml) |
| LiteHRNet-30 | 256x192 | 69.4 | [lite_hrnet_30_256x192_coco.pdparams](https://paddle-model-ecology.bj.bcebos.com/model/hrnet_pose/lite_hrnet_30_256x192_coco.pdparams) | 7.1M | [config](./configs/lite_hrnet_30_256x192_coco.yml) |
#### 2.1.5 核验点
(1)基于轻量化骨干网络训练模型,提供模型训练结果与模型,**模型精度/速度指标满足该项任务的要求**
(2)在提交的文档中补充轻量化骨干网络的模型精度、存储大小、训练日志以及模型下载地址。
(3)文档中补充轻量化骨干网络模型的训练方法。
### 2.2 模型蒸馏
#### 2.2.1 简介
模型蒸馏指的是用大模型指导小模型的训练过程,让小模型的精度更高,因此在相同精度情况下,所需模型更小,从而达到模型轻量化的目的。
后续又衍生出让两个结构完全相同的模型互相学习的训练方式,这种模式称为DML(Deep Mutual Learning,互学习策略)。
大小模型蒸馏中也可以使用DML的loss,唯一的区别是在大小模型蒸馏中,教师模型的参数不需要更新。
模型蒸馏有2种主要损失函数:
* 对于分类输出,使用JSDIV loss(JS散度损失),具体实现可以参考:[链接](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.3/ppcls/loss/dmlloss.py#L20)
* 对于回归输出,使用距离loss(l2、l1、smooth l1等)
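下面给出这两类蒸馏损失的一个简化示意实现(函数名为示意,输入假设为教师/学生模型的原始输出,实际实现请以上述链接为准):
```python
import paddle
import paddle.nn.functional as F

def js_div_loss(logits_s, logits_t):
    # 分类输出:JS散度,即学生/教师的soft输出分别与二者均值分布求KL散度后取平均
    p_s = F.softmax(logits_s, axis=-1)
    p_t = F.softmax(logits_t, axis=-1)
    log_m = paddle.log(0.5 * (p_s + p_t) + 1e-9)
    return 0.5 * (F.kl_div(log_m, p_s, reduction='batchmean') +
                  F.kl_div(log_m, p_t, reduction='batchmean'))

def regression_distill_loss(pred_s, pred_t):
    # 回归输出:直接使用距离loss(这里以MSE为例,也可以替换为l1 / smooth l1)
    return F.mse_loss(pred_s, pred_t)
```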
#### 2.2.2 具体内容
(1)如果有大模型,则使用大模型(默认的骨干网络)指导小模型(超轻量骨干网络)学习,根据任务输出,选择用于计算的loss,加入训练loss中,训练得到最终模型,保存模型、日志与最终精度。
(2)如果没有大模型,建议使用2个完全相同的小模型互相学习,根据任务输出,选择用于计算的loss,加入训练loss中,训练得到最终模型,保存模型、日志与最终精度。
#### 2.2.3 操作步骤
(1)定义蒸馏模型:蒸馏模型中包含教师与学生模型,教师模型基于默认骨干网络搭建,学生模型基于超轻量骨干网络搭建。如果默认骨干网络已经是超轻量骨干网络,则可以使用结构相同的两个模型进行互学习。
(2)定义损失函数:蒸馏任务中,包含3个损失函数:
* 教师模型输出与groundtruth的损失函数
* 学生模型与groundtruth之间的损失函数
* 教师模型与学生模型输出之间的损失函数
(3)加载预训练模型:
* 如果教师模型是大模型,则需要加载大模型的训练结果,并且将教师模型的参数状态设置为`trainable=False`,停止参数更新,使用示例可以参考:[链接](https://github.com/PaddlePaddle/PaddleClas/blob/1358e3f647e12b9ee6c5d6450291983b2d5ac382/ppcls/arch/__init__.py#L117)
* 如果教师模型也是小模型,则其加载逻辑与学生模型相同
(4)蒸馏训练:和该任务的默认训练过程保持一致。
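其中第(3)步中冻结教师模型参数的操作,可以参考下面的示意代码(这里用一个简单网络代替实际的教师模型,仅作演示):
```python
import paddle

teacher_model = paddle.nn.Linear(10, 2)  # 实际任务中为已加载预训练权重的教师模型
for param in teacher_model.parameters():
    param.trainable = False  # 停止教师模型的参数更新
teacher_model.eval()         # 切换到评估模式,固定BN等层的统计量
```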
#### 2.2.4 实战
关键点检测任务中,教师模型直接使用HRNet骨干网络训练得到的模型,学生模型使用`lite_hrnet`作为骨干网络。
* 教师模型构建(通过传入教师模型的结构配置与预训练模型路径,初始化教师模型):[build_teacher_model函数](HRNet-Keypoint/tools/train.py#46)
* 损失函数构建:[DistMSELoss](HRNet-Keypoint/lib/models/loss.py#L67),由于关键点检测任务是回归任务,这里选用了MSE loss作为蒸馏的损失函数。
最终使用知识蒸馏训练轻量化模型的命令如下所示。
```bash
python tools/train.py -c configs/lite_hrnet_30_256x192_coco.yml --distill_config=./configs/hrnet_w32_256x192_teacher.yml
```
最终在模型大小不变的情况下,精度从`69.4%`提升至`69.9%`。
#### 2.2.5 核验点
(1)提供轻量化骨干网络的蒸馏结果精度,**模型精度指标满足该项任务的要求(如果有)**
(2)提供知识蒸馏训练后的模型下载地址以及训练日志。
(3)文档中补充知识蒸馏训练的说明文档与命令。
### 2.3 模型量化
#### 2.3.1 简介
Paddle 量化训练(Quant-aware Training, QAT)是指在训练过程中对模型的权重及激活做模拟量化,并且产出量化训练校准后的量化模型,使用该量化模型进行预测,可以减少计算量、降低计算内存、减小模型大小。
#### 2.3.2 具体内容
添加模型的PACT量化训练代码与训练脚本,并提供训练日志、模型与精度对比。
**注意:**量化模型只有在导出为用于端侧部署的Lite模型时,才会以int8的形式保存;这里训练保存的量化模型仍然以FP32的形式存储,因此其大小不会小于使用fp32训练得到的模型。
#### 2.3.3 操作步骤
向训练代码中添加PACT量化功能包含以下5个步骤。
<div align="center">
<img src="../tipc/images/quant_aware_training_guide.png" width = "500" />
</div>
具体内容请参考[Linux GPU/CPU PACT量化训练功能开发文档](../tipc/train_pact_infer_python/train_pact_infer_python.md)
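以PaddleSlim动态图量化训练为例,一个最小化的QAT使用示意如下(假设已通过pip安装paddleslim,`quant_config`字段的含义与下文实战中的配置一致,模型以飞桨内置的MobileNetV2代替实际任务模型):
```python
import paddle
from paddleslim import QAT

quant_config = {
    'activation_preprocess_type': 'PACT',              # 对激活使用PACT预处理
    'weight_quantize_type': 'channel_wise_abs_max',    # 权重量化方式
    'activation_quantize_type': 'moving_average_abs_max',
    'weight_bits': 8,
    'activation_bits': 8,
    'quantizable_layer_type': ['Conv2D', 'Linear'],
}

model = paddle.vision.models.mobilenet_v2()
quanter = QAT(config=quant_config)
model = quanter.quantize(model)  # 插入模拟量化节点,返回可正常训练的量化模型
# 训练完成后导出推理模型(路径与input_spec仅为示意):
# quanter.save_quantized_model(
#     model, 'output/quant_model',
#     input_spec=[paddle.static.InputSpec([None, 3, 224, 224], 'float32')])
```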
#### 2.3.4 实战
在关键点检测任务中,首先在配置中添加PACT量化的配置文件:[lite_hrnet_30_256x192_coco_pact.yml#L19](HRNet-Keypoint/configs/lite_hrnet_30_256x192_coco_pact.yml#L19)
```yaml
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_256x192_coco.pdparams
slim: QAT
# 这里的PACT量化配置适用于大多数任务,包括分类、检测、OCR等,一般无需改动
QAT:
quant_config: {
'activation_preprocess_type': 'PACT',
'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max',
'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9,
'quantizable_layer_type': ['Conv2D', 'Linear']}
print_model: True
```
在代码中,基于配置文件创建PACT的量化类,之后再将fp32的`nn.Layer`模型传入PACT量化类中,得到量化后的模型,用于训练。代码如下所示。
```python
def build_slim_model(cfg, mode='train'):
    assert cfg.slim == 'QAT', 'Only QAT is supported now'
    model = create(cfg.architecture)
    if mode == 'train':
        # 量化训练前,先加载fp32预训练权重
        load_pretrain_weight(model, cfg.pretrain_weights)
    slim = create(cfg.slim)
    cfg['slim_type'] = cfg.slim
    # TODO: fix quant export model in framework.
    if mode == 'test' and cfg.slim == 'QAT':
        slim.quant_config['activation_preprocess_type'] = None
    # 将fp32的nn.Layer模型传入量化类,得到插入模拟量化节点后的模型
    cfg['model'] = slim(model)
    cfg['slim'] = slim
    if mode != 'train':
        # 评估/导出时加载量化训练得到的权重
        load_pretrain_weight(cfg['model'], cfg.weights)
    return cfg
```
#### 2.3.5 核验点
(1)提供量化后的模型精度,**模型精度指标满足该项任务的要求(如果有)**
(2)提供量化训练后的模型下载地址以及训练日志。
(3)文档中补充量化训练的说明文档与命令。
## 3. FAQ
### 3.1 轻量化骨干网络
* 关于模型大小的定义如下:将训练得到的动态图模型使用`paddle.jit.save`接口,保存为静态图模型,得到模型参数文件`*.pdiparams`和结构文件`*.pdmodel`,二者的存储大小之和。
* 2.1章节中提供的骨干网络为推荐使用,具体不做限制,最终模型大小/速度/精度满足验收条件即可。
* 在部分模型不方便直接替换骨干网络的情况下(比如该模型是针对该任务设计的,通用的骨干网络无法满足要求等),可以通过对大模型的通道或者层数进行裁剪,来实现模型轻量化,最终保证模型大小满足要求即可。
### 3.2 知识蒸馏
* 不同任务往往有针对该任务定制的知识蒸馏训练策略,2.2章节中的内容仅供参考,蒸馏策略不做限制,模型精度满足验收条件即可。
### 3.3 模型量化
* 量化时,加载训练得到的fp32模型之后,可以将初始学习率修改为fp32训练时的`0.2~0.5`倍,迭代轮数也可以缩短为之前的`0.25~0.5`倍。
* 对于大多数CV任务,模型量化的精度损失在0.3%~1.0%左右,如果量化后精度大幅降低(超过3%),则需要仔细排查量化细节,建议仅对`conv`以及`linear`参数进行量化。