Commit 8f6106cb authored by littletomatodonkey, committed by GitHub

add industrial code and doc (#5493)

* add industrial code and doc
Parent 1df0f648
# Industrial-Grade Model Development Tutorial
PaddlePaddle is an open-source deep learning platform born from industrial practice, dedicated to making the innovation and application of deep learning easier. Developing an industrial-grade model mainly consists of the following three steps.
<div align="center">
<img src="images/intrstrial_sota_model_pipeline.png" width = "800" />
</div>
Specifically:
* For the paper reproduction process and methodology, please refer to the [Paper Reproduction Guide](./article-implementation/ArticleReproduction_CV.md)
* For methods of optimizing model speed and accuracy, please refer to the [Industrial SOTA Model Optimization Guide](./pp-series/README.md)
* For developing and testing the full training-to-inference (TIPC) pipeline, please refer to the [TIPC Development Documentation](./tipc/README.md)
# [Deep High-Resolution Representation Learning for Human Pose Estimation (CVPR 2019)](https://arxiv.org/abs/1902.09212)
## 1 Introduction
This is the PaddlePaddle implementation of [Deep High-Resolution Representation Learning for Human Pose Estimation](https://arxiv.org/abs/1902.09212).
In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset.
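To make the core idea concrete, below is a minimal, illustrative sketch of two parallel resolution branches with repeated multi-scale fusion. This is not the HRNet implementation in this repo; the layer names and channel sizes are invented for the example.
```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class ToyFusionStage(nn.Layer):
    """One stage: a high-res and a low-res branch, followed by mutual fusion."""
    def __init__(self, ch_high=32, ch_low=64):
        super().__init__()
        self.high = nn.Conv2D(ch_high, ch_high, 3, padding=1)  # keeps resolution
        self.low = nn.Conv2D(ch_low, ch_low, 3, padding=1)
        self.high2low = nn.Conv2D(ch_high, ch_low, 3, stride=2, padding=1)  # downsample
        self.low2high = nn.Conv2D(ch_low, ch_high, 1)  # match channels, then upsample

    def forward(self, x_high, x_low):
        h, l = F.relu(self.high(x_high)), F.relu(self.low(x_low))
        # fusion: each branch receives information from the other at every stage
        l = l + self.high2low(h)
        h = h + F.interpolate(self.low2high(l), size=h.shape[2:], mode='bilinear')
        return h, l

x_high, x_low = paddle.randn([1, 32, 64, 48]), paddle.randn([1, 64, 32, 24])
stage = ToyFusionStage()
for _ in range(3):  # repeated fusion; the high-resolution path is never discarded
    x_high, x_low = stage(x_high, x_low)
print(x_high.shape)  # [1, 32, 64, 48]: high-resolution representation maintained
```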
## 2 How to use
### 2.1 Environment
### Requirements:
- PaddlePaddle 2.2
- 64-bit OS
- Python 3 (3.5.1+/3.6/3.7/3.8/3.9), 64 bit
- pip/pip3 (9.0.1+), 64 bit
- CUDA >= 10.1
- cuDNN >= 7.6
### Installation
#### 1. Install PaddlePaddle
```bash
# CUDA10.1
python -m pip install paddlepaddle-gpu==2.2.0.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```
- For quick installation with other CUDA versions or environments, please refer to the [PaddlePaddle Quick Installation document](https://www.paddlepaddle.org.cn/install/quick)
- For other installation methods, such as conda or compiling from source, please refer to the [installation document](https://www.paddlepaddle.org.cn/documentation/docs/en/install/index_en.html)
Please make sure that your PaddlePaddle is installed successfully and the version is not lower than the required version. Use the following command to verify.
```bash
# check that paddle is installed correctly
python -c "import paddle; paddle.utils.run_check()"
# confirm the installed paddle version
python -c "import paddle; print(paddle.__version__)"
```
**Note**
1. If you want to run PaddleDetection on multiple GPUs, please install NCCL first.
#### 2. Clone this repo. We'll refer to the cloned directory as ${POSE_ROOT}.
#### 3. Install dependencies:
```bash
pip install -r requirements.txt
```
#### 4. Initialize the output (trained model output) and log (TensorBoard log) directories:
```bash
mkdir output
mkdir log
```
Your directory tree should look like this:
```
${POSE_ROOT}
├── config
├── dataset
├── figures
├── lib
├── log
├── output
├── tools
├── README.md
└── requirements.txt
```
### 2.2 Data preparation
#### COCO Data Download
- The COCO dataset can be downloaded automatically by the script below. The dataset is large, so the download may take a while.
```bash
# automatically download the coco dataset
python dataset/download_coco.py
```
After the script finishes, the COCO dataset files are organized as follows:
```
>>cd dataset
>>tree
├── annotations
│ ├── instances_train2017.json
│ ├── instances_val2017.json
│ | ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
│ | ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
│ | ...
| ...
```
- If you have already downloaded the COCO dataset, organize the files according to the structure above.
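Before launching training, you can sanity-check the layout with a small convenience script (not part of the repo), assuming the default `dataset/coco` location and the `person_keypoints_*` annotation files used by the configs:
```python
from pathlib import Path

coco = Path("dataset/coco")
expected = [
    "annotations/person_keypoints_train2017.json",  # used by TrainDataset
    "annotations/person_keypoints_val2017.json",    # used by EvalDataset
    "train2017",
    "val2017",
]
for rel in expected:
    path = coco / rel
    status = "ok" if path.exists() else "MISSING"
    print(f"{status:8s} {path}")
```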
### 2.3 Training & Evaluation & Inference
We provide scripts for training, evaluation, and inference, with various features controlled by the configuration files.
```bash
# training on single-GPU
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/hrnet_w32_256x192.yml
# training on multi-GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/hrnet_w32_256x192.yml
# GPU evaluation
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/hrnet_w32_256x192.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams
# Inference
python tools/infer.py -c configs/hrnet_w32_256x192.yml --infer_img=dataset/test_image/hrnet_demo.jpg -o weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams
# training with distillation
python tools/train.py -c configs/lite_hrnet_30_256x192_coco.yml --distill_config=./configs/hrnet_w32_256x192_teacher.yml
# training with PACT quantization on single-GPU
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/lite_hrnet_30_256x192_coco_pact.yml
# training with PACT quantization on multi-GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/lite_hrnet_30_256x192_coco_pact.yml
# GPU evaluation with PACT quantization
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/lite_hrnet_30_256x192_coco_pact.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/lite_hrnet_30_256x192_coco_pact.pdparams
# Inference with PACT quantization
python tools/infer.py -c configs/lite_hrnet_30_256x192_coco_pact.yml --infer_img=dataset/test_image/hrnet_demo.jpg -o weights=https://paddledet.bj.bcebos.com/models/keypoint/lite_hrnet_30_256x192_coco_pact.pdparams
```
## 3 Results
COCO Dataset
| Model | Input Size | AP(coco val) | Model Download | Config File |
| :---------------- | -------- | :----------: | :----------------------------------------------------------: | ----------------------------------------------------------- |
| HRNet-w32 | 256x192 | 76.9 | [hrnet_w32_256x192.pdparams](https://paddle-model-ecology.bj.bcebos.com/model/hrnet_pose/hrnet_w32_256x192.pdparams) | [config](./configs/hrnet_w32_256x192.yml) |
| LiteHRNet-30 | 256x192 | 69.4 | [lite_hrnet_30_256x192_coco.pdparams](https://paddle-model-ecology.bj.bcebos.com/model/hrnet_pose/lite_hrnet_30_256x192_coco.pdparams) | [config](./configs/lite_hrnet_30_256x192_coco.yml) |
| LiteHRNet-30-PACT | 256x192 | 68.9 | [lite_hrnet_30_256x192_coco_pact.pdparams](https://paddle-model-ecology.bj.bcebos.com/model/hrnet_pose/lite_hrnet_30_256x192_coco_pact.pdparams) | [config](./configs/lite_hrnet_30_256x192_coco_pact.yml) |
| LiteHRNet-30-Distillation | 256x192 | 69.9 | [lite_hrnet_30_256x192_coco_dist.pdparams](https://paddle-model-ecology.bj.bcebos.com/model/hrnet_pose/lite_hrnet_30_256x192_coco_dist.pdparams) | [config](./configs/lite_hrnet_30_256x192_coco_pact.yml) |
![](/dataset/test_image/hrnet_demo.jpg)
![](/deploy/output/hrnet_demo_vis.jpg)
## Citation
```
@inproceedings{sun2019deep,
  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
  author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang},
  booktitle={CVPR},
  year={2019}
}
```
use_gpu: true
log_iter: 5
save_dir: output
snapshot_epoch: 10
weights: output/hrnet_w32_256x192/model_final
epoch: 210
num_joints: &num_joints 17
pixel_std: &pixel_std 200
metric: KeyPointTopDownCOCOEval
num_classes: 1
train_height: &train_height 256
train_width: &train_width 192
trainsize: &trainsize [*train_width, *train_height]
hmsize: &hmsize [48, 64]
flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]
#####model
architecture: TopDownHRNet
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/Trunc_HRNet_W32_C_pretrained.pdparams
TopDownHRNet:
backbone: HRNet
post_process: HRNetPostProcess
flip_perm: *flip_perm
num_joints: *num_joints
width: &width 32
loss: KeyPointMSELoss
use_dark: False
HRNet:
width: *width
freeze_at: -1
freeze_norm: false
return_idx: [0]
KeyPointMSELoss:
use_target_weight: true
#####optimizer
LearningRate:
base_lr: 0.0005
schedulers:
- !PiecewiseDecay
milestones: [170, 200]
gamma: 0.1
- !LinearWarmup
start_factor: 0.001
steps: 1000
OptimizerBuilder:
optimizer:
type: Adam
regularizer:
factor: 0.0
type: L2
#####data
TrainDataset:
!KeypointTopDownCocoDataset
image_dir: train2017
anno_path: annotations/person_keypoints_train2017.json
dataset_dir: dataset/coco
num_joints: *num_joints
trainsize: *trainsize
pixel_std: *pixel_std
use_gt_bbox: True
EvalDataset:
!KeypointTopDownCocoDataset
image_dir: val2017
anno_path: annotations/person_keypoints_val2017.json
dataset_dir: dataset/coco
bbox_file: bbox.json
num_joints: *num_joints
trainsize: *trainsize
pixel_std: *pixel_std
use_gt_bbox: True
image_thre: 0.0
TestDataset:
!ImageFolder
anno_path: dataset/coco/keypoint_imagelist.txt
worker_num: 2
global_mean: &global_mean [0.485, 0.456, 0.406]
global_std: &global_std [0.229, 0.224, 0.225]
TrainReader:
sample_transforms:
- RandomFlipHalfBodyTransform:
scale: 0.5
rot: 40
num_joints_half_body: 8
prob_half_body: 0.3
pixel_std: *pixel_std
trainsize: *trainsize
upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
flip_pairs: *flip_perm
- TopDownAffine:
trainsize: *trainsize
- ToHeatmapsTopDown:
hmsize: *hmsize
sigma: 2
batch_transforms:
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 64
shuffle: true
drop_last: false
EvalReader:
sample_transforms:
- TopDownAffine:
trainsize: *trainsize
batch_transforms:
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 16
TestReader:
inputs_def:
image_shape: [3, *train_height, *train_width]
sample_transforms:
- Decode: {}
- TopDownEvalAffine:
trainsize: *trainsize
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 1
fuse_normalize: false # whether to fuse the normalize layer into the model when exporting
pretrain_weights:
weights: "https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams"
num_joints: &num_joints 17
pixel_std: &pixel_std 200
metric: KeyPointTopDownCOCOEval
train_height: &train_height 256
train_width: &train_width 192
trainsize: &trainsize [*train_width, *train_height]
hmsize: &hmsize [48, 64]
flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]
# distillation config and loss
freeze_parameters: True
distill_loss:
name: DistMSELoss
weight: 1.0
key: output
# model
architecture: TopDownHRNet
TopDownHRNet:
backbone: HRNet
post_process: HRNetPostProcess
flip_perm: *flip_perm
num_joints: *num_joints
width: &width 32
loss: KeyPointMSELoss
use_dark: False
HRNet:
width: *width
freeze_at: -1
freeze_norm: false
return_idx: [0]
KeyPointMSELoss:
use_target_weight: true
use_gpu: true
log_iter: 5
save_dir: output
snapshot_epoch: 10
weights: output/lite_hrnet_30_256x192_coco/model_final
epoch: 210
num_joints: &num_joints 17
pixel_std: &pixel_std 200
metric: KeyPointTopDownCOCOEval
num_classes: 1
train_height: &train_height 256
train_width: &train_width 192
trainsize: &trainsize [*train_width, *train_height]
hmsize: &hmsize [48, 64]
flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]
#####model
architecture: TopDownHRNet
TopDownHRNet:
backbone: LiteHRNet
post_process: HRNetPostProcess
flip_perm: *flip_perm
num_joints: *num_joints
width: &width 40
loss: KeyPointMSELoss
use_dark: false
LiteHRNet:
network_type: lite_30
freeze_at: -1
freeze_norm: false
return_idx: [0]
KeyPointMSELoss:
use_target_weight: true
loss_scale: 1.0
#####optimizer
LearningRate:
base_lr: 0.002
schedulers:
- !PiecewiseDecay
milestones: [170, 200]
gamma: 0.1
- !LinearWarmup
start_factor: 0.001
steps: 500
OptimizerBuilder:
optimizer:
type: Adam
regularizer:
factor: 0.0
type: L2
#####data
TrainDataset:
!KeypointTopDownCocoDataset
image_dir: train2017
anno_path: annotations/person_keypoints_train2017.json
dataset_dir: dataset/coco
num_joints: *num_joints
trainsize: *trainsize
pixel_std: *pixel_std
use_gt_bbox: True
EvalDataset:
!KeypointTopDownCocoDataset
image_dir: val2017
anno_path: annotations/person_keypoints_val2017.json
dataset_dir: dataset/coco
num_joints: *num_joints
trainsize: *trainsize
pixel_std: *pixel_std
use_gt_bbox: True
image_thre: 0.0
TestDataset:
!ImageFolder
anno_path: dataset/coco/keypoint_imagelist.txt
worker_num: 4
global_mean: &global_mean [0.485, 0.456, 0.406]
global_std: &global_std [0.229, 0.224, 0.225]
TrainReader:
sample_transforms:
- RandomFlipHalfBodyTransform:
scale: 0.25
rot: 30
num_joints_half_body: 8
prob_half_body: 0.3
pixel_std: *pixel_std
trainsize: *trainsize
upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
flip_pairs: *flip_perm
- TopDownAffine:
trainsize: *trainsize
- ToHeatmapsTopDown:
hmsize: *hmsize
sigma: 2
batch_transforms:
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 64
shuffle: true
drop_last: false
EvalReader:
sample_transforms:
- TopDownAffine:
trainsize: *trainsize
batch_transforms:
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 16
TestReader:
inputs_def:
image_shape: [3, *train_height, *train_width]
sample_transforms:
- Decode: {}
- TopDownEvalAffine:
trainsize: *trainsize
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 1
use_gpu: true
log_iter: 5
save_dir: output
snapshot_epoch: 10
weights: output/lite_hrnet_30_256x192_coco/model_final
epoch: 50
num_joints: &num_joints 17
pixel_std: &pixel_std 200
metric: KeyPointTopDownCOCOEval
num_classes: 1
train_height: &train_height 256
train_width: &train_width 192
trainsize: &trainsize [*train_width, *train_height]
hmsize: &hmsize [48, 64]
flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_256x192_coco.pdparams
slim: QAT
QAT:
quant_config: {
'activation_preprocess_type': 'PACT',
'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max',
'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9,
'quantizable_layer_type': ['Conv2D', 'Linear']}
print_model: True
architecture: TopDownHRNet
TopDownHRNet:
backbone: LiteHRNet
post_process: HRNetPostProcess
flip_perm: *flip_perm
num_joints: *num_joints
width: &width 40
loss: KeyPointMSELoss
use_dark: false
LiteHRNet:
network_type: lite_30
freeze_at: -1
freeze_norm: false
return_idx: [0]
KeyPointMSELoss:
use_target_weight: true
loss_scale: 1.0
# optimizer
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
milestones: [40, 45]
gamma: 0.1
- !LinearWarmup
start_factor: 0.001
steps: 500
OptimizerBuilder:
optimizer:
type: Adam
regularizer:
factor: 0.0
type: L2
#####data
TrainDataset:
!KeypointTopDownCocoDataset
image_dir: train2017
anno_path: annotations/person_keypoints_train2017.json
dataset_dir: dataset/coco
num_joints: *num_joints
trainsize: *trainsize
pixel_std: *pixel_std
use_gt_bbox: True
EvalDataset:
!KeypointTopDownCocoDataset
image_dir: val2017
anno_path: annotations/person_keypoints_val2017.json
dataset_dir: dataset/coco
num_joints: *num_joints
trainsize: *trainsize
pixel_std: *pixel_std
use_gt_bbox: True
image_thre: 0.0
TestDataset:
!ImageFolder
anno_path: dataset/coco/keypoint_imagelist.txt
worker_num: 4
global_mean: &global_mean [0.485, 0.456, 0.406]
global_std: &global_std [0.229, 0.224, 0.225]
TrainReader:
sample_transforms:
- RandomFlipHalfBodyTransform:
scale: 0.25
rot: 30
num_joints_half_body: 8
prob_half_body: 0.3
pixel_std: *pixel_std
trainsize: *trainsize
upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
flip_pairs: *flip_perm
- TopDownAffine:
trainsize: *trainsize
- ToHeatmapsTopDown:
hmsize: *hmsize
sigma: 2
batch_transforms:
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 64
shuffle: true
drop_last: false
EvalReader:
sample_transforms:
- TopDownAffine:
trainsize: *trainsize
batch_transforms:
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 16
TestReader:
inputs_def:
image_shape: [3, *train_height, *train_width]
sample_transforms:
- Decode: {}
- TopDownEvalAffine:
trainsize: *trainsize
- NormalizeImage:
mean: *global_mean
std: *global_std
is_scale: true
- Permute: {}
batch_size: 1
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os.path as osp
import logging
# add the python path of PaddleDetection to sys.path
parent_path = osp.abspath(osp.join(__file__, *(['..'] * 3)))
if parent_path not in sys.path:
sys.path.append(parent_path)
from ppdet.utils.download import download_dataset
logging.basicConfig(level=logging.INFO)
download_path = osp.split(osp.realpath(sys.argv[0]))[0]
download_dataset(download_path, 'coco')
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import logging
import paddle
import paddle.inference as paddle_infer
from pathlib import Path
CUR_DIR = os.path.dirname(os.path.abspath(__file__))
LOG_PATH_ROOT = f"{CUR_DIR}/../../output"
class PaddleInferBenchmark(object):
def __init__(self,
config,
model_info: dict={},
data_info: dict={},
perf_info: dict={},
resource_info: dict={},
**kwargs):
"""
Construct PaddleInferBenchmark Class to format logs.
args:
config(paddle.inference.Config): paddle inference config
        model_info(dict): basic model info
            {'model_name': 'resnet50',
             'precision': 'fp32'}
        data_info(dict): input data info
            {'batch_size': 1,
             'shape': '3,224,224',
             'data_num': 1000}
        perf_info(dict): performance result
            {'preprocess_time_s': 1.0,
             'inference_time_s': 2.0,
             'postprocess_time_s': 1.0,
             'total_time_s': 4.0}
        resource_info(dict): cpu and gpu resources
            {'cpu_rss': 100,
             'gpu_rss': 100,
             'gpu_util': 60}
"""
# PaddleInferBenchmark Log Version
self.log_version = "1.0.3"
# Paddle Version
self.paddle_version = paddle.__version__
self.paddle_commit = paddle.__git_commit__
paddle_infer_info = paddle_infer.get_version()
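        # get_version() returns a multi-line version string; the text after the
        # last ': ' is taken as the branch name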
self.paddle_branch = paddle_infer_info.strip().split(': ')[-1]
# model info
self.model_info = model_info
# data info
self.data_info = data_info
# perf info
self.perf_info = perf_info
try:
# required value
self.model_name = model_info['model_name']
self.precision = model_info['precision']
self.batch_size = data_info['batch_size']
self.shape = data_info['shape']
self.data_num = data_info['data_num']
self.inference_time_s = round(perf_info['inference_time_s'], 4)
        except KeyError:
            self.print_help()
            raise ValueError(
                "Wrong arguments, please check the input arguments and their types")
self.preprocess_time_s = perf_info.get('preprocess_time_s', 0)
self.postprocess_time_s = perf_info.get('postprocess_time_s', 0)
self.total_time_s = perf_info.get('total_time_s', 0)
self.inference_time_s_90 = perf_info.get("inference_time_s_90", "")
self.inference_time_s_99 = perf_info.get("inference_time_s_99", "")
self.succ_rate = perf_info.get("succ_rate", "")
self.qps = perf_info.get("qps", "")
# conf info
self.config_status = self.parse_config(config)
# mem info
if isinstance(resource_info, dict):
self.cpu_rss_mb = int(resource_info.get('cpu_rss_mb', 0))
self.cpu_vms_mb = int(resource_info.get('cpu_vms_mb', 0))
self.cpu_shared_mb = int(resource_info.get('cpu_shared_mb', 0))
self.cpu_dirty_mb = int(resource_info.get('cpu_dirty_mb', 0))
self.cpu_util = round(resource_info.get('cpu_util', 0), 2)
self.gpu_rss_mb = int(resource_info.get('gpu_rss_mb', 0))
self.gpu_util = round(resource_info.get('gpu_util', 0), 2)
self.gpu_mem_util = round(resource_info.get('gpu_mem_util', 0), 2)
else:
self.cpu_rss_mb = 0
self.cpu_vms_mb = 0
self.cpu_shared_mb = 0
self.cpu_dirty_mb = 0
self.cpu_util = 0
self.gpu_rss_mb = 0
self.gpu_util = 0
self.gpu_mem_util = 0
# init benchmark logger
self.benchmark_logger()
def benchmark_logger(self):
"""
benchmark logger
"""
# remove other logging handler
for handler in logging.root.handlers[:]:
logging.root.removeHandler(handler)
# Init logger
FORMAT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
log_output = f"{LOG_PATH_ROOT}/{self.model_name}.log"
Path(f"{LOG_PATH_ROOT}").mkdir(parents=True, exist_ok=True)
logging.basicConfig(
level=logging.INFO,
format=FORMAT,
handlers=[
logging.FileHandler(
filename=log_output, mode='w'),
logging.StreamHandler(),
])
self.logger = logging.getLogger(__name__)
self.logger.info(
f"Paddle Inference benchmark log will be saved to {log_output}")
def parse_config(self, config) -> dict:
"""
parse paddle predictor config
args:
config(paddle.inference.Config): paddle inference config
return:
config_status(dict): dict style config info
"""
        config_status = {}
        if isinstance(config, paddle_infer.Config):
config_status['runtime_device'] = "gpu" if config.use_gpu(
) else "cpu"
config_status['ir_optim'] = config.ir_optim()
config_status['enable_tensorrt'] = config.tensorrt_engine_enabled()
config_status['precision'] = self.precision
config_status['enable_mkldnn'] = config.mkldnn_enabled()
config_status[
'cpu_math_library_num_threads'] = config.cpu_math_library_num_threads(
)
elif isinstance(config, dict):
config_status['runtime_device'] = config.get('runtime_device', "")
config_status['ir_optim'] = config.get('ir_optim', "")
config_status['enable_tensorrt'] = config.get('enable_tensorrt',
"")
config_status['precision'] = config.get('precision', "")
config_status['enable_mkldnn'] = config.get('enable_mkldnn', "")
config_status['cpu_math_library_num_threads'] = config.get(
'cpu_math_library_num_threads', "")
else:
self.print_help()
raise ValueError(
"Set argument config wrong, please check input argument and its type"
)
return config_status
def report(self, identifier=None):
"""
print log report
args:
identifier(string): identify log
"""
if identifier:
identifier = f"[{identifier}]"
else:
identifier = ""
self.logger.info("\n")
self.logger.info(
"---------------------- Paddle info ----------------------")
self.logger.info(f"{identifier} paddle_version: {self.paddle_version}")
self.logger.info(f"{identifier} paddle_commit: {self.paddle_commit}")
self.logger.info(f"{identifier} paddle_branch: {self.paddle_branch}")
self.logger.info(f"{identifier} log_api_version: {self.log_version}")
self.logger.info(
"----------------------- Conf info -----------------------")
self.logger.info(
f"{identifier} runtime_device: {self.config_status['runtime_device']}"
)
self.logger.info(
f"{identifier} ir_optim: {self.config_status['ir_optim']}")
self.logger.info(f"{identifier} enable_memory_optim: {True}")
self.logger.info(
f"{identifier} enable_tensorrt: {self.config_status['enable_tensorrt']}"
)
self.logger.info(
f"{identifier} enable_mkldnn: {self.config_status['enable_mkldnn']}"
)
self.logger.info(
f"{identifier} cpu_math_library_num_threads: {self.config_status['cpu_math_library_num_threads']}"
)
self.logger.info(
"----------------------- Model info ----------------------")
self.logger.info(f"{identifier} model_name: {self.model_name}")
self.logger.info(f"{identifier} precision: {self.precision}")
self.logger.info(
"----------------------- Data info -----------------------")
self.logger.info(f"{identifier} batch_size: {self.batch_size}")
self.logger.info(f"{identifier} input_shape: {self.shape}")
self.logger.info(f"{identifier} data_num: {self.data_num}")
self.logger.info(
"----------------------- Perf info -----------------------")
self.logger.info(
f"{identifier} cpu_rss(MB): {self.cpu_rss_mb}, cpu_vms: {self.cpu_vms_mb}, cpu_shared_mb: {self.cpu_shared_mb}, cpu_dirty_mb: {self.cpu_dirty_mb}, cpu_util: {self.cpu_util}%"
)
self.logger.info(
f"{identifier} gpu_rss(MB): {self.gpu_rss_mb}, gpu_util: {self.gpu_util}%, gpu_mem_util: {self.gpu_mem_util}%"
)
self.logger.info(
f"{identifier} total time spent(s): {self.total_time_s}")
self.logger.info(
f"{identifier} preprocess_time(ms): {round(self.preprocess_time_s*1000, 1)}, inference_time(ms): {round(self.inference_time_s*1000, 1)}, postprocess_time(ms): {round(self.postprocess_time_s*1000, 1)}"
)
if self.inference_time_s_90:
            self.logger.info(
f"{identifier} 90%_cost: {self.inference_time_s_90}, 99%_cost: {self.inference_time_s_99}, succ_rate: {self.succ_rate}"
)
if self.qps:
self.logger.info(f"{identifier} QPS: {self.qps}")
def print_help(self):
"""
print function help
"""
print("""Usage:
==== Print inference benchmark logs. ====
config = paddle.inference.Config()
        model_info = {'model_name': 'resnet50',
                      'precision': 'fp32'}
        data_info = {'batch_size': 1,
                     'shape': '3,224,224',
                     'data_num': 1000}
        perf_info = {'preprocess_time_s': 1.0,
                     'inference_time_s': 2.0,
                     'postprocess_time_s': 1.0,
                     'total_time_s': 4.0}
        resource_info = {'cpu_rss_mb': 100,
                         'gpu_rss_mb': 100,
                         'gpu_util': 60}
log = PaddleInferBenchmark(config, model_info, data_info, perf_info, resource_info)
log('Test')
""")
def __call__(self, identifier=None):
"""
__call__
args:
identifier(string): identify log
"""
self.report(identifier)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import yaml
import glob
from functools import reduce
import cv2
import numpy as np
import math
import paddle
from paddle.inference import Config
from paddle.inference import create_predictor
from benchmark_utils import PaddleInferBenchmark
from preprocess import preprocess, Resize, NormalizeImage, Permute, PadStride, WarpAffine, TopDownEvalAffine, expand_crop
from postprocess import HRNetPostProcess
from visualize import draw_pose
from utils import argsparser, Timer, get_current_memory_mb
class Detector(object):
"""
Args:
pred_config (object): config of model, defined by `Config(model_dir)`
model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU
run_mode (str): mode of running(paddle/trt_fp32/trt_fp16)
        batch_size (int): batch size for inference
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
trt_opt_shape (int): opt shape for dynamic shape in trt
        trt_calib_mode (bool): If the model is produced by TRT offline quantization
            calibration, trt_calib_mode needs to be set to True
cpu_threads (int): cpu threads
enable_mkldnn (bool): whether to open MKLDNN
"""
def __init__(self,
pred_config,
model_dir,
device='CPU',
run_mode='paddle',
batch_size=1,
trt_min_shape=1,
trt_max_shape=1280,
trt_opt_shape=640,
trt_calib_mode=False,
cpu_threads=1,
enable_mkldnn=False,
use_dark=True):
self.pred_config = pred_config
self.predictor, self.config = load_predictor(
model_dir,
run_mode=run_mode,
batch_size=batch_size,
min_subgraph_size=self.pred_config.min_subgraph_size,
device=device,
use_dynamic_shape=self.pred_config.use_dynamic_shape,
trt_min_shape=trt_min_shape,
trt_max_shape=trt_max_shape,
trt_opt_shape=trt_opt_shape,
trt_calib_mode=trt_calib_mode,
cpu_threads=cpu_threads,
enable_mkldnn=enable_mkldnn)
self.det_times = Timer()
self.cpu_mem, self.gpu_mem, self.gpu_util = 0, 0, 0
self.use_dark = use_dark
def preprocess(self, image_list):
preprocess_ops = []
for op_info in self.pred_config.preprocess_infos:
new_op_info = op_info.copy()
op_type = new_op_info.pop('type')
preprocess_ops.append(eval(op_type)(**new_op_info))
input_im_lst = []
input_im_info_lst = []
for im_path in image_list:
im, im_info = preprocess(im_path, preprocess_ops)
input_im_lst.append(im)
input_im_info_lst.append(im_info)
inputs = create_inputs(input_im_lst, input_im_info_lst)
return inputs
def postprocess(self, np_boxes, inputs, threshold=0.5):
# postprocess output of predictor
results = {}
imshape = inputs['im_shape'][:, ::-1]
center = np.round(imshape / 2.)
scale = imshape / 200.
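        # top-down convention: treat the whole image as the person box, with the
        # center at the image center and scale = size / pixel_std (200), matching
        # the training configs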
postprocess = HRNetPostProcess(use_dark=self.use_dark)
results['keypoint'] = postprocess(np_boxes, center, scale)
return results
def predict(self, image_list, threshold=0.5, repeats=1, add_timer=True):
'''
Args:
            image_list (list): list of images
            threshold (float): score threshold for predictions
            repeats (int): number of repeats for prediction
            add_timer (bool): whether to record timing during prediction
        Returns:
            results (dict): 'keypoint' holds a tuple of the keypoint array
                (np.ndarray of shape [N, num_joints, 3], each row [x, y, score])
                and the mean confidence score per instance
'''
# preprocess
if add_timer:
self.det_times.preprocess_time_s.start()
inputs = self.preprocess(image_list)
np_boxes = None
input_names = self.predictor.get_input_names()
for i in range(len(input_names)):
input_tensor = self.predictor.get_input_handle(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
if add_timer:
self.det_times.preprocess_time_s.end()
self.det_times.inference_time_s.start()
# model prediction
for i in range(repeats):
self.predictor.run()
output_names = self.predictor.get_output_names()
boxes_tensor = self.predictor.get_output_handle(output_names[0])
np_boxes = boxes_tensor.copy_to_cpu()
if add_timer:
self.det_times.inference_time_s.end(repeats=repeats)
self.det_times.postprocess_time_s.start()
# postprocess
results = self.postprocess(np_boxes, inputs, threshold=threshold)
if add_timer:
self.det_times.postprocess_time_s.end()
self.det_times.img_num += len(image_list)
return results
def get_timer(self):
return self.det_times
def create_inputs(imgs, im_info):
"""generate input for different model type
Args:
imgs (list(numpy)): list of images (np.ndarray)
im_info (list(dict)): list of image info
Returns:
inputs (dict): input of model
"""
inputs = {}
inputs['image'] = np.stack(imgs, axis=0)
im_shape = []
for e in im_info:
im_shape.append(np.array((e['im_shape'])).astype('float32'))
inputs['im_shape'] = np.stack(im_shape, axis=0)
return inputs
class PredictConfig():
"""set config of preprocess, postprocess and visualize
Args:
model_dir (str): root path of model.yml
"""
def __init__(self, model_dir):
# parsing Yaml config for Preprocess
deploy_file = os.path.join(model_dir, 'infer_cfg.yml')
with open(deploy_file) as f:
yml_conf = yaml.safe_load(f)
self.arch = yml_conf['arch']
self.preprocess_infos = yml_conf['Preprocess']
self.min_subgraph_size = yml_conf['min_subgraph_size']
self.labels = yml_conf['label_list']
self.use_dynamic_shape = yml_conf['use_dynamic_shape']
self.print_config()
def print_config(self):
print('----------- Model Configuration -----------')
print('%s: %s' % ('Model Arch', self.arch))
print('%s: ' % ('Transform Order'))
for op_info in self.preprocess_infos:
print('--%s: %s' % ('transform op', op_info['type']))
print('--------------------------------------------')
def load_predictor(model_dir,
run_mode='paddle',
batch_size=1,
device='CPU',
min_subgraph_size=3,
use_dynamic_shape=False,
trt_min_shape=1,
trt_max_shape=1280,
trt_opt_shape=640,
trt_calib_mode=False,
cpu_threads=1,
enable_mkldnn=False):
"""set AnalysisConfig, generate AnalysisPredictor
Args:
model_dir (str): root path of __model__ and __params__
device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU
run_mode (str): mode of running(paddle/trt_fp32/trt_fp16/trt_int8)
use_dynamic_shape (bool): use dynamic shape or not
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
trt_opt_shape (int): opt shape for dynamic shape in trt
        trt_calib_mode (bool): If the model is produced by TRT offline quantization
            calibration, trt_calib_mode needs to be set to True
Returns:
predictor (PaddlePredictor): AnalysisPredictor
Raises:
ValueError: predict by TensorRT need device == 'GPU'.
"""
if device != 'GPU' and run_mode != 'paddle':
raise ValueError(
"Predict by TensorRT mode: {}, expect device=='GPU', but device == {}"
.format(run_mode, device))
config = Config(
os.path.join(model_dir, 'model.pdmodel'),
os.path.join(model_dir, 'model.pdiparams'))
if device == 'GPU':
# initial GPU memory(M), device ID
config.enable_use_gpu(200, 0)
# optimize graph and fuse op
config.switch_ir_optim(True)
elif device == 'XPU':
config.enable_lite_engine()
config.enable_xpu(10 * 1024 * 1024)
else:
config.disable_gpu()
config.set_cpu_math_library_num_threads(cpu_threads)
if enable_mkldnn:
try:
# cache 10 different shapes for mkldnn to avoid memory leak
config.set_mkldnn_cache_capacity(10)
config.enable_mkldnn()
except Exception as e:
print(
"The current environment does not support `mkldnn`, so disable mkldnn."
)
pass
precision_map = {
'trt_int8': Config.Precision.Int8,
'trt_fp32': Config.Precision.Float32,
'trt_fp16': Config.Precision.Half
}
if run_mode in precision_map.keys():
config.enable_tensorrt_engine(
workspace_size=1 << 25,
max_batch_size=batch_size,
min_subgraph_size=min_subgraph_size,
precision_mode=precision_map[run_mode],
use_static=False,
use_calib_mode=trt_calib_mode)
if use_dynamic_shape:
min_input_shape = {
'image': [batch_size, 3, trt_min_shape, trt_min_shape]
}
max_input_shape = {
'image': [batch_size, 3, trt_max_shape, trt_max_shape]
}
opt_input_shape = {
'image': [batch_size, 3, trt_opt_shape, trt_opt_shape]
}
config.set_trt_dynamic_shape_info(min_input_shape, max_input_shape,
opt_input_shape)
print('trt set dynamic shape done!')
# disable print log when predict
config.disable_glog_info()
# enable shared memory
config.enable_memory_optim()
# disable feed, fetch OP, needed by zero_copy_run
config.switch_use_feed_fetch_ops(False)
predictor = create_predictor(config)
return predictor, config
def get_test_images(infer_dir, infer_img):
"""
Get image path list in TEST mode
"""
assert infer_img is not None or infer_dir is not None, \
"--infer_img or --infer_dir should be set"
assert infer_img is None or os.path.isfile(infer_img), \
"{} is not a file".format(infer_img)
assert infer_dir is None or os.path.isdir(infer_dir), \
"{} is not a directory".format(infer_dir)
# infer_img has a higher priority
if infer_img and os.path.isfile(infer_img):
return [infer_img]
images = set()
infer_dir = os.path.abspath(infer_dir)
assert os.path.isdir(infer_dir), \
"infer_dir {} is not a directory".format(infer_dir)
exts = ['jpg', 'jpeg', 'png', 'bmp']
exts += [ext.upper() for ext in exts]
for ext in exts:
images.update(glob.glob('{}/*.{}'.format(infer_dir, ext)))
images = list(images)
assert len(images) > 0, "no image found in {}".format(infer_dir)
print("Found {} inference images in total.".format(len(images)))
return images
def print_arguments(args):
print('----------- Running Arguments -----------')
for arg, value in sorted(vars(args).items()):
print('%s: %s' % (arg, value))
print('------------------------------------------')
def predict_image(detector, image_list, batch_size=1):
for i, img_file in enumerate(image_list):
        if FLAGS.run_benchmark:
            # warmup
            detector.predict(
                [img_file], FLAGS.threshold, repeats=10, add_timer=False)
            # run benchmark
            detector.predict(
                [img_file], FLAGS.threshold, repeats=10, add_timer=True)
cm, gm, gu = get_current_memory_mb()
detector.cpu_mem += cm
detector.gpu_mem += gm
detector.gpu_util += gu
print('Test iter {}'.format(i))
else:
            results = detector.predict([img_file], FLAGS.threshold)
draw_pose(
img_file,
results,
visual_thread=FLAGS.threshold,
save_dir=FLAGS.output_dir)
def predict_video(detector, camera_id):
video_out_name = 'output.mp4'
if camera_id != -1:
capture = cv2.VideoCapture(camera_id)
else:
capture = cv2.VideoCapture(FLAGS.video_file)
video_out_name = os.path.split(FLAGS.video_file)[-1]
# Get Video info : resolution, fps, frame count
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(capture.get(cv2.CAP_PROP_FPS))
frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
print("fps: %d, frame_count: %d" % (fps, frame_count))
if not os.path.exists(FLAGS.output_dir):
os.makedirs(FLAGS.output_dir)
out_path = os.path.join(FLAGS.output_dir, video_out_name)
fourcc = cv2.VideoWriter_fourcc(* 'mp4v')
writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
index = 1
while (1):
ret, frame = capture.read()
if not ret:
break
print('detect frame: %d' % (index))
index += 1
results = detector.predict([frame], FLAGS.threshold)
im = draw_pose(
frame, results, visual_thread=FLAGS.threshold, returnimg=True)
writer.write(im)
if camera_id != -1:
cv2.imshow('Mask Detection', im)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
writer.release()
def main():
pred_config = PredictConfig(FLAGS.model_dir)
detector = Detector(
pred_config,
FLAGS.model_dir,
device=FLAGS.device,
run_mode=FLAGS.run_mode,
batch_size=FLAGS.batch_size,
trt_min_shape=FLAGS.trt_min_shape,
trt_max_shape=FLAGS.trt_max_shape,
trt_opt_shape=FLAGS.trt_opt_shape,
trt_calib_mode=FLAGS.trt_calib_mode,
cpu_threads=FLAGS.cpu_threads,
enable_mkldnn=FLAGS.enable_mkldnn,
use_dark=FLAGS.use_dark)
# predict from video file or camera video stream
if FLAGS.video_file is not None or FLAGS.camera_id != -1:
predict_video(detector, FLAGS.camera_id)
else:
# predict from image
img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
predict_image(detector, img_list)
if not FLAGS.run_benchmark:
detector.det_times.info(average=True)
else:
mems = {
'cpu_rss_mb': detector.cpu_mem / len(img_list),
'gpu_rss_mb': detector.gpu_mem / len(img_list),
'gpu_util': detector.gpu_util * 100 / len(img_list)
}
perf_info = detector.det_times.report(average=True)
model_dir = FLAGS.model_dir
mode = FLAGS.run_mode
model_info = {
'model_name': model_dir.strip('/').split('/')[-1],
'precision': mode.split('_')[-1]
}
data_info = {
'batch_size': 1,
'shape': "dynamic_shape",
'data_num': perf_info['img_num']
}
det_log = PaddleInferBenchmark(detector.config, model_info,
data_info, perf_info, mems)
det_log('Det')
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
print_arguments(FLAGS)
FLAGS.device = FLAGS.device.upper()
assert FLAGS.device in ['CPU', 'GPU', 'XPU'
], "device should be CPU, GPU or XPU"
assert not FLAGS.use_gpu, "use_gpu has been deprecated, please use --device"
main()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import functools
import logging
import os
import sys
import paddle.distributed as dist
__all__ = ['setup_logger']
logger_initialized = []
def setup_logger(name="ppdet", output=None):
"""
Initialize logger and set its verbosity level to INFO.
Args:
output (str): a file name or a directory to save log. If None, will not save log file.
If ends with ".txt" or ".log", assumed to be a file name.
Otherwise, logs will be saved to `output/log.txt`.
name (str): the root module name of this logger
Returns:
logging.Logger: a logger
"""
logger = logging.getLogger(name)
if name in logger_initialized:
return logger
logger.setLevel(logging.INFO)
logger.propagate = False
formatter = logging.Formatter(
"[%(asctime)s] %(name)s %(levelname)s: %(message)s",
datefmt="%m/%d %H:%M:%S")
# stdout logging: master only
local_rank = dist.get_rank()
if local_rank == 0:
ch = logging.StreamHandler(stream=sys.stdout)
ch.setLevel(logging.DEBUG)
ch.setFormatter(formatter)
logger.addHandler(ch)
# file logging: all workers
if output is not None:
if output.endswith(".txt") or output.endswith(".log"):
filename = output
else:
filename = os.path.join(output, "log.txt")
if local_rank > 0:
filename = filename + ".rank{}".format(local_rank)
        os.makedirs(os.path.dirname(filename), exist_ok=True)
fh = logging.FileHandler(filename, mode='a')
fh.setLevel(logging.DEBUG)
fh.setFormatter(logging.Formatter())
logger.addHandler(fh)
logger_initialized.append(name)
return logger
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from scipy.optimize import linear_sum_assignment
from collections import abc, defaultdict
import cv2
import numpy as np
import math
import paddle
import paddle.nn as nn
from preprocess import get_affine_mat_kernel, get_affine_transform
class HRNetPostProcess(object):
def __init__(self, use_dark=True):
self.use_dark = use_dark
def flip_back(self, output_flipped, matched_parts):
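        # mirror the heatmaps horizontally, then swap each left/right joint pair
        # so the flipped prediction aligns with the original joint order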
assert output_flipped.ndim == 4,\
'output_flipped should be [batch_size, num_joints, height, width]'
output_flipped = output_flipped[:, :, :, ::-1]
for pair in matched_parts:
tmp = output_flipped[:, pair[0], :, :].copy()
output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :]
output_flipped[:, pair[1], :, :] = tmp
return output_flipped
def get_max_preds(self, heatmaps):
"""get predictions from score maps
Args:
heatmaps: numpy.ndarray([batch_size, num_joints, height, width])
Returns:
preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
maxvals: numpy.ndarray([batch_size, num_joints, 2]), the maximum confidence of the keypoints
"""
assert isinstance(heatmaps,
np.ndarray), 'heatmaps should be numpy.ndarray'
assert heatmaps.ndim == 4, 'batch_images should be 4-ndim'
batch_size = heatmaps.shape[0]
num_joints = heatmaps.shape[1]
width = heatmaps.shape[3]
heatmaps_reshaped = heatmaps.reshape((batch_size, num_joints, -1))
idx = np.argmax(heatmaps_reshaped, 2)
maxvals = np.amax(heatmaps_reshaped, 2)
maxvals = maxvals.reshape((batch_size, num_joints, 1))
idx = idx.reshape((batch_size, num_joints, 1))
preds = np.tile(idx, (1, 1, 2)).astype(np.float32)
preds[:, :, 0] = (preds[:, :, 0]) % width
preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)
pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
pred_mask = pred_mask.astype(np.float32)
preds *= pred_mask
return preds, maxvals
def gaussian_blur(self, heatmap, kernel):
border = (kernel - 1) // 2
batch_size = heatmap.shape[0]
num_joints = heatmap.shape[1]
height = heatmap.shape[2]
width = heatmap.shape[3]
for i in range(batch_size):
for j in range(num_joints):
origin_max = np.max(heatmap[i, j])
dr = np.zeros((height + 2 * border, width + 2 * border))
dr[border:-border, border:-border] = heatmap[i, j].copy()
dr = cv2.GaussianBlur(dr, (kernel, kernel), 0)
heatmap[i, j] = dr[border:-border, border:-border].copy()
heatmap[i, j] *= origin_max / np.max(heatmap[i, j])
return heatmap
def dark_parse(self, hm, coord):
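        # DARK-style refinement: one Newton step on the (log-)heatmap around the
        # argmax, offset = -Hessian^-1 * gradient, with derivatives estimated by
        # finite differences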
heatmap_height = hm.shape[0]
heatmap_width = hm.shape[1]
px = int(coord[0])
py = int(coord[1])
if 1 < px < heatmap_width - 2 and 1 < py < heatmap_height - 2:
dx = 0.5 * (hm[py][px + 1] - hm[py][px - 1])
dy = 0.5 * (hm[py + 1][px] - hm[py - 1][px])
dxx = 0.25 * (hm[py][px + 2] - 2 * hm[py][px] + hm[py][px - 2])
dxy = 0.25 * (hm[py+1][px+1] - hm[py-1][px+1] - hm[py+1][px-1] \
+ hm[py-1][px-1])
dyy = 0.25 * (
hm[py + 2 * 1][px] - 2 * hm[py][px] + hm[py - 2 * 1][px])
derivative = np.matrix([[dx], [dy]])
hessian = np.matrix([[dxx, dxy], [dxy, dyy]])
if dxx * dyy - dxy**2 != 0:
hessianinv = hessian.I
offset = -hessianinv * derivative
offset = np.squeeze(np.array(offset.T), axis=0)
coord += offset
return coord
def dark_postprocess(self, hm, coords, kernelsize):
"""
refer to https://github.com/ilovepose/DarkPose/lib/core/inference.py
"""
hm = self.gaussian_blur(hm, kernelsize)
hm = np.maximum(hm, 1e-10)
hm = np.log(hm)
for n in range(coords.shape[0]):
for p in range(coords.shape[1]):
coords[n, p] = self.dark_parse(hm[n][p], coords[n][p])
return coords
def get_final_preds(self, heatmaps, center, scale, kernelsize=3):
"""the highest heatvalue location with a quarter offset in the
direction from the highest response to the second highest response.
Args:
heatmaps (numpy.ndarray): The predicted heatmaps
center (numpy.ndarray): The boxes center
scale (numpy.ndarray): The scale factor
Returns:
preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints
"""
coords, maxvals = self.get_max_preds(heatmaps)
heatmap_height = heatmaps.shape[2]
heatmap_width = heatmaps.shape[3]
if self.use_dark:
coords = self.dark_postprocess(heatmaps, coords, kernelsize)
else:
for n in range(coords.shape[0]):
for p in range(coords.shape[1]):
hm = heatmaps[n][p]
px = int(math.floor(coords[n][p][0] + 0.5))
py = int(math.floor(coords[n][p][1] + 0.5))
if 1 < px < heatmap_width - 1 and 1 < py < heatmap_height - 1:
diff = np.array([
hm[py][px + 1] - hm[py][px - 1],
hm[py + 1][px] - hm[py - 1][px]
])
coords[n][p] += np.sign(diff) * .25
preds = coords.copy()
# Transform back
for i in range(coords.shape[0]):
preds[i] = transform_preds(coords[i], center[i], scale[i],
[heatmap_width, heatmap_height])
return preds, maxvals
def __call__(self, output, center, scale):
preds, maxvals = self.get_final_preds(output, center, scale)
return np.concatenate(
(preds, maxvals), axis=-1), np.mean(
maxvals, axis=1)
def transform_preds(coords, center, scale, output_size):
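    # map heatmap-space coordinates back to the original image with the inverse
    # affine transform; scale * 200 undoes the pixel_std normalization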
target_coords = np.zeros(coords.shape)
trans = get_affine_transform(center, scale * 200, 0, output_size, inv=1)
for p in range(coords.shape[0]):
target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans)
return target_coords
def affine_transform(pt, t):
new_pt = np.array([pt[0], pt[1], 1.]).T
new_pt = np.dot(t, new_pt)
return new_pt[:2]
def translate_to_ori_images(keypoint_result, batch_records):
kpts, scores = keypoint_result['keypoint']
kpts[..., 0] += batch_records[:, 0:1]
kpts[..., 1] += batch_records[:, 1:2]
return kpts, scores
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import cv2
import numpy as np
def decode_image(im_file, im_info):
"""read rgb image
Args:
im_file (str|np.ndarray): input can be image path or np.ndarray
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
if isinstance(im_file, str):
with open(im_file, 'rb') as f:
im_read = f.read()
data = np.frombuffer(im_read, dtype='uint8')
im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
else:
im = im_file
im_info['im_shape'] = np.array(im.shape[:2], dtype=np.float32)
im_info['scale_factor'] = np.array([1., 1.], dtype=np.float32)
return im, im_info
class Resize(object):
"""resize image by target_size and max_size
Args:
target_size (int): the target size of image
keep_ratio (bool): whether keep_ratio or not, default true
interp (int): method of resize
"""
def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR):
if isinstance(target_size, int):
target_size = [target_size, target_size]
self.target_size = target_size
self.keep_ratio = keep_ratio
self.interp = interp
def __call__(self, im, im_info):
"""
Args:
im (np.ndarray): image (np.ndarray)
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
assert len(self.target_size) == 2
assert self.target_size[0] > 0 and self.target_size[1] > 0
im_channel = im.shape[2]
im_scale_y, im_scale_x = self.generate_scale(im)
im = cv2.resize(
im,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=self.interp)
im_info['im_shape'] = np.array(im.shape[:2]).astype('float32')
im_info['scale_factor'] = np.array(
[im_scale_y, im_scale_x]).astype('float32')
return im, im_info
def generate_scale(self, im):
"""
Args:
im (np.ndarray): image (np.ndarray)
Returns:
im_scale_x: the resize ratio of X
im_scale_y: the resize ratio of Y
"""
origin_shape = im.shape[:2]
im_c = im.shape[2]
if self.keep_ratio:
im_size_min = np.min(origin_shape)
im_size_max = np.max(origin_shape)
target_size_min = np.min(self.target_size)
target_size_max = np.max(self.target_size)
im_scale = float(target_size_min) / float(im_size_min)
if np.round(im_scale * im_size_max) > target_size_max:
im_scale = float(target_size_max) / float(im_size_max)
im_scale_x = im_scale
im_scale_y = im_scale
else:
resize_h, resize_w = self.target_size
im_scale_y = resize_h / float(origin_shape[0])
im_scale_x = resize_w / float(origin_shape[1])
return im_scale_y, im_scale_x
class NormalizeImage(object):
"""normalize image
Args:
mean (list): im - mean
std (list): im / std
is_scale (bool): whether need im / 255
is_channel_first (bool): if True: image shape is CHW, else: HWC
"""
def __init__(self, mean, std, is_scale=True):
self.mean = mean
self.std = std
self.is_scale = is_scale
def __call__(self, im, im_info):
"""
Args:
im (np.ndarray): image (np.ndarray)
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
im = im.astype(np.float32, copy=False)
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
if self.is_scale:
im = im / 255.0
im -= mean
im /= std
return im, im_info
class Permute(object):
"""permute image
Args:
to_bgr (bool): whether convert RGB to BGR
channel_first (bool): whether convert HWC to CHW
"""
def __init__(self, ):
super(Permute, self).__init__()
def __call__(self, im, im_info):
"""
Args:
im (np.ndarray): image (np.ndarray)
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
im = im.transpose((2, 0, 1)).copy()
return im, im_info
class PadStride(object):
""" padding image for model with FPN, instead PadBatch(pad_to_stride) in original config
Args:
stride (bool): model with FPN need image shape % stride == 0
"""
def __init__(self, stride=0):
self.coarsest_stride = stride
def __call__(self, im, im_info):
"""
Args:
im (np.ndarray): image (np.ndarray)
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
coarsest_stride = self.coarsest_stride
if coarsest_stride <= 0:
return im, im_info
im_c, im_h, im_w = im.shape
pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride)
pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride)
padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32)
padding_im[:, :im_h, :im_w] = im
return padding_im, im_info
class WarpAffine(object):
"""Warp affine the image
"""
def __init__(self,
keep_res=False,
pad=31,
input_h=512,
input_w=512,
scale=0.4,
shift=0.1):
self.keep_res = keep_res
self.pad = pad
self.input_h = input_h
self.input_w = input_w
self.scale = scale
self.shift = shift
def __call__(self, im, im_info):
"""
Args:
im (np.ndarray): image (np.ndarray)
im_info (dict): info of image
Returns:
im (np.ndarray): processed image (np.ndarray)
im_info (dict): info of processed image
"""
img = cv2.cvtColor(im, cv2.COLOR_RGB2BGR)
h, w = img.shape[:2]
if self.keep_res:
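            # bit trick: (x | self.pad) + 1 rounds x up to the next multiple of
            # self.pad + 1 (32 by default), keeping the resolution stride friendly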
input_h = (h | self.pad) + 1
input_w = (w | self.pad) + 1
s = np.array([input_w, input_h], dtype=np.float32)
c = np.array([w // 2, h // 2], dtype=np.float32)
else:
s = max(h, w) * 1.0
input_h, input_w = self.input_h, self.input_w
c = np.array([w / 2., h / 2.], dtype=np.float32)
trans_input = get_affine_transform(c, s, 0, [input_w, input_h])
img = cv2.resize(img, (w, h))
inp = cv2.warpAffine(
img, trans_input, (input_w, input_h), flags=cv2.INTER_LINEAR)
return inp, im_info
class EvalAffine(object):
def __init__(self, size, stride=64):
super(EvalAffine, self).__init__()
self.size = size
self.stride = stride
def __call__(self, image, im_info):
s = self.size
h, w, _ = image.shape
trans, size_resized = get_affine_mat_kernel(h, w, s, inv=False)
image_resized = cv2.warpAffine(image, trans, size_resized)
return image_resized, im_info
def get_affine_mat_kernel(h, w, s, inv=False):
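    # resize the short side to s and round the long side up to a multiple of 64
    # (stride friendly), then build the matching affine matrix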
if w < h:
w_ = s
h_ = int(np.ceil((s / w * h) / 64.) * 64)
scale_w = w
scale_h = h_ / w_ * w
else:
h_ = s
w_ = int(np.ceil((s / h * w) / 64.) * 64)
scale_h = h
scale_w = w_ / h_ * h
center = np.array([np.round(w / 2.), np.round(h / 2.)])
size_resized = (w_, h_)
trans = get_affine_transform(
center, np.array([scale_w, scale_h]), 0, size_resized, inv=inv)
return trans, size_resized
def get_affine_transform(center,
input_size,
rot,
output_size,
shift=(0., 0.),
inv=False):
"""Get the affine transform matrix, given the center/scale/rot/output_size.
Args:
center (np.ndarray[2, ]): Center of the bounding box (x, y).
scale (np.ndarray[2, ]): Scale of the bounding box
wrt [width, height].
rot (float): Rotation angle (degree).
output_size (np.ndarray[2, ]): Size of the destination heatmaps.
shift (0-100%): Shift translation ratio wrt the width/height.
Default (0., 0.).
inv (bool): Option to inverse the affine transform direction.
(inv=False: src->dst or inv=True: dst->src)
Returns:
np.ndarray: The transform matrix.
"""
assert len(center) == 2
assert len(output_size) == 2
assert len(shift) == 2
if not isinstance(input_size, (np.ndarray, list)):
input_size = np.array([input_size, input_size], dtype=np.float32)
scale_tmp = input_size
shift = np.array(shift)
src_w = scale_tmp[0]
dst_w = output_size[0]
dst_h = output_size[1]
rot_rad = np.pi * rot / 180
src_dir = rotate_point([0., src_w * -0.5], rot_rad)
dst_dir = np.array([0., dst_w * -0.5])
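    # three corresponding point pairs (the box center, a rotated point above it,
    # and a third obtained by a 90-degree rotation) uniquely determine the affine
    # matrix, which cv2.getAffineTransform solves for below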
src = np.zeros((3, 2), dtype=np.float32)
src[0, :] = center + scale_tmp * shift
src[1, :] = center + src_dir + scale_tmp * shift
src[2, :] = _get_3rd_point(src[0, :], src[1, :])
dst = np.zeros((3, 2), dtype=np.float32)
dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir
dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :])
if inv:
trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
else:
trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))
return trans
def get_warp_matrix(theta, size_input, size_dst, size_target):
"""This code is based on
https://github.com/open-mmlab/mmpose/blob/master/mmpose/core/post_processing/post_transforms.py
Calculate the transformation matrix under the constraint of unbiased.
Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased
Data Processing for Human Pose Estimation (CVPR 2020).
Args:
theta (float): Rotation angle in degrees.
size_input (np.ndarray): Size of input image [w, h].
size_dst (np.ndarray): Size of output image [w, h].
size_target (np.ndarray): Size of ROI in input plane [w, h].
Returns:
matrix (np.ndarray): A matrix for transformation.
"""
theta = np.deg2rad(theta)
matrix = np.zeros((2, 3), dtype=np.float32)
scale_x = size_dst[0] / size_target[0]
scale_y = size_dst[1] / size_target[1]
matrix[0, 0] = np.cos(theta) * scale_x
matrix[0, 1] = -np.sin(theta) * scale_x
matrix[0, 2] = scale_x * (
-0.5 * size_input[0] * np.cos(theta) + 0.5 * size_input[1] *
np.sin(theta) + 0.5 * size_target[0])
matrix[1, 0] = np.sin(theta) * scale_y
matrix[1, 1] = np.cos(theta) * scale_y
matrix[1, 2] = scale_y * (
-0.5 * size_input[0] * np.sin(theta) - 0.5 * size_input[1] *
np.cos(theta) + 0.5 * size_target[1])
return matrix
def rotate_point(pt, angle_rad):
"""Rotate a point by an angle.
Args:
pt (list[float]): 2 dimensional point to be rotated
angle_rad (float): rotation angle by radian
Returns:
list[float]: Rotated point.
"""
assert len(pt) == 2
sn, cs = np.sin(angle_rad), np.cos(angle_rad)
new_x = pt[0] * cs - pt[1] * sn
new_y = pt[0] * sn + pt[1] * cs
rotated_pt = [new_x, new_y]
return rotated_pt
def _get_3rd_point(a, b):
"""To calculate the affine matrix, three pairs of points are required. This
function is used to get the 3rd point, given 2D points a & b.
The 3rd point is defined by rotating vector `a - b` by 90 degrees
anticlockwise, using b as the rotation center.
Args:
a (np.ndarray): point(x,y)
b (np.ndarray): point(x,y)
Returns:
np.ndarray: The 3rd point.
"""
assert len(a) == 2
assert len(b) == 2
direction = a - b
third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32)
return third_pt
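# Illustrative check (not part of the original source): with a = (1, 0) and
# b = (0, 0), the vector a - b rotated 90 degrees anticlockwise about b
# yields (0, 1):
#   _get_3rd_point(np.array([1., 0.]), np.array([0., 0.]))  # -> [0., 1.]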
class TopDownEvalAffine(object):
"""apply affine transform to image and coords
Args:
trainsize (list): [w, h], the standard size used to train
use_udp (bool): whether to use Unbiased Data Processing.
        records (dict): dict containing the image and coords
    Returns:
        records (dict): the image and coords after the transform
"""
def __init__(self, trainsize, use_udp=False):
self.trainsize = trainsize
self.use_udp = use_udp
def __call__(self, image, im_info):
rot = 0
imshape = im_info['im_shape'][::-1]
center = im_info['center'] if 'center' in im_info else imshape / 2.
scale = im_info['scale'] if 'scale' in im_info else imshape
if self.use_udp:
trans = get_warp_matrix(
rot, center * 2.0,
[self.trainsize[0] - 1.0, self.trainsize[1] - 1.0], scale)
image = cv2.warpAffine(
image,
trans, (int(self.trainsize[0]), int(self.trainsize[1])),
flags=cv2.INTER_LINEAR)
else:
trans = get_affine_transform(center, scale, rot, self.trainsize)
image = cv2.warpAffine(
image,
trans, (int(self.trainsize[0]), int(self.trainsize[1])),
flags=cv2.INTER_LINEAR)
return image, im_info
def expand_crop(images, rect, expand_ratio=0.3):
imgh, imgw, c = images.shape
label, conf, xmin, ymin, xmax, ymax = [int(x) for x in rect.tolist()]
if label != 0:
return None, None, None
org_rect = [xmin, ymin, xmax, ymax]
h_half = (ymax - ymin) * (1 + expand_ratio) / 2.
w_half = (xmax - xmin) * (1 + expand_ratio) / 2.
if h_half > w_half * 4 / 3:
w_half = h_half * 0.75
center = [(ymin + ymax) / 2., (xmin + xmax) / 2.]
ymin = max(0, int(center[0] - h_half))
ymax = min(imgh - 1, int(center[0] + h_half))
xmin = max(0, int(center[1] - w_half))
xmax = min(imgw - 1, int(center[1] + w_half))
return images[ymin:ymax, xmin:xmax, :], [xmin, ymin, xmax, ymax], org_rect
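# A minimal usage sketch (illustrative only): `rect` follows the detector
# output layout [label, confidence, xmin, ymin, xmax, ymax]; label 0 is the
# person class, and the box is expanded by expand_ratio on each side:
def _demo_expand_crop():
    frame = np.zeros((480, 640, 3), dtype=np.uint8)  # hypothetical frame
    rect = np.array([0, 1, 200, 100, 300, 300], dtype=np.float32)
    crop, new_rect, org_rect = expand_crop(frame, rect, expand_ratio=0.3)
    return crop.shape, new_rect, org_rect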
def preprocess(im, preprocess_ops):
# process image by preprocess_ops
im_info = {
'scale_factor': np.array(
[1., 1.], dtype=np.float32),
'im_shape': None,
}
im, im_info = decode_image(im, im_info)
for operator in preprocess_ops:
im, im_info = operator(im, im_info)
return im, im_info
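# A minimal usage sketch (illustrative only), assuming `decode_image`
# (referenced above) accepts an image file path; 'demo.jpg' is a
# hypothetical placeholder:
def _demo_preprocess():
    ops = [EvalAffine(size=512)]
    im, im_info = preprocess('demo.jpg', ops)
    return im.shape, im_info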
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import os
import ast
import argparse
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--model_dir",
type=str,
default=None,
help=("Directory include:'model.pdiparams', 'model.pdmodel', "
"'infer_cfg.yml', created by tools/export_model.py."),
required=True)
parser.add_argument(
"--image_file", type=str, default=None, help="Path of image file.")
parser.add_argument(
"--image_dir",
type=str,
default=None,
help="Dir of image file, `image_file` has a higher priority.")
parser.add_argument(
"--batch_size", type=int, default=1, help="batch_size for inference.")
parser.add_argument(
"--video_file",
type=str,
default=None,
help="Path of video file, `video_file` or `camera_id` has a highest priority."
)
parser.add_argument(
"--camera_id",
type=int,
default=-1,
help="device id of camera to predict.")
parser.add_argument(
"--threshold", type=float, default=0.5, help="Threshold of score.")
parser.add_argument(
"--output_dir",
type=str,
default="output",
help="Directory of output visualization files.")
parser.add_argument(
"--run_mode",
type=str,
default='paddle',
help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)")
parser.add_argument(
"--device",
type=str,
default='cpu',
help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU."
)
parser.add_argument(
"--use_gpu",
type=ast.literal_eval,
default=False,
help="Deprecated, please use `--device`.")
parser.add_argument(
"--run_benchmark",
type=ast.literal_eval,
default=False,
help="Whether to predict a image_file repeatedly for benchmark")
parser.add_argument(
"--enable_mkldnn",
type=ast.literal_eval,
default=False,
help="Whether use mkldnn with CPU.")
parser.add_argument(
"--cpu_threads", type=int, default=1, help="Num of threads with CPU.")
parser.add_argument(
"--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.")
parser.add_argument(
"--trt_max_shape",
type=int,
default=1280,
help="max_shape for TensorRT.")
parser.add_argument(
"--trt_opt_shape",
type=int,
default=640,
help="opt_shape for TensorRT.")
parser.add_argument(
"--trt_calib_mode",
type=bool,
default=False,
help="If the model is produced by TRT offline quantitative "
"calibration, trt_calib_mode need to set True.")
parser.add_argument(
'--save_images',
action='store_true',
help='Save visualization image results.')
parser.add_argument(
'--use_dark',
type=bool,
default=True,
        help='whether to use DarkPose post-processing for more accurate keypoint predictions')
return parser
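# A minimal usage sketch (illustrative only; the model directory below is a
# hypothetical placeholder):
def _demo_argsparser():
    parser = argsparser()
    args = parser.parse_args(
        ['--model_dir', 'output_inference/hrnet', '--device', 'GPU'])
    return args.device, args.batch_size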
class Times(object):
def __init__(self):
self.time = 0.
# start time
self.st = 0.
# end time
self.et = 0.
def start(self):
self.st = time.time()
def end(self, repeats=1, accumulative=True):
self.et = time.time()
if accumulative:
self.time += (self.et - self.st) / repeats
else:
self.time = (self.et - self.st) / repeats
def reset(self):
self.time = 0.
self.st = 0.
self.et = 0.
def value(self):
return round(self.time, 4)
class Timer(Times):
def __init__(self):
super(Timer, self).__init__()
self.preprocess_time_s = Times()
self.inference_time_s = Times()
self.postprocess_time_s = Times()
self.img_num = 0
def info(self, average=False):
total_time = self.preprocess_time_s.value(
) + self.inference_time_s.value() + self.postprocess_time_s.value()
total_time = round(total_time, 4)
print("------------------ Inference Time Info ----------------------")
print("total_time(ms): {}, img_num: {}".format(total_time * 1000,
self.img_num))
preprocess_time = round(
self.preprocess_time_s.value() / max(1, self.img_num),
4) if average else self.preprocess_time_s.value()
postprocess_time = round(
self.postprocess_time_s.value() / max(1, self.img_num),
4) if average else self.postprocess_time_s.value()
inference_time = round(self.inference_time_s.value() /
max(1, self.img_num),
4) if average else self.inference_time_s.value()
average_latency = total_time / max(1, self.img_num)
qps = 0
if total_time > 0:
qps = 1 / average_latency
print("average latency time(ms): {:.2f}, QPS: {:2f}".format(
average_latency * 1000, qps))
print(
"preprocess_time(ms): {:.2f}, inference_time(ms): {:.2f}, postprocess_time(ms): {:.2f}".
format(preprocess_time * 1000, inference_time * 1000,
postprocess_time * 1000))
def report(self, average=False):
dic = {}
dic['preprocess_time_s'] = round(
self.preprocess_time_s.value() / max(1, self.img_num),
4) if average else self.preprocess_time_s.value()
dic['postprocess_time_s'] = round(
self.postprocess_time_s.value() / max(1, self.img_num),
4) if average else self.postprocess_time_s.value()
dic['inference_time_s'] = round(
self.inference_time_s.value() / max(1, self.img_num),
4) if average else self.inference_time_s.value()
dic['img_num'] = self.img_num
total_time = self.preprocess_time_s.value(
) + self.inference_time_s.value() + self.postprocess_time_s.value()
dic['total_time_s'] = round(total_time, 4)
return dic
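# A minimal usage sketch (illustrative only) of the accumulating timers:
def _demo_timer():
    timer = Timer()
    timer.preprocess_time_s.start()
    time.sleep(0.01)  # stand-in for real preprocessing work
    timer.preprocess_time_s.end()
    timer.img_num += 1
    timer.info(average=True)
    return timer.report(average=True)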
def get_current_memory_mb():
"""
    Obtain the CPU and GPU memory usage of the current process.
    Note that this function itself is time-consuming.
"""
import pynvml
import psutil
import GPUtil
gpu_id = int(os.environ.get('CUDA_VISIBLE_DEVICES', 0))
pid = os.getpid()
p = psutil.Process(pid)
info = p.memory_full_info()
cpu_mem = info.uss / 1024. / 1024.
gpu_mem = 0
gpu_percent = 0
gpus = GPUtil.getGPUs()
if gpu_id is not None and len(gpus) > 0:
gpu_percent = gpus[gpu_id].load
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
gpu_mem = meminfo.used / 1024. / 1024.
return round(cpu_mem, 4), round(gpu_mem, 4), round(gpu_percent, 4)
# coding: utf-8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import division
import os
import cv2
import numpy as np
from PIL import Image, ImageDraw, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
import math
from lib.utils.logger import setup_logger
logger = setup_logger(__name__)
def get_color(idx):
idx = idx * 3
color = ((37 * idx) % 255, (17 * idx) % 255, (29 * idx) % 255)
return color
def draw_pose(imgfile,
results,
visual_thread=0.6,
save_name='pose.jpg',
save_dir='output',
returnimg=False,
ids=None):
try:
import matplotlib.pyplot as plt
import matplotlib
plt.switch_backend('agg')
except Exception as e:
        logger.error('Matplotlib not found, please install matplotlib, '
                     'for example: `pip install matplotlib`.')
raise e
skeletons, scores = results['keypoint']
skeletons = np.array(skeletons)
kpt_nums = 17
if len(skeletons) > 0:
kpt_nums = skeletons.shape[1]
if kpt_nums == 17: #plot coco keypoint
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7),
(6, 8), (7, 9), (8, 10), (5, 11), (6, 12), (11, 13), (12, 14),
(13, 15), (14, 16), (11, 12)]
else: #plot mpii keypoint
EDGES = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 6), (3, 6), (6, 7),
(7, 8), (8, 9), (10, 11), (11, 12), (13, 14), (14, 15),
(8, 12), (8, 13)]
NUM_EDGES = len(EDGES)
colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0], \
[0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], \
[170, 0, 255], [255, 0, 255], [255, 0, 170], [255, 0, 85]]
cmap = matplotlib.cm.get_cmap('hsv')
plt.figure()
img = cv2.imread(imgfile) if type(imgfile) == str else imgfile
color_set = results['colors'] if 'colors' in results else None
if 'bbox' in results and ids is None:
bboxs = results['bbox']
for j, rect in enumerate(bboxs):
xmin, ymin, xmax, ymax = rect
color = colors[0] if color_set is None else colors[color_set[j] %
len(colors)]
cv2.rectangle(img, (xmin, ymin), (xmax, ymax), color, 1)
canvas = img.copy()
for i in range(kpt_nums):
for j in range(len(skeletons)):
if skeletons[j][i, 2] < visual_thread:
continue
if ids is None:
color = colors[i] if color_set is None else colors[color_set[j]
%
len(colors)]
else:
color = get_color(ids[j])
cv2.circle(
canvas,
tuple(skeletons[j][i, 0:2].astype('int32')),
2,
color,
thickness=-1)
to_plot = cv2.addWeighted(img, 0.3, canvas, 0.7, 0)
fig = matplotlib.pyplot.gcf()
stickwidth = 2
for i in range(NUM_EDGES):
for j in range(len(skeletons)):
edge = EDGES[i]
if skeletons[j][edge[0], 2] < visual_thread or skeletons[j][edge[
1], 2] < visual_thread:
continue
cur_canvas = canvas.copy()
X = [skeletons[j][edge[0], 1], skeletons[j][edge[1], 1]]
Y = [skeletons[j][edge[0], 0], skeletons[j][edge[1], 0]]
mX = np.mean(X)
mY = np.mean(Y)
length = ((X[0] - X[1])**2 + (Y[0] - Y[1])**2)**0.5
angle = math.degrees(math.atan2(X[0] - X[1], Y[0] - Y[1]))
polygon = cv2.ellipse2Poly((int(mY), int(mX)),
(int(length / 2), stickwidth),
int(angle), 0, 360, 1)
if ids is None:
color = colors[i] if color_set is None else colors[color_set[j]
%
len(colors)]
else:
color = get_color(ids[j])
cv2.fillConvexPoly(cur_canvas, polygon, color)
canvas = cv2.addWeighted(canvas, 0.4, cur_canvas, 0.6, 0)
if returnimg:
return canvas
save_name = os.path.join(
save_dir, os.path.splitext(os.path.basename(imgfile))[0] + '_vis.jpg')
plt.imsave(save_name, canvas[:, :, ::-1])
print("keypoint visualize image saved to: " + save_name)
plt.close()
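# A minimal usage sketch (illustrative only, not part of the original module).
# results['keypoint'] is a pair (skeletons, scores): skeletons has shape
# [num_person, 17, 3] holding (x, y, score) per COCO keypoint:
def _demo_draw_pose():
    skeletons = np.zeros((1, 17, 3), dtype=np.float32)
    skeletons[0, :, 0] = np.linspace(50, 150, 17)  # x coordinates
    skeletons[0, :, 1] = np.linspace(50, 200, 17)  # y coordinates
    skeletons[0, :, 2] = 1.0  # scores above the default visual_thread
    results = {'keypoint': (skeletons, np.ones((1, )))}
    canvas = np.full((256, 256, 3), 255, dtype=np.uint8)
    return draw_pose(canvas, results, returnimg=True)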
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import lib.utils
import lib.models
import lib.metrics
import lib.dataset
import lib.core
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import callbacks
from . import optimizer
from . import trainer
from .callbacks import *
from .optimizer import *
from .trainer import *
__all__ = callbacks.__all__ \
+ optimizer.__all__ + trainer.__all__
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import datetime
import six
import copy
import json
import paddle
import paddle.distributed as dist
from lib.utils.checkpoint import save_model
from lib.metrics.coco_utils import get_infer_results
from lib.utils.logger import setup_logger
logger = setup_logger('hrnet')
__all__ = [
'Callback', 'ComposeCallback', 'LogPrinter', 'Checkpointer',
'VisualDLWriter'
]
class Callback(object):
def __init__(self, model):
self.model = model
def on_step_begin(self, status):
pass
def on_step_end(self, status):
pass
def on_epoch_begin(self, status):
pass
def on_epoch_end(self, status):
pass
def on_train_begin(self, status):
pass
def on_train_end(self, status):
pass
class ComposeCallback(object):
def __init__(self, callbacks):
callbacks = [c for c in list(callbacks) if c is not None]
for c in callbacks:
assert isinstance(
c, Callback), "callback should be subclass of Callback"
self._callbacks = callbacks
def on_step_begin(self, status):
for c in self._callbacks:
c.on_step_begin(status)
def on_step_end(self, status):
for c in self._callbacks:
c.on_step_end(status)
def on_epoch_begin(self, status):
for c in self._callbacks:
c.on_epoch_begin(status)
def on_epoch_end(self, status):
for c in self._callbacks:
c.on_epoch_end(status)
def on_train_begin(self, status):
for c in self._callbacks:
c.on_train_begin(status)
def on_train_end(self, status):
for c in self._callbacks:
c.on_train_end(status)
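# A minimal sketch (illustrative only) of extending the callback mechanism;
# instances like this can be passed to Trainer.register_callbacks (defined
# later in this repo):
class _DemoStepCounter(Callback):
    def __init__(self, model):
        super(_DemoStepCounter, self).__init__(model)
        self.train_steps = 0

    def on_step_end(self, status):
        if status.get('mode') == 'train':
            self.train_steps += 1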
class LogPrinter(Callback):
def __init__(self, model):
super(LogPrinter, self).__init__(model)
def on_step_end(self, status):
if dist.get_world_size() < 2 or dist.get_rank() == 0:
mode = status['mode']
if mode == 'train':
epoch_id = status['epoch_id']
step_id = status['step_id']
steps_per_epoch = status['steps_per_epoch']
training_staus = status['training_staus']
batch_time = status['batch_time']
data_time = status['data_time']
epoches = self.model.cfg.epoch
batch_size = self.model.cfg['{}Reader'.format(mode.capitalize(
))]['batch_size']
logs = training_staus.log()
space_fmt = ':' + str(len(str(steps_per_epoch))) + 'd'
if step_id % self.model.cfg.log_iter == 0:
eta_steps = (epoches - epoch_id
) * steps_per_epoch - step_id
eta_sec = eta_steps * batch_time.global_avg
eta_str = str(datetime.timedelta(seconds=int(eta_sec)))
ips = float(batch_size) / batch_time.avg
fmt = ' '.join([
'Epoch: [{}]',
'[{' + space_fmt + '}/{}]',
'learning_rate: {lr:.6f}',
'{meters}',
'eta: {eta}',
'batch_cost: {btime}',
'data_cost: {dtime}',
'ips: {ips:.4f} images/s',
])
fmt = fmt.format(
epoch_id,
step_id,
steps_per_epoch,
lr=status['learning_rate'],
meters=logs,
eta=eta_str,
btime=str(batch_time),
dtime=str(data_time),
ips=ips)
logger.info(fmt)
if mode == 'eval':
step_id = status['step_id']
if step_id % 100 == 0:
logger.info("Eval iter: {}".format(step_id))
def on_epoch_end(self, status):
if dist.get_world_size() < 2 or dist.get_rank() == 0:
mode = status['mode']
if mode == 'eval':
sample_num = status['sample_num']
cost_time = status['cost_time']
                logger.info('Total sample number: {}, average FPS: {}'.format(
sample_num, sample_num / cost_time))
class Checkpointer(Callback):
def __init__(self, model):
super(Checkpointer, self).__init__(model)
cfg = self.model.cfg
self.best_ap = 0.
self.save_dir = os.path.join(self.model.cfg.save_dir,
self.model.cfg.filename)
if hasattr(self.model.model, 'student_model'):
self.weight = self.model.model.student_model
else:
self.weight = self.model.model
def on_epoch_end(self, status):
        # Checkpointer is only invoked during training
mode = status['mode']
epoch_id = status['epoch_id']
weight = None
save_name = None
if dist.get_world_size() < 2 or dist.get_rank() == 0:
if mode == 'train':
end_epoch = self.model.cfg.epoch
if (
epoch_id + 1
) % self.model.cfg.snapshot_epoch == 0 or epoch_id == end_epoch - 1:
save_name = str(
epoch_id
) if epoch_id != end_epoch - 1 else "model_final"
weight = self.weight
elif mode == 'eval':
if 'save_best_model' in status and status['save_best_model']:
for metric in self.model._metrics:
map_res = metric.get_results()
if 'bbox' in map_res:
key = 'bbox'
elif 'keypoint' in map_res:
key = 'keypoint'
else:
key = 'mask'
if key not in map_res:
logger.warning("Evaluation results empty, this may be due to " \
"training iterations being too few or not " \
"loading the correct weights.")
return
if map_res[key][0] > self.best_ap:
self.best_ap = map_res[key][0]
save_name = 'best_model'
weight = self.weight
logger.info("Best test {} ap is {:0.3f}.".format(
key, self.best_ap))
if weight:
save_model(weight, self.model.optimizer, self.save_dir,
save_name, epoch_id + 1)
class VisualDLWriter(Callback):
"""
    Use VisualDL to log scalar data or images
"""
def __init__(self, model):
super(VisualDLWriter, self).__init__(model)
assert six.PY3, "VisualDL requires Python >= 3.5"
try:
from visualdl import LogWriter
except Exception as e:
            logger.error('visualdl not found, please install visualdl, '
                         'for example: `pip install visualdl`.')
raise e
self.vdl_writer = LogWriter(
model.cfg.get('vdl_log_dir', 'vdl_log_dir/scalar'))
self.vdl_loss_step = 0
self.vdl_mAP_step = 0
self.vdl_image_step = 0
self.vdl_image_frame = 0
def on_step_end(self, status):
mode = status['mode']
if dist.get_world_size() < 2 or dist.get_rank() == 0:
if mode == 'train':
training_staus = status['training_staus']
for loss_name, loss_value in training_staus.get().items():
self.vdl_writer.add_scalar(loss_name, loss_value,
self.vdl_loss_step)
self.vdl_loss_step += 1
elif mode == 'test':
ori_image = status['original_image']
result_image = status['result_image']
self.vdl_writer.add_image(
"original/frame_{}".format(self.vdl_image_frame),
ori_image, self.vdl_image_step)
self.vdl_writer.add_image(
"result/frame_{}".format(self.vdl_image_frame),
result_image, self.vdl_image_step)
self.vdl_image_step += 1
# each frame can display ten pictures at most.
if self.vdl_image_step % 10 == 0:
self.vdl_image_step = 0
self.vdl_image_frame += 1
def on_epoch_end(self, status):
mode = status['mode']
if dist.get_world_size() < 2 or dist.get_rank() == 0:
if mode == 'eval':
for metric in self.model._metrics:
for key, map_value in metric.get_results().items():
self.vdl_writer.add_scalar("{}-mAP".format(key),
map_value[0],
self.vdl_mAP_step)
self.vdl_mAP_step += 1
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import yaml
from collections import OrderedDict
import paddle
from lib.dataset.category import get_categories
from lib.utils.logger import setup_logger
logger = setup_logger('hrnet')
# Global dictionary
TRT_MIN_SUBGRAPH = {'HRNet': 3, }
def _prune_input_spec(input_spec, program, targets):
# try to prune static program to figure out pruned input spec
# so we perform following operations in static mode
paddle.enable_static()
pruned_input_spec = [{}]
program = program.clone()
program = program._prune(targets=targets)
global_block = program.global_block()
for name, spec in input_spec[0].items():
try:
v = global_block.var(name)
pruned_input_spec[0][name] = spec
except Exception:
pass
paddle.disable_static()
return pruned_input_spec
def _parse_reader(reader_cfg, dataset_cfg, metric, arch, image_shape):
preprocess_list = []
anno_file = dataset_cfg.get_anno()
clsid2catid, catid2name = get_categories(metric, anno_file, arch)
label_list = [str(cat) for cat in catid2name.values()]
fuse_normalize = reader_cfg.get('fuse_normalize', False)
sample_transforms = reader_cfg['sample_transforms']
for st in sample_transforms[1:]:
for key, value in st.items():
p = {'type': key}
if key == 'Resize':
if int(image_shape[1]) != -1:
value['target_size'] = image_shape[1:]
if fuse_normalize and key == 'NormalizeImage':
continue
p.update(value)
preprocess_list.append(p)
return preprocess_list, label_list
def _parse_tracker(tracker_cfg):
tracker_params = {}
for k, v in tracker_cfg.items():
tracker_params.update({k: v})
return tracker_params
def _dump_infer_config(config, path, image_shape, model):
arch_state = False
from lib.utils.config.yaml_helpers import setup_orderdict
setup_orderdict()
use_dynamic_shape = True if image_shape[2] == -1 else False
infer_cfg = OrderedDict({
'mode': 'fluid',
'draw_threshold': 0.5,
'metric': config['metric'],
'use_dynamic_shape': use_dynamic_shape
})
infer_arch = config['architecture']
for arch, min_subgraph_size in TRT_MIN_SUBGRAPH.items():
if arch in infer_arch:
infer_cfg['arch'] = arch
infer_cfg['min_subgraph_size'] = min_subgraph_size
arch_state = True
break
if not arch_state:
logger.error(
            'Architecture: {} is not supported for exporting model now.\n'.
            format(infer_arch) +
            'Please set TRT_MIN_SUBGRAPH in lib/core/export_utils.py')
os._exit(0)
label_arch = 'keypoint_arch'
reader_cfg = config['TestReader']
dataset_cfg = config['TestDataset']
infer_cfg['Preprocess'], infer_cfg['label_list'] = _parse_reader(
reader_cfg, dataset_cfg, config['metric'], label_arch, image_shape[1:])
    with open(path, 'w') as f:
        yaml.dump(infer_cfg, f)
logger.info("Export inference config file to {}".format(
os.path.join(path)))
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import paddle
import paddle.nn as nn
import paddle.optimizer as optimizer
import paddle.regularizer as regularizer
from lib.utils.workspace import register, serializable
__all__ = ['LearningRate', 'OptimizerBuilder']
from ..utils.logger import setup_logger
logger = setup_logger(__name__)
@serializable
class PiecewiseDecay(object):
"""
Multi step learning rate decay
Args:
        gamma (float | list): decay factor(s)
        milestones (list): epochs at which to decay the learning rate
"""
def __init__(self,
gamma=[0.1, 0.01],
milestones=[8, 11],
values=None,
use_warmup=True):
super(PiecewiseDecay, self).__init__()
if type(gamma) is not list:
self.gamma = []
for i in range(len(milestones)):
self.gamma.append(gamma / 10**i)
else:
self.gamma = gamma
self.milestones = milestones
self.values = values
self.use_warmup = use_warmup
def __call__(self,
base_lr=None,
boundary=None,
value=None,
step_per_epoch=None):
if boundary is not None and self.use_warmup:
boundary.extend([int(step_per_epoch) * i for i in self.milestones])
else:
# do not use LinearWarmup
boundary = [int(step_per_epoch) * i for i in self.milestones]
value = [base_lr] # during step[0, boundary[0]] is base_lr
        # self.values is set directly in the config
if self.values is not None:
assert len(self.milestones) + 1 == len(self.values)
return optimizer.lr.PiecewiseDecay(boundary, self.values)
# value is computed by self.gamma
value = value if value is not None else [base_lr]
for i in self.gamma:
value.append(base_lr * i)
return optimizer.lr.PiecewiseDecay(boundary, value)
@serializable
class LinearWarmup(object):
"""
Warm up learning rate linearly
Args:
steps (int): warm up steps
start_factor (float): initial learning rate factor
"""
def __init__(self, steps=500, start_factor=1. / 3):
super(LinearWarmup, self).__init__()
self.steps = steps
self.start_factor = start_factor
def __call__(self, base_lr, step_per_epoch):
boundary = []
value = []
for i in range(self.steps + 1):
if self.steps > 0:
alpha = i / self.steps
factor = self.start_factor * (1 - alpha) + alpha
lr = base_lr * factor
value.append(lr)
if i > 0:
boundary.append(i)
return boundary, value
@register
class LearningRate(object):
"""
Learning Rate configuration
Args:
base_lr (float): base learning rate
schedulers (list): learning rate schedulers
"""
__category__ = 'optim'
def __init__(self,
base_lr=0.01,
schedulers=[PiecewiseDecay(), LinearWarmup()]):
super(LearningRate, self).__init__()
self.base_lr = base_lr
self.schedulers = schedulers
def __call__(self, step_per_epoch):
assert len(self.schedulers) >= 1
if not self.schedulers[0].use_warmup:
return self.schedulers[0](base_lr=self.base_lr,
step_per_epoch=step_per_epoch)
# TODO: split warmup & decay
# warmup
boundary, value = self.schedulers[1](self.base_lr, step_per_epoch)
# decay
decay_lr = self.schedulers[0](self.base_lr, boundary, value,
step_per_epoch)
return decay_lr
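# A minimal usage sketch (illustrative only): 500 linear warmup steps followed
# by 10x decays at epochs 8 and 11, assuming 1000 steps per epoch. Direct
# instantiation outside the config system is assumed to work here:
def _demo_learning_rate():
    lr_cfg = LearningRate(
        base_lr=0.05,
        schedulers=[PiecewiseDecay(gamma=0.1, milestones=[8, 11]),
                    LinearWarmup(steps=500)])
    # returns a paddle.optimizer.lr.PiecewiseDecay scheduler
    return lr_cfg(step_per_epoch=1000)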
@register
class OptimizerBuilder():
"""
Build optimizer handles
Args:
        regularizer (object): a `Regularizer` instance
optimizer (object): an `Optimizer` instance
"""
__category__ = 'optim'
def __init__(self,
clip_grad_by_norm=None,
regularizer={'type': 'L2',
'factor': .0001},
optimizer={'type': 'Momentum',
'momentum': .9}):
self.clip_grad_by_norm = clip_grad_by_norm
self.regularizer = regularizer
self.optimizer = optimizer
def __call__(self, learning_rate, model=None):
if not isinstance(model, (list, tuple)):
model = [model]
if self.clip_grad_by_norm is not None:
grad_clip = nn.ClipGradByGlobalNorm(
clip_norm=self.clip_grad_by_norm)
else:
grad_clip = None
if self.regularizer and self.regularizer != 'None':
reg_type = self.regularizer['type'] + 'Decay'
reg_factor = self.regularizer['factor']
regularization = getattr(regularizer, reg_type)(reg_factor)
else:
regularization = None
optim_args = self.optimizer.copy()
optim_type = optim_args['type']
del optim_args['type']
optim_args['weight_decay'] = regularization
op = getattr(optimizer, optim_type)
params = []
for m in model:
if m is not None:
params.extend(m.parameters())
return op(learning_rate=learning_rate,
parameters=params,
grad_clip=grad_clip,
**optim_args)
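# A minimal usage sketch (illustrative only) pairing the builder defaults
# (Momentum + L2 weight decay) with a toy layer:
def _demo_optimizer_builder():
    model = paddle.nn.Linear(10, 2)
    lr_scheduler = LearningRate(base_lr=0.05)(step_per_epoch=1000)
    return OptimizerBuilder()(lr_scheduler, model)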
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import copy
import time
import numpy as np
from PIL import Image, ImageOps, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
import paddle
import paddle.distributed as dist
from paddle.distributed import fleet
from paddle import amp
from paddle.static import InputSpec
from lib.utils.workspace import create
from lib.utils.checkpoint import load_weight, load_pretrain_weight
from lib.utils.visualizer import visualize_results, save_result
from lib.metrics.coco_utils import get_infer_results
from lib.metrics import KeyPointTopDownCOCOEval
from lib.dataset.category import get_categories
import lib.utils.stats as stats
from .callbacks import Callback, ComposeCallback, LogPrinter, Checkpointer, VisualDLWriter
from .export_utils import _dump_infer_config, _prune_input_spec
from lib.utils.logger import setup_logger
logger = setup_logger('hrnet.pose')
__all__ = ['Trainer']
class Trainer(object):
def __init__(self, cfg, mode='train'):
self.cfg = cfg
assert mode.lower() in ['train', 'eval', 'test'], \
"mode should be 'train', 'eval' or 'test'"
self.mode = mode.lower()
self.optimizer = None
# init distillation config
self.distill_model = None
self.distill_loss = None
# build data loader
self.dataset = cfg['{}Dataset'.format(self.mode.capitalize())]
if self.mode == 'train':
self.loader = create('{}Reader'.format(self.mode.capitalize()))(
self.dataset, cfg.worker_num)
self.model = create(cfg.architecture)
        # normalize params for deploy
self.model.load_meanstd(cfg['TestReader']['sample_transforms'])
# EvalDataset build with BatchSampler to evaluate in single device
if self.mode == 'eval':
self._eval_batch_sampler = paddle.io.BatchSampler(
self.dataset, batch_size=self.cfg.EvalReader['batch_size'])
self.loader = create('{}Reader'.format(self.mode.capitalize()))(
self.dataset, cfg.worker_num, self._eval_batch_sampler)
# TestDataset build after user set images, skip loader creation here
self._nranks = dist.get_world_size()
self._local_rank = dist.get_rank()
self.status = {}
self.start_epoch = 0
self.end_epoch = 0 if 'epoch' not in cfg else cfg.epoch
# initial default callbacks
self._init_callbacks()
# initial default metrics
self._init_metrics()
self._reset_metrics()
def _init_callbacks(self):
if self.mode == 'train':
self._callbacks = [LogPrinter(self), Checkpointer(self)]
if self.cfg.get('use_vdl', False):
self._callbacks.append(VisualDLWriter(self))
self._compose_callback = ComposeCallback(self._callbacks)
elif self.mode == 'eval':
self._callbacks = [LogPrinter(self)]
self._compose_callback = ComposeCallback(self._callbacks)
elif self.mode == 'test' and self.cfg.get('use_vdl', False):
self._callbacks = [VisualDLWriter(self)]
self._compose_callback = ComposeCallback(self._callbacks)
else:
self._callbacks = []
self._compose_callback = None
def _init_metrics(self, validate=False):
if self.mode == 'test' or (self.mode == 'train' and not validate):
self._metrics = []
return
if self.cfg.metric == 'KeyPointTopDownCOCOEval':
eval_dataset = self.cfg['EvalDataset']
eval_dataset.check_or_download_dataset()
anno_file = eval_dataset.get_anno()
save_prediction_only = self.cfg.get('save_prediction_only', False)
self._metrics = [
KeyPointTopDownCOCOEval(
anno_file,
len(eval_dataset),
self.cfg.num_joints,
self.cfg.save_dir,
save_prediction_only=save_prediction_only)
]
else:
logger.warning("Metric not support for metric type {}".format(
self.cfg.metric))
self._metrics = []
def init_optimizer(self, ):
# build optimizer in train mode
if self.mode == 'train':
steps_per_epoch = len(self.loader)
self.lr = create('LearningRate')(steps_per_epoch)
self.optimizer = create('OptimizerBuilder')(
self.lr, [self.model, self.distill_model])
def _reset_metrics(self):
for metric in self._metrics:
metric.reset()
def register_callbacks(self, callbacks):
callbacks = [c for c in list(callbacks) if c is not None]
for c in callbacks:
assert isinstance(c, Callback), \
"metrics shoule be instances of subclass of Metric"
self._callbacks.extend(callbacks)
self._compose_callback = ComposeCallback(self._callbacks)
def register_metrics(self, metrics):
metrics = [m for m in list(metrics) if m is not None]
self._metrics.extend(metrics)
def load_weights(self, weights, model=None):
self.start_epoch = 0
if model is None:
model = self.model
        load_pretrain_weight(model, weights)
logger.debug("Load weights {} to start training".format(weights))
def train(self, validate=False):
assert self.mode == 'train', "Model not in 'train' mode"
Init_mark = False
model = self.model
if self._nranks > 1:
model = paddle.DataParallel(
self.model,
find_unused_parameters=self.cfg.get("find_unused_parameters",
False))
self.status.update({
'epoch_id': self.start_epoch,
'step_id': 0,
'steps_per_epoch': len(self.loader)
})
self.status['batch_time'] = stats.SmoothedValue(
self.cfg.log_iter, fmt='{avg:.4f}')
self.status['data_time'] = stats.SmoothedValue(
self.cfg.log_iter, fmt='{avg:.4f}')
self.status['training_staus'] = stats.TrainingStats(self.cfg.log_iter)
self._compose_callback.on_train_begin(self.status)
for epoch_id in range(self.start_epoch, self.cfg.epoch):
self.status['mode'] = 'train'
self.status['epoch_id'] = epoch_id
self._compose_callback.on_epoch_begin(self.status)
self.loader.dataset.set_epoch(epoch_id)
model.train()
iter_tic = time.time()
for step_id, data in enumerate(self.loader):
self.status['data_time'].update(time.time() - iter_tic)
self.status['step_id'] = step_id
self._compose_callback.on_step_begin(self.status)
data['epoch_id'] = epoch_id
# model forward
outputs = model(data)
if self.distill_model is not None:
teacher_outputs = self.distill_model(data)
distill_loss = self.distill_loss(outputs, teacher_outputs,
data)
loss = outputs['loss'] + teacher_outputs[
"loss"] + distill_loss
else:
loss = outputs['loss']
# model backward
loss.backward()
self.optimizer.step()
curr_lr = self.optimizer.get_lr()
self.lr.step()
self.optimizer.clear_grad()
self.status['learning_rate'] = curr_lr
if self._nranks < 2 or self._local_rank == 0:
loss_dict = {"loss": outputs['loss']}
if self.distill_model is not None:
loss_dict.update({
"loss_student": outputs['loss'],
"loss_teacher": teacher_outputs["loss"],
"loss_distill": distill_loss,
"loss": loss
})
self.status['training_staus'].update(loss_dict)
self.status['batch_time'].update(time.time() - iter_tic)
self._compose_callback.on_step_end(self.status)
iter_tic = time.time()
self._compose_callback.on_epoch_end(self.status)
if validate and self._local_rank == 0 \
and ((epoch_id + 1) % self.cfg.snapshot_epoch == 0 \
or epoch_id == self.end_epoch - 1):
print("begin to eval...")
if not hasattr(self, '_eval_loader'):
# build evaluation dataset and loader
self._eval_dataset = self.cfg.EvalDataset
self._eval_batch_sampler = \
paddle.io.BatchSampler(
self._eval_dataset,
batch_size=self.cfg.EvalReader['batch_size'])
self._eval_loader = create('EvalReader')(
self._eval_dataset,
self.cfg.worker_num,
batch_sampler=self._eval_batch_sampler)
# if validation in training is enabled, metrics should be re-init
# Init_mark makes sure this code will only execute once
                if validate and not Init_mark:
Init_mark = True
self._init_metrics(validate=validate)
self._reset_metrics()
with paddle.no_grad():
self.status['save_best_model'] = True
self._eval_with_loader(self._eval_loader)
self._compose_callback.on_train_end(self.status)
def _eval_with_loader(self, loader):
sample_num = 0
tic = time.time()
self._compose_callback.on_epoch_begin(self.status)
self.status['mode'] = 'eval'
self.model.eval()
for step_id, data in enumerate(loader):
self.status['step_id'] = step_id
self._compose_callback.on_step_begin(self.status)
# forward
outs = self.model(data)
# update metrics
for metric in self._metrics:
metric.update(data, outs)
sample_num += data['im_id'].numpy().shape[0]
self._compose_callback.on_step_end(self.status)
self.status['sample_num'] = sample_num
self.status['cost_time'] = time.time() - tic
# accumulate metric to log out
for metric in self._metrics:
metric.accumulate()
metric.log()
self._compose_callback.on_epoch_end(self.status)
        # reset metric states, as metrics may be computed multiple times
self._reset_metrics()
def evaluate(self):
with paddle.no_grad():
self._eval_with_loader(self.loader)
def predict(self,
images,
draw_threshold=0.5,
output_dir='output',
save_txt=False):
self.dataset.set_images(images)
loader = create('TestReader')(self.dataset, 0)
imid2path = self.dataset.get_imid2path()
anno_file = self.dataset.get_anno()
clsid2catid, catid2name = get_categories(
self.cfg.metric, anno_file=anno_file)
# Run Infer
self.status['mode'] = 'test'
self.model.eval()
results = []
for step_id, data in enumerate(loader):
self.status['step_id'] = step_id
# forward
outs = self.model(data)
for key in ['im_shape', 'scale_factor', 'im_id']:
outs[key] = data[key]
for key, value in outs.items():
if hasattr(value, 'numpy'):
outs[key] = value.numpy()
results.append(outs)
for outs in results:
batch_res = get_infer_results(outs, clsid2catid)
bbox_num = outs['bbox_num']
start = 0
for i, im_id in enumerate(outs['im_id']):
image_path = imid2path[int(im_id)]
image = Image.open(image_path).convert('RGB')
image = ImageOps.exif_transpose(image)
self.status['original_image'] = np.array(image.copy())
end = start + bbox_num[i]
bbox_res = batch_res['bbox'][start:end] \
if 'bbox' in batch_res else None
keypoint_res = batch_res['keypoint'][start:end] \
if 'keypoint' in batch_res else None
image = visualize_results(image, bbox_res, keypoint_res,
int(im_id), catid2name,
draw_threshold)
self.status['result_image'] = np.array(image.copy())
if self._compose_callback:
self._compose_callback.on_step_end(self.status)
# save image with detection
save_name = self._get_save_image_name(output_dir, image_path)
logger.info("Detection bbox results save in {}".format(
save_name))
image.save(save_name, quality=95)
if save_txt:
save_path = os.path.splitext(save_name)[0] + '.txt'
                    # build a separate dict for the txt dump; avoid shadowing
                    # the outer `results` list that is being iterated
                    txt_results = {}
                    txt_results["im_id"] = im_id
                    if bbox_res:
                        txt_results["bbox_res"] = bbox_res
                    if keypoint_res:
                        txt_results["keypoint_res"] = keypoint_res
                    save_result(save_path, txt_results, catid2name, draw_threshold)
start = end
def _get_save_image_name(self, output_dir, image_path):
"""
Get save image name from source image path.
"""
if not os.path.exists(output_dir):
os.makedirs(output_dir)
image_name = os.path.split(image_path)[-1]
name, ext = os.path.splitext(image_name)
return os.path.join(output_dir, "{}".format(name)) + ext
def _get_infer_cfg_and_input_spec(self, save_dir, prune_input=True):
image_shape = [3, -1, -1]
im_shape = [None, 2]
scale_factor = [None, 2]
test_reader_name = 'TestReader'
if 'inputs_def' in self.cfg[test_reader_name]:
inputs_def = self.cfg[test_reader_name]['inputs_def']
            image_shape = inputs_def.get('image_shape', image_shape)
# set image_shape=[None, 3, -1, -1] as default
image_shape = [None] + image_shape
if hasattr(self.model, 'deploy'):
self.model.deploy = True
# Save infer cfg
_dump_infer_config(self.cfg,
os.path.join(save_dir, 'infer_cfg.yml'),
image_shape, self.model)
input_spec = [{
"image": InputSpec(
shape=image_shape, name='image'),
"im_shape": InputSpec(
shape=im_shape, name='im_shape'),
"scale_factor": InputSpec(
shape=scale_factor, name='scale_factor')
}]
if prune_input:
static_model = paddle.jit.to_static(
self.model, input_spec=input_spec)
            # NOTE: dy2st does not prune the program, but jit.save prunes the
            # input spec, so prune it here and save with the pruned input spec
pruned_input_spec = _prune_input_spec(
input_spec, static_model.forward.main_program,
static_model.forward.outputs)
else:
static_model = None
pruned_input_spec = input_spec
return static_model, pruned_input_spec
def export(self, output_dir='output_inference'):
self.model.eval()
model_name = os.path.splitext(os.path.split(self.cfg.filename)[-1])[0]
save_dir = os.path.join(output_dir, model_name)
if not os.path.exists(save_dir):
os.makedirs(save_dir)
static_model, pruned_input_spec = self._get_infer_cfg_and_input_spec(
save_dir)
# save model
if 'slim' not in self.cfg:
paddle.jit.save(
static_model,
os.path.join(save_dir, 'model'),
input_spec=pruned_input_spec)
else:
self.cfg.slim.save_quantized_model(
self.model,
os.path.join(save_dir, 'model'),
input_spec=pruned_input_spec)
logger.info("Export model and saved in {}".format(save_dir))
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import category
from . import dataset
from . import keypoint_coco
from . import reader
from . import transform
from .category import *
from .dataset import *
from .keypoint_coco import *
from .reader import *
from .transform import *
__all__ = category.__all__ + dataset.__all__ + keypoint_coco.__all__ \
+ reader.__all__ + transform.__all__
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from lib.utils.logger import setup_logger
logger = setup_logger(__name__)
__all__ = ['get_categories']
def get_categories(metric_type, anno_file=None, arch=None):
"""
Get class id to category id map and category id
to category name map from annotation file.
Args:
        metric_type (str): metric type; currently supports
            'KeyPointTopDownCOCOEval' and 'KeyPointTopDownMPIIEval'.
        anno_file (str): annotation file path
"""
if arch == 'keypoint_arch':
return (None, {'id': 'keypoint'})
if metric_type.lower() == 'keypointtopdowncocoeval' or metric_type.lower(
) == 'keypointtopdownmpiieval':
return (None, {'id': 'keypoint'})
else:
raise ValueError("unknown metric type {}".format(metric_type))
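# For the keypoint metrics supported here the mapping degenerates to a single
# 'keypoint' pseudo-category, e.g.:
#   get_categories('KeyPointTopDownCOCOEval')  # -> (None, {'id': 'keypoint'})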
def _mot_category(category='pedestrian'):
"""
Get class id to category id map and category id
to category name map of mot dataset
"""
label_map = {category: 0}
label_map = sorted(label_map.items(), key=lambda x: x[1])
cats = [l[0] for l in label_map]
clsid2catid = {i: i for i in range(len(cats))}
catid2name = {i: name for i, name in enumerate(cats)}
return clsid2catid, catid2name
def _coco17_category():
"""
Get class id to category id map and category id
to category name map of COCO2017 dataset
"""
clsid2catid = {
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
10: 10,
11: 11,
12: 13,
13: 14,
14: 15,
15: 16,
16: 17,
17: 18,
18: 19,
19: 20,
20: 21,
21: 22,
22: 23,
23: 24,
24: 25,
25: 27,
26: 28,
27: 31,
28: 32,
29: 33,
30: 34,
31: 35,
32: 36,
33: 37,
34: 38,
35: 39,
36: 40,
37: 41,
38: 42,
39: 43,
40: 44,
41: 46,
42: 47,
43: 48,
44: 49,
45: 50,
46: 51,
47: 52,
48: 53,
49: 54,
50: 55,
51: 56,
52: 57,
53: 58,
54: 59,
55: 60,
56: 61,
57: 62,
58: 63,
59: 64,
60: 65,
61: 67,
62: 70,
63: 72,
64: 73,
65: 74,
66: 75,
67: 76,
68: 77,
69: 78,
70: 79,
71: 80,
72: 81,
73: 82,
74: 84,
75: 85,
76: 86,
77: 87,
78: 88,
79: 89,
80: 90
}
catid2name = {
0: 'background',
1: 'person',
2: 'bicycle',
3: 'car',
4: 'motorcycle',
5: 'airplane',
6: 'bus',
7: 'train',
8: 'truck',
9: 'boat',
10: 'traffic light',
11: 'fire hydrant',
13: 'stop sign',
14: 'parking meter',
15: 'bench',
16: 'bird',
17: 'cat',
18: 'dog',
19: 'horse',
20: 'sheep',
21: 'cow',
22: 'elephant',
23: 'bear',
24: 'zebra',
25: 'giraffe',
27: 'backpack',
28: 'umbrella',
31: 'handbag',
32: 'tie',
33: 'suitcase',
34: 'frisbee',
35: 'skis',
36: 'snowboard',
37: 'sports ball',
38: 'kite',
39: 'baseball bat',
40: 'baseball glove',
41: 'skateboard',
42: 'surfboard',
43: 'tennis racket',
44: 'bottle',
46: 'wine glass',
47: 'cup',
48: 'fork',
49: 'knife',
50: 'spoon',
51: 'bowl',
52: 'banana',
53: 'apple',
54: 'sandwich',
55: 'orange',
56: 'broccoli',
57: 'carrot',
58: 'hot dog',
59: 'pizza',
60: 'donut',
61: 'cake',
62: 'chair',
63: 'couch',
64: 'potted plant',
65: 'bed',
67: 'dining table',
70: 'toilet',
72: 'tv',
73: 'laptop',
74: 'mouse',
75: 'remote',
76: 'keyboard',
77: 'cell phone',
78: 'microwave',
79: 'oven',
80: 'toaster',
81: 'sink',
82: 'refrigerator',
84: 'book',
85: 'clock',
86: 'vase',
87: 'scissors',
88: 'teddy bear',
89: 'hair drier',
90: 'toothbrush'
}
clsid2catid = {k - 1: v for k, v in clsid2catid.items()}
catid2name.pop(0)
return clsid2catid, catid2name
def _dota_category():
"""
Get class id to category id map and category id
to category name map of dota dataset
"""
catid2name = {
0: 'background',
1: 'plane',
2: 'baseball-diamond',
3: 'bridge',
4: 'ground-track-field',
5: 'small-vehicle',
6: 'large-vehicle',
7: 'ship',
8: 'tennis-court',
9: 'basketball-court',
10: 'storage-tank',
11: 'soccer-ball-field',
12: 'roundabout',
13: 'harbor',
14: 'swimming-pool',
15: 'helicopter'
}
catid2name.pop(0)
clsid2catid = {i: i + 1 for i in range(len(catid2name))}
return clsid2catid, catid2name
def _oid19_category():
clsid2catid = {k: k + 1 for k in range(500)}
catid2name = {
0: "background",
1: "Infant bed",
2: "Rose",
3: "Flag",
4: "Flashlight",
5: "Sea turtle",
6: "Camera",
7: "Animal",
8: "Glove",
9: "Crocodile",
10: "Cattle",
11: "House",
12: "Guacamole",
13: "Penguin",
14: "Vehicle registration plate",
15: "Bench",
16: "Ladybug",
17: "Human nose",
18: "Watermelon",
19: "Flute",
20: "Butterfly",
21: "Washing machine",
22: "Raccoon",
23: "Segway",
24: "Taco",
25: "Jellyfish",
26: "Cake",
27: "Pen",
28: "Cannon",
29: "Bread",
30: "Tree",
31: "Shellfish",
32: "Bed",
33: "Hamster",
34: "Hat",
35: "Toaster",
36: "Sombrero",
37: "Tiara",
38: "Bowl",
39: "Dragonfly",
40: "Moths and butterflies",
41: "Antelope",
42: "Vegetable",
43: "Torch",
44: "Building",
45: "Power plugs and sockets",
46: "Blender",
47: "Billiard table",
48: "Cutting board",
49: "Bronze sculpture",
50: "Turtle",
51: "Broccoli",
52: "Tiger",
53: "Mirror",
54: "Bear",
55: "Zucchini",
56: "Dress",
57: "Volleyball",
58: "Guitar",
59: "Reptile",
60: "Golf cart",
61: "Tart",
62: "Fedora",
63: "Carnivore",
64: "Car",
65: "Lighthouse",
66: "Coffeemaker",
67: "Food processor",
68: "Truck",
69: "Bookcase",
70: "Surfboard",
71: "Footwear",
72: "Bench",
73: "Necklace",
74: "Flower",
75: "Radish",
76: "Marine mammal",
77: "Frying pan",
78: "Tap",
79: "Peach",
80: "Knife",
81: "Handbag",
82: "Laptop",
83: "Tent",
84: "Ambulance",
85: "Christmas tree",
86: "Eagle",
87: "Limousine",
88: "Kitchen & dining room table",
89: "Polar bear",
90: "Tower",
91: "Football",
92: "Willow",
93: "Human head",
94: "Stop sign",
95: "Banana",
96: "Mixer",
97: "Binoculars",
98: "Dessert",
99: "Bee",
100: "Chair",
101: "Wood-burning stove",
102: "Flowerpot",
103: "Beaker",
104: "Oyster",
105: "Woodpecker",
106: "Harp",
107: "Bathtub",
108: "Wall clock",
109: "Sports uniform",
110: "Rhinoceros",
111: "Beehive",
112: "Cupboard",
113: "Chicken",
114: "Man",
115: "Blue jay",
116: "Cucumber",
117: "Balloon",
118: "Kite",
119: "Fireplace",
120: "Lantern",
121: "Missile",
122: "Book",
123: "Spoon",
124: "Grapefruit",
125: "Squirrel",
126: "Orange",
127: "Coat",
128: "Punching bag",
129: "Zebra",
130: "Billboard",
131: "Bicycle",
132: "Door handle",
133: "Mechanical fan",
134: "Ring binder",
135: "Table",
136: "Parrot",
137: "Sock",
138: "Vase",
139: "Weapon",
140: "Shotgun",
141: "Glasses",
142: "Seahorse",
143: "Belt",
144: "Watercraft",
145: "Window",
146: "Giraffe",
147: "Lion",
148: "Tire",
149: "Vehicle",
150: "Canoe",
151: "Tie",
152: "Shelf",
153: "Picture frame",
154: "Printer",
155: "Human leg",
156: "Boat",
157: "Slow cooker",
158: "Croissant",
159: "Candle",
160: "Pancake",
161: "Pillow",
162: "Coin",
163: "Stretcher",
164: "Sandal",
165: "Woman",
166: "Stairs",
167: "Harpsichord",
168: "Stool",
169: "Bus",
170: "Suitcase",
171: "Human mouth",
172: "Juice",
173: "Skull",
174: "Door",
175: "Violin",
176: "Chopsticks",
177: "Digital clock",
178: "Sunflower",
179: "Leopard",
180: "Bell pepper",
181: "Harbor seal",
182: "Snake",
183: "Sewing machine",
184: "Goose",
185: "Helicopter",
186: "Seat belt",
187: "Coffee cup",
188: "Microwave oven",
189: "Hot dog",
190: "Countertop",
191: "Serving tray",
192: "Dog bed",
193: "Beer",
194: "Sunglasses",
195: "Golf ball",
196: "Waffle",
197: "Palm tree",
198: "Trumpet",
199: "Ruler",
200: "Helmet",
201: "Ladder",
202: "Office building",
203: "Tablet computer",
204: "Toilet paper",
205: "Pomegranate",
206: "Skirt",
207: "Gas stove",
208: "Cookie",
209: "Cart",
210: "Raven",
211: "Egg",
212: "Burrito",
213: "Goat",
214: "Kitchen knife",
215: "Skateboard",
216: "Salt and pepper shakers",
217: "Lynx",
218: "Boot",
219: "Platter",
220: "Ski",
221: "Swimwear",
222: "Swimming pool",
223: "Drinking straw",
224: "Wrench",
225: "Drum",
226: "Ant",
227: "Human ear",
228: "Headphones",
229: "Fountain",
230: "Bird",
231: "Jeans",
232: "Television",
233: "Crab",
234: "Microphone",
235: "Home appliance",
236: "Snowplow",
237: "Beetle",
238: "Artichoke",
239: "Jet ski",
240: "Stationary bicycle",
241: "Human hair",
242: "Brown bear",
243: "Starfish",
244: "Fork",
245: "Lobster",
246: "Corded phone",
247: "Drink",
248: "Saucer",
249: "Carrot",
250: "Insect",
251: "Clock",
252: "Castle",
253: "Tennis racket",
254: "Ceiling fan",
255: "Asparagus",
256: "Jaguar",
257: "Musical instrument",
258: "Train",
259: "Cat",
260: "Rifle",
261: "Dumbbell",
262: "Mobile phone",
263: "Taxi",
264: "Shower",
265: "Pitcher",
266: "Lemon",
267: "Invertebrate",
268: "Turkey",
269: "High heels",
270: "Bust",
271: "Elephant",
272: "Scarf",
273: "Barrel",
274: "Trombone",
275: "Pumpkin",
276: "Box",
277: "Tomato",
278: "Frog",
279: "Bidet",
280: "Human face",
281: "Houseplant",
282: "Van",
283: "Shark",
284: "Ice cream",
285: "Swim cap",
286: "Falcon",
287: "Ostrich",
288: "Handgun",
289: "Whiteboard",
290: "Lizard",
291: "Pasta",
292: "Snowmobile",
293: "Light bulb",
294: "Window blind",
295: "Muffin",
296: "Pretzel",
297: "Computer monitor",
298: "Horn",
299: "Furniture",
300: "Sandwich",
301: "Fox",
302: "Convenience store",
303: "Fish",
304: "Fruit",
305: "Earrings",
306: "Curtain",
307: "Grape",
308: "Sofa bed",
309: "Horse",
310: "Luggage and bags",
311: "Desk",
312: "Crutch",
313: "Bicycle helmet",
314: "Tick",
315: "Airplane",
316: "Canary",
317: "Spatula",
318: "Watch",
319: "Lily",
320: "Kitchen appliance",
321: "Filing cabinet",
322: "Aircraft",
323: "Cake stand",
324: "Candy",
325: "Sink",
326: "Mouse",
327: "Wine",
328: "Wheelchair",
329: "Goldfish",
330: "Refrigerator",
331: "French fries",
332: "Drawer",
333: "Treadmill",
334: "Picnic basket",
335: "Dice",
336: "Cabbage",
337: "Football helmet",
338: "Pig",
339: "Person",
340: "Shorts",
341: "Gondola",
342: "Honeycomb",
343: "Doughnut",
344: "Chest of drawers",
345: "Land vehicle",
346: "Bat",
347: "Monkey",
348: "Dagger",
349: "Tableware",
350: "Human foot",
351: "Mug",
352: "Alarm clock",
353: "Pressure cooker",
354: "Human hand",
355: "Tortoise",
356: "Baseball glove",
357: "Sword",
358: "Pear",
359: "Miniskirt",
360: "Traffic sign",
361: "Girl",
362: "Roller skates",
363: "Dinosaur",
364: "Porch",
365: "Human beard",
366: "Submarine sandwich",
367: "Screwdriver",
368: "Strawberry",
369: "Wine glass",
370: "Seafood",
371: "Racket",
372: "Wheel",
373: "Sea lion",
374: "Toy",
375: "Tea",
376: "Tennis ball",
377: "Waste container",
378: "Mule",
379: "Cricket ball",
380: "Pineapple",
381: "Coconut",
382: "Doll",
383: "Coffee table",
384: "Snowman",
385: "Lavender",
386: "Shrimp",
387: "Maple",
388: "Cowboy hat",
389: "Goggles",
390: "Rugby ball",
391: "Caterpillar",
392: "Poster",
393: "Rocket",
394: "Organ",
395: "Saxophone",
396: "Traffic light",
397: "Cocktail",
398: "Plastic bag",
399: "Squash",
400: "Mushroom",
401: "Hamburger",
402: "Light switch",
403: "Parachute",
404: "Teddy bear",
405: "Winter melon",
406: "Deer",
407: "Musical keyboard",
408: "Plumbing fixture",
409: "Scoreboard",
410: "Baseball bat",
411: "Envelope",
412: "Adhesive tape",
413: "Briefcase",
414: "Paddle",
415: "Bow and arrow",
416: "Telephone",
417: "Sheep",
418: "Jacket",
419: "Boy",
420: "Pizza",
421: "Otter",
422: "Office supplies",
423: "Couch",
424: "Cello",
425: "Bull",
426: "Camel",
427: "Ball",
428: "Duck",
429: "Whale",
430: "Shirt",
431: "Tank",
432: "Motorcycle",
433: "Accordion",
434: "Owl",
435: "Porcupine",
436: "Sun hat",
437: "Nail",
438: "Scissors",
439: "Swan",
440: "Lamp",
441: "Crown",
442: "Piano",
443: "Sculpture",
444: "Cheetah",
445: "Oboe",
446: "Tin can",
447: "Mango",
448: "Tripod",
449: "Oven",
450: "Mouse",
451: "Barge",
452: "Coffee",
453: "Snowboard",
454: "Common fig",
455: "Salad",
456: "Marine invertebrates",
457: "Umbrella",
458: "Kangaroo",
459: "Human arm",
460: "Measuring cup",
461: "Snail",
462: "Loveseat",
463: "Suit",
464: "Teapot",
465: "Bottle",
466: "Alpaca",
467: "Kettle",
468: "Trousers",
469: "Popcorn",
470: "Centipede",
471: "Spider",
472: "Sparrow",
473: "Plate",
474: "Bagel",
475: "Personal care",
476: "Apple",
477: "Brassiere",
478: "Bathroom cabinet",
479: "studio couch",
480: "Computer keyboard",
481: "Table tennis racket",
482: "Sushi",
483: "Cabinetry",
484: "Street light",
485: "Towel",
486: "Nightstand",
487: "Rabbit",
488: "Dolphin",
489: "Dog",
490: "Jug",
491: "Wok",
492: "Fire hydrant",
493: "Human eye",
494: "Skyscraper",
495: "Backpack",
496: "Potato",
497: "Paper towel",
498: "Lifejacket",
499: "Bicycle wheel",
500: "Toilet",
}
return clsid2catid, catid2name
def _visdrone_category():
clsid2catid = {i: i for i in range(10)}
catid2name = {
0: 'pedestrian',
1: 'people',
2: 'bicycle',
3: 'car',
4: 'van',
5: 'truck',
6: 'tricycle',
7: 'awning-tricycle',
8: 'bus',
9: 'motor'
}
return clsid2catid, catid2name
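# --- Usage sketch (illustrative, not part of the original file). Each
# *_category() helper returns clsid2catid, which maps the model's contiguous
# class indices to dataset category ids, and catid2name, which maps category
# ids to human-readable names. A minimal lookup:
#   clsid2catid, catid2name = _visdrone_category()
#   catid = clsid2catid[3]        # -> 3 (identity mapping for VisDrone)
#   catid2name[catid]             # -> 'car'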
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import numpy as np
try:
from collections.abc import Sequence
except Exception:
from collections import Sequence
from paddle.io import Dataset
import copy
from lib.utils.workspace import register, serializable
from lib.utils.download import get_dataset_path
__all__ = ['DetDataset', 'ImageFolder']
@serializable
class DetDataset(Dataset):
"""
Load detection dataset.
Args:
dataset_dir (str): root directory for dataset.
image_dir (str): directory for images.
anno_path (str): annotation file path.
data_fields (list): key name of data dictionary, at least have 'image'.
sample_num (int): number of samples to load, -1 means all.
use_default_label (bool): whether to load default label list.
"""
def __init__(self,
dataset_dir=None,
image_dir=None,
anno_path=None,
data_fields=['image'],
sample_num=-1,
use_default_label=None,
**kwargs):
super(DetDataset, self).__init__()
self.dataset_dir = dataset_dir if dataset_dir is not None else ''
self.anno_path = anno_path
self.image_dir = image_dir if image_dir is not None else ''
self.data_fields = data_fields
self.sample_num = sample_num
self.use_default_label = use_default_label
self._epoch = 0
self._curr_iter = 0
def __len__(self):
return len(self.roidbs)
def __getitem__(self, idx):
# data batch
roidb = copy.deepcopy(self.roidbs[idx])
if self.mixup_epoch == 0 or self._epoch < self.mixup_epoch:
n = len(self.roidbs)
idx = np.random.randint(n)
roidb = [roidb, copy.deepcopy(self.roidbs[idx])]
elif self.cutmix_epoch == 0 or self._epoch < self.cutmix_epoch:
n = len(self.roidbs)
idx = np.random.randint(n)
roidb = [roidb, copy.deepcopy(self.roidbs[idx])]
elif self.mosaic_epoch == 0 or self._epoch < self.mosaic_epoch:
n = len(self.roidbs)
roidb = [roidb, ] + [
copy.deepcopy(self.roidbs[np.random.randint(n)])
for _ in range(3)
]
if isinstance(roidb, Sequence):
for r in roidb:
r['curr_iter'] = self._curr_iter
else:
roidb['curr_iter'] = self._curr_iter
self._curr_iter += 1
return self.transform(roidb)
def check_or_download_dataset(self):
self.dataset_dir = get_dataset_path(self.dataset_dir, self.anno_path,
self.image_dir)
def set_kwargs(self, **kwargs):
self.mixup_epoch = kwargs.get('mixup_epoch', -1)
self.cutmix_epoch = kwargs.get('cutmix_epoch', -1)
self.mosaic_epoch = kwargs.get('mosaic_epoch', -1)
def set_transform(self, transform):
self.transform = transform
def set_epoch(self, epoch_id):
self._epoch = epoch_id
def parse_dataset(self):
raise NotImplementedError(
"Need to implement parse_dataset method of Dataset")
def get_anno(self):
if self.anno_path is None:
return
return os.path.join(self.dataset_dir, self.anno_path)
def _is_valid_file(f, extensions=('.jpg', '.jpeg', '.png', '.bmp')):
return f.lower().endswith(extensions)
def _make_dataset(dir):
dir = os.path.expanduser(dir)
if not os.path.isdir(dir):
raise ValueError('{} should be a dir'.format(dir))
images = []
for root, _, fnames in sorted(os.walk(dir, followlinks=True)):
for fname in sorted(fnames):
path = os.path.join(root, fname)
if _is_valid_file(path):
images.append(path)
return images
@register
@serializable
class ImageFolder(DetDataset):
def __init__(self,
dataset_dir=None,
image_dir=None,
anno_path=None,
sample_num=-1,
use_default_label=None,
**kwargs):
super(ImageFolder, self).__init__(
dataset_dir,
image_dir,
anno_path,
sample_num=sample_num,
use_default_label=use_default_label)
self._imid2path = {}
self.roidbs = None
self.sample_num = sample_num
def check_or_download_dataset(self):
if self.dataset_dir:
# NOTE: ImageFolder is only used for prediction, in
# infer mode, image_dir is set by set_images
# so we only check anno_path here
self.dataset_dir = get_dataset_path(self.dataset_dir,
self.anno_path, None)
def parse_dataset(self):
if not self.roidbs:
self.roidbs = self._load_images()
def _parse(self):
image_dir = self.image_dir
if not isinstance(image_dir, Sequence):
image_dir = [image_dir]
images = []
for im_dir in image_dir:
if os.path.isdir(im_dir):
im_dir = os.path.join(self.dataset_dir, im_dir)
images.extend(_make_dataset(im_dir))
elif os.path.isfile(im_dir) and _is_valid_file(im_dir):
images.append(im_dir)
return images
def _load_images(self):
images = self._parse()
ct = 0
records = []
for image in images:
assert image != '' and os.path.isfile(image), \
"Image {} not found".format(image)
if self.sample_num > 0 and ct >= self.sample_num:
break
rec = {'im_id': np.array([ct]), 'im_file': image}
self._imid2path[ct] = image
ct += 1
records.append(rec)
assert len(records) > 0, "No image file found"
return records
def get_imid2path(self):
return self._imid2path
def set_images(self, images):
self.image_dir = images
self.roidbs = self._load_images()
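# --- Usage sketch (illustrative; 'path/to/image.jpg' is a placeholder).
# ImageFolder is intended for inference: point it at a list of image files
# (or a directory) via set_images(), then query the id-to-path mapping:
#   dataset = ImageFolder()
#   dataset.set_images(['path/to/image.jpg'])
#   dataset.get_imid2path()       # -> {0: 'path/to/image.jpg'}
# To iterate over samples, a transform pipeline must first be attached with
# dataset.set_transform(...).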
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import cv2
import numpy as np
import json
import copy
import pycocotools
from pycocotools.coco import COCO
from .dataset import DetDataset
from lib.utils.workspace import register, serializable
__all__ = ['KeypointTopDownBaseDataset', 'KeypointTopDownCocoDataset']
@serializable
class KeypointTopDownBaseDataset(DetDataset):
"""Base class for top_down datasets.
All datasets should subclass it.
All subclasses should overwrite:
Methods:`_get_db`
Args:
dataset_dir (str): Root path to the dataset.
image_dir (str): Path to a directory where images are held.
anno_path (str): Relative path to the annotation file.
num_joints (int): number of keypoints
transform (composed(operators)): A sequence of data transforms.
"""
def __init__(self,
dataset_dir,
image_dir,
anno_path,
num_joints,
transform=[]):
super().__init__(dataset_dir, image_dir, anno_path)
self.image_info = {}
self.ann_info = {}
self.img_prefix = os.path.join(dataset_dir, image_dir)
self.transform = transform
self.ann_info['num_joints'] = num_joints
self.db = []
def __len__(self):
"""Get dataset length."""
return len(self.db)
def _get_db(self):
"""Get a sample"""
raise NotImplementedError
def __getitem__(self, idx):
"""Prepare sample for training given the index."""
records = copy.deepcopy(self.db[idx])
records['image'] = cv2.imread(records['image_file'], cv2.IMREAD_COLOR |
cv2.IMREAD_IGNORE_ORIENTATION)
records['image'] = cv2.cvtColor(records['image'], cv2.COLOR_BGR2RGB)
records['score'] = records['score'] if 'score' in records else 1
records = self.transform(records)
return records
@register
@serializable
class KeypointTopDownCocoDataset(KeypointTopDownBaseDataset):
"""COCO dataset for top-down pose estimation. Adapted from
https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
Copyright (c) Microsoft, under the MIT License.
The dataset loads raw features and apply specified transforms
to return a dict containing the image tensors and other information.
COCO keypoint indexes:
0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
Args:
dataset_dir (str): Root path to the dataset.
image_dir (str): Path to a directory where images are held.
anno_path (str): Relative path to the annotation file.
num_joints (int): Number of keypoints
trainsize (list): [w, h], the target image size
transform (composed(operators)): A sequence of data transforms.
bbox_file (str): Path to a detection bbox file
Default: None.
use_gt_bbox (bool): Whether to use ground truth bbox
Default: True.
pixel_std (int): The pixel std of the scale
Default: 200.
image_thre (float): The threshold to filter the detection box
Default: 0.0.
"""
def __init__(self,
dataset_dir,
image_dir,
anno_path,
num_joints,
trainsize,
transform=[],
bbox_file=None,
use_gt_bbox=True,
pixel_std=200,
image_thre=0.0):
super().__init__(dataset_dir, image_dir, anno_path, num_joints,
transform)
self.bbox_file = bbox_file
self.use_gt_bbox = use_gt_bbox
self.trainsize = trainsize
self.pixel_std = pixel_std
self.image_thre = image_thre
self.dataset_name = 'coco'
def parse_dataset(self):
if self.use_gt_bbox:
self.db = self._load_coco_keypoint_annotations()
else:
self.db = self._load_coco_person_detection_results()
def _load_coco_keypoint_annotations(self):
coco = COCO(self.get_anno())
img_ids = coco.getImgIds()
gt_db = []
for index in img_ids:
im_ann = coco.loadImgs(index)[0]
width = im_ann['width']
height = im_ann['height']
file_name = im_ann['file_name']
im_id = int(im_ann["id"])
annIds = coco.getAnnIds(imgIds=index, iscrowd=False)
objs = coco.loadAnns(annIds)
valid_objs = []
for obj in objs:
x, y, w, h = obj['bbox']
x1 = np.max((0, x))
y1 = np.max((0, y))
x2 = np.min((width - 1, x1 + np.max((0, w - 1))))
y2 = np.min((height - 1, y1 + np.max((0, h - 1))))
if obj['area'] > 0 and x2 >= x1 and y2 >= y1:
obj['clean_bbox'] = [x1, y1, x2 - x1, y2 - y1]
valid_objs.append(obj)
objs = valid_objs
rec = []
for obj in objs:
if max(obj['keypoints']) == 0:
continue
joints = np.zeros(
(self.ann_info['num_joints'], 3), dtype=np.float32)
joints_vis = np.zeros(
(self.ann_info['num_joints'], 3), dtype=np.float32)
for ipt in range(self.ann_info['num_joints']):
joints[ipt, 0] = obj['keypoints'][ipt * 3 + 0]
joints[ipt, 1] = obj['keypoints'][ipt * 3 + 1]
joints[ipt, 2] = 0
t_vis = obj['keypoints'][ipt * 3 + 2]
if t_vis > 1:
t_vis = 1
joints_vis[ipt, 0] = t_vis
joints_vis[ipt, 1] = t_vis
joints_vis[ipt, 2] = 0
center, scale = self._box2cs(obj['clean_bbox'][:4])
rec.append({
'image_file': os.path.join(self.img_prefix, file_name),
'center': center,
'scale': scale,
'joints': joints,
'joints_vis': joints_vis,
'im_id': im_id,
})
gt_db.extend(rec)
return gt_db
def _box2cs(self, box):
x, y, w, h = box[:4]
center = np.zeros((2), dtype=np.float32)
center[0] = x + w * 0.5
center[1] = y + h * 0.5
aspect_ratio = self.trainsize[0] * 1.0 / self.trainsize[1]
if w > aspect_ratio * h:
h = w * 1.0 / aspect_ratio
elif w < aspect_ratio * h:
w = h * aspect_ratio
scale = np.array(
[w * 1.0 / self.pixel_std, h * 1.0 / self.pixel_std],
dtype=np.float32)
if center[0] != -1:
scale = scale * 1.25
return center, scale
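# --- Worked example for _box2cs (illustrative numbers, assuming
# trainsize=[192, 256] as (w, h) and the default pixel_std=200). The box is
# padded to the training aspect ratio (192/256 = 0.75), divided by
# pixel_std, then enlarged by 1.25x:
#   box = [50, 40, 60, 100]
#   center = (50 + 60 * 0.5, 40 + 100 * 0.5)      # -> (80.0, 90.0)
#   # w (60) < 0.75 * h (75), so w is widened to 75
#   scale = [75 / 200 * 1.25, 100 / 200 * 1.25]   # -> [0.46875, 0.625]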
def _load_coco_person_detection_results(self):
all_boxes = None
bbox_file_path = os.path.join(self.dataset_dir, self.bbox_file)
with open(bbox_file_path, 'r') as f:
all_boxes = json.load(f)
if not all_boxes:
print('=> Failed to load %s!' % bbox_file_path)
return None
kpt_db = []
for n_img in range(0, len(all_boxes)):
det_res = all_boxes[n_img]
if det_res['category_id'] != 1:
continue
file_name = det_res[
'filename'] if 'filename' in det_res else '%012d.jpg' % det_res[
'image_id']
img_name = os.path.join(self.img_prefix, file_name)
box = det_res['bbox']
score = det_res['score']
im_id = int(det_res['image_id'])
if score < self.image_thre:
continue
center, scale = self._box2cs(box)
joints = np.zeros((self.ann_info['num_joints'], 3), dtype=np.float32)
joints_vis = np.ones(
(self.ann_info['num_joints'], 3), dtype=np.float32)
kpt_db.append({
'image_file': img_name,
'im_id': im_id,
'center': center,
'scale': scale,
'score': score,
'joints': joints,
'joints_vis': joints_vis,
})
return kpt_db
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import traceback
import six
import sys
import numpy as np
from paddle.io import DataLoader, DistributedBatchSampler
from paddle.fluid.dataloader.collate import default_collate_fn
from lib.utils.workspace import register
from . import transform
from lib.utils.logger import setup_logger
logger = setup_logger('reader')
MAIN_PID = os.getpid()
__all__ = [
'Compose', 'BatchCompose', 'BaseDataLoader', 'TrainReader', 'EvalReader',
'TestReader'
]
class Compose(object):
def __init__(self, transforms, num_classes=80):
self.transforms = transforms
self.transforms_cls = []
for t in self.transforms:
for k, v in t.items():
op_cls = getattr(transform, k)
f = op_cls(**v)
if hasattr(f, 'num_classes'):
f.num_classes = num_classes
self.transforms_cls.append(f)
def __call__(self, data):
for f in self.transforms_cls:
try:
data = f(data)
except Exception as e:
stack_info = traceback.format_exc()
logger.warning("fail to map sample transform [{}] "
"with error: {} and stack:\n{}".format(
f, e, str(stack_info)))
raise e
return data
class BatchCompose(Compose):
def __init__(self, transforms, num_classes=80, collate_batch=True):
super(BatchCompose, self).__init__(transforms, num_classes)
self.collate_batch = collate_batch
def __call__(self, data):
for f in self.transforms_cls:
try:
data = f(data)
except Exception as e:
stack_info = traceback.format_exc()
logger.warning("fail to map batch transform [{}] "
"with error: {} and stack:\n{}".format(
f, e, str(stack_info)))
raise e
# remove keys which are not needed by the model
extra_key = ['h', 'w', 'flipped']
for k in extra_key:
for sample in data:
if k in sample:
sample.pop(k)
# batch data; if a user-defined batch function is needed,
# it is used here
if self.collate_batch:
batch_data = default_collate_fn(data)
else:
batch_data = {}
for k in data[0].keys():
tmp_data = []
for i in range(len(data)):
tmp_data.append(data[i][k])
if 'gt_' not in k and 'is_crowd' not in k and 'difficult' not in k:
tmp_data = np.stack(tmp_data, axis=0)
batch_data[k] = tmp_data
return batch_data
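# --- Behavior note (illustrative). With collate_batch=False, fields whose
# keys contain 'gt_', 'is_crowd' or 'difficult' stay as Python lists, since
# their lengths differ across samples, while all other fields are stacked
# into batched ndarrays:
#   data = [{'image': np.zeros((3, 2, 2)), 'gt_bbox': np.zeros((5, 4))},
#           {'image': np.zeros((3, 2, 2)), 'gt_bbox': np.zeros((2, 4))}]
#   # -> batch_data['image'].shape == (2, 3, 2, 2)
#   # -> batch_data['gt_bbox'] is a list of two arrays, (5, 4) and (2, 4)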
SIZE_UNIT = ['K', 'M', 'G', 'T']
SHM_QUERY_CMD = 'df -h'
SHM_KEY = 'shm'
SHM_DEFAULT_MOUNT = '/dev/shm'
def _parse_size_in_M(size_str):
num, unit = size_str[:-1], size_str[-1]
assert unit in SIZE_UNIT, \
"unknown shm size unit {}".format(unit)
return float(num) * \
(1024 ** (SIZE_UNIT.index(unit) - 1))
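# --- Worked example (illustrative). SIZE_UNIT.index(unit) - 1 selects the
# power of 1024 relative to megabytes:
#   _parse_size_in_M('512M')   # -> 512.0     (512 * 1024 ** 0)
#   _parse_size_in_M('4G')     # -> 4096.0    (4 * 1024 ** 1)
#   _parse_size_in_M('1T')     # -> 1048576.0 (1 * 1024 ** 2)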
def _get_shared_memory_size_in_M():
try:
df_infos = os.popen(SHM_QUERY_CMD).readlines()
except Exception:
return None
else:
shm_infos = []
for df_info in df_infos:
info = df_info.strip()
if info.find(SHM_KEY) >= 0:
shm_infos.append(info.split())
if len(shm_infos) == 0:
return None
elif len(shm_infos) == 1:
return _parse_size_in_M(shm_infos[0][3])
else:
default_mount_infos = [
si for si in shm_infos if si[-1] == SHM_DEFAULT_MOUNT
]
if default_mount_infos:
return _parse_size_in_M(default_mount_infos[0][3])
else:
return max([_parse_size_in_M(si[3]) for si in shm_infos])
class BaseDataLoader(object):
"""
Base DataLoader implementation for detection models
Args:
sample_transforms (list): a list of transforms to perform
on each sample
batch_transforms (list): a list of transforms to perform
on batch
batch_size (int): batch size for batch collating, default 1.
shuffle (bool): whether to shuffle samples
drop_last (bool): whether to drop the last incomplete batch,
default False
num_classes (int): class number of dataset, default 80
collate_batch (bool): whether to collate batch in dataloader.
If set to True, the samples will be collated into batches
according to the batch size. Otherwise, the ground-truth fields
will not be collated, which is used when the number of
ground-truths differs across samples.
use_shared_memory (bool): whether to use shared memory to
accelerate data loading; enable this only if you are sure
that the shared memory size of your OS is larger than the
memory cost of the model's input data. Note that shared
memory will be automatically disabled if the available shared
memory is less than 1G, which is not enough for detection
models. Default False.
"""
def __init__(self,
sample_transforms=[],
batch_transforms=[],
batch_size=1,
shuffle=False,
drop_last=False,
num_classes=80,
collate_batch=True,
use_shared_memory=False,
**kwargs):
# sample transform
self._sample_transforms = Compose(
sample_transforms, num_classes=num_classes)
# batch transform
self._batch_transforms = BatchCompose(batch_transforms, num_classes,
collate_batch)
self.batch_size = batch_size
self.shuffle = shuffle
self.drop_last = drop_last
self.use_shared_memory = use_shared_memory
self.kwargs = kwargs
def __call__(self,
dataset,
worker_num,
batch_sampler=None,
return_list=False):
self.dataset = dataset
self.dataset.check_or_download_dataset()
self.dataset.parse_dataset()
# get data
self.dataset.set_transform(self._sample_transforms)
# set kwargs
self.dataset.set_kwargs(**self.kwargs)
# batch sampler
if batch_sampler is None:
self._batch_sampler = DistributedBatchSampler(
self.dataset,
batch_size=self.batch_size,
shuffle=self.shuffle,
drop_last=self.drop_last)
else:
self._batch_sampler = batch_sampler
# DataLoader does not start sub-processes on Windows and macOS,
# so there is no need to use shared memory
use_shared_memory = self.use_shared_memory and \
sys.platform not in ['win32', 'darwin']
# check whether the shared memory size is bigger than 1G (1024M)
if use_shared_memory:
shm_size = _get_shared_memory_size_in_M()
if shm_size is not None and shm_size < 1024.:
logger.warning("Shared memory size is less than 1G, "
"disable shared_memory in DataLoader")
use_shared_memory = False
self.dataloader = DataLoader(
dataset=self.dataset,
batch_sampler=self._batch_sampler,
collate_fn=self._batch_transforms,
num_workers=worker_num,
return_list=return_list,
use_shared_memory=use_shared_memory)
self.loader = iter(self.dataloader)
return self
def __len__(self):
return len(self._batch_sampler)
def __iter__(self):
return self
def __next__(self):
try:
return next(self.loader)
except StopIteration:
self.loader = iter(self.dataloader)
six.reraise(*sys.exc_info())
def next(self):
# python2 compatibility
return self.__next__()
@register
class TrainReader(BaseDataLoader):
__shared__ = ['num_classes']
def __init__(self,
sample_transforms=[],
batch_transforms=[],
batch_size=1,
shuffle=True,
drop_last=True,
num_classes=80,
collate_batch=True,
**kwargs):
super(TrainReader, self).__init__(sample_transforms, batch_transforms,
batch_size, shuffle, drop_last,
num_classes, collate_batch, **kwargs)
@register
class EvalReader(BaseDataLoader):
__shared__ = ['num_classes']
def __init__(self,
sample_transforms=[],
batch_transforms=[],
batch_size=1,
shuffle=False,
drop_last=True,
num_classes=80,
**kwargs):
super(EvalReader, self).__init__(sample_transforms, batch_transforms,
batch_size, shuffle, drop_last,
num_classes, **kwargs)
@register
class TestReader(BaseDataLoader):
__shared__ = ['num_classes']
def __init__(self,
sample_transforms=[],
batch_transforms=[],
batch_size=1,
shuffle=False,
drop_last=False,
num_classes=80,
**kwargs):
super(TestReader, self).__init__(sample_transforms, batch_transforms,
batch_size, shuffle, drop_last,
num_classes, **kwargs)
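# --- Usage sketch (illustrative; the operator name 'Decode' is an
# assumption in the style of PaddleDetection configs, not necessarily
# present in this repo's transform module):
#   loader = TrainReader(
#       sample_transforms=[{'Decode': {}}],
#       batch_size=2,
#       shuffle=True)
#   train_loader = loader(dataset, worker_num=2)  # dataset: a DetDataset
#   # for batch in train_loader: ...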
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import operators
from . import keypoint_operators
from .operators import *
from .keypoint_operators import *
__all__ = []
__all__ += registered_ops
__all__ += keypoint_operators.__all__
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Reference:
# https://github.com/tensorflow/tpu/blob/master/models/official/detection/utils/autoaugment_utils.py
"""AutoAugment util file."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import inspect
import math
from PIL import Image, ImageEnhance
import numpy as np
import cv2
from copy import deepcopy
# This signifies the max integer that the controller RNN could predict for the
# augmentation scheme.
_MAX_LEVEL = 10.
# Represents an invalid bounding box that is used for checking for padding
# lists of bounding box coordinates for a few augmentation operations
_INVALID_BOX = [[-1.0, -1.0, -1.0, -1.0]]
def policy_v0():
"""Autoaugment policy that was used in AutoAugment Detection Paper."""
# Each tuple is an augmentation operation of the form
# (operation, probability, magnitude). Each element in policy is a
# sub-policy that will be applied sequentially on the image.
policy = [
[('TranslateX_BBox', 0.6, 4), ('Equalize', 0.8, 10)],
[('TranslateY_Only_BBoxes', 0.2, 2), ('Cutout', 0.8, 8)],
[('Sharpness', 0.0, 8), ('ShearX_BBox', 0.4, 0)],
[('ShearY_BBox', 1.0, 2), ('TranslateY_Only_BBoxes', 0.6, 6)],
[('Rotate_BBox', 0.6, 10), ('Color', 1.0, 6)],
]
return policy
def policy_v1():
"""Autoaugment policy that was used in AutoAugment Detection Paper."""
# Each tuple is an augmentation operation of the form
# (operation, probability, magnitude). Each element in policy is a
# sub-policy that will be applied sequentially on the image.
policy = [
[('TranslateX_BBox', 0.6, 4), ('Equalize', 0.8, 10)],
[('TranslateY_Only_BBoxes', 0.2, 2), ('Cutout', 0.8, 8)],
[('Sharpness', 0.0, 8), ('ShearX_BBox', 0.4, 0)],
[('ShearY_BBox', 1.0, 2), ('TranslateY_Only_BBoxes', 0.6, 6)],
[('Rotate_BBox', 0.6, 10), ('Color', 1.0, 6)],
[('Color', 0.0, 0), ('ShearX_Only_BBoxes', 0.8, 4)],
[('ShearY_Only_BBoxes', 0.8, 2), ('Flip_Only_BBoxes', 0.0, 10)],
[('Equalize', 0.6, 10), ('TranslateX_BBox', 0.2, 2)],
[('Color', 1.0, 10), ('TranslateY_Only_BBoxes', 0.4, 6)],
[('Rotate_BBox', 0.8, 10), ('Contrast', 0.0, 10)],
[('Cutout', 0.2, 2), ('Brightness', 0.8, 10)],
[('Color', 1.0, 6), ('Equalize', 1.0, 2)],
[('Cutout_Only_BBoxes', 0.4, 6), ('TranslateY_Only_BBoxes', 0.8, 2)],
[('Color', 0.2, 8), ('Rotate_BBox', 0.8, 10)],
[('Sharpness', 0.4, 4), ('TranslateY_Only_BBoxes', 0.0, 4)],
[('Sharpness', 1.0, 4), ('SolarizeAdd', 0.4, 4)],
[('Rotate_BBox', 1.0, 8), ('Sharpness', 0.2, 8)],
[('ShearY_BBox', 0.6, 10), ('Equalize_Only_BBoxes', 0.6, 8)],
[('ShearX_BBox', 0.2, 6), ('TranslateY_Only_BBoxes', 0.2, 10)],
[('SolarizeAdd', 0.6, 8), ('Brightness', 0.8, 10)],
]
return policy
def policy_vtest():
"""Autoaugment test policy for debugging."""
# Each tuple is an augmentation operation of the form
# (operation, probability, magnitude). Each element in policy is a
# sub-policy that will be applied sequentially on the image.
policy = [[('TranslateX_BBox', 1.0, 4), ('Equalize', 1.0, 10)], ]
return policy
def policy_v2():
"""Additional policy that performs well on object detection."""
# Each tuple is an augmentation operation of the form
# (operation, probability, magnitude). Each element in policy is a
# sub-policy that will be applied sequentially on the image.
policy = [
[('Color', 0.0, 6), ('Cutout', 0.6, 8), ('Sharpness', 0.4, 8)],
[('Rotate_BBox', 0.4, 8), ('Sharpness', 0.4, 2),
('Rotate_BBox', 0.8, 10)],
[('TranslateY_BBox', 1.0, 8), ('AutoContrast', 0.8, 2)],
[('AutoContrast', 0.4, 6), ('ShearX_BBox', 0.8, 8),
('Brightness', 0.0, 10)],
[('SolarizeAdd', 0.2, 6), ('Contrast', 0.0, 10),
('AutoContrast', 0.6, 0)],
[('Cutout', 0.2, 0), ('Solarize', 0.8, 8), ('Color', 1.0, 4)],
[('TranslateY_BBox', 0.0, 4), ('Equalize', 0.6, 8),
('Solarize', 0.0, 10)],
[('TranslateY_BBox', 0.2, 2), ('ShearY_BBox', 0.8, 8),
('Rotate_BBox', 0.8, 8)],
[('Cutout', 0.8, 8), ('Brightness', 0.8, 8), ('Cutout', 0.2, 2)],
[('Color', 0.8, 4), ('TranslateY_BBox', 1.0, 6),
('Rotate_BBox', 0.6, 6)],
[('Rotate_BBox', 0.6, 10), ('BBox_Cutout', 1.0, 4),
('Cutout', 0.2, 8)],
[('Rotate_BBox', 0.0, 0), ('Equalize', 0.6, 6),
('ShearY_BBox', 0.6, 8)],
[('Brightness', 0.8, 8), ('AutoContrast', 0.4, 2),
('Brightness', 0.2, 2)],
[('TranslateY_BBox', 0.4, 8), ('Solarize', 0.4, 6),
('SolarizeAdd', 0.2, 10)],
[('Contrast', 1.0, 10), ('SolarizeAdd', 0.2, 8), ('Equalize', 0.2, 4)],
]
return policy
def policy_v3():
""""Additional policy that performs well on object detection."""
# Each tuple is an augmentation operation of the form
# (operation, probability, magnitude). Each element in policy is a
# sub-policy that will be applied sequentially on the image.
policy = [
[('Posterize', 0.8, 2), ('TranslateX_BBox', 1.0, 8)],
[('BBox_Cutout', 0.2, 10), ('Sharpness', 1.0, 8)],
[('Rotate_BBox', 0.6, 8), ('Rotate_BBox', 0.8, 10)],
[('Equalize', 0.8, 10), ('AutoContrast', 0.2, 10)],
[('SolarizeAdd', 0.2, 2), ('TranslateY_BBox', 0.2, 8)],
[('Sharpness', 0.0, 2), ('Color', 0.4, 8)],
[('Equalize', 1.0, 8), ('TranslateY_BBox', 1.0, 8)],
[('Posterize', 0.6, 2), ('Rotate_BBox', 0.0, 10)],
[('AutoContrast', 0.6, 0), ('Rotate_BBox', 1.0, 6)],
[('Equalize', 0.0, 4), ('Cutout', 0.8, 10)],
[('Brightness', 1.0, 2), ('TranslateY_BBox', 1.0, 6)],
[('Contrast', 0.0, 2), ('ShearY_BBox', 0.8, 0)],
[('AutoContrast', 0.8, 10), ('Contrast', 0.2, 10)],
[('Rotate_BBox', 1.0, 10), ('Cutout', 1.0, 10)],
[('SolarizeAdd', 0.8, 6), ('Equalize', 0.8, 8)],
]
return policy
def _equal(val1, val2, eps=1e-8):
return abs(val1 - val2) <= eps
def blend(image1, image2, factor):
"""Blend image1 and image2 using 'factor'.
Factor can be above 0.0. A value of 0.0 means only image1 is used.
A value of 1.0 means only image2 is used. A value between 0.0 and
1.0 means we linearly interpolate the pixel values between the two
images. A value greater than 1.0 "extrapolates" the difference
between the two pixel values, and we clip the results to values
between 0 and 255.
Args:
image1: An image Tensor of type uint8.
image2: An image Tensor of type uint8.
factor: A floating point value above 0.0.
Returns:
A blended image Tensor of type uint8.
"""
if factor == 0.0:
return image1
if factor == 1.0:
return image2
image1 = image1.astype(np.float32)
image2 = image2.astype(np.float32)
difference = image2 - image1
scaled = factor * difference
# Do addition in float.
temp = image1 + scaled
# Interpolate
if factor > 0.0 and factor < 1.0:
# Interpolation means we always stay within 0 and 255.
return temp.astype(np.uint8)
# Extrapolate:
#
# We need to clip and then cast.
return np.clip(temp, a_min=0, a_max=255).astype(np.uint8)
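# --- Worked example (illustrative). With factor in (0, 1) blend()
# interpolates; outside that range it extrapolates and clips to [0, 255]:
#   a = np.full((1, 1, 3), 100, dtype=np.uint8)
#   b = np.full((1, 1, 3), 200, dtype=np.uint8)
#   blend(a, b, 0.5)   # -> pixels of value 150 (midpoint)
#   blend(a, b, 2.0)   # -> 100 + 2 * (200 - 100) = 300, clipped to 255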
def cutout(image, pad_size, replace=0):
"""Apply cutout (https://arxiv.org/abs/1708.04552) to image.
This operation applies a (2*pad_size x 2*pad_size) mask of zeros to
a random location within `img`. The pixel values filled in will be of the
value `replace`. The location where the mask will be applied is chosen
uniformly at random over the whole image.
Args:
image: An image Tensor of type uint8.
pad_size: Specifies half the side length of the zero mask that is
applied to the image. The mask will be of size
(2*pad_size x 2*pad_size).
replace: What pixel value to fill in the image in the area that has
the cutout mask applied to it.
Returns:
An image Tensor that is of type uint8.
Example:
img = cv2.imread("/home/vis/gry/train/img_data/test.jpg", cv2.IMREAD_COLOR)
new_img = cutout(img, pad_size=50, replace=0)
"""
image_height, image_width = image.shape[0], image.shape[1]
cutout_center_height = np.random.randint(low=0, high=image_height)
cutout_center_width = np.random.randint(low=0, high=image_width)
lower_pad = np.maximum(0, cutout_center_height - pad_size)
upper_pad = np.maximum(0, image_height - cutout_center_height - pad_size)
left_pad = np.maximum(0, cutout_center_width - pad_size)
right_pad = np.maximum(0, image_width - cutout_center_width - pad_size)
cutout_shape = [
image_height - (lower_pad + upper_pad),
image_width - (left_pad + right_pad)
]
padding_dims = [[lower_pad, upper_pad], [left_pad, right_pad]]
mask = np.pad(np.zeros(
cutout_shape, dtype=image.dtype),
padding_dims,
'constant',
constant_values=1)
mask = np.expand_dims(mask, -1)
mask = np.tile(mask, [1, 1, 3])
image = np.where(
np.equal(mask, 0),
np.ones_like(
image, dtype=image.dtype) * replace,
image)
return image.astype(np.uint8)
def solarize(image, threshold=128):
# For each pixel in the image, select the pixel
# if the value is less than the threshold.
# Otherwise, subtract 255 from the pixel.
return np.where(image < threshold, image, 255 - image)
def solarize_add(image, addition=0, threshold=128):
# For each pixel in the image less than threshold
# we add 'addition' amount to it and then clip the
# pixel value to be between 0 and 255. The value
# of 'addition' is between -128 and 128.
added_image = image.astype(np.int64) + addition
added_image = np.clip(added_image, a_min=0, a_max=255).astype(np.uint8)
return np.where(image < threshold, added_image, image)
def color(image, factor):
"""use cv2 to deal"""
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
degenerate = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
return blend(degenerate, image, factor)
# refer to https://github.com/4uiiurz1/pytorch-auto-augment/blob/024b2eac4140c38df8342f09998e307234cafc80/auto_augment.py#L197
def contrast(img, factor):
img = ImageEnhance.Contrast(Image.fromarray(img)).enhance(factor)
return np.array(img)
def brightness(image, factor):
"""Equivalent of PIL Brightness."""
degenerate = np.zeros_like(image)
return blend(degenerate, image, factor)
def posterize(image, bits):
"""Equivalent of PIL Posterize."""
shift = 8 - bits
return np.left_shift(np.right_shift(image, shift), shift)
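# --- Worked example (illustrative). posterize() keeps only the top `bits`
# bits of each channel by shifting the low bits away and back:
#   posterize(np.uint8(173), bits=4)   # 0b10101101 -> 0b10100000 == 160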
def rotate(image, degrees, replace):
"""Rotates the image by degrees either clockwise or counterclockwise.
Args:
image: An image Tensor of type uint8.
degrees: Float, a scalar angle in degrees to rotate all images by. If
degrees is positive the image will be rotated clockwise otherwise it will
be rotated counterclockwise.
replace: A one or three value 1D tensor to fill empty pixels caused by
the rotate operation.
Returns:
The rotated version of image.
"""
image = wrap(image)
image = Image.fromarray(image)
image = image.rotate(degrees)
image = np.array(image, dtype=np.uint8)
return unwrap(image, replace)
def random_shift_bbox(image,
bbox,
pixel_scaling,
replace,
new_min_bbox_coords=None):
"""Move the bbox and the image content to a slightly new random location.
Args:
image: 3D uint8 Tensor.
bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
of type float that represents the normalized coordinates between 0 and 1.
The potential values for the new min corner of the bbox will be between
[old_min - pixel_scaling * bbox_height/2,
old_min + pixel_scaling * bbox_height/2].
pixel_scaling: A float between 0 and 1 that specifies the pixel range
that the new bbox location will be sampled from.
replace: A one or three value 1D tensor to fill empty pixels.
new_min_bbox_coords: If not None, then this is a tuple that specifies the
(min_y, min_x) coordinates of the new bbox. Normally this is randomly
specified, but this allows it to be manually set. The coordinates are
the absolute coordinates between 0 and image height/width and are int32.
Returns:
The new image that will have the shifted bbox location in it along with
the new bbox that contains the new coordinates.
"""
# Obtains image height and width and create helper clip functions.
image_height, image_width = image.shape[0], image.shape[1]
image_height = float(image_height)
image_width = float(image_width)
def clip_y(val):
return np.clip(val, a_min=0, a_max=image_height - 1).astype(np.int32)
def clip_x(val):
return np.clip(val, a_min=0, a_max=image_width - 1).astype(np.int32)
# Convert bbox to pixel coordinates.
min_y = int(image_height * bbox[0])
min_x = int(image_width * bbox[1])
max_y = clip_y(image_height * bbox[2])
max_x = clip_x(image_width * bbox[3])
bbox_height, bbox_width = (max_y - min_y + 1, max_x - min_x + 1)
image_height = int(image_height)
image_width = int(image_width)
# Select the new min/max bbox ranges that are used for sampling the
# new min x/y coordinates of the shifted bbox.
minval_y = clip_y(min_y - np.int32(pixel_scaling * float(bbox_height) /
2.0))
maxval_y = clip_y(min_y + np.int32(pixel_scaling * float(bbox_height) /
2.0))
minval_x = clip_x(min_x - np.int32(pixel_scaling * float(bbox_width) /
2.0))
maxval_x = clip_x(min_x + np.int32(pixel_scaling * float(bbox_width) /
2.0))
# Sample and calculate the new unclipped min/max coordinates of the new bbox.
if new_min_bbox_coords is None:
unclipped_new_min_y = np.random.randint(
low=minval_y, high=maxval_y, dtype=np.int32)
unclipped_new_min_x = np.random.randint(
low=minval_x, high=maxval_x, dtype=np.int32)
else:
unclipped_new_min_y, unclipped_new_min_x = (
clip_y(new_min_bbox_coords[0]), clip_x(new_min_bbox_coords[1]))
unclipped_new_max_y = unclipped_new_min_y + bbox_height - 1
unclipped_new_max_x = unclipped_new_min_x + bbox_width - 1
# Determine if any of the new bbox was shifted outside the current image.
# This is used for determining if any of the original bbox content should be
# discarded.
new_min_y, new_min_x, new_max_y, new_max_x = (
clip_y(unclipped_new_min_y), clip_x(unclipped_new_min_x),
clip_y(unclipped_new_max_y), clip_x(unclipped_new_max_x))
shifted_min_y = (new_min_y - unclipped_new_min_y) + min_y
shifted_max_y = max_y - (unclipped_new_max_y - new_max_y)
shifted_min_x = (new_min_x - unclipped_new_min_x) + min_x
shifted_max_x = max_x - (unclipped_new_max_x - new_max_x)
# Create the new bbox tensor by converting pixel integer values to floats.
new_bbox = np.stack([
float(new_min_y) / float(image_height), float(new_min_x) /
float(image_width), float(new_max_y) / float(image_height),
float(new_max_x) / float(image_width)
])
# Copy the contents in the bbox and fill the old bbox location
# with the replace value.
bbox_content = image[shifted_min_y:shifted_max_y + 1, shifted_min_x:
shifted_max_x + 1, :]
def mask_and_add_image(min_y_, min_x_, max_y_, max_x_, mask,
content_tensor, image_):
"""Applies mask to bbox region in image then adds content_tensor to it."""
mask = np.pad(mask, [[min_y_, (image_height - 1) - max_y_],
[min_x_, (image_width - 1) - max_x_], [0, 0]],
'constant',
constant_values=1)
content_tensor = np.pad(content_tensor,
[[min_y_, (image_height - 1) - max_y_],
[min_x_, (image_width - 1) - max_x_], [0, 0]],
'constant',
constant_values=0)
return image_ * mask + content_tensor
# Zero out original bbox location.
mask = np.zeros_like(image)[min_y:max_y + 1, min_x:max_x + 1, :]
grey_tensor = np.zeros_like(mask) + replace[0]
image = mask_and_add_image(min_y, min_x, max_y, max_x, mask, grey_tensor,
image)
# Fill in bbox content to new bbox location.
mask = np.zeros_like(bbox_content)
image = mask_and_add_image(new_min_y, new_min_x, new_max_y, new_max_x,
mask, bbox_content, image)
return image.astype(np.uint8), new_bbox
def _clip_bbox(min_y, min_x, max_y, max_x):
"""Clip bounding box coordinates between 0 and 1.
Args:
min_y: Normalized bbox coordinate of type float between 0 and 1.
min_x: Normalized bbox coordinate of type float between 0 and 1.
max_y: Normalized bbox coordinate of type float between 0 and 1.
max_x: Normalized bbox coordinate of type float between 0 and 1.
Returns:
Clipped coordinate values between 0 and 1.
"""
min_y = np.clip(min_y, a_min=0, a_max=1.0)
min_x = np.clip(min_x, a_min=0, a_max=1.0)
max_y = np.clip(max_y, a_min=0, a_max=1.0)
max_x = np.clip(max_x, a_min=0, a_max=1.0)
return min_y, min_x, max_y, max_x
def _check_bbox_area(min_y, min_x, max_y, max_x, delta=0.05):
"""Adjusts bbox coordinates to make sure the area is > 0.
Args:
min_y: Normalized bbox coordinate of type float between 0 and 1.
min_x: Normalized bbox coordinate of type float between 0 and 1.
max_y: Normalized bbox coordinate of type float between 0 and 1.
max_x: Normalized bbox coordinate of type float between 0 and 1.
delta: Float, this is used to create a gap of size 2 * delta between
bbox min/max coordinates that are the same on the boundary.
This prevents the bbox from having an area of zero.
Returns:
Tuple of new bbox coordinates between 0 and 1 that will now have a
guaranteed area > 0.
"""
height = max_y - min_y
width = max_x - min_x
def _adjust_bbox_boundaries(min_coord, max_coord):
# Make sure max is never 0 and min is never 1.
max_coord = np.maximum(max_coord, 0.0 + delta)
min_coord = np.minimum(min_coord, 1.0 - delta)
return min_coord, max_coord
if _equal(height, 0):
min_y, max_y = _adjust_bbox_boundaries(min_y, max_y)
if _equal(width, 0):
min_x, max_x = _adjust_bbox_boundaries(min_x, max_x)
return min_y, min_x, max_y, max_x
def _scale_bbox_only_op_probability(prob):
"""Reduce the probability of the bbox-only operation.
Probability is reduced so that we do not distort the content of too many
bounding boxes that are close to each other. The value of 3.0 is a
hyperparameter chosen when designing the autoaugment algorithm and found
empirically to work well.
Args:
prob: Float that is the probability of applying the bbox-only operation.
Returns:
Reduced probability.
"""
return prob / 3.0
def _apply_bbox_augmentation(image, bbox, augmentation_func, *args):
"""Applies augmentation_func to the subsection of image indicated by bbox.
Args:
image: 3D uint8 Tensor.
bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
of type float that represents the normalized coordinates between 0 and 1.
augmentation_func: Augmentation function that will be applied to the
subsection of image.
*args: Additional parameters that will be passed into augmentation_func
when it is called.
Returns:
A modified version of image, where the bbox location in the image will
have `augmentation_func` applied to it.
"""
image_height = image.shape[0]
image_width = image.shape[1]
min_y = int(image_height * bbox[0])
min_x = int(image_width * bbox[1])
max_y = int(image_height * bbox[2])
max_x = int(image_width * bbox[3])
# Clip to be sure the max values do not fall out of range.
max_y = np.minimum(max_y, image_height - 1)
max_x = np.minimum(max_x, image_width - 1)
# Get the sub-tensor that is the image within the bounding box region.
bbox_content = image[min_y:max_y + 1, min_x:max_x + 1, :]
# Apply the augmentation function to the bbox portion of the image.
augmented_bbox_content = augmentation_func(bbox_content, *args)
# Pad the augmented_bbox_content and the mask to match the shape of original
# image.
augmented_bbox_content = np.pad(
augmented_bbox_content, [[min_y, (image_height - 1) - max_y],
[min_x, (image_width - 1) - max_x], [0, 0]],
'constant',
constant_values=0)  # pad with zeros so pixels outside the bbox are preserved
# Create a mask that will be used to zero out a part of the original image.
mask_tensor = np.zeros_like(bbox_content)
mask_tensor = np.pad(mask_tensor,
[[min_y, (image_height - 1) - max_y],
[min_x, (image_width - 1) - max_x], [0, 0]],
'constant',
constant_values=1)
# Replace the old bbox content with the new augmented content.
image = image * mask_tensor + augmented_bbox_content
return image.astype(np.uint8)
def _concat_bbox(bbox, bboxes):
"""Helper function that concates bbox to bboxes along the first dimension."""
# Note if all elements in bboxes are -1 (_INVALID_BOX), then this means
# we discard bboxes and start the bboxes Tensor with the current bbox.
bboxes_sum_check = np.sum(bboxes)
bbox = np.expand_dims(bbox, 0)
# This check will be true when it is an _INVALID_BOX
if _equal(bboxes_sum_check, -4):
bboxes = bbox
else:
bboxes = np.concatenate([bboxes, bbox], 0)
return bboxes
def _apply_bbox_augmentation_wrapper(image, bbox, new_bboxes, prob,
augmentation_func, func_changes_bbox,
*args):
"""Applies _apply_bbox_augmentation with probability prob.
Args:
image: 3D uint8 Tensor.
bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
of type float that represents the normalized coordinates between 0 and 1.
new_bboxes: 2D Tensor that is a list of the bboxes in the image after they
have been altered by aug_func. These will only be changed when
func_changes_bbox is set to true. Each bbox has 4 elements
(min_y, min_x, max_y, max_x) of type float that are the normalized
bbox coordinates between 0 and 1.
prob: Float that is the probability of applying _apply_bbox_augmentation.
augmentation_func: Augmentation function that will be applied to the
subsection of image.
func_changes_bbox: Boolean. Does augmentation_func return bbox in addition
to image.
*args: Additional parameters that will be passed into augmentation_func
when it is called.
Returns:
A tuple. The first element is a modified version of image, where the bbox
location in the image will have augmentation_func applied to it if it is
chosen to be called with probability `prob`. The second element is a
Tensor of Tensors of length 4 that will contain the altered bbox after
applying augmentation_func.
"""
should_apply_op = (np.random.rand() + prob >= 1)
if func_changes_bbox:
if should_apply_op:
augmented_image, bbox = augmentation_func(image, bbox, *args)
else:
augmented_image, bbox = (image, bbox)
else:
if should_apply_op:
augmented_image = _apply_bbox_augmentation(
image, bbox, augmentation_func, *args)
else:
augmented_image = image
new_bboxes = _concat_bbox(bbox, new_bboxes)
return augmented_image.astype(np.uint8), new_bboxes
def _apply_multi_bbox_augmentation(image, bboxes, prob, aug_func,
func_changes_bbox, *args):
"""Applies aug_func to the image for each bbox in bboxes.
Args:
image: 3D uint8 Tensor.
bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox
has 4 elements (min_y, min_x, max_y, max_x) of type float.
prob: Float that is the probability of applying aug_func to a specific
bounding box within the image.
aug_func: Augmentation function that will be applied to the
subsections of image indicated by the bbox values in bboxes.
func_changes_bbox: Boolean. Does augmentation_func return bbox in addition
to image.
*args: Additional parameters that will be passed into augmentation_func
when it is called.
Returns:
A modified version of image, where each bbox location in the image will
have augmentation_func applied to it if it is chosen to be called with
probability prob independently across all bboxes. Also the final
bboxes are returned that will be unchanged if func_changes_bbox is set to
false and if true, the new altered ones will be returned.
"""
# Will keep track of the new altered bboxes after aug_func is repeatedly
# applied. The -1 values are a dummy value and this first Tensor will be
# removed upon appending the first real bbox.
new_bboxes = np.array(_INVALID_BOX)
# If the bboxes are empty, then just give it _INVALID_BOX. The result
# will be thrown away.
bboxes = np.array((_INVALID_BOX)) if bboxes.size == 0 else bboxes
assert bboxes.shape[1] == 4, "bboxes.shape[1] must be 4"
# pylint:disable=g-long-lambda
# pylint:disable=line-too-long
wrapped_aug_func = lambda _image, bbox, _new_bboxes: _apply_bbox_augmentation_wrapper(_image, bbox, _new_bboxes, prob, aug_func, func_changes_bbox, *args)
# pylint:enable=g-long-lambda
# pylint:enable=line-too-long
# Setup the while_loop.
num_bboxes = bboxes.shape[0] # We loop until we go over all bboxes.
idx = 0 # Counter for the while loop.
# Conditional function when to end the loop once we go over all bboxes
# images_and_bboxes contain (_image, _new_bboxes)
def cond(_idx, _images_and_bboxes):
return _idx < num_bboxes
# The TF reference implementation shuffles the bboxes here so that the
# augmentation order is not deterministic when aug_func does not change
# them. We cannot shuffle in this port because the bbox array carries no
# class information at this point, so the original order is preserved.
loop_bboxes = deepcopy(bboxes)
# Main function of while_loop where we repeatedly apply augmentation on the
# bboxes in the image.
# pylint:disable=g-long-lambda
body = lambda _idx, _images_and_bboxes: [
_idx + 1, wrapped_aug_func(_images_and_bboxes[0],
loop_bboxes[_idx],
_images_and_bboxes[1])]
while (cond(idx, (image, new_bboxes))):
idx, (image, new_bboxes) = body(idx, (image, new_bboxes))
# Either return the altered bboxes or the original ones depending on if
# we altered them in anyway.
if func_changes_bbox:
final_bboxes = new_bboxes
else:
final_bboxes = bboxes
return image, final_bboxes
def _apply_multi_bbox_augmentation_wrapper(image, bboxes, prob, aug_func,
func_changes_bbox, *args):
"""Checks to be sure num bboxes > 0 before calling inner function."""
num_bboxes = len(bboxes)
new_image = deepcopy(image)
new_bboxes = deepcopy(bboxes)
if num_bboxes != 0:
new_image, new_bboxes = _apply_multi_bbox_augmentation(
new_image, new_bboxes, prob, aug_func, func_changes_bbox, *args)
return new_image, new_bboxes
def rotate_only_bboxes(image, bboxes, prob, degrees, replace):
"""Apply rotate to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(
image, bboxes, prob, rotate, func_changes_bbox, degrees, replace)
def shear_x_only_bboxes(image, bboxes, prob, level, replace):
"""Apply shear_x to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(
image, bboxes, prob, shear_x, func_changes_bbox, level, replace)
def shear_y_only_bboxes(image, bboxes, prob, level, replace):
"""Apply shear_y to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(
image, bboxes, prob, shear_y, func_changes_bbox, level, replace)
def translate_x_only_bboxes(image, bboxes, prob, pixels, replace):
"""Apply translate_x to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(
image, bboxes, prob, translate_x, func_changes_bbox, pixels, replace)
def translate_y_only_bboxes(image, bboxes, prob, pixels, replace):
"""Apply translate_y to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(
image, bboxes, prob, translate_y, func_changes_bbox, pixels, replace)
def flip_only_bboxes(image, bboxes, prob):
"""Apply flip_lr to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(image, bboxes, prob,
np.fliplr, func_changes_bbox)
def solarize_only_bboxes(image, bboxes, prob, threshold):
"""Apply solarize to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(
image, bboxes, prob, solarize, func_changes_bbox, threshold)
def equalize_only_bboxes(image, bboxes, prob):
"""Apply equalize to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(image, bboxes, prob,
equalize, func_changes_bbox)
def cutout_only_bboxes(image, bboxes, prob, pad_size, replace):
"""Apply cutout to each bbox in the image with probability prob."""
func_changes_bbox = False
prob = _scale_bbox_only_op_probability(prob)
return _apply_multi_bbox_augmentation_wrapper(
image, bboxes, prob, cutout, func_changes_bbox, pad_size, replace)
def _rotate_bbox(bbox, image_height, image_width, degrees):
"""Rotates the bbox coordinated by degrees.
Args:
bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
of type float that represents the normalized coordinates between 0 and 1.
image_height: Int, height of the image.
image_width: Int, width of the image.
degrees: Float, a scalar angle in degrees to rotate all images by. If
degrees is positive the image will be rotated clockwise otherwise it will
be rotated counterclockwise.
Returns:
A tensor of the same shape as bbox, but now with the rotated coordinates.
"""
image_height, image_width = (float(image_height), float(image_width))
# Convert from degrees to radians.
degrees_to_radians = math.pi / 180.0
radians = degrees * degrees_to_radians
# Translate the bbox to the center of the image and turn the normalized 0-1
# coordinates to absolute pixel locations.
# Y coordinates are made negative as the y axis of images goes down with
# increasing pixel values, so we negate to make sure x axis and y axis points
# are in the traditionally positive direction.
min_y = -int(image_height * (bbox[0] - 0.5))
min_x = int(image_width * (bbox[1] - 0.5))
max_y = -int(image_height * (bbox[2] - 0.5))
max_x = int(image_width * (bbox[3] - 0.5))
coordinates = np.stack([[min_y, min_x], [min_y, max_x], [max_y, min_x],
[max_y, max_x]]).astype(np.float32)
# Rotate the coordinates according to the rotation matrix clockwise if
# radians is positive, else negative
rotation_matrix = np.stack([[math.cos(radians), math.sin(radians)],
[-math.sin(radians), math.cos(radians)]])
new_coords = np.matmul(rotation_matrix,
np.transpose(coordinates)).astype(np.int32)
# Find min/max values and convert them back to normalized 0-1 floats.
min_y = -(float(np.max(new_coords[0, :])) / image_height - 0.5)
min_x = float(np.min(new_coords[1, :])) / image_width + 0.5
max_y = -(float(np.min(new_coords[0, :])) / image_height - 0.5)
max_x = float(np.max(new_coords[1, :])) / image_width + 0.5
    # Clip the bboxes to be sure they fall between [0, 1].
min_y, min_x, max_y, max_x = _clip_bbox(min_y, min_x, max_y, max_x)
min_y, min_x, max_y, max_x = _check_bbox_area(min_y, min_x, max_y, max_x)
return np.stack([min_y, min_x, max_y, max_x])
def rotate_with_bboxes(image, bboxes, degrees, replace):
# Rotate the image.
image = rotate(image, degrees, replace)
# Convert bbox coordinates to pixel values.
image_height, image_width = image.shape[:2]
# pylint:disable=g-long-lambda
wrapped_rotate_bbox = lambda bbox: _rotate_bbox(bbox, image_height, image_width, degrees)
# pylint:enable=g-long-lambda
new_bboxes = np.zeros_like(bboxes)
for idx in range(len(bboxes)):
new_bboxes[idx] = wrapped_rotate_bbox(bboxes[idx])
return image, new_bboxes
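# --- Illustrative usage sketch (added for documentation, not part of the
# original pipeline): rotating an image together with its normalized boxes.
# The random image and box values are placeholders; `rotate` is the image-only
# op defined earlier in this module.
def _demo_rotate_with_bboxes():
    image = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)
    bboxes = np.array([[0.25, 0.25, 0.75, 0.75]], dtype=np.float32)
    # Empty pixels exposed by the rotation are filled with gray (128).
    return rotate_with_bboxes(image, bboxes, degrees=15.0,
                              replace=[128, 128, 128])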
def translate_x(image, pixels, replace):
"""Equivalent of PIL Translate in X dimension."""
image = Image.fromarray(wrap(image))
image = image.transform(image.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0))
return unwrap(np.array(image), replace)
def translate_y(image, pixels, replace):
"""Equivalent of PIL Translate in Y dimension."""
image = Image.fromarray(wrap(image))
image = image.transform(image.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels))
return unwrap(np.array(image), replace)
def _shift_bbox(bbox, image_height, image_width, pixels, shift_horizontal):
"""Shifts the bbox coordinates by pixels.
Args:
bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
of type float that represents the normalized coordinates between 0 and 1.
image_height: Int, height of the image.
image_width: Int, width of the image.
pixels: An int. How many pixels to shift the bbox.
shift_horizontal: Boolean. If true then shift in X dimension else shift in
Y dimension.
Returns:
A tensor of the same shape as bbox, but now with the shifted coordinates.
"""
pixels = int(pixels)
# Convert bbox to integer pixel locations.
min_y = int(float(image_height) * bbox[0])
min_x = int(float(image_width) * bbox[1])
max_y = int(float(image_height) * bbox[2])
max_x = int(float(image_width) * bbox[3])
if shift_horizontal:
min_x = np.maximum(0, min_x - pixels)
max_x = np.minimum(image_width, max_x - pixels)
else:
min_y = np.maximum(0, min_y - pixels)
max_y = np.minimum(image_height, max_y - pixels)
# Convert bbox back to floats.
min_y = float(min_y) / float(image_height)
min_x = float(min_x) / float(image_width)
max_y = float(max_y) / float(image_height)
max_x = float(max_x) / float(image_width)
    # Clip the bboxes to be sure they fall between [0, 1].
min_y, min_x, max_y, max_x = _clip_bbox(min_y, min_x, max_y, max_x)
min_y, min_x, max_y, max_x = _check_bbox_area(min_y, min_x, max_y, max_x)
return np.stack([min_y, min_x, max_y, max_x])
def translate_bbox(image, bboxes, pixels, replace, shift_horizontal):
"""Equivalent of PIL Translate in X/Y dimension that shifts image and bbox.
Args:
image: 3D uint8 Tensor.
bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox
has 4 elements (min_y, min_x, max_y, max_x) of type float with values
between [0, 1].
pixels: An int. How many pixels to shift the image and bboxes
replace: A one or three value 1D tensor to fill empty pixels.
shift_horizontal: Boolean. If true then shift in X dimension else shift in
Y dimension.
Returns:
A tuple containing a 3D uint8 Tensor that will be the result of translating
image by pixels. The second element of the tuple is bboxes, where now
the coordinates will be shifted to reflect the shifted image.
"""
if shift_horizontal:
image = translate_x(image, pixels, replace)
else:
image = translate_y(image, pixels, replace)
# Convert bbox coordinates to pixel values.
image_height, image_width = image.shape[0], image.shape[1]
# pylint:disable=g-long-lambda
wrapped_shift_bbox = lambda bbox: _shift_bbox(bbox, image_height, image_width, pixels, shift_horizontal)
# pylint:enable=g-long-lambda
new_bboxes = deepcopy(bboxes)
num_bboxes = len(bboxes)
for idx in range(num_bboxes):
new_bboxes[idx] = wrapped_shift_bbox(bboxes[idx])
return image.astype(np.uint8), new_bboxes
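# --- Illustrative usage sketch (not part of the original pipeline): a
# horizontal translation shifts the image content and the normalized boxes
# together; the inputs below are placeholders.
def _demo_translate_bbox():
    image = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)
    bboxes = np.array([[0.2, 0.2, 0.8, 0.8]], dtype=np.float32)
    return translate_bbox(image, bboxes, pixels=20,
                          replace=[128, 128, 128], shift_horizontal=True)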
def shear_x(image, level, replace):
"""Equivalent of PIL Shearing in X dimension."""
# Shear parallel to x axis is a projective transform
# with a matrix form of:
# [1 level
# 0 1].
image = Image.fromarray(wrap(image))
image = image.transform(image.size, Image.AFFINE, (1, level, 0, 0, 1, 0))
return unwrap(np.array(image), replace)
def shear_y(image, level, replace):
"""Equivalent of PIL Shearing in Y dimension."""
# Shear parallel to y axis is a projective transform
# with a matrix form of:
# [1 0
# level 1].
image = Image.fromarray(wrap(image))
image = image.transform(image.size, Image.AFFINE, (1, 0, 0, level, 1, 0))
return unwrap(np.array(image), replace)
def _shear_bbox(bbox, image_height, image_width, level, shear_horizontal):
"""Shifts the bbox according to how the image was sheared.
Args:
bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
of type float that represents the normalized coordinates between 0 and 1.
image_height: Int, height of the image.
        image_width: Int, width of the image.
level: Float. How much to shear the image.
shear_horizontal: If true then shear in X dimension else shear in
the Y dimension.
Returns:
A tensor of the same shape as bbox, but now with the shifted coordinates.
"""
image_height, image_width = (float(image_height), float(image_width))
# Change bbox coordinates to be pixels.
min_y = int(image_height * bbox[0])
min_x = int(image_width * bbox[1])
max_y = int(image_height * bbox[2])
max_x = int(image_width * bbox[3])
coordinates = np.stack(
[[min_y, min_x], [min_y, max_x], [max_y, min_x], [max_y, max_x]])
coordinates = coordinates.astype(np.float32)
# Shear the coordinates according to the translation matrix.
if shear_horizontal:
translation_matrix = np.stack([[1, 0], [-level, 1]])
else:
translation_matrix = np.stack([[1, -level], [0, 1]])
translation_matrix = translation_matrix.astype(np.float32)
new_coords = np.matmul(translation_matrix,
np.transpose(coordinates)).astype(np.int32)
# Find min/max values and convert them back to floats.
min_y = float(np.min(new_coords[0, :])) / image_height
min_x = float(np.min(new_coords[1, :])) / image_width
max_y = float(np.max(new_coords[0, :])) / image_height
max_x = float(np.max(new_coords[1, :])) / image_width
    # Clip the bboxes to be sure they fall between [0, 1].
min_y, min_x, max_y, max_x = _clip_bbox(min_y, min_x, max_y, max_x)
min_y, min_x, max_y, max_x = _check_bbox_area(min_y, min_x, max_y, max_x)
return np.stack([min_y, min_x, max_y, max_x])
def shear_with_bboxes(image, bboxes, level, replace, shear_horizontal):
"""Applies Shear Transformation to the image and shifts the bboxes.
Args:
image: 3D uint8 Tensor.
bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox
has 4 elements (min_y, min_x, max_y, max_x) of type float with values
between [0, 1].
level: Float. How much to shear the image. This value will be between
-0.3 to 0.3.
replace: A one or three value 1D tensor to fill empty pixels.
shear_horizontal: Boolean. If true then shear in X dimension else shear in
the Y dimension.
Returns:
A tuple containing a 3D uint8 Tensor that will be the result of shearing
image by level. The second element of the tuple is bboxes, where now
the coordinates will be shifted to reflect the sheared image.
"""
if shear_horizontal:
image = shear_x(image, level, replace)
else:
image = shear_y(image, level, replace)
# Convert bbox coordinates to pixel values.
image_height, image_width = image.shape[:2]
# pylint:disable=g-long-lambda
wrapped_shear_bbox = lambda bbox: _shear_bbox(bbox, image_height, image_width, level, shear_horizontal)
# pylint:enable=g-long-lambda
new_bboxes = deepcopy(bboxes)
num_bboxes = len(bboxes)
for idx in range(num_bboxes):
new_bboxes[idx] = wrapped_shear_bbox(bboxes[idx])
return image.astype(np.uint8), new_bboxes
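# --- Illustrative usage sketch (not part of the original pipeline): shear
# levels are expected in roughly [-0.3, 0.3], as noted in the docstring above.
def _demo_shear_with_bboxes():
    image = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)
    bboxes = np.array([[0.2, 0.2, 0.8, 0.8]], dtype=np.float32)
    return shear_with_bboxes(image, bboxes, level=0.2,
                             replace=[128, 128, 128], shear_horizontal=True)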
def autocontrast(image):
"""Implements Autocontrast function from PIL.
Args:
image: A 3D uint8 tensor.
Returns:
The image after it has had autocontrast applied to it and will be of type
uint8.
"""
def scale_channel(image):
"""Scale the 2D image using the autocontrast rule."""
# A possibly cheaper version can be done using cumsum/unique_with_counts
        # over the histogram values, rather than iterating over the entire image
        # to compute mins and maxes.
lo = float(np.min(image))
hi = float(np.max(image))
# Scale the image, making the lowest value 0 and the highest value 255.
def scale_values(im):
scale = 255.0 / (hi - lo)
offset = -lo * scale
im = im.astype(np.float32) * scale + offset
            im = np.clip(im, a_min=0, a_max=255.0)
            return im.astype(np.uint8)
result = scale_values(image) if hi > lo else image
return result
# Assumes RGB for now. Scales each channel independently
# and then stacks the result.
s1 = scale_channel(image[:, :, 0])
s2 = scale_channel(image[:, :, 1])
s3 = scale_channel(image[:, :, 2])
image = np.stack([s1, s2, s3], 2)
return image
def sharpness(image, factor):
"""Implements Sharpness function from PIL."""
orig_image = image
image = image.astype(np.float32)
# Make image 4D for conv operation.
# SMOOTH PIL Kernel.
kernel = np.array(
[[1, 1, 1], [1, 5, 1], [1, 1, 1]], dtype=np.float32) / 13.
result = cv2.filter2D(image, -1, kernel).astype(np.uint8)
# Blend the final result.
return blend(result, orig_image, factor)
def equalize(image):
"""Implements Equalize function from PIL using."""
def scale_channel(im, c):
"""Scale the data in the channel to implement equalize."""
im = im[:, :, c].astype(np.int32)
# Compute the histogram of the image channel.
histo, _ = np.histogram(im, range=[0, 255], bins=256)
# For the purposes of computing the step, filter out the nonzeros.
nonzero = np.where(np.not_equal(histo, 0))
nonzero_histo = np.reshape(np.take(histo, nonzero), [-1])
step = (np.sum(nonzero_histo) - nonzero_histo[-1]) // 255
def build_lut(histo, step):
# Compute the cumulative sum, shifting by step // 2
# and then normalization by step.
lut = (np.cumsum(histo) + (step // 2)) // step
# Shift lut, prepending with 0.
lut = np.concatenate([[0], lut[:-1]], 0)
# Clip the counts to be in range. This is done
# in the C code for image.point.
return np.clip(lut, a_min=0, a_max=255).astype(np.uint8)
# If step is zero, return the original image. Otherwise, build
# lut from the full histogram and step and then index from it.
if step == 0:
result = im
else:
result = np.take(build_lut(histo, step), im)
return result.astype(np.uint8)
# Assumes RGB for now. Scales each channel independently
# and then stacks the result.
s1 = scale_channel(image, 0)
s2 = scale_channel(image, 1)
s3 = scale_channel(image, 2)
image = np.stack([s1, s2, s3], 2)
return image
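# --- Illustrative usage sketch (not part of the original pipeline): both
# pixel-level ops take a uint8 RGB image and return one of the same shape.
def _demo_pixel_ops():
    image = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
    stretched = autocontrast(image)  # per-channel min/max stretch to [0, 255]
    flattened = equalize(image)      # per-channel histogram equalization
    return stretched, flattened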
def wrap(image):
"""Returns 'image' with an extra channel set to all 1s."""
shape = image.shape
extended_channel = 255 * np.ones([shape[0], shape[1], 1], image.dtype)
extended = np.concatenate([image, extended_channel], 2).astype(image.dtype)
return extended
def unwrap(image, replace):
"""Unwraps an image produced by wrap.
Where there is a 0 in the last channel for every spatial position,
the rest of the three channels in that spatial dimension are grayed
(set to 128). Operations like translate and shear on a wrapped
Tensor will leave 0s in empty locations. Some transformations look
at the intensity of values to do preprocessing, and we want these
empty pixels to assume the 'average' value, rather than pure black.
Args:
image: A 3D Image Tensor with 4 channels.
replace: A one or three value 1D tensor to fill empty pixels.
Returns:
image: A 3D image Tensor with 3 channels.
"""
image_shape = image.shape
# Flatten the spatial dimensions.
flattened_image = np.reshape(image, [-1, image_shape[2]])
# Find all pixels where the last channel is zero.
alpha_channel = flattened_image[:, 3]
replace = np.concatenate([replace, np.ones([1], image.dtype)], 0)
# Where they are zero, fill them in with 'replace'.
alpha_channel = np.reshape(alpha_channel, (-1, 1))
alpha_channel = np.tile(alpha_channel, reps=(1, flattened_image.shape[1]))
flattened_image = np.where(
np.equal(alpha_channel, 0),
np.ones_like(
flattened_image, dtype=image.dtype) * replace,
flattened_image)
image = np.reshape(flattened_image, image_shape)
image = image[:, :, :3]
return image.astype(np.uint8)
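# --- Illustrative sketch (not part of the original pipeline) of the
# wrap/unwrap round trip: wrap adds an all-255 alpha channel, geometric ops
# leave zeros where no source pixel lands, and unwrap fills those pixels
# with `replace`.
def _demo_wrap_unwrap():
    image = np.random.randint(0, 255, (4, 4, 3), dtype=np.uint8)
    wrapped = wrap(image)  # shape (4, 4, 4)
    wrapped[0, 0] = 0      # simulate an empty pixel left by a transform
    restored = unwrap(wrapped, np.array([128, 128, 128], dtype=np.uint8))
    return restored        # pixel (0, 0) is now the replace color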
def _cutout_inside_bbox(image, bbox, pad_fraction):
"""Generates cutout mask and the mean pixel value of the bbox.
First a location is randomly chosen within the image as the center where the
cutout mask will be applied. Note this can be towards the boundaries of the
image, so the full cutout mask may not be applied.
Args:
image: 3D uint8 Tensor.
bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x)
of type float that represents the normalized coordinates between 0 and 1.
        pad_fraction: Float that specifies how large the cutout mask should be in
            reference to the size of the original bbox. If pad_fraction is 0.25,
then the cutout mask will be of shape
(0.25 * bbox height, 0.25 * bbox width).
Returns:
        A tuple. First element is a tensor of the same shape as image where each
element is either a 1 or 0 that is used to determine where the image
will have cutout applied. The second element is the mean of the pixels
in the image where the bbox is located.
mask value: [0,1]
"""
image_height, image_width = image.shape[0], image.shape[1]
# Transform from shape [1, 4] to [4].
bbox = np.squeeze(bbox)
min_y = int(float(image_height) * bbox[0])
min_x = int(float(image_width) * bbox[1])
max_y = int(float(image_height) * bbox[2])
max_x = int(float(image_width) * bbox[3])
# Calculate the mean pixel values in the bounding box, which will be used
# to fill the cutout region.
mean = np.mean(image[min_y:max_y + 1, min_x:max_x + 1], axis=(0, 1))
    # Cutout mask will be of size pad_size_height * 2 by pad_size_width * 2 if the
# region lies entirely within the bbox.
box_height = max_y - min_y + 1
box_width = max_x - min_x + 1
pad_size_height = int(pad_fraction * (box_height / 2))
pad_size_width = int(pad_fraction * (box_width / 2))
# Sample the center location in the image where the zero mask will be applied.
cutout_center_height = np.random.randint(min_y, max_y + 1, dtype=np.int32)
cutout_center_width = np.random.randint(min_x, max_x + 1, dtype=np.int32)
lower_pad = np.maximum(0, cutout_center_height - pad_size_height)
upper_pad = np.maximum(
0, image_height - cutout_center_height - pad_size_height)
left_pad = np.maximum(0, cutout_center_width - pad_size_width)
right_pad = np.maximum(0,
image_width - cutout_center_width - pad_size_width)
cutout_shape = [
image_height - (lower_pad + upper_pad),
image_width - (left_pad + right_pad)
]
padding_dims = [[lower_pad, upper_pad], [left_pad, right_pad]]
mask = np.pad(np.zeros(
cutout_shape, dtype=image.dtype),
padding_dims,
'constant',
constant_values=1)
mask = np.expand_dims(mask, 2)
mask = np.tile(mask, [1, 1, 3])
return mask, mean
def bbox_cutout(image, bboxes, pad_fraction, replace_with_mean):
"""Applies cutout to the image according to bbox information.
    This is a cutout variant that uses bbox information to make more informed
decisions on where to place the cutout mask.
Args:
image: 3D uint8 Tensor.
bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox
has 4 elements (min_y, min_x, max_y, max_x) of type float with values
between [0, 1].
        pad_fraction: Float that specifies how large the cutout mask should be in
            reference to the size of the original bbox. If pad_fraction is 0.25,
then the cutout mask will be of shape
(0.25 * bbox height, 0.25 * bbox width).
        replace_with_mean: Boolean that specifies what value should be filled in
where the cutout mask is applied. Since the incoming image will be of
uint8 and will not have had any mean normalization applied, by default
we set the value to be 128. If replace_with_mean is True then we find
the mean pixel values across the channel dimension and use those to fill
in where the cutout mask is applied.
Returns:
A tuple. First element is a tensor of the same shape as image that has
cutout applied to it. Second element is the bboxes that were passed in
that will be unchanged.
"""
def apply_bbox_cutout(image, bboxes, pad_fraction):
"""Applies cutout to a single bounding box within image."""
# Choose a single bounding box to apply cutout to.
random_index = np.random.randint(0, bboxes.shape[0], dtype=np.int32)
# Select the corresponding bbox and apply cutout.
chosen_bbox = np.take(bboxes, random_index, axis=0)
mask, mean = _cutout_inside_bbox(image, chosen_bbox, pad_fraction)
# When applying cutout we either set the pixel value to 128 or to the mean
# value inside the bbox.
replace = mean if replace_with_mean else [128] * 3
# Apply the cutout mask to the image. Where the mask is 0 we fill it with
# `replace`.
image = np.where(
np.equal(mask, 0),
np.ones_like(
image, dtype=image.dtype) * replace,
image).astype(image.dtype)
return image
# Check to see if there are boxes, if so then apply boxcutout.
if len(bboxes) != 0:
image = apply_bbox_cutout(image, bboxes, pad_fraction)
return image, bboxes
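# --- Illustrative usage sketch (not part of the original pipeline): cutout
# is applied inside one randomly chosen box; the boxes themselves are
# returned unchanged.
def _demo_bbox_cutout():
    image = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)
    bboxes = np.array([[0.2, 0.2, 0.8, 0.8]], dtype=np.float32)
    return bbox_cutout(image, bboxes, pad_fraction=0.25,
                       replace_with_mean=False)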
NAME_TO_FUNC = {
'AutoContrast': autocontrast,
'Equalize': equalize,
'Posterize': posterize,
'Solarize': solarize,
'SolarizeAdd': solarize_add,
'Color': color,
'Contrast': contrast,
'Brightness': brightness,
'Sharpness': sharpness,
'Cutout': cutout,
'BBox_Cutout': bbox_cutout,
'Rotate_BBox': rotate_with_bboxes,
# pylint:disable=g-long-lambda
'TranslateX_BBox': lambda image, bboxes, pixels, replace: translate_bbox(
image, bboxes, pixels, replace, shift_horizontal=True),
'TranslateY_BBox': lambda image, bboxes, pixels, replace: translate_bbox(
image, bboxes, pixels, replace, shift_horizontal=False),
'ShearX_BBox': lambda image, bboxes, level, replace: shear_with_bboxes(
image, bboxes, level, replace, shear_horizontal=True),
'ShearY_BBox': lambda image, bboxes, level, replace: shear_with_bboxes(
image, bboxes, level, replace, shear_horizontal=False),
# pylint:enable=g-long-lambda
'Rotate_Only_BBoxes': rotate_only_bboxes,
'ShearX_Only_BBoxes': shear_x_only_bboxes,
'ShearY_Only_BBoxes': shear_y_only_bboxes,
'TranslateX_Only_BBoxes': translate_x_only_bboxes,
'TranslateY_Only_BBoxes': translate_y_only_bboxes,
'Flip_Only_BBoxes': flip_only_bboxes,
'Solarize_Only_BBoxes': solarize_only_bboxes,
'Equalize_Only_BBoxes': equalize_only_bboxes,
'Cutout_Only_BBoxes': cutout_only_bboxes,
}
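# --- Illustrative sketch (not part of the original pipeline) of how the
# registry is consumed: a policy op name resolves to a callable that takes
# the image, the boxes, and the op's own arguments.
def _demo_name_to_func():
    func = NAME_TO_FUNC['Equalize_Only_BBoxes']
    image = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
    bboxes = np.array([[0.1, 0.1, 0.9, 0.9]], dtype=np.float32)
    # prob is rescaled internally by _scale_bbox_only_op_probability.
    return func(image, bboxes, 1.0)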
def _randomly_negate_tensor(tensor):
"""With 50% prob turn the tensor negative."""
should_flip = np.floor(np.random.rand() + 0.5) >= 1
final_tensor = tensor if should_flip else -tensor
return final_tensor
def _rotate_level_to_arg(level):
level = (level / _MAX_LEVEL) * 30.
level = _randomly_negate_tensor(level)
return (level, )
def _shrink_level_to_arg(level):
"""Converts level to ratio by which we shrink the image content."""
if level == 0:
return (1.0, ) # if level is zero, do not shrink the image
# Maximum shrinking ratio is 2.9.
level = 2. / (_MAX_LEVEL / level) + 0.9
return (level, )
def _enhance_level_to_arg(level):
return ((level / _MAX_LEVEL) * 1.8 + 0.1, )
def _shear_level_to_arg(level):
level = (level / _MAX_LEVEL) * 0.3
# Flip level to negative with 50% chance.
level = _randomly_negate_tensor(level)
return (level, )
def _translate_level_to_arg(level, translate_const):
level = (level / _MAX_LEVEL) * float(translate_const)
# Flip level to negative with 50% chance.
level = _randomly_negate_tensor(level)
return (level, )
def _bbox_cutout_level_to_arg(level, hparams):
cutout_pad_fraction = (
level / _MAX_LEVEL) * 0.75 # hparams.cutout_max_pad_fraction
return (cutout_pad_fraction,
False) # hparams.cutout_bbox_replace_with_mean
def level_to_arg(hparams):
return {
'AutoContrast': lambda level: (),
'Equalize': lambda level: (),
'Posterize': lambda level: (int((level / _MAX_LEVEL) * 4), ),
'Solarize': lambda level: (int((level / _MAX_LEVEL) * 256), ),
'SolarizeAdd': lambda level: (int((level / _MAX_LEVEL) * 110), ),
'Color': _enhance_level_to_arg,
'Contrast': _enhance_level_to_arg,
'Brightness': _enhance_level_to_arg,
'Sharpness': _enhance_level_to_arg,
'Cutout':
lambda level: (int((level / _MAX_LEVEL) * 100), ), # hparams.cutout_const=100
# pylint:disable=g-long-lambda
'BBox_Cutout': lambda level: _bbox_cutout_level_to_arg(level, hparams),
'TranslateX_BBox':
lambda level: _translate_level_to_arg(level, 250), # hparams.translate_const=250
'TranslateY_BBox':
        lambda level: _translate_level_to_arg(level, 250),  # hparams.translate_const=250
# pylint:enable=g-long-lambda
'ShearX_BBox': _shear_level_to_arg,
'ShearY_BBox': _shear_level_to_arg,
'Rotate_BBox': _rotate_level_to_arg,
'Rotate_Only_BBoxes': _rotate_level_to_arg,
'ShearX_Only_BBoxes': _shear_level_to_arg,
'ShearY_Only_BBoxes': _shear_level_to_arg,
# pylint:disable=g-long-lambda
'TranslateX_Only_BBoxes':
lambda level: _translate_level_to_arg(level, 120), # hparams.translate_bbox_const
'TranslateY_Only_BBoxes':
lambda level: _translate_level_to_arg(level, 120), # hparams.translate_bbox_const
# pylint:enable=g-long-lambda
'Flip_Only_BBoxes': lambda level: (),
'Solarize_Only_BBoxes':
lambda level: (int((level / _MAX_LEVEL) * 256), ),
'Equalize_Only_BBoxes': lambda level: (),
# pylint:disable=g-long-lambda
'Cutout_Only_BBoxes':
lambda level: (int((level / _MAX_LEVEL) * 50), ), # hparams.cutout_bbox_const
# pylint:enable=g-long-lambda
}
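# --- Illustrative sketch (not part of the original pipeline): converting a
# policy level into op-specific arguments; assumes the usual _MAX_LEVEL of 10.
def _demo_level_to_arg():
    args = level_to_arg(hparams={})['Solarize'](5)
    return args  # (128,) when _MAX_LEVEL == 10: int((5 / 10) * 256)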
def bbox_wrapper(func):
"""Adds a bboxes function argument to func and returns unchanged bboxes."""
def wrapper(images, bboxes, *args, **kwargs):
return (func(images, *args, **kwargs), bboxes)
return wrapper
def _parse_policy_info(name, prob, level, replace_value, augmentation_hparams):
"""Return the function that corresponds to `name` and update `level` param."""
func = NAME_TO_FUNC[name]
args = level_to_arg(augmentation_hparams)[name](level)
# Check to see if prob is passed into function. This is used for operations
# where we alter bboxes independently.
# pytype:disable=wrong-arg-types
if 'prob' in inspect.getfullargspec(func)[0]:
args = tuple([prob] + list(args))
# pytype:enable=wrong-arg-types
# Add in replace arg if it is required for the function that is being called.
if 'replace' in inspect.getfullargspec(func)[0]:
# Make sure replace is the final argument
assert 'replace' == inspect.getfullargspec(func)[0][-1]
args = tuple(list(args) + [replace_value])
# Add bboxes as the second positional argument for the function if it does
# not already exist.
if 'bboxes' not in inspect.getfullargspec(func)[0]:
func = bbox_wrapper(func)
return (func, prob, args)
def _apply_func_with_prob(func, image, args, prob, bboxes):
"""Apply `func` to image w/ `args` as input with probability `prob`."""
assert isinstance(args, tuple)
assert 'bboxes' == inspect.getfullargspec(func)[0][1]
# If prob is a function argument, then this randomness is being handled
# inside the function, so make sure it is always called.
if 'prob' in inspect.getfullargspec(func)[0]:
prob = 1.0
# Apply the function with probability `prob`.
should_apply_op = np.floor(np.random.rand() + 0.5) >= 1
if should_apply_op:
augmented_image, augmented_bboxes = func(image, bboxes, *args)
else:
augmented_image, augmented_bboxes = (image, bboxes)
return augmented_image, augmented_bboxes
def select_and_apply_random_policy(policies, image, bboxes):
"""Select a random policy from `policies` and apply it to `image`."""
policy_to_select = np.random.randint(0, len(policies), dtype=np.int32)
# policy_to_select = 6 # for test
for (i, policy) in enumerate(policies):
if i == policy_to_select:
image, bboxes = policy(image, bboxes)
return (image, bboxes)
def build_and_apply_nas_policy(policies, image, bboxes, augmentation_hparams):
"""Build a policy from the given policies passed in and apply to image.
Args:
policies: list of lists of tuples in the form `(func, prob, level)`, `func`
is a string name of the augmentation function, `prob` is the probability
of applying the `func` operation, `level` is the input argument for
`func`.
image: numpy array that the resulting policy will be applied to.
        bboxes: 2D Tensor of the bboxes in the image, normalized between [0, 1].
augmentation_hparams: Hparams associated with the NAS learned policy.
Returns:
A version of image that now has data augmentation applied to it based on
        the `policies` passed into the function. Additionally, returns bboxes if
a value for them is passed in that is not None
"""
replace_value = [128, 128, 128]
    # func is the string name of the augmentation function, prob is the
    # probability of applying the operation and level is the parameter
    # associated with the function.
    # tf_policies are functions that take in an image and return an augmented
    # image.
tf_policies = []
for policy in policies:
tf_policy = []
# Link string name to the correct python function and make sure the correct
# argument is passed into that function.
for policy_info in policy:
policy_info = list(
policy_info) + [replace_value, augmentation_hparams]
tf_policy.append(_parse_policy_info(*policy_info))
        # Now build the tf policy that will apply the augmentation procedure
# on image.
def make_final_policy(tf_policy_):
def final_policy(image_, bboxes_):
for func, prob, args in tf_policy_:
image_, bboxes_ = _apply_func_with_prob(func, image_, args,
prob, bboxes_)
return image_, bboxes_
return final_policy
tf_policies.append(make_final_policy(tf_policy))
augmented_images, augmented_bboxes = select_and_apply_random_policy(
tf_policies, image, bboxes)
# If no bounding boxes were specified, then just return the images.
return (augmented_images, augmented_bboxes)
# TODO(barretzoph): Add in ArXiv link once paper is out.
def distort_image_with_autoaugment(image, bboxes, augmentation_name):
"""Applies the AutoAugment policy to `image` and `bboxes`.
Args:
image: `Tensor` of shape [height, width, 3] representing an image.
bboxes: `Tensor` of shape [N, 4] representing ground truth boxes that are
normalized between [0, 1].
augmentation_name: The name of the AutoAugment policy to use. The available
options are `v0`, `v1`, `v2`, `v3` and `test`. `v0` is the policy used for
all of the results in the paper and was found to achieve the best results
on the COCO dataset. `v1`, `v2` and `v3` are additional good policies
found on the COCO dataset that have slight variation in what operations
were used during the search procedure along with how many operations are
applied in parallel to a single image (2 vs 3).
Returns:
A tuple containing the augmented versions of `image` and `bboxes`.
"""
available_policies = {
'v0': policy_v0,
'v1': policy_v1,
'v2': policy_v2,
'v3': policy_v3,
'test': policy_vtest
}
if augmentation_name not in available_policies:
raise ValueError('Invalid augmentation_name: {}'.format(
augmentation_name))
policy = available_policies[augmentation_name]()
augmentation_hparams = {}
return build_and_apply_nas_policy(policy, image, bboxes,
augmentation_hparams)
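# --- Illustrative usage sketch (not part of the original pipeline): apply
# the `v0` detection policy to a random image with one normalized box.
def _demo_autoaugment():
    image = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)
    bboxes = np.array([[0.2, 0.3, 0.7, 0.8]], dtype=np.float32)
    return distort_image_with_autoaugment(image, bboxes, 'v0')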
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
try:
from collections.abc import Sequence
except Exception:
from collections import Sequence
import cv2
import numpy as np
import math
import copy
from lib.utils.keypoint_utils import warp_affine_joints, get_affine_transform, affine_transform, get_warp_matrix
from lib.utils.workspace import serializable
from lib.utils.logger import setup_logger
logger = setup_logger(__name__)
registered_ops = []
__all__ = [
'RandomFlipHalfBodyTransform',
'TopDownAffine',
'ToHeatmapsTopDown',
'TopDownEvalAffine',
]
def register_keypointop(cls):
return serializable(cls)
@register_keypointop
class RandomFlipHalfBodyTransform(object):
"""apply data augment to image and coords
to achieve the flip, scale, rotate and half body transform effect for training image
Args:
trainsize (list):[w, h], Image target size
upper_body_ids (list): The upper body joint ids
flip_pairs (list): The left-right joints exchange order list
pixel_std (int): The pixel std of the scale
scale (float): The scale factor to transform the image
rot (int): The rotate factor to transform the image
num_joints_half_body (int): The joints threshold of the half body transform
prob_half_body (float): The threshold of the half body transform
flip (bool): Whether to flip the image
Returns:
        records (dict): contain the image and coords after transformation
"""
def __init__(self,
trainsize,
upper_body_ids,
flip_pairs,
pixel_std,
scale=0.35,
rot=40,
num_joints_half_body=8,
prob_half_body=0.3,
flip=True,
rot_prob=0.6):
super(RandomFlipHalfBodyTransform, self).__init__()
self.trainsize = trainsize
self.upper_body_ids = upper_body_ids
self.flip_pairs = flip_pairs
self.pixel_std = pixel_std
self.scale = scale
self.rot = rot
self.num_joints_half_body = num_joints_half_body
self.prob_half_body = prob_half_body
self.flip = flip
self.aspect_ratio = trainsize[0] * 1.0 / trainsize[1]
self.rot_prob = rot_prob
def halfbody_transform(self, joints, joints_vis):
upper_joints = []
lower_joints = []
for joint_id in range(joints.shape[0]):
if joints_vis[joint_id][0] > 0:
if joint_id in self.upper_body_ids:
upper_joints.append(joints[joint_id])
else:
lower_joints.append(joints[joint_id])
if np.random.randn() < 0.5 and len(upper_joints) > 2:
selected_joints = upper_joints
else:
selected_joints = lower_joints if len(
lower_joints) > 2 else upper_joints
if len(selected_joints) < 2:
return None, None
selected_joints = np.array(selected_joints, dtype=np.float32)
center = selected_joints.mean(axis=0)[:2]
left_top = np.amin(selected_joints, axis=0)
right_bottom = np.amax(selected_joints, axis=0)
w = right_bottom[0] - left_top[0]
h = right_bottom[1] - left_top[1]
if w > self.aspect_ratio * h:
h = w * 1.0 / self.aspect_ratio
elif w < self.aspect_ratio * h:
w = h * self.aspect_ratio
scale = np.array(
[w * 1.0 / self.pixel_std, h * 1.0 / self.pixel_std],
dtype=np.float32)
scale = scale * 1.5
return center, scale
def flip_joints(self, joints, joints_vis, width, matched_parts):
joints[:, 0] = width - joints[:, 0] - 1
for pair in matched_parts:
joints[pair[0], :], joints[pair[1], :] = \
joints[pair[1], :], joints[pair[0], :].copy()
joints_vis[pair[0], :], joints_vis[pair[1], :] = \
joints_vis[pair[1], :], joints_vis[pair[0], :].copy()
return joints * joints_vis, joints_vis
def __call__(self, records):
image = records['image']
joints = records['joints']
joints_vis = records['joints_vis']
c = records['center']
s = records['scale']
r = 0
if (np.sum(joints_vis[:, 0]) > self.num_joints_half_body and
np.random.rand() < self.prob_half_body):
c_half_body, s_half_body = self.halfbody_transform(joints,
joints_vis)
if c_half_body is not None and s_half_body is not None:
c, s = c_half_body, s_half_body
sf = self.scale
rf = self.rot
s = s * np.clip(np.random.randn() * sf + 1, 1 - sf, 1 + sf)
r = np.clip(np.random.randn() * rf, -rf * 2,
rf * 2) if np.random.random() <= self.rot_prob else 0
if self.flip and np.random.random() <= 0.5:
image = image[:, ::-1, :]
joints, joints_vis = self.flip_joints(
joints, joints_vis, image.shape[1], self.flip_pairs)
c[0] = image.shape[1] - c[0] - 1
records['image'] = image
records['joints'] = joints
records['joints_vis'] = joints_vis
records['center'] = c
records['scale'] = s
records['rotate'] = r
return records
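# --- Illustrative usage sketch (not part of the original pipeline). The
# COCO-style 17-joint setup below (flip_pairs, upper_body_ids, pixel_std) is
# an assumed example configuration, not taken from the repository configs.
def _demo_random_flip_half_body():
    transform = RandomFlipHalfBodyTransform(
        trainsize=[192, 256],
        upper_body_ids=list(range(11)),
        flip_pairs=[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12],
                    [13, 14], [15, 16]],
        pixel_std=200)
    records = {
        'image': np.zeros((256, 192, 3), dtype=np.uint8),
        'joints': np.zeros((17, 3), dtype=np.float32),
        'joints_vis': np.ones((17, 3), dtype=np.float32),
        'center': np.array([96.0, 128.0]),
        'scale': np.array([1.0, 1.0]),
    }
    return transform(records)  # adds records['rotate'] for TopDownAffine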
@register_keypointop
class TopDownAffine(object):
"""apply affine transform to image and coords
Args:
trainsize (list): [w, h], the standard size used to train
use_udp (bool): whether to use Unbiased Data Processing.
records(dict): the dict contained the image and coords
Returns:
        records (dict): contain the image and coords after transformation
"""
def __init__(self, trainsize, use_udp=False):
self.trainsize = trainsize
self.use_udp = use_udp
def __call__(self, records):
image = records['image']
joints = records['joints']
joints_vis = records['joints_vis']
rot = records['rotate'] if "rotate" in records else 0
if self.use_udp:
trans = get_warp_matrix(
rot, records['center'] * 2.0,
[self.trainsize[0] - 1.0, self.trainsize[1] - 1.0],
records['scale'] * 200.0)
image = cv2.warpAffine(
image,
trans, (int(self.trainsize[0]), int(self.trainsize[1])),
flags=cv2.INTER_LINEAR)
joints[:, 0:2] = warp_affine_joints(joints[:, 0:2].copy(), trans)
else:
trans = get_affine_transform(records['center'], records['scale'] *
200, rot, self.trainsize)
image = cv2.warpAffine(
image,
trans, (int(self.trainsize[0]), int(self.trainsize[1])),
flags=cv2.INTER_LINEAR)
for i in range(joints.shape[0]):
if joints_vis[i, 0] > 0.0:
joints[i, 0:2] = affine_transform(joints[i, 0:2], trans)
records['image'] = image
records['joints'] = joints
return records
@register_keypointop
class TopDownEvalAffine(object):
"""apply affine transform to image and coords
Args:
trainsize (list): [w, h], the standard size used to train
use_udp (bool): whether to use Unbiased Data Processing.
records(dict): the dict contained the image and coords
Returns:
        records (dict): contain the image and coords after transformation
"""
def __init__(self, trainsize, use_udp=False):
self.trainsize = trainsize
self.use_udp = use_udp
def __call__(self, records):
image = records['image']
rot = 0
imshape = records['im_shape'][::-1]
center = imshape / 2.
scale = imshape
if self.use_udp:
trans = get_warp_matrix(
rot, center * 2.0,
[self.trainsize[0] - 1.0, self.trainsize[1] - 1.0], scale)
image = cv2.warpAffine(
image,
trans, (int(self.trainsize[0]), int(self.trainsize[1])),
flags=cv2.INTER_LINEAR)
else:
trans = get_affine_transform(center, scale, rot, self.trainsize)
image = cv2.warpAffine(
image,
trans, (int(self.trainsize[0]), int(self.trainsize[1])),
flags=cv2.INTER_LINEAR)
records['image'] = image
return records
@register_keypointop
class ToHeatmapsTopDown(object):
"""to generate the gaussin heatmaps of keypoint for heatmap loss
Args:
hmsize (list): [w, h] output heatmap's size
sigma (float): the std of gaussin kernel genereted
records(dict): the dict contained the image and coords
Returns:
records (dict): contain the heatmaps used to heatmaploss
"""
def __init__(self, hmsize, sigma):
super(ToHeatmapsTopDown, self).__init__()
self.hmsize = np.array(hmsize)
self.sigma = sigma
def __call__(self, records):
joints = records['joints']
joints_vis = records['joints_vis']
num_joints = joints.shape[0]
image_size = np.array(
[records['image'].shape[1], records['image'].shape[0]])
target_weight = np.ones((num_joints, 1), dtype=np.float32)
target_weight[:, 0] = joints_vis[:, 0]
target = np.zeros(
(num_joints, self.hmsize[1], self.hmsize[0]), dtype=np.float32)
tmp_size = self.sigma * 3
feat_stride = image_size / self.hmsize
for joint_id in range(num_joints):
mu_x = int(joints[joint_id][0] / feat_stride[0] + 0.5)
mu_y = int(joints[joint_id][1] / feat_stride[1] + 0.5)
# Check that any part of the gaussian is in-bounds
ul = [int(mu_x - tmp_size), int(mu_y - tmp_size)]
br = [int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)]
if ul[0] >= self.hmsize[0] or ul[1] >= self.hmsize[1] or br[
0] < 0 or br[1] < 0:
                # If not, zero out the target weight and skip this joint
target_weight[joint_id] = 0
continue
            # Generate gaussian
size = 2 * tmp_size + 1
x = np.arange(0, size, 1, np.float32)
y = x[:, np.newaxis]
x0 = y0 = size // 2
# The gaussian is not normalized, we want the center value to equal 1
g = np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * self.sigma**2))
# Usable gaussian range
g_x = max(0, -ul[0]), min(br[0], self.hmsize[0]) - ul[0]
g_y = max(0, -ul[1]), min(br[1], self.hmsize[1]) - ul[1]
# Image range
img_x = max(0, ul[0]), min(br[0], self.hmsize[0])
img_y = max(0, ul[1]), min(br[1], self.hmsize[1])
v = target_weight[joint_id]
if v > 0.5:
target[joint_id][img_y[0]:img_y[1], img_x[0]:img_x[1]] = g[g_y[
0]:g_y[1], g_x[0]:g_x[1]]
records['target'] = target
records['target_weight'] = target_weight
del records['joints'], records['joints_vis']
return records
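# --- Illustrative usage sketch (not part of the original pipeline): with a
# 192x256 input and a 48x64 heatmap, the feature stride is 4 on both axes.
def _demo_to_heatmaps():
    op = ToHeatmapsTopDown(hmsize=[48, 64], sigma=2)
    records = {
        'image': np.zeros((256, 192, 3), dtype=np.uint8),
        'joints': np.tile(np.array([96.0, 128.0, 0.0]), (17, 1)),
        'joints_vis': np.ones((17, 3), dtype=np.float32),
    }
    records = op(records)
    return records['target'].shape  # (17, 64, 48)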
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
try:
from collections.abc import Sequence
except Exception:
from collections import Sequence
from numbers import Number, Integral
import uuid
import random
import math
import numpy as np
import os
import copy
import logging
import cv2
import traceback
from PIL import Image, ImageDraw
import pickle
import threading
MUTEX = threading.Lock()
from lib.utils.workspace import serializable
from lib.utils.logger import setup_logger
logger = setup_logger(__name__)
registered_ops = []
def register_op(cls):
registered_ops.append(cls.__name__)
if not hasattr(BaseOperator, cls.__name__):
setattr(BaseOperator, cls.__name__, cls)
else:
raise KeyError("The {} class has been registered.".format(
cls.__name__))
return serializable(cls)
class BboxError(ValueError):
pass
class ImageError(ValueError):
pass
class Compose(object):
def __init__(self, transforms, num_classes=80):
self.transforms = transforms
self.transforms_cls = []
for t in self.transforms:
for k, v in t.items():
op_cls = getattr(transform, k)
f = op_cls(**v)
if hasattr(f, 'num_classes'):
f.num_classes = num_classes
self.transforms_cls.append(f)
def __call__(self, data):
for f in self.transforms_cls:
try:
data = f(data)
except Exception as e:
stack_info = traceback.format_exc()
logger.warning("fail to map sample transform [{}] "
"with error: {} and stack:\n{}".format(
f, e, str(stack_info)))
raise e
return data
class BaseOperator(object):
def __init__(self, name=None):
if name is None:
name = self.__class__.__name__
self._id = name + '_' + str(uuid.uuid4())[-6:]
def apply(self, sample, context=None):
""" Process a sample.
Args:
sample (dict): a dict of sample, eg: {'image':xx, 'label': xxx}
context (dict): info about this sample processing
Returns:
result (dict): a processed sample
"""
return sample
def __call__(self, sample, context=None):
""" Process a sample.
Args:
sample (dict): a dict of sample, eg: {'image':xx, 'label': xxx}
context (dict): info about this sample processing
Returns:
result (dict): a processed sample
"""
if isinstance(sample, Sequence):
for i in range(len(sample)):
sample[i] = self.apply(sample[i], context)
else:
sample = self.apply(sample, context)
return sample
def __str__(self):
return str(self._id)
@register_op
class Decode(BaseOperator):
def __init__(self):
""" Transform the image data to numpy format following the rgb format
"""
super(Decode, self).__init__()
def apply(self, sample, context=None):
""" load image if 'im_file' field is not empty but 'image' is"""
if 'image' not in sample:
with open(sample['im_file'], 'rb') as f:
sample['image'] = f.read()
sample.pop('im_file')
im = sample['image']
data = np.frombuffer(im, dtype='uint8')
im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode
if 'keep_ori_im' in sample and sample['keep_ori_im']:
sample['ori_image'] = im
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
sample['image'] = im
if 'h' not in sample:
sample['h'] = im.shape[0]
elif sample['h'] != im.shape[0]:
logger.warning(
"The actual image height: {} is not equal to the "
"height: {} in annotation, and update sample['h'] by actual "
"image height.".format(im.shape[0], sample['h']))
sample['h'] = im.shape[0]
if 'w' not in sample:
sample['w'] = im.shape[1]
elif sample['w'] != im.shape[1]:
logger.warning(
"The actual image width: {} is not equal to the "
"width: {} in annotation, and update sample['w'] by actual "
"image width.".format(im.shape[1], sample['w']))
sample['w'] = im.shape[1]
sample['im_shape'] = np.array(im.shape[:2], dtype=np.float32)
sample['scale_factor'] = np.array([1., 1.], dtype=np.float32)
return sample
def _make_dirs(dirname):
try:
from pathlib import Path
except ImportError:
from pathlib2 import Path
Path(dirname).mkdir(exist_ok=True)
@register_op
class Permute(BaseOperator):
def __init__(self):
"""
Change the channel to be (C, H, W)
"""
super(Permute, self).__init__()
def apply(self, sample, context=None):
im = sample['image']
im = im.transpose((2, 0, 1))
sample['image'] = im
return sample
@register_op
class NormalizeImage(BaseOperator):
def __init__(self,
mean=[0.485, 0.456, 0.406],
std=[1, 1, 1],
is_scale=True):
"""
Args:
mean (list): the pixel mean
            std (list): the pixel standard deviation
"""
super(NormalizeImage, self).__init__()
self.mean = mean
self.std = std
self.is_scale = is_scale
if not (isinstance(self.mean, list) and isinstance(self.std, list) and
isinstance(self.is_scale, bool)):
raise TypeError("{}: input type is invalid.".format(self))
from functools import reduce
if reduce(lambda x, y: x * y, self.std) == 0:
raise ValueError('{}: std is invalid!'.format(self))
def apply(self, sample, context=None):
"""Normalize the image.
Operators:
        1. (optional) Scale the image to [0, 1]
        2. Subtract the mean from each pixel and divide by std
"""
im = sample['image']
im = im.astype(np.float32, copy=False)
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
if self.is_scale:
im = im / 255.0
im -= mean
im /= std
sample['image'] = im
return sample
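# --- Illustrative usage sketch (not part of the original pipeline): the
# typical ordering is Decode -> NormalizeImage -> Permute; here a random
# array stands in for a decoded image.
def _demo_preprocess():
    sample = {'image': np.random.randint(0, 255, (256, 192, 3), dtype=np.uint8)}
    sample = NormalizeImage(mean=[0.485, 0.456, 0.406],
                            std=[0.229, 0.224, 0.225])(sample)
    sample = Permute()(sample)
    return sample['image'].shape  # (3, 256, 192)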
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import keypoint_metrics
from . import coco_utils
from . import json_results
from . import map_utils
from .keypoint_metrics import *
from .coco_utils import *
from .json_results import *
from .map_utils import *
__all__ = keypoint_metrics.__all__ + coco_utils.__all__ + json_results.__all__ + map_utils.__all__
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import numpy as np
import itertools
from .json_results import get_det_res, get_det_poly_res, get_seg_res, get_solov2_segm_res, get_keypoint_res
from .map_utils import draw_pr_curve
from lib.utils.logger import setup_logger
logger = setup_logger(__name__)
__all__ = ['get_infer_results', 'cocoapi_eval', 'json_eval_results']
def get_infer_results(outs, catid, bias=0):
"""
Get result at the stage of inference.
    The output format is a dictionary containing the bbox or mask results.
For example, bbox result is a list and each element contains
image_id, category_id, bbox and score.
"""
if outs is None or len(outs) == 0:
raise ValueError(
            'The number of valid detection results is zero. Please use a reasonable model and check the input data.'
)
im_id = outs['im_id']
infer_res = {}
if 'bbox' in outs:
if len(outs['bbox']) > 0 and len(outs['bbox'][0]) > 6:
infer_res['bbox'] = get_det_poly_res(
outs['bbox'], outs['bbox_num'], im_id, catid, bias=bias)
else:
infer_res['bbox'] = get_det_res(
outs['bbox'], outs['bbox_num'], im_id, catid, bias=bias)
if 'mask' in outs:
# mask post process
infer_res['mask'] = get_seg_res(outs['mask'], outs['bbox'],
outs['bbox_num'], im_id, catid)
if 'segm' in outs:
infer_res['segm'] = get_solov2_segm_res(outs, im_id, catid)
if 'keypoint' in outs:
infer_res['keypoint'] = get_keypoint_res(outs, im_id)
outs['bbox_num'] = [len(infer_res['keypoint'])]
return infer_res
def cocoapi_eval(jsonfile,
style,
coco_gt=None,
anno_file=None,
max_dets=(100, 300, 1000),
classwise=False,
sigmas=None,
use_area=True):
"""
Args:
jsonfile (str): Evaluation json file, eg: bbox.json, mask.json.
style (str): COCOeval style, can be `bbox` , `segm` , `proposal`, `keypoints` and `keypoints_crowd`.
        coco_gt (COCO): a loaded COCO ground-truth object; if None, it is
            built from anno_file, eg: coco_gt = COCO(anno_file)
anno_file (str): COCO annotations file.
max_dets (tuple): COCO evaluation maxDets.
classwise (bool): Whether per-category AP and draw P-R Curve or not.
sigmas (nparray): keypoint labelling sigmas.
use_area (bool): If gt annotations (eg. CrowdPose, AIC)
do not have 'area', please set use_area=False.
"""
    assert coco_gt is not None or anno_file is not None
if style == 'keypoints_crowd':
#please install xtcocotools==1.6
from xtcocotools.coco import COCO
from xtcocotools.cocoeval import COCOeval
else:
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
    if coco_gt is None:
coco_gt = COCO(anno_file)
logger.info("Start evaluate...")
coco_dt = coco_gt.loadRes(jsonfile)
if style == 'proposal':
coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
coco_eval.params.useCats = 0
coco_eval.params.maxDets = list(max_dets)
elif style == 'keypoints_crowd':
coco_eval = COCOeval(coco_gt, coco_dt, style, sigmas, use_area)
else:
coco_eval = COCOeval(coco_gt, coco_dt, style)
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
if classwise:
# Compute per-category AP and PR curve
try:
from terminaltables import AsciiTable
except Exception as e:
logger.error(
                'terminaltables not found, please install terminaltables. '
'for example: `pip install terminaltables`.')
raise e
precisions = coco_eval.eval['precision']
cat_ids = coco_gt.getCatIds()
# precision: (iou, recall, cls, area range, max dets)
assert len(cat_ids) == precisions.shape[2]
results_per_category = []
for idx, catId in enumerate(cat_ids):
# area range index 0: all area ranges
# max dets index -1: typically 100 per image
nm = coco_gt.loadCats(catId)[0]
precision = precisions[:, :, idx, 0, -1]
precision = precision[precision > -1]
if precision.size:
ap = np.mean(precision)
else:
ap = float('nan')
results_per_category.append(
(str(nm["name"]), '{:0.3f}'.format(float(ap))))
pr_array = precisions[0, :, idx, 0, 2]
recall_array = np.arange(0.0, 1.01, 0.01)
draw_pr_curve(
pr_array,
recall_array,
out_dir=style + '_pr_curve',
file_name='{}_precision_recall_curve.jpg'.format(nm["name"]))
num_columns = min(6, len(results_per_category) * 2)
results_flatten = list(itertools.chain(*results_per_category))
headers = ['category', 'AP'] * (num_columns // 2)
results_2d = itertools.zip_longest(
* [results_flatten[i::num_columns] for i in range(num_columns)])
table_data = [headers]
table_data += [result for result in results_2d]
table = AsciiTable(table_data)
logger.info('Per-category of {} AP: \n{}'.format(style, table.table))
logger.info("per-category PR curve has output to {} folder.".format(
style + '_pr_curve'))
# flush coco evaluation result
sys.stdout.flush()
return coco_eval.stats
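# --- Illustrative usage sketch (not part of the original pipeline): the two
# file paths below are placeholders and must exist on disk for the call to run.
def _demo_cocoapi_eval():
    return cocoapi_eval(
        'keypoints_results.json',
        'keypoints',
        anno_file='annotations/person_keypoints_val2017.json')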
def json_eval_results(metric, json_directory, dataset):
"""
    COCO API evaluation with already existing proposal.json, bbox.json or mask.json files
"""
assert metric == 'COCO'
anno_file = dataset.get_anno()
json_file_list = ['proposal.json', 'bbox.json', 'mask.json']
if json_directory:
assert os.path.exists(
json_directory), "The json directory:{} does not exist".format(
json_directory)
for k, v in enumerate(json_file_list):
json_file_list[k] = os.path.join(str(json_directory), v)
coco_eval_style = ['proposal', 'bbox', 'segm']
for i, v_json in enumerate(json_file_list):
if os.path.exists(v_json):
cocoapi_eval(v_json, coco_eval_style[i], anno_file=anno_file)
else:
logger.info("{} not exists!".format(v_json))
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import six
import numpy as np
__all__ = [
'get_det_res', 'get_det_poly_res', 'get_seg_res', 'get_solov2_segm_res',
'get_keypoint_res'
]
def get_det_res(bboxes, bbox_nums, image_id, label_to_cat_id_map, bias=0):
det_res = []
k = 0
for i in range(len(bbox_nums)):
cur_image_id = int(image_id[i][0])
det_nums = bbox_nums[i]
for j in range(det_nums):
dt = bboxes[k]
k = k + 1
num_id, score, xmin, ymin, xmax, ymax = dt.tolist()
if int(num_id) < 0:
continue
category_id = label_to_cat_id_map[int(num_id)]
w = xmax - xmin + bias
h = ymax - ymin + bias
bbox = [xmin, ymin, w, h]
dt_res = {
'image_id': cur_image_id,
'category_id': category_id,
'bbox': bbox,
'score': score
}
det_res.append(dt_res)
return det_res
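# --- Illustrative sketch (not part of the original pipeline): one detection
# (class index 0, score 0.9, corner box) converted to COCO xywh format.
def _demo_get_det_res():
    bboxes = np.array([[0., 0.9, 10., 20., 110., 220.]], dtype=np.float32)
    res = get_det_res(bboxes, bbox_nums=[1], image_id=np.array([[42]]),
                      label_to_cat_id_map={0: 1})
    return res  # [{'image_id': 42, 'category_id': 1, 'bbox': [10., 20., 100., 200.], ...}]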
def get_det_poly_res(bboxes, bbox_nums, image_id, label_to_cat_id_map, bias=0):
det_res = []
k = 0
for i in range(len(bbox_nums)):
cur_image_id = int(image_id[i][0])
det_nums = bbox_nums[i]
for j in range(det_nums):
dt = bboxes[k]
k = k + 1
num_id, score, x1, y1, x2, y2, x3, y3, x4, y4 = dt.tolist()
if int(num_id) < 0:
continue
category_id = label_to_cat_id_map[int(num_id)]
rbox = [x1, y1, x2, y2, x3, y3, x4, y4]
dt_res = {
'image_id': cur_image_id,
'category_id': category_id,
'bbox': rbox,
'score': score
}
det_res.append(dt_res)
return det_res
def get_seg_res(masks, bboxes, mask_nums, image_id, label_to_cat_id_map):
import pycocotools.mask as mask_util
seg_res = []
k = 0
for i in range(len(mask_nums)):
cur_image_id = int(image_id[i][0])
det_nums = mask_nums[i]
for j in range(det_nums):
mask = masks[k].astype(np.uint8)
score = float(bboxes[k][1])
label = int(bboxes[k][0])
k = k + 1
if label == -1:
continue
cat_id = label_to_cat_id_map[label]
rle = mask_util.encode(
np.array(
mask[:, :, None], order="F", dtype="uint8"))[0]
if six.PY3:
if 'counts' in rle:
rle['counts'] = rle['counts'].decode("utf8")
sg_res = {
'image_id': cur_image_id,
'category_id': cat_id,
'segmentation': rle,
'score': score
}
seg_res.append(sg_res)
return seg_res
def get_solov2_segm_res(results, image_id, num_id_to_cat_id_map):
import pycocotools.mask as mask_util
segm_res = []
# for each batch
segms = results['segm'].astype(np.uint8)
clsid_labels = results['cate_label']
clsid_scores = results['cate_score']
lengths = segms.shape[0]
im_id = int(image_id[0][0])
if lengths == 0 or segms is None:
return None
# for each sample
    for i in range(lengths):
clsid = int(clsid_labels[i])
catid = num_id_to_cat_id_map[clsid]
score = float(clsid_scores[i])
mask = segms[i]
segm = mask_util.encode(np.array(mask[:, :, np.newaxis], order='F'))[0]
segm['counts'] = segm['counts'].decode('utf8')
coco_res = {
'image_id': im_id,
'category_id': catid,
'segmentation': segm,
'score': score
}
segm_res.append(coco_res)
return segm_res
def get_keypoint_res(results, im_id):
anns = []
preds = results['keypoint']
for idx in range(im_id.shape[0]):
image_id = im_id[idx].item()
kpts, scores = preds[idx]
for kpt, score in zip(kpts, scores):
kpt = kpt.flatten()
ann = {
'image_id': image_id,
'category_id': 1, # XXX hard code
'keypoints': kpt.tolist(),
'score': float(score)
}
x = kpt[0::3]
y = kpt[1::3]
x0, x1, y0, y1 = np.min(x).item(), np.max(x).item(), np.min(
y).item(), np.max(y).item()
ann['area'] = (x1 - x0) * (y1 - y0)
ann['bbox'] = [x0, y0, x1 - x0, y1 - y0]
anns.append(ann)
return anns
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import json
from collections import defaultdict, OrderedDict
import numpy as np
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
from scipy.io import loadmat, savemat
from lib.utils.keypoint_utils import oks_nms
from lib.utils.logger import setup_logger
logger = setup_logger(__name__)
__all__ = ['KeyPointTopDownCOCOEval']
class KeyPointTopDownCOCOEval(object):
'''
Adapted from
https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
Copyright (c) Microsoft, under the MIT License.
'''
def __init__(self,
anno_file,
num_samples,
num_joints,
output_eval,
iou_type='keypoints',
in_vis_thre=0.2,
oks_thre=0.9,
save_prediction_only=False):
super(KeyPointTopDownCOCOEval, self).__init__()
self.coco = COCO(anno_file)
self.num_samples = num_samples
self.num_joints = num_joints
self.iou_type = iou_type
self.in_vis_thre = in_vis_thre
self.oks_thre = oks_thre
self.output_eval = output_eval
self.res_file = os.path.join(output_eval, "keypoints_results.json")
self.save_prediction_only = save_prediction_only
self.reset()
def reset(self):
self.results = {
'all_preds': np.zeros(
(self.num_samples, self.num_joints, 3), dtype=np.float32),
'all_boxes': np.zeros((self.num_samples, 6)),
'image_path': []
}
self.eval_results = {}
self.idx = 0
def update(self, inputs, outputs):
kpts, _ = outputs['keypoint'][0]
num_images = inputs['image'].shape[0]
self.results['all_preds'][self.idx:self.idx + num_images, :, 0:
3] = kpts[:, :, 0:3]
self.results['all_boxes'][self.idx:self.idx + num_images, 0:
2] = inputs['center'].numpy()[:, 0:2]
self.results['all_boxes'][self.idx:self.idx + num_images, 2:
4] = inputs['scale'].numpy()[:, 0:2]
self.results['all_boxes'][self.idx:self.idx + num_images, 4] = np.prod(
inputs['scale'].numpy() * 200, 1)
self.results['all_boxes'][self.idx:self.idx + num_images,
5] = np.squeeze(inputs['score'].numpy())
self.results['image_path'].extend(inputs['im_id'].numpy())
self.idx += num_images
def _write_coco_keypoint_results(self, keypoints):
data_pack = [{
'cat_id': 1,
'cls': 'person',
'ann_type': 'keypoints',
'keypoints': keypoints
}]
results = self._coco_keypoint_results_one_category_kernel(data_pack[0])
if not os.path.exists(self.output_eval):
os.makedirs(self.output_eval)
with open(self.res_file, 'w') as f:
json.dump(results, f, sort_keys=True, indent=4)
logger.info(f'The keypoint result is saved to {self.res_file}.')
try:
json.load(open(self.res_file))
except Exception:
content = []
with open(self.res_file, 'r') as f:
for line in f:
content.append(line)
content[-1] = ']'
with open(self.res_file, 'w') as f:
for c in content:
f.write(c)
def _coco_keypoint_results_one_category_kernel(self, data_pack):
cat_id = data_pack['cat_id']
keypoints = data_pack['keypoints']
cat_results = []
for img_kpts in keypoints:
if len(img_kpts) == 0:
continue
_key_points = np.array(
[img_kpts[k]['keypoints'] for k in range(len(img_kpts))])
_key_points = _key_points.reshape(_key_points.shape[0], -1)
result = [{
'image_id': img_kpts[k]['image'],
'category_id': cat_id,
'keypoints': _key_points[k].tolist(),
'score': img_kpts[k]['score'],
'center': list(img_kpts[k]['center']),
'scale': list(img_kpts[k]['scale'])
} for k in range(len(img_kpts))]
cat_results.extend(result)
return cat_results
def get_final_results(self, preds, all_boxes, img_path):
_kpts = []
for idx, kpt in enumerate(preds):
_kpts.append({
'keypoints': kpt,
'center': all_boxes[idx][0:2],
'scale': all_boxes[idx][2:4],
'area': all_boxes[idx][4],
'score': all_boxes[idx][5],
'image': int(img_path[idx])
})
# image x person x (keypoints)
kpts = defaultdict(list)
for kpt in _kpts:
kpts[kpt['image']].append(kpt)
# rescoring and oks nms
num_joints = preds.shape[1]
in_vis_thre = self.in_vis_thre
oks_thre = self.oks_thre
oks_nmsed_kpts = []
for img in kpts.keys():
img_kpts = kpts[img]
for n_p in img_kpts:
box_score = n_p['score']
kpt_score = 0
valid_num = 0
for n_jt in range(0, num_joints):
t_s = n_p['keypoints'][n_jt][2]
if t_s > in_vis_thre:
kpt_score = kpt_score + t_s
valid_num = valid_num + 1
if valid_num != 0:
kpt_score = kpt_score / valid_num
# rescoring
n_p['score'] = kpt_score * box_score
keep = oks_nms([img_kpts[i] for i in range(len(img_kpts))],
oks_thre)
if len(keep) == 0:
oks_nmsed_kpts.append(img_kpts)
else:
oks_nmsed_kpts.append([img_kpts[_keep] for _keep in keep])
self._write_coco_keypoint_results(oks_nmsed_kpts)
def accumulate(self):
self.get_final_results(self.results['all_preds'],
self.results['all_boxes'],
self.results['image_path'])
if self.save_prediction_only:
            logger.info(f'The keypoint result is saved to {self.res_file} '
                        'and the mAP will not be evaluated.')
return
coco_dt = self.coco.loadRes(self.res_file)
coco_eval = COCOeval(self.coco, coco_dt, 'keypoints')
coco_eval.params.useSegm = None
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
keypoint_stats = []
for ind in range(len(coco_eval.stats)):
keypoint_stats.append((coco_eval.stats[ind]))
self.eval_results['keypoint'] = keypoint_stats
def log(self):
if self.save_prediction_only:
return
stats_names = [
'AP', 'Ap .5', 'AP .75', 'AP (M)', 'AP (L)', 'AR', 'AR .5',
'AR .75', 'AR (M)', 'AR (L)'
]
num_values = len(stats_names)
print(' '.join(['| {}'.format(name) for name in stats_names]) + ' |')
        print('|---' * num_values + '|')
print(' '.join([
'| {:.3f}'.format(value) for value in self.eval_results['keypoint']
]) + ' |')
def get_results(self):
return self.eval_results
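# Illustrative sketch (not part of the original file): how the evaluator
# above is typically driven in an eval loop. `metric` is an instance of
# KeyPointTopDownCOCOEval; `loader` and `model` are hypothetical placeholders.
def _example_keypoint_eval_loop(metric, loader, model):
    for inputs in loader:
        outputs = model(inputs)  # outputs['keypoint'] comes from the model
        metric.update(inputs, outputs)  # buffer predictions batch by batch
    metric.accumulate()  # run rescoring, OKS NMS, write json, then COCOeval
    metric.log()  # print the AP/AR markdown table
    return metric.get_results()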
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import sys
import math
import numpy as np
import itertools
import paddle
from lib.utils.logger import setup_logger
logger = setup_logger(__name__)
__all__ = [
'draw_pr_curve', 'bbox_area', 'jaccard_overlap', 'prune_zero_padding',
'DetectionMAP', 'ap_per_class', 'compute_ap', 'get_best_begin_point_single'
]
def cal_line_length(point1, point2):
    return math.sqrt(
        math.pow(point1[0] - point2[0], 2) + math.pow(point1[1] - point2[1],
                                                      2))
def get_best_begin_point_single(coordinate):
x1, y1, x2, y2, x3, y3, x4, y4 = coordinate
xmin = min(x1, x2, x3, x4)
ymin = min(y1, y2, y3, y4)
xmax = max(x1, x2, x3, x4)
ymax = max(y1, y2, y3, y4)
combinate = [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
[[x4, y4], [x1, y1], [x2, y2], [x3, y3]],
[[x3, y3], [x4, y4], [x1, y1], [x2, y2]],
[[x2, y2], [x3, y3], [x4, y4], [x1, y1]]]
dst_coordinate = [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]]
force = 100000000.0
force_flag = 0
for i in range(4):
temp_force = cal_line_length(combinate[i][0], dst_coordinate[0]) \
+ cal_line_length(combinate[i][1], dst_coordinate[1]) \
+ cal_line_length(combinate[i][2], dst_coordinate[2]) \
+ cal_line_length(combinate[i][3], dst_coordinate[3])
if temp_force < force:
force = temp_force
force_flag = i
return np.array(combinate[force_flag]).reshape(8)
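# Minimal sketch (added for illustration, not in the original file): the
# function above rotates the 4 polygon vertices so the ordering starts
# closest to the top-left corner of the enclosing axis-aligned box.
def _example_best_begin_point():
    # a unit square listed starting from its bottom-right corner
    poly = [1., 1., 0., 1., 0., 0., 1., 0.]
    reordered = get_best_begin_point_single(poly)
    # the reordering starts near (xmin, ymin): [0, 0, 1, 0, 1, 1, 0, 1]
    return reordered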
def poly2rbox(polys):
"""
poly:[x0,y0,x1,y1,x2,y2,x3,y3]
to
rotated_boxes:[x_ctr,y_ctr,w,h,angle]
"""
rotated_boxes = []
for poly in polys:
poly = np.array(poly[:8], dtype=np.float32)
pt1 = (poly[0], poly[1])
pt2 = (poly[2], poly[3])
pt3 = (poly[4], poly[5])
pt4 = (poly[6], poly[7])
        edge1 = np.sqrt((pt1[0] - pt2[0])**2 + (pt1[1] - pt2[1])**2)
        edge2 = np.sqrt((pt2[0] - pt3[0])**2 + (pt2[1] - pt3[1])**2)
width = max(edge1, edge2)
height = min(edge1, edge2)
rbox_angle = 0
if edge1 > edge2:
rbox_angle = np.arctan2(
float(pt2[1] - pt1[1]), float(pt2[0] - pt1[0]))
        else:
            rbox_angle = np.arctan2(
                float(pt4[1] - pt1[1]), float(pt4[0] - pt1[0]))
def norm_angle(angle, range=[-np.pi / 4, np.pi]):
return (angle - range[0]) % range[1] + range[0]
rbox_angle = norm_angle(rbox_angle)
x_ctr = float(pt1[0] + pt3[0]) / 2
y_ctr = float(pt1[1] + pt3[1]) / 2
rotated_box = np.array([x_ctr, y_ctr, width, height, rbox_angle])
rotated_boxes.append(rotated_box)
ret_rotated_boxes = np.array(rotated_boxes)
assert ret_rotated_boxes.shape[1] == 5
return ret_rotated_boxes
def rbox2poly_np(rrects):
"""
rrect:[x_ctr,y_ctr,w,h,angle]
to
poly:[x0,y0,x1,y1,x2,y2,x3,y3]
"""
polys = []
for i in range(rrects.shape[0]):
rrect = rrects[i]
# x_ctr, y_ctr, width, height, angle = rrect[:5]
x_ctr = rrect[0]
y_ctr = rrect[1]
width = rrect[2]
height = rrect[3]
angle = rrect[4]
tl_x, tl_y, br_x, br_y = -width / 2, -height / 2, width / 2, height / 2
rect = np.array([[tl_x, br_x, br_x, tl_x], [tl_y, tl_y, br_y, br_y]])
R = np.array([[np.cos(angle), -np.sin(angle)],
[np.sin(angle), np.cos(angle)]])
poly = R.dot(rect)
x0, x1, x2, x3 = poly[0, :4] + x_ctr
y0, y1, y2, y3 = poly[1, :4] + y_ctr
poly = np.array([x0, y0, x1, y1, x2, y2, x3, y3], dtype=np.float32)
poly = get_best_begin_point_single(poly)
polys.append(poly)
polys = np.array(polys)
return polys
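# Illustrative round-trip sketch (not part of the original file): poly2rbox
# and rbox2poly_np are approximate inverses; a horizontal 4x2 rectangle
# centered at (2, 1) survives the conversion up to vertex ordering.
def _example_rbox_roundtrip():
    poly = np.array([[0., 0., 4., 0., 4., 2., 0., 2.]])
    rboxes = poly2rbox(poly)           # -> [[2., 1., 4., 2., 0.]]
    polys_back = rbox2poly_np(rboxes)  # -> the same rectangle, reordered
    return rboxes, polys_back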
def draw_pr_curve(precision,
recall,
iou=0.5,
out_dir='pr_curve',
file_name='precision_recall_curve.jpg'):
if not os.path.exists(out_dir):
os.makedirs(out_dir)
output_path = os.path.join(out_dir, file_name)
try:
import matplotlib.pyplot as plt
except Exception as e:
        logger.error('Matplotlib not found, please install matplotlib, '
                     'for example: `pip install matplotlib`.')
raise e
plt.cla()
plt.figure('P-R Curve')
plt.title('Precision/Recall Curve(IoU={})'.format(iou))
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.grid(True)
plt.plot(recall, precision)
plt.savefig(output_path)
def bbox_area(bbox, is_bbox_normalized):
"""
Calculate area of a bounding box
"""
norm = 1. - float(is_bbox_normalized)
width = bbox[2] - bbox[0] + norm
height = bbox[3] - bbox[1] + norm
return width * height
def jaccard_overlap(pred, gt, is_bbox_normalized=False):
"""
Calculate jaccard overlap ratio between two bounding box
"""
if pred[0] >= gt[2] or pred[2] <= gt[0] or \
pred[1] >= gt[3] or pred[3] <= gt[1]:
return 0.
inter_xmin = max(pred[0], gt[0])
inter_ymin = max(pred[1], gt[1])
inter_xmax = min(pred[2], gt[2])
inter_ymax = min(pred[3], gt[3])
inter_size = bbox_area([inter_xmin, inter_ymin, inter_xmax, inter_ymax],
is_bbox_normalized)
pred_size = bbox_area(pred, is_bbox_normalized)
gt_size = bbox_area(gt, is_bbox_normalized)
overlap = float(inter_size) / (pred_size + gt_size - inter_size)
return overlap
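# Small numeric sketch (illustration only): two unit-area boxes overlapping
# by half give IoU = 0.5 / (1 + 1 - 0.5) = 1/3.
def _example_jaccard_overlap():
    pred = [0., 0., 1., 1.]
    gt = [0.5, 0., 1.5, 1.]
    return jaccard_overlap(pred, gt, is_bbox_normalized=True)  # ~0.3333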
def calc_rbox_iou(pred, gt_rbox):
"""
calc iou between rotated bbox
"""
# calc iou of bounding box for speedup
pred = np.array(pred, np.float32).reshape(-1, 8)
pred = pred.reshape(-1, 2)
gt_poly = rbox2poly_np(np.array(gt_rbox).reshape(-1, 5))[0]
gt_poly = gt_poly.reshape(-1, 2)
pred_rect = [
np.min(pred[:, 0]), np.min(pred[:, 1]), np.max(pred[:, 0]),
np.max(pred[:, 1])
]
gt_rect = [
np.min(gt_poly[:, 0]), np.min(gt_poly[:, 1]), np.max(gt_poly[:, 0]),
np.max(gt_poly[:, 1])
]
iou = jaccard_overlap(pred_rect, gt_rect, False)
if iou <= 0:
return iou
# calc rbox iou
    pred = np.array(pred, np.float32).reshape(-1, 8)
    pred_rbox = poly2rbox(pred)
    pred_rbox = pred_rbox.reshape(-1, 5)
try:
from rbox_iou_ops import rbox_iou
except Exception as e:
print("import custom_ops error, try install rbox_iou_ops " \
"following ppdet/ext_op/README.md", e)
sys.stdout.flush()
sys.exit(-1)
gt_rbox = np.array(gt_rbox, np.float32).reshape(-1, 5)
pd_gt_rbox = paddle.to_tensor(gt_rbox, dtype='float32')
pd_pred_rbox = paddle.to_tensor(pred_rbox, dtype='float32')
iou = rbox_iou(pd_gt_rbox, pd_pred_rbox)
iou = iou.numpy()
return iou[0][0]
def prune_zero_padding(gt_box, gt_label, difficult=None):
valid_cnt = 0
for i in range(len(gt_box)):
if gt_box[i, 0] == 0 and gt_box[i, 1] == 0 and \
gt_box[i, 2] == 0 and gt_box[i, 3] == 0:
break
valid_cnt += 1
return (gt_box[:valid_cnt], gt_label[:valid_cnt], difficult[:valid_cnt]
if difficult is not None else None)
class DetectionMAP(object):
"""
Calculate detection mean average precision.
Currently support two types: 11point and integral
Args:
class_num (int): The class number.
overlap_thresh (float): The threshold of overlap
ratio between prediction bounding box and
ground truth bounding box for deciding
true/false positive. Default 0.5.
map_type (str): Calculation method of mean average
precision, currently support '11point' and
'integral'. Default '11point'.
        is_bbox_normalized (bool): Whether bounding boxes
            are normalized to range [0, 1]. Default False.
        evaluate_difficult (bool): Whether to evaluate
            difficult bounding boxes. Default False.
        catid2name (dict): Mapping between category id and category name.
        classwise (bool): Whether to compute per-category AP
            and draw the P-R curve.
"""
def __init__(self,
class_num,
overlap_thresh=0.5,
map_type='11point',
is_bbox_normalized=False,
evaluate_difficult=False,
catid2name=None,
classwise=False):
self.class_num = class_num
self.overlap_thresh = overlap_thresh
assert map_type in ['11point', 'integral'], \
"map_type currently only support '11point' "\
"and 'integral'"
self.map_type = map_type
self.is_bbox_normalized = is_bbox_normalized
self.evaluate_difficult = evaluate_difficult
self.classwise = classwise
self.classes = []
for cname in catid2name.values():
self.classes.append(cname)
self.reset()
def update(self, bbox, score, label, gt_box, gt_label, difficult=None):
"""
Update metric statics from given prediction and ground
truth infomations.
"""
if difficult is None:
difficult = np.zeros_like(gt_label)
# record class gt count
for gtl, diff in zip(gt_label, difficult):
if self.evaluate_difficult or int(diff) == 0:
self.class_gt_counts[int(np.array(gtl))] += 1
# record class score positive
visited = [False] * len(gt_label)
for b, s, l in zip(bbox, score, label):
pred = b.tolist() if isinstance(b, np.ndarray) else b
max_idx = -1
max_overlap = -1.0
for i, gl in enumerate(gt_label):
if int(gl) == int(l):
if len(gt_box[i]) == 5:
overlap = calc_rbox_iou(pred, gt_box[i])
else:
overlap = jaccard_overlap(pred, gt_box[i],
self.is_bbox_normalized)
if overlap > max_overlap:
max_overlap = overlap
max_idx = i
if max_overlap > self.overlap_thresh:
if self.evaluate_difficult or \
int(np.array(difficult[max_idx])) == 0:
if not visited[max_idx]:
self.class_score_poss[int(l)].append([s, 1.0])
visited[max_idx] = True
else:
self.class_score_poss[int(l)].append([s, 0.0])
else:
self.class_score_poss[int(l)].append([s, 0.0])
def reset(self):
"""
Reset metric statics
"""
self.class_score_poss = [[] for _ in range(self.class_num)]
self.class_gt_counts = [0] * self.class_num
self.mAP = 0.0
def accumulate(self):
"""
Accumulate metric results and calculate mAP
"""
mAP = 0.
valid_cnt = 0
eval_results = []
for score_pos, count in zip(self.class_score_poss,
self.class_gt_counts):
if count == 0: continue
if len(score_pos) == 0:
valid_cnt += 1
continue
accum_tp_list, accum_fp_list = \
self._get_tp_fp_accum(score_pos)
precision = []
recall = []
for ac_tp, ac_fp in zip(accum_tp_list, accum_fp_list):
precision.append(float(ac_tp) / (ac_tp + ac_fp))
recall.append(float(ac_tp) / count)
one_class_ap = 0.0
if self.map_type == '11point':
max_precisions = [0.] * 11
start_idx = len(precision) - 1
for j in range(10, -1, -1):
for i in range(start_idx, -1, -1):
if recall[i] < float(j) / 10.:
start_idx = i
if j > 0:
max_precisions[j - 1] = max_precisions[j]
break
else:
if max_precisions[j] < precision[i]:
max_precisions[j] = precision[i]
one_class_ap = sum(max_precisions) / 11.
mAP += one_class_ap
valid_cnt += 1
            elif self.map_type == 'integral':
                prev_recall = 0.
for i in range(len(precision)):
recall_gap = math.fabs(recall[i] - prev_recall)
if recall_gap > 1e-6:
one_class_ap += precision[i] * recall_gap
prev_recall = recall[i]
mAP += one_class_ap
valid_cnt += 1
else:
logger.error("Unspported mAP type {}".format(self.map_type))
sys.exit(1)
eval_results.append({
'class': self.classes[valid_cnt - 1],
'ap': one_class_ap,
'precision': precision,
'recall': recall,
})
self.eval_results = eval_results
self.mAP = mAP / float(valid_cnt) if valid_cnt > 0 else mAP
def get_map(self):
"""
Get mAP result
"""
if self.mAP is None:
logger.error("mAP is not calculated.")
if self.classwise:
# Compute per-category AP and PR curve
try:
from terminaltables import AsciiTable
except Exception as e:
                logger.error(
                    'terminaltables not found, please install terminaltables, '
                    'for example: `pip install terminaltables`.')
raise e
results_per_category = []
for eval_result in self.eval_results:
results_per_category.append(
(str(eval_result['class']),
'{:0.3f}'.format(float(eval_result['ap']))))
draw_pr_curve(
eval_result['precision'],
eval_result['recall'],
out_dir='voc_pr_curve',
file_name='{}_precision_recall_curve.jpg'.format(
eval_result['class']))
num_columns = min(6, len(results_per_category) * 2)
results_flatten = list(itertools.chain(*results_per_category))
headers = ['category', 'AP'] * (num_columns // 2)
results_2d = itertools.zip_longest(* [
results_flatten[i::num_columns] for i in range(num_columns)
])
table_data = [headers]
table_data += [result for result in results_2d]
table = AsciiTable(table_data)
            logger.info('Per-category VOC AP: \n{}'.format(table.table))
            logger.info(
                'Per-category PR curves have been saved to the voc_pr_curve folder.')
return self.mAP
def _get_tp_fp_accum(self, score_pos_list):
"""
Calculate accumulating true/false positive results from
[score, pos] records
"""
sorted_list = sorted(score_pos_list, key=lambda s: s[0], reverse=True)
accum_tp = 0
accum_fp = 0
accum_tp_list = []
accum_fp_list = []
for (score, pos) in sorted_list:
accum_tp += int(pos)
accum_tp_list.append(accum_tp)
accum_fp += 1 - int(pos)
accum_fp_list.append(accum_fp)
return accum_tp_list, accum_fp_list
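# Illustrative sketch (not in the original file) of driving DetectionMAP on
# one image; the bbox/score/label arrays are hypothetical model outputs and
# the class names are placeholders.
def _example_detection_map():
    metric = DetectionMAP(
        class_num=2,
        overlap_thresh=0.5,
        map_type='11point',
        catid2name={0: 'cat', 1: 'dog'})
    bbox = np.array([[0., 0., 10., 10.]])  # one detection, IoU > 0.5 with GT
    score = np.array([0.9])
    label = np.array([0])
    gt_box = np.array([[1., 1., 10., 10.]])
    gt_label = np.array([0])
    metric.update(bbox, score, label, gt_box, gt_label)
    metric.accumulate()
    return metric.get_map()  # -> 1.0 for this single true positive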
def ap_per_class(tp, conf, pred_cls, target_cls):
"""
Computes the average precision, given the recall and precision curves.
Method originally from https://github.com/rafaelpadilla/Object-Detection-Metrics.
Args:
tp (list): True positives.
conf (list): Objectness value from 0-1.
pred_cls (list): Predicted object classes.
target_cls (list): Target object classes.
"""
tp, conf, pred_cls, target_cls = np.array(tp), np.array(conf), np.array(
pred_cls), np.array(target_cls)
# Sort by objectness
i = np.argsort(-conf)
tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]
# Find unique classes
unique_classes = np.unique(np.concatenate((pred_cls, target_cls), 0))
# Create Precision-Recall curve and compute AP for each class
ap, p, r = [], [], []
for c in unique_classes:
i = pred_cls == c
n_gt = sum(target_cls == c) # Number of ground truth objects
n_p = sum(i) # Number of predicted objects
if (n_p == 0) and (n_gt == 0):
continue
elif (n_p == 0) or (n_gt == 0):
ap.append(0)
r.append(0)
p.append(0)
else:
# Accumulate FPs and TPs
fpc = np.cumsum(1 - tp[i])
tpc = np.cumsum(tp[i])
# Recall
recall_curve = tpc / (n_gt + 1e-16)
r.append(tpc[-1] / (n_gt + 1e-16))
# Precision
precision_curve = tpc / (tpc + fpc)
p.append(tpc[-1] / (tpc[-1] + fpc[-1]))
# AP from recall-precision curve
ap.append(compute_ap(recall_curve, precision_curve))
return np.array(ap), unique_classes.astype('int32'), np.array(r), np.array(
p)
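# Numeric sketch (illustration only): one class, two detections sorted by
# confidence, the first a TP and the second an FP, against a single GT box.
def _example_ap_per_class():
    tp = [1, 0]
    conf = [0.9, 0.8]
    pred_cls = [0, 0]
    target_cls = [0]
    ap, classes, r, p = ap_per_class(tp, conf, pred_cls, target_cls)
    # recall -> 1.0, final precision -> 0.5, AP -> 1.0 (the PR envelope
    # reaches precision 1.0 before the FP is added)
    return ap, classes, r, p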
def compute_ap(recall, precision):
"""
Computes the average precision, given the recall and precision curves.
Code originally from https://github.com/rbgirshick/py-faster-rcnn.
Args:
recall (list): The recall curve.
precision (list): The precision curve.
Returns:
The average precision as computed in py-faster-rcnn.
"""
# correct AP calculation
# first append sentinel values at the end
mrec = np.concatenate(([0.], recall, [1.]))
mpre = np.concatenate(([0.], precision, [0.]))
# compute the precision envelope
for i in range(mpre.size - 1, 0, -1):
mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
# to calculate area under PR curve, look for points
# where X axis (recall) changes value
i = np.where(mrec[1:] != mrec[:-1])[0]
# and sum (\Delta recall) * prec
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
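# Tiny numeric sketch (illustration only): a detector that reaches recall 1.0
# with precision 1.0 everywhere yields AP = 1.0 under the envelope + area rule
# implemented above.
def _example_compute_ap():
    recall = [0.5, 1.0]
    precision = [1.0, 1.0]
    return compute_ap(recall, precision)  # -> 1.0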
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import hrnet
from . import lite_hrnet
from . import keypoint_hrnet
from . import loss
from .hrnet import *
from .keypoint_hrnet import *
from .loss import *
from .lite_hrnet import *
__all__ = hrnet.__all__ + keypoint_hrnet.__all__ \
+ loss.__all__
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn import AdaptiveAvgPool2D, Linear
from paddle.regularizer import L2Decay
from paddle import ParamAttr
from paddle.nn.initializer import Normal, Uniform
from collections import namedtuple
from numbers import Integral
import math
from lib.utils.workspace import register
__all__ = ['HRNet']
class ConvNormLayer(nn.Layer):
def __init__(self,
ch_in,
ch_out,
filter_size,
stride=1,
norm_type='bn',
norm_groups=32,
use_dcn=False,
norm_decay=0.,
freeze_norm=False,
act=None,
name=None):
super(ConvNormLayer, self).__init__()
assert norm_type in ['bn', 'sync_bn', 'gn']
self.act = act
self.conv = nn.Conv2D(
in_channels=ch_in,
out_channels=ch_out,
kernel_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=1,
weight_attr=ParamAttr(initializer=Normal(
mean=0., std=0.01)),
bias_attr=False)
norm_lr = 0. if freeze_norm else 1.
param_attr = ParamAttr(
learning_rate=norm_lr, regularizer=L2Decay(norm_decay))
bias_attr = ParamAttr(
learning_rate=norm_lr, regularizer=L2Decay(norm_decay))
global_stats = True if freeze_norm else False
if norm_type in ['bn', 'sync_bn']:
self.norm = nn.BatchNorm(
ch_out,
param_attr=param_attr,
bias_attr=bias_attr,
use_global_stats=global_stats)
elif norm_type == 'gn':
self.norm = nn.GroupNorm(
num_groups=norm_groups,
num_channels=ch_out,
weight_attr=param_attr,
bias_attr=bias_attr)
norm_params = self.norm.parameters()
if freeze_norm:
for param in norm_params:
param.stop_gradient = True
def forward(self, inputs):
out = self.conv(inputs)
out = self.norm(out)
if self.act == 'relu':
out = F.relu(out)
return out
class ShapeSpec(
namedtuple("_ShapeSpec", ["channels", "height", "width", "stride"])):
def __new__(cls, channels=None, height=None, width=None, stride=None):
return super(ShapeSpec, cls).__new__(cls, channels, height, width,
stride)
class Layer1(nn.Layer):
def __init__(self,
num_channels,
has_se=False,
norm_decay=0.,
freeze_norm=True,
name=None):
super(Layer1, self).__init__()
self.bottleneck_block_list = []
for i in range(4):
bottleneck_block = self.add_sublayer(
"block_{}_{}".format(name, i + 1),
BottleneckBlock(
num_channels=num_channels if i == 0 else 256,
num_filters=64,
has_se=has_se,
stride=1,
downsample=True if i == 0 else False,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name + '_' + str(i + 1)))
self.bottleneck_block_list.append(bottleneck_block)
def forward(self, input):
conv = input
for block_func in self.bottleneck_block_list:
conv = block_func(conv)
return conv
class TransitionLayer(nn.Layer):
def __init__(self,
in_channels,
out_channels,
norm_decay=0.,
freeze_norm=True,
name=None):
super(TransitionLayer, self).__init__()
num_in = len(in_channels)
num_out = len(out_channels)
out = []
self.conv_bn_func_list = []
for i in range(num_out):
residual = None
if i < num_in:
if in_channels[i] != out_channels[i]:
residual = self.add_sublayer(
"transition_{}_layer_{}".format(name, i + 1),
ConvNormLayer(
ch_in=in_channels[i],
ch_out=out_channels[i],
filter_size=3,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act='relu',
name=name + '_layer_' + str(i + 1)))
else:
residual = self.add_sublayer(
"transition_{}_layer_{}".format(name, i + 1),
ConvNormLayer(
ch_in=in_channels[-1],
ch_out=out_channels[i],
filter_size=3,
stride=2,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act='relu',
name=name + '_layer_' + str(i + 1)))
self.conv_bn_func_list.append(residual)
def forward(self, input):
outs = []
for idx, conv_bn_func in enumerate(self.conv_bn_func_list):
if conv_bn_func is None:
outs.append(input[idx])
else:
if idx < len(input):
outs.append(conv_bn_func(input[idx]))
else:
outs.append(conv_bn_func(input[-1]))
return outs
class Branches(nn.Layer):
def __init__(self,
block_num,
in_channels,
out_channels,
has_se=False,
norm_decay=0.,
freeze_norm=True,
name=None):
super(Branches, self).__init__()
self.basic_block_list = []
for i in range(len(out_channels)):
self.basic_block_list.append([])
for j in range(block_num):
in_ch = in_channels[i] if j == 0 else out_channels[i]
basic_block_func = self.add_sublayer(
"bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1),
BasicBlock(
num_channels=in_ch,
num_filters=out_channels[i],
has_se=has_se,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name + '_branch_layer_' + str(i + 1) + '_' +
str(j + 1)))
self.basic_block_list[i].append(basic_block_func)
def forward(self, inputs):
outs = []
for idx, input in enumerate(inputs):
conv = input
basic_block_list = self.basic_block_list[idx]
for basic_block_func in basic_block_list:
conv = basic_block_func(conv)
outs.append(conv)
return outs
class BottleneckBlock(nn.Layer):
def __init__(self,
num_channels,
num_filters,
has_se,
stride=1,
downsample=False,
norm_decay=0.,
freeze_norm=True,
name=None):
super(BottleneckBlock, self).__init__()
self.has_se = has_se
self.downsample = downsample
self.conv1 = ConvNormLayer(
ch_in=num_channels,
ch_out=num_filters,
filter_size=1,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act="relu",
name=name + "_conv1")
self.conv2 = ConvNormLayer(
ch_in=num_filters,
ch_out=num_filters,
filter_size=3,
stride=stride,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act="relu",
name=name + "_conv2")
self.conv3 = ConvNormLayer(
ch_in=num_filters,
ch_out=num_filters * 4,
filter_size=1,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act=None,
name=name + "_conv3")
if self.downsample:
self.conv_down = ConvNormLayer(
ch_in=num_channels,
ch_out=num_filters * 4,
filter_size=1,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act=None,
name=name + "_downsample")
if self.has_se:
self.se = SELayer(
num_channels=num_filters * 4,
num_filters=num_filters * 4,
reduction_ratio=16,
name='fc' + name)
def forward(self, input):
residual = input
conv1 = self.conv1(input)
conv2 = self.conv2(conv1)
conv3 = self.conv3(conv2)
if self.downsample:
residual = self.conv_down(input)
if self.has_se:
conv3 = self.se(conv3)
y = paddle.add(x=residual, y=conv3)
y = F.relu(y)
return y
class BasicBlock(nn.Layer):
def __init__(self,
num_channels,
num_filters,
stride=1,
has_se=False,
downsample=False,
norm_decay=0.,
freeze_norm=True,
name=None):
super(BasicBlock, self).__init__()
self.has_se = has_se
self.downsample = downsample
self.conv1 = ConvNormLayer(
ch_in=num_channels,
ch_out=num_filters,
filter_size=3,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
stride=stride,
act="relu",
name=name + "_conv1")
self.conv2 = ConvNormLayer(
ch_in=num_filters,
ch_out=num_filters,
filter_size=3,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
stride=1,
act=None,
name=name + "_conv2")
if self.downsample:
self.conv_down = ConvNormLayer(
ch_in=num_channels,
ch_out=num_filters * 4,
filter_size=1,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act=None,
name=name + "_downsample")
if self.has_se:
self.se = SELayer(
num_channels=num_filters,
num_filters=num_filters,
reduction_ratio=16,
name='fc' + name)
def forward(self, input):
residual = input
conv1 = self.conv1(input)
conv2 = self.conv2(conv1)
if self.downsample:
residual = self.conv_down(input)
if self.has_se:
conv2 = self.se(conv2)
y = paddle.add(x=residual, y=conv2)
y = F.relu(y)
return y
class SELayer(nn.Layer):
def __init__(self, num_channels, num_filters, reduction_ratio, name=None):
super(SELayer, self).__init__()
self.pool2d_gap = AdaptiveAvgPool2D(1)
self._num_channels = num_channels
med_ch = int(num_channels / reduction_ratio)
stdv = 1.0 / math.sqrt(num_channels * 1.0)
self.squeeze = Linear(
num_channels,
med_ch,
weight_attr=ParamAttr(initializer=Uniform(-stdv, stdv)))
stdv = 1.0 / math.sqrt(med_ch * 1.0)
self.excitation = Linear(
med_ch,
num_filters,
weight_attr=ParamAttr(initializer=Uniform(-stdv, stdv)))
def forward(self, input):
pool = self.pool2d_gap(input)
pool = paddle.squeeze(pool, axis=[2, 3])
squeeze = self.squeeze(pool)
squeeze = F.relu(squeeze)
excitation = self.excitation(squeeze)
excitation = F.sigmoid(excitation)
excitation = paddle.unsqueeze(excitation, axis=[2, 3])
out = input * excitation
return out
class Stage(nn.Layer):
def __init__(self,
num_channels,
num_modules,
num_filters,
has_se=False,
norm_decay=0.,
freeze_norm=True,
multi_scale_output=True,
name=None):
super(Stage, self).__init__()
self._num_modules = num_modules
self.stage_func_list = []
for i in range(num_modules):
if i == num_modules - 1 and not multi_scale_output:
stage_func = self.add_sublayer(
"stage_{}_{}".format(name, i + 1),
HighResolutionModule(
num_channels=num_channels,
num_filters=num_filters,
has_se=has_se,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
multi_scale_output=False,
name=name + '_' + str(i + 1)))
else:
stage_func = self.add_sublayer(
"stage_{}_{}".format(name, i + 1),
HighResolutionModule(
num_channels=num_channels,
num_filters=num_filters,
has_se=has_se,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name + '_' + str(i + 1)))
self.stage_func_list.append(stage_func)
def forward(self, input):
out = input
for idx in range(self._num_modules):
out = self.stage_func_list[idx](out)
return out
class HighResolutionModule(nn.Layer):
def __init__(self,
num_channels,
num_filters,
has_se=False,
multi_scale_output=True,
norm_decay=0.,
freeze_norm=True,
name=None):
super(HighResolutionModule, self).__init__()
self.branches_func = Branches(
block_num=4,
in_channels=num_channels,
out_channels=num_filters,
has_se=has_se,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name)
self.fuse_func = FuseLayers(
in_channels=num_filters,
out_channels=num_filters,
multi_scale_output=multi_scale_output,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name)
def forward(self, input):
out = self.branches_func(input)
out = self.fuse_func(out)
return out
class FuseLayers(nn.Layer):
def __init__(self,
in_channels,
out_channels,
multi_scale_output=True,
norm_decay=0.,
freeze_norm=True,
name=None):
super(FuseLayers, self).__init__()
self._actual_ch = len(in_channels) if multi_scale_output else 1
self._in_channels = in_channels
self.residual_func_list = []
for i in range(self._actual_ch):
for j in range(len(in_channels)):
residual_func = None
if j > i:
residual_func = self.add_sublayer(
"residual_{}_layer_{}_{}".format(name, i + 1, j + 1),
ConvNormLayer(
ch_in=in_channels[j],
ch_out=out_channels[i],
filter_size=1,
stride=1,
act=None,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name + '_layer_' + str(i + 1) + '_' +
str(j + 1)))
self.residual_func_list.append(residual_func)
elif j < i:
pre_num_filters = in_channels[j]
for k in range(i - j):
if k == i - j - 1:
residual_func = self.add_sublayer(
"residual_{}_layer_{}_{}_{}".format(
name, i + 1, j + 1, k + 1),
ConvNormLayer(
ch_in=pre_num_filters,
ch_out=out_channels[i],
filter_size=3,
stride=2,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act=None,
name=name + '_layer_' + str(i + 1) + '_' +
str(j + 1) + '_' + str(k + 1)))
pre_num_filters = out_channels[i]
else:
residual_func = self.add_sublayer(
"residual_{}_layer_{}_{}_{}".format(
name, i + 1, j + 1, k + 1),
ConvNormLayer(
ch_in=pre_num_filters,
ch_out=out_channels[j],
filter_size=3,
stride=2,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act="relu",
name=name + '_layer_' + str(i + 1) + '_' +
str(j + 1) + '_' + str(k + 1)))
pre_num_filters = out_channels[j]
self.residual_func_list.append(residual_func)
def forward(self, input):
outs = []
residual_func_idx = 0
for i in range(self._actual_ch):
residual = input[i]
for j in range(len(self._in_channels)):
if j > i:
y = self.residual_func_list[residual_func_idx](input[j])
residual_func_idx += 1
y = F.interpolate(y, scale_factor=2**(j - i))
residual = paddle.add(x=residual, y=y)
elif j < i:
y = input[j]
for k in range(i - j):
y = self.residual_func_list[residual_func_idx](y)
residual_func_idx += 1
residual = paddle.add(x=residual, y=y)
residual = F.relu(residual)
outs.append(residual)
return outs
@register
class HRNet(nn.Layer):
"""
HRNet, see https://arxiv.org/abs/1908.07919
Args:
width (int): the width of HRNet
has_se (bool): whether to add SE block for each stage
freeze_at (int): the stage to freeze
freeze_norm (bool): whether to freeze norm in HRNet
norm_decay (float): weight decay for normalization layer weights
return_idx (List): the stage to return
upsample (bool): whether to upsample and concat the backbone feats
"""
def __init__(self,
width=18,
has_se=False,
freeze_at=0,
freeze_norm=True,
norm_decay=0.,
return_idx=[0, 1, 2, 3],
upsample=False):
super(HRNet, self).__init__()
self.width = width
self.has_se = has_se
if isinstance(return_idx, Integral):
return_idx = [return_idx]
assert len(return_idx) > 0, "need one or more return index"
self.freeze_at = freeze_at
self.return_idx = return_idx
self.upsample = upsample
self.channels = {
18: [[18, 36], [18, 36, 72], [18, 36, 72, 144]],
30: [[30, 60], [30, 60, 120], [30, 60, 120, 240]],
32: [[32, 64], [32, 64, 128], [32, 64, 128, 256]],
40: [[40, 80], [40, 80, 160], [40, 80, 160, 320]],
44: [[44, 88], [44, 88, 176], [44, 88, 176, 352]],
48: [[48, 96], [48, 96, 192], [48, 96, 192, 384]],
60: [[60, 120], [60, 120, 240], [60, 120, 240, 480]],
64: [[64, 128], [64, 128, 256], [64, 128, 256, 512]]
}
channels_2, channels_3, channels_4 = self.channels[width]
num_modules_2, num_modules_3, num_modules_4 = 1, 4, 3
self._out_channels = [sum(channels_4)] if self.upsample else channels_4
self._out_strides = [4] if self.upsample else [4, 8, 16, 32]
self.conv_layer1_1 = ConvNormLayer(
ch_in=3,
ch_out=64,
filter_size=3,
stride=2,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act='relu',
name="layer1_1")
self.conv_layer1_2 = ConvNormLayer(
ch_in=64,
ch_out=64,
filter_size=3,
stride=2,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
act='relu',
name="layer1_2")
self.la1 = Layer1(
num_channels=64,
has_se=has_se,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="layer2")
self.tr1 = TransitionLayer(
in_channels=[256],
out_channels=channels_2,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="tr1")
self.st2 = Stage(
num_channels=channels_2,
num_modules=num_modules_2,
num_filters=channels_2,
has_se=self.has_se,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="st2")
self.tr2 = TransitionLayer(
in_channels=channels_2,
out_channels=channels_3,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="tr2")
self.st3 = Stage(
num_channels=channels_3,
num_modules=num_modules_3,
num_filters=channels_3,
has_se=self.has_se,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="st3")
self.tr3 = TransitionLayer(
in_channels=channels_3,
out_channels=channels_4,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="tr3")
self.st4 = Stage(
num_channels=channels_4,
num_modules=num_modules_4,
num_filters=channels_4,
has_se=self.has_se,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
multi_scale_output=len(return_idx) > 1,
name="st4")
def forward(self, inputs):
x = inputs['image']
conv1 = self.conv_layer1_1(x)
conv2 = self.conv_layer1_2(conv1)
la1 = self.la1(conv2)
tr1 = self.tr1([la1])
st2 = self.st2(tr1)
tr2 = self.tr2(st2)
st3 = self.st3(tr2)
tr3 = self.tr3(st3)
st4 = self.st4(tr3)
if self.upsample:
# Upsampling
x0_h, x0_w = st4[0].shape[2:4]
x1 = F.upsample(st4[1], size=(x0_h, x0_w), mode='bilinear')
x2 = F.upsample(st4[2], size=(x0_h, x0_w), mode='bilinear')
x3 = F.upsample(st4[3], size=(x0_h, x0_w), mode='bilinear')
x = paddle.concat([st4[0], x1, x2, x3], 1)
return x
res = []
for i, layer in enumerate(st4):
if i == self.freeze_at:
layer.stop_gradient = True
if i in self.return_idx:
res.append(layer)
return res
def out_shape(self):
if self.upsample:
self.return_idx = [0]
return [
ShapeSpec(
channels=self._out_channels[i], stride=self._out_strides[i])
for i in self.return_idx
]
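# Minimal construction sketch (illustration only, not part of the original
# file): HRNet-W32 as a keypoint backbone returning only the highest-resolution
# branch. The 256x192 input shape is an assumption matching the top-down pose
# pipeline.
def _example_hrnet_forward():
    net = HRNet(width=32, freeze_norm=False, return_idx=[0])
    x = paddle.rand([1, 3, 256, 192])
    feats = net({'image': x})
    return feats[0].shape  # [1, 32, 64, 48] (stride-4 feature map)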
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
import paddle.nn as nn
from paddle.nn.initializer import Normal, Constant
import numpy as np
import math
import cv2
from ..utils.keypoint_utils import transform_preds
from ..utils.workspace import register, create
__all__ = ['TopDownHRNet']
class BaseArch(nn.Layer):
def __init__(self):
super(BaseArch, self).__init__()
self.inputs = {}
self.fuse_norm = False
def load_meanstd(self, cfg_transform):
self.scale = 1.
self.mean = paddle.to_tensor([0.485, 0.456, 0.406]).reshape(
(1, 3, 1, 1))
self.std = paddle.to_tensor([0.229, 0.224, 0.225]).reshape(
(1, 3, 1, 1))
for item in cfg_transform:
if 'NormalizeImage' in item:
self.mean = paddle.to_tensor(item['NormalizeImage'][
'mean']).reshape((1, 3, 1, 1))
self.std = paddle.to_tensor(item['NormalizeImage'][
'std']).reshape((1, 3, 1, 1))
if item['NormalizeImage'].get('is_scale', True):
self.scale = 1. / 255.
break
def forward(self, inputs):
if self.fuse_norm:
image = inputs['image']
self.inputs['image'] = (image * self.scale - self.mean) / self.std
self.inputs['im_shape'] = inputs['im_shape']
self.inputs['scale_factor'] = inputs['scale_factor']
else:
self.inputs = inputs
self.model_arch()
if self.training:
out = self.get_loss()
else:
out = self.get_pred()
return out
def build_inputs(self, data, input_def):
inputs = {}
for i, k in enumerate(input_def):
inputs[k] = data[i]
return inputs
def model_arch(self, ):
pass
def get_loss(self, ):
raise NotImplementedError("Should implement get_loss method!")
def get_pred(self, ):
raise NotImplementedError("Should implement get_pred method!")
def Conv2d(in_channels,
out_channels,
kernel_size,
stride=1,
padding=0,
dilation=1,
groups=1,
bias=True,
weight_init=Normal(std=0.001),
bias_init=Constant(0.)):
weight_attr = paddle.framework.ParamAttr(initializer=weight_init)
if bias:
bias_attr = paddle.framework.ParamAttr(initializer=bias_init)
else:
bias_attr = False
conv = nn.Conv2D(
in_channels,
out_channels,
kernel_size,
stride,
padding,
dilation,
groups,
weight_attr=weight_attr,
bias_attr=bias_attr)
return conv
@register
class TopDownHRNet(BaseArch):
__category__ = 'architecture'
__inject__ = ['loss']
def __init__(self,
width,
num_joints,
backbone='HRNet',
loss='KeyPointMSELoss',
post_process='HRNetPostProcess',
flip_perm=None,
flip=True,
shift_heatmap=True,
use_dark=True):
"""
HRNet network, see https://arxiv.org/abs/1902.09212
Args:
backbone (nn.Layer): backbone instance
post_process (object): `HRNetPostProcess` instance
flip_perm (list): The left-right joints exchange order list
            use_dark (bool): Whether to use DARK in post processing
"""
super(TopDownHRNet, self).__init__()
self.backbone = backbone
self.post_process = HRNetPostProcess(use_dark)
self.loss = loss
self.flip_perm = flip_perm
self.flip = flip
self.final_conv = Conv2d(width, num_joints, 1, 1, 0, bias=True)
self.shift_heatmap = shift_heatmap
self.deploy = False
@classmethod
def from_config(cls, cfg, *args, **kwargs):
# backbone
backbone = create(cfg['backbone'])
return {'backbone': backbone, }
def _forward(self):
output = dict()
feats = self.backbone(self.inputs)
output["feats"] = feats
hrnet_outputs = self.final_conv(feats[0])
output["output"] = hrnet_outputs
if self.training:
loss = self.loss(hrnet_outputs, self.inputs)
output["loss"] = loss
return output
elif self.deploy:
outshape = hrnet_outputs.shape
max_idx = paddle.argmax(
hrnet_outputs.reshape(
(outshape[0], outshape[1], outshape[2] * outshape[3])),
axis=-1)
return hrnet_outputs, max_idx
else:
if self.flip:
self.inputs['image'] = self.inputs['image'].flip([3])
feats = self.backbone(self.inputs)
output_flipped = self.final_conv(feats[0])
output_flipped = self.flip_back(output_flipped.numpy(),
self.flip_perm)
output_flipped = paddle.to_tensor(output_flipped.copy())
if self.shift_heatmap:
output_flipped[:, :, :, 1:] = output_flipped.clone(
)[:, :, :, 0:-1]
hrnet_outputs = (hrnet_outputs + output_flipped) * 0.5
imshape = (self.inputs['im_shape'].numpy()
)[:, ::-1] if 'im_shape' in self.inputs else None
center = self.inputs['center'].numpy(
) if 'center' in self.inputs else np.round(imshape / 2.)
scale = self.inputs['scale'].numpy(
) if 'scale' in self.inputs else imshape / 200.
outputs = self.post_process(hrnet_outputs, center, scale)
return outputs
def get_loss(self):
return self._forward()
def get_pred(self):
res_lst = self._forward()
outputs = {'keypoint': res_lst}
return outputs
def flip_back(self, output_flipped, matched_parts):
assert output_flipped.ndim == 4,\
'output_flipped should be [batch_size, num_joints, height, width]'
output_flipped = output_flipped[:, :, :, ::-1]
for pair in matched_parts:
tmp = output_flipped[:, pair[0], :, :].copy()
output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :]
output_flipped[:, pair[1], :, :] = tmp
return output_flipped
class HRNetPostProcess(object):
def __init__(self, use_dark=True):
self.use_dark = use_dark
def get_max_preds(self, heatmaps):
'''get predictions from score maps
Args:
heatmaps: numpy.ndarray([batch_size, num_joints, height, width])
Returns:
preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
maxvals: numpy.ndarray([batch_size, num_joints, 2]), the maximum confidence of the keypoints
'''
assert isinstance(heatmaps,
np.ndarray), 'heatmaps should be numpy.ndarray'
assert heatmaps.ndim == 4, 'batch_images should be 4-ndim'
batch_size = heatmaps.shape[0]
num_joints = heatmaps.shape[1]
width = heatmaps.shape[3]
heatmaps_reshaped = heatmaps.reshape((batch_size, num_joints, -1))
idx = np.argmax(heatmaps_reshaped, 2)
maxvals = np.amax(heatmaps_reshaped, 2)
maxvals = maxvals.reshape((batch_size, num_joints, 1))
idx = idx.reshape((batch_size, num_joints, 1))
preds = np.tile(idx, (1, 1, 2)).astype(np.float32)
preds[:, :, 0] = (preds[:, :, 0]) % width
preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)
pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
pred_mask = pred_mask.astype(np.float32)
preds *= pred_mask
return preds, maxvals
def gaussian_blur(self, heatmap, kernel):
border = (kernel - 1) // 2
batch_size = heatmap.shape[0]
num_joints = heatmap.shape[1]
height = heatmap.shape[2]
width = heatmap.shape[3]
for i in range(batch_size):
for j in range(num_joints):
origin_max = np.max(heatmap[i, j])
dr = np.zeros((height + 2 * border, width + 2 * border))
dr[border:-border, border:-border] = heatmap[i, j].copy()
dr = cv2.GaussianBlur(dr, (kernel, kernel), 0)
heatmap[i, j] = dr[border:-border, border:-border].copy()
heatmap[i, j] *= origin_max / np.max(heatmap[i, j])
return heatmap
def dark_parse(self, hm, coord):
heatmap_height = hm.shape[0]
heatmap_width = hm.shape[1]
px = int(coord[0])
py = int(coord[1])
if 1 < px < heatmap_width - 2 and 1 < py < heatmap_height - 2:
dx = 0.5 * (hm[py][px + 1] - hm[py][px - 1])
dy = 0.5 * (hm[py + 1][px] - hm[py - 1][px])
dxx = 0.25 * (hm[py][px + 2] - 2 * hm[py][px] + hm[py][px - 2])
dxy = 0.25 * (hm[py+1][px+1] - hm[py-1][px+1] - hm[py+1][px-1] \
+ hm[py-1][px-1])
dyy = 0.25 * (
hm[py + 2 * 1][px] - 2 * hm[py][px] + hm[py - 2 * 1][px])
derivative = np.matrix([[dx], [dy]])
hessian = np.matrix([[dxx, dxy], [dxy, dyy]])
if dxx * dyy - dxy**2 != 0:
hessianinv = hessian.I
offset = -hessianinv * derivative
offset = np.squeeze(np.array(offset.T), axis=0)
coord += offset
return coord
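    # The offset computed above is one Newton step on the (log-)heatmap:
    # with gradient D = [dx, dy] and Hessian H = [[dxx, dxy], [dxy, dyy]]
    # estimated by finite differences, the refined coordinate is
    # coord - H^{-1} D, following the DARK Taylor-expansion derivation.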
def dark_postprocess(self, hm, coords, kernelsize):
        '''DARK postprocessing, from Zhang et al., "Distribution-Aware
        Coordinate Representation for Human Pose Estimation" (CVPR 2020).
        '''
hm = self.gaussian_blur(hm, kernelsize)
hm = np.maximum(hm, 1e-10)
hm = np.log(hm)
for n in range(coords.shape[0]):
for p in range(coords.shape[1]):
coords[n, p] = self.dark_parse(hm[n][p], coords[n][p])
return coords
def get_final_preds(self, heatmaps, center, scale, kernelsize=3):
"""the highest heatvalue location with a quarter offset in the
direction from the highest response to the second highest response.
Args:
heatmaps (numpy.ndarray): The predicted heatmaps
center (numpy.ndarray): The boxes center
scale (numpy.ndarray): The scale factor
Returns:
preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints
"""
coords, maxvals = self.get_max_preds(heatmaps)
heatmap_height = heatmaps.shape[2]
heatmap_width = heatmaps.shape[3]
if self.use_dark:
coords = self.dark_postprocess(heatmaps, coords, kernelsize)
else:
for n in range(coords.shape[0]):
for p in range(coords.shape[1]):
hm = heatmaps[n][p]
px = int(math.floor(coords[n][p][0] + 0.5))
py = int(math.floor(coords[n][p][1] + 0.5))
if 1 < px < heatmap_width - 1 and 1 < py < heatmap_height - 1:
diff = np.array([
hm[py][px + 1] - hm[py][px - 1],
hm[py + 1][px] - hm[py - 1][px]
])
coords[n][p] += np.sign(diff) * .25
preds = coords.copy()
# Transform back
for i in range(coords.shape[0]):
preds[i] = transform_preds(coords[i], center[i], scale[i],
[heatmap_width, heatmap_height])
return preds, maxvals
def __call__(self, output, center, scale):
preds, maxvals = self.get_final_preds(output.numpy(), center, scale)
outputs = [[
np.concatenate(
(preds, maxvals), axis=-1), np.mean(
maxvals, axis=1)
]]
return outputs
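# Illustrative sketch (not part of the original file): get_max_preds picks
# the argmax location per joint heatmap; a single hot pixel at (x=3, y=2)
# comes back as coordinates [3., 2.] with its confidence value.
def _example_get_max_preds():
    heatmaps = np.zeros((1, 1, 4, 5), dtype=np.float32)
    heatmaps[0, 0, 2, 3] = 1.0
    post = HRNetPostProcess(use_dark=False)
    preds, maxvals = post.get_max_preds(heatmaps)
    return preds, maxvals  # preds[0, 0] == [3., 2.], maxvals[0, 0] == [1.]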
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from numbers import Integral
from paddle import ParamAttr
from paddle.regularizer import L2Decay
from paddle.nn.initializer import Normal, Constant
from lib.utils.workspace import register
from .hrnet import ShapeSpec
__all__ = ['LiteHRNet']
def channel_shuffle(x, groups):
batch_size, num_channels, height, width = x.shape[0:4]
assert num_channels % groups == 0, 'num_channels should be divisible by groups'
channels_per_group = num_channels // groups
x = paddle.reshape(
x=x, shape=[batch_size, groups, channels_per_group, height, width])
x = paddle.transpose(x=x, perm=[0, 2, 1, 3, 4])
x = paddle.reshape(x=x, shape=[batch_size, num_channels, height, width])
return x
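# Shape-level sketch (illustration only): shuffling with groups=2 keeps the
# tensor shape and interleaves the two channel halves, so information can
# flow across branches after a grouped convolution.
def _example_channel_shuffle():
    x = paddle.arange(8, dtype='float32').reshape([1, 8, 1, 1])
    y = channel_shuffle(x, groups=2)
    # channel order becomes [0, 4, 1, 5, 2, 6, 3, 7]
    return y.reshape([8])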
class ConvNormLayer(nn.Layer):
def __init__(self,
ch_in,
ch_out,
filter_size,
stride=1,
groups=1,
norm_type=None,
norm_groups=32,
norm_decay=0.,
freeze_norm=False,
act=None):
super(ConvNormLayer, self).__init__()
self.act = act
norm_lr = 0. if freeze_norm else 1.
if norm_type is not None:
assert norm_type in ['bn', 'sync_bn', 'gn'],\
"norm_type should be one of ['bn', 'sync_bn', 'gn'], but got {}".format(norm_type)
param_attr = ParamAttr(
initializer=Constant(1.0),
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay), )
bias_attr = ParamAttr(
learning_rate=norm_lr, regularizer=L2Decay(norm_decay))
global_stats = True if freeze_norm else False
if norm_type in ['bn', 'sync_bn']:
self.norm = nn.BatchNorm(
ch_out,
param_attr=param_attr,
bias_attr=bias_attr,
use_global_stats=global_stats, )
elif norm_type == 'gn':
self.norm = nn.GroupNorm(
num_groups=norm_groups,
num_channels=ch_out,
weight_attr=param_attr,
bias_attr=bias_attr)
norm_params = self.norm.parameters()
if freeze_norm:
for param in norm_params:
param.stop_gradient = True
conv_bias_attr = False
else:
conv_bias_attr = True
self.norm = None
self.conv = nn.Conv2D(
in_channels=ch_in,
out_channels=ch_out,
kernel_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups,
weight_attr=ParamAttr(initializer=Normal(
mean=0., std=0.001)),
bias_attr=conv_bias_attr)
def forward(self, inputs):
out = self.conv(inputs)
if self.norm is not None:
out = self.norm(out)
if self.act == 'relu':
out = F.relu(out)
elif self.act == 'sigmoid':
out = F.sigmoid(out)
return out
class DepthWiseSeparableConvNormLayer(nn.Layer):
def __init__(self,
ch_in,
ch_out,
filter_size,
stride=1,
dw_norm_type=None,
pw_norm_type=None,
norm_decay=0.,
freeze_norm=False,
dw_act=None,
pw_act=None):
super(DepthWiseSeparableConvNormLayer, self).__init__()
self.depthwise_conv = ConvNormLayer(
ch_in=ch_in,
ch_out=ch_in,
filter_size=filter_size,
stride=stride,
groups=ch_in,
norm_type=dw_norm_type,
act=dw_act,
norm_decay=norm_decay,
freeze_norm=freeze_norm, )
self.pointwise_conv = ConvNormLayer(
ch_in=ch_in,
ch_out=ch_out,
filter_size=1,
stride=1,
norm_type=pw_norm_type,
act=pw_act,
norm_decay=norm_decay,
freeze_norm=freeze_norm, )
def forward(self, x):
x = self.depthwise_conv(x)
x = self.pointwise_conv(x)
return x
class CrossResolutionWeightingModule(nn.Layer):
def __init__(self,
channels,
ratio=16,
norm_type='bn',
freeze_norm=False,
norm_decay=0.):
super(CrossResolutionWeightingModule, self).__init__()
self.channels = channels
total_channel = sum(channels)
self.conv1 = ConvNormLayer(
ch_in=total_channel,
ch_out=total_channel // ratio,
filter_size=1,
stride=1,
norm_type=norm_type,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay)
self.conv2 = ConvNormLayer(
ch_in=total_channel // ratio,
ch_out=total_channel,
filter_size=1,
stride=1,
norm_type=norm_type,
act='sigmoid',
freeze_norm=freeze_norm,
norm_decay=norm_decay)
def forward(self, x):
mini_size = x[-1].shape[-2:]
out = [F.adaptive_avg_pool2d(s, mini_size) for s in x[:-1]] + [x[-1]]
out = paddle.concat(out, 1)
out = self.conv1(out)
out = self.conv2(out)
out = paddle.split(out, self.channels, 1)
out = [
s * F.interpolate(
a, s.shape[-2:], mode='nearest') for s, a in zip(x, out)
]
return out
class SpatialWeightingModule(nn.Layer):
def __init__(self, in_channel, ratio=16, freeze_norm=False, norm_decay=0.):
super(SpatialWeightingModule, self).__init__()
self.global_avgpooling = nn.AdaptiveAvgPool2D(1)
self.conv1 = ConvNormLayer(
ch_in=in_channel,
ch_out=in_channel // ratio,
filter_size=1,
stride=1,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay)
self.conv2 = ConvNormLayer(
ch_in=in_channel // ratio,
ch_out=in_channel,
filter_size=1,
stride=1,
act='sigmoid',
freeze_norm=freeze_norm,
norm_decay=norm_decay)
def forward(self, x):
out = self.global_avgpooling(x)
out = self.conv1(out)
out = self.conv2(out)
return x * out
class ConditionalChannelWeightingBlock(nn.Layer):
def __init__(self,
in_channels,
stride,
reduce_ratio,
norm_type='bn',
freeze_norm=False,
norm_decay=0.):
super(ConditionalChannelWeightingBlock, self).__init__()
assert stride in [1, 2]
branch_channels = [channel // 2 for channel in in_channels]
self.cross_resolution_weighting = CrossResolutionWeightingModule(
branch_channels,
ratio=reduce_ratio,
norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay)
self.depthwise_convs = nn.LayerList([
ConvNormLayer(
channel,
channel,
filter_size=3,
stride=stride,
groups=channel,
norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay) for channel in branch_channels
])
self.spatial_weighting = nn.LayerList([
SpatialWeightingModule(
channel,
ratio=4,
freeze_norm=freeze_norm,
norm_decay=norm_decay) for channel in branch_channels
])
def forward(self, x):
x = [s.chunk(2, axis=1) for s in x]
x1 = [s[0] for s in x]
x2 = [s[1] for s in x]
x2 = self.cross_resolution_weighting(x2)
x2 = [dw(s) for s, dw in zip(x2, self.depthwise_convs)]
x2 = [sw(s) for s, sw in zip(x2, self.spatial_weighting)]
out = [paddle.concat([s1, s2], axis=1) for s1, s2 in zip(x1, x2)]
out = [channel_shuffle(s, groups=2) for s in out]
return out
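# Shape sketch (illustration only, not part of the original file): the
# conditional channel weighting block operates on a list of multi-resolution
# tensors and preserves their shapes when stride=1; the channel counts below
# are arbitrary example values.
def _example_ccw_block():
    block = ConditionalChannelWeightingBlock(
        in_channels=[40, 80], stride=1, reduce_ratio=8)
    x = [paddle.rand([1, 40, 32, 24]), paddle.rand([1, 80, 16, 12])]
    y = block(x)
    return [t.shape for t in y]  # [[1, 40, 32, 24], [1, 80, 16, 12]]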
class ShuffleUnit(nn.Layer):
def __init__(self,
in_channel,
out_channel,
stride,
norm_type='bn',
freeze_norm=False,
norm_decay=0.):
super(ShuffleUnit, self).__init__()
branch_channel = out_channel // 2
self.stride = stride
if self.stride == 1:
assert in_channel == branch_channel * 2,\
"when stride=1, in_channel {} should equal to branch_channel*2 {}".format(in_channel, branch_channel * 2)
if stride > 1:
self.branch1 = nn.Sequential(
ConvNormLayer(
ch_in=in_channel,
ch_out=in_channel,
filter_size=3,
stride=self.stride,
groups=in_channel,
norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay),
ConvNormLayer(
ch_in=in_channel,
ch_out=branch_channel,
filter_size=1,
stride=1,
norm_type=norm_type,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay), )
self.branch2 = nn.Sequential(
ConvNormLayer(
ch_in=branch_channel if stride == 1 else in_channel,
ch_out=branch_channel,
filter_size=1,
stride=1,
norm_type=norm_type,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay),
ConvNormLayer(
ch_in=branch_channel,
ch_out=branch_channel,
filter_size=3,
stride=self.stride,
groups=branch_channel,
norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay),
ConvNormLayer(
ch_in=branch_channel,
ch_out=branch_channel,
filter_size=1,
stride=1,
norm_type=norm_type,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay), )
def forward(self, x):
if self.stride > 1:
x1 = self.branch1(x)
x2 = self.branch2(x)
else:
x1, x2 = x.chunk(2, axis=1)
x2 = self.branch2(x2)
out = paddle.concat([x1, x2], axis=1)
out = channel_shuffle(out, groups=2)
return out
class IterativeHead(nn.Layer):
def __init__(self,
in_channels,
norm_type='bn',
freeze_norm=False,
norm_decay=0.):
super(IterativeHead, self).__init__()
num_branches = len(in_channels)
self.in_channels = in_channels[::-1]
projects = []
for i in range(num_branches):
if i != num_branches - 1:
projects.append(
DepthWiseSeparableConvNormLayer(
ch_in=self.in_channels[i],
ch_out=self.in_channels[i + 1],
filter_size=3,
stride=1,
dw_act=None,
pw_act='relu',
dw_norm_type=norm_type,
pw_norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay))
else:
projects.append(
DepthWiseSeparableConvNormLayer(
ch_in=self.in_channels[i],
ch_out=self.in_channels[i],
filter_size=3,
stride=1,
dw_act=None,
pw_act='relu',
dw_norm_type=norm_type,
pw_norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay))
self.projects = nn.LayerList(projects)
def forward(self, x):
x = x[::-1]
y = []
last_x = None
for i, s in enumerate(x):
if last_x is not None:
last_x = F.interpolate(
last_x,
size=s.shape[-2:],
mode='bilinear',
align_corners=True)
s = s + last_x
s = self.projects[i](s)
y.append(s)
last_x = s
return y[::-1]
class Stem(nn.Layer):
def __init__(self,
in_channel,
stem_channel,
out_channel,
expand_ratio,
norm_type='bn',
freeze_norm=False,
norm_decay=0.):
super(Stem, self).__init__()
self.conv1 = ConvNormLayer(
in_channel,
stem_channel,
filter_size=3,
stride=2,
norm_type=norm_type,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay)
mid_channel = int(round(stem_channel * expand_ratio))
branch_channel = stem_channel // 2
if stem_channel == out_channel:
inc_channel = out_channel - branch_channel
else:
inc_channel = out_channel - stem_channel
self.branch1 = nn.Sequential(
ConvNormLayer(
ch_in=branch_channel,
ch_out=branch_channel,
filter_size=3,
stride=2,
groups=branch_channel,
norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay),
ConvNormLayer(
ch_in=branch_channel,
ch_out=inc_channel,
filter_size=1,
stride=1,
norm_type=norm_type,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay), )
self.expand_conv = ConvNormLayer(
ch_in=branch_channel,
ch_out=mid_channel,
filter_size=1,
stride=1,
norm_type=norm_type,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay)
self.depthwise_conv = ConvNormLayer(
ch_in=mid_channel,
ch_out=mid_channel,
filter_size=3,
stride=2,
groups=mid_channel,
norm_type=norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay)
self.linear_conv = ConvNormLayer(
ch_in=mid_channel,
ch_out=branch_channel
if stem_channel == out_channel else stem_channel,
filter_size=1,
stride=1,
norm_type=norm_type,
act='relu',
freeze_norm=freeze_norm,
norm_decay=norm_decay)
def forward(self, x):
x = self.conv1(x)
x1, x2 = x.chunk(2, axis=1)
x1 = self.branch1(x1)
x2 = self.expand_conv(x2)
x2 = self.depthwise_conv(x2)
x2 = self.linear_conv(x2)
out = paddle.concat([x1, x2], axis=1)
out = channel_shuffle(out, groups=2)
return out
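# Shape sketch (illustration only, not part of the original file): the
# Lite-HRNet stem downsamples by 4x overall (two stride-2 convolutions);
# the channel settings below are arbitrary example values.
def _example_stem():
    stem = Stem(in_channel=3, stem_channel=32, out_channel=32, expand_ratio=1)
    y = stem(paddle.rand([1, 3, 64, 64]))
    return y.shape  # [1, 32, 16, 16]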
class LiteHRNetModule(nn.Layer):
def __init__(self,
num_branches,
num_blocks,
in_channels,
reduce_ratio,
module_type,
multiscale_output=False,
with_fuse=True,
norm_type='bn',
freeze_norm=False,
norm_decay=0.):
super(LiteHRNetModule, self).__init__()
assert num_branches == len(in_channels),\
"num_branches {} should equal to num_in_channels {}".format(num_branches, len(in_channels))
assert module_type in ['LITE', 'NAIVE'],\
"module_type should be one of ['LITE', 'NAIVE']"
self.num_branches = num_branches
self.in_channels = in_channels
self.multiscale_output = multiscale_output
self.with_fuse = with_fuse
        # note: the norm type is fixed to 'bn' here regardless of the
        # `norm_type` argument
        self.norm_type = 'bn'
self.module_type = module_type
if self.module_type == 'LITE':
self.layers = self._make_weighting_blocks(
num_blocks,
reduce_ratio,
freeze_norm=freeze_norm,
norm_decay=norm_decay)
elif self.module_type == 'NAIVE':
self.layers = self._make_naive_branches(
num_branches,
num_blocks,
freeze_norm=freeze_norm,
norm_decay=norm_decay)
if self.with_fuse:
self.fuse_layers = self._make_fuse_layers(
freeze_norm=freeze_norm, norm_decay=norm_decay)
self.relu = nn.ReLU()
def _make_weighting_blocks(self,
num_blocks,
reduce_ratio,
stride=1,
freeze_norm=False,
norm_decay=0.):
layers = []
for i in range(num_blocks):
layers.append(
ConditionalChannelWeightingBlock(
self.in_channels,
stride=stride,
reduce_ratio=reduce_ratio,
norm_type=self.norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay))
return nn.Sequential(*layers)
def _make_naive_branches(self,
num_branches,
num_blocks,
freeze_norm=False,
norm_decay=0.):
branches = []
for branch_idx in range(num_branches):
layers = []
for i in range(num_blocks):
layers.append(
ShuffleUnit(
self.in_channels[branch_idx],
self.in_channels[branch_idx],
stride=1,
norm_type=self.norm_type,
freeze_norm=freeze_norm,
norm_decay=norm_decay))
branches.append(nn.Sequential(*layers))
return nn.LayerList(branches)
def _make_fuse_layers(self, freeze_norm=False, norm_decay=0.):
if self.num_branches == 1:
return None
fuse_layers = []
num_out_branches = self.num_branches if self.multiscale_output else 1
for i in range(num_out_branches):
fuse_layer = []
for j in range(self.num_branches):
if j > i:
fuse_layer.append(
nn.Sequential(
nn.Conv2D(
self.in_channels[j],
self.in_channels[i],
kernel_size=1,
stride=1,
padding=0,
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(self.in_channels[i]),
nn.Upsample(
scale_factor=2**(j - i), mode='nearest')))
elif j == i:
fuse_layer.append(None)
else:
conv_downsamples = []
for k in range(i - j):
if k == i - j - 1:
conv_downsamples.append(
nn.Sequential(
nn.Conv2D(
self.in_channels[j],
self.in_channels[j],
kernel_size=3,
stride=2,
padding=1,
groups=self.in_channels[j],
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(self.in_channels[j]),
nn.Conv2D(
self.in_channels[j],
self.in_channels[i],
kernel_size=1,
stride=1,
padding=0,
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(self.in_channels[i])))
else:
conv_downsamples.append(
nn.Sequential(
nn.Conv2D(
self.in_channels[j],
self.in_channels[j],
kernel_size=3,
stride=2,
padding=1,
groups=self.in_channels[j],
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(self.in_channels[j]),
nn.Conv2D(
self.in_channels[j],
self.in_channels[j],
kernel_size=1,
stride=1,
padding=0,
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(self.in_channels[j]),
nn.ReLU()))
fuse_layer.append(nn.Sequential(*conv_downsamples))
fuse_layers.append(nn.LayerList(fuse_layer))
return nn.LayerList(fuse_layers)
def forward(self, x):
if self.num_branches == 1:
return [self.layers[0](x[0])]
if self.module_type == 'LITE':
out = self.layers(x)
elif self.module_type == 'NAIVE':
for i in range(self.num_branches):
x[i] = self.layers[i](x[i])
out = x
if self.with_fuse:
out_fuse = []
for i in range(len(self.fuse_layers)):
                # start from branch 0, fused to branch i's resolution when i > 0
                y = out[0] if i == 0 else self.fuse_layers[i][0](out[0])
                for j in range(self.num_branches):
                    if j == 0:
                        # note: branch 0 is accumulated twice here (y += y),
                        # following the reference implementation
                        y += y
elif i == j:
y += out[j]
else:
y += self.fuse_layers[i][j](out[j])
if i == 0:
out[i] = y
out_fuse.append(self.relu(y))
out = out_fuse
elif not self.multiscale_output:
out = [out[0]]
return out
@register
class LiteHRNet(nn.Layer):
"""
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},year={2021}
}
Args:
network_type (str): the network_type should be one of ["lite_18", "lite_30", "naive", "wider_naive"],
"naive": Simply combining the shuffle block in ShuffleNet and the highresolution design pattern in HRNet.
"wider_naive": Naive network with wider channels in each block.
"lite_18": Lite-HRNet-18, which replaces the pointwise convolution in a shuffle block by conditional channel weighting.
"lite_30": Lite-HRNet-30, with more blocks compared with Lite-HRNet-18.
freeze_at (int): the stage to freeze
freeze_norm (bool): whether to freeze norm in HRNet
norm_decay (float): weight decay for normalization layer weights
return_idx (List): the stage to return
"""
def __init__(self,
network_type,
freeze_at=0,
freeze_norm=True,
norm_decay=0.,
return_idx=[0, 1, 2, 3]):
super(LiteHRNet, self).__init__()
if isinstance(return_idx, Integral):
return_idx = [return_idx]
assert network_type in ["lite_18", "lite_30", "naive", "wider_naive"],\
"the network_type should be one of [lite_18, lite_30, naive, wider_naive]"
assert len(return_idx) > 0, "need one or more return index"
self.freeze_at = freeze_at
self.freeze_norm = freeze_norm
self.norm_decay = norm_decay
self.return_idx = return_idx
self.norm_type = 'bn'
self.module_configs = {
"lite_18": {
"num_modules": [2, 4, 2],
"num_branches": [2, 3, 4],
"num_blocks": [2, 2, 2],
"module_type": ["LITE", "LITE", "LITE"],
"reduce_ratios": [8, 8, 8],
"num_channels": [[40, 80], [40, 80, 160], [40, 80, 160, 320]],
},
"lite_30": {
"num_modules": [3, 8, 3],
"num_branches": [2, 3, 4],
"num_blocks": [2, 2, 2],
"module_type": ["LITE", "LITE", "LITE"],
"reduce_ratios": [8, 8, 8],
"num_channels": [[40, 80], [40, 80, 160], [40, 80, 160, 320]],
},
"naive": {
"num_modules": [2, 4, 2],
"num_branches": [2, 3, 4],
"num_blocks": [2, 2, 2],
"module_type": ["NAIVE", "NAIVE", "NAIVE"],
"reduce_ratios": [1, 1, 1],
"num_channels": [[30, 60], [30, 60, 120], [30, 60, 120, 240]],
},
"wider_naive": {
"num_modules": [2, 4, 2],
"num_branches": [2, 3, 4],
"num_blocks": [2, 2, 2],
"module_type": ["NAIVE", "NAIVE", "NAIVE"],
"reduce_ratios": [1, 1, 1],
"num_channels": [[40, 80], [40, 80, 160], [40, 80, 160, 320]],
},
}
self.stages_config = self.module_configs[network_type]
self.stem = Stem(3, 32, 32, 1)
num_channels_pre_layer = [32]
for stage_idx in range(3):
num_channels = self.stages_config["num_channels"][stage_idx]
setattr(self, 'transition{}'.format(stage_idx),
self._make_transition_layer(num_channels_pre_layer,
num_channels, self.freeze_norm,
self.norm_decay))
stage, num_channels_pre_layer = self._make_stage(
self.stages_config, stage_idx, num_channels, True,
self.freeze_norm, self.norm_decay)
setattr(self, 'stage{}'.format(stage_idx), stage)
self.head_layer = IterativeHead(num_channels_pre_layer, 'bn',
self.freeze_norm, self.norm_decay)
def _make_transition_layer(self,
num_channels_pre_layer,
num_channels_cur_layer,
freeze_norm=False,
norm_decay=0.):
num_branches_pre = len(num_channels_pre_layer)
num_branches_cur = len(num_channels_cur_layer)
transition_layers = []
for i in range(num_branches_cur):
if i < num_branches_pre:
if num_channels_cur_layer[i] != num_channels_pre_layer[i]:
transition_layers.append(
nn.Sequential(
nn.Conv2D(
num_channels_pre_layer[i],
num_channels_pre_layer[i],
kernel_size=3,
stride=1,
padding=1,
groups=num_channels_pre_layer[i],
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(num_channels_pre_layer[i]),
nn.Conv2D(
num_channels_pre_layer[i],
num_channels_cur_layer[i],
kernel_size=1,
stride=1,
padding=0,
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(num_channels_cur_layer[i]),
nn.ReLU()))
else:
transition_layers.append(None)
else:
conv_downsamples = []
for j in range(i + 1 - num_branches_pre):
conv_downsamples.append(
nn.Sequential(
nn.Conv2D(
num_channels_pre_layer[-1],
num_channels_pre_layer[-1],
groups=num_channels_pre_layer[-1],
kernel_size=3,
stride=2,
padding=1,
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(num_channels_pre_layer[-1]),
nn.Conv2D(
num_channels_pre_layer[-1],
num_channels_cur_layer[i]
if j == i - num_branches_pre else
num_channels_pre_layer[-1],
kernel_size=1,
stride=1,
padding=0,
weight_attr=paddle.ParamAttr(
initializer=Normal(std=0.001)),
bias_attr=False),
nn.BatchNorm(num_channels_cur_layer[i]
if j == i - num_branches_pre else
num_channels_pre_layer[-1]),
nn.ReLU()))
transition_layers.append(nn.Sequential(*conv_downsamples))
return nn.LayerList(transition_layers)
def _make_stage(self,
stages_config,
stage_idx,
in_channels,
multiscale_output,
freeze_norm=False,
norm_decay=0.):
num_modules = stages_config["num_modules"][stage_idx]
num_branches = stages_config["num_branches"][stage_idx]
num_blocks = stages_config["num_blocks"][stage_idx]
reduce_ratio = stages_config['reduce_ratios'][stage_idx]
module_type = stages_config['module_type'][stage_idx]
modules = []
for i in range(num_modules):
if not multiscale_output and i == num_modules - 1:
reset_multiscale_output = False
else:
reset_multiscale_output = True
modules.append(
LiteHRNetModule(
num_branches,
num_blocks,
in_channels,
reduce_ratio,
module_type,
multiscale_output=reset_multiscale_output,
with_fuse=True,
freeze_norm=freeze_norm,
norm_decay=norm_decay))
in_channels = modules[-1].in_channels
return nn.Sequential(*modules), in_channels
def forward(self, inputs):
x = inputs['image']
x = self.stem(x)
y_list = [x]
for stage_idx in range(3):
x_list = []
transition = getattr(self, 'transition{}'.format(stage_idx))
for j in range(self.stages_config["num_branches"][stage_idx]):
if transition[j] is not None:
if j >= len(y_list):
x_list.append(transition[j](y_list[-1]))
else:
x_list.append(transition[j](y_list[j]))
else:
x_list.append(y_list[j])
y_list = getattr(self, 'stage{}'.format(stage_idx))(x_list)
x = self.head_layer(y_list)
res = []
for i, layer in enumerate(x):
if i == self.freeze_at:
layer.stop_gradient = True
if i in self.return_idx:
res.append(layer)
return res
def out_shape(self):
return [
ShapeSpec(
channels=self._out_channels[i], stride=self._out_strides[i])
for i in self.return_idx
]
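# ---------------------------------------------------------------------------
# Usage sketch (illustrative only, not part of the library): build a
# Lite-HRNet backbone and run a forward pass on a random image. The backbone
# expects a dict with an 'image' key, as in forward() above; the shapes and
# arguments below are arbitrary examples.
# ---------------------------------------------------------------------------
if __name__ == '__main__':
    model = LiteHRNet(network_type='lite_18', freeze_at=0, return_idx=[0])
    feats = model({'image': paddle.rand([1, 3, 256, 192])})
    print([f.shape for f in feats])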
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from itertools import cycle, islice
from collections import abc
import paddle
import paddle.nn as nn
from lib.utils.workspace import register, serializable
__all__ = ['KeyPointMSELoss']
@register
@serializable
class KeyPointMSELoss(nn.Layer):
def __init__(self, use_target_weight=True, loss_scale=0.5):
"""
KeyPointMSELoss layer
Args:
use_target_weight (bool): whether to use target weight
"""
super(KeyPointMSELoss, self).__init__()
self.criterion = nn.MSELoss(reduction='mean')
self.use_target_weight = use_target_weight
self.loss_scale = loss_scale
def forward(self, output, records):
target = records['target']
target_weight = records['target_weight']
batch_size = output.shape[0]
num_joints = output.shape[1]
heatmaps_pred = output.reshape(
(batch_size, num_joints, -1)).split(num_joints, 1)
heatmaps_gt = target.reshape(
(batch_size, num_joints, -1)).split(num_joints, 1)
loss = 0
for idx in range(num_joints):
heatmap_pred = heatmaps_pred[idx].squeeze()
heatmap_gt = heatmaps_gt[idx].squeeze()
if self.use_target_weight:
loss += self.loss_scale * self.criterion(
heatmap_pred.multiply(target_weight[:, idx]),
heatmap_gt.multiply(target_weight[:, idx]))
else:
loss += self.loss_scale * self.criterion(heatmap_pred,
heatmap_gt)
loss = loss / num_joints
return loss
@register
@serializable
class DistMSELoss(nn.Layer):
def __init__(self,
use_target_weight=True,
loss_scale=0.5,
key=None,
                 weight=1.0):
        """
        MSE distillation loss between student and teacher heatmaps.
        Args:
            use_target_weight (bool): whether to use target weight
            loss_scale (float): scale factor applied to each joint's MSE loss
            key (str): if set, index into the student/teacher output dicts
            weight (float): global weight of this loss term
        """
        super().__init__()
self.criterion = nn.MSELoss(reduction='mean')
self.use_target_weight = use_target_weight
self.loss_scale = loss_scale
self.key = key
self.weight = weight
def forward(self, student_out, teacher_out, records):
if self.key is not None:
student_out = student_out[self.key]
teacher_out = teacher_out[self.key]
target_weight = records['target_weight']
batch_size = student_out.shape[0]
num_joints = student_out.shape[1]
heatmaps_pred = student_out.reshape(
(batch_size, num_joints, -1)).split(num_joints, 1)
heatmaps_gt = teacher_out.reshape(
(batch_size, num_joints, -1)).split(num_joints, 1)
loss = 0
for idx in range(num_joints):
heatmap_pred = heatmaps_pred[idx].squeeze()
heatmap_gt = heatmaps_gt[idx].squeeze()
if self.use_target_weight:
loss += self.loss_scale * self.criterion(
heatmap_pred.multiply(target_weight[:, idx]),
heatmap_gt.multiply(target_weight[:, idx]))
else:
loss += self.loss_scale * self.criterion(heatmap_pred,
heatmap_gt)
loss = loss / num_joints * self.weight
return loss
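# ---------------------------------------------------------------------------
# Usage sketch (illustrative only): compute the keypoint MSE loss for a
# batch of 2 images with 17 joints and 64x48 heatmaps; all values are random
# and the shapes are assumptions for the demo.
# ---------------------------------------------------------------------------
if __name__ == '__main__':
    output = paddle.rand([2, 17, 64, 48])
    records = {
        'target': paddle.rand([2, 17, 64, 48]),
        'target_weight': paddle.ones([2, 17, 1]),
    }
    loss = KeyPointMSELoss(use_target_weight=True)(output, records)
    print(float(loss))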
from . import quant
from .quant import *
import yaml
from lib.utils.workspace import load_config, create
from lib.utils.checkpoint import load_pretrain_weight
def build_slim_model(cfg, mode='train'):
assert cfg.slim == 'QAT', 'Only QAT is supported now'
model = create(cfg.architecture)
if mode == 'train':
load_pretrain_weight(model, cfg.pretrain_weights)
slim = create(cfg.slim)
cfg['slim_type'] = cfg.slim
# TODO: fix quant export model in framework.
if mode == 'test' and cfg.slim == 'QAT':
slim.quant_config['activation_preprocess_type'] = None
cfg['model'] = slim(model)
cfg['slim'] = slim
if mode != 'train':
load_pretrain_weight(cfg['model'], cfg.weights)
return cfg
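# Usage sketch (illustrative; the config path and fields are hypothetical:
# the YAML must define `architecture`, `slim: QAT`, `pretrain_weights` and,
# for eval/export, `weights`):
#   cfg = load_config('configs/lite_hrnet_qat.yml')
#   cfg = build_slim_model(cfg, mode='train')
#   model = cfg['model']   # the QAT-wrapped model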
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle.utils import try_import
from lib.utils.workspace import register, serializable
from lib.utils.logger import setup_logger
logger = setup_logger(__name__)
@register
@serializable
class QAT(object):
def __init__(self, quant_config, print_model):
super(QAT, self).__init__()
self.quant_config = quant_config
self.print_model = print_model
def __call__(self, model):
paddleslim = try_import('paddleslim')
self.quanter = paddleslim.dygraph.quant.QAT(config=self.quant_config)
if self.print_model:
logger.info("Model before quant:")
logger.info(model)
self.quanter.quantize(model)
if self.print_model:
logger.info("Quantized model:")
logger.info(model)
return model
def save_quantized_model(self, layer, path, input_spec=None, **config):
self.quanter.save_quantized_model(
model=layer, path=path, input_spec=input_spec, **config)
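# Usage sketch (illustrative; the quant_config keys follow PaddleSlim's
# dygraph QAT options and are assumptions here, not fixed by this repo):
#   quanter = QAT(
#       quant_config={
#           'weight_quantize_type': 'channel_wise_abs_max',
#           'activation_quantize_type': 'moving_average_abs_max',
#           'weight_bits': 8,
#           'activation_bits': 8,
#       },
#       print_model=False)
#   model = quanter(model)   # quantizes in place and returns the model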
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import check
from . import checkpoint
from . import cli
from . import download
from . import env
from . import logger
from . import stats
from . import visualizer
from . import workspace
from . import config
from . import keypoint_utils
from .workspace import *
from .visualizer import *
from .cli import *
from .download import *
from .env import *
from .logger import *
from .stats import *
from .checkpoint import *
from .check import *
from .config import *
from .keypoint_utils import *
__all__ = workspace.__all__ + visualizer.__all__ + cli.__all__ \
+ download.__all__ + env.__all__ + logger.__all__ \
+ stats.__all__ + checkpoint.__all__ + check.__all__ \
+ config.__all__ + keypoint_utils.__all__
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import paddle
import six
import paddle.version as fluid_version
from .logger import setup_logger
logger = setup_logger(__name__)
__all__ = ['check_gpu', 'check_version', 'check_config']
def check_gpu(use_gpu):
"""
Log error and exit when set use_gpu=true in paddlepaddle
cpu version.
"""
err = "Config use_gpu cannot be set as true while you are " \
"using paddlepaddle cpu version ! \nPlease try: \n" \
"\t1. Install paddlepaddle-gpu to run model on GPU \n" \
"\t2. Set use_gpu as false in config file to run " \
"model on CPU"
try:
if use_gpu and not paddle.is_compiled_with_cuda():
logger.error(err)
sys.exit(1)
except Exception as e:
pass
def check_version(version='2.0'):
"""
Log error and exit when the installed version of paddlepaddle is
not satisfied.
"""
err = "PaddlePaddle version {} or higher is required, " \
"or a suitable develop version is satisfied as well. \n" \
"Please make sure the version is good with your code.".format(version)
version_installed = [
fluid_version.major, fluid_version.minor, fluid_version.patch,
fluid_version.rc
]
if version_installed == ['0', '0', '0', '0']:
return
version_split = version.split('.')
length = min(len(version_installed), len(version_split))
for i in six.moves.range(length):
if version_installed[i] > version_split[i]:
return
if version_installed[i] < version_split[i]:
raise Exception(err)
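# Worked example (illustrative): with paddle 2.2.0 installed,
# version_installed is ['2', '2', '0', '0']; against version='2.0' the loop
# first compares '2' with '2' (equal, continue), then finds '2' > '0' and
# returns early, so the check passes.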
def check_config(cfg):
"""
Check the correctness of the configuration file. Log error and exit
when Config is not compliant.
"""
err = "'{}' not specified in config file. Please set it in config file."
check_list = ['architecture', 'num_classes']
try:
for var in check_list:
if not var in cfg:
logger.error(err.format(var))
sys.exit(1)
except Exception as e:
pass
if 'log_iter' not in cfg:
cfg.log_iter = 20
return cfg
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import errno
import os
import time
import numpy as np
import paddle
import paddle.nn as nn
from .download import get_weights_path
from .logger import setup_logger
logger = setup_logger(__name__)
__all__ = [
'is_url', 'load_weight', 'match_state_dict', 'load_pretrain_weight',
'save_model'
]
def is_url(path):
"""
Whether path is URL.
Args:
path (string): URL string or not.
"""
return path.startswith('http://') \
or path.startswith('https://') \
or path.startswith('ppdet://')
def _get_unique_endpoints(trainer_endpoints):
    # Sort so that every rank sees the endpoints in the same order,
    # even if the environment variables differ across cards
trainer_endpoints.sort()
ips = set()
unique_endpoints = set()
for endpoint in trainer_endpoints:
ip = endpoint.split(":")[0]
if ip in ips:
continue
ips.add(ip)
unique_endpoints.add(endpoint)
logger.info("unique_endpoints {}".format(unique_endpoints))
return unique_endpoints
def _strip_postfix(path):
path, ext = os.path.splitext(path)
assert ext in ['', '.pdparams', '.pdopt', '.pdmodel'], \
"Unknown postfix {} from weights".format(ext)
return path
def load_weight(model, weight, optimizer=None):
if is_url(weight):
weight = get_weights_path(weight)
path = _strip_postfix(weight)
pdparam_path = path + '.pdparams'
if not os.path.exists(pdparam_path):
raise ValueError("Model pretrain path {} does not "
"exists.".format(pdparam_path))
param_state_dict = paddle.load(pdparam_path)
model_dict = model.state_dict()
model_weight = {}
incorrect_keys = 0
for key in model_dict.keys():
if key in param_state_dict.keys():
model_weight[key] = param_state_dict[key]
else:
logger.info('Unmatched key: {}'.format(key))
incorrect_keys += 1
assert incorrect_keys == 0, "Load weight {} incorrectly, \
{} keys unmatched, please check again.".format(weight,
incorrect_keys)
logger.info('Finish resuming model weights: {}'.format(pdparam_path))
model.set_dict(model_weight)
last_epoch = 0
if optimizer is not None and os.path.exists(path + '.pdopt'):
optim_state_dict = paddle.load(path + '.pdopt')
        # work around a resume bug; may be fixed in a later paddle release
for key in optimizer.state_dict().keys():
if not key in optim_state_dict.keys():
optim_state_dict[key] = optimizer.state_dict()[key]
if 'last_epoch' in optim_state_dict:
last_epoch = optim_state_dict.pop('last_epoch')
optimizer.set_state_dict(optim_state_dict)
return last_epoch
def match_state_dict(model_state_dict, weight_state_dict):
"""
Match between the model state dict and pretrained weight state dict.
Return the matched state dict.
    The method assumes that every name in the pretrained weight state dict is
    a suffix of a name in the model state dict (e.g. once the prefix
    'backbone.' is stripped), so the candidate weight keys are collected for
    each model key and the one with the longest match is selected. For
    example, for the model key 'backbone.res2.res2a.branch2a.conv.weight' and
    the pretrained weight keys 'res2.res2a.branch2a.conv.weight' and
    'branch2a.conv.weight', the former is matched to the model key.
"""
model_keys = sorted(model_state_dict.keys())
weight_keys = sorted(weight_state_dict.keys())
def match(a, b):
if a.startswith('backbone.res5'):
            # In Faster RCNN, res5 pretrained weights have the prefix
            # 'backbone.', while the corresponding model weights have a
            # different prefix, 'bbox_head.'
b = b[9:]
return a == b or a.endswith("." + b)
match_matrix = np.zeros([len(model_keys), len(weight_keys)])
for i, m_k in enumerate(model_keys):
for j, w_k in enumerate(weight_keys):
if match(m_k, w_k):
match_matrix[i, j] = len(w_k)
max_id = match_matrix.argmax(1)
max_len = match_matrix.max(1)
max_id[max_len == 0] = -1
    # collect pretrained weight keys that were not matched to any model key
    matched_weight_ids = set(int(idx) for idx in max_id if idx != -1)
    not_load_weight_name = [
        weight_keys[idx] for idx in range(len(weight_keys))
        if idx not in matched_weight_ids
    ]
    if len(not_load_weight_name) > 0:
        logger.info('{} in pretrained weight is not used in the model, '
                    'and will not be loaded'.format(not_load_weight_name))
matched_keys = {}
result_state_dict = {}
for model_id, weight_id in enumerate(max_id):
if weight_id == -1:
continue
model_key = model_keys[model_id]
weight_key = weight_keys[weight_id]
weight_value = weight_state_dict[weight_key]
model_value_shape = list(model_state_dict[model_key].shape)
if list(weight_value.shape) != model_value_shape:
logger.info(
'The shape {} in pretrained weight {} is unmatched with '
'the shape {} in model {}. And the weight {} will not be '
'loaded'.format(weight_value.shape, weight_key,
model_value_shape, model_key, weight_key))
continue
assert model_key not in result_state_dict
result_state_dict[model_key] = weight_value
if weight_key in matched_keys:
            raise ValueError('Ambiguous weight {} loaded, it matches at least '
'{} and {} in the model'.format(
weight_key, model_key, matched_keys[
weight_key]))
matched_keys[weight_key] = model_key
return result_state_dict
def load_pretrain_weight(model, pretrain_weight):
if is_url(pretrain_weight):
pretrain_weight = get_weights_path(pretrain_weight)
path = _strip_postfix(pretrain_weight)
if not (os.path.isdir(path) or os.path.isfile(path) or
os.path.exists(path + '.pdparams')):
raise ValueError("Model pretrain path `{}` does not exists. "
"If you don't want to load pretrain model, "
"please delete `pretrain_weights` field in "
"config file.".format(path))
model_dict = model.state_dict()
weights_path = path + '.pdparams'
param_state_dict = paddle.load(weights_path)
param_state_dict = match_state_dict(model_dict, param_state_dict)
model.set_dict(param_state_dict)
logger.info('Finish loading model weights: {}'.format(weights_path))
def save_model(model, optimizer, save_dir, save_name, last_epoch):
"""
save model into disk.
Args:
model (paddle.nn.Layer): the Layer instalce to save parameters.
optimizer (paddle.optimizer.Optimizer): the Optimizer instance to
save optimizer states.
save_dir (str): the directory to be saved.
save_name (str): the path to be saved.
last_epoch (int): the epoch index.
"""
if paddle.distributed.get_rank() != 0:
return
if not os.path.exists(save_dir):
os.makedirs(save_dir)
save_path = os.path.join(save_dir, save_name)
if isinstance(model, nn.Layer):
paddle.save(model.state_dict(), save_path + ".pdparams")
else:
        assert isinstance(
            model, dict), 'model is not an instance of nn.Layer or dict'
paddle.save(model, save_path + ".pdparams")
state_dict = optimizer.state_dict()
state_dict['last_epoch'] = last_epoch
paddle.save(state_dict, save_path + ".pdopt")
logger.info("Save checkpoint: {}".format(save_dir))
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from argparse import ArgumentParser, RawDescriptionHelpFormatter
import yaml
import re
from .workspace import get_registered_modules, dump_value
__all__ = ['ColorTTY', 'ArgsParser']
class ColorTTY(object):
def __init__(self):
super(ColorTTY, self).__init__()
self.colors = ['red', 'green', 'yellow', 'blue', 'magenta', 'cyan']
def __getattr__(self, attr):
if attr in self.colors:
color = self.colors.index(attr) + 31
            def color_message(message):
                return "\033[{}m{}\033[0m".format(color, message)
setattr(self, attr, color_message)
return color_message
def bold(self, message):
return self.with_code('01', message)
def with_code(self, code, message):
return "[{}m{}".format(code, message)
class ArgsParser(ArgumentParser):
def __init__(self):
super(ArgsParser, self).__init__(
formatter_class=RawDescriptionHelpFormatter)
self.add_argument("-c", "--config", help="configuration file to use")
self.add_argument(
"-o", "--opt", nargs='*', help="set configuration options")
def parse_args(self, argv=None):
args = super(ArgsParser, self).parse_args(argv)
assert args.config is not None, \
"Please specify --config=configure_file_path."
args.opt = self._parse_opt(args.opt)
return args
def _parse_opt(self, opts):
config = {}
if not opts:
return config
for s in opts:
s = s.strip()
k, v = s.split('=', 1)
if '.' not in k:
config[k] = yaml.load(v, Loader=yaml.Loader)
else:
keys = k.split('.')
if keys[0] not in config:
config[keys[0]] = {}
cur = config[keys[0]]
for idx, key in enumerate(keys[1:]):
if idx == len(keys) - 2:
cur[key] = yaml.load(v, Loader=yaml.Loader)
else:
cur[key] = {}
cur = cur[key]
return config
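# Example (illustrative): passing `-o use_gpu=true TrainReader.batch_size=4`
# yields {'use_gpu': True, 'TrainReader': {'batch_size': 4}}; dotted keys
# are expanded into nested dicts and values are parsed as YAML.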
def print_total_cfg(config):
modules = get_registered_modules()
color_tty = ColorTTY()
green = '___{}___'.format(color_tty.colors.index('green') + 31)
styled = {}
for key in config.keys():
if not config[key]: # empty schema
continue
if key not in modules and not hasattr(config[key], '__dict__'):
styled[key] = config[key]
continue
elif key in modules:
module = modules[key]
else:
type_name = type(config[key]).__name__
if type_name in modules:
module = modules[type_name].copy()
module.update({
k: v
for k, v in config[key].__dict__.items()
if k in module.schema
})
key += " ({})".format(type_name)
default = module.find_default_keys()
missing = module.find_missing_keys()
mismatch = module.find_mismatch_keys()
extra = module.find_extra_keys()
dep_missing = []
for dep in module.inject:
if isinstance(module[dep], str) and module[dep] != '<value>':
if module[dep] not in modules: # not a valid module
dep_missing.append(dep)
else:
dep_mod = modules[module[dep]]
# empty dict but mandatory
if not dep_mod and dep_mod.mandatory():
dep_missing.append(dep)
override = list(
set(module.keys()) - set(default) - set(extra) - set(dep_missing))
replacement = {}
for name in set(override + default + extra + mismatch + missing):
new_name = name
if name in missing:
value = "<missing>"
else:
value = module[name]
if name in extra:
value = dump_value(value) + " <extraneous>"
elif name in mismatch:
value = dump_value(value) + " <type mismatch>"
elif name in dep_missing:
value = dump_value(value) + " <module config missing>"
elif name in override and value != '<missing>':
mark = green
new_name = mark + name
replacement[new_name] = value
styled[key] = replacement
buffer = yaml.dump(styled, default_flow_style=False, default_style='')
buffer = (re.sub(r"<missing>", r"[31m<missing>[0m", buffer))
buffer = (re.sub(r"<extraneous>", r"[33m<extraneous>[0m", buffer))
buffer = (re.sub(r"<type mismatch>", r"[31m<type mismatch>[0m", buffer))
buffer = (re.sub(r"<module config missing>",
r"[31m<module config missing>[0m", buffer))
buffer = re.sub(r"___(\d+)___(.*?):", r"[\1m\2[0m:", buffer)
print(buffer)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import yaml_helpers
from .yaml_helpers import *
__all__ = yaml_helpers.__all__
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
import inspect
import importlib
import re
try:
from docstring_parser import parse as doc_parse
except Exception:
def doc_parse(*args):
pass
try:
from typeguard import check_type
except Exception:
def check_type(*args):
pass
__all__ = ['SchemaValue', 'SchemaDict', 'SharedConfig', 'extract_schema']
class SchemaValue(object):
def __init__(self, name, doc='', type=None):
super(SchemaValue, self).__init__()
self.name = name
self.doc = doc
self.type = type
def set_default(self, value):
self.default = value
def has_default(self):
return hasattr(self, 'default')
class SchemaDict(dict):
def __init__(self, **kwargs):
super().__init__()
self.schema = {}
self.strict = False
self.doc = ""
self.update(kwargs)
def __setitem__(self, key, value):
# XXX also update regular dict to SchemaDict??
if isinstance(value, dict) and key in self and isinstance(self[key],
SchemaDict):
self[key].update(value)
else:
super().__setitem__(key, value)
def __missing__(self, key):
if self.has_default(key):
return self.schema[key].default
elif key in self.schema:
return self.schema[key]
else:
raise KeyError(key)
def copy(self):
newone = SchemaDict()
newone.__dict__.update(self.__dict__)
newone.update(self)
return newone
def set_schema(self, key, value):
assert isinstance(value, SchemaValue)
self.schema[key] = value
def set_strict(self, strict):
self.strict = strict
def has_default(self, key):
return key in self.schema and self.schema[key].has_default()
def is_default(self, key):
if not self.has_default(key):
return False
if hasattr(self[key], '__dict__'):
return True
else:
return key not in self or self[key] == self.schema[key].default
def find_default_keys(self):
return [
k for k in list(self.keys()) + list(self.schema.keys())
if self.is_default(k)
]
def mandatory(self):
return any([k for k in self.schema.keys() if not self.has_default(k)])
def find_missing_keys(self):
missing = [
k for k in self.schema.keys()
if k not in self and not self.has_default(k)
]
placeholders = [k for k in self if self[k] in ('<missing>', '<value>')]
return missing + placeholders
def find_extra_keys(self):
return list(set(self.keys()) - set(self.schema.keys()))
def find_mismatch_keys(self):
mismatch_keys = []
for arg in self.schema.values():
if arg.type is not None:
try:
check_type("{}.{}".format(self.name, arg.name),
self[arg.name], arg.type)
except Exception:
mismatch_keys.append(arg.name)
return mismatch_keys
def validate(self):
missing_keys = self.find_missing_keys()
if missing_keys:
raise ValueError("Missing param for class<{}>: {}".format(
self.name, ", ".join(missing_keys)))
extra_keys = self.find_extra_keys()
if extra_keys and self.strict:
raise ValueError("Extraneous param for class<{}>: {}".format(
self.name, ", ".join(extra_keys)))
mismatch_keys = self.find_mismatch_keys()
if mismatch_keys:
raise TypeError("Wrong param type for class<{}>: {}".format(
self.name, ", ".join(mismatch_keys)))
class SharedConfig(object):
"""
Representation class for `__shared__` annotations, which work as follows:
- if `key` is set for the module in config file, its value will take
precedence
- if `key` is not set for the module but present in the config file, its
value will be used
- otherwise, use the provided `default_value` as fallback
Args:
key: config[key] will be injected
default_value: fallback value
"""
def __init__(self, key, default_value=None):
super(SharedConfig, self).__init__()
self.key = key
self.default_value = default_value
def extract_schema(cls):
"""
Extract schema from a given class
Args:
cls (type): Class from which to extract.
Returns:
schema (SchemaDict): Extracted schema.
"""
ctor = cls.__init__
# python 2 compatibility
if hasattr(inspect, 'getfullargspec'):
argspec = inspect.getfullargspec(ctor)
annotations = argspec.annotations
has_kwargs = argspec.varkw is not None
    else:
        argspec = inspect.getargspec(ctor)
        # python 2 type hinting workaround, see pep-3107
        # however, since `typeguard` does not support python 2, type checking
        # is still python 3 only for now
        annotations = getattr(ctor, '__annotations__', {})
        has_kwargs = argspec.keywords is not None
names = [arg for arg in argspec.args if arg != 'self']
defaults = argspec.defaults
num_defaults = argspec.defaults is not None and len(argspec.defaults) or 0
num_required = len(names) - num_defaults
docs = cls.__doc__
if docs is None and getattr(cls, '__category__', None) == 'op':
docs = cls.__call__.__doc__
try:
docstring = doc_parse(docs)
except Exception:
docstring = None
if docstring is None:
comments = {}
else:
comments = {}
for p in docstring.params:
match_obj = re.match('^([a-zA-Z_]+[a-zA-Z_0-9]*).*', p.arg_name)
if match_obj is not None:
comments[match_obj.group(1)] = p.description
schema = SchemaDict()
schema.name = cls.__name__
schema.doc = ""
if docs is not None:
start_pos = docs[0] == '\n' and 1 or 0
schema.doc = docs[start_pos:].split("\n")[0].strip()
# XXX handle paddle's weird doc convention
if '**' == schema.doc[:2] and '**' == schema.doc[-2:]:
schema.doc = schema.doc[2:-2].strip()
schema.category = hasattr(cls, '__category__') and getattr(
cls, '__category__') or 'module'
schema.strict = not has_kwargs
schema.pymodule = importlib.import_module(cls.__module__)
schema.inject = getattr(cls, '__inject__', [])
schema.shared = getattr(cls, '__shared__', [])
for idx, name in enumerate(names):
comment = name in comments and comments[name] or name
if name in schema.inject:
type_ = None
else:
type_ = name in annotations and annotations[name] or None
value_schema = SchemaValue(name, comment, type_)
if name in schema.shared:
assert idx >= num_required, "shared config must have default value"
default = defaults[idx - num_required]
value_schema.set_default(SharedConfig(name, default))
elif idx >= num_required:
default = defaults[idx - num_required]
value_schema.set_default(default)
schema.set_schema(name, value_schema)
return schema
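# ---------------------------------------------------------------------------
# Sketch (illustrative only): extract the schema of a toy class. `Toy` is a
# made-up example, not part of the library.
# ---------------------------------------------------------------------------
if __name__ == '__main__':
    class Toy(object):
        """A toy module."""

        def __init__(self, width=32, depth=2):
            self.width = width
            self.depth = depth

    schema = extract_schema(Toy)
    print(schema.name, schema.doc, sorted(schema.schema.keys()))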
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import importlib
import inspect
import yaml
from .schema import SharedConfig
__all__ = ['serializable', 'Callable']
def represent_dictionary_order(self, dict_data):
return self.represent_mapping('tag:yaml.org,2002:map', dict_data.items())
def setup_orderdict():
from collections import OrderedDict
yaml.add_representer(OrderedDict, represent_dictionary_order)
def _make_python_constructor(cls):
def python_constructor(loader, node):
if isinstance(node, yaml.SequenceNode):
args = loader.construct_sequence(node, deep=True)
return cls(*args)
else:
kwargs = loader.construct_mapping(node, deep=True)
try:
return cls(**kwargs)
except Exception as ex:
print("Error when construct {} instance from yaml config".
format(cls.__name__))
raise ex
return python_constructor
def _make_python_representer(cls):
    # python 2 compatibility
    if hasattr(inspect, 'getfullargspec'):
        argspec = inspect.getfullargspec(cls)
    else:
        argspec = inspect.getargspec(cls.__init__)
argnames = [arg for arg in argspec.args if arg != 'self']
def python_representer(dumper, obj):
if argnames:
data = {name: getattr(obj, name) for name in argnames}
else:
data = obj.__dict__
if '_id' in data:
del data['_id']
return dumper.represent_mapping(u'!{}'.format(cls.__name__), data)
return python_representer
def serializable(cls):
"""
Add loader and dumper for given class, which must be
"trivially serializable"
Args:
cls: class to be serialized
Returns: cls
"""
yaml.add_constructor(u'!{}'.format(cls.__name__),
_make_python_constructor(cls))
yaml.add_representer(cls, _make_python_representer(cls))
return cls
yaml.add_representer(SharedConfig,
lambda d, o: d.represent_data(o.default_value))
@serializable
class Callable(object):
"""
Helper to be used in Yaml for creating arbitrary class objects
Args:
full_type (str): the full module path to target function
"""
def __init__(self, full_type, args=[], kwargs={}):
super(Callable, self).__init__()
self.full_type = full_type
self.args = args
self.kwargs = kwargs
def __call__(self):
if '.' in self.full_type:
idx = self.full_type.rfind('.')
module = importlib.import_module(self.full_type[:idx])
func_name = self.full_type[idx + 1:]
else:
try:
module = importlib.import_module('builtins')
except Exception:
module = importlib.import_module('__builtin__')
func_name = self.full_type
func = getattr(module, func_name)
return func(*self.args, **self.kwargs)
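# ---------------------------------------------------------------------------
# Sketch (illustrative only): Callable resolving a builtin by its dotted
# path; in a YAML config this would be written with the `!Callable` tag.
# ---------------------------------------------------------------------------
if __name__ == '__main__':
    make_range = Callable('builtins.range', args=[3])
    print(list(make_range()))   # -> [0, 1, 2]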
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import os.path as osp
import sys
import yaml
import time
import shutil
import requests
import tqdm
import hashlib
import base64
import binascii
import tarfile
import zipfile
from paddle.utils.download import _get_unique_endpoints
from .logger import setup_logger
logger = setup_logger(__name__)
__all__ = [
'get_weights_path',
'get_dataset_path',
'get_config_path',
'download_dataset',
]
WEIGHTS_HOME = osp.expanduser("~/.cache/paddle/weights")
DATASET_HOME = osp.expanduser("~/.cache/paddle/dataset")
CONFIGS_HOME = osp.expanduser("~/.cache/paddle/configs")
# dict of {dataset_name: (download_info, sub_dirs)}
# download info: [(url, md5sum)]
DATASETS = {
'coco': ([
(
'http://images.cocodataset.org/zips/train2017.zip',
'cced6f7f71b7629ddf16f17bbcfab6b2', ),
(
'http://images.cocodataset.org/zips/val2017.zip',
'442b8da7639aecaf257c1dceb8ba8c80', ),
(
'http://images.cocodataset.org/annotations/annotations_trainval2017.zip',
'f4bbac642086de4f52a3fdda2de5fa2c', ),
], ["annotations", "train2017", "val2017"]),
}
DOWNLOAD_RETRY_LIMIT = 3
PPDET_WEIGHTS_DOWNLOAD_URL_PREFIX = 'https://paddledet.bj.bcebos.com/'
def parse_url(url):
url = url.replace("ppdet://", PPDET_WEIGHTS_DOWNLOAD_URL_PREFIX)
return url
def get_weights_path(url):
"""Get weights path from WEIGHTS_HOME, if not exists,
download it from url.
"""
url = parse_url(url)
path, _ = get_path(url, WEIGHTS_HOME)
return path
def get_config_path(url):
"""Get weights path from CONFIGS_HOME, if not exists,
download it from url.
"""
url = parse_url(url)
path = map_path(url, CONFIGS_HOME, path_depth=2)
if os.path.isfile(path):
return path
# config file not found, try download
# 1. clear configs directory
if osp.isdir(CONFIGS_HOME):
shutil.rmtree(CONFIGS_HOME)
# 2. get url
try:
from ppdet import __version__ as version
except ImportError:
version = None
cfg_url = "ppdet://configs/{}/configs.tar".format(version) \
if version else "ppdet://configs/configs.tar"
cfg_url = parse_url(cfg_url)
# 3. download and decompress
cfg_fullname = _download_dist(cfg_url, osp.dirname(CONFIGS_HOME))
_decompress_dist(cfg_fullname)
# 4. check config file existing
if os.path.isfile(path):
return path
else:
logger.error("Get config {} failed after download, please contact us on " \
"https://github.com/PaddlePaddle/PaddleDetection/issues".format(path))
sys.exit(1)
def get_dataset_path(path, annotation, image_dir):
"""
If path exists, return path.
Otherwise, get dataset path from DATASET_HOME, if not exists,
download it.
"""
if _dataset_exists(path, annotation, image_dir):
return path
logger.info(
"Dataset {} is not valid for reason above, try searching {} or "
"downloading dataset...".format(osp.realpath(path), DATASET_HOME))
data_name = os.path.split(path.strip().lower())[-1]
for name, dataset in DATASETS.items():
if data_name == name:
logger.debug("Parse dataset_dir {} as dataset "
"{}".format(path, name))
if name == 'objects365':
raise NotImplementedError(
"Dataset {} is not valid for download automatically. "
"Please apply and download the dataset from "
"https://www.objects365.org/download.html".format(name))
data_dir = osp.join(DATASET_HOME, name)
if name == 'mot':
if osp.exists(path) or osp.exists(data_dir):
return data_dir
else:
raise NotImplementedError(
"Dataset {} is not valid for download automatically. "
"Please apply and download the dataset following docs/tutorials/PrepareMOTDataSet.md".
format(name))
if name == "spine_coco":
if _dataset_exists(data_dir, annotation, image_dir):
return data_dir
# For voc, only check dir VOCdevkit/VOC2012, VOCdevkit/VOC2007
if name in ['voc', 'fruit', 'roadsign_voc']:
exists = True
for sub_dir in dataset[1]:
check_dir = osp.join(data_dir, sub_dir)
if osp.exists(check_dir):
logger.info("Found {}".format(check_dir))
else:
exists = False
if exists:
return data_dir
# voc exist is checked above, voc is not exist here
check_exist = name != 'voc' and name != 'fruit' and name != 'roadsign_voc'
for url, md5sum in dataset[0]:
get_path(url, data_dir, md5sum, check_exist)
# voc should create list after download
if name == 'voc':
create_voc_list(data_dir)
return data_dir
# not match any dataset in DATASETS
raise ValueError(
"Dataset {} is not valid and cannot parse dataset type "
"'{}' for automaticly downloading, which only supports "
"'voc' , 'coco', 'wider_face', 'fruit', 'roadsign_voc' and 'mot' currently".
format(path, osp.split(path)[-1]))
def map_path(url, root_dir, path_depth=1):
# parse path after download to decompress under root_dir
assert path_depth > 0, "path_depth should be a positive integer"
dirname = url
for _ in range(path_depth):
dirname = osp.dirname(dirname)
fpath = osp.relpath(url, dirname)
zip_formats = ['.zip', '.tar', '.gz']
for zip_format in zip_formats:
fpath = fpath.replace(zip_format, '')
return osp.join(root_dir, fpath)
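# Worked example (illustrative): with path_depth=1,
# map_path('https://host/models/model.tar', '/root/.cache') strips one URL
# level and the archive suffix, giving '/root/.cache/model'.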
def get_path(url, root_dir, md5sum=None, check_exist=True):
""" Download from given url to root_dir.
if file or directory specified by url is exists under
root_dir, return the path directly, otherwise download
from url and decompress it, return the path.
url (str): download url
root_dir (str): root dir for downloading, it should be
WEIGHTS_HOME or DATASET_HOME
md5sum (str): md5 sum of download package
"""
# parse path after download to decompress under root_dir
fullpath = map_path(url, root_dir)
    # For some archives the decompressed directory name differs from the
    # archive file name; rename using the following map
decompress_name_map = {
"VOCtrainval_11-May-2012": "VOCdevkit/VOC2012",
"VOCtrainval_06-Nov-2007": "VOCdevkit/VOC2007",
"VOCtest_06-Nov-2007": "VOCdevkit/VOC2007",
"annotations_trainval": "annotations"
}
for k, v in decompress_name_map.items():
if fullpath.find(k) >= 0:
fullpath = osp.join(osp.split(fullpath)[0], v)
if osp.exists(fullpath) and check_exist:
if not osp.isfile(fullpath) or \
_check_exist_file_md5(fullpath, md5sum, url):
logger.debug("Found {}".format(fullpath))
return fullpath, True
else:
os.remove(fullpath)
fullname = _download_dist(url, root_dir, md5sum)
    # weights in the new format (postfix '.pdparams') and yml files
    # do not need to be decompressed
if osp.splitext(fullname)[-1] not in ['.pdparams', '.yml']:
_decompress_dist(fullname)
return fullpath, False
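# Usage sketch (illustrative; the URL is hypothetical): download a weights
# file into WEIGHTS_HOME, returning its local path and whether it already
# existed.
#   path, existed = get_path(
#       'https://paddledet.bj.bcebos.com/models/some_model.pdparams',
#       WEIGHTS_HOME)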
def download_dataset(path, dataset=None):
if dataset not in DATASETS.keys():
logger.error("Unknown dataset {}, it should be "
"{}".format(dataset, DATASETS.keys()))
return
dataset_info = DATASETS[dataset][0]
for info in dataset_info:
get_path(info[0], path, info[1], False)
logger.debug("Download dataset {} finished.".format(dataset))
def _dataset_exists(path, annotation, image_dir):
"""
Check if user define dataset exists
"""
if not osp.exists(path):
logger.warning("Config dataset_dir {} is not exits, "
"dataset config is not valid".format(path))
return False
if annotation:
annotation_path = osp.join(path, annotation)
if not osp.isfile(annotation_path):
logger.warning("Config annotation {} is not a "
"file, dataset config is not "
"valid".format(annotation_path))
return False
if image_dir:
image_path = osp.join(path, image_dir)
if not osp.isdir(image_path):
logger.warning("Config image_dir {} is not a "
"directory, dataset config is not "
"valid".format(image_path))
return False
return True
def _download(url, path, md5sum=None):
"""
Download from url, save to path.
url (str): download url
path (str): download to given path
"""
if not osp.exists(path):
os.makedirs(path)
fname = osp.split(url)[-1]
fullname = osp.join(path, fname)
retry_cnt = 0
while not (osp.exists(fullname) and _check_exist_file_md5(fullname, md5sum,
url)):
if retry_cnt < DOWNLOAD_RETRY_LIMIT:
retry_cnt += 1
else:
raise RuntimeError("Download from {} failed. "
"Retry limit reached".format(url))
logger.info("Downloading {} from {}".format(fname, url))
        # NOTE: windows path join may introduce '\\', which is invalid in a url
if sys.platform == "win32":
url = url.replace('\\', '/')
req = requests.get(url, stream=True)
if req.status_code != 200:
raise RuntimeError("Downloading from {} failed with code "
"{}!".format(url, req.status_code))
        # To guard against interrupted downloads, download to
        # tmp_fullname first, then move tmp_fullname to fullname
        # after the download finishes
tmp_fullname = fullname + "_tmp"
total_size = req.headers.get('content-length')
with open(tmp_fullname, 'wb') as f:
if total_size:
for chunk in tqdm.tqdm(
req.iter_content(chunk_size=1024),
total=(int(total_size) + 1023) // 1024,
unit='KB'):
f.write(chunk)
else:
for chunk in req.iter_content(chunk_size=1024):
if chunk:
f.write(chunk)
shutil.move(tmp_fullname, fullname)
return fullname
def _download_dist(url, path, md5sum=None):
env = os.environ
if 'PADDLE_TRAINERS_NUM' in env and 'PADDLE_TRAINER_ID' in env:
trainer_id = int(env['PADDLE_TRAINER_ID'])
num_trainers = int(env['PADDLE_TRAINERS_NUM'])
if num_trainers <= 1:
return _download(url, path, md5sum)
else:
fname = osp.split(url)[-1]
fullname = osp.join(path, fname)
lock_path = fullname + '.download.lock'
if not osp.isdir(path):
os.makedirs(path)
if not osp.exists(fullname):
from paddle.distributed import ParallelEnv
unique_endpoints = _get_unique_endpoints(ParallelEnv()
.trainer_endpoints[:])
with open(lock_path, 'w'): # touch
os.utime(lock_path, None)
if ParallelEnv().current_endpoint in unique_endpoints:
_download(url, path, md5sum)
os.remove(lock_path)
else:
while os.path.exists(lock_path):
time.sleep(0.5)
return fullname
else:
return _download(url, path, md5sum)
def _check_exist_file_md5(filename, md5sum, url):
    # if md5sum is None and the file to check is a weights file,
    # read the md5sum from the url and check it; otherwise check md5sum directly
return _md5check_from_url(filename, url) if md5sum is None \
and filename.endswith('pdparams') \
else _md5check(filename, md5sum)
def _md5check_from_url(filename, url):
    # For weights at bcebos URLs, the MD5 value is carried
    # in the 'content-md5' response header
req = requests.get(url, stream=True)
content_md5 = req.headers.get('content-md5')
req.close()
if not content_md5 or _md5check(
filename,
binascii.hexlify(base64.b64decode(content_md5.strip('"'))).decode(
)):
return True
else:
return False
def _md5check(fullname, md5sum=None):
if md5sum is None:
return True
logger.debug("File {} md5 checking...".format(fullname))
md5 = hashlib.md5()
with open(fullname, 'rb') as f:
for chunk in iter(lambda: f.read(4096), b""):
md5.update(chunk)
calc_md5sum = md5.hexdigest()
if calc_md5sum != md5sum:
logger.warning("File {} md5 check failed, {}(calc) != "
"{}(base)".format(fullname, calc_md5sum, md5sum))
return False
return True
def _decompress(fname):
"""
Decompress for zip and tar file
"""
logger.info("Decompressing {}...".format(fname))
# For protecting decompressing interupted,
# decompress to fpath_tmp directory firstly, if decompress
# successed, move decompress files to fpath and delete
# fpath_tmp and remove download compress file.
fpath = osp.split(fname)[0]
fpath_tmp = osp.join(fpath, 'tmp')
if osp.isdir(fpath_tmp):
shutil.rmtree(fpath_tmp)
os.makedirs(fpath_tmp)
if fname.find('tar') >= 0:
with tarfile.open(fname) as tf:
tf.extractall(path=fpath_tmp)
elif fname.find('zip') >= 0:
with zipfile.ZipFile(fname) as zf:
zf.extractall(path=fpath_tmp)
elif fname.find('.txt') >= 0:
return
else:
raise TypeError("Unsupport compress file type {}".format(fname))
for f in os.listdir(fpath_tmp):
src_dir = osp.join(fpath_tmp, f)
dst_dir = osp.join(fpath, f)
_move_and_merge_tree(src_dir, dst_dir)
shutil.rmtree(fpath_tmp)
os.remove(fname)
def _decompress_dist(fname):
env = os.environ
if 'PADDLE_TRAINERS_NUM' in env and 'PADDLE_TRAINER_ID' in env:
trainer_id = int(env['PADDLE_TRAINER_ID'])
num_trainers = int(env['PADDLE_TRAINERS_NUM'])
if num_trainers <= 1:
_decompress(fname)
else:
lock_path = fname + '.decompress.lock'
from paddle.distributed import ParallelEnv
unique_endpoints = _get_unique_endpoints(ParallelEnv()
.trainer_endpoints[:])
            # NOTE(dkp): _decompress_dist is always performed after
            # _download_dist; in _download_dist the sub-trainers wait for
            # the download lock file to be released by sleeping. If
            # decompression is very fast and finishes within the sleeping
            # gap (e.g. on tiny datasets such as coco_ce or spine_coco),
            # the main trainer may finish decompressing and release the
            # lock file too early. So we only create the lock file in the
            # main trainer, and all sub-trainers wait 1s (twice the
            # sleeping gap) for the main trainer to create it; this
            # waiting time keeps all trainer pipelines in order.
            # **change this if you have a more elegant method**
if ParallelEnv().current_endpoint in unique_endpoints:
with open(lock_path, 'w'): # touch
os.utime(lock_path, None)
_decompress(fname)
os.remove(lock_path)
else:
time.sleep(1)
while os.path.exists(lock_path):
time.sleep(0.5)
else:
_decompress(fname)
def _move_and_merge_tree(src, dst):
"""
Move src directory to dst, if dst is already exists,
merge src to dst
"""
if not osp.exists(dst):
shutil.move(src, dst)
elif osp.isfile(src):
shutil.move(src, dst)
else:
for fp in os.listdir(src):
src_fp = osp.join(src, fp)
dst_fp = osp.join(dst, fp)
if osp.isdir(src_fp):
if osp.isdir(dst_fp):
_move_and_merge_tree(src_fp, dst_fp)
else:
shutil.move(src_fp, dst_fp)
elif osp.isfile(src_fp) and \
not osp.isfile(dst_fp):
shutil.move(src_fp, dst_fp)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import random
import numpy as np
import paddle
from paddle.distributed import fleet
__all__ = ['init_parallel_env', 'set_random_seed', 'init_fleet_env']
def init_fleet_env(find_unused_parameters=False):
strategy = fleet.DistributedStrategy()
strategy.find_unused_parameters = find_unused_parameters
fleet.init(is_collective=True, strategy=strategy)
def init_parallel_env():
env = os.environ
dist = 'PADDLE_TRAINER_ID' in env and 'PADDLE_TRAINERS_NUM' in env
if dist:
trainer_id = int(env['PADDLE_TRAINER_ID'])
local_seed = (99 + trainer_id)
random.seed(local_seed)
np.random.seed(local_seed)
paddle.distributed.init_parallel_env()
def set_random_seed(seed):
paddle.seed(seed)
random.seed(seed)
np.random.seed(seed)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import cv2
import numpy as np
__all__ = [
'get_affine_mat_kernel', 'get_affine_transform', 'get_warp_matrix',
'rotate_point', 'transpred', 'warp_affine_joints', 'affine_transform',
'transform_preds', 'oks_iou', 'oks_nms', 'rescore', 'soft_oks_nms'
]
def get_affine_mat_kernel(h, w, s, inv=False):
if w < h:
w_ = s
h_ = int(np.ceil((s / w * h) / 64.) * 64)
scale_w = w
scale_h = h_ / w_ * w
else:
h_ = s
w_ = int(np.ceil((s / h * w) / 64.) * 64)
scale_h = h
scale_w = w_ / h_ * h
center = np.array([np.round(w / 2.), np.round(h / 2.)])
size_resized = (w_, h_)
trans = get_affine_transform(
center, np.array([scale_w, scale_h]), 0, size_resized, inv=inv)
return trans, size_resized
def get_affine_transform(center,
input_size,
rot,
output_size,
shift=(0., 0.),
inv=False):
"""Get the affine transform matrix, given the center/scale/rot/output_size.
Args:
center (np.ndarray[2, ]): Center of the bounding box (x, y).
input_size (np.ndarray[2, ]): Size of input feature (width, height).
rot (float): Rotation angle (degree).
output_size (np.ndarray[2, ]): Size of the destination heatmaps.
shift (0-100%): Shift translation ratio wrt the width/height.
Default (0., 0.).
inv (bool): Option to inverse the affine transform direction.
(inv=False: src->dst or inv=True: dst->src)
Returns:
np.ndarray: The transform matrix.
"""
assert len(center) == 2
assert len(output_size) == 2
assert len(shift) == 2
if not isinstance(input_size, (np.ndarray, list)):
input_size = np.array([input_size, input_size], dtype=np.float32)
scale_tmp = input_size
shift = np.array(shift)
src_w = scale_tmp[0]
dst_w = output_size[0]
dst_h = output_size[1]
rot_rad = np.pi * rot / 180
src_dir = rotate_point([0., src_w * -0.5], rot_rad)
dst_dir = np.array([0., dst_w * -0.5])
src = np.zeros((3, 2), dtype=np.float32)
src[0, :] = center + scale_tmp * shift
src[1, :] = center + src_dir + scale_tmp * shift
src[2, :] = _get_3rd_point(src[0, :], src[1, :])
dst = np.zeros((3, 2), dtype=np.float32)
dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir
dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :])
if inv:
trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
else:
trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))
return trans
def get_warp_matrix(theta, size_input, size_dst, size_target):
"""Calculate the transformation matrix under the constraint of unbiased.
Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased
Data Processing for Human Pose Estimation (CVPR 2020).
Args:
theta (float): Rotation angle in degrees.
size_input (np.ndarray): Size of input image [w, h].
size_dst (np.ndarray): Size of output image [w, h].
size_target (np.ndarray): Size of ROI in input plane [w, h].
Returns:
matrix (np.ndarray): A matrix for transformation.
"""
theta = np.deg2rad(theta)
matrix = np.zeros((2, 3), dtype=np.float32)
scale_x = size_dst[0] / size_target[0]
scale_y = size_dst[1] / size_target[1]
matrix[0, 0] = np.cos(theta) * scale_x
matrix[0, 1] = -np.sin(theta) * scale_x
matrix[0, 2] = scale_x * (
-0.5 * size_input[0] * np.cos(theta) + 0.5 * size_input[1] *
np.sin(theta) + 0.5 * size_target[0])
matrix[1, 0] = np.sin(theta) * scale_y
matrix[1, 1] = np.cos(theta) * scale_y
matrix[1, 2] = scale_y * (
-0.5 * size_input[0] * np.sin(theta) - 0.5 * size_input[1] *
np.cos(theta) + 0.5 * size_target[1])
return matrix
def _get_3rd_point(a, b):
"""To calculate the affine matrix, three pairs of points are required. This
function is used to get the 3rd point, given 2D points a & b.
The 3rd point is defined by rotating vector `a - b` by 90 degrees
anticlockwise, using b as the rotation center.
Args:
a (np.ndarray): point(x,y)
b (np.ndarray): point(x,y)
Returns:
np.ndarray: The 3rd point.
"""
assert len(
a) == 2, 'input of _get_3rd_point should be point with length of 2'
assert len(
b) == 2, 'input of _get_3rd_point should be point with length of 2'
direction = a - b
third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32)
return third_pt
def rotate_point(pt, angle_rad):
"""Rotate a point by an angle.
Args:
pt (list[float]): 2 dimensional point to be rotated
angle_rad (float): rotation angle by radian
Returns:
list[float]: Rotated point.
"""
assert len(pt) == 2
sn, cs = np.sin(angle_rad), np.cos(angle_rad)
new_x = pt[0] * cs - pt[1] * sn
new_y = pt[0] * sn + pt[1] * cs
rotated_pt = [new_x, new_y]
return rotated_pt
def transpred(kpts, h, w, s):
trans, _ = get_affine_mat_kernel(h, w, s, inv=True)
return warp_affine_joints(kpts[..., :2].copy(), trans)
def warp_affine_joints(joints, mat):
"""Apply affine transformation defined by the transform matrix on the
joints.
Args:
joints (np.ndarray[..., 2]): Origin coordinate of joints.
mat (np.ndarray[3, 2]): The affine matrix.
Returns:
matrix (np.ndarray[..., 2]): Result coordinate of joints.
"""
joints = np.array(joints)
shape = joints.shape
joints = joints.reshape(-1, 2)
return np.dot(np.concatenate(
(joints, joints[:, 0:1] * 0 + 1), axis=1),
mat.T).reshape(shape)
def affine_transform(pt, t):
new_pt = np.array([pt[0], pt[1], 1.]).T
new_pt = np.dot(t, new_pt)
return new_pt[:2]
def transform_preds(coords, center, scale, output_size):
target_coords = np.zeros(coords.shape)
trans = get_affine_transform(center, scale * 200, 0, output_size, inv=1)
for p in range(coords.shape[0]):
target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans)
return target_coords
def oks_iou(g, d, a_g, a_d, sigmas=None, in_vis_thre=None):
if not isinstance(sigmas, np.ndarray):
sigmas = np.array([
.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07,
.87, .87, .89, .89
]) / 10.0
vars = (sigmas * 2)**2
xg = g[0::3]
yg = g[1::3]
vg = g[2::3]
ious = np.zeros((d.shape[0]))
for n_d in range(0, d.shape[0]):
xd = d[n_d, 0::3]
yd = d[n_d, 1::3]
vd = d[n_d, 2::3]
dx = xd - xg
dy = yd - yg
e = (dx**2 + dy**2) / vars / ((a_g + a_d[n_d]) / 2 + np.spacing(1)) / 2
if in_vis_thre is not None:
            # keep keypoints visible in both gt and det (elementwise AND);
            # `list(...) and list(...)` would only apply the second condition
            ind = np.logical_and(vg > in_vis_thre, vd > in_vis_thre)
e = e[ind]
ious[n_d] = np.sum(np.exp(-e)) / e.shape[0] if e.shape[0] != 0 else 0.0
return ious
def oks_nms(kpts_db, thresh, sigmas=None, in_vis_thre=None):
"""greedily select boxes with high confidence and overlap with current maximum <= thresh
rule out overlap >= thresh
Args:
kpts_db (list): The predicted keypoints within the image
thresh (float): The threshold to select the boxes
sigmas (np.array): The variance to calculate the oks iou
Default: None
in_vis_thre (float): The threshold to select the high confidence boxes
Default: None
Return:
keep (list): indexes to keep
"""
if len(kpts_db) == 0:
return []
scores = np.array([kpts_db[i]['score'] for i in range(len(kpts_db))])
kpts = np.array(
[kpts_db[i]['keypoints'].flatten() for i in range(len(kpts_db))])
areas = np.array([kpts_db[i]['area'] for i in range(len(kpts_db))])
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
oks_ovr = oks_iou(kpts[i], kpts[order[1:]], areas[i], areas[order[1:]],
sigmas, in_vis_thre)
inds = np.where(oks_ovr <= thresh)[0]
order = order[inds + 1]
return keep
def rescore(overlap, scores, thresh, type='gaussian'):
assert overlap.shape[0] == scores.shape[0]
if type == 'linear':
inds = np.where(overlap >= thresh)[0]
scores[inds] = scores[inds] * (1 - overlap[inds])
else:
scores = scores * np.exp(-overlap**2 / thresh)
return scores
def soft_oks_nms(kpts_db, thresh, sigmas=None, in_vis_thre=None):
"""greedily select boxes with high confidence and overlap with current maximum <= thresh
rule out overlap >= thresh
Args:
kpts_db (list): The predicted keypoints within the image
thresh (float): The threshold to select the boxes
sigmas (np.array): The variance to calculate the oks iou
Default: None
in_vis_thre (float): The threshold to select the high confidence boxes
Default: None
Return:
keep (list): indexes to keep
"""
if len(kpts_db) == 0:
return []
scores = np.array([kpts_db[i]['score'] for i in range(len(kpts_db))])
kpts = np.array(
[kpts_db[i]['keypoints'].flatten() for i in range(len(kpts_db))])
areas = np.array([kpts_db[i]['area'] for i in range(len(kpts_db))])
order = scores.argsort()[::-1]
scores = scores[order]
# max_dets = order.size
max_dets = 20
keep = np.zeros(max_dets, dtype=np.intp)
keep_cnt = 0
while order.size > 0 and keep_cnt < max_dets:
i = order[0]
oks_ovr = oks_iou(kpts[i], kpts[order[1:]], areas[i], areas[order[1:]],
sigmas, in_vis_thre)
order = order[1:]
scores = rescore(oks_ovr, scores[1:], thresh)
tmp = scores.argsort()[::-1]
order = order[tmp]
scores = scores[tmp]
keep[keep_cnt] = i
keep_cnt += 1
keep = keep[:keep_cnt]
return keep
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import os
import sys
import paddle.distributed as dist
__all__ = ['setup_logger']
logger_initialized = []
def setup_logger(name="ppdet", output=None):
"""
Initialize logger and set its verbosity level to INFO.
Args:
output (str): a file name or a directory to save log. If None, will not save log file.
If ends with ".txt" or ".log", assumed to be a file name.
Otherwise, logs will be saved to `output/log.txt`.
name (str): the root module name of this logger
Returns:
logging.Logger: a logger
"""
logger = logging.getLogger(name)
if name in logger_initialized:
return logger
logger.setLevel(logging.INFO)
logger.propagate = False
formatter = logging.Formatter(
"[%(asctime)s] %(name)s %(levelname)s: %(message)s",
datefmt="%m/%d %H:%M:%S")
# stdout logging: master only
local_rank = dist.get_rank()
if local_rank == 0:
ch = logging.StreamHandler(stream=sys.stdout)
ch.setLevel(logging.DEBUG)
ch.setFormatter(formatter)
logger.addHandler(ch)
# file logging: all workers
if output is not None:
if output.endswith(".txt") or output.endswith(".log"):
filename = output
else:
filename = os.path.join(output, "log.txt")
if local_rank > 0:
filename = filename + ".rank{}".format(local_rank)
        os.makedirs(os.path.dirname(filename), exist_ok=True)
fh = logging.FileHandler(filename, mode='a')
fh.setLevel(logging.DEBUG)
fh.setFormatter(logging.Formatter())
logger.addHandler(fh)
logger_initialized.append(name)
return logger
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import collections
import numpy as np
__all__ = ['SmoothedValue', 'TrainingStats']
class SmoothedValue(object):
"""Track a series of values and provide access to smoothed values over a
window or the global series average.
"""
def __init__(self, window_size=20, fmt=None):
if fmt is None:
fmt = "{median:.4f} ({avg:.4f})"
self.deque = collections.deque(maxlen=window_size)
self.fmt = fmt
self.total = 0.
self.count = 0
def update(self, value, n=1):
self.deque.append(value)
self.count += n
self.total += value * n
@property
def median(self):
return np.median(self.deque)
@property
def avg(self):
return np.mean(self.deque)
@property
def max(self):
return np.max(self.deque)
@property
def value(self):
return self.deque[-1]
@property
def global_avg(self):
return self.total / self.count
def __str__(self):
return self.fmt.format(
median=self.median, avg=self.avg, max=self.max, value=self.value)
class TrainingStats(object):
def __init__(self, window_size, delimiter=' '):
self.meters = None
self.window_size = window_size
self.delimiter = delimiter
def update(self, stats):
if self.meters is None:
self.meters = {
k: SmoothedValue(self.window_size)
for k in stats.keys()
}
for k, v in self.meters.items():
v.update(stats[k].numpy())
def get(self, extras=None):
stats = collections.OrderedDict()
if extras:
for k, v in extras.items():
stats[k] = v
for k, v in self.meters.items():
stats[k] = format(v.median, '.6f')
return stats
def log(self, extras=None):
d = self.get(extras)
strs = []
for k, v in d.items():
strs.append("{}: {}".format(k, str(v)))
return self.delimiter.join(strs)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
from PIL import Image, ImageDraw
import cv2
import math
from .logger import setup_logger
logger = setup_logger(__name__)
__all__ = ['colormap', 'visualize_results']
def colormap(rgb=False):
"""
Get colormap
The code of this function is copied from https://github.com/facebookresearch/Detectron/blob/main/detectron/utils/colormap.py
"""
color_list = np.array([
0.000, 0.447, 0.741, 0.850, 0.325, 0.098, 0.929, 0.694, 0.125, 0.494,
0.184, 0.556, 0.466, 0.674, 0.188, 0.301, 0.745, 0.933, 0.635, 0.078,
0.184, 0.300, 0.300, 0.300, 0.600, 0.600, 0.600, 1.000, 0.000, 0.000,
1.000, 0.500, 0.000, 0.749, 0.749, 0.000, 0.000, 1.000, 0.000, 0.000,
0.000, 1.000, 0.667, 0.000, 1.000, 0.333, 0.333, 0.000, 0.333, 0.667,
0.000, 0.333, 1.000, 0.000, 0.667, 0.333, 0.000, 0.667, 0.667, 0.000,
0.667, 1.000, 0.000, 1.000, 0.333, 0.000, 1.000, 0.667, 0.000, 1.000,
1.000, 0.000, 0.000, 0.333, 0.500, 0.000, 0.667, 0.500, 0.000, 1.000,
0.500, 0.333, 0.000, 0.500, 0.333, 0.333, 0.500, 0.333, 0.667, 0.500,
0.333, 1.000, 0.500, 0.667, 0.000, 0.500, 0.667, 0.333, 0.500, 0.667,
0.667, 0.500, 0.667, 1.000, 0.500, 1.000, 0.000, 0.500, 1.000, 0.333,
0.500, 1.000, 0.667, 0.500, 1.000, 1.000, 0.500, 0.000, 0.333, 1.000,
0.000, 0.667, 1.000, 0.000, 1.000, 1.000, 0.333, 0.000, 1.000, 0.333,
0.333, 1.000, 0.333, 0.667, 1.000, 0.333, 1.000, 1.000, 0.667, 0.000,
1.000, 0.667, 0.333, 1.000, 0.667, 0.667, 1.000, 0.667, 1.000, 1.000,
1.000, 0.000, 1.000, 1.000, 0.333, 1.000, 1.000, 0.667, 1.000, 0.167,
0.000, 0.000, 0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000,
0.000, 0.833, 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.167, 0.000,
0.000, 0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000, 0.000,
0.833, 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.167, 0.000, 0.000,
0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000, 0.000, 0.833,
0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.143, 0.143, 0.143, 0.286,
0.286, 0.286, 0.429, 0.429, 0.429, 0.571, 0.571, 0.571, 0.714, 0.714,
0.714, 0.857, 0.857, 0.857, 1.000, 1.000, 1.000
]).astype(np.float32)
color_list = color_list.reshape((-1, 3)) * 255
if not rgb:
color_list = color_list[:, ::-1]
return color_list
def visualize_results(image,
bbox_res,
keypoint_res,
im_id,
catid2name,
threshold=0.5):
"""
Visualize bbox and mask results
"""
if bbox_res is not None:
image = draw_bbox(image, im_id, catid2name, bbox_res, threshold)
if keypoint_res is not None:
image = draw_pose(image, keypoint_res, threshold)
return image
def draw_bbox(image, im_id, catid2name, bboxes, threshold):
"""
Draw bbox on image
"""
draw = ImageDraw.Draw(image)
catid2color = {}
color_list = colormap(rgb=True)[:40]
for dt in np.array(bboxes):
if im_id != dt['image_id']:
continue
catid, bbox, score = dt['category_id'], dt['bbox'], dt['score']
if score < threshold:
continue
if catid not in catid2color:
idx = np.random.randint(len(color_list))
catid2color[catid] = color_list[idx]
color = tuple(catid2color[catid])
# draw bbox
if len(bbox) == 4:
# draw bbox
xmin, ymin, w, h = bbox
xmax = xmin + w
ymax = ymin + h
draw.line(
[(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
(xmin, ymin)],
width=2,
fill=color)
elif len(bbox) == 8:
x1, y1, x2, y2, x3, y3, x4, y4 = bbox
draw.line(
[(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x1, y1)],
width=2,
fill=color)
xmin = min(x1, x2, x3, x4)
ymin = min(y1, y2, y3, y4)
        else:
            logger.error('the shape of bbox must be [M, 4] or [M, 8]!')
            continue
# draw label
text = "{} {:.2f}".format(catid2name[catid], score)
tw, th = draw.textsize(text)
draw.rectangle(
[(xmin + 1, ymin - th), (xmin + tw + 1, ymin)], fill=color)
draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255))
return image
def save_result(save_path, results, catid2name, threshold):
"""
save result as txt
"""
img_id = int(results["im_id"])
with open(save_path, 'w') as f:
if "bbox_res" in results:
for dt in results["bbox_res"]:
catid, bbox, score = dt['category_id'], dt['bbox'], dt['score']
if score < threshold:
continue
# each bbox result as a line
# for rbox: classname score x1 y1 x2 y2 x3 y3 x4 y4
# for bbox: classname score x1 y1 w h
bbox_pred = '{} {} '.format(catid2name[catid],
score) + ' '.join(
[str(e) for e in bbox])
f.write(bbox_pred + '\n')
elif "keypoint_res" in results:
for dt in results["keypoint_res"]:
kpts = dt['keypoints']
scores = dt['score']
keypoint_pred = [img_id, scores, kpts]
print(keypoint_pred, file=f)
else:
print("No valid results found, skip txt save")
def draw_pose(image,
results,
visual_thread=0.6,
save_name='pose.jpg',
save_dir='output',
returnimg=False,
ids=None):
try:
import matplotlib.pyplot as plt
import matplotlib
plt.switch_backend('agg')
except Exception as e:
        logger.error('Matplotlib not found, please install matplotlib, '
                     'for example: `pip install matplotlib`.')
raise e
skeletons = np.array([item['keypoints'] for item in results])
kpt_nums = 17
if len(skeletons) > 0:
kpt_nums = int(skeletons.shape[1] / 3)
skeletons = skeletons.reshape(-1, kpt_nums, 3)
if kpt_nums == 17: #plot coco keypoint
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7),
(6, 8), (7, 9), (8, 10), (5, 11), (6, 12), (11, 13), (12, 14),
(13, 15), (14, 16), (11, 12)]
else: #plot mpii keypoint
EDGES = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 6), (3, 6), (6, 7),
(7, 8), (8, 9), (10, 11), (11, 12), (13, 14), (14, 15),
(8, 12), (8, 13)]
NUM_EDGES = len(EDGES)
colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0], \
[0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], \
[170, 0, 255], [255, 0, 255], [255, 0, 170], [255, 0, 85]]
cmap = matplotlib.cm.get_cmap('hsv')
plt.figure()
img = np.array(image).astype('float32')
color_set = results['colors'] if 'colors' in results else None
if 'bbox' in results and ids is None:
bboxs = results['bbox']
for j, rect in enumerate(bboxs):
xmin, ymin, xmax, ymax = rect
color = colors[0] if color_set is None else colors[color_set[j] %
len(colors)]
cv2.rectangle(img, (xmin, ymin), (xmax, ymax), color, 1)
canvas = img.copy()
for i in range(kpt_nums):
for j in range(len(skeletons)):
if skeletons[j][i, 2] < visual_thread:
continue
if ids is None:
color = colors[i] if color_set is None else colors[color_set[j]
%
len(colors)]
else:
color = get_color(ids[j])
cv2.circle(
canvas,
tuple(skeletons[j][i, 0:2].astype('int32')),
2,
color,
thickness=-1)
to_plot = cv2.addWeighted(img, 0.3, canvas, 0.7, 0)
fig = matplotlib.pyplot.gcf()
stickwidth = 2
for i in range(NUM_EDGES):
for j in range(len(skeletons)):
edge = EDGES[i]
if skeletons[j][edge[0], 2] < visual_thread or skeletons[j][edge[
1], 2] < visual_thread:
continue
cur_canvas = canvas.copy()
X = [skeletons[j][edge[0], 1], skeletons[j][edge[1], 1]]
Y = [skeletons[j][edge[0], 0], skeletons[j][edge[1], 0]]
mX = np.mean(X)
mY = np.mean(Y)
length = ((X[0] - X[1])**2 + (Y[0] - Y[1])**2)**0.5
angle = math.degrees(math.atan2(X[0] - X[1], Y[0] - Y[1]))
polygon = cv2.ellipse2Poly((int(mY), int(mX)),
(int(length / 2), stickwidth),
int(angle), 0, 360, 1)
if ids is None:
color = colors[i] if color_set is None else colors[color_set[j]
%
len(colors)]
else:
color = get_color(ids[j])
cv2.fillConvexPoly(cur_canvas, polygon, color)
canvas = cv2.addWeighted(canvas, 0.4, cur_canvas, 0.6, 0)
image = Image.fromarray(canvas.astype('uint8'))
plt.close()
return image
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
import importlib
import os
import sys
import yaml
import collections
try:
collectionsAbc = collections.abc
except AttributeError:
collectionsAbc = collections
from .config.schema import SchemaDict, SharedConfig, extract_schema
from .config.yaml_helpers import serializable
__all__ = [
'global_config',
'load_config',
'merge_config',
'get_registered_modules',
'create',
'register',
'serializable',
'dump_value',
]
def dump_value(value):
# XXX this is hackish, but collections.abc is not available in python 2
if hasattr(value, '__dict__') or isinstance(value, (dict, tuple, list)):
value = yaml.dump(value, default_flow_style=True)
value = value.replace('\n', '')
value = value.replace('...', '')
return "'{}'".format(value)
else:
# primitive types
return str(value)
class AttrDict(dict):
"""Single level attribute dict, NOT recursive"""
def __init__(self, **kwargs):
super(AttrDict, self).__init__()
super(AttrDict, self).update(kwargs)
def __getattr__(self, key):
if key in self:
return self[key]
raise AttributeError("object has no attribute '{}'".format(key))
global_config = AttrDict()
def load_config(file_path):
"""
Load config from file.
Args:
file_path (str): Path of the config file to be loaded.
Returns: global config
"""
_, ext = os.path.splitext(file_path)
assert ext in ['.yml', '.yaml'], "only support yaml files for now"
# load config from file and merge into global config
with open(file_path) as f:
cfg = yaml.load(f, Loader=yaml.Loader)
cfg['filename'] = os.path.splitext(os.path.split(file_path)[-1])[0]
merge_config(cfg)
return global_config
def dict_merge(dct, merge_dct):
""" Recursive dict merge. Inspired by :meth:``dict.update()``, instead of
updating only top-level keys, dict_merge recurses down into dicts nested
to an arbitrary depth, updating keys. The ``merge_dct`` is merged into
``dct``.
Args:
dct: dict onto which the merge is executed
merge_dct: dct merged into dct
Returns: dct
"""
for k, v in merge_dct.items():
if (k in dct and isinstance(dct[k], dict) and
isinstance(merge_dct[k], collectionsAbc.Mapping)):
dict_merge(dct[k], merge_dct[k])
else:
dct[k] = merge_dct[k]
return dct
def merge_config(config, another_cfg=None):
"""
Merge config into global config or another_cfg.
Args:
config (dict): Config to be merged.
Returns: global config
"""
global global_config
dct = another_cfg or global_config
return dict_merge(dct, config)
def get_registered_modules():
return {
k: v
for k, v in global_config.items() if isinstance(v, SchemaDict)
}
def make_partial(cls):
op_module = importlib.import_module(cls.__op__.__module__)
op = getattr(op_module, cls.__op__.__name__)
cls.__category__ = getattr(cls, '__category__', None) or 'op'
def partial_apply(self, *args, **kwargs):
kwargs_ = self.__dict__.copy()
kwargs_.update(kwargs)
return op(*args, **kwargs_)
if getattr(cls, '__append_doc__', True): # XXX should default to True?
if sys.version_info[0] > 2:
cls.__doc__ = "Wrapper for `{}` OP".format(op.__name__)
cls.__init__.__doc__ = op.__doc__
cls.__call__ = partial_apply
cls.__call__.__doc__ = op.__doc__
else:
# XXX work around for python 2
partial_apply.__doc__ = op.__doc__
cls.__call__ = partial_apply
return cls
def register(cls):
"""
Register a given module class.
Args:
cls (type): Module class to be registered.
Returns: cls
"""
if cls.__name__ in global_config:
raise ValueError("Module class already registered: {}".format(
cls.__name__))
if hasattr(cls, '__op__'):
cls = make_partial(cls)
global_config[cls.__name__] = extract_schema(cls)
return cls
def create(cls_or_name, **kwargs):
"""
Create an instance of given module class.
Args:
cls_or_name (type or str): Class of which to create instance.
Returns: instance of type `cls_or_name`
"""
assert type(cls_or_name) in [type, str
], "should be a class or name of a class"
name = type(cls_or_name) == str and cls_or_name or cls_or_name.__name__
assert name in global_config and \
isinstance(global_config[name], SchemaDict), \
"the module {} is not registered".format(name)
config = global_config[name]
cls = getattr(config.pymodule, name)
cls_kwargs = {}
cls_kwargs.update(global_config[name])
    # parse `shared` annotation of registered modules
if getattr(config, 'shared', None):
for k in config.shared:
target_key = config[k]
shared_conf = config.schema[k].default
assert isinstance(shared_conf, SharedConfig)
if target_key is not None and not isinstance(target_key,
SharedConfig):
continue # value is given for the module
elif shared_conf.key in global_config:
# `key` is present in config
cls_kwargs[k] = global_config[shared_conf.key]
else:
cls_kwargs[k] = shared_conf.default_value
    # parse `inject` annotation of registered modules
if getattr(cls, 'from_config', None):
cls_kwargs.update(cls.from_config(config, **kwargs))
if getattr(config, 'inject', None):
for k in config.inject:
target_key = config[k]
# optional dependency
if target_key is None:
continue
if isinstance(target_key, dict) or hasattr(target_key, '__dict__'):
if 'name' not in target_key.keys():
continue
inject_name = str(target_key['name'])
if inject_name not in global_config:
                    raise ValueError(
                        "Missing injection name {}; check its name in the cfg file".
                        format(k))
target = global_config[inject_name]
for i, v in target_key.items():
if i == 'name':
continue
target[i] = v
if isinstance(target, SchemaDict):
cls_kwargs[k] = create(inject_name)
elif isinstance(target_key, str):
if target_key not in global_config:
raise ValueError("Missing injection config:", target_key)
target = global_config[target_key]
if isinstance(target, SchemaDict):
cls_kwargs[k] = create(target_key)
elif hasattr(target, '__dict__'): # serialized object
cls_kwargs[k] = target
else:
raise ValueError("Unsupported injection type:", target_key)
# prevent modification of global config values of reference types
# (e.g., list, dict) from within the created module instances
#kwargs = copy.deepcopy(kwargs)
return cls(**cls_kwargs)
tqdm
typeguard ; python_version >= '3.4'
visualdl>=2.1.0 ; python_version <= '3.7'
opencv-python
PyYAML
shapely
scipy
terminaltables
Cython
pycocotools
#xtcocotools==1.6 #only for crowdpose
setuptools>=42.0.0
lap
scikit-learn
motmetrics
openpyxl
cython_bbox
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
# add python path of PaddleDetection to sys.path
parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
sys.path.insert(0, parent_path)
# ignore warning log
import warnings
warnings.filterwarnings('ignore')
import paddle
from lib.utils.workspace import load_config, merge_config
from lib.utils.check import check_gpu, check_version, check_config
from lib.slim import build_slim_model
from lib.utils.cli import ArgsParser
from lib.core.trainer import Trainer
from lib.utils.env import init_parallel_env
from lib.metrics.coco_utils import json_eval_results
from lib.utils.logger import setup_logger
logger = setup_logger('eval')
def parse_args():
parser = ArgsParser()
parser.add_argument(
"--output_eval",
default=None,
type=str,
help="Evaluation directory, default is current directory.")
args = parser.parse_args()
return args
def run(FLAGS, cfg):
# init parallel environment if nranks > 1
init_parallel_env()
# build trainer
trainer = Trainer(cfg, mode='eval')
# load weights
trainer.load_weights(cfg.weights)
    # evaluation
trainer.evaluate()
def main():
FLAGS = parse_args()
cfg = load_config(FLAGS.config)
cfg['output_eval'] = FLAGS.output_eval
merge_config(FLAGS.opt)
if cfg.use_gpu:
paddle.set_device('gpu')
else:
paddle.set_device('cpu')
if 'slim' in cfg:
cfg = build_slim_model(cfg, mode='eval')
check_config(cfg)
check_gpu(cfg.use_gpu)
check_version()
run(FLAGS, cfg)
if __name__ == '__main__':
main()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
# add python path of PaddleDetection to sys.path
parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
sys.path.insert(0, parent_path)
# ignore warning log
import warnings
warnings.filterwarnings('ignore')
import glob
import paddle
from lib.utils.workspace import load_config, merge_config
from lib.slim import build_slim_model
from lib.core.trainer import Trainer
from lib.utils.check import check_gpu, check_version, check_config
from lib.utils.cli import ArgsParser
from lib.utils.logger import setup_logger
logger = setup_logger('train')
def parse_args():
parser = ArgsParser()
parser.add_argument(
"--infer_dir",
type=str,
default=None,
help="Directory for images to perform inference on.")
parser.add_argument(
"--infer_img",
type=str,
default=None,
help="Image path, has higher priority over --infer_dir")
parser.add_argument(
"--output_dir",
type=str,
default="output",
help="Directory for storing the output visualization files.")
parser.add_argument(
"--draw_threshold",
type=float,
default=0.5,
help="Threshold to reserve the result for visualization.")
parser.add_argument(
"--use_vdl",
type=bool,
default=False,
help="Whether to record the data to VisualDL.")
parser.add_argument(
'--vdl_log_dir',
type=str,
default="vdl_log_dir/image",
help='VisualDL logging directory for image.')
parser.add_argument(
"--save_txt",
type=bool,
default=False,
help="Whether to save inference result in txt.")
args = parser.parse_args()
return args
def get_test_images(infer_dir, infer_img):
"""
Get image path list in TEST mode
"""
assert infer_img is not None or infer_dir is not None, \
"--infer_img or --infer_dir should be set"
assert infer_img is None or os.path.isfile(infer_img), \
"{} is not a file".format(infer_img)
assert infer_dir is None or os.path.isdir(infer_dir), \
"{} is not a directory".format(infer_dir)
# infer_img has a higher priority
if infer_img and os.path.isfile(infer_img):
return [infer_img]
images = set()
infer_dir = os.path.abspath(infer_dir)
assert os.path.isdir(infer_dir), \
"infer_dir {} is not a directory".format(infer_dir)
exts = ['jpg', 'jpeg', 'png', 'bmp']
exts += [ext.upper() for ext in exts]
for ext in exts:
images.update(glob.glob('{}/*.{}'.format(infer_dir, ext)))
images = list(images)
assert len(images) > 0, "no image found in {}".format(infer_dir)
logger.info("Found {} inference images in total.".format(len(images)))
return images
def run(FLAGS, cfg):
# build trainer
trainer = Trainer(cfg, mode='test')
# load weights
trainer.load_weights(cfg.weights)
# get inference images
images = get_test_images(FLAGS.infer_dir, FLAGS.infer_img)
# inference
trainer.predict(
images,
draw_threshold=FLAGS.draw_threshold,
output_dir=FLAGS.output_dir,
save_txt=FLAGS.save_txt)
def main():
FLAGS = parse_args()
cfg = load_config(FLAGS.config)
cfg['use_vdl'] = FLAGS.use_vdl
cfg['vdl_log_dir'] = FLAGS.vdl_log_dir
merge_config(FLAGS.opt)
if cfg.use_gpu:
paddle.set_device('gpu')
else:
paddle.set_device('cpu')
if 'slim' in cfg:
cfg = build_slim_model(cfg, mode='eval')
check_config(cfg)
check_gpu(cfg.use_gpu)
check_version()
run(FLAGS, cfg)
if __name__ == '__main__':
main()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
# add python path of PaddleDetection to sys.path
parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
sys.path.insert(0, parent_path)
# ignore warning log
import warnings
warnings.filterwarnings('ignore')
import copy
import paddle
from lib.utils.workspace import load_config
from lib.utils.env import init_parallel_env, set_random_seed, init_fleet_env
from lib.utils.logger import setup_logger
from lib.utils.checkpoint import load_pretrain_weight
from lib.utils.cli import ArgsParser
from lib.utils.check import check_config, check_gpu, check_version
from lib.core.trainer import Trainer
from lib.utils.workspace import create
from lib.models.loss import DistMSELoss
from lib.slim import build_slim_model
logger = setup_logger('train')
def build_teacher_model(config):
model = create(config.architecture)
if config.get('pretrain_weights', None):
load_pretrain_weight(model, config.pretrain_weights)
logger.debug("Load weights {} to start training".format(
config.pretrain_weights))
if config.get('weights', None):
load_pretrain_weight(model, config.weights)
logger.debug("Load weights {} to start training".format(
config.weights))
if config.get("freeze_parameters", True):
for param in model.parameters():
param.trainable = False
model.train()
return model
def build_distill_loss(config):
loss_config = copy.deepcopy(config["distill_loss"])
name = loss_config.pop("name")
dist_loss_class = eval(name)(**loss_config)
return dist_loss_class
def parse_args():
parser = ArgsParser()
parser.add_argument(
"--eval",
action='store_true',
default=False,
help="Whether to perform evaluation in train")
parser.add_argument(
"--distill_config",
default=None,
type=str,
help="Configuration file of model distillation.")
parser.add_argument(
"--enable_ce",
type=bool,
default=False,
help="If set True, enable continuous evaluation job."
"This flag is only used for internal test.")
parser.add_argument(
"--use_vdl",
type=bool,
default=False,
help="whether to record the data to VisualDL.")
parser.add_argument(
'--vdl_log_dir',
type=str,
default="vdl_log_dir/scalar",
help='VisualDL logging directory for scalar.')
args = parser.parse_args()
return args
def run(FLAGS, cfg):
init_parallel_env()
if FLAGS.enable_ce:
set_random_seed(0)
# build trainer
trainer = Trainer(cfg, mode='train')
# load weights
if 'pretrain_weights' in cfg and cfg.pretrain_weights:
trainer.load_weights(cfg.pretrain_weights)
# init config
if FLAGS.distill_config is not None:
distill_config = load_config(FLAGS.distill_config)
trainer.distill_model = build_teacher_model(distill_config)
trainer.distill_loss = build_distill_loss(distill_config)
trainer.init_optimizer()
# training
trainer.train(FLAGS.eval)
def main():
FLAGS = parse_args()
cfg = load_config(FLAGS.config)
cfg['eval'] = FLAGS.eval
cfg['enable_ce'] = FLAGS.enable_ce
cfg['distill_config'] = FLAGS.distill_config
cfg['use_vdl'] = FLAGS.use_vdl
cfg['vdl_log_dir'] = FLAGS.vdl_log_dir
if cfg.use_gpu:
paddle.set_device('gpu')
else:
paddle.set_device('cpu')
if 'slim' in cfg:
cfg = build_slim_model(cfg)
check_config(cfg)
check_gpu(cfg.use_gpu)
check_version()
run(FLAGS, cfg)
if __name__ == "__main__":
main()
# PaddlePaddle Training and Inference Pipeline Criterion (TIPC) Development Documentation

## 1. Introduction to TIPC

Beyond basic model training and prediction, PaddlePaddle also provides high-performance inference and deployment tools for multiple devices and platforms. The Training and Inference Pipeline Criterion (TIPC) aims to build a bridge from academic research to industrial deployment, making models easier to use in a wider range of scenarios.

<div align="center">
<img src="images/tipc_guide.png" width="800">
</div>

## 2. Development Documentation for Different Environments and Training/Inference Modes

- [Linux GPU/CPU basic training and inference development documentation](./train_infer_python/README.md)
- More training modes
  - [Linux GPU multi-node multi-GPU training and inference development documentation](./train_fleet_infer_python/README.md)
  - [Linux GPU mixed-precision training and inference development documentation](./train_amp_infer_python/README.md)
- More deployment modes
  - [Linux GPU/CPU Python serving deployment development documentation](./serving_python/README.md)
  - [Linux GPU/CPU C++ serving deployment development documentation](./serving_cpp/README.md)
  - [Linux GPU/CPU C++ inference development documentation](./infer_cpp/README.md)
  - [Paddle.js deployment development documentation](./paddlejs/README.md)
  - [Paddle2ONNX development documentation](./paddle2onnx/README.md)
  - ARM CPU deployment development documentation (coming soon)
  - OpenCL ARM GPU deployment development documentation (coming soon)
  - Metal ARM GPU deployment development documentation (coming soon)
  - Jetson deployment development documentation (coming soon)
  - XPU deployment development documentation (coming soon)
- More training environments
  - Linux XPU2 basic training and inference development documentation (coming soon)
  - Linux DCU basic training and inference development documentation (coming soon)
  - Linux NPU basic training and inference development documentation (coming soon)
  - Windows GPU basic training and inference development documentation (coming soon)
  - macOS CPU basic training and inference development documentation (coming soon)

# Industrial SOTA Model Optimization Guide

## 1. Background

In deep-learning-based vision tasks, the SOTA models in many papers require heavy computation and have long inference times. In real deployment, the model should be as small and as fast as possible, so it has to be made lightweight: the goal is to speed up inference significantly while losing as little accuracy as possible, and finally to build an industrial-grade SOTA model.

Model optimization covers both speed and accuracy; it includes general schemes that apply across tasks as well as task-specific ones. The overall structure is shown below.

<div align="center">
<img src="images/lite_model_framework.png" width = "1000" />
</div>

## 2. General Model Lightweighting Methods

General model lightweighting consists of three parts: a lightweight backbone, knowledge distillation, and model quantization. Together they turn the reproduced model into a lightweight one.

<div align="center">
<img src="images/general_lite_model_pipeline.png" width = "300" />
</div>

For more details, see: [General Model Lightweighting Guide](general_lite_model_optimization.md)

## 3. Task-Specific Model Optimization

CV tasks cover a wide range of applications. The following introduces more optimization strategies by direction; they can be consulted when tuning the accuracy of a lightweight model.

### 3.1 Image Classification

* The paper [Bag of Tricks for Image Classification with Convolutional Neural Networks](https://arxiv.org/abs/1812.01187) describes improvements to the ResNet50 series, including an improved `bottleneck structure`, a `cosine learning-rate schedule`, and `Mixup data augmentation`, raising the accuracy of the ResNet series on the ImageNet1k dataset from 76.5% to 79.2%. For implementation details, see [ResNet50_vd.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml).

### 3.2 Object Detection

* PP-YOLOv2 optimizes the data, the model structure, and the training strategy, and its accuracy finally surpasses SOTA models such as YOLOv5. Specifically, Mixup and AutoAugment are introduced for data augmentation; the SPP and PAN feature-fusion modules and the Mish activation function are introduced into the model structure; the IoU aware loss is added to the loss function; Matrix NMS is introduced in post-processing; and during training, EMA keeps an exponential moving average of the model weights to improve convergence. For more PP-YOLOv2 optimization tricks, see: [Detection model optimization: PP-YOLOv2](./det_ppyolov2_optimization.md)

### 3.3 Image Segmentation

### 3.4 Video Recognition

### 3.5 Text Detection and Recognition

### 3.6 Generative Adversarial Networks
# PP-YOLOv2 Optimization Tricks

Object detection is one of the most widely used tasks in computer vision. Besides detecting retail goods, pedestrians, and vehicles, it has many applications in industrial production, such as quality inspection, equipment inspection, and factory safety monitoring. During the pandemic, detection was also used for face-mask detection and COVID-19 screening. At the same time, the overall detection pipeline is fairly complex and needs task-specific adjustment, so continuously iterating on detection models has great practical value.

In 2021 Baidu PaddlePaddle released the industry-leading detection model [PP-YOLOv2](https://arxiv.org/abs/2104.10419), an optimized version of [YOLOv3](https://arxiv.org/abs/1804.02767) that improves accuracy while introducing as little extra computation as possible. PP-YOLOv2 (R50) reaches 49.5% mAP on the COCO 2017 dataset and runs at 68.9 FPS with a 640x640 input, reaching 106.5 FPS with TensorRT acceleration. PP-YOLOv2 (R101) reaches 50.3% mAP; compared with the best YOLOv5 model at the time, it improves accuracy by 1.3% at the same inference speed and speeds up inference by 15.9% at the same accuracy. This chapter focuses on optimization tricks for object detection and walks through the optimization journey of the PP-YOLOv2 model.
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/232ff7ff4918482399c5f9276dc18bb4151918c4552143ca976e46cf10935a9b" width='800'/>
</div>
## 1. Optimization Approach for Object Detection

Algorithm optimization can be divided into three parts. First, analyze the target scenario in detail and fully understand the model requirements, such as model size and accuracy/speed targets. An effective up-front analysis helps set a clear, reasonable optimization goal and guides efficient research and iteration afterwards, avoiding the situation where many optimization methods are tried but the final requirements still cannot be met. Concretely, research and experiments can be applied to three modules: the data module, the model structure, and the training strategy. This methodology applies to deep learning model optimization in general; the following takes object detection as an example and expands on each of the three parts.
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/25bfeaa9206e406aa0c406f0dcfabb85c20409b51f08401488f4007612d71fc6" width='800'/>
</div>
### 1.1 Data module

The data module is arguably the most important part of deep learning. In industrial applications, data is usually collected in-house and is much smaller than open-source datasets, so high-quality annotation and continuous iteration on the data are powerful levers for model optimization. For data collection, a small amount of carefully annotated data beats a large amount of coarsely annotated or unlabeled data; it is also essential to define clear annotation standards and to cover as many scenarios as possible. When the budget allows, more data is always better. In academic research, one usually iterates on a fixed public dataset, and the data module is optimized mainly through augmentation, e.g. color jitter, flipping, random expansion, random cropping, and, in recent years, [MixUp](https://paddlepedia.readthedocs.io/en/latest/tutorials/computer_vision/image_augmentation/ImageAugment.html#mixup), [AutoAugment](https://paddlepedia.readthedocs.io/en/latest/tutorials/computer_vision/image_augmentation/ImageAugment.html#autoaugment), and Mosaic. Different augmentation methods can be combined to improve generalization. Note, however, that excessive augmentation can reduce the model's ability to learn and can make the data loading module slow enough to hurt training iteration efficiency; in addition, in object detection some augmentation methods change the coordinates of the ground-truth boxes, which must be adjusted accordingly.
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/d153ee4cc94b4a4f8f2ebf86b76d18ab3b60ec6855ae4c4da2db66da1c4a79c5" width='800'/>
<center><br>AutoAugment data augmentation<br></center>
</div>
### 1.2 Model structure

There is a set of general optimization schemes on the model-structure side, such as loss-function and feature-extraction improvements: losses like focal loss and IoU loss improve accuracy without affecting inference speed, and [SPP](https://arxiv.org/abs/1406.4729) strengthens multi-scale features at almost no extra inference cost.

Beyond that, targeted optimization should be built on a clear goal. Cloud and edge deployment call for different structural choices, as shown in the table below.

| Scenario | Characteristics | Suggested model structure |
|:---------:|:------------------:|:------------:|
| Cloud | ample compute; effect comes first | Prefer ResNet-series backbones, and introduce a small number of deformable convolutions, which add little computation while improving feature extraction. |
| Edge | lower compute and power budget than the cloud; limited memory | Prefer lightweight backbones such as the [MobileNet](https://arxiv.org/abs/1704.04861) series; replace the more expensive convolutions with [depthwise separable convolutions](https://paddlepedia.readthedocs.io/en/latest/tutorials/CNN/convolution_operator/Separable_Convolution.html?highlight=%E6%B7%B1%E5%BA%A6%E5%8F%AF%E5%88%86%E7%A6%BB%E5%8D%B7%E7%A7%AF#id4), and replace [transposed convolutions](https://paddlepedia.readthedocs.io/en/latest/tutorials/CNN/convolution_operator/Transpose_Convolution.html?highlight=%E5%8F%8D%E5%8D%B7%E7%A7%AF) with interpolation. |
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/86c783f67ecf439e9eab67f145bc7321b2828f68ef9f44efaff0e6be25a9985f" width='800'/>
</div>
In addition, some practical deployment difficulties require corresponding structural adjustments. For small-object detection, backbones with large receptive fields such as [HRNet](https://arxiv.org/pdf/1904.04514.pdf) and [DLA](https://arxiv.org/pdf/1707.06484.pdf) can be chosen, and ordinary convolutions can be replaced with [dilated convolutions](https://paddlepedia.readthedocs.io/en/latest/tutorials/CNN/convolution_operator/Dilated_Convolution.html?highlight=%E7%A9%BA%E6%B4%9E%E5%8D%B7%E7%A7%AF). For long-tailed data, the loss function can down-weight the frequent classes and up-weight the rare ones.

### 1.3 Training strategy

As different optimization modules are introduced during iteration, training can become unstable, so the training strategy needs to be improved to stabilize training and improve convergence. None of these adjustments costs anything at inference time. Examples include tuning the optimizer and learning rate; synchronized batch normalization, which aggregates batch statistics across GPUs so the network sees more input information; and EMA (Exponential Moving Average), which updates a moving average of the parameters to reduce the impact of outliers. In practical scenarios with limited data, transfer learning from a model pre-trained on COCO can substantially improve accuracy. The strategies in this module also generalize to other computer-vision tasks.

The tricks above combine many algorithmic structures and optimization modules, which requires a lot of engineering, and the tricks must be composed with one another, a real challenge for modular code design. PaddleDetection, an end-to-end object-detection development kit from PaddlePaddle, is built for exactly this.

The following sections explain, with code, how to use PaddleDetection to optimize YOLOv3 step by step into the industry-SOTA PP-YOLOv2 model.

## 2. PP-YOLO Optimization and Code Practice

The YOLO family has long enjoyed high adoption and attention thanks to its excellent cost/performance trade-off, and research extending it keeps growing: [YOLOv4](https://arxiv.org/abs/2004.10934), YOLOv5, and Megvii's [YOLOX](https://arxiv.org/abs/2107.08430) all integrate state-of-the-art computer-vision tricks and substantially improve YOLO detection performance. With its self-developed detection framework PaddleDetection, Baidu PaddlePaddle carefully optimized YOLOv3 to improve accuracy with as little extra computation as possible, releasing the high-accuracy, low-latency PP-YOLOv2 model. The following covers data augmentation, the backbone, the neck & head structure, the loss function, post-processing, and the training strategy.

### 2.1 Data augmentation

PP-YOLOv2 uses a large number of data augmentation methods, explained one by one below.

#### 2.1.1 MixUp

[MixUp](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/data/transform/operators.py#L1574) linearly interpolates images and labels with a random weight. In detection, the label fields gt_bbox, gt_class, and is_crowd are simply concatenated, while gt_score is weighted and summed. MixUp improves the network's spatial robustness; the interpolation weight follows a Beta distribution:
$$
\widetilde x = \lambda x_i + (1 - \lambda)x_j,\\
\widetilde y = \lambda y_i + (1 - \lambda)y_j \\
\lambda\in[0,1]
$$
As illustrated below, two arbitrary images are blended with these weights and used as the input.
<div align=center>
<img src="https://raw.githubusercontent.com/mls1999725/pictures/master/Mixup.png" alt="Mixup" style="zoom: 80%;"/>
</div>
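To make the interpolation concrete, the following is a minimal NumPy sketch of image-level MixUp for detection, assuming uint8 HWC images and [N, 4] box arrays; `mixup_detection` is an illustrative name, not PaddleDetection's operator.

```python
import numpy as np

def mixup_detection(img1, boxes1, img2, boxes2, alpha=1.5, beta=1.5):
    # interpolation weight follows a Beta distribution
    lam = np.random.beta(alpha, beta)
    # paste both images onto a canvas large enough for either of them
    h = max(img1.shape[0], img2.shape[0])
    w = max(img1.shape[1], img2.shape[1])
    mixed = np.zeros((h, w, 3), dtype=np.float32)
    mixed[:img1.shape[0], :img1.shape[1]] += img1.astype(np.float32) * lam
    mixed[:img2.shape[0], :img2.shape[1]] += img2.astype(np.float32) * (1. - lam)
    # gt boxes are concatenated; gt_score is weighted by the same factor
    boxes = np.concatenate([boxes1, boxes2], axis=0)
    scores = np.concatenate([np.full(len(boxes1), lam, dtype=np.float32),
                             np.full(len(boxes2), 1. - lam, dtype=np.float32)])
    return mixed.astype(np.uint8), boxes, scores
```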
#### 2.1.2 RandomDistort

[RandomDistort](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/data/transform/operators.py#L329) randomly perturbs pixel content with a certain probability, covering hue, saturation, contrast, and brightness.

#### 2.1.3 RandomExpand

[RandomExpand](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/data/transform/operators.py#L875) works as follows (see the sketch after this list):

- Randomly pick an expansion ratio (expansion happens only when the ratio is greater than 1).
- Compute the size of the expanded image.
- Create an image filled with the configured fill value and paste the original image at a random location on it.
- Recompute the ground-truth box coordinates according to where the original image was pasted.
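A minimal NumPy sketch of the expansion step under the same assumptions (the real operator also handles per-channel fill values and an application probability); `random_expand` is an illustrative name.

```python
import numpy as np

def random_expand(img, boxes, max_ratio=4.0, fill=127):
    # expansion only makes sense for ratios greater than 1
    ratio = np.random.uniform(1.0, max_ratio)
    h, w, c = img.shape
    oh, ow = int(h * ratio), int(w * ratio)
    # canvas filled with the configured value; paste the image at a random offset
    off_x = np.random.randint(0, ow - w + 1)
    off_y = np.random.randint(0, oh - h + 1)
    canvas = np.full((oh, ow, c), fill, dtype=img.dtype)
    canvas[off_y:off_y + h, off_x:off_x + w, :] = img
    # shift the gt boxes ([N, 4] in x1, y1, x2, y2) by the paste offset
    boxes = boxes + np.array([off_x, off_y, off_x, off_y], dtype=boxes.dtype)
    return canvas, boxes
```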
#### 2.1.4 RandomCrop

[RandomCrop](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/data/transform/operators.py#L1182) works as follows:

- If allow_no_crop is True, append 'no_crop' to thresholds.
- Randomly shuffle thresholds.
- For each element thresh in thresholds: (1) if the current thresh is 'no_crop', return the original image and annotations; (2) randomly sample values from aspect_ratio and scaling and compute the height, width, and top-left corner of a candidate crop region; (3) compute the IoU between the ground-truth boxes and the candidate crop region, and if every ground-truth IoU is below thresh, go back to step (2); (4) if cover_all_box is True and any ground-truth IoU is below thresh, also go back to step (2); (5) keep the ground-truth boxes that lie inside the candidate crop region, and if no valid box remains, go back to step (2), otherwise accept the crop.
- Convert the coordinates of the valid ground-truth boxes to be relative to the crop region.

#### 2.1.5 RandomFlip

[RandomFlip](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/data/transform/operators.py#L487) uses a random value to decide whether to flip the image and the ground-truth boxes.

All of the augmentations above are configured in [ppyolov2_reader.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/ppyolov2_reader.yml#L5).
### 2.2 Backbone

Unlike YOLOv3's DarkNet53, PP-YOLOv2 uses the stronger ResNet50vd-DCN as its backbone, which can be viewed as two parts: ResNet50vd and DCN. ResNet50vd denotes the ResNet-D network with 50 convolutional layers. The ResNet structure is shown below:
<div align=center>
<img src="https://raw.githubusercontent.com/mls1999725/pictures/master/ResNet-A.png" alt="ResNet-A" style="zoom: 50%;"/>
</div>
Since ResNet was proposed in 2015, its structure has been continuously refined by the community; after the B, C, and D revisions, the latest ResNetvd structure improves accuracy significantly with essentially no extra computation. The first convolutional layer of ResNetvd consists of three 3x3 convolutions with strides 2, 1, 1, replacing the 7x7 convolution in the figure above and deepening the network with roughly the same number of parameters. On top of ResNet-B, ResNet-D adds a stride-2 2x2 average-pooling layer to the downsampling module and changes the following convolution's stride to 1, avoiding discarding input information. The evolution of the B, C, and D variants is illustrated below:
<div align=center>
<img src="https://raw.githubusercontent.com/mls1999725/pictures/master/resnet结构.png" alt="resnet结构" style="zoom: 50%;"/>
</div>
A reference implementation of the ResNetvd downsampling module: [code link](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/backbones/resnet.py#L265)

For how to use ResNetvd, see the [ResNetvd config](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml#L13):
```
ResNet:
  depth: 50             # ResNet depth
  variant: d            # ResNet variant; d means ResNetvd
  return_idx: [1, 2, 3] # stages whose feature maps are returned
  dcn_v2_stages: [3]    # stages where deformable convolution is used
  freeze_at: -1         # stages whose parameters are frozen
  freeze_norm: false    # whether to freeze the normalization layers
  norm_decay: 0.        # weight decay applied to the normalization layers
```
Repeated experiments show that using ResNet50vd as the backbone improves detection accuracy by 1%-2% over the original ResNet with essentially unchanged inference speed. DCN (Deformable Convolution) adds a learnable offset to every element of the convolution kernel, so the kernel can adjust its receptive field during learning, extract image features better, and thereby improve detection accuracy, at the cost of some extra computation. After many attempts, adding deformable convolutions only to the last stage of ResNet proved to be the best trade-off between the extra computation introduced and the accuracy gained.

The [implementation](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/layers.py#L41) of deformable convolution is as follows:
```python
# imports added so the snippet is self-contained
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle import ParamAttr
from paddle.nn.initializer import Constant
from paddle.regularizer import L2Decay
from paddle.vision.ops import DeformConv2D

class DeformableConvV2(nn.Layer):
    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,
                 stride=1,
                 padding=0,
                 dilation=1,
                 groups=1,
                 weight_attr=None,
                 bias_attr=None,
                 lr_scale=1,
                 regularizer=None,
                 skip_quant=False,
                 dcn_bias_regularizer=L2Decay(0.),
                 dcn_bias_lr_scale=2.):
        super().__init__()
        self.offset_channel = 2 * kernel_size**2
        self.mask_channel = kernel_size**2
        # bias of the offset/mask conv uses a scaled learning rate
        offset_bias_attr = ParamAttr(
            initializer=Constant(0.),
            learning_rate=lr_scale,
            regularizer=regularizer)
self.conv_offset = nn.Conv2D(
in_channels,
3 * kernel_size**2,
kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2,
weight_attr=ParamAttr(initializer=Constant(0.0)),
bias_attr=offset_bias_attr)
if bias_attr:
# in FCOS-DCN head, specifically need learning_rate and regularizer
dcn_bias_attr = ParamAttr(
initializer=Constant(value=0),
regularizer=dcn_bias_regularizer,
learning_rate=dcn_bias_lr_scale)
else:
# in ResNet backbone, do not need bias
dcn_bias_attr = False
self.conv_dcn = DeformConv2D(
in_channels,
out_channels,
kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 * dilation,
dilation=dilation,
groups=groups,
weight_attr=weight_attr,
bias_attr=dcn_bias_attr)
def forward(self, x):
offset_mask = self.conv_offset(x)
offset, mask = paddle.split(
offset_mask,
num_or_sections=[self.offset_channel, self.mask_channel],
axis=1)
mask = F.sigmoid(mask)
y = self.conv_dcn(x, offset, mask=mask)
return y
```
### 2.3 Neck & head structure

PP-YOLOv2 strengthens the neck with the PAN and SPP structures. [PAN (Path Aggregation Network)](https://arxiv.org/abs/1803.01534), a variant of [FPN](https://arxiv.org/abs/1612.03144), aggregates feature information along both a top-down and a bottom-up path for better feature extraction. The structure is shown below, where C3, C4, C5 are three feature levels corresponding to strides (8, 16, 32); the Detection Block uses a CSP connection, corresponding to ppdet's [PPYOLODetBlockCSP module](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/necks/yolo_fpn.py#L359).
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/eeae465462484a6a9797f779434ef721cb9882eb10374b34be43956360691521" width='600'/>
</div>
SPP, proposed in [Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition](https://arxiv.org/pdf/1406.4729.pdf), extracts pooled features at several scales through pooling windows of different sizes and concatenates them as the output feature, effectively enlarging the receptive field; it is a widely used feature-extraction improvement. PP-YOLOv2 uses three pooling windows of sizes (5, 9, 13); the pooled features are concatenated and followed by a convolution, see the [SPP module](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/necks/yolo_fpn.py#L114). SPP is inserted at the [middle of the first PAN block](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/necks/yolo_fpn.py#L903).
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/439a29465efc4867ac4edc70d17d3ac9aa124d719b364170b08422a685045745" width='600'/>
</div>
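For reference, here is a minimal Paddle sketch of the SPP idea described above: stride-1 max pools with window sizes (5, 9, 13), concatenation with the input, then a fusion convolution. It is a simplified stand-in for, not a copy of, the linked module.

```python
import paddle
import paddle.nn as nn

class SimpleSPP(nn.Layer):
    def __init__(self, channels, pool_sizes=(5, 9, 13)):
        super().__init__()
        # stride-1 pools with same padding keep the spatial size unchanged
        self.pools = nn.LayerList([
            nn.MaxPool2D(kernel_size=k, stride=1, padding=k // 2)
            for k in pool_sizes
        ])
        # 1x1 conv fuses the concatenated multi-scale features
        self.conv = nn.Conv2D(channels * (len(pool_sizes) + 1), channels, 1)

    def forward(self, x):
        feats = [x] + [pool(x) for pool in self.pools]
        return self.conv(paddle.concat(feats, axis=1))

# x = paddle.rand([1, 512, 20, 20]); SimpleSPP(512)(x).shape -> [1, 512, 20, 20]
```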
In addition, the PP-YOLOv2 neck introduces the [Mish](https://arxiv.org/pdf/1908.08681.pdf) activation function, defined as:
$$
mish(x) = x \ast tanh(ln(1+e^x))
$$
The [implementation](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/ops.py#L43) of Mish is:
```python
import paddle
import paddle.nn.functional as F

def mish(x):
    # softplus(x) = ln(1 + e^x), so this computes x * tanh(ln(1 + e^x))
    return x * paddle.tanh(F.softplus(x))
```
For how to use the PAN module in PP-YOLOv2, see [neck: PPYOLOPAN](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml#L9):
```
PPYOLOPAN:
  act: "mish"        # mish is used by default
  conv_block_num: 2  # number of conv blocks in each PAN block
  drop_block: true   # whether to use DropBlock (see the training-strategy section)
  block_size: 3      # DropBlock block size
  keep_prob: 0.9     # DropBlock keep probability
  spp: true          # whether to use SPP
```
The PP-YOLOv2 head predicts on the 3 feature scales output by PAN, using a structure similar to [YOLOv3](https://pjreddie.com/media/files/papers/YOLOv3.pdf): a convolution encodes the final feature, and the output is a 4-D tensor [n, c, h, w] for batch size, channels, height, and width. Here c has the form anchor_num * (4 + 1 + 1 + num_classes), where anchor_num is the number of anchors per position (3 in PP-YOLOv2), 4 stands for the bbox attributes (center and size), the first 1 for objectness, the second 1 for iou_aware (see the loss-function section), and num_classes is the number of classes (80 for COCO).
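As a plain arithmetic check of the channel layout described above (not repo code):

```python
anchor_num = 3     # anchors per position in PP-YOLOv2
num_classes = 80   # COCO
# per anchor: 4 box attributes + 1 objectness + 1 iou_aware channel
c = anchor_num * (4 + 1 + 1 + num_classes)
print(c)  # 258 output channels per prediction feature map
```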
For usage, see [yolo_head: YOLOv3Head](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml#L28):
```
YOLOv3Head:
  # there are 9 anchors, grouped by the anchor_masks indices into 3 scales:
  # [6, 7, 8] go with the stride-32 prediction feature
  # [3, 4, 5] go with the stride-16 prediction feature
  # [0, 1, 2] go with the stride-8 prediction feature
  anchors: [[10, 13], [16, 30], [33, 23],
            [30, 61], [62, 45], [59, 119],
            [116, 90], [156, 198], [373, 326]]
  anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
  loss: YOLOv3Loss        # loss type; see the loss-function section
  iou_aware: true         # whether to use iou_aware
  iou_aware_factor: 0.5   # iou_aware factor
```
### 2.4 Loss function

PP-YOLOv2 uses IoU Loss and IoU Aware Loss to improve localization accuracy. IoU Loss directly optimizes the IoU between the predicted and ground-truth boxes, improving box quality. IoU Aware Loss supervises the model to predict that IoU, and the learned IoU then participates in NMS as a localization confidence.

For detection, IoU is the usual evaluation metric: the larger the IoU between a predicted box and the ground truth, the closer the prediction and the higher its quality. Following the "what you see is what you get" idea, PP-YOLOv2 uses IoU Loss to directly optimize the IoU between the model's predicted boxes and the ground-truth boxes. The loss is:
$$
L_{iou}=1 - iou^2
$$
The [implementation](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/losses/iou_loss.py#L56) of IoU Loss is:
```python
iou = bbox_iou(
pbox, gbox, giou=self.giou, diou=self.diou, ciou=self.ciou)
if self.loss_square:
loss_iou = 1 - iou * iou
else:
loss_iou = 1 - iou
loss_iou = loss_iou * self.loss_weight
```
PP-YOLOv2 adds an extra channel that learns the IoU between the predicted box and the ground truth, supervised by IoU Aware Loss. At inference time, the IoU predicted by this channel is used as one of the scoring factors, which helps keep high-IoU predictions from being suppressed and thus improves accuracy. IoU Aware Loss is a binary cross-entropy loss:
$$
L_{iou\_aware} = -(iou * log(ioup) + (1 - iou) * log(1 - ioup))
$$
The [implementation](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/losses/iou_aware_loss.py#L41) of IoU Aware Loss is:
```python
iou = bbox_iou(
pbox, gbox, giou=self.giou, diou=self.diou, ciou=self.ciou)
iou.stop_gradient = True
loss_iou_aware = F.binary_cross_entropy_with_logits(
ioup, iou, reduction='none')
loss_iou_aware = loss_iou_aware * self.loss_weight
```
### 2.5 Post-processing optimization

For post-processing, PP-YOLOv2 uses Matrix NMS and Grid Sensitive. Matrix NMS is a parallelized way to compute [Soft NMS](https://paddlepedia.readthedocs.io/en/latest/tutorials/computer_vision/object_detection/SoftNMS.html?highlight=Soft%20NMS); Grid Sensitive handles box centers that fall exactly on grid lines.

Grid Sensitive is an optimization introduced by YOLOv4. As shown below, YOLO-series models use a sigmoid to predict the center offset relative to the top-left corner of a grid cell.
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/2c34ba7dc29b41feb09455c71ffe444f0d06c733b7384ec7bf8f56357fd6375c", width='400'/>
</div>
However, when the center lies on a grid line, the sigmoid struggles to predict it. A scale and shift are therefore applied to the prediction so the decoded box center can fit ground-truth boxes whose centers fall exactly on grid lines. Grid Sensitive is expressed as:
$$
x = scale * \sigma(x) - 0.5 * (scale - 1.) \\
y = scale * \sigma(y) - 0.5 * (scale - 1.)
$$
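A minimal NumPy sketch of decoding center offsets with Grid Sensitive; the function name and the scale value (PP-YOLO uses a value slightly above 1, e.g. 1.05) are illustrative here.

```python
import numpy as np

def decode_center(tx, ty, grid_x, grid_y, stride, scale=1.05):
    # sigmoid outputs are stretched by `scale` and re-centered, so the
    # decoded offset can actually reach 0 and 1 (centers on grid lines)
    sx = scale / (1. + np.exp(-tx)) - 0.5 * (scale - 1.)
    sy = scale / (1. + np.exp(-ty)) - 0.5 * (scale - 1.)
    return (grid_x + sx) * stride, (grid_y + sy) * stride
```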
Matrix NMS computes the IoU between every pair of boxes with a single matrix operation, implementing Soft NMS in parallel; it improves accuracy without slowing down inference. Matrix NMS is implemented in the PaddlePaddle framework as the [Matrix NMS OP](https://github.com/PaddlePaddle/Paddle/blob/release/2.1/paddle/fluid/operators/detection/matrix_nms_op.cc#L169) and wrapped in PaddleDetection as the [Matrix NMS API](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/layers.py#L426).

For usage, see [post process: MatrixNMS](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml#L59):
```
nms:
  name: MatrixNMS       # NMS type; MultiClass NMS and Matrix NMS are supported
  keep_top_k: 100       # maximum number of output boxes
  score_threshold: 0.01 # score threshold applied before NMS
  post_threshold: 0.01  # score threshold applied after NMS
  nms_top_k: -1         # maximum number of boxes kept after score filtering, before NMS
  background_label: -1  # background class id
```
### 2.6 Training strategy

During training, PP-YOLOv2 uses synchronized batch normalization, EMA (Exponential Moving Average), and DropBlock to improve convergence and generalization.

BN (Batch Normalization) is the standard normalization method for training convolutional networks; it speeds up convergence and mitigates vanishing gradients. BN needs the mean and variance of the samples, and the larger the batch size, the more accurate these statistics. In multi-GPU training, samples are split evenly across GPUs; with plain BN each GPU normalizes with statistics computed from its own shard, whereas SyncBN synchronizes the sample information across all GPUs to compute a single mean and variance that every GPU then uses. Replacing BN with SyncBN therefore yields more accurate statistics and better model performance. The SyncBN [implementation](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/backbones/resnet.py#L104) is:
```python
if norm_type == 'sync_bn':
self.norm = nn.SyncBatchNorm(
ch_out, weight_attr=param_attr, bias_attr=bias_attr)
else:
self.norm = nn.BatchNorm(
ch_out,
act=None,
param_attr=param_attr,
bias_attr=bias_attr,
use_global_stats=global_stats)
```
For usage, see [SyncBN](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml#L3):
```
norm_type: sync_bn
```
EMA replaces the parameters with their average over a recent period. Compared with using the raw updates directly, the moving average makes parameter learning smoother and effectively shields the parameters from outlier updates, improving convergence. EMA consists of an update step and a correction step: the update step maintains the exponential moving average $\theta$ of the model weights, and the correction step divides by $(1 - decay^t)$ to correct the bias toward the initial value.

The EMA update is (with $w_t$ denoting the model weights at step $t$):
$$
\theta_0 = 0 \\
\theta_t = decay * \theta_{t - 1} + (1 - decay) * w_t
$$
The EMA correction is:
$$
\tilde{\theta_t} = \frac{\theta_t}{1 - decay^t}
$$
The EMA [implementation](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/optimizer.py#L261) is:
```python
def update(self, model):
if self.use_thres_step:
decay = min(self.decay, (1 + self.step) / (10 + self.step))
else:
decay = self.decay
self._decay = decay
model_dict = model.state_dict()
for k, v in self.state_dict.items():
v = decay * v + (1 - decay) * model_dict[k]
v.stop_gradient = True
self.state_dict[k] = v
self.step += 1
def apply(self):
if self.step == 0:
return self.state_dict
state_dict = dict()
for k, v in self.state_dict.items():
v = v / (1 - self._decay**self.step)
v.stop_gradient = True
state_dict[k] = v
return state_dict
```
For usage, see [EMA](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml#L4):
```
use_ema: true
ema_decay: 0.9998
```
Similar to [Dropout](https://paddlepedia.readthedocs.io/en/latest/tutorials/deep_learning/model_tuning/regularization/dropout.html?highlight=Dropout), DropBlock is a regularization method against overfitting. Because neighboring points of a convolutional feature map carry closely related semantic information, dropping individual feature points is usually ineffective for detection. DropBlock therefore drops a contiguous region rather than single points, which suits object detection better and improves the network's generalization, as shown in (c) below.
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/cf257e09a8164b19bf0e6adc0eabbce0123917146c624e90a0e10f68ea38bb4b", width='600'/>
</div>
The DropBlock [implementation](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/ppdet/modeling/necks/yolo_fpn.py#L196) is:
```python
# gamma为每个特征点被选为Drop种子点的概率,由keep_prob与block_size推导得到
gamma = (1. - self.keep_prob) / (self.block_size**2)
if self.data_format == 'NCHW':
    shape = x.shape[2:]
else:
    shape = x.shape[1:3]
for s in shape:
    gamma *= s / (s - self.block_size + 1)

# 按概率gamma随机采样种子点
matrix = paddle.cast(paddle.rand(x.shape, x.dtype) < gamma, x.dtype)
# 用max_pool2d将每个种子点扩展为block_size x block_size的连续区域
mask_inv = F.max_pool2d(
    matrix,
    self.block_size,
    stride=1,
    padding=self.block_size // 2,
    data_format=self.data_format)
mask = 1. - mask_inv
# Drop掉被选中的区域,并按保留元素的比例对输出做缩放,保持激活的期望不变
y = x * mask * (mask.numel() / mask.sum())
```
以上就是PP-YOLOv2模型优化的全部技巧。优化期间也实验过大量没有正向效果的方法,这些方法可能并不适用于YOLO系列的模型结构或训练策略,[PP-YOLOv2](https://arxiv.org/abs/2104.10419)论文中汇总了其中一部分,这里不再详细展开。下面分享PP-YOLOv2在实际应用中的使用技巧和模型调优经验。
## 3. 调参经验
### 3.1 配置合理的学习率
PaddleDetection提供的[学习率配置](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/optimizer_365e.yml)是使用8张GPU、每张卡batch_size为12时对应的学习率(base_lr=0.005)。如果在实际训练时使用了其他的GPU卡数或batch_size,需要相应调整学习率设置,否则可能会出现模型训练出nan的情况。调整方法为:学习率与总batch_size(即卡数乘以每张卡的batch_size)成正比,下表举例说明。
| GPU卡数 | 每张卡batch_size | 总batch_size | 对应学习率 |
| -------- | -------- | -------- | -------- |
| 8 | 12 | 96 | 0.005 |
| 1 | 12 | 12 | 0.000625 |
| 8 | 6 | 48 | 0.0025 |
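按照这一线性缩放规则,可以用如下的小函数完成学习率换算(`scaled_lr`为示意用的辅助函数,并非PaddleDetection的接口):
```python
def scaled_lr(gpus, bs_per_gpu, base_lr=0.005, base_total_bs=96):
    # 学习率与总batch_size(卡数 x 单卡batch_size)成正比
    return base_lr * (gpus * bs_per_gpu) / base_total_bs

print(scaled_lr(1, 12))  # 0.000625
print(scaled_lr(8, 6))   # 0.0025
```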
### 3.2 在资源允许的情况下增大batch_size
在多个目标检测任务的优化过程中发现,仅增大reader中的batch_size就有助于提升模型收敛效果。
### 3.3 调整gradient clip
在PP-YOLOv2中,[clip_grad_by_norm](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/_base_/optimizer_365e.yml#L15)被设置为35,以防止模型训练时出现梯度爆炸;对于自定义任务,如果出现了梯度爆炸,可以尝试调整梯度裁剪的阈值。
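在自定义训练代码中,可以通过飞桨优化器的`grad_clip`参数实现同样的梯度裁剪。下面是一个最小示例(这里以`ClipGradByGlobalNorm`为例,PaddleDetection内部实际使用的裁剪方式以其源码为准):
```python
import paddle

model = paddle.nn.Linear(10, 2)
# 按全局L2范数裁剪梯度,阈值与PP-YOLOv2配置中的35保持一致
clip = paddle.nn.ClipGradByGlobalNorm(clip_norm=35.0)
opt = paddle.optimizer.Momentum(
    learning_rate=0.005, parameters=model.parameters(), grad_clip=clip)
```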
# 模型轻量化指南
## 1. 简介
本文主要关注模型轻量化过程中的通用优化方案,需要完成下面三个部分的内容,将复现的模型打造为轻量化模型。
<div align="center">
<img src="images/general_lite_model_pipeline.png" width = "400" />
</div>
对于一个任务,满足下面两个条件之一,即可认为核验通过。
1. `模型大小``模型精度``模型速度`三个方面达到要求(该要求与任务相关,比如在某一目标检测任务中,要求模型大小(模型动转静导出的`pdiparams``pdmodel`文件大小之和)**20M**以内,模型精度与论文中大模型精度差距在**5%**以内,同时CPU上预测速度在**100ms**以内)。
2. 在满足模型大小与速度的情况下,截止日期之前提交的模型中,模型精度最高。
## 2. 具体内容
### 2.1 更换骨干网络
#### 2.1.1 简介
视觉任务中,模型骨干网络直接影响模型大小和预测速度。
大部分论文都是基于相对较大的骨干网络进行实验,如VGG、ResNet等(模型存储大小为100M量级),本部分希望通过更换骨干网络(模型存储大小为10M量级),让模型轻量化,方便实际部署过程。
#### 2.1.2 具体内容
使用`PP-LCNet_x2_5` (`~30M`)、`MobileNetV3_large_x1_0` (`~20M`)、`MobileNetV3_small_x1_0` (`~10M`),或者针对该任务设计的轻量级骨干网络(如关键点检测任务中的lite_HRNet),替代原始任务中的骨干网络,训练模型。
#### 2.1.3 操作步骤
PaddleClas提供了便于下游任务使用的骨干网络以及调用接口,支持网络截断、返回网络中间层输出和修改网络中间层的功能。只需要安装whl包,便可以在自己的任务中使用骨干网络。
如果只需要使用常用的feature map,安装paddleclas的whl包后,即可直接调用骨干网络。
首先需要安装paddleclas whl包。
```bash
pip install https://paddle-model-ecology.bj.bcebos.com/whl/paddleclas-0.0.0-py3-none-any.whl
```
如果希望提取中间层特征进行训练,使用方法如下。
```python
import paddle
import paddleclas
# PPLCNet_x2_5
model = paddleclas.PPLCNet_x2_5(pretrained=True, return_stages=True)
# MobileNetV3_large
# model = paddleclas.MobileNetV3_large_x1_0(pretrained=True, return_stages=True)
# MobileNetV3_small
# model = paddleclas.MobileNetV3_small_x1_0(pretrained=True, return_stages=True)
x = paddle.rand([1, 3, 224, 224])
y = model(x)
for key in y:
print(key, y[key].shape)
```
最终会同时返回logits以及中间层特征,可以根据自己的任务,选择合适分辨率的特征图进行训练。
以PP-LCNet为例,输出信息与特征图分辨率如下所示。在检测任务中,一般抽取出`blocks3`、`blocks4`、`blocks5`、`blocks6`这4个特征图,即可用于下游任务的训练。
```
logits [1, 1000]
blocks2 [1, 80, 112, 112]
blocks3 [1, 160, 56, 56]
blocks4 [1, 320, 28, 28]
blocks5 [1, 640, 14, 14]
blocks6 [1, 1280, 7, 7]
```
更多关于该接口的功能介绍和使用可以参考[theseus_layer使用教程](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/advanced_tutorials/theseus_layer.md)
在某些任务中,需要对骨干网络的batch norm等参数状态进行修改(比如freeze norm或者stop grad等),此时建议直接拷贝骨干网络代码,修改后添加到自己的项目中。飞桨骨干网络代码地址:[常用backbone参考链接](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/arch/backbone/legendary_models)。
#### 2.1.4 实战
关键点检测任务对高低层特征融合的要求较高,这里使用`lite_hrnet`网络作为该任务的轻量化骨干网络:[lite_hrnet.py](HRNet-Keypoint/lib/models/lite_hrnet.py)
训练方法如下所示。
```shell
python tools/train.py -c configs/lite_hrnet_30_256x192_coco.yml
```
COCO数据集上结果对比如下所示。
| Model | Input Size | AP(coco val) | Model Download | Model size | Config File |
| :---------- | -------- | :--------: |:--------: | :----------: | ----------- |
| HRNet-w32 | 256x192 | 76.9 | [hrnet_w32_256x192.pdparams](https://paddle-model-ecology.bj.bcebos.com/model/hrnet_pose/hrnet_w32_256x192.pdparams) | 165M | [config](./configs/hrnet_w32_256x192.yml) |
| LiteHRNet-30 | 256x192 | 69.4 | [lite_hrnet_30_256x192_coco.pdparams](https://paddle-model-ecology.bj.bcebos.com/model/hrnet_pose/lite_hrnet_30_256x192_coco.pdparams) | 7.1M | [config](./configs/lite_hrnet_30_256x192_coco.yml) |
#### 2.1.5 核验点
(1)基于轻量化骨干网络训练模型,提供模型训练结果与模型,**模型精度/速度指标满足该项任务的要求**
(2)在提交的文档中补充轻量化骨干网络的模型精度、存储大小、训练日志以及模型下载地址。
(3)文档中补充轻量化骨干网络模型的训练方法。
### 2.2 模型蒸馏
#### 2.2.1 简介
模型蒸馏指的是用大模型指导小模型的训练过程,让小模型的精度更高,因此在相同精度情况下,所需模型更小,从而达到模型轻量化的目的。
后续又衍生出让两个结构完全相同的模型互相学习的训练方式,这种模式称为DML(Deep Mutual Learning,互学习策略)。
大小模型蒸馏中也可以使用DML的loss,唯一的区别是在大小模型蒸馏中,教师模型的参数不需要更新。
模型蒸馏有2种主要损失函数:
* 对于分类输出,使用JSDIV loss(JS散度损失),具体实现可以参考:[链接](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.3/ppcls/loss/dmlloss.py#L20)
* 对于回归输出,使用距离loss(l2、l1、smooth l1等)
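下面给出这两类蒸馏损失的一个简化示意实现(函数名为示意,输入假设为教师/学生模型的原始输出,实际实现请以上述链接为准):
```python
import paddle
import paddle.nn.functional as F

def js_div_loss(logits_s, logits_t):
    # 分类输出:JS散度,即学生/教师的soft输出分别与二者均值分布求KL散度后取平均
    p_s = F.softmax(logits_s, axis=-1)
    p_t = F.softmax(logits_t, axis=-1)
    log_m = paddle.log(0.5 * (p_s + p_t) + 1e-9)
    return 0.5 * (F.kl_div(log_m, p_s, reduction='batchmean') +
                  F.kl_div(log_m, p_t, reduction='batchmean'))

def regression_distill_loss(pred_s, pred_t):
    # 回归输出:直接使用距离loss(这里以MSE为例,也可以替换为l1 / smooth l1)
    return F.mse_loss(pred_s, pred_t)
```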
#### 2.2.2 具体内容
(1)如果有大模型,则使用大模型(默认的骨干网络)指导小模型(超轻量骨干网络)学习,根据任务输出,选择用于计算的loss,加入训练loss中,训练得到最终模型,保存模型、日志与最终精度。
(2)如果没有大模型,建议使用2个完全相同的小模型互相学习,根据任务输出,选择用于计算的loss,加入训练loss中,训练得到最终模型,保存模型、日志与最终精度。
#### 2.2.3 操作步骤
(1)定义蒸馏模型:蒸馏模型中包含教师与学生模型,教师模型基于默认骨干网络搭建,学生模型基于超轻量骨干网络搭建。如果默认骨干网络已经是超轻量骨干网络,则可以使用结构相同的两个模型进行互学习。
(2)定义损失函数:蒸馏任务中,包含3个损失函数:
* 教师模型输出与groundtruth的损失函数
* 学生模型与groundtruth之间的损失函数
* 教师模型与学生模型输出之间的损失函数
(3)加载预训练模型:
* 如果教师模型是大模型,则需要加载大模型的训练结果,并且将教师模型的参数状态设置为`trainable=False`,停止参数更新,使用示例可以参考:[链接](https://github.com/PaddlePaddle/PaddleClas/blob/1358e3f647e12b9ee6c5d6450291983b2d5ac382/ppcls/arch/__init__.py#L117)
* 如果教师模型也是小模型,则其加载逻辑与学生模型相同
(4)蒸馏训练:和该任务的默认训练过程保持一致。
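其中第(3)步中冻结教师模型参数的操作,可以参考下面的示意代码(这里用一个简单网络代替实际的教师模型,仅作演示):
```python
import paddle

teacher_model = paddle.nn.Linear(10, 2)  # 实际任务中为已加载预训练权重的教师模型
for param in teacher_model.parameters():
    param.trainable = False  # 停止教师模型的参数更新
teacher_model.eval()         # 切换到评估模式,固定BN等层的统计量
```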
#### 2.2.4 实战
关键点检测任务中,教师模型直接使用HRNet骨干网络训练得到的模型,学生模型使用`lite_hrnet`作为骨干网络。
* 教师模型构建(通过传入教师模型的结构配置与预训练模型路径,初始化教师模型):[build_teacher_model函数](HRNet-Keypoint/tools/train.py#46)
* 损失函数构建:[DistMSELoss](HRNet-Keypoint/lib/models/loss.py#L67),由于关键点检测任务是回归任务,这里选用了MSE loss作为蒸馏的损失函数。
最终使用知识蒸馏训练轻量化模型的命令如下所示。
```bash
python tools/train.py -c configs/lite_hrnet_30_256x192_coco.yml --distill_config=./configs/hrnet_w32_256x192_teacher.yml
```
最终在模型大小不变的情况下,精度从`69.4%`提升至`69.9%`。
#### 2.2.5 核验点
(1)提供轻量化骨干网络的蒸馏结果精度,**模型精度指标满足该项任务的要求(如果有)**
(2)提供知识蒸馏训练后的模型下载地址以及训练日志。
(3)文档中补充知识蒸馏训练的说明文档与命令。
### 2.3 模型量化
#### 2.3.1 简介
Paddle 量化训练(Quant-aware Training, QAT)是指在训练过程中对模型的权重及激活做模拟量化,并且产出量化训练校准后的量化模型,使用该量化模型进行预测,可以减少计算量、降低计算内存、减小模型大小。
#### 2.3.2 具体内容
添加模型的PACT量化训练代码与训练脚本,并提供训练日志、模型与精度对比。
**注意:**量化模型只有在导出为用于端侧部署的Lite模型时,才会以int8的形式保存;这里训练保存的量化模型仍然以FP32的形式存储,因此其大小不会小于使用fp32训练得到的模型。
#### 2.3.3 操作步骤
向训练代码中添加PACT量化功能包含以下5个步骤。
<div align="center">
<img src="../tipc/images/quant_aware_training_guide.png" width = "500" />
</div>
具体内容请参考[Linux GPU/CPU PACT量化训练功能开发文档](../tipc/train_pact_infer_python/train_pact_infer_python.md)
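以PaddleSlim动态图量化训练为例,一个最小化的QAT使用示意如下(假设已通过pip安装paddleslim,`quant_config`字段的含义与下文实战中的配置一致,模型以飞桨内置的MobileNetV2代替实际任务模型):
```python
import paddle
from paddleslim import QAT

quant_config = {
    'activation_preprocess_type': 'PACT',              # 对激活使用PACT预处理
    'weight_quantize_type': 'channel_wise_abs_max',    # 权重量化方式
    'activation_quantize_type': 'moving_average_abs_max',
    'weight_bits': 8,
    'activation_bits': 8,
    'quantizable_layer_type': ['Conv2D', 'Linear'],
}

model = paddle.vision.models.mobilenet_v2()
quanter = QAT(config=quant_config)
model = quanter.quantize(model)  # 插入模拟量化节点,返回可正常训练的量化模型
# 训练完成后导出推理模型(路径与input_spec仅为示意):
# quanter.save_quantized_model(
#     model, 'output/quant_model',
#     input_spec=[paddle.static.InputSpec([None, 3, 224, 224], 'float32')])
```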
#### 2.3.4 实战
在关键点检测任务中,首先在配置中添加PACT量化的配置文件:[lite_hrnet_30_256x192_coco_pact.yml#L19](HRNet-Keypoint/configs/lite_hrnet_30_256x192_coco_pact.yml#L19)
```yaml
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_256x192_coco.pdparams
slim: QAT
# 这里的PACT量化配置适用于大多数任务,包括分类、检测、OCR等,一般无需改动
QAT:
quant_config: {
'activation_preprocess_type': 'PACT',
'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max',
'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9,
'quantizable_layer_type': ['Conv2D', 'Linear']}
print_model: True
```
在代码中,基于配置文件创建PACT的量化类,之后再将fp32的`nn.Layer`模型传入PACT量化类中,得到量化后的模型,用于训练。代码如下所示。
```python
def build_slim_model(cfg, mode='train'):
    assert cfg.slim == 'QAT', 'Only QAT is supported now'
    model = create(cfg.architecture)
    if mode == 'train':
        # 量化训练前,先加载fp32预训练权重
        load_pretrain_weight(model, cfg.pretrain_weights)
    slim = create(cfg.slim)
    cfg['slim_type'] = cfg.slim
    # TODO: fix quant export model in framework.
    if mode == 'test' and cfg.slim == 'QAT':
        slim.quant_config['activation_preprocess_type'] = None
    # 将fp32的nn.Layer模型传入量化类,得到插入模拟量化节点后的模型
    cfg['model'] = slim(model)
    cfg['slim'] = slim
    if mode != 'train':
        # 评估/导出时加载量化训练得到的权重
        load_pretrain_weight(cfg['model'], cfg.weights)
    return cfg
```
#### 2.3.5 核验点
(1)提供量化后的模型精度,**模型精度指标满足该项任务的要求(如果有)**
(2)提供量化训练后的模型下载地址以及训练日志。
(3)文档中补充量化训练的说明文档与命令。
## 3. FAQ
### 3.1 轻量化骨干网络
* 关于模型大小的定义如下:将训练得到的动态图模型使用`paddle.jit.save`接口,保存为静态图模型,得到模型参数文件`*.pdiparams`和结构文件`*.pdmodel`,二者的存储大小之和。
* 2.1章节中提供的骨干网络为推荐使用,具体不做限制,最终模型大小/速度/精度满足验收条件即可。
* 在部分模型不方便直接替换骨干网络的情况下(比如该模型是针对该任务设计的,通用的骨干网络无法满足要求等),可以通过对大模型的通道或者层数进行裁剪,来实现模型轻量化,最终保证模型大小满足要求即可。
### 3.2 知识蒸馏
* 不同任务往往有针对该任务定制的知识蒸馏训练策略,2.2章节中的内容仅供参考,蒸馏策略不做限制,模型精度满足验收条件即可。
### 3.3 模型量化
* 量化时,加载训练得到的fp32模型之后,可以将初始学习率修改为fp32训练时的`0.2~0.5`倍,迭代轮数也可以缩短为之前的`0.25~0.5`倍。
* 对于大多数CV任务,模型量化的精度损失在0.3%~1.0%左右,如果量化后精度大幅降低(超过3%),则需要仔细排查量化细节,建议仅对`conv`以及`linear`参数进行量化。