Unverified commit be47e28b, authored by cuicheng01, committed by GitHub

Merge branch 'develop' into Add_PULC_demo

@@ -7,6 +7,12 @@
PaddleClas, PaddlePaddle's image recognition toolkit, is a suite of tools for image recognition tasks prepared by PaddlePaddle for industry and academia, helping users train better vision models and put them into real-world applications.

**Recent updates**

- 🔥️ 2022.5.26 [PaddlePaddle industry practice live course](http://aglc.cn/v-c4FAR): a walkthrough of the **ultra-lightweight solution for managing people entering and leaving key areas**; sign up to join the discussion.
<div align="center">
<img src="https://user-images.githubusercontent.com/80816848/170166458-767a01ca-1429-437f-a628-dd184732ef53.png" width = "150" />
</div>
- 2022.5.23 Added the [person entry/exit management example](https://aistudio.baidu.com/aistudio/projectdetail/4094475); you can try it out on AI Studio.
- 2022.5.20 Released [PP-HGNet](./docs/zh_CN/models/PP-HGNet.md) and [PP-LCNet v2](./docs/zh_CN/models/PP-LCNetV2.md)
- 2022.4.21 Added [code](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files) for the CVPR 2022 oral paper [MixFormer](https://arxiv.org/pdf/2204.02557.pdf)
- 2022.1.27 Fully upgraded the documentation; added the [PaddleServing C++ pipeline deployment](./deploy/paddleserving) and an [18M image recognition Android deployment demo](./deploy/lite_shitu)
- 2021.11.1 Released the [PP-ShiTu technical report](https://arxiv.org/pdf/2111.00775.pdf) and added a beverage recognition demo
...
Global:
infer_imgs: "./images/PULC/person/objects365_02035329.jpg"
inference_model_dir: "./models/person_cls_infer"
batch_size: 1
use_gpu: True
enable_mkldnn: False
cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
use_tensorrt: False
gpu_mem: 8000
enable_profile: False
PreProcess:
transform_ops:
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
channel_num: 3
- ToCHWImage:
PostProcess:
main_indicator: ThreshOutput
ThreshOutput:
threshold: 0.9
label_0: nobody
label_1: someone
SavePreLabel:
save_dir: ./pre_label/
Global:
infer_imgs: "./images/Pedestrain_Attr.jpg"
inference_model_dir: "../inference/"
batch_size: 1
use_gpu: True
enable_mkldnn: False
cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
use_tensorrt: False
gpu_mem: 8000
enable_profile: False
PreProcess:
transform_ops:
- ResizeImage:
size: [192, 256]
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
channel_num: 3
- ToCHWImage:
PostProcess:
main_indicator: Attribute
Attribute:
threshold: 0.5 #default threshold
glasses_threshold: 0.3 #threshold only for glasses
hold_threshold: 0.6 #threshold only for hold
\ No newline at end of file
@@ -53,6 +53,34 @@ class PostProcesser(object):
        return rtn
class ThreshOutput(object):
def __init__(self, threshold, label_0="0", label_1="1"):
self.threshold = threshold
self.label_0 = label_0
self.label_1 = label_1
def __call__(self, x, file_names=None):
y = []
for idx, probs in enumerate(x):
score = probs[1]
if score < self.threshold:
result = {
"class_ids": [0],
"scores": [1 - score],
"label_names": [self.label_0]
}
else:
result = {
"class_ids": [1],
"scores": [score],
"label_names": [self.label_1]
}
if file_names is not None:
result["file_name"] = file_names[idx]
y.append(result)
return y
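# Illustrative usage (not part of this file): ThreshOutput maps binary
# softmax outputs to labeled results with a single decision threshold, e.g.
#   post = ThreshOutput(threshold=0.9, label_0="nobody", label_1="someone")
#   post(np.array([[0.97, 0.03]]), file_names=["a.jpg"])
#   -> [{'class_ids': [0], 'scores': [0.97], 'label_names': ['nobody'],
#       'file_name': 'a.jpg'}]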
class Topk(object):
    def __init__(self, topk=1, class_id_map_file=None):
        assert isinstance(topk, (int, ))
@@ -159,3 +187,96 @@ class Binarize(object):
            byte[:, i:i + 1] = np.dot(x[:, i * 8:(i + 1) * 8], self.unit)
        return byte
class Attribute(object):
def __init__(self,
threshold=0.5,
glasses_threshold=0.3,
hold_threshold=0.6):
self.threshold = threshold
self.glasses_threshold = glasses_threshold
self.hold_threshold = hold_threshold
def __call__(self, batch_preds, file_names=None):
# postprocess output of predictor
age_list = ['AgeLess18', 'Age18-60', 'AgeOver60']
direct_list = ['Front', 'Side', 'Back']
bag_list = ['HandBag', 'ShoulderBag', 'Backpack']
upper_list = ['UpperStride', 'UpperLogo', 'UpperPlaid', 'UpperSplice']
lower_list = [
'LowerStripe', 'LowerPattern', 'LongCoat', 'Trousers', 'Shorts',
'Skirt&Dress'
]
batch_res = []
for res in batch_preds:
res = res.tolist()
label_res = []
# gender
gender = 'Female' if res[22] > self.threshold else 'Male'
label_res.append(gender)
# age
age = age_list[np.argmax(res[19:22])]
label_res.append(age)
# direction
direction = direct_list[np.argmax(res[23:])]
label_res.append(direction)
# glasses
glasses = 'Glasses: '
if res[1] > self.glasses_threshold:
glasses += 'True'
else:
glasses += 'False'
label_res.append(glasses)
# hat
hat = 'Hat: '
if res[0] > self.threshold:
hat += 'True'
else:
hat += 'False'
label_res.append(hat)
# hold obj
hold_obj = 'HoldObjectsInFront: '
if res[18] > self.hold_threshold:
hold_obj += 'True'
else:
hold_obj += 'False'
label_res.append(hold_obj)
# bag
bag = bag_list[np.argmax(res[15:18])]
bag_score = res[15 + np.argmax(res[15:18])]
bag_label = bag if bag_score > self.threshold else 'No bag'
label_res.append(bag_label)
# upper
upper_res = res[4:8]
upper_label = 'Upper:'
sleeve = 'LongSleeve' if res[3] > res[2] else 'ShortSleeve'
upper_label += ' {}'.format(sleeve)
for i, r in enumerate(upper_res):
if r > self.threshold:
upper_label += ' {}'.format(upper_list[i])
label_res.append(upper_label)
# lower
lower_res = res[8:14]
lower_label = 'Lower: '
has_lower = False
for i, l in enumerate(lower_res):
if l > self.threshold:
lower_label += ' {}'.format(lower_list[i])
has_lower = True
if not has_lower:
lower_label += ' {}'.format(lower_list[np.argmax(lower_res)])
label_res.append(lower_label)
# shoe
shoe = 'Boots' if res[14] > self.threshold else 'No boots'
label_res.append(shoe)
threshold_list = [0.5] * len(res)
threshold_list[1] = self.glasses_threshold
threshold_list[18] = self.hold_threshold
pred_res = (np.array(res) > np.array(threshold_list)
).astype(np.int8).tolist()
batch_res.append([label_res, pred_res])
return batch_res
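# Illustrative usage (not part of this file): each row of `batch_preds` is
# expected to be a multi-label sigmoid output (26 entries, judging by the
# hard-coded indices above), e.g.
#   post = Attribute(threshold=0.5, glasses_threshold=0.3, hold_threshold=0.6)
#   label_res, pred_res = post(np.random.rand(1, 26))[0]
#   # label_res: human-readable strings such as 'Female', 'Age18-60', ...
#   # pred_res: the 26 attribute scores binarized to 0/1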
@@ -49,10 +49,15 @@ class ClsPredictor(Predictor):
            pid = os.getpid()
            size = config["PreProcess"]["transform_ops"][1]["CropImage"][
                "size"]
            if config["Global"].get("use_int8", False):
                precision = "int8"
            elif config["Global"].get("use_fp16", False):
                precision = "fp16"
            else:
                precision = "fp32"
            self.auto_logger = auto_log.AutoLogger(
                model_name=config["Global"].get("model_name", "cls"),
                model_precision=precision,
                batch_size=config["Global"].get("batch_size", 1),
                data_shape=[3, size, size],
                save_path=config["Global"].get("save_log_path",
@@ -133,13 +138,21 @@ def main(config):
                continue
        batch_results = cls_predictor.predict(batch_imgs)
        for number, result_dict in enumerate(batch_results):
            if "Attribute" in config["PostProcess"]:
                filename = batch_names[number]
                attr_message = result_dict[0]
                pred_res = result_dict[1]
                print("{}:\t attributes: {}, \npredict output: {}".format(
                    filename, attr_message, pred_res))
            else:
                filename = batch_names[number]
                clas_ids = result_dict["class_ids"]
                scores_str = "[{}]".format(", ".join("{:.2f}".format(
                    r) for r in result_dict["scores"]))
                label_names = result_dict["label_names"]
                print(
                    "{}:\tclass id(s): {}, score(s): {}, label_name(s): {}".
                    format(filename, clas_ids, scores_str, label_names))
        batch_imgs = []
        batch_names = []
    if cls_predictor.benchmark:
...
@@ -42,8 +42,22 @@ class Predictor(object):
    def create_paddle_predictor(self, args, inference_model_dir=None):
        if inference_model_dir is None:
            inference_model_dir = args.inference_model_dir
        if "inference_int8.pdiparams" in os.listdir(inference_model_dir):
            params_file = os.path.join(inference_model_dir,
                                       "inference_int8.pdiparams")
            model_file = os.path.join(inference_model_dir,
                                      "inference_int8.pdmodel")
            assert args.get(
                "use_fp16", False
            ) is False, "fp16 mode is not supported for int8 model inference, please set use_fp16 as False during inference."
        else:
            params_file = os.path.join(inference_model_dir,
                                       "inference.pdiparams")
            model_file = os.path.join(inference_model_dir, "inference.pdmodel")
            assert args.get(
                "use_int8", False
            ) is False, "int8 mode is not supported for fp32 model inference, please set use_int8 as False during inference."

        config = Config(model_file, params_file)

        if args.use_gpu:
@@ -63,12 +77,18 @@ class Predictor(object):
        config.disable_glog_info()
        config.switch_ir_optim(args.ir_optim)  # default true
        if args.use_tensorrt:
            precision = Config.Precision.Float32
            if args.get("use_int8", False):
                precision = Config.Precision.Int8
            elif args.get("use_fp16", False):
                precision = Config.Precision.Half
            config.enable_tensorrt_engine(
                precision_mode=precision,
                max_batch_size=args.batch_size,
                workspace_size=1 << 30,
                min_subgraph_size=30,
                use_calib_mode=False)

        config.enable_memory_optim()
        # use zero copy
...
# Building a Human/No-Human Classification Case with PaddleClas

This tutorial shows how to use PaddleClas to quickly build a lightweight, high-accuracy, production-ready human/no-human classification model. Based on data from human-presence scenarios, it combines the lightweight PPLCNet backbone, SSLD pretrained weights, the EDA data augmentation strategy, SKL-UGI knowledge distillation, and the SHAS hyperparameter search strategy to obtain a binary classification model that is accurate, fast, and easy to deploy.
------
## Contents

- [1. Environment Setup](#1)
- [2. Inference for the Human/No-Human Scenario](#2)
  - [2.1 Download the Model](#2.1)
  - [2.2 Model Inference](#2.2)
    - [2.2.1 Predict a Single Image](#2.2.1)
    - [2.2.2 Batch Prediction on a Folder](#2.2.2)
- [3. Training for the Human/No-Human Scenario](#3)
  - [3.1 Data Preparation](#3.1)
  - [3.2 Model Training](#3.2)
    - [3.2.1 Training with Default Hyperparameters](#3.2.1)
      - [3.2.1.1 Training the Lightweight Model](#3.2.1.1)
      - [3.2.1.2 Training the Teacher Model](#3.2.1.2)
      - [3.2.1.3 Distillation Training](#3.2.1.3)
    - [3.2.2 Training with Hyperparameter Search](#3.2.2)
- [4. Model Evaluation and Inference](#4)
  - [4.1 Model Evaluation](#4.1)
  - [4.2 Model Prediction](#4.2)
  - [4.3 Inference with the Exported Model](#4.3)
    - [4.3.1 Export the Inference Model](#4.3.1)
    - [4.3.2 Inference with the Prediction Engine](#4.3.2)
<a name="1"></a>
## 1. Environment Setup

* Installation: first refer to the [Paddle installation guide](../installation/install_paddle.md) and the [PaddleClas installation guide](../installation/install_paddleclas.md) to set up the PaddleClas environment.
<a name="2"></a>
## 2. Inference for the Human/No-Human Scenario
<a name="2.1"></a>
### 2.1 Download the Model

* Enter the `deploy` working directory.
```
cd deploy
```
Download the model for human/no-human classification.
```
mkdir models
cd models
# download the inference model and extract it
wget https://paddleclas.bj.bcebos.com/models/PULC/person_cls_infer.tar && tar -xf person_cls_infer.tar
```
After extraction, the `models` folder should have the following structure:
```
├── person_cls_infer
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info
│ └── inference.pdmodel
```
<a name="2.2"></a>
### 2.2 Model Inference
<a name="2.2.1"></a>
#### 2.2.1 Predict a Single Image

Return to the `deploy` directory:
```
cd ../
```
Run the following command to perform human/no-human classification on the image `./images/PULC/person/objects365_02035329.jpg`.
```shell
# predict with GPU
python3.7 python/predict_cls.py -c configs/PULC/person/inference_person_cls.yaml -o PostProcess.ThreshOutput.threshold=0.9794
# predict with CPU
python3.7 python/predict_cls.py -c configs/PULC/person/inference_person_cls.yaml -o PostProcess.ThreshOutput.threshold=0.9794 -o Global.use_gpu=False
```
The output is as follows.
```
objects365_02035329.jpg: class id(s): [1], score(s): [1.00], label_name(s): ['someone']
```
**Note:** Real-world scenarios usually require the true positive rate (Tpr) subject to the false positive rate (Fpr) staying below a given bound. For this scenario, the threshold achieving the best Tpr at an Fpr of 0.001 on the `val` set is `0.9794`, so `threshold` is set to `0.9794` here. See [Section 3.2](#3.2) for how this threshold is determined.
<a name="2.2.2"></a>
#### 2.2.2 Batch Prediction on a Folder

To predict the images in a folder, either edit the `Global.infer_imgs` field of the configuration file directly, or override it with the `-o` option as below.
```shell
# predict with GPU; to use the CPU instead, append -o Global.use_gpu=False
python3.7 python/predict_cls.py -c configs/PULC/person/inference_person_cls.yaml -o Global.infer_imgs="./images/PULC/person/"
```
The classification results for all images in the folder are printed to the terminal, as shown below.
```
objects365_01780782.jpg: class id(s): [0], score(s): [1.00], label_name(s): ['nobody']
objects365_02035329.jpg: class id(s): [1], score(s): [1.00], label_name(s): ['someone']
```
Here, `someone` means the image contains a person and `nobody` means it does not.
<a name="3"></a>
## 3. Training for the Human/No-Human Scenario
<a name="3.1"></a>
### 3.1 Data Preparation

Enter the PaddleClas directory.
```
cd path_to_PaddleClas
```
Enter the `dataset/` directory, then download and extract the data for the human/no-human scenario.
```shell
cd dataset
wget https://paddleclas.bj.bcebos.com/data/cls_demo/person.tar
tar -xf person.tar
cd ../
```
After the commands above finish, `dataset/` contains a `person` directory with the following data:
```
├── train
│   ├── 000000000009.jpg
│   ├── 000000000025.jpg
...
├── val
│   ├── objects365_01780637.jpg
│   ├── objects365_01780640.jpg
...
├── ImageNet_val
│   ├── ILSVRC2012_val_00000001.JPEG
│   ├── ILSVRC2012_val_00000002.JPEG
...
├── train_list.txt
├── train_list.txt.debug
├── train_list_for_distill.txt
├── val_list.txt
└── val_list.txt.debug
```
Here `train/` and `val/` are the training and validation sets, with label files `train_list.txt` and `val_list.txt` respectively. `train_list.txt.debug` and `val_list.txt.debug` are `debug` label files for the training and validation sets; they are subsets of `train_list.txt` and `val_list.txt` that let you quickly walk through the whole pipeline. `ImageNet_val/` is the ImageNet validation set; mixed with the `train` set, it provides the extra data used by this case's `SKL-UGI knowledge distillation` strategy, with the corresponding label file `train_list_for_distill.txt`.
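Each line of these label files pairs an image path with its class id, in PaddleClas's usual `path label` format. An illustrative excerpt (the label values here are made up):

```
train/000000000009.jpg 1
train/000000000025.jpg 0
```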
* **Note**:
  * All datasets used in this case are open source: the `train` set is a subset of the [MS-COCO](https://cocodataset.org/#overview) training set, the `val` set is a subset of the [Objects365](https://www.objects365.org/overview.html) training set, and `ImageNet_val` is the validation set of [ImageNet](https://www.image-net.org/). The dataset filtering procedure is described in [the human/no-human dataset filtering method]().
<a name="3.2"></a>
### 3.2 Model Training
<a name="3.2.1"></a>
#### 3.2.1 Training with Default Hyperparameters
<a name="3.2.1.1"></a>
##### 3.2.1.1 Training the Lightweight Model with Default Hyperparameters

`ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0.yaml` provides a training configuration for this scenario; launch training with:
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0.yaml
```
The best validation metric lies between 0.94 and 0.95 (the dataset is small, so some fluctuation is expected).

**Notes:**

* The metric used here is Tpr: the true positive rate when the false positive rate (Fpr) is kept below a given bound, one of the common metrics for binary classification in industry. In this case the Fpr bound is 0.001. For more on Fpr and Tpr, see [here](https://baike.baidu.com/item/AUC/19282953).
* During eval, the current best TprAtFpr metric is printed: the current `Fpr`, `Tpr`, and `threshold` values. `Tpr` reflects the recall at the current `Fpr`; the higher, the better the model. `threshold` is the classification threshold at the current best `Fpr` and can be reused for deployment; a sketch of how such a threshold can be computed follows below.
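As a concrete illustration of this metric, here is a minimal sketch (not the PaddleClas implementation; the helper name and interface are made up) of finding the best Tpr and its threshold under an Fpr constraint:

```python
import numpy as np

def best_tpr_at_fpr(scores, labels, max_fpr=0.001):
    """Return (tpr, threshold) with the highest Tpr whose Fpr <= max_fpr.

    scores: predicted probabilities of the positive class ("someone").
    labels: ground-truth 0/1 labels of the validation set.
    """
    scores, labels = np.asarray(scores), np.asarray(labels)
    n_neg = max((labels == 0).sum(), 1)
    n_pos = max((labels == 1).sum(), 1)
    best_tpr, best_th = 0.0, 1.0
    for th in np.unique(scores):          # every candidate threshold
        pred = scores >= th
        fpr = (pred & (labels == 0)).sum() / n_neg
        tpr = (pred & (labels == 1)).sum() / n_pos
        if fpr <= max_fpr and tpr > best_tpr:
            best_tpr, best_th = tpr, th
    return best_tpr, best_th
```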
<a name="3.2.1.2"></a>
##### 3.2.1.2 Training the Teacher Model with Default Hyperparameters

Reusing the hyperparameters in `ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0.yaml`, train the teacher model with:
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0.yaml \
-o Arch.name=ResNet101_vd
```
The best validation metric lies between 0.96 and 0.98; the best teacher weights are saved to `output/ResNet101_vd/best_model.pdparams`.
<a name="3.2.1.3"></a>
##### 3.2.1.3 Distillation Training with Default Hyperparameters

The configuration file `ppcls/configs/PULC/person/Distillation/PPLCNet_x1_0_distillation.yaml` provides the `SKL-UGI knowledge distillation` setup. It uses `ResNet101_vd` as the teacher and `PPLCNet_x1_0` as the student, with the ImageNet validation set as additional unlabeled data. The training script is:
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/person/Distillation/PPLCNet_x1_0_distillation.yaml \
-o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model
```
The best validation metric lies between 0.95 and 0.97; the best weights are saved to `output/DistillationModel/best_model_student.pdparams`.
<a name="3.2.2"></a>
#### 3.2.2 Training with Hyperparameter Search

[Section 3.2.1](#3.2.1) trains with hyperparameters that were already obtained through search; this section describes the search process itself, which is used to find better training hyperparameters.

* Launch the search with:
```shell
python tools/search_strategy.py -c ppcls/configs/StrategySearch/person.yaml
```
`ppcls/configs/StrategySearch/person.yaml` specifies the GPU ids and the search configuration. By default, the search's training logs and models are stored under `output/search_person`, and the final distilled model under `output/search_person/search_res/DistillationModel/best_model_student.pdparams`.

* **Note**:

  * The default configuration in [Section 3.2.1](#3.2.1) has already been obtained through this search, so this step is optional; try it if your own training data differs.
  * On the current dataset, the search takes roughly 10 hours on 4 V100 GPUs. If you lack the machines but want to walk through the search, replace `train_list.txt` and `val_list.txt` in `ppcls/configs/cls_demo/person/PPLCNet/PPLCNet_x1_0_search.yaml` with `train_list.txt.debug` and `val_list.txt.debug`. Replacing the lists only speeds up the run; with so little data, the search results carry no real weight. The search space can also be tuned to your resources: shrink it when resources are limited, enlarge it when they are plentiful.
  * If the hyperparameters found here differ from those in [Section 3.2.1](#3.2.1), the difference mostly comes from fluctuation due to the small training set and can be ignored.
<a name="4"></a>
## 4. Model Evaluation and Inference
<a name="4.1"></a>
### 4.1 Model Evaluation

After training, evaluate the model's metrics with the following command.
```bash
python3 tools/eval.py \
-c ./ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0.yaml \
-o Global.pretrained_model="output/DistillationModel/best_model_student"
```
<a name="4.2"></a>
### 4.2 Model Prediction

After training completes, you can load the trained weights and run prediction. A complete example is provided in `tools/infer.py`; just run:
```bash
python3 tools/infer.py \
    -c ./ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0.yaml \
    -o Infer.infer_imgs=./dataset/person/val/objects365_01780637.jpg \
    -o Global.pretrained_model=output/DistillationModel/best_model_student \
    -o Infer.PostProcess.threshold=0.9794
```
The output is as follows:
```
[{'class_ids': [0], 'scores': [0.9878496769815683], 'label_names': ['nobody'], 'file_name': './dataset/person/val/objects365_01780637.jpg'}]
```
**Note:** The value of `Infer.PostProcess.threshold` should be chosen for the actual scenario; `0.9794` here is the threshold achieving the best Tpr at an Fpr of 0.001 on this scenario's `val` set.
<a name="4.3"></a>
### 4.3 Inference with the Exported Model
<a name="4.3.1"></a>
#### 4.3.1 Export the Inference Model

By exporting an inference model, PaddlePaddle can run prediction with its inference engine. The steps are as follows.

First, convert the trained model:
```bash
python3 tools/export_model.py \
    -c ./ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=output/DistillationModel/best_model_student \
-o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_person
```
After the script finishes, a `PPLCNet_x1_0_person` folder is generated under `deploy/models/`, with the same format as the inference model downloaded in Section 2.2.
<a name="4.3.2"></a>
#### 4.3.2 Inference with the Prediction Engine

Run inference with:
```
python3.7 python/predict_cls.py -c configs/PULC/person/inference_person_cls.yaml -o Global.inference_model_dir="models/PPLCNet_x1_0_person" -o PostProcess.ThreshOutput.threshold=0.9794
```
**Notes:**

- `PostProcess.ThreshOutput.threshold` here is determined by the best `threshold` found during eval.
- For more details on inference, see [Section 2.2](#2.2).
# PP-HGNet Series
---
## Contents

* [1. Overview](#1)
* [2. Architecture](#2)
* [3. Experimental Results](#3)
<a name='1'></a>
## 1. Overview

PP-HGNet (High Performance GPU Net) is a high-performance backbone network developed by Baidu PaddlePaddle's vision team for GPU platforms. Built on VOVNet, it uses a learnable downsampling layer (LDS layer) and absorbs the strengths of models such as ResNet_vd and PP-LCNet. On GPU, it reaches higher accuracy than other SOTA models at the same speed: 3.8 percentage points above ResNet34-D, 2.4 above ResNet50-D, and 4.7 above ResNet50-D when Baidu's SSLD distillation is used. At the same accuracy, its inference speed also far exceeds mainstream Vision Transformers.
<a name='2'></a>
## 2. Architecture

Targeting GPU devices, the PP-HGNet authors analyzed and summarized today's GPU-friendly networks and used standard 3x3 convolutions (the highest compute density) as much as possible. Taking VOVNet as the base model, they fused the main improvements that benefit GPU inference, yielding a backbone that, at equal speed, substantially outperforms other CNN and Vision Transformer models.

The overall structure of the PP-HGNet backbone is as follows:
![](../../images/PP-HGNet/PP-HGNet.png)
PP-HGNet consists of multiple HG blocks; the details of an HG block are shown below:
![](../../images/PP-HGNet/PP-HGNet-block.png)
<a name='3'></a>
## 3. Experimental Results

PP-HGNet is compared with other models below; the test machine is an NVIDIA® Tesla® V100 with the TensorRT engine enabled and FP32 precision. At the same speed, PP-HGNet surpasses all other SOTA CNN models, and compared with SwinTransformer it is more than twice as fast while being more accurate.
| Model | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) |
|-------|---------------|---------------|-------------|
| ResNet34 | 74.57 | 92.14 | 1.97 |
| ResNet34_vd | 75.98 | 92.98 | 2.00 |
| EfficientNetB0 | 77.38 | 93.31 | 1.96 |
| <b>PPHGNet_tiny</b> | <b>79.83</b> | <b>95.04</b> | <b>1.77</b> |
| <b>PPHGNet_tiny_ssld</b> | <b>81.95</b> | <b>96.12</b> | <b>1.77</b> |
| ResNet50 | 76.50 | 93.00 | 2.54 |
| ResNet50_vd | 79.12 | 94.44 | 2.60 |
| ResNet50_rsb | 80.40 | | 2.54 |
| EfficientNetB1 | 79.15 | 94.41 | 2.88 |
| SwinTransformer_tiny | 81.2 | 95.5 | 6.59 |
| <b>PPHGNet_small</b> | <b>81.51</b> | <b>95.82</b> | <b>2.52</b> |
| <b>PPHGNet_small_ssld</b> | <b>83.82</b> | <b>96.81</b> | <b>2.52</b> |
More about PP-HGNet and its performance on downstream tasks will be published soon.
# PP-LCNetV2
---
## 1. Overview

The impact of the backbone on downstream computer vision tasks is self-evident: it largely determines not only downstream accuracy but also model efficiency. Most existing backbones, however, are not efficient enough in real applications, and backbones optimized for Intel CPU platforms are especially scarce. We tested the mainstream lightweight models and found their efficiency on Intel CPUs unsatisfactory, while Intel CPU platforms still cover a large share of industrial scenarios. We therefore proposed the PP-LCNet family of models; PP-LCNetV2 improves upon [PP-LCNetV1](./PP-LCNet.md).

## 2. Design Details
![](../../images/PP-LCNetV2/net.png)
The overall network structure of PP-LCNetV2 is shown above. PP-LCNetV2 is optimized from PP-LCNetV1, mainly by combining depthwise convolutions with different kernel sizes via a re-parameterization strategy, and by optimizing the pointwise convolutions, shortcuts, and so on.

### 2.1 Re-parameterization Strategy

The kernel size determines the receptive field of a convolutional layer, and combining kernels of different sizes captures features at different scales. PP-LCNetV2 therefore combines depthwise (DW) convolutions with kernel sizes 5, 3, and 1 within the same layer in Stage3 and Stage4. To avoid hurting efficiency, the re-parameterization (Rep) strategy is used to fuse the same-layer DW convolutions, as shown below.
![](../../images/PP-LCNetV2/rep.png)
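To make the fusion concrete, here is a minimal numpy sketch of the kernel merging (assuming stride-1 branches and ignoring the per-branch BN folding that a full implementation also performs):

```python
import numpy as np

def fuse_dw_kernels(k5, k3, k1):
    """Zero-pad the 3x3 and 1x1 depthwise kernels to 5x5 and sum, collapsing
    three parallel DW branches into a single 5x5 DW convolution."""
    k3 = np.pad(k3, [(0, 0), (0, 0), (1, 1), (1, 1)])  # 3x3 -> 5x5
    k1 = np.pad(k1, [(0, 0), (0, 0), (2, 2), (2, 2)])  # 1x1 -> 5x5
    return k5 + k3 + k1

# depthwise kernels have shape (channels, 1, k, k)
fused = fuse_dw_kernels(np.ones((8, 1, 5, 5)),
                        np.ones((8, 1, 3, 3)),
                        np.ones((8, 1, 1, 1)))
```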
### 2.2 PW Convolution

A depthwise separable convolution usually consists of one DW convolution followed by one pointwise (PW) convolution, replacing a standard convolution. To give it stronger fitting capacity, we tried stacking two PW convolutions; to keep efficiency unchanged, the first compresses the feature map along the channel dimension and the second expands the channels back, as shown below. Experiments show this strategy significantly improves accuracy; to balance its efficiency cost, PP-LCNetV2 applies it only in Stage4 and Stage5.
![](../../images/PP-LCNetV2/split_pw.png)
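A minimal sketch of this squeeze-then-expand pointwise pair (illustrative shapes only, not the repository's block definition):

```python
import paddle.nn as nn

def split_pw(in_channels, split_ratio=0.5):
    """Two stacked 1x1 convolutions: compress channels, then restore them."""
    mid = int(in_channels * split_ratio)
    return nn.Sequential(
        nn.Conv2D(in_channels, mid, kernel_size=1),  # squeeze
        nn.Conv2D(mid, in_channels, kernel_size=1))  # expand back
```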
### 2.3 Shortcut
Residual structures have been widely adopted since their introduction, but in lightweight CNNs the element-wise addition they require costs speed. In PP-LCNetV2 we evaluated residual connections stage by stage and found that they do not always improve accuracy, so PP-LCNetV2 uses a shortcut only in the last stage, added inside the block as shown below.
![](../../images/PP-LCNetV2/shortcut.png)
### 2.4 Activation Function

ReLU and Hard-Swish are the most common activation functions in today's lightweight CNNs. Hard-Swish usually gives better accuracy, but we found that some inference platforms optimize it poorly, so for generality PP-LCNetV2 defaults to ReLU; our tests show that ReLU has little impact on the accuracy of larger models.
### 2.5 SE Module

Although the SE module markedly improves accuracy, its cost in speed is not negligible. In PP-LCNetV1 we found that placing SE modules toward the middle and rear of the network maximizes the benefit; while optimizing PP-LCNetV2 we ran further stage-by-stage experiments on SE placement and found that Stage3 gives the best trade-off.
## 3. Experimental Results

Without extra training data, PPLCNetV2_base achieves over 77% Top-1 accuracy on ImageNet classification while keeping inference time under 4.4 ms on Intel CPUs, as shown in the table below. Inference times were measured on an Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz with the OpenVINO inference platform.
| Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) |
|-------|-----------|----------|---------------|---------------|-------------|
| MobileNetV3_Large_x1_25 | 7.4 | 714 | 76.4 | 93.00 | 5.19 |
| PPLCNet_x2_5 | 9 | 906 | 76.60 | 93.00 | 7.25 |
| <b>PPLCNetV2_base</b> | <b>6.6</b> | <b>604</b> | <b>77.04</b> | <b>93.27</b> | <b>4.32</b> |

More information about the PP-LCNetV2 model is coming soon.
## Person Entry/Exit Management

In recent years, AI vision has played a pivotal role in the intelligent upgrade of industries such as security and manufacturing. Access control is a key scenario across industries with urgent application demand: home burglary prevention, server-room management, hazard alerts in scenic areas, and similar settings all require timely detection of unauthorized targets (people, vehicles, or other objects) entering restricted areas. With deep-learning vision, intrusions can be recognized promptly and accurately and alerts raised, effectively protecting lives and property. Compared with traditional manual monitoring, it not only provides 24/7 all-around protection but also greatly cuts management costs and frees up labor.

In real industrial settings, however, high-accuracy entry/exit recognition is not easy; practical scenes present all kinds of problems:

**Camera images can be occluded by buildings, machinery, vehicles, and the like**

**Weather varies widely; the system must cope with day, night, fog, rain, and more**

To address these scenarios, this PaddlePaddle industry practice release provides an entry/exit control example for key areas, offering a reusable end-to-end workflow from data preparation and technical solution through model training and optimization to deployment. It effectively solves image classification in complex outdoor environments with varying lighting and weather, greatly reduces annotation and compute costs, and applies to factory inspection, home security, scenic-area management, and other industrial uses.
![result](./imgs/someone.gif)
**Note**: to run the code online on AI Studio, see [Person Entry/Exit Management](https://aistudio.baidu.com/aistudio/projectdetail/4094475).
@@ -32,14 +32,19 @@ from ppcls.arch.distill.afd_attention import LinearTransformStudent, LinearTrans

__all__ = ["build_model", "RecModel", "DistillationModel", "AttentionModel"]


def build_model(config, mode="train"):
    arch_config = copy.deepcopy(config["Arch"])
    model_type = arch_config.pop("name")
    use_sync_bn = arch_config.pop("use_sync_bn", False)
    mod = importlib.import_module(__name__)
    arch = getattr(mod, model_type)(**arch_config)
    if use_sync_bn:
        arch = nn.SyncBatchNorm.convert_sync_batchnorm(arch)

    if isinstance(arch, TheseusLayer):
        prune_model(config, arch)
        quantize_model(config, arch, mode)

    return arch
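# Illustrative only: with the new switches above, a config such as
#   {"Arch": {"name": "PPLCNet_x1_0", "use_sync_bn": True}}
# builds the backbone and converts its BatchNorm layers to SyncBatchNorm
# for multi-GPU training, while `mode` is forwarded on to quantize_model.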
@@ -50,6 +55,7 @@ def apply_to_static(config, model):
    specs = None
    if 'image_shape' in config['Global']:
        specs = [InputSpec([None] + config['Global']['image_shape'])]
        specs[0].stop_gradient = True
    model = to_static(model, input_spec=specs)
    logger.info("Successfully to apply @to_static with specs: {}".format(
        specs))
...
@@ -24,6 +24,7 @@ from ppcls.arch.backbone.legendary_models.hrnet import HRNet_W18_C, HRNet_W30_C,
from ppcls.arch.backbone.legendary_models.pp_lcnet import PPLCNet_x0_25, PPLCNet_x0_35, PPLCNet_x0_5, PPLCNet_x0_75, PPLCNet_x1_0, PPLCNet_x1_5, PPLCNet_x2_0, PPLCNet_x2_5
from ppcls.arch.backbone.legendary_models.pp_lcnet_v2 import PPLCNetV2_base
from ppcls.arch.backbone.legendary_models.esnet import ESNet_x0_25, ESNet_x0_5, ESNet_x0_75, ESNet_x1_0
from ppcls.arch.backbone.legendary_models.pp_hgnet import PPHGNet_tiny, PPHGNet_small, PPHGNet_base

from ppcls.arch.backbone.model_zoo.resnet_vc import ResNet50_vc
from ppcls.arch.backbone.model_zoo.resnext import ResNeXt50_32x4d, ResNeXt50_64x4d, ResNeXt101_32x4d, ResNeXt101_64x4d, ResNeXt152_32x4d, ResNeXt152_64x4d
@@ -51,7 +52,7 @@ from ppcls.arch.backbone.model_zoo.darknet import DarkNet53
from ppcls.arch.backbone.model_zoo.regnet import RegNetX_200MF, RegNetX_4GF, RegNetX_32GF, RegNetY_200MF, RegNetY_4GF, RegNetY_32GF
from ppcls.arch.backbone.model_zoo.vision_transformer import ViT_small_patch16_224, ViT_base_patch16_224, ViT_base_patch16_384, ViT_base_patch32_384, ViT_large_patch16_224, ViT_large_patch16_384, ViT_large_patch32_384
from ppcls.arch.backbone.model_zoo.distilled_vision_transformer import DeiT_tiny_patch16_224, DeiT_small_patch16_224, DeiT_base_patch16_224, DeiT_tiny_distilled_patch16_224, DeiT_small_distilled_patch16_224, DeiT_base_distilled_patch16_224, DeiT_base_patch16_384, DeiT_base_distilled_patch16_384
from ppcls.arch.backbone.legendary_models.swin_transformer import SwinTransformer_tiny_patch4_window7_224, SwinTransformer_small_patch4_window7_224, SwinTransformer_base_patch4_window7_224, SwinTransformer_base_patch4_window12_384, SwinTransformer_large_patch4_window7_224, SwinTransformer_large_patch4_window12_384
from ppcls.arch.backbone.model_zoo.cswin_transformer import CSWinTransformer_tiny_224, CSWinTransformer_small_224, CSWinTransformer_base_224, CSWinTransformer_large_224, CSWinTransformer_base_384, CSWinTransformer_large_384
from ppcls.arch.backbone.model_zoo.mixnet import MixNet_S, MixNet_M, MixNet_L
from ppcls.arch.backbone.model_zoo.rexnet import ReXNet_1_0, ReXNet_1_3, ReXNet_1_5, ReXNet_2_0, ReXNet_3_0
@@ -69,6 +70,7 @@ from ppcls.arch.backbone.model_zoo.van import VAN_tiny
from ppcls.arch.backbone.variant_models.resnet_variant import ResNet50_last_stage_stride1
from ppcls.arch.backbone.variant_models.vgg_variant import VGG19Sigmoid
from ppcls.arch.backbone.variant_models.pp_lcnet_variant import PPLCNet_x2_5_Tanh
from ppcls.arch.backbone.model_zoo.adaface_ir_net import AdaFace_IR_18, AdaFace_IR_34, AdaFace_IR_50, AdaFace_IR_101, AdaFace_IR_152, AdaFace_IR_SE_50, AdaFace_IR_SE_101, AdaFace_IR_SE_152, AdaFace_IR_SE_200

# help whl get all the models' api (class type) and components' api (func type)
...
# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.initializer import KaimingNormal, Constant
from paddle.nn import Conv2D, BatchNorm2D, ReLU, AdaptiveAvgPool2D, MaxPool2D
from paddle.regularizer import L2Decay
from paddle import ParamAttr
from ppcls.arch.backbone.base.theseus_layer import TheseusLayer
from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
MODEL_URLS = {
"PPHGNet_tiny":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_tiny_pretrained.pdparams",
"PPHGNet_small":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPHGNet_small_pretrained.pdparams"
}
__all__ = list(MODEL_URLS.keys())
kaiming_normal_ = KaimingNormal()
zeros_ = Constant(value=0.)
ones_ = Constant(value=1.)
class ConvBNAct(TheseusLayer):
def __init__(self,
in_channels,
out_channels,
kernel_size,
stride,
groups=1,
use_act=True):
super().__init__()
self.use_act = use_act
self.conv = Conv2D(
in_channels,
out_channels,
kernel_size,
stride,
padding=(kernel_size - 1) // 2,
groups=groups,
bias_attr=False)
self.bn = BatchNorm2D(
out_channels,
weight_attr=ParamAttr(regularizer=L2Decay(0.0)),
bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
if self.use_act:
self.act = ReLU()
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
if self.use_act:
x = self.act(x)
return x
class ESEModule(TheseusLayer):
def __init__(self, channels):
super().__init__()
self.avg_pool = AdaptiveAvgPool2D(1)
self.conv = Conv2D(
in_channels=channels,
out_channels=channels,
kernel_size=1,
stride=1,
padding=0)
self.sigmoid = nn.Sigmoid()
def forward(self, x):
identity = x
x = self.avg_pool(x)
x = self.conv(x)
x = self.sigmoid(x)
return paddle.multiply(x=identity, y=x)
class HG_Block(TheseusLayer):
def __init__(
self,
in_channels,
mid_channels,
out_channels,
layer_num,
identity=False, ):
super().__init__()
self.identity = identity
self.layers = nn.LayerList()
self.layers.append(
ConvBNAct(
in_channels=in_channels,
out_channels=mid_channels,
kernel_size=3,
stride=1))
for _ in range(layer_num - 1):
self.layers.append(
ConvBNAct(
in_channels=mid_channels,
out_channels=mid_channels,
kernel_size=3,
stride=1))
# feature aggregation
total_channels = in_channels + layer_num * mid_channels
self.aggregation_conv = ConvBNAct(
in_channels=total_channels,
out_channels=out_channels,
kernel_size=1,
stride=1)
self.att = ESEModule(out_channels)
def forward(self, x):
identity = x
output = []
output.append(x)
for layer in self.layers:
x = layer(x)
output.append(x)
x = paddle.concat(output, axis=1)
x = self.aggregation_conv(x)
x = self.att(x)
if self.identity:
x += identity
return x
class HG_Stage(TheseusLayer):
def __init__(self,
in_channels,
mid_channels,
out_channels,
block_num,
layer_num,
downsample=True):
super().__init__()
self.downsample = downsample
if downsample:
self.downsample = ConvBNAct(
in_channels=in_channels,
out_channels=in_channels,
kernel_size=3,
stride=2,
groups=in_channels,
use_act=False)
blocks_list = []
blocks_list.append(
HG_Block(
in_channels,
mid_channels,
out_channels,
layer_num,
identity=False))
for _ in range(block_num - 1):
blocks_list.append(
HG_Block(
out_channels,
mid_channels,
out_channels,
layer_num,
identity=True))
self.blocks = nn.Sequential(*blocks_list)
def forward(self, x):
if self.downsample:
x = self.downsample(x)
x = self.blocks(x)
return x
class PPHGNet(TheseusLayer):
"""
PPHGNet
Args:
stem_channels: list. Stem channel list of PPHGNet.
stage_config: dict. The configuration of each stage of PPHGNet. such as the number of channels, stride, etc.
layer_num: int. Number of layers of HG_Block.
use_last_conv: boolean. Whether to use a 1x1 convolutional layer before the classification layer.
class_expand: int=2048. Number of channels for the last 1x1 convolutional layer.
dropout_prob: float. Parameters of dropout, 0.0 means dropout is not used.
class_num: int=1000. The number of classes.
Returns:
model: nn.Layer. Specific PPHGNet model depends on args.
"""
def __init__(self,
stem_channels,
stage_config,
layer_num,
use_last_conv=True,
class_expand=2048,
dropout_prob=0.0,
class_num=1000):
super().__init__()
self.use_last_conv = use_last_conv
self.class_expand = class_expand
# stem
stem_channels.insert(0, 3)
self.stem = nn.Sequential(* [
ConvBNAct(
in_channels=stem_channels[i],
out_channels=stem_channels[i + 1],
kernel_size=3,
stride=2 if i == 0 else 1) for i in range(
len(stem_channels) - 1)
])
self.pool = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
# stages
self.stages = nn.LayerList()
for k in stage_config:
in_channels, mid_channels, out_channels, block_num, downsample = stage_config[
k]
self.stages.append(
HG_Stage(in_channels, mid_channels, out_channels, block_num,
layer_num, downsample))
self.avg_pool = AdaptiveAvgPool2D(1)
if self.use_last_conv:
self.last_conv = Conv2D(
in_channels=out_channels,
out_channels=self.class_expand,
kernel_size=1,
stride=1,
padding=0,
bias_attr=False)
self.act = nn.ReLU()
self.dropout = nn.Dropout(
p=dropout_prob, mode="downscale_in_infer")
self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
self.fc = nn.Linear(self.class_expand
if self.use_last_conv else out_channels, class_num)
self._init_weights()
def _init_weights(self):
for m in self.sublayers():
if isinstance(m, nn.Conv2D):
kaiming_normal_(m.weight)
elif isinstance(m, (nn.BatchNorm2D)):
ones_(m.weight)
zeros_(m.bias)
elif isinstance(m, nn.Linear):
zeros_(m.bias)
def forward(self, x):
x = self.stem(x)
x = self.pool(x)
for stage in self.stages:
x = stage(x)
x = self.avg_pool(x)
if self.use_last_conv:
x = self.last_conv(x)
x = self.act(x)
x = self.dropout(x)
x = self.flatten(x)
x = self.fc(x)
return x
def _load_pretrained(pretrained, model, model_url, use_ssld):
if pretrained is False:
pass
elif pretrained is True:
load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
elif isinstance(pretrained, str):
load_dygraph_pretrain(model, pretrained)
else:
raise RuntimeError(
"pretrained type is not available. Please use `string` or `boolean` type."
)
def PPHGNet_tiny(pretrained=False, use_ssld=False, **kwargs):
"""
PPHGNet_tiny
Args:
pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
If str, means the path of the pretrained model.
use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPHGNet_tiny` model depends on args.
"""
stage_config = {
# in_channels, mid_channels, out_channels, blocks, downsample
"stage1": [96, 96, 224, 1, False],
"stage2": [224, 128, 448, 1, True],
"stage3": [448, 160, 512, 2, True],
"stage4": [512, 192, 768, 1, True],
}
model = PPHGNet(
stem_channels=[48, 48, 96],
stage_config=stage_config,
layer_num=5,
**kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPHGNet_tiny"], use_ssld)
return model
def PPHGNet_small(pretrained=False, use_ssld=False, **kwargs):
"""
PPHGNet_small
Args:
pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
If str, means the path of the pretrained model.
use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPHGNet_small` model depends on args.
"""
stage_config = {
# in_channels, mid_channels, out_channels, blocks, downsample
"stage1": [128, 128, 256, 1, False],
"stage2": [256, 160, 512, 1, True],
"stage3": [512, 192, 768, 2, True],
"stage4": [768, 224, 1024, 1, True],
}
model = PPHGNet(
stem_channels=[64, 64, 128],
stage_config=stage_config,
layer_num=6,
**kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPHGNet_small"], use_ssld)
return model
def PPHGNet_base(pretrained=False, use_ssld=False, **kwargs):
"""
PPHGNet_base
Args:
pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
If str, means the path of the pretrained model.
use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPHGNet_base` model depends on args.
"""
stage_config = {
# in_channels, mid_channels, out_channels, blocks, downsample
"stage1": [160, 192, 320, 1, False],
"stage2": [320, 224, 640, 2, True],
"stage3": [640, 256, 960, 3, True],
"stage4": [960, 288, 1280, 2, True],
}
model = PPHGNet(
stem_channels=[96, 96, 160],
stage_config=stage_config,
layer_num=7,
dropout_prob=0.2,
**kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPHGNet_base"], use_ssld)
return model
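# Illustrative usage (not part of this file):
#   import paddle
#   model = PPHGNet_tiny(pretrained=False)
#   logits = model(paddle.rand([1, 3, 224, 224]))  # shape [1, 1000]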
@@ -132,6 +132,7 @@ class DepthwiseSeparable(TheseusLayer):
                lr_mult=lr_mult)
        if use_se:
            self.se = SEModule(num_channels, lr_mult=lr_mult)

        self.pw_conv = ConvBNLayer(
            num_channels=num_channels,
            filter_size=1,
...
@@ -188,7 +188,7 @@ class RepDepthwiseSeparable(TheseusLayer):
    def forward(self, x):
        if self.use_rep:
            input_x = x
            if self.is_repped:
                x = self.act(self.dw_conv(x))
            else:
                y = self.dw_conv_list[0](x)
@@ -209,14 +209,12 @@ class RepDepthwiseSeparable(TheseusLayer):
            x = x + input_x
        return x

    def rep(self):
        if self.use_rep:
            self.is_repped = True
            kernel, bias = self._get_equivalent_kernel_bias()
            self.dw_conv.weight.set_value(kernel)
            self.dw_conv.bias.set_value(bias)

    def _get_equivalent_kernel_bias(self):
        kernel_sum = 0
...
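# Note (illustrative): re-parameterization is now decoupled from eval();
# it must be triggered explicitly before inference or export, e.g.
#   for layer in model.sublayers():
#       if hasattr(layer, "rep"):
#           layer.rep()  # fold the parallel DW branches into one kernel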
@@ -20,7 +20,7 @@ import numpy as np
import paddle
from paddle import ParamAttr
import paddle.nn as nn
from paddle.nn import Conv2D, BatchNorm, Linear, BatchNorm2D
from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
from paddle.nn.initializer import Uniform
from paddle.regularizer import L2Decay
@@ -395,7 +395,10 @@ def _load_pretrained(pretrained, model, model_url, use_ssld):
    elif pretrained is True:
        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
    elif isinstance(pretrained, str):
        if 'http' in pretrained:
            load_dygraph_pretrain_from_url(model, pretrained, use_ssld=False)
        else:
            load_dygraph_pretrain(model, pretrained)
    else:
        raise RuntimeError(
            "pretrained type is not available. Please use `string` or `boolean` type."
...
@@ -21,8 +21,8 @@ import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.initializer import TruncatedNormal, Constant

from ppcls.arch.backbone.base.theseus_layer import TheseusLayer
from ppcls.arch.backbone.model_zoo.vision_transformer import trunc_normal_, zeros_, ones_, to_2tuple, DropPath, Identity
from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url

MODEL_URLS = {
@@ -589,7 +589,7 @@ class PatchEmbed(nn.Layer):
        return flops

class SwinTransformer(TheseusLayer):
    """ Swin Transformer
    A PaddlePaddle impl of : `Swin Transformer: Hierarchical Vision Transformer using Shifted Windows` -
        https://arxiv.org/pdf/2103.14030
...
# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# this code is based on AdaFace(https://github.com/mk-minchul/AdaFace)
from collections import namedtuple
import paddle
import paddle.nn as nn
from paddle.nn import Dropout
from paddle.nn import MaxPool2D
from paddle.nn import Sequential
from paddle.nn import Conv2D, Linear
from paddle.nn import BatchNorm1D, BatchNorm2D
from paddle.nn import ReLU, Sigmoid
from paddle.nn import Layer
from paddle.nn import PReLU
# from ppcls.arch.backbone.legendary_models.resnet import _load_pretrained
class Flatten(Layer):
""" Flat tensor
"""
def forward(self, input):
return paddle.reshape(input, [input.shape[0], -1])
class LinearBlock(Layer):
""" Convolution block without no-linear activation layer
"""
def __init__(self,
in_c,
out_c,
kernel=(1, 1),
stride=(1, 1),
padding=(0, 0),
groups=1):
super(LinearBlock, self).__init__()
self.conv = Conv2D(
in_c,
out_c,
kernel,
stride,
padding,
groups=groups,
weight_attr=nn.initializer.KaimingNormal(),
bias_attr=None)
weight_attr = paddle.ParamAttr(
regularizer=None, initializer=nn.initializer.Constant(value=1.0))
bias_attr = paddle.ParamAttr(
regularizer=None, initializer=nn.initializer.Constant(value=0.0))
self.bn = BatchNorm2D(
out_c, weight_attr=weight_attr, bias_attr=bias_attr)
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
return x
class GNAP(Layer):
""" Global Norm-Aware Pooling block
"""
def __init__(self, in_c):
super(GNAP, self).__init__()
self.bn1 = BatchNorm2D(in_c, weight_attr=False, bias_attr=False)
self.pool = nn.AdaptiveAvgPool2D((1, 1))
self.bn2 = BatchNorm1D(in_c, weight_attr=False, bias_attr=False)
def forward(self, x):
x = self.bn1(x)
x_norm = paddle.norm(x, 2, 1, True)
x_norm_mean = paddle.mean(x_norm)
weight = x_norm_mean / x_norm
x = x * weight
x = self.pool(x)
        # reshape instead of the torch-style .view(), which Paddle tensors lack
        x = paddle.reshape(x, [x.shape[0], -1])
feature = self.bn2(x)
return feature
class GDC(Layer):
""" Global Depthwise Convolution block
"""
def __init__(self, in_c, embedding_size):
super(GDC, self).__init__()
self.conv_6_dw = LinearBlock(
in_c,
in_c,
groups=in_c,
kernel=(7, 7),
stride=(1, 1),
padding=(0, 0))
self.conv_6_flatten = Flatten()
self.linear = Linear(
in_c,
embedding_size,
weight_attr=nn.initializer.KaimingNormal(),
bias_attr=False)
self.bn = BatchNorm1D(
embedding_size, weight_attr=False, bias_attr=False)
def forward(self, x):
x = self.conv_6_dw(x)
x = self.conv_6_flatten(x)
x = self.linear(x)
x = self.bn(x)
return x
class SELayer(Layer):
""" SE block
"""
def __init__(self, channels, reduction):
super(SELayer, self).__init__()
self.avg_pool = nn.AdaptiveAvgPool2D(1)
weight_attr = paddle.ParamAttr(
initializer=paddle.nn.initializer.XavierUniform())
self.fc1 = Conv2D(
channels,
channels // reduction,
kernel_size=1,
padding=0,
weight_attr=weight_attr,
bias_attr=False)
self.relu = ReLU()
self.fc2 = Conv2D(
channels // reduction,
channels,
kernel_size=1,
padding=0,
weight_attr=nn.initializer.KaimingNormal(),
bias_attr=False)
self.sigmoid = Sigmoid()
def forward(self, x):
module_input = x
x = self.avg_pool(x)
x = self.fc1(x)
x = self.relu(x)
x = self.fc2(x)
x = self.sigmoid(x)
return module_input * x
class BasicBlockIR(Layer):
""" BasicBlock for IRNet
"""
def __init__(self, in_channel, depth, stride):
super(BasicBlockIR, self).__init__()
weight_attr = paddle.ParamAttr(
regularizer=None, initializer=nn.initializer.Constant(value=1.0))
bias_attr = paddle.ParamAttr(
regularizer=None, initializer=nn.initializer.Constant(value=0.0))
if in_channel == depth:
self.shortcut_layer = MaxPool2D(1, stride)
else:
self.shortcut_layer = Sequential(
Conv2D(
in_channel,
depth, (1, 1),
stride,
weight_attr=nn.initializer.KaimingNormal(),
bias_attr=False),
BatchNorm2D(
depth, weight_attr=weight_attr, bias_attr=bias_attr))
self.res_layer = Sequential(
BatchNorm2D(
in_channel, weight_attr=weight_attr, bias_attr=bias_attr),
Conv2D(
in_channel,
depth, (3, 3), (1, 1),
1,
weight_attr=nn.initializer.KaimingNormal(),
bias_attr=False),
BatchNorm2D(
depth, weight_attr=weight_attr, bias_attr=bias_attr),
PReLU(depth),
Conv2D(
depth,
depth, (3, 3),
stride,
1,
weight_attr=nn.initializer.KaimingNormal(),
bias_attr=False),
BatchNorm2D(
depth, weight_attr=weight_attr, bias_attr=bias_attr))
def forward(self, x):
shortcut = self.shortcut_layer(x)
res = self.res_layer(x)
return res + shortcut
class BottleneckIR(Layer):
""" BasicBlock with bottleneck for IRNet
"""
def __init__(self, in_channel, depth, stride):
super(BottleneckIR, self).__init__()
reduction_channel = depth // 4
weight_attr = paddle.ParamAttr(
regularizer=None, initializer=nn.initializer.Constant(value=1.0))
bias_attr = paddle.ParamAttr(
regularizer=None, initializer=nn.initializer.Constant(value=0.0))
if in_channel == depth:
self.shortcut_layer = MaxPool2D(1, stride)
else:
self.shortcut_layer = Sequential(
Conv2D(
in_channel,
depth, (1, 1),
stride,
weight_attr=nn.initializer.KaimingNormal(),
bias_attr=False),
BatchNorm2D(
depth, weight_attr=weight_attr, bias_attr=bias_attr))
self.res_layer = Sequential(
BatchNorm2D(
in_channel, weight_attr=weight_attr, bias_attr=bias_attr),
Conv2D(
in_channel,
reduction_channel, (1, 1), (1, 1),
0,
weight_attr=nn.initializer.KaimingNormal(),
bias_attr=False),
BatchNorm2D(
reduction_channel,
weight_attr=weight_attr,
bias_attr=bias_attr),
PReLU(reduction_channel),
Conv2D(
reduction_channel,
reduction_channel, (3, 3), (1, 1),
1,
weight_attr=nn.initializer.KaimingNormal(),
bias_attr=False),
BatchNorm2D(
reduction_channel,
weight_attr=weight_attr,
bias_attr=bias_attr),
PReLU(reduction_channel),
Conv2D(
reduction_channel,
depth, (1, 1),
stride,
0,
weight_attr=nn.initializer.KaimingNormal(),
bias_attr=False),
BatchNorm2D(
depth, weight_attr=weight_attr, bias_attr=bias_attr))
def forward(self, x):
shortcut = self.shortcut_layer(x)
res = self.res_layer(x)
return res + shortcut
class BasicBlockIRSE(BasicBlockIR):
def __init__(self, in_channel, depth, stride):
super(BasicBlockIRSE, self).__init__(in_channel, depth, stride)
self.res_layer.add_sublayer("se_block", SELayer(depth, 16))
class BottleneckIRSE(BottleneckIR):
def __init__(self, in_channel, depth, stride):
super(BottleneckIRSE, self).__init__(in_channel, depth, stride)
self.res_layer.add_sublayer("se_block", SELayer(depth, 16))
class Bottleneck(namedtuple('Block', ['in_channel', 'depth', 'stride'])):
'''A named tuple describing a ResNet block.'''
def get_block(in_channel, depth, num_units, stride=2):
return [Bottleneck(in_channel, depth, stride)] +\
[Bottleneck(depth, depth, 1) for i in range(num_units - 1)]
def get_blocks(num_layers):
if num_layers == 18:
blocks = [
get_block(
in_channel=64, depth=64, num_units=2), get_block(
in_channel=64, depth=128, num_units=2), get_block(
in_channel=128, depth=256, num_units=2), get_block(
in_channel=256, depth=512, num_units=2)
]
elif num_layers == 34:
blocks = [
get_block(
in_channel=64, depth=64, num_units=3), get_block(
in_channel=64, depth=128, num_units=4), get_block(
in_channel=128, depth=256, num_units=6), get_block(
in_channel=256, depth=512, num_units=3)
]
elif num_layers == 50:
blocks = [
get_block(
in_channel=64, depth=64, num_units=3), get_block(
in_channel=64, depth=128, num_units=4), get_block(
in_channel=128, depth=256, num_units=14), get_block(
in_channel=256, depth=512, num_units=3)
]
elif num_layers == 100:
blocks = [
get_block(
in_channel=64, depth=64, num_units=3), get_block(
in_channel=64, depth=128, num_units=13), get_block(
in_channel=128, depth=256, num_units=30), get_block(
in_channel=256, depth=512, num_units=3)
]
elif num_layers == 152:
blocks = [
get_block(
in_channel=64, depth=256, num_units=3), get_block(
in_channel=256, depth=512, num_units=8), get_block(
in_channel=512, depth=1024, num_units=36), get_block(
in_channel=1024, depth=2048, num_units=3)
]
elif num_layers == 200:
blocks = [
get_block(
in_channel=64, depth=256, num_units=3), get_block(
in_channel=256, depth=512, num_units=24), get_block(
in_channel=512, depth=1024, num_units=36), get_block(
in_channel=1024, depth=2048, num_units=3)
]
return blocks
class Backbone(Layer):
def __init__(self, input_size, num_layers, mode='ir'):
""" Args:
input_size: input_size of backbone
num_layers: num_layers of backbone
mode: support ir or irse
"""
super(Backbone, self).__init__()
assert input_size[0] in [112, 224], \
"input_size should be [112, 112] or [224, 224]"
        assert num_layers in [18, 34, 50, 100, 152, 200], \
            "num_layers should be 18, 34, 50, 100, 152 or 200"
assert mode in ['ir', 'ir_se'], \
"mode should be ir or ir_se"
weight_attr = paddle.ParamAttr(
regularizer=None, initializer=nn.initializer.Constant(value=1.0))
bias_attr = paddle.ParamAttr(
regularizer=None, initializer=nn.initializer.Constant(value=0.0))
self.input_layer = Sequential(
Conv2D(
3,
64, (3, 3),
1,
1,
weight_attr=nn.initializer.KaimingNormal(),
bias_attr=False),
BatchNorm2D(
64, weight_attr=weight_attr, bias_attr=bias_attr),
PReLU(64))
blocks = get_blocks(num_layers)
if num_layers <= 100:
if mode == 'ir':
unit_module = BasicBlockIR
elif mode == 'ir_se':
unit_module = BasicBlockIRSE
output_channel = 512
else:
if mode == 'ir':
unit_module = BottleneckIR
elif mode == 'ir_se':
unit_module = BottleneckIRSE
output_channel = 2048
if input_size[0] == 112:
self.output_layer = Sequential(
BatchNorm2D(
output_channel,
weight_attr=weight_attr,
bias_attr=bias_attr),
Dropout(0.4),
Flatten(),
Linear(
output_channel * 7 * 7,
512,
weight_attr=nn.initializer.KaimingNormal()),
BatchNorm1D(
512, weight_attr=False, bias_attr=False))
else:
self.output_layer = Sequential(
BatchNorm2D(
output_channel,
weight_attr=weight_attr,
bias_attr=bias_attr),
Dropout(0.4),
Flatten(),
Linear(
output_channel * 14 * 14,
512,
weight_attr=nn.initializer.KaimingNormal()),
BatchNorm1D(
512, weight_attr=False, bias_attr=False))
modules = []
for block in blocks:
for bottleneck in block:
modules.append(
unit_module(bottleneck.in_channel, bottleneck.depth,
bottleneck.stride))
self.body = Sequential(*modules)
# initialize_weights(self.modules())
def forward(self, x):
        # the current code only supports one extra image; it comes with an
        # extra dimension for the number of extra images, which is squeezed out
x = self.input_layer(x)
for idx, module in enumerate(self.body):
x = module(x)
x = self.output_layer(x)
# norm = paddle.norm(x, 2, 1, True)
# output = paddle.divide(x, norm)
# return output, norm
return x
def AdaFace_IR_18(input_size=(112, 112)):
""" Constructs a ir-18 model.
"""
model = Backbone(input_size, 18, 'ir')
return model
def AdaFace_IR_34(input_size=(112, 112)):
""" Constructs a ir-34 model.
"""
model = Backbone(input_size, 34, 'ir')
return model
def AdaFace_IR_50(input_size=(112, 112)):
""" Constructs a ir-50 model.
"""
model = Backbone(input_size, 50, 'ir')
return model
def AdaFace_IR_101(input_size=(112, 112)):
""" Constructs a ir-101 model.
"""
model = Backbone(input_size, 100, 'ir')
return model
def AdaFace_IR_152(input_size=(112, 112)):
""" Constructs a ir-152 model.
"""
model = Backbone(input_size, 152, 'ir')
return model
def AdaFace_IR_200(input_size=(112, 112)):
""" Constructs a ir-200 model.
"""
model = Backbone(input_size, 200, 'ir')
return model
def AdaFace_IR_SE_50(input_size=(112, 112)):
""" Constructs a ir_se-50 model.
"""
model = Backbone(input_size, 50, 'ir_se')
return model
def AdaFace_IR_SE_101(input_size=(112, 112)):
""" Constructs a ir_se-101 model.
"""
model = Backbone(input_size, 100, 'ir_se')
return model
def AdaFace_IR_SE_152(input_size=(112, 112)):
""" Constructs a ir_se-152 model.
"""
model = Backbone(input_size, 152, 'ir_se')
return model
def AdaFace_IR_SE_200(input_size=(112, 112)):
""" Constructs a ir_se-200 model.
"""
model = Backbone(input_size, 200, 'ir_se')
return model
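# Illustrative usage (not part of this file):
#   import paddle
#   model = AdaFace_IR_18(input_size=(112, 112))
#   feat = model(paddle.rand([1, 3, 112, 112]))  # 512-d face embedding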
@@ -124,13 +124,7 @@ class RepVGGBlock(nn.Layer):
            groups=groups)

    def forward(self, inputs):
        if self.is_repped:
            return self.nonlinearity(self.rbr_reparam(inputs))

        if self.rbr_identity is None:
@@ -154,6 +148,7 @@
        kernel, bias = self.get_equivalent_kernel_bias()
        self.rbr_reparam.weight.set_value(kernel)
        self.rbr_reparam.bias.set_value(bias)
        self.is_repped = True

    def get_equivalent_kernel_bias(self):
        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)
...
@@ -19,6 +19,7 @@ from .fc import FC
from .vehicle_neck import VehicleNeck
from paddle.nn import Tanh
from .bnneck import BNNeck
+from .adamargin import AdaMargin

__all__ = ['build_gear']

@@ -26,7 +27,7 @@ __all__ = ['build_gear']
def build_gear(config):
    support_dict = [
        'ArcMargin', 'CosMargin', 'CircleMargin', 'FC', 'VehicleNeck', 'Tanh',
-        'BNNeck'
+        'BNNeck', 'AdaMargin'
    ]
    module_name = config.pop('name')
    assert module_name in support_dict, Exception(
...
# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This code is based on AdaFace(https://github.com/mk-minchul/AdaFace)
# Paper: AdaFace: Quality Adaptive Margin for Face Recognition
from paddle.nn import Layer
import math
import paddle
def l2_norm(input, axis=1):
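    """L2-normalize `input` along `axis` (axis=0 normalizes each column)."""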
norm = paddle.norm(input, 2, axis, True)
output = paddle.divide(input, norm)
return output
class AdaMargin(Layer):
def __init__(
self,
embedding_size=512,
class_num=70722,
m=0.4,
h=0.333,
s=64.,
t_alpha=1.0, ):
super(AdaMargin, self).__init__()
self.classnum = class_num
kernel_weight = paddle.uniform(
[embedding_size, class_num], min=-1, max=1)
kernel_weight_norm = paddle.norm(
kernel_weight, p=2, axis=0, keepdim=True)
kernel_weight_norm = paddle.where(kernel_weight_norm > 1e-5,
kernel_weight_norm,
paddle.ones_like(kernel_weight_norm))
kernel_weight = kernel_weight / kernel_weight_norm
self.kernel = self.create_parameter(
[embedding_size, class_num],
attr=paddle.nn.initializer.Assign(kernel_weight))
        # initial kernel: sampled uniformly in [-1, 1], then column-wise
        # L2-renormalized above (torch reference:
        # self.kernel.data.uniform_(-1, 1).renorm_(2,1,1e-5).mul_(1e5))
self.m = m
self.eps = 1e-3
self.h = h
self.s = s
# ema prep
self.t_alpha = t_alpha
self.register_buffer('t', paddle.zeros([1]), persistable=True)
self.register_buffer(
'batch_mean', paddle.ones([1]) * 20, persistable=True)
self.register_buffer(
'batch_std', paddle.ones([1]) * 100, persistable=True)
def forward(self, embbedings, label):
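        # AdaFace: the (detached) feature norm serves as a proxy for image
        # quality; it is standardized with EMA batch statistics, and the
        # result scales both the angular and the additive margin terms below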
norms = paddle.norm(embbedings, 2, 1, True)
embbedings = paddle.divide(embbedings, norms)
kernel_norm = l2_norm(self.kernel, axis=0)
cosine = paddle.mm(embbedings, kernel_norm)
cosine = paddle.clip(cosine, -1 + self.eps,
1 - self.eps) # for stability
safe_norms = paddle.clip(norms, min=0.001, max=100) # for stability
safe_norms = safe_norms.clone().detach()
# update batchmean batchstd
with paddle.no_grad():
mean = safe_norms.mean().detach()
std = safe_norms.std().detach()
self.batch_mean = mean * self.t_alpha + (1 - self.t_alpha
) * self.batch_mean
self.batch_std = std * self.t_alpha + (1 - self.t_alpha
) * self.batch_std
margin_scaler = (safe_norms - self.batch_mean) / (
self.batch_std + self.eps) # 66% between -1, 1
margin_scaler = margin_scaler * self.h # 68% between -0.333 ,0.333 when h:0.333
margin_scaler = paddle.clip(margin_scaler, -1, 1)
# g_angular
m_arc = paddle.nn.functional.one_hot(
label.reshape([-1]), self.classnum)
g_angular = self.m * margin_scaler * -1
m_arc = m_arc * g_angular
theta = paddle.acos(cosine)
theta_m = paddle.clip(
theta + m_arc, min=self.eps, max=math.pi - self.eps)
cosine = paddle.cos(theta_m)
# g_additive
m_cos = paddle.nn.functional.one_hot(
label.reshape([-1]), self.classnum)
g_add = self.m + (self.m * margin_scaler)
m_cos = m_cos * g_add
cosine = cosine - m_cos
# scale
scaled_cosine_m = cosine * self.s
return scaled_cosine_m
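
# a minimal usage sketch (shapes and names are illustrative, not part of the
# module): the head replaces a plain FC classifier during training
#   head = AdaMargin(embedding_size=512, class_num=70722)
#   logits = head(features, labels)  # features: [N, 512], labels: [N, 1]
#   loss = paddle.nn.functional.cross_entropy(logits, labels.reshape([-1]))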
@@ -40,12 +40,14 @@ QUANT_CONFIG = {
}


-def quantize_model(config, model):
+def quantize_model(config, model, mode="train"):
    if config.get("Slim", False) and config["Slim"].get("quant", False):
        from paddleslim.dygraph.quant import QAT
        assert config["Slim"]["quant"]["name"].lower(
        ) == 'pact', 'Only PACT quantization method is supported now'
        QUANT_CONFIG["activation_preprocess_type"] = "PACT"
+        if mode in ["infer", "export"]:
+            QUANT_CONFIG['activation_preprocess_type'] = None
        model.quanter = QAT(config=QUANT_CONFIG)
        model.quanter.quantize(model)
        logger.info("QAT model summary:")
...
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: "./output/"
device: "gpu"
save_interval: 5
eval_during_train: True
eval_interval: 1
epochs: 30
print_batch_step: 20
use_visualdl: False
# used for static mode and model export
image_shape: [3, 256, 192]
save_inference_dir: "./inference"
use_multilabel: True
# model architecture
Arch:
name: "ResNet50"
pretrained: True
class_num: 26
infer_add_softmax: False
# loss function config for training/eval process
Loss:
Train:
- MultiLabelLoss:
weight: 1.0
weight_ratio: True
size_sum: True
Eval:
- MultiLabelLoss:
weight: 1.0
weight_ratio: True
size_sum: True
Optimizer:
name: Adam
lr:
name: Piecewise
decay_epochs: [12, 18, 24, 28]
values: [0.0001, 0.00001, 0.000001, 0.0000001]
regularizer:
name: 'L2'
coeff: 0.0005
clip_norm: 10
# data loader for train and eval
DataLoader:
Train:
dataset:
name: MultiLabelDataset
image_root: "dataset/attribute/data/"
cls_label_path: "dataset/attribute/trainval.txt"
label_ratio: True
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: [192, 256]
- Padv2:
size: [212, 276]
pad_mode: 1
fill_value: 0
- RandomCropImage:
size: [192, 256]
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: True
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: MultiLabelDataset
image_root: "dataset/attribute/data/"
cls_label_path: "dataset/attribute/test.txt"
label_ratio: True
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: [192, 256]
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Metric:
Eval:
- ATTRMetric:
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 600
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# training model under @to_static
to_static: False
use_dali: False
# mixed precision training
AMP:
scale_loss: 128.0
use_dynamic_loss_scaling: True
# O1: mixed fp16
level: O1
# model architecture
Arch:
name: PPHGNet_small
class_num: 1000
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.5
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m7-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.2
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 128
drop_last: False
shuffle: True
loader:
num_workers: 16
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 236
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 128
drop_last: False
shuffle: False
loader:
num_workers: 16
use_shared_memory: True
Infer:
infer_imgs: docs/images/inference_deployment/whl_demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 236
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 600
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# training model under @to_static
to_static: False
use_dali: False
# mixed precision training
AMP:
scale_loss: 128.0
use_dynamic_loss_scaling: True
# O1: mixed fp16
level: O1
# model architecture
Arch:
name: PPHGNet_tiny
class_num: 1000
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.5
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m7-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.2
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 128
drop_last: False
shuffle: True
loader:
num_workers: 16
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 232
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 128
drop_last: False
shuffle: False
loader:
num_workers: 16
use_shared_memory: True
Infer:
infer_imgs: docs/images/inference_deployment/whl_demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 232
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]
@@ -105,7 +105,6 @@ DataLoader:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
-        output_fp16: True
        channel_num: *image_channel
    sampler:
      name: DistributedBatchSampler
@@ -132,7 +131,6 @@ Infer:
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
      order: ''
-      output_fp16: True
      channel_num: *image_channel
    - ToCHWImage:
  PostProcess:
...
@@ -15,6 +15,13 @@ Global:
  image_shape: [*image_channel, 224, 224]
  save_inference_dir: ./inference

+# mixed precision training
+AMP:
+  scale_loss: 128.0
+  use_dynamic_loss_scaling: True
+  # O2: pure fp16
+  level: O2
+
# model architecture
Arch:
  name: SE_ResNeXt101_32x4d
@@ -32,13 +39,6 @@ Loss:
    - CELoss:
        weight: 1.0

-# mixed precision training
-AMP:
-  scale_loss: 128.0
-  use_dynamic_loss_scaling: True
-  # O2: pure fp16
-  level: O2
-
Optimizer:
  name: Momentum
  momentum: 0.9
@@ -99,10 +99,9 @@ DataLoader:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
-        output_fp16: True
        channel_num: *image_channel
    sampler:
-      name: BatchSampler
+      name: DistributedBatchSampler
      batch_size: 64
      drop_last: False
      shuffle: False
@@ -126,7 +125,6 @@ Infer:
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
      order: ''
-      output_fp16: True
      channel_num: *image_channel
    - ToCHWImage:
  PostProcess:
...
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output
device: gpu
save_interval: 1
eval_during_train: True
start_eval_epoch: 1
eval_interval: 1
epochs: 20
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# training model under @to_static
to_static: False
use_dali: False
# model architecture
Arch:
name: "DistillationModel"
class_num: &class_num 2
  # if not null, its length should equal the number of models
  pretrained_list:
  # if not null, its length should equal the number of models
  freeze_params_list:
- True
- False
use_sync_bn: True
models:
- Teacher:
name: ResNet101_vd
class_num: *class_num
- Student:
name: PPLCNet_x1_0
class_num: *class_num
pretrained: True
use_ssld: True
infer_model_name: "Student"
# loss function config for training/eval process
Loss:
Train:
- DistillationDMLLoss:
weight: 1.0
model_name_pairs:
- ["Student", "Teacher"]
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.01
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/person/
cls_label_path: ./dataset/person/train_list_for_distill.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 192
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
prob: 0.0
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 192
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.1
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: True
loader:
num_workers: 16
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/person/
cls_label_path: ./dataset/person/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/inference_deployment/whl_demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: ThreshOutput
threshold: 0.9
label_0: nobody
label_1: someone
Metric:
Train:
- DistillationTopkAcc:
model_key: "Student"
topk: [1, 2]
Eval:
- TprAtFpr:
- TopkAcc:
topk: [1, 2]
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
save_interval: 1
eval_during_train: True
eval_interval: 1
start_eval_epoch: 10
epochs: 20
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# training model under @to_static
to_static: False
use_dali: False
# mixed precision training
AMP:
scale_loss: 128.0
use_dynamic_loss_scaling: True
# O1: mixed fp16
level: O1
# model architecture
Arch:
name: MobileNetV3_large_x1_0
class_num: 2
pretrained: True
use_sync_bn: True
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.13
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00002
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/person/
cls_label_path: ./dataset/person/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 8
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/person/
cls_label_path: ./dataset/person/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/inference_deployment/whl_demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: ThreshOutput
threshold: 0.9
label_0: nobody
label_1: someone
Metric:
Train:
- TopkAcc:
topk: [1, 2]
Eval:
- TprAtFpr:
- TopkAcc:
topk: [1, 2]
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
save_interval: 1
eval_during_train: True
eval_interval: 1
start_eval_epoch: 10
epochs: 20
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# training model under @to_static
to_static: False
use_dali: False
# mixed precision training
AMP:
scale_loss: 128.0
use_dynamic_loss_scaling: True
# O1: mixed fp16
level: O1
# model architecture
Arch:
name: SwinTransformer_tiny_patch4_window7_224
class_num: 2
pretrained: True
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
one_dim_param_no_weight_decay: True
lr:
name: Cosine
learning_rate: 1e-4
eta_min: 2e-6
warmup_epoch: 5
warmup_start_lr: 2e-7
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/person/
cls_label_path: ./dataset/person/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 128
drop_last: False
shuffle: True
loader:
num_workers: 8
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/person/
cls_label_path: ./dataset/person/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 8
use_shared_memory: True
Infer:
infer_imgs: docs/images/inference_deployment/whl_demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: ThreshOutput
threshold: 0.9
label_0: nobody
label_1: someone
Metric:
Train:
- TopkAcc:
topk: [1, 2]
Eval:
- TprAtFpr:
- TopkAcc:
topk: [1, 2]
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
save_interval: 1
eval_during_train: True
eval_interval: 1
start_eval_epoch: 10
epochs: 20
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# training model under @to_static
to_static: False
use_dali: False
# model architecture
Arch:
name: PPLCNet_x1_0
class_num: 2
pretrained: True
use_ssld: True
use_sync_bn: True
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.01
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/person/
cls_label_path: ./dataset/person/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 192
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
prob: 0.0
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 192
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.1
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: True
loader:
num_workers: 8
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/person/
cls_label_path: ./dataset/person/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/inference_deployment/whl_demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: ThreshOutput
threshold: 0.9
label_0: nobody
label_1: someone
Metric:
Train:
- TopkAcc:
topk: [1, 2]
Eval:
- TprAtFpr:
- TopkAcc:
topk: [1, 2]
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
save_interval: 1
eval_during_train: True
eval_interval: 1
start_eval_epoch: 10
epochs: 20
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# training model under @to_static
to_static: False
use_dali: False
# model architecture
Arch:
name: PPLCNet_x1_0
class_num: 2
pretrained: True
use_ssld: True
use_sync_bn: True
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.01
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/person/
cls_label_path: ./dataset/person/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
prob: 0.0
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.0
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: True
loader:
num_workers: 8
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/person/
cls_label_path: ./dataset/person/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/inference_deployment/whl_demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: ThreshOutput
threshold: 0.9
label_0: nobody
label_1: someone
Metric:
Train:
- TopkAcc:
topk: [1, 2]
Eval:
- TprAtFpr:
- TopkAcc:
topk: [1, 2]
base_config_file: ppcls/configs/PULC/person/PPLCNet/PPLCNet_x1_0_search.yaml
distill_config_file: ppcls/configs/PULC/person/Distillation/PPLCNet_x1_0_distillation.yaml
gpus: 0,1,2,3
output_dir: output/search_person
search_times: 1
search_dict:
- search_key: lrs
replace_config:
- Optimizer.lr.learning_rate
search_values: [0.0075, 0.01, 0.0125]
- search_key: resolutions
replace_config:
- DataLoader.Train.dataset.transform_ops.1.RandCropImage.size
- DataLoader.Train.dataset.transform_ops.3.TimmAutoAugment.img_size
search_values: [176, 192, 224]
- search_key: ra_probs
replace_config:
- DataLoader.Train.dataset.transform_ops.3.TimmAutoAugment.prob
search_values: [0.0, 0.1, 0.5]
- search_key: re_probs
replace_config:
- DataLoader.Train.dataset.transform_ops.5.RandomErasing.EPSILON
search_values: [0.0, 0.1, 0.5]
- search_key: lr_mult_list
replace_config:
- Arch.lr_mult_list
search_values:
- [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
- [0.0, 0.4, 0.4, 0.8, 0.8, 1.0]
- [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
teacher:
rm_keys:
- Arch.lr_mult_list
search_values:
- ResNet101_vd
- ResNet50_vd
final_replace:
Arch.lr_mult_list: Arch.models.1.Student.lr_mult_list
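# note: final_replace remaps the searched Arch.lr_mult_list from the base
# config onto the student branch of the distillation config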
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: "./output/"
device: "gpu"
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 26
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 112, 112]
save_inference_dir: "./inference"
eval_mode: "adaface"
# model architecture
Arch:
name: "RecModel"
infer_output_key: "features"
infer_add_softmax: False
Backbone:
name: "AdaFace_IR_18"
input_size: [112, 112]
Head:
name: "AdaMargin"
embedding_size: 512
class_num: 70722
m: 0.4
s: 64
h: 0.333
t_alpha: 0.01
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [12, 20, 24]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0005
# data loader for train and eval
DataLoader:
Train:
dataset:
name: "AdaFaceDataset"
root_dir: "dataset/face/"
label_path: "dataset/face/train_filter_label.txt"
transform:
- CropWithPadding:
prob: 0.2
padding_num: 0
size: [112, 112]
scale: [0.2, 1.0]
ratio: [0.75, 1.3333333333333333]
- RandomInterpolationAugment:
prob: 0.2
- ColorJitter:
prob: 0.2
brightness: 0.5
contrast: 0.5
saturation: 0.5
hue: 0
- RandomHorizontalFlip:
- ToTensor:
- Normalize:
mean: [0.5, 0.5, 0.5]
std: [0.5, 0.5, 0.5]
sampler:
name: DistributedBatchSampler
batch_size: 256
drop_last: False
shuffle: True
loader:
num_workers: 6
use_shared_memory: True
Eval:
dataset:
name: FiveValidationDataset
val_data_path: dataset/face/faces_emore
concat_mem_file_name: dataset/face/faces_emore/concat_validation_memfile
sampler:
name: BatchSampler
batch_size: 256
drop_last: False
shuffle: True
loader:
num_workers: 6
use_shared_memory: True
Metric:
Train:
- TopkAcc:
topk: [1, 5]
\ No newline at end of file
@@ -12,6 +12,7 @@ Global:
  use_visualdl: False
  eval_mode: "retrieval"
  retrieval_feature_from: "backbone" # 'backbone' or 'neck'
+  re_ranking: False
  # used for static mode and model export
  image_shape: [3, 256, 128]
  save_inference_dir: "./inference"
@@ -23,7 +24,7 @@ Arch:
  infer_add_softmax: False
  Backbone:
    name: "ResNet50"
-    pretrained: True
+    pretrained: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/others/resnet50-19c8e357_torch2paddle.pdparams
    stem_act: null
  BackboneStopLayer:
    name: "flatten"
...
@@ -12,6 +12,7 @@ Global:
  use_visualdl: False
  eval_mode: "retrieval"
  retrieval_feature_from: "features" # 'backbone' or 'features'
+  re_ranking: False
  # used for static mode and model export
  image_shape: [3, 256, 128]
  save_inference_dir: "./inference"
@@ -23,7 +24,7 @@ Arch:
  infer_add_softmax: False
  Backbone:
    name: "ResNet50_last_stage_stride1"
-    pretrained: True
+    pretrained: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/others/resnet50-19c8e357_torch2paddle.pdparams
    stem_act: null
  BackboneStopLayer:
    name: "flatten"
...
@@ -12,6 +12,7 @@ Global:
  use_visualdl: False
  eval_mode: "retrieval"
  retrieval_feature_from: "features" # 'backbone' or 'features'
+  re_ranking: False
  # used for static mode and model export
  image_shape: [3, 256, 128]
  save_inference_dir: "./inference"
@@ -23,7 +24,7 @@ Arch:
  infer_add_softmax: False
  Backbone:
    name: "ResNet50_last_stage_stride1"
-    pretrained: True
+    pretrained: https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/others/resnet50-19c8e357_torch2paddle.pdparams
    stem_act: null
  BackboneStopLayer:
    name: "flatten"
...
@@ -30,6 +30,7 @@ from ppcls.data.dataloader.icartoon_dataset import ICartoonDataset
from ppcls.data.dataloader.mix_dataset import MixDataset
from ppcls.data.dataloader.multi_scale_dataset import MultiScaleDataset
from ppcls.data.dataloader.person_dataset import Market1501, MSMT17
+from ppcls.data.dataloader.face_dataset import FiveValidationDataset, AdaFaceDataset

# sampler
@@ -88,7 +89,7 @@ def build_dataloader(config, mode, device, use_dali=False, seed=None):
    # build sampler
    config_sampler = config[mode]['sampler']
-    if "name" not in config_sampler:
+    if config_sampler and "name" not in config_sampler:
        batch_sampler = None
        batch_size = config_sampler["batch_size"]
        drop_last = config_sampler["drop_last"]
...
@@ -10,3 +10,4 @@ from ppcls.data.dataloader.mix_sampler import MixSampler
from ppcls.data.dataloader.multi_scale_sampler import MultiScaleSampler
from ppcls.data.dataloader.pk_sampler import PKSampler
from ppcls.data.dataloader.person_dataset import Market1501, MSMT17
+from ppcls.data.dataloader.face_dataset import AdaFaceDataset, FiveValidationDataset
@@ -44,11 +44,11 @@ def create_operators(params):
class CommonDataset(Dataset):
-    def __init__(
-            self,
-            image_root,
-            cls_label_path,
-            transform_ops=None, ):
+    def __init__(self,
+                 image_root,
+                 cls_label_path,
+                 transform_ops=None,
+                 label_ratio=False):
        self._img_root = image_root
        self._cls_path = cls_label_path
        if transform_ops:
@@ -56,7 +56,11 @@ class CommonDataset(Dataset):
        self.images = []
        self.labels = []
-        self._load_anno()
+        if label_ratio:
+            self.label_ratio = self._load_anno(label_ratio=label_ratio)
+        else:
+            self.label_ratio = None  # avoids AttributeError in subclasses' __getitem__
+            self._load_anno()

    def _load_anno(self):
        pass
...
import os
import json

import numpy as np
import paddle
from PIL import Image
from paddle.io import Dataset

from .common_dataset import create_operators
from ppcls.data.preprocess import transform as transform_func
# code is based on AdaFace: https://github.com/mk-minchul/AdaFace
class AdaFaceDataset(Dataset):
def __init__(self, root_dir, label_path, transform=None):
self.root_dir = root_dir
self.transform = create_operators(transform)
with open(label_path) as fd:
lines = fd.readlines()
self.samples = []
for l in lines:
l = l.strip().split()
self.samples.append([os.path.join(root_dir, l[0]), int(l[1])])
def __len__(self):
return len(self.samples)
def __getitem__(self, index):
"""
Args:
index (int): Index
Returns:
tuple: (sample, target) where target is class_index of the target class.
"""
[path, target] = self.samples[index]
with open(path, 'rb') as f:
img = Image.open(f)
sample = img.convert('RGB')
# if 'WebFace' in self.root:
# # swap rgb to bgr since image is in rgb for webface
# sample = Image.fromarray(np.asarray(sample)[:, :, ::-1]
if self.transform is not None:
sample = transform_func(sample, self.transform)
return sample, target
class FiveValidationDataset(Dataset):
def __init__(self, val_data_path, concat_mem_file_name):
'''
concatenates all validation datasets from emore
val_data_dict = {
'agedb_30': (agedb_30, agedb_30_issame),
"cfp_fp": (cfp_fp, cfp_fp_issame),
"lfw": (lfw, lfw_issame),
"cplfw": (cplfw, cplfw_issame),
"calfw": (calfw, calfw_issame),
}
agedb_30: 0
cfp_fp: 1
lfw: 2
cplfw: 3
calfw: 4
'''
val_data = get_val_data(val_data_path)
age_30, cfp_fp, lfw, age_30_issame, cfp_fp_issame, lfw_issame, cplfw, cplfw_issame, calfw, calfw_issame = val_data
val_data_dict = {
'agedb_30': (age_30, age_30_issame),
"cfp_fp": (cfp_fp, cfp_fp_issame),
"lfw": (lfw, lfw_issame),
"cplfw": (cplfw, cplfw_issame),
"calfw": (calfw, calfw_issame),
}
self.dataname_to_idx = {
"agedb_30": 0,
"cfp_fp": 1,
"lfw": 2,
"cplfw": 3,
"calfw": 4
}
self.val_data_dict = val_data_dict
# concat all dataset
all_imgs = []
all_issame = []
all_dataname = []
key_orders = []
for key, (imgs, issame) in val_data_dict.items():
all_imgs.append(imgs)
            # duplicate each flag so len(dup_issame) == len(imgs): [1, 1, 0, 0, ...]
            dup_issame = []
for same in issame:
dup_issame.append(same)
dup_issame.append(same)
all_issame.append(dup_issame)
all_dataname.append([self.dataname_to_idx[key]] * len(imgs))
key_orders.append(key)
assert key_orders == ['agedb_30', 'cfp_fp', 'lfw', 'cplfw', 'calfw']
if isinstance(all_imgs[0], np.memmap):
self.all_imgs = read_memmap(concat_mem_file_name)
else:
self.all_imgs = np.concatenate(all_imgs)
self.all_issame = np.concatenate(all_issame)
self.all_dataname = np.concatenate(all_dataname)
def __getitem__(self, index):
x_np = self.all_imgs[index].copy()
x = paddle.to_tensor(x_np)
y = self.all_issame[index]
dataname = self.all_dataname[index]
return x, y, dataname, index
def __len__(self):
return len(self.all_imgs)
def read_memmap(mem_file_name):
# r+ mode: Open existing file for reading and writing
with open(mem_file_name + '.conf', 'r') as file:
memmap_configs = json.load(file)
return np.memmap(mem_file_name, mode='r+', \
shape=tuple(memmap_configs['shape']), \
dtype=memmap_configs['dtype'])
def get_val_pair(path, name, use_memfile=True):
    # installing bcolz may require setting a proxy to access the internet
import bcolz
if use_memfile:
mem_file_dir = os.path.join(path, name, 'memfile')
mem_file_name = os.path.join(mem_file_dir, 'mem_file.dat')
if os.path.isdir(mem_file_dir):
            print('loading validation data memfile')
np_array = read_memmap(mem_file_name)
else:
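            # note: read_memmap below assumes the memfile was created
            # beforehand (e.g. by a separate preparation step); the commented
            # make_memmap call is what originally wrote it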
os.makedirs(mem_file_dir)
carray = bcolz.carray(rootdir=os.path.join(path, name), mode='r')
np_array = np.array(carray)
# mem_array = make_memmap(mem_file_name, np_array)
# del np_array, mem_array
del np_array
np_array = read_memmap(mem_file_name)
else:
np_array = bcolz.carray(rootdir=os.path.join(path, name), mode='r')
issame = np.load(os.path.join(path, '{}_list.npy'.format(name)))
return np_array, issame
def get_val_data(data_path):
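    # the flat return order (three datasets, then their issame flags, then the
    # cplfw/calfw pairs) is irregular but matches the unpacking in
    # FiveValidationDataset.__init__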
agedb_30, agedb_30_issame = get_val_pair(data_path, 'agedb_30')
cfp_fp, cfp_fp_issame = get_val_pair(data_path, 'cfp_fp')
lfw, lfw_issame = get_val_pair(data_path, 'lfw')
cplfw, cplfw_issame = get_val_pair(data_path, 'cplfw')
calfw, calfw_issame = get_val_pair(data_path, 'calfw')
return agedb_30, cfp_fp, lfw, agedb_30_issame, cfp_fp_issame, lfw_issame, cplfw, cplfw_issame, calfw, calfw_issame
@@ -25,7 +25,7 @@ from .common_dataset import CommonDataset
class MultiLabelDataset(CommonDataset):
-    def _load_anno(self):
+    def _load_anno(self, label_ratio=False):
        assert os.path.exists(self._cls_path)
        assert os.path.exists(self._img_root)
        self.images = []
@@ -41,6 +41,8 @@ class MultiLabelDataset(CommonDataset):
                self.labels.append(labels)
                assert os.path.exists(self.images[-1])
+        if label_ratio:
+            return np.array(self.labels).mean(0).astype("float32")

    def __getitem__(self, idx):
        try:
@@ -50,7 +52,10 @@ class MultiLabelDataset(CommonDataset):
            img = transform(img, self._transform_ops)
            img = img.transpose((2, 0, 1))
            label = np.array(self.labels[idx]).astype("float32")
-            return (img, label)
+            if self.label_ratio is not None:
+                return (img, np.array([label, self.label_ratio]))
+            else:
+                return (img, label)
        except Exception as ex:
            logger.error("Exception occured when parse line: {} with msg: {}".
...
@@ -14,9 +14,10 @@
import copy
import importlib

-from . import topk
+from . import topk, threshoutput
from .topk import Topk, MultiLabelTopk
+from .threshoutput import ThreshOutput


def build_postprocess(config):
...
# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.nn.functional as F
class ThreshOutput(object):
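    """Binary postprocess: softmax the logits, then compare the positive-class
    score with `threshold` to choose between label_0 and label_1."""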
def __init__(self, threshold, label_0="0", label_1="1"):
self.threshold = threshold
self.label_0 = label_0
self.label_1 = label_1
def __call__(self, x, file_names=None):
y = []
x = F.softmax(x, axis=-1).numpy()
for idx, probs in enumerate(x):
score = probs[1]
if score < self.threshold:
result = {"class_ids": [0], "scores": [1 - score], "label_names": [self.label_0]}
else:
result = {"class_ids": [1], "scores": [score], "label_names": [self.label_1]}
if file_names is not None:
result["file_name"] = file_names[idx]
y.append(result)
return y
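
# worked example: with threshold=0.9, logits [[2.0, 3.0]] give softmax scores
# of roughly [0.27, 0.73]; since 0.73 < 0.9, the sample is reported as class 0
# (label_0) with score 1 - 0.73 = 0.27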
@@ -33,11 +33,18 @@ from ppcls.data.preprocess.ops.operators import AugMix
from ppcls.data.preprocess.ops.operators import Pad
from ppcls.data.preprocess.ops.operators import ToTensor
from ppcls.data.preprocess.ops.operators import Normalize
+from ppcls.data.preprocess.ops.operators import RandomHorizontalFlip
+from ppcls.data.preprocess.ops.operators import CropWithPadding
+from ppcls.data.preprocess.ops.operators import RandomInterpolationAugment
+from ppcls.data.preprocess.ops.operators import ColorJitter
+from ppcls.data.preprocess.ops.operators import RandomCropImage
+from ppcls.data.preprocess.ops.operators import Padv2

from ppcls.data.preprocess.batch_ops.batch_operators import MixupOperator, CutmixOperator, OpSampler, FmixOperator

import numpy as np
from PIL import Image
+import random


def transform(data, ops=[]):
@@ -88,16 +95,16 @@ class RandAugment(RawRandAugment):
class TimmAutoAugment(RawTimmAutoAugment):
    """ TimmAutoAugment wrapper to auto fit different img tyeps. """

-    def __init__(self, *args, **kwargs):
+    def __init__(self, prob=1.0, *args, **kwargs):
        super().__init__(*args, **kwargs)
+        self.prob = prob

    def __call__(self, img):
        if not isinstance(img, Image.Image):
            img = np.ascontiguousarray(img)
            img = Image.fromarray(img)
-        img = super().__call__(img)
+        if random.random() < self.prob:
+            img = super().__call__(img)
        if isinstance(img, Image.Image):
            img = np.asarray(img)
...
@@ -25,8 +25,8 @@ import cv2
import numpy as np
from PIL import Image, ImageOps, __version__ as PILLOW_VERSION
from paddle.vision.transforms import ColorJitter as RawColorJitter
-from paddle.vision.transforms import ToTensor, Normalize
+from paddle.vision.transforms import ToTensor, Normalize, RandomHorizontalFlip, RandomResizedCrop
+from paddle.vision.transforms import functional as F

from .autoaugment import ImageNetPolicy
from .functional import augmentations
from ppcls.utils import logger
@@ -93,6 +93,42 @@ class UnifiedResize(object):
        return self.resize_func(src, size)
class RandomInterpolationAugment(object):
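    """With probability `prob`, shrink the image by a random factor using a
    random interpolation method and resize it back, simulating low-quality
    inputs (an AdaFace-style augmentation)."""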
def __init__(self, prob):
self.prob = prob
def _aug(self, img):
img_shape = img.shape
side_ratio = np.random.uniform(0.2, 1.0)
small_side = int(side_ratio * img_shape[0])
interpolation = np.random.choice([
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_AREA,
cv2.INTER_CUBIC, cv2.INTER_LANCZOS4
])
small_img = cv2.resize(
img, (small_side, small_side), interpolation=interpolation)
interpolation = np.random.choice([
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_AREA,
cv2.INTER_CUBIC, cv2.INTER_LANCZOS4
])
aug_img = cv2.resize(
small_img, (img_shape[1], img_shape[0]),
interpolation=interpolation)
return aug_img
def __call__(self, img):
if np.random.random() < self.prob:
if isinstance(img, np.ndarray):
return self._aug(img)
else:
pil_img = np.array(img)
aug_img = self._aug(pil_img)
img = Image.fromarray(aug_img.astype(np.uint8))
return img
else:
return img
class OperatorParamError(ValueError):
    """ OperatorParamError
    """

@@ -170,6 +206,52 @@ class ResizeImage(object):
        return self._resize_func(img, (w, h))
class CropWithPadding(RandomResizedCrop):
    """
    crop image and pad it back to its original size
    """

    def __init__(self,
                 prob=1,
                 padding_num=0,
                 size=224,
                 scale=(0.08, 1.0),
                 ratio=(3. / 4, 4. / 3),
                 interpolation='bilinear',
                 key=None):
        super().__init__(size, scale, ratio, interpolation, key)
        self.prob = prob
        self.padding_num = padding_num

    def __call__(self, img):
        is_cv2_img = isinstance(img, np.ndarray)
        if np.random.random() < self.prob:
            # RandomResizedCrop augmentation: paste the crop back onto a
            # constant-valued canvas of the original size
            new = np.zeros_like(np.array(img)) + self.padding_num
            i, j, h, w = self._get_param(img)
            cropped = F.crop(img, i, j, h, w)
            new[i:i + h, j:j + w, :] = np.array(cropped)
            if not is_cv2_img:
                # keep the output type consistent with the (PIL) input
                new = Image.fromarray(new.astype(np.uint8))
            return new
        else:
            return img

    def _get_image_size(self, img):
        if F._is_pil_image(img):
            return img.size
        elif F._is_numpy_image(img):
            return img.shape[:2][::-1]
        elif F._is_tensor_image(img):
            return img.shape[1:][::-1]  # chw
        else:
            raise TypeError("Unexpected type {}".format(type(img)))
class CropImage(object):
    """ crop image """

@@ -190,6 +272,105 @@ class CropImage(object):
        return img[h_start:h_end, w_start:w_end, :]
class Padv2(object):
def __init__(self,
size=None,
size_divisor=32,
pad_mode=0,
offsets=None,
fill_value=(127.5, 127.5, 127.5)):
"""
Pad image to a specified size or multiple of size_divisor.
Args:
size (int, list): image target size, if None, pad to multiple of size_divisor, default None
size_divisor (int): size divisor, default 32
pad_mode (int): pad mode, currently only supports four modes [-1, 0, 1, 2]. if -1, use specified offsets
if 0, only pad to right and bottom. if 1, pad according to center. if 2, only pad left and top
offsets (list): [offset_x, offset_y], specify offset while padding, only supported pad_mode=-1
            fill_value (tuple): RGB value of the pad area, default (127.5, 127.5, 127.5)
"""
        if size is not None and not isinstance(size, (int, list)):
            raise TypeError(
                "Type of size is invalid. Must be int or list, now is {}".
                format(type(size)))
if isinstance(size, int):
size = [size, size]
assert pad_mode in [
-1, 0, 1, 2
], 'currently only supports four modes [-1, 0, 1, 2]'
if pad_mode == -1:
assert offsets, 'if pad_mode is -1, offsets should not be None'
self.size = size
self.size_divisor = size_divisor
self.pad_mode = pad_mode
self.fill_value = fill_value
self.offsets = offsets
def apply_image(self, image, offsets, im_size, size):
x, y = offsets
im_h, im_w = im_size
h, w = size
canvas = np.ones((h, w, 3), dtype=np.float32)
canvas *= np.array(self.fill_value, dtype=np.float32)
canvas[y:y + im_h, x:x + im_w, :] = image.astype(np.float32)
return canvas
def __call__(self, img):
im_h, im_w = img.shape[:2]
if self.size:
w, h = self.size
assert (
im_h <= h and im_w <= w
), '(h, w) of target size should be greater than (im_h, im_w)'
else:
h = int(np.ceil(im_h / self.size_divisor) * self.size_divisor)
w = int(np.ceil(im_w / self.size_divisor) * self.size_divisor)
if h == im_h and w == im_w:
return img.astype(np.float32)
if self.pad_mode == -1:
offset_x, offset_y = self.offsets
elif self.pad_mode == 0:
offset_y, offset_x = 0, 0
elif self.pad_mode == 1:
offset_y, offset_x = (h - im_h) // 2, (w - im_w) // 2
else:
offset_y, offset_x = h - im_h, w - im_w
offsets, im_size, size = [offset_x, offset_y], [im_h, im_w], [h, w]
return self.apply_image(img, offsets, im_size, size)
class RandomCropImage(object):
    """Random crop image only
    """

    def __init__(self, size):
        super(RandomCropImage, self).__init__()
        if isinstance(size, int):
            size = [size, size]
        self.size = size

    def __call__(self, img):
        h, w = img.shape[:2]
        tw, th = self.size
        i = random.randint(0, h - th)
        j = random.randint(0, w - tw)

        img = img[i:i + th, j:j + tw, :]
        if img.shape[0] != th or img.shape[1] != tw:
            # sanity check against mis-sized crops
            raise ValueError('sample: ', h, w, i, j, th, tw, img.shape)
        return img
class RandCropImage(object):
    """ random crop image """

@@ -434,16 +615,18 @@ class ColorJitter(RawColorJitter):
    """ColorJitter.
    """

-    def __init__(self, *args, **kwargs):
+    def __init__(self, prob=2, *args, **kwargs):
        super().__init__(*args, **kwargs)
+        self.prob = prob

    def __call__(self, img):
-        if not isinstance(img, Image.Image):
-            img = np.ascontiguousarray(img)
-            img = Image.fromarray(img)
-        img = super()._apply_image(img)
-        if isinstance(img, Image.Image):
-            img = np.asarray(img)
+        if np.random.random() < self.prob:
+            if not isinstance(img, Image.Image):
+                img = np.ascontiguousarray(img)
+                img = Image.fromarray(img)
+            img = super()._apply_image(img)
+            if isinstance(img, Image.Image):
+                img = np.asarray(img)
        return img

@@ -463,8 +646,8 @@ class Pad(object):
        # Process fill color for affine transforms
        major_found, minor_found = (int(v)
                                    for v in PILLOW_VERSION.split('.')[:2])
-        major_required, minor_required = (
-            int(v) for v in min_pil_version.split('.')[:2])
+        major_required, minor_required = (int(v) for v in
+                                          min_pil_version.split('.')[:2])
        if major_found < major_required or (major_found == major_required and
                                            minor_found < minor_required):
            if fill is None:
...
...@@ -75,8 +75,9 @@ class Engine(object): ...@@ -75,8 +75,9 @@ class Engine(object):
print_config(config) print_config(config)
# init train_func and eval_func # init train_func and eval_func
assert self.eval_mode in ["classification", "retrieval"], logger.error( assert self.eval_mode in [
"Invalid eval mode: {}".format(self.eval_mode)) "classification", "retrieval", "adaface"
], logger.error("Invalid eval mode: {}".format(self.eval_mode))
self.train_epoch_func = train_epoch self.train_epoch_func = train_epoch
self.eval_func = getattr(evaluation, self.eval_mode + "_eval") self.eval_func = getattr(evaluation, self.eval_mode + "_eval")
...@@ -115,7 +116,7 @@ class Engine(object): ...@@ -115,7 +116,7 @@ class Engine(object):
self.config["DataLoader"], "Train", self.device, self.use_dali) self.config["DataLoader"], "Train", self.device, self.use_dali)
if self.mode == "eval" or (self.mode == "train" and if self.mode == "eval" or (self.mode == "train" and
self.config["Global"]["eval_during_train"]): self.config["Global"]["eval_during_train"]):
if self.eval_mode == "classification": if self.eval_mode in ["classification", "adaface"]:
self.eval_dataloader = build_dataloader( self.eval_dataloader = build_dataloader(
self.config["DataLoader"], "Eval", self.device, self.config["DataLoader"], "Eval", self.device,
self.use_dali) self.use_dali)
...@@ -189,7 +190,7 @@ class Engine(object): ...@@ -189,7 +190,7 @@ class Engine(object):
self.eval_metric_func = None self.eval_metric_func = None
# build model # build model
self.model = build_model(self.config) self.model = build_model(self.config, self.mode)
# set @to_static for benchmark, skip this by default. # set @to_static for benchmark, skip this by default.
apply_to_static(self.config, self.model) apply_to_static(self.config, self.model)
...@@ -239,7 +240,7 @@ class Engine(object): ...@@ -239,7 +240,7 @@ class Engine(object):
self.amp_eval = self.config["AMP"].get("use_fp16_test", False) self.amp_eval = self.config["AMP"].get("use_fp16_test", False)
# TODO(gaotingquan): Paddle not yet support FP32 evaluation when training with AMPO2 # TODO(gaotingquan): Paddle not yet support FP32 evaluation when training with AMPO2
if self.config["Global"].get( if self.mode == "train" and self.config["Global"].get(
"eval_during_train", "eval_during_train",
True) and self.amp_level == "O2" and self.amp_eval == False: True) and self.amp_level == "O2" and self.amp_eval == False:
msg = "PaddlePaddle only support FP16 evaluation when training with AMP O2 now. " msg = "PaddlePaddle only support FP16 evaluation when training with AMP O2 now. "
@@ -269,10 +270,11 @@ class Engine(object):
                     save_dtype='float32')
             # paddle version >= 2.3.0 or develop
             else:
-                self.model = paddle.amp.decorate(
-                    models=self.model,
-                    level=self.amp_level,
-                    save_dtype='float32')
+                if self.mode == "train" or self.amp_eval:
+                    self.model = paddle.amp.decorate(
+                        models=self.model,
+                        level=self.amp_level,
+                        save_dtype='float32')
         if self.mode == "train" and len(self.train_loss_func.parameters(
         )) > 0:
@@ -312,7 +314,7 @@ class Engine(object):
         print_batch_step = self.config['Global']['print_batch_step']
         save_interval = self.config["Global"]["save_interval"]
         best_metric = {
-            "metric": 0.0,
+            "metric": -1.0,
             "epoch": 0,
         }
         # key:
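Seeding the best metric with -1.0 instead of 0.0 guarantees that the first evaluation always produces a "best" checkpoint, even for metrics that can legitimately be 0. A toy illustration:

    best = {"metric": -1.0, "epoch": 0}
    for epoch, acc in enumerate([0.0, 0.0, 0.3], start=1):
        if acc > best["metric"]:          # 0.0 > -1.0, so epoch 1 already saves
            best = {"metric": acc, "epoch": epoch}
    print(best)  # {'metric': 0.3, 'epoch': 3}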
@@ -344,18 +346,18 @@ class Engine(object):
             if self.use_dali:
                 self.train_dataloader.reset()
-            metric_msg = ", ".join([
-                "{}: {:.5f}".format(key, self.output_info[key].avg)
-                for key in self.output_info
-            ])
+            metric_msg = ", ".join(
+                [self.output_info[key].avg_info for key in self.output_info])
             logger.info("[Train][Epoch {}/{}][Avg]{}".format(
                 epoch_id, self.config["Global"]["epochs"], metric_msg))
             self.output_info.clear()
             # eval model and save model if possible
+            start_eval_epoch = self.config["Global"].get("start_eval_epoch",
+                                                         0) - 1
             if self.config["Global"][
                     "eval_during_train"] and epoch_id % self.config["Global"][
-                        "eval_interval"] == 0:
+                        "eval_interval"] == 0 and epoch_id > start_eval_epoch:
                 acc = self.eval(epoch_id)
                 if acc > best_metric["metric"]:
                     best_metric["metric"] = acc
@@ -367,7 +369,8 @@ class Engine(object):
                         self.output_dir,
                         model_name=self.config["Arch"]["name"],
                         prefix="best_model",
-                        loss=self.train_loss_func)
+                        loss=self.train_loss_func,
+                        save_student_model=True)
                 logger.info("[Eval][Epoch {}][best metric: {}]".format(
                     epoch_id, best_metric["metric"]))
                 logger.scaler(
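The new `start_eval_epoch` option postpones evaluation: since `epoch_id` starts at 1 and the configured value has 1 subtracted, `start_eval_epoch: N` means the first possible eval is at epoch N. A quick standalone check of the gating arithmetic (values chosen for illustration):

    eval_interval = 2
    start_eval_epoch = 5 - 1   # the engine subtracts 1 from the config value
    triggered = [e for e in range(1, 11)
                 if e % eval_interval == 0 and e > start_eval_epoch]
    print(triggered)  # [6, 8, 10] -- epochs 2 and 4 are skipped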
@@ -431,7 +434,17 @@ class Engine(object):
                 image_file_list.append(image_file)
                 if len(batch_data) >= batch_size or idx == len(image_list) - 1:
                     batch_tensor = paddle.to_tensor(batch_data)
-                    out = self.model(batch_tensor)
+
+                    if self.amp and self.amp_eval:
+                        with paddle.amp.auto_cast(
+                                custom_black_list={
+                                    "flatten_contiguous_range", "greater_than"
+                                },
+                                level=self.amp_level):
+                            out = self.model(batch_tensor)
+                    else:
+                        out = self.model(batch_tensor)
+
                     if isinstance(out, list):
                         out = out[0]
                     if isinstance(out, dict) and "logits" in out:
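`paddle.amp.auto_cast` runs the wrapped forward pass in mixed precision while keeping the ops named in `custom_black_list` in FP32, so inference matches the precision scheme used during AMP training. A minimal sketch with a toy layer (the layer and shapes are placeholders):

    import paddle

    model = paddle.nn.Linear(4, 2)   # stand-in for the classification model
    x = paddle.rand([1, 4])
    # Mirror the infer path above: autocast everything except the listed ops.
    with paddle.amp.auto_cast(
            custom_black_list={"flatten_contiguous_range", "greater_than"},
            level="O1"):
        out = model(x)
    print(out.shape)  # [1, 2]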
@@ -445,33 +458,40 @@ class Engine(object):
     def export(self):
         assert self.mode == "export"
-        use_multilabel = self.config["Global"].get("use_multilabel", False)
+        use_multilabel = self.config["Global"].get(
+            "use_multilabel",
+            False) and not "ATTRMetric" in self.config["Metric"]["Eval"][0]
         model = ExportModel(self.config["Arch"], self.model, use_multilabel)
         if self.config["Global"]["pretrained_model"] is not None:
             load_dygraph_pretrain(model.base_model,
                                   self.config["Global"]["pretrained_model"])
         model.eval()
+
+        # for rep nets
+        for layer in self.model.sublayers():
+            if hasattr(layer, "rep"):
+                layer.rep()
+
         save_path = os.path.join(self.config["Global"]["save_inference_dir"],
                                  "inference")
-        if model.quanter:
-            model.quanter.save_quantized_model(
-                model.base_model,
-                save_path,
-                input_spec=[
-                    paddle.static.InputSpec(
-                        shape=[None] + self.config["Global"]["image_shape"],
-                        dtype='float32')
-                ])
+
+        model = paddle.jit.to_static(
+            model,
+            input_spec=[
+                paddle.static.InputSpec(
+                    shape=[None] + self.config["Global"]["image_shape"],
+                    dtype='float32')
+            ])
+        if hasattr(model.base_model,
+                   "quanter") and model.base_model.quanter is not None:
+            model.base_model.quanter.save_quantized_model(model,
+                                                          save_path + "_int8")
         else:
-            model = paddle.jit.to_static(
-                model,
-                input_spec=[
-                    paddle.static.InputSpec(
-                        shape=[None] + self.config["Global"]["image_shape"],
-                        dtype='float32')
-                ])
             paddle.jit.save(model, save_path)
+        logger.info(
+            f"Export succeeded! The inference model exported has been saved in \"{self.config['Global']['save_inference_dir']}\"."
+        )


 class ExportModel(TheseusLayer):
......
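The rewritten `export` always goes through `paddle.jit.to_static` with an `InputSpec` whose leading dimension is `None`, so the saved inference model accepts any batch size; quantized models are additionally written with an `_int8` suffix, and the `rep()` loop beforehand lets re-parameterizable layers fuse their training-time branches into a single inference kernel. A minimal export sketch with a toy model (path and shape are placeholders):

    import paddle

    model = paddle.nn.Linear(8, 2)   # stand-in for ExportModel
    model.eval()
    model = paddle.jit.to_static(
        model,
        input_spec=[
            paddle.static.InputSpec(
                shape=[None, 8],     # None = variable batch size
                dtype='float32')
        ])
    paddle.jit.save(model, "/tmp/inference_demo/inference")  # writes .pdmodel / .pdiparams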
@@ -14,3 +14,4 @@
 from ppcls.engine.evaluation.classification import classification_eval
 from ppcls.engine.evaluation.retrieval import retrieval_eval
+from ppcls.engine.evaluation.adaface import adaface_eval
\ No newline at end of file
......
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import time
import numpy as np
import platform
import paddle
import sklearn
from sklearn.model_selection import KFold
from sklearn.decomposition import PCA
from ppcls.utils.misc import AverageMeter
from ppcls.utils import logger


def fuse_features_with_norm(stacked_embeddings, stacked_norms):
assert stacked_embeddings.ndim == 3 # (n_features_to_fuse, batch_size, channel)
assert stacked_norms.ndim == 3 # (n_features_to_fuse, batch_size, 1)
pre_norm_embeddings = stacked_embeddings * stacked_norms
fused = pre_norm_embeddings.sum(axis=0)
norm = paddle.norm(fused, 2, 1, True)
fused = paddle.divide(fused, norm)
    return fused, norm


def adaface_eval(engine, epoch_id=0):
output_info = dict()
time_info = {
"batch_cost": AverageMeter(
"batch_cost", '.5f', postfix=" s,"),
"reader_cost": AverageMeter(
"reader_cost", ".5f", postfix=" s,"),
}
print_batch_step = engine.config["Global"]["print_batch_step"]
metric_key = None
tic = time.time()
unique_dict = {}
for iter_id, batch in enumerate(engine.eval_dataloader):
images, labels, dataname, image_index = batch
if iter_id == 5:
for key in time_info:
time_info[key].reset()
time_info["reader_cost"].update(time.time() - tic)
batch_size = images.shape[0]
batch[0] = paddle.to_tensor(images)
        embeddings = engine.model(images, labels)['features']
        # NOTE: the two names below end up swapped relative to their contents:
        # `norms` holds the L2-normalized features and `embeddings` the
        # per-row norms. Their product inside fuse_features_with_norm recovers
        # the raw features, so the fusion result is still correct.
        norms = paddle.divide(embeddings, paddle.norm(embeddings, 2, 1, True))
        embeddings = paddle.divide(embeddings, norms)
        flipped_images = paddle.flip(images, axis=[3])
        flipped_embeddings = engine.model(flipped_images, labels)['features']
        flipped_norms = paddle.divide(
            flipped_embeddings, paddle.norm(flipped_embeddings, 2, 1, True))
        flipped_embeddings = paddle.divide(flipped_embeddings, flipped_norms)
stacked_embeddings = paddle.stack(
[embeddings, flipped_embeddings], axis=0)
stacked_norms = paddle.stack([norms, flipped_norms], axis=0)
embeddings, norms = fuse_features_with_norm(stacked_embeddings,
stacked_norms)
for out, nor, label, data, idx in zip(embeddings, norms, labels,
dataname, image_index):
unique_dict[int(idx.numpy())] = {
'output': out,
'norm': nor,
'target': label,
'dataname': data
}
# calc metric
time_info["batch_cost"].update(time.time() - tic)
if iter_id % print_batch_step == 0:
time_msg = "s, ".join([
"{}: {:.5f}".format(key, time_info[key].avg)
for key in time_info
])
ips_msg = "ips: {:.5f} images/sec".format(
batch_size / time_info["batch_cost"].avg)
metric_msg = ", ".join([
"{}: {:.5f}".format(key, output_info[key].val)
for key in output_info
])
logger.info("[Eval][Epoch {}][Iter: {}/{}]{}, {}, {}".format(
epoch_id, iter_id,
len(engine.eval_dataloader), metric_msg, time_msg, ips_msg))
tic = time.time()
unique_keys = sorted(unique_dict.keys())
all_output_tensor = paddle.stack(
[unique_dict[key]['output'] for key in unique_keys], axis=0)
all_norm_tensor = paddle.stack(
[unique_dict[key]['norm'] for key in unique_keys], axis=0)
all_target_tensor = paddle.stack(
[unique_dict[key]['target'] for key in unique_keys], axis=0)
all_dataname_tensor = paddle.stack(
[unique_dict[key]['dataname'] for key in unique_keys], axis=0)
eval_result = cal_metric(all_output_tensor, all_norm_tensor,
all_target_tensor, all_dataname_tensor)
metric_msg = ", ".join([
"{}: {:.5f}".format(key, output_info[key].avg) for key in output_info
])
face_msg = ", ".join([
"{}: {:.5f}".format(key, eval_result[key])
for key in eval_result.keys()
])
logger.info("[Eval][Epoch {}][Avg]{}".format(epoch_id, metric_msg + ", " +
face_msg))
# return 1st metric in the dict
    return eval_result['all_test_acc']


def cal_metric(all_output_tensor, all_norm_tensor, all_target_tensor,
all_dataname_tensor):
all_target_tensor = all_target_tensor.reshape([-1])
all_dataname_tensor = all_dataname_tensor.reshape([-1])
dataname_to_idx = {
"agedb_30": 0,
"cfp_fp": 1,
"lfw": 2,
"cplfw": 3,
"calfw": 4
}
idx_to_dataname = {val: key for key, val in dataname_to_idx.items()}
test_logs = {}
# _, indices = paddle.unique(all_dataname_tensor, return_index=True, return_inverse=False, return_counts=False)
for dataname_idx in all_dataname_tensor.unique():
dataname = idx_to_dataname[dataname_idx.item()]
# per dataset evaluation
embeddings = all_output_tensor[all_dataname_tensor ==
dataname_idx].numpy()
labels = all_target_tensor[all_dataname_tensor == dataname_idx].numpy()
issame = labels[0::2]
tpr, fpr, accuracy, best_thresholds = evaluate_face(
embeddings, issame, nrof_folds=10)
acc, best_threshold = accuracy.mean(), best_thresholds.mean()
num_test_samples = len(embeddings)
test_logs[f'{dataname}_test_acc'] = acc
test_logs[f'{dataname}_test_best_threshold'] = best_threshold
test_logs[f'{dataname}_num_test_samples'] = num_test_samples
test_acc = np.mean([
test_logs[f'{dataname}_test_acc']
for dataname in dataname_to_idx.keys()
if f'{dataname}_test_acc' in test_logs
])
test_logs['all_test_acc'] = test_acc
    return test_logs


def evaluate_face(embeddings, actual_issame, nrof_folds=10, pca=0):
# Calculate evaluation metrics
thresholds = np.arange(0, 4, 0.01)
embeddings1 = embeddings[0::2]
embeddings2 = embeddings[1::2]
tpr, fpr, accuracy, best_thresholds = calculate_roc(
thresholds,
embeddings1,
embeddings2,
np.asarray(actual_issame),
nrof_folds=nrof_folds,
pca=pca)
    return tpr, fpr, accuracy, best_thresholds


def calculate_roc(thresholds,
embeddings1,
embeddings2,
actual_issame,
nrof_folds=10,
pca=0):
assert (embeddings1.shape[0] == embeddings2.shape[0])
assert (embeddings1.shape[1] == embeddings2.shape[1])
nrof_pairs = min(len(actual_issame), embeddings1.shape[0])
nrof_thresholds = len(thresholds)
k_fold = KFold(n_splits=nrof_folds, shuffle=False)
tprs = np.zeros((nrof_folds, nrof_thresholds))
fprs = np.zeros((nrof_folds, nrof_thresholds))
accuracy = np.zeros((nrof_folds))
best_thresholds = np.zeros((nrof_folds))
indices = np.arange(nrof_pairs)
# print('pca', pca)
dist = None
if pca == 0:
diff = np.subtract(embeddings1, embeddings2)
dist = np.sum(np.square(diff), 1)
for fold_idx, (train_set, test_set) in enumerate(k_fold.split(indices)):
# print('train_set', train_set)
# print('test_set', test_set)
if pca > 0:
print('doing pca on', fold_idx)
embed1_train = embeddings1[train_set]
embed2_train = embeddings2[train_set]
_embed_train = np.concatenate((embed1_train, embed2_train), axis=0)
# print(_embed_train.shape)
pca_model = PCA(n_components=pca)
pca_model.fit(_embed_train)
embed1 = pca_model.transform(embeddings1)
embed2 = pca_model.transform(embeddings2)
embed1 = sklearn.preprocessing.normalize(embed1)
embed2 = sklearn.preprocessing.normalize(embed2)
# print(embed1.shape, embed2.shape)
diff = np.subtract(embed1, embed2)
dist = np.sum(np.square(diff), 1)
# Find the best threshold for the fold
acc_train = np.zeros((nrof_thresholds))
for threshold_idx, threshold in enumerate(thresholds):
_, _, acc_train[threshold_idx] = calculate_accuracy(
threshold, dist[train_set], actual_issame[train_set])
best_threshold_index = np.argmax(acc_train)
best_thresholds[fold_idx] = thresholds[best_threshold_index]
for threshold_idx, threshold in enumerate(thresholds):
tprs[fold_idx, threshold_idx], fprs[
fold_idx, threshold_idx], _ = calculate_accuracy(
threshold, dist[test_set], actual_issame[test_set])
_, _, accuracy[fold_idx] = calculate_accuracy(
thresholds[best_threshold_index], dist[test_set],
actual_issame[test_set])
tpr = np.mean(tprs, 0)
fpr = np.mean(fprs, 0)
    return tpr, fpr, accuracy, best_thresholds


def calculate_accuracy(threshold, dist, actual_issame):
predict_issame = np.less(dist, threshold)
tp = np.sum(np.logical_and(predict_issame, actual_issame))
fp = np.sum(np.logical_and(predict_issame, np.logical_not(actual_issame)))
tn = np.sum(
np.logical_and(
np.logical_not(predict_issame), np.logical_not(actual_issame)))
fn = np.sum(np.logical_and(np.logical_not(predict_issame), actual_issame))
tpr = 0 if (tp + fn == 0) else float(tp) / float(tp + fn)
fpr = 0 if (fp + tn == 0) else float(fp) / float(fp + tn)
acc = float(tp + tn) / dist.size
return tpr, fpr, acc
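End to end, `evaluate_face` expects embeddings interleaved as pairs (rows 0::2 versus rows 1::2) with one `actual_issame` flag per pair, and searches thresholds per KFold split. A hedged usage sketch with random features standing in for real model outputs (the import assumes the new module added in this diff):

    import numpy as np
    from ppcls.engine.evaluation.adaface import evaluate_face

    rng = np.random.default_rng(0)
    feats = rng.standard_normal((40, 128)).astype("float32")  # 20 pairs
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)     # L2-normalize rows
    issame = np.array([True, False] * 10)                     # one flag per pair

    tpr, fpr, accuracy, best_thresholds = evaluate_face(
        feats, issame, nrof_folds=10)
    print(accuracy.mean(), best_thresholds.mean())  # chance-level for random features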
@@ -23,6 +23,8 @@ from ppcls.utils import logger


 def classification_eval(engine, epoch_id=0):
+    if hasattr(engine.eval_metric_func, "reset"):
+        engine.eval_metric_func.reset()
     output_info = dict()
     time_info = {
         "batch_cost": AverageMeter(
@@ -80,6 +82,7 @@ def classification_eval(engine, epoch_id=0):
         # gather Tensor when distributed
         if paddle.distributed.get_world_size() > 1:
             label_list = []
+
             paddle.distributed.all_gather(label_list, batch[1])
             labels = paddle.concat(label_list, 0)
@@ -121,18 +124,10 @@ def classification_eval(engine, epoch_id=0):
                     output_info[key] = AverageMeter(key, '7.5f')
                 output_info[key].update(loss_dict[key].numpy()[0],
                                         current_samples)
+
         # calc metric
         if engine.eval_metric_func is not None:
-            metric_dict = engine.eval_metric_func(preds, labels)
-            for key in metric_dict:
-                if metric_key is None:
-                    metric_key = key
-                if key not in output_info:
-                    output_info[key] = AverageMeter(key, '7.5f')
-                output_info[key].update(metric_dict[key].numpy()[0],
-                                        current_samples)
+            engine.eval_metric_func(preds, labels)
         time_info["batch_cost"].update(time.time() - tic)
         if iter_id % print_batch_step == 0:
@@ -144,10 +139,14 @@ def classification_eval(engine, epoch_id=0):
             ips_msg = "ips: {:.5f} images/sec".format(
                 batch_size / time_info["batch_cost"].avg)
-            metric_msg = ", ".join([
-                "{}: {:.5f}".format(key, output_info[key].val)
-                for key in output_info
-            ])
+
+            if "ATTRMetric" in engine.config["Metric"]["Eval"][0]:
+                metric_msg = ""
+            else:
+                metric_msg = ", ".join([
+                    "{}: {:.5f}".format(key, output_info[key].val)
+                    for key in output_info
+                ])
+                metric_msg += ", {}".format(engine.eval_metric_func.avg_info)
             logger.info("[Eval][Epoch {}][Iter: {}/{}]{}, {}, {}".format(
                 epoch_id, iter_id,
                 len(engine.eval_dataloader), metric_msg, time_msg, ips_msg))
@@ -155,13 +154,29 @@ def classification_eval(engine, epoch_id=0):
         tic = time.time()
     if engine.use_dali:
         engine.eval_dataloader.reset()
-    metric_msg = ", ".join([
-        "{}: {:.5f}".format(key, output_info[key].avg) for key in output_info
-    ])
-    logger.info("[Eval][Epoch {}][Avg]{}".format(epoch_id, metric_msg))
-
-    # do not try to save best eval.model
-    if engine.eval_metric_func is None:
-        return -1
-    # return 1st metric in the dict
-    return output_info[metric_key].avg
+
+    if "ATTRMetric" in engine.config["Metric"]["Eval"][0]:
+        metric_msg = ", ".join([
+            "evalres: ma: {:.5f} label_f1: {:.5f} label_pos_recall: {:.5f} label_neg_recall: {:.5f} instance_f1: {:.5f} instance_acc: {:.5f} instance_prec: {:.5f} instance_recall: {:.5f}".
+            format(*engine.eval_metric_func.attr_res())
+        ])
+        logger.info("[Eval][Epoch {}][Avg]{}".format(epoch_id, metric_msg))
+
+        # do not try to save best eval.model
+        if engine.eval_metric_func is None:
+            return -1
+        # return 1st metric in the dict
+        return engine.eval_metric_func.attr_res()[0]
+    else:
+        metric_msg = ", ".join([
+            "{}: {:.5f}".format(key, output_info[key].avg)
+            for key in output_info
+        ])
+        metric_msg += ", {}".format(engine.eval_metric_func.avg_info)
+        logger.info("[Eval][Epoch {}][Avg]{}".format(epoch_id, metric_msg))

+        # do not try to save best eval.model
+        if engine.eval_metric_func is None:
+            return -1
+        # return 1st metric in the dict
+        return engine.eval_metric_func.avg
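After this change, `classification_eval` no longer averages per-batch metric dicts itself; it assumes `engine.eval_metric_func` accumulates state across batches and exposes `reset()`, `avg`, `avg_info`, and (for attribute models) `attr_res()`. A hypothetical minimal container showing the protocol the code above relies on, not the real ppcls metric class:

    # Hypothetical sketch of the accumulating-metric protocol.
    class DummyMetricFunc:
        def __init__(self):
            self.reset()

        def reset(self):
            self._correct, self._total = 0, 0

        def __call__(self, preds, labels):
            # accumulate in place instead of returning a per-batch dict
            self._correct += sum(int(p == l) for p, l in zip(preds, labels))
            self._total += len(labels)

        @property
        def avg(self):
            return self._correct / max(self._total, 1)

        @property
        def avg_info(self):
            return "top1: {:.5f}".format(self.avg)

    m = DummyMetricFunc()
    m([0, 1, 1], [0, 1, 0])
    print(m.avg_info)  # top1: 0.66667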
@@ -3,16 +3,29 @@ import paddle.nn as nn
 import paddle.nn.functional as F


+def ratio2weight(targets, ratio):
+    pos_weights = targets * (1. - ratio)
+    neg_weights = (1. - targets) * ratio
+    weights = paddle.exp(neg_weights + pos_weights)
+
+    # for RAP dataloader, targets element may be 2, with or without smooth, some element must great than 1
+    weights = weights - weights * (targets > 1)
+
+    return weights
+
+
 class MultiLabelLoss(nn.Layer):
     """
     Multi-label loss
     """

-    def __init__(self, epsilon=None):
+    def __init__(self, epsilon=None, size_sum=False, weight_ratio=False):
         super().__init__()
         if epsilon is not None and (epsilon <= 0 or epsilon >= 1):
             epsilon = None
         self.epsilon = epsilon
+        self.weight_ratio = weight_ratio
+        self.size_sum = size_sum

     def _labelsmoothing(self, target, class_num):
         if target.ndim == 1 or target.shape[-1] != class_num:
@@ -24,13 +37,21 @@ class MultiLabelLoss(nn.Layer):
         return soft_target

     def _binary_crossentropy(self, input, target, class_num):
+        if self.weight_ratio:
+            target, label_ratio = target[:, 0, :], target[:, 1, :]
         if self.epsilon is not None:
             target = self._labelsmoothing(target, class_num)
-            cost = F.binary_cross_entropy_with_logits(
-                logit=input, label=target)
-        else:
-            cost = F.binary_cross_entropy_with_logits(
-                logit=input, label=target)
+        cost = F.binary_cross_entropy_with_logits(
+            logit=input, label=target, reduction='none')
+
+        if self.weight_ratio:
+            targets_mask = paddle.cast(target > 0.5, 'float32')
+            weight = ratio2weight(targets_mask, paddle.to_tensor(label_ratio))
+            weight = weight * (target > -1)
+            cost = cost * weight
+
+        if self.size_sum:
+            cost = cost.sum(1).mean() if self.size_sum else cost.mean()

         return cost
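`ratio2weight` converts per-label positive ratios into loss weights: a rare positive (small `ratio`) is scaled by `exp(1 - ratio)`, a rare negative by `exp(ratio)`, and padded label values greater than 1 are zeroed out. A quick numeric check, re-stating the function from the hunk above:

    import paddle

    def ratio2weight(targets, ratio):  # as added in the diff above
        pos_weights = targets * (1. - ratio)
        neg_weights = (1. - targets) * ratio
        weights = paddle.exp(neg_weights + pos_weights)
        weights = weights - weights * (targets > 1)
        return weights

    ratio = paddle.to_tensor([0.1, 0.9])             # 10% vs 90% positives
    targets = paddle.to_tensor([[1., 1.], [0., 0.]])
    print(ratio2weight(targets, ratio).numpy())
    # row 0 (positives): exp(0.9) ~ 2.46 for the rare label, exp(0.1) ~ 1.11
    # row 1 (negatives): exp(0.1) ~ 1.11 and exp(0.9) ~ 2.46 respectively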
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
@@ -439,8 +439,7 @@ def run(dataloader,
         logger.info("END {:s} {:s} {:s}".format(mode, end_str, ips_info))
     else:
         end_epoch_str = "END epoch:{:<3d}".format(epoch)
-        logger.info("{:s} {:s} {:s} {:s}".format(end_epoch_str, mode, end_str,
-                                                 ips_info))
+        logger.info("{:s} {:s} {:s}".format(end_epoch_str, mode, end_str))
     if use_dali:
         dataloader.reset()
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.