Commit 4c631565 authored by jack

Merge branch 'develop' of github.com:joey12300/PaddleX into develop

@@ -6,6 +6,7 @@
[![Version](https://img.shields.io/github/release/PaddlePaddle/PaddleX.svg)](https://github.com/PaddlePaddle/PaddleX/releases)
![python version](https://img.shields.io/badge/python-3.6+-orange.svg)
![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)
![QQGroup](https://img.shields.io/badge/QQ_Group-1045148026-52B6EF?style=social&logo=tencent-qq&logoColor=000&logoWidth=20)
PaddleX is a full-pipeline deep learning development tool built on PaddlePaddle's core framework, development kits, and utility components. It has three key strengths: **end-to-end pipeline**, **industry-proven practice**, and **easy to use and integrate**.
......
@@ -13,7 +13,7 @@
> You can use model pruning; see the tutorial [model pruning tutorial](slim/prune.md). By adjusting the pruning parameters you can control the pruned model's size. In practice, e.g. on VOC detection data with yolov3-mobilenet, the original model is XXM and XX M after pruning, with accuracy essentially unchanged
## 4. How to configure the number of GPU cards used during training
> Export an environment variable in the terminal, or set it in Python code; see the document [CPU / multi-GPU training](appendix/gpu_configure.md)
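A minimal sketch of both approaches (the card indices are illustrative):

```python
# in the shell, before launching training:
#   export CUDA_VISIBLE_DEVICES=0,1
# or in Python, before importing paddlex:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'  # train on GPU cards 0 and 1
```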
## 5. How to continue training from previously trained model weights
> When calling the `train` interface, set `pretrain_weights` to the save path of the previous model
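For example (a sketch; the paths and the other `train` arguments are illustrative, following the training scripts later in this diff):

```python
model.train(
    num_epochs=40,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    pretrain_weights='output/last_run/best_model',  # save path of the previous model
    save_dir='output/finetune')
```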
@@ -52,7 +52,7 @@
> 1. When training on your own and unsure how many epochs to run, you can set the epoch count fairly high and make sure to set `save_interval_epochs`: the model is then evaluated on the validation set and saved every that many epochs, and from the validation metrics at different epochs you can judge whether the model has converged; once it has, you can end the training process yourself
>
## 9. Only a CPU and no GPU: how to speed up training
> Without a GPU, you can choose whether to train on multiple CPUs depending on your CPU configuration; see the document [multi-CPU/GPU training](appendix/gpu_configure.md)
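A sketch, assuming PaddlePaddle's `CPU_NUM` environment variable described in the linked document:

```python
import os
os.environ['CPU_NUM'] = '4'  # assumed variable: number of CPU cores used for training
```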
>
## 10. The machine cannot access the internet and training fails because the pretrained model download fails. What to do?
> Prepare the pretrained model in advance by other means, then point a custom `pretrain_weights` at it when training; see the document [training without internet access](how_to_offline_run.md)
@@ -61,8 +61,8 @@
> 1. You can solve this in the same way as in question 9
> 2. Set the `paddlex.pretrain_dir` path before each training run, e.g. `paddlex.pretrain_dir='/usrname/paddlex'`; downloaded pretrained models are then stored under `/usrname/paddlex`, and models already in that directory are not downloaded again
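For example:

```python
import paddlex
# pretrained models are downloaded to, and reused from, this directory
paddlex.pretrain_dir = '/usrname/paddlex'
```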
## 12. What to do when PaddleX GUI fails to start with "Failed to execute script PaddleX"?
> 1. Check whether the path of the PaddleX program on the target machine contains Chinese characters. Chinese paths are not currently supported; try moving the program to a directory with an English-only path.
> 2. If your system is Windows 7 or Windows Server 2012, the cause is missing DLLs that OpenCV depends on, such as MFPlat.DLL/MF.dll/MFReadWrite.dll. Install the Desktop Experience as follows: open Server Manager via "My Computer" --> "Properties" --> "Manage", click "Manage" in the top-right corner and choose "Add Roles and Features". Click "Server Selection" --> "Features", scroll to the bottom, expand "User Interfaces and Infrastructure", check "Desktop Experience", and click "Install". After the installation completes, try running PaddleX again.
> 3. Check whether another PaddleX program or process is running on the target machine; if so, exit it, or reboot the machine and see whether that resolves the issue
> 4. Confirm that the user running the program has administrator privileges; if not, try running it as administrator
\ No newline at end of file
@@ -80,7 +80,7 @@ predict(self, img_file, transforms=None, topk=5)
## Other classifier classes
PaddleX provides 22 classifiers in total; all of them offer the same training `train`, evaluation `evaluate`, and prediction `predict` interfaces as `ResNet50`. Each model's performance is listed in the [model zoo](https://paddlex.readthedocs.io/zh_CN/latest/appendix/model_zoo.html)
### ResNet18
```python
......
@@ -186,10 +186,10 @@ paddlex.seg.HRNet(num_classes=2, width=18, use_bce_loss=False, use_dice_loss=Fal
> **Parameters**
> > - **num_classes** (int): Number of classes.
> > - **width** (int|str): Number of channels of the feature maps in the high-resolution branch. Defaults to 18. Valid values are [18, 30, 32, 40, 44, 48, 60, 64, '18_small_v1']; '18_small_v1' is a lightweight version of 18.
> > - **use_bce_loss** (bool): Whether to use BCE loss as the network's loss function; only usable for two-class segmentation. Can be combined with dice loss. Defaults to False.
> > - **use_dice_loss** (bool): Whether to use dice loss as the network's loss function; only usable for two-class segmentation, can be combined with BCE loss. When both use_bce_loss and use_dice_loss are False, cross-entropy loss is used. Defaults to False.
> > - **class_weight** (list|str): Per-class weights in the cross-entropy loss. When `class_weight` is a list, its length should equal `num_classes`. When `class_weight` is a str, weight.lower() should be 'dynamic', in which case the weights are recomputed each epoch from the per-class pixel proportions, each class's weight being: class proportion * num_classes. When class_weight takes its default value None, every class has weight 1, i.e. the usual cross-entropy loss.
> > - **ignore_index** (int): Value ignored in the labels; pixels labeled `ignore_index` do not contribute to the loss. Defaults to 255.
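For example, the lightweight variant used by the HumanSeg scripts later in this diff is constructed like this (a minimal sketch):

```python
import paddlex as pdx

# '18_small_v1' selects the lightweight version of width=18 described above
model = pdx.seg.HRNet(num_classes=2, width='18_small_v1')
```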
### train (training interface)
......
@@ -200,7 +200,7 @@ ComposedSegTransforms.add_augmenters(augmenters)
import paddlex as pdx
from paddlex.seg import transforms
train_transforms = transforms.ComposedSegTransforms(mode='train', train_crop_size=[512, 512])
eval_transforms = transforms.ComposedSegTransforms(mode='eval')
# add data augmentation
import imgaug.augmenters as iaa
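# The diff truncates the example here; a minimal sketch of attaching the
# imgaug augmenters via add_augmenters (the augmenter choices below are
# illustrative, not from the original):
train_transforms.add_augmenters([
    iaa.Fliplr(0.5),                   # horizontal flip with probability 0.5
    iaa.GaussianBlur(sigma=(0, 1.0)),  # mild random blur
])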
......
@@ -146,10 +146,11 @@ paddlex.interpret.normlime(img_file,
dataset=None,
num_samples=3000,
batch_size=50,
save_dir='./',
normlime_weights_file=None)
```
Visualize the interpretability of model prediction results with the NormLIME algorithm.
NormLIME uses a number of samples to derive a global explanation. Since NormLIME is computationally expensive, a simplified approach is taken here: a number of test samples are used (currently all test samples by default); features are extracted from each sample and mapped into a common feature space; then, with these features as input and the model outputs as output, a linear regression is fitted, giving a global input-output relationship. When a test sample is later explained, this global NormLIME explanation is used to filter the LIME result, making the final visualization more stable.
**Note:** Interpretability visualization currently supports classification models only.
@@ -159,9 +160,10 @@ NormLIME uses a number of samples to derive a global explanation. NormLIME
>* **dataset** (paddlex.datasets): Dataset reader. Defaults to None.
>* **num_samples** (int): Number of samples LIME draws to learn the linear model. Defaults to 3000.
>* **batch_size** (int): Batch size for prediction. Defaults to 50.
>* **save_dir** (str): Directory where the interpretability visualizations (saved as PNG files) and intermediate files are stored.
>* **normlime_weights_file** (str): NormLIME initialization file name. If it does not exist, the weights are computed once and saved to this path; if it exists, they are loaded directly.
**Note:** `dataset` reads a dataset, which should not be too large (otherwise computation takes long) but should contain data of every class. NormLIME interpretability visualization currently supports classification models only.
### Usage example
> See the [code](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/interpret/normlime.py) for how to visualize prediction interpretability.
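A minimal call sketch (the diff elides the leading parameters of the signature; following the linked tutorial, a loaded classification model is assumed as the second positional argument, and all paths are illustrative):

```python
import paddlex as pdx

model = pdx.load_model('output/best_model')  # assumed trained classification model
pdx.interpret.normlime(
    'test.jpg',                # image to explain
    model,
    dataset=eval_dataset,      # assumed paddlex.datasets reader covering all classes
    num_samples=3000,
    batch_size=50,
    save_dir='./interpret',
    normlime_weights_file='./normlime_weights.npy')  # computed once, reused afterwards
```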
@@ -7,6 +7,7 @@
:caption: Contents:
model_zoo.md
slim_model_zoo.md
metrics.md
interpret.md
parameters.md
......
@@ -20,9 +20,20 @@ How to use LIME is shown in the [code example](https://github.com/PaddlePaddle/Paddl
## NormLIME
NormLIME improves on LIME. LIME's explanation is local, specific to the current sample, whereas NormLIME uses a number of samples to produce a global explanation for the current sample, which has some denoising effect. Its steps are as follows:
1. Download the Kmeans model parameters and the parameters of the first three layers of the ResNet50_vc network. (The ResNet50_vc parameters come from a network trained on ImageNet. Using ImageNet images as the dataset, for each image the mean feature at each superpixel location and the centroid feature are extracted from the output of ResNet50_vc's third layer; the Kmeans model used here is trained on these.)
2. Compute the normlime weight information from the data in the test set (if there is no test set, the validation set can be used instead):
   For each image:
   (1) Obtain the image's superpixels.
   (2) Extract third-layer features with ResNet50_vc and, for each superpixel location, combine the centroid feature and the mean feature into `F`.
   (3) Feed `F` to the Kmeans model and compute the cluster center of each superpixel location.
   (4) Predict the image's `label` with the trained classification model.
   Across all images:
   (1) Take as input the vector formed from each image's cluster-center information (1 if a cluster center appears in that image, 0 otherwise),
   with the predicted `label` as output, and fit a logistic regression function `regression_func`.
   (2) From `regression_func`, obtain each cluster center's weight under each class, and normalize the weights.
3. Use the Kmeans model to obtain the cluster center of each superpixel of the image to be visualized.
4. Randomly mask the superpixels of the image to be visualized to construct new images.
5. Predict the label of each constructed image with the prediction model.
6. Using the normlime weight information, each superpixel receives several weights; take the highest as its final weight, and use it to explain the model.
How to use NormLIME is shown in the [code example](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/interpret/normlime.py) and the [API description](../apis/visualize.html#normlime). The `num_samples` parameter is especially important: it is the number of random samples drawn in step 2 above; too small a value hurts the stability of the interpretability result, while too large a value makes step 3 take a long time. `batch_size` is the prediction batch size used in step 3; too small a value makes step 3 slow, and its upper limit depends on the machine configuration. `dataset` is the data constructed from the test or validation set.
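A toy sketch of the global regression over all images in step 2 (the data and names are hypothetical; scikit-learn's LogisticRegression stands in for `regression_func`):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# X[i, k] = 1 if cluster center k appears in image i, else 0
X = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 1]])
y = np.array([0, 1, 0])  # labels predicted by the classification model

regression_func = LogisticRegression().fit(X, y)

# weight of each cluster center for each class, normalized per class
weights = regression_func.coef_
weights = weights / np.abs(weights).sum(axis=1, keepdims=True)
```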
......
@@ -40,8 +40,8 @@
|[FasterRCNN-ResNet101](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_1x.tar)| 212.5MB | 582.911 | 38.3 |
|[FasterRCNN-ResNet50-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_fpn_1x.tar)| 167.7MB | 83.189 | 37.2 |
|[FasterRCNN-ResNet50_vd-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_2x.tar)|167.8MB | 128.277 | 38.9 |
|[FasterRCNN-ResNet101-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_fpn_1x.tar)| 244.2MB | 119.788 | 38.7 |
|[FasterRCNN-ResNet101_vd-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_vd_fpn_2x.tar) |244.3MB | 156.097 | 40.5 |
|[FasterRCNN-HRNet_W18-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_hrnetv2p_w18_1x.tar) |115.5MB | 81.592 | 36 |
|[YOLOv3-DarkNet53](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_darknet.tar)|249.2MB | 42.672 | 38.9 |
|[YOLOv3-MobileNetV1](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1.tar) |99.2MB | 15.442 | 29.3 |
......
# PaddleX compressed model zoo
## Image classification
Dataset: ImageNet-1000
### Quantization

| Model | Compression strategy | Top-1 accuracy | Model size | TensorRT latency (V100, ms) |
|:--:|:---:|:--:|:--:|:--:|
| MobileNetV1 | None | 70.99% | 17MB | - |
| MobileNetV1 | Quantized | 70.18% (-0.81%) | 4.4MB | - |
| MobileNetV2 | None | 72.15% | 15MB | - |
| MobileNetV2 | Quantized | 71.15% (-1%) | 4.0MB | - |
| ResNet50 | None | 76.50% | 99MB | 2.71 |
| ResNet50 | Quantized | 76.33% (-0.17%) | 25.1MB | 1.19 |
Paddle-Lite latency of classification models (ms)

| Device | Model | Compression strategy | armv7 Thread 1 | armv7 Thread 2 | armv7 Thread 4 | armv8 Thread 1 | armv8 Thread 2 | armv8 Thread 4 |
| ------- | ----------- | ------------- | -------------- | -------------- | -------------- | -------------- | -------------- | -------------- |
| Snapdragon 835 | MobileNetV1 | None | 96.1942 | 53.2058 | 32.4468 | 88.4955 | 47.95 | 27.5189 |
| Snapdragon 835 | MobileNetV1 | Quantized | 60.5615 | 32.4016 | 16.6596 | 56.5266 | 29.7178 | 15.1459 |
| Snapdragon 835 | MobileNetV2 | None | 65.715 | 38.1346 | 25.155 | 61.3593 | 36.2038 | 22.849 |
| Snapdragon 835 | MobileNetV2 | Quantized | 48.3495 | 30.3069 | 22.1506 | 45.8715 | 27.4105 | 18.2223 |
| Snapdragon 835 | ResNet50 | None | 526.811 | 319.6486 | 205.8345 | 506.1138 | 335.1584 | 214.8936 |
| Snapdragon 835 | ResNet50 | Quantized | 476.0507 | 256.5963 | 139.7266 | 461.9176 | 248.3795 | 149.353 |
| Snapdragon 855 | MobileNetV1 | None | 33.5086 | 19.5773 | 11.7534 | 31.3474 | 18.5382 | 10.0811 |
| Snapdragon 855 | MobileNetV1 | Quantized | 37.0498 | 21.7081 | 11.0779 | 14.0947 | 8.1926 | 4.2934 |
| Snapdragon 855 | MobileNetV2 | None | 25.0396 | 15.2862 | 9.6609 | 22.909 | 14.1797 | 8.8325 |
| Snapdragon 855 | MobileNetV2 | Quantized | 28.1631 | 18.3917 | 11.8333 | 16.9399 | 11.1772 | 7.4176 |
| Snapdragon 855 | ResNet50 | None | 185.3705 | 113.0825 | 87.0741 | 177.7367 | 110.0433 | 74.4114 |
| Snapdragon 855 | ResNet50 | Quantized | 328.2683 | 201.9937 | 106.744 | 242.6397 | 150.0338 | 79.8659 |
| Kirin 970 | MobileNetV1 | None | 101.2455 | 56.4053 | 35.6484 | 94.8985 | 51.7251 | 31.9511 |
| Kirin 970 | MobileNetV1 | Quantized | 62.4412 | 32.2585 | 16.6215 | 57.825 | 29.2573 | 15.1206 |
| Kirin 970 | MobileNetV2 | None | 70.4176 | 42.0795 | 25.1939 | 68.9597 | 39.2145 | 22.6617 |
| Kirin 970 | MobileNetV2 | Quantized | 53.0961 | 31.7987 | 21.8334 | 49.383 | 28.2358 | 18.3642 |
| Kirin 970 | ResNet50 | None | 586.8943 | 344.0858 | 228.2293 | 573.3344 | 351.4332 | 225.8006 |
| Kirin 970 | ResNet50 | Quantized | 489.6188 | 258.3279 | 142.6063 | 480.0064 | 249.5339 | 138.5284 |
### Pruning
PaddleLite inference latency notes:
Environment: Qualcomm SnapDragon 845 + armv8
Speed metric: Thread1/Thread2/Thread4 latency

| Model | Compression strategy | Top-1 | Model size | PaddleLite latency | TensorRT speed (FPS) |
|:--:|:---:|:--:|:--:|:--:|:--:|
| MobileNetV1 | None | 70.99% | 17MB | 66.052\35.8014\19.5762 | - |
| MobileNetV1 | Pruned -30% | 70.4% (-0.59%) | 12MB | 46.5958\25.3098\13.6982 | - |
| MobileNetV1 | Pruned -50% | 69.8% (-1.19%) | 9MB | 37.9892\20.7882\11.3144 | - |
## Object detection
### Quantization
Dataset: COCO2017

| Model | Compression strategy | Dataset | Images/GPU | Box AP (input 608) | Model size | TensorRT latency (V100, ms) |
| :----------------------------: | :---------: | :----: | :-------: | :------------: | :------------: | :----------: |
| MobileNet-V1-YOLOv3 | None | COCO | 8 | 29.3 | 95MB | - |
| MobileNet-V1-YOLOv3 | Quantized | COCO | 8 | 27.9 (-1.4) | 25MB | - |
| R34-YOLOv3 | None | COCO | 8 | 36.2 | 162MB | - |
| R34-YOLOv3 | Quantized | COCO | 8 | 35.7 (-0.5) | 42.7MB | - |
### Pruning
Dataset: Pascal VOC & COCO2017
PaddleLite inference latency notes:
Environment: Qualcomm SnapDragon 845 + armv8
Speed metric: Thread1/Thread2/Thread4 latency

| Model | Compression strategy | Dataset | Images/GPU | Box mmAP (input 608) | Model size | PaddleLite latency (ms) (608*608) | TensorRT speed (FPS) (608*608) |
| :----------------------------: | :---------------: | :--------: | :-------: | :------------: | :----------: | :--------------: | :--------------: |
| MobileNet-V1-YOLOv3 | None | Pascal VOC | 8 | 76.2 | 94MB | 1238\796.943\520.101 | 60.04 |
| MobileNet-V1-YOLOv3 | Pruned -52.88% | Pascal VOC | 8 | 77.6 (+1.4) | 31MB | 602.497\353.759\222.427 | 99.36 |
| MobileNet-V1-YOLOv3 | None | COCO | 8 | 29.3 | 95MB | - | - |
| MobileNet-V1-YOLOv3 | Pruned -51.77% | COCO | 8 | 26.0 (-3.3) | 32MB | - | 73.93 |
## Semantic segmentation
Dataset: Cityscapes
### Quantization

| Model | Compression strategy | mIoU | Model size |
| :--------------------: | :---------: | :-----------: | :------------: |
| DeepLabv3-MobileNetv2 | None | 69.81 | 7.4MB |
| DeepLabv3-MobileNetv2 | Quantized | 67.59 (-2.22) | 2.1MB |

Paddle-Lite latency of segmentation models (ms), input size 769 x 769

| Device | Model | Compression strategy | armv7 Thread 1 | armv7 Thread 2 | armv7 Thread 4 | armv8 Thread 1 | armv8 Thread 2 | armv8 Thread 4 |
| ------- | ---------------------- | ------------- | -------------- | -------------- | -------------- | -------------- | -------------- | -------------- |
| Snapdragon 835 | Deeplabv3-MobileNetV2 | None | 1282.8126 | 793.2064 | 653.6538 | 1193.9908 | 737.1827 | 593.4522 |
| Snapdragon 835 | Deeplabv3-MobileNetV2 | Quantized | 981.44 | 658.4969 | 538.6166 | 885.3273 | 586.1284 | 484.0018 |
| Snapdragon 855 | Deeplabv3-MobileNetV2 | None | 639.4425 | 390.1851 | 322.7014 | 477.7667 | 339.7411 | 262.2847 |
| Snapdragon 855 | Deeplabv3-MobileNetV2 | Quantized | 705.7589 | 474.4076 | 427.2951 | 394.8352 | 297.4035 | 264.6724 |
| Kirin 970 | Deeplabv3-MobileNetV2 | None | 1771.1301 | 1746.0569 | 1222.4805 | 1448.9739 | 1192.4491 | 760.606 |
| Kirin 970 | Deeplabv3-MobileNetV2 | Quantized | 1320.386 | 918.5328 | 672.2481 | 1020.753 | 820.094 | 591.4114 |
### Pruning
PaddleLite inference latency notes:
Environment: Qualcomm SnapDragon 845 + armv8
Speed metric: Thread1/Thread2/Thread4 latency

| Model | Compression method | mIoU | Model size | PaddleLite latency | TensorRT speed (FPS) |
| :-------: | :---------------: | :-----------: | :------: | :------------: | :----: |
| FastSCNN | None | 69.64 | 11MB | 1226.36\682.96\415.664 | 39.53 |
| FastSCNN | Pruned -47.60% | 66.68 (-2.96) | 5.7MB | 866.693\494.467\291.748 | 51.48 |
# Overview of PaddleX vision solutions
For the four vision tasks of image classification, object detection, instance segmentation, and semantic segmentation, PaddleX provides solutions covering model selection, compression strategy selection, and deployment option selection. Users choose a suitable model for their needs, choose a compression strategy to reduce the model's computation and storage size and speed up inference, and finally choose a deployment option to deploy the model on mobile devices or servers.
## Model selection
### Image classification
Image classification takes an image as input, and the model predicts its category, e.g. scenery, animal, or car.
![](./images/image_classification.png)
For image classification, PaddleX provides Baidu-improved models for different application scenarios, as shown in the table below:
> The GPU inference speeds in the table were measured with the PaddlePaddle Python inference API (test GPU: Nvidia Tesla P40).
> The CPU inference speeds in the table (test CPU model: ).
> The Snapdragon 855 inference speeds in the table were measured on a phone with a Snapdragon 855 processor.
> For the speed tests the model input size is 224 x 224; Top1 accuracy is evaluated on the ImageNet-1000 dataset.
| Model | Characteristics | Model size | GPU inference speed (ms) | CPU (x86) inference speed (ms) | Snapdragon 855 (ARM) inference speed (ms) | Top1 accuracy |
| :--------- | :------ | :---------- | :-----------| :------------- | :------------- |:--- |
| MobileNetV3_small_ssld | Lightweight and fast; for real-time mobile scenarios that prioritize speed | 12.5MB | 7.08837 | - | 6.546 | 71.3% |
| ShuffleNetV2 | Lightweight model with relatively lower accuracy; for real-time mobile scenarios that need an even smaller model size | 10.2MB | 15.40 | - | 10.941 | 68.8% |
| MobileNetV3_large_ssld | Lightweight model without a large storage advantage; moderate in both speed and accuracy, suitable for mobile scenarios | 22.8MB | 8.06651 | - | 19.803 | 79.0% |
| MobileNetV2 | Lightweight model; for mobile scenarios that run inference on a GPU | 15.0MB | 5.92667 | - | 23.318 | 72.2% |
| ResNet50_vd_ssld | High-accuracy model with short inference time; suitable for most server-side scenarios | 103.5MB | 7.79264 | - | - | 82.4% |
| ResNet101_vd_ssld | Very high-accuracy model with relatively long inference time; for server-side scenarios with large data volumes | 180.5MB | 13.34580 | - | - | 83.7% |
| Xception65 | Very high-accuracy model with longer inference time and high accuracy on larger data volumes; for server-side scenarios | 161.6MB | 13.87017 | - | - | 80.3% |
Including the above, PaddleX supports nearly 20 image classification models; the remaining models are listed in the [PaddleX model zoo](../appendix/model_zoo.md)
### Object detection
Object detection takes an image as input; the model identifies the locations of objects in the image (framed by rectangular boxes whose positions are given) as well as their categories, e.g. detecting surface defects in quality inspection of phone parts.
![](./images/object_detection.png)
For object detection, PaddleX provides the mainstream YOLOv3 and Faster-RCNN models for different application scenarios, as shown in the table below:
> The GPU inference speeds in the table were measured with the PaddlePaddle Python inference API (test GPU: Nvidia Tesla P40).
> The CPU inference speeds in the table (test CPU model: ).
> The Snapdragon 855 inference speeds in the table were measured on a phone with a Snapdragon 855 processor.
> For the speed tests, the YOLOv3 input size is 608 x 608 and the FasterRCNN input size is 800 x 1333; Box mmAP is evaluated on the COCO2017 dataset.

| Model | Characteristics | Model size | GPU inference speed | CPU (x86) inference speed (ms) | Snapdragon 855 (ARM) inference speed (ms) | Box mmAP |
| :------- | :------- | :--------- | :---------- | :------------- | :------------- |:--- |
| YOLOv3-MobileNetV3_large | For mobile scenarios that prioritize fast inference | 100.7MB | 143.322 | - | - | 31.6 |
| YOLOv3-MobileNetV1 | Relatively lower accuracy; for server-side scenarios that prioritize fast inference | 99.2MB | 15.422 | - | - | 29.3 |
| YOLOv3-DarkNet53 | Performs well on both speed and accuracy; suitable for most server-side scenarios | 249.2MB | 42.672 | - | - | 38.9 |
| FasterRCNN-ResNet50-FPN | Classic two-stage detector with relatively slow inference; for server-side scenarios that prioritize model accuracy | 167.7MB | 83.189 | - | - | 37.2 |
| FasterRCNN-HRNet_W18-FPN | For server-side scenarios that are sensitive to image resolution and require finer prediction of object details | 115.5MB | 81.592 | - | - | 36 |
| FasterRCNN-ResNet101_vd-FPN | Very high-accuracy model with longer inference time and high accuracy on larger data volumes; for server-side scenarios | 244.3MB | 156.097 | - | - | 40.5 |
Beyond the above models, YOLOv3 and Faster RCNN also support other backbones; see the [PaddleX model zoo](../appendix/model_zoo.md) for details
### Instance segmentation
In object detection, the model identifies the locations and categories of objects in an image. Instance segmentation builds on object detection with pixel-level classification, identifying which pixels inside each box belong to the target object.
![](./images/instance_segmentation.png)
PaddleX currently provides the MaskRCNN instance segmentation model, supporting 5 different backbone networks; see the [PaddleX model zoo](../appendix/model_zoo.md) for details
> The GPU inference speeds in the table were measured with the PaddlePaddle Python inference API (test GPU: Nvidia Tesla P40).
> The CPU inference speeds in the table (test CPU model: ).
> The Snapdragon 855 inference speeds in the table were measured on a phone with a Snapdragon 855 processor.
> For the speed tests, the MaskRCNN input size is 800 x 1333; Box mmAP and Seg mmAP are evaluated on the COCO2017 dataset.

| Model | Characteristics | Model size | GPU inference speed | CPU (x86) inference speed (ms) | Snapdragon 855 (ARM) inference speed (ms) | Box mmAP | Seg mmAP |
| :---- | :------- | :---------- | :---------- | :----- | :----- | :--- |:--- |
| MaskRCNN-HRNet_W18-FPN | For server-side scenarios that are sensitive to image resolution and require finer prediction of object details | - | - | - | - | 37.0 | 33.4 |
| MaskRCNN-ResNet50-FPN | Higher accuracy; suitable for most server-side scenarios | 185.5M | - | - | - | 37.9 | 34.2 |
| MaskRCNN-ResNet101_vd-FPN | High accuracy but longer inference time, with high accuracy on larger data volumes; for server-side scenarios | 268.6M | - | - | - | 41.4 | 36.8 |
### Semantic segmentation
Semantic segmentation performs pixel-level classification of an image, used in scenarios such as human segmentation and remote-sensing image recognition.
![](./images/semantic_segmentation.png)
For semantic segmentation, PaddleX also provides different model choices for different application scenarios, as shown in the table below:
> The GPU inference speeds in the table were measured with the PaddlePaddle Python inference API (test GPU: Nvidia Tesla P40).
> The CPU inference speeds in the table (test CPU model: ).
> The Snapdragon 855 inference speeds in the table were measured on a phone with a Snapdragon 855 processor.
> For the speed tests the model input size is 1024 x 2048; mIOU is evaluated on the Cityscapes dataset.

| Model | Characteristics | Model size | GPU inference speed | CPU (x86) inference speed (ms) | Snapdragon 855 (ARM) inference speed (ms) | mIOU |
| :---- | :------- | :---------- | :---------- | :----- | :----- |:--- |
| DeepLabv3p-MobileNetV2_x1.0 | Lightweight model; for mobile scenarios | - | - | - | - | 69.8% |
| HRNet_W18_Small_v1 | Lightweight and fast; for mobile scenarios | - | - | - | - | - |
| FastSCNN | Lightweight and fast; for mobile or server-side scenarios that prioritize fast inference | - | - | - | - | 69.64 |
| HRNet_W18 | High-accuracy model; for server-side scenarios that are sensitive to image resolution and require finer prediction of object details | - | - | - | - | 79.36 |
| DeepLabv3p-Xception65 | High accuracy but longer inference time, with high accuracy on larger data volumes; for server-side scenarios with complex backgrounds | - | - | - | - | 79.3% |
## Compression strategy selection
PaddleX provides model compression strategies, including model pruning and fixed-point quantization, to reduce a model's computation and storage size and speed up inference after deployment. The model accuracy and inference speed obtained with the different compression strategies on image classification, object detection, and semantic segmentation models are detailed below; users can choose a suitable strategy for their needs to further optimize model performance.

| Compression strategy | Characteristics |
| :---- | :------- |
| Quantization | Reduces the model's storage size fairly significantly; suitable for mobile deployment or server-side TensorRT deployment, with a clear speedup for MobileNet-family models on mobile |
| Pruning | Removes redundant parameters, significantly reducing parameter computation and model size and improving inference performance; suitable for CPU or mobile deployment (no obvious speedup on GPU) |
| Pruning then quantization | Can further improve inference performance; suitable for mobile or server-side TensorRT deployment |
### Performance comparison
* Each metric in the table is formatted as XXX/YYY, where XXX is the metric without compression and YYY is the metric after compression
* Classification accuracy is Top1 accuracy on ImageNet-1000 (model input size 224x224); detection accuracy is mmAP on COCO2017 (model input size 608x608); segmentation accuracy is mIOU on Cityscapes (model input size 769x769)
* For quantization, the Paddle-Lite inference environment is Qualcomm SnapDragon 855 + armv8, and the speed metric is Thread4 latency
* For pruning, the Paddle-Lite inference environment is Qualcomm SnapDragon 845 + armv8, and the speed metric is Thread4 latency

| Model | Compression strategy | Model size (MB) | Accuracy (%) | PaddleLite latency (ms) |
| :--: | :------: | :------: | :----: | :----------------: |
| MobileNetV1 | Quantization | 17/4.4 | 70.99/70.18 | 10.0811/4.2934 |
| MobileNetV1 | Pruning -30% | 17/12 | 70.99/70.4 | 19.5762/13.6982 |
| YOLOv3-MobileNetV1 | Quantization | 95/25 | 29.3/27.9 | - |
| YOLOv3-MobileNetV1 | Pruning -51.77% | 95/25 | 29.3/26 | - |
| Deeplabv3-MobileNetV2 | Quantization | 7.4/1.8 | 63.26/62.03 | 593.4522/484.0018 |
| FastSCNN | Pruning -47.60% | 11/5.7 | 69.64/66.68 | 415.664/291.748 |
For before/after metrics of more models on different devices, see the [PaddleX compressed model zoo](appendix/slim_model_zoo.md)
For how to apply the compression strategies, see [model compression](tutorials/compress)
**Note: all image classification and semantic segmentation models in PaddleX support quantization and pruning; among detection models, only YOLOv3 supports quantization and pruning.**
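As a concrete example, the offline-quantization helper used by the HumanSeg scripts later in this diff can be invoked like this (a sketch; the model path and calibration dataset are illustrative):

```python
import paddlex as pdx

model = pdx.load_model('output/best_model')  # trained model to quantize
# eval_dataset: a paddlex.datasets dataset used as calibration data
pdx.slim.export_quant_model(
    model, eval_dataset, batch_size=1, batch_nums=10,
    save_dir='./quant_model')
```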
## Model deployment
PaddleX provides 5 deployment options: server-side Python deployment, server-side C++ deployment, server-side encrypted deployment, OpenVINO deployment, and mobile deployment. Choose the option that suits your needs, and follow the links below for the specific workflow.

| Deployment option | Workflow |
| :------: | :------: |
| Server-side Python deployment | [Workflow](tutorials/deploy/deploy_server/deploy_python.html) |
| Server-side C++ deployment | [Workflow](tutorials/deploy/deploy_server/deploy_cpp/) |
| Server-side encrypted deployment | [Workflow](tutorials/deploy/deploy_server/encryption.html) |
| OpenVINO deployment | [Workflow](tutorials/deploy/deploy_openvino.html) |
| Mobile deployment | [Workflow](tutorials/deploy/deploy_lite.html) |
docs/images/lime.png (image replaced: 802.8 KB -> 423.0 KB)
docs/images/normlime.png (image replaced: 1.5 MB -> 546.2 KB)
@@ -21,7 +21,7 @@ step 2: Export the PaddleX model as an inference model
step 3: Convert the inference model into a Paddle-Lite model
```
python /path/to/PaddleX/deploy/lite/export_lite.py --model_dir /path/to/inference_model --save_file /path/to/lite_model --place place/to/run
```
......
@@ -30,7 +30,7 @@ The PaddlePaddle C++ inference library targets different `CPU`, `CUDA`, and whether Tens
| ubuntu14.04_cuda10.0_cudnn7_avx_mkl | [fluid_inference.tgz](https://paddle-inference-lib.bj.bcebos.com/1.8.2-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz ) |
| ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6 | [fluid_inference.tgz](https://paddle-inference-lib.bj.bcebos.com/1.8.2-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz) |
For more and newer versions, download as appropriate: [C++ inference library download list](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html)
After downloading and extracting, the `/root/projects/fluid_inference` directory contains:
```
@@ -42,7 +42,7 @@ fluid_inference
└── version.txt # version and build information
```
**Note:** Except for `nv-jetson-cuda10-cudnn7.5-trt5`, all prebuilt packages are compiled with `GCC 4.8.5`; using a newer `GCC` may cause `ABI` compatibility issues. Consider downgrading, or [build the inference library yourself](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html#id12)
### Step4: Build
......
# HumanSeg human segmentation model
Built on PaddleX's core segmentation networks, this tutorial provides an end-to-end application guide for the human segmentation scenario, from pretrained models and fine-tuning to video segmentation inference and deployment.
## Installation
**Prerequisites**
* paddlepaddle >= 1.8.0
* python >= 3.5
```
pip install paddlex -i https://mirror.baidu.com/pypi/simple
```
For installation-related issues, see [PaddleX installation](https://paddlex.readthedocs.io/zh_CN/latest/install.html)
## Pretrained models
HumanSeg releases two pretrained models trained on large-scale human image data, covering a variety of usage scenarios:

| Model type | Checkpoint Parameter | Inference Model | Quant Inference Model | Notes |
| --- | --- | --- | ---| --- |
| HumanSeg-server | [humanseg_server_params](https://paddlex.bj.bcebos.com/humanseg/models/humanseg_server.pdparams) | [humanseg_server_inference](https://paddlex.bj.bcebos.com/humanseg/models/humanseg_server_inference.zip) | -- | High-accuracy model for server-side GPU and human scenes with complex backgrounds; architecture Deeplabv3+/Xception65; input size (512, 512) |
| HumanSeg-mobile | [humanseg_mobile_params](https://paddlex.bj.bcebos.com/humanseg/models/humanseg_mobile.pdparams) | [humanseg_mobile_inference](https://paddlex.bj.bcebos.com/humanseg/models/humanseg_mobile_inference.zip) | [humanseg_mobile_quant](https://paddlex.bj.bcebos.com/humanseg/models/humanseg_mobile_quant.zip) | Lightweight model for front-camera scenes on mobile devices or server-side CPU; architecture HRNet_w18_small_v1; input size (192, 192) |
Model performance

| Model | Model size | Latency |
| --- | --- | --- |
| humanseg_server_inference | 158M | - |
| humanseg_mobile_inference | 5.8M | 42.35ms |
| humanseg_mobile_quant | 1.6M | 24.93ms |

Latency measurement environment: Xiaomi phone, CPU: Snapdragon 855, RAM: 6GB, image size: 192*192
**NOTE:**
* Checkpoint Parameter contains the model weights and is used for fine-tuning.
* Inference Model and Quant Inference Model are deployment models, containing the `__model__` computation graph, the `__params__` model parameters, and the basic model configuration `model.yaml`.
* Inference Model is suited to server-side CPU and GPU inference deployment; Quant Inference Model is the quantized version, suited to edge-device deployment (e.g. mobile) via Paddle Lite.
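For example, a downloaded Inference Model directory can be loaded and used directly with PaddleX (a sketch; the image path is illustrative):

```python
import paddlex as pdx

# directory containing __model__, __params__ and model.yaml
model = pdx.load_model('pretrain_weights/humanseg_mobile_inference')
result = model.predict('data/human_image.jpg')
```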
Run the following script to download the HumanSeg pretrained models:
```bash
python pretrain_weights/download_pretrain_weights.py
```
## Download test data
We take the **Supervisely Persons** human segmentation dataset released by [supervise.ly](https://supervise.ly/), randomly sample a small portion of it, and convert it into a format PaddleX can load directly. Run the following code for a quick download; it also includes a human test video shot with a phone's front camera, `video_test.mp4`.
```bash
python data/download_data.py
```
## Quick start: video-stream human segmentation
Fuse the predictions of the DIS (Dense Inverse Search-based method) optical flow algorithm with the segmentation results to improve video-stream human segmentation:
```bash
# real-time segmentation from the computer's camera
python video_infer.py --model_dir pretrain_weights/humanseg_mobile_inference
# segment a human video
python video_infer.py --model_dir pretrain_weights/humanseg_mobile_inference --video_path data/video_test.mp4
```
The video segmentation results look like this:
<img src="https://paddleseg.bj.bcebos.com/humanseg/data/video_test.gif" width="20%" height="20%"><img src="https://paddleseg.bj.bcebos.com/humanseg/data/result.gif" width="20%" height="20%">
Replace the background with a chosen one; the background can be a single image or a video.
```bash
# real-time background replacement from the computer's camera; a background video can also be passed via '--background_video_path'
python bg_replace.py --model_dir pretrain_weights/humanseg_mobile_inference --background_image_path data/background.jpg
# background replacement on a human video; a background video can also be passed via '--background_video_path'
python bg_replace.py --model_dir pretrain_weights/humanseg_mobile_inference --video_path data/video_test.mp4 --background_image_path data/background.jpg
# background replacement on a single image
python bg_replace.py --model_dir pretrain_weights/humanseg_mobile_inference --image_path data/human_image.jpg --background_image_path data/background.jpg
```
The background replacement results look like this:
<img src="https://paddleseg.bj.bcebos.com/humanseg/data/video_test.gif" width="20%" height="20%"><img src="https://paddleseg.bj.bcebos.com/humanseg/data/bg_replace.gif" width="20%" height="20%">
**NOTE**:
* Video segmentation takes several minutes; please be patient.
* The provided models are intended for portrait-mode phone-camera footage; results on widescreen footage will be somewhat worse.
## Training
Use the following command to fine-tune from a pretrained model; make sure the chosen model architecture `model_type` matches the model weights `pretrain_weights`.
```bash
# select the GPU card (card 0 as an example)
export CUDA_VISIBLE_DEVICES=0
# to train without a GPU, set CUDA_VISIBLE_DEVICES to empty
# export CUDA_VISIBLE_DEVICES=
python train.py --model_type HumanSegMobile \
--save_dir output/ \
--data_dir data/mini_supervisely \
--train_list data/mini_supervisely/train.txt \
--val_list data/mini_supervisely/val.txt \
--pretrain_weights pretrain_weights/humanseg_mobile_params \
--batch_size 8 \
--learning_rate 0.001 \
--num_epochs 10 \
--image_shape 192 192
```
The parameters mean the following:
* `--model_type`: model type; one of HumanSegServer and HumanSegMobile
* `--save_dir`: model save path
* `--data_dir`: dataset path
* `--train_list`: training list file path
* `--val_list`: validation list file path
* `--pretrain_weights`: pretrained model path
* `--batch_size`: batch size
* `--learning_rate`: initial learning rate
* `--num_epochs`: number of training epochs
* `--image_shape`: network input image size (w, h)
For more command-line help, run:
```bash
python train.py --help
```
**NOTE**
You can quickly try different models by changing `--model_type` together with the corresponding `--pretrain_weights`.
## Evaluation
Run evaluation with the following command:
```bash
python eval.py --model_dir output/best_model \
--data_dir data/mini_supervisely \
--val_list data/mini_supervisely/val.txt \
--image_shape 192 192
```
The parameters mean the following:
* `--model_dir`: model path
* `--data_dir`: dataset path
* `--val_list`: validation list file path
* `--image_shape`: network input image size (w, h)
## Prediction
Run prediction with the following command; the results are saved in the `./output/result/` folder by default.
```bash
python infer.py --model_dir output/best_model \
--data_dir data/mini_supervisely \
--test_list data/mini_supervisely/test.txt \
--save_dir output/result \
--image_shape 192 192
```
The parameters mean the following:
* `--model_dir`: model path
* `--data_dir`: dataset path
* `--test_list`: test list file path
* `--image_shape`: network input image size (w, h)
## Model export
```bash
paddlex --export_inference --model_dir output/best_model \
--save_dir output/export
```
The parameters mean the following:
* `--model_dir`: model path
* `--save_dir`: save path for the exported model
## Offline quantization
```bash
python quant_offline.py --model_dir output/best_model \
--data_dir data/mini_supervisely \
--quant_list data/mini_supervisely/val.txt \
--save_dir output/quant_offline \
--image_shape 192 192
```
The parameters mean the following:
* `--model_dir`: path of the model to quantize
* `--data_dir`: dataset path
* `--quant_list`: path of the image list used for quantization; usually just use the training or validation list
* `--save_dir`: save path for the quantized model
* `--image_shape`: network input image size (w, h)
# coding: utf8
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import os
import os.path as osp
import cv2
import numpy as np
from postprocess import postprocess, threshold_mask
import paddlex as pdx
import paddlex.utils.logging as logging
from paddlex.seg import transforms
def parse_args():
parser = argparse.ArgumentParser(
description='HumanSeg inference for video')
parser.add_argument(
'--model_dir',
dest='model_dir',
help='Model path for inference',
type=str)
parser.add_argument(
'--image_path',
dest='image_path',
help='Image including human',
type=str,
default=None)
parser.add_argument(
'--background_image_path',
dest='background_image_path',
help='Background image for replacing',
type=str,
default=None)
parser.add_argument(
'--video_path',
dest='video_path',
help='Video path for inference',
type=str,
default=None)
parser.add_argument(
'--background_video_path',
dest='background_video_path',
help='Background video path for replacing',
type=str,
default=None)
parser.add_argument(
'--save_dir',
dest='save_dir',
help='The directory for saving the inference results',
type=str,
default='./output')
parser.add_argument(
"--image_shape",
dest="image_shape",
help="The image shape for net inputs.",
nargs=2,
default=[192, 192],
type=int)
return parser.parse_args()
def bg_replace(label_map, img, bg):
h, w, _ = img.shape
bg = cv2.resize(bg, (w, h))
label_map = np.repeat(label_map[:, :, np.newaxis], 3, axis=2)
comb = (label_map * img + (1 - label_map) * bg).astype(np.uint8)
return comb
def recover(img, im_info):
if im_info[0] == 'resize':
w, h = im_info[1][1], im_info[1][0]
img = cv2.resize(img, (w, h), cv2.INTER_LINEAR)
elif im_info[0] == 'padding':
w, h = im_info[1][1], im_info[1][0]  # fix: im_info[1] stores (h, w), so width is index 1
img = img[0:h, 0:w, :]
return img
def infer(args):
resize_h = args.image_shape[1]
resize_w = args.image_shape[0]
test_transforms = transforms.Compose([transforms.Normalize()])
model = pdx.load_model(args.model_dir)
if not osp.exists(args.save_dir):
os.makedirs(args.save_dir)
# image background replacement
if args.image_path is not None:
if not osp.exists(args.image_path):
raise Exception('The --image_path is not existed: {}'.format(
args.image_path))
if args.background_image_path is None:
raise Exception(
'The --background_image_path is not set. Please set it')
else:
if not osp.exists(args.background_image_path):
raise Exception(
'The --background_image_path is not existed: {}'.format(
args.background_image_path))
img = cv2.imread(args.image_path)
im_shape = img.shape
im_scale_x = float(resize_w) / float(im_shape[1])
im_scale_y = float(resize_h) / float(im_shape[0])
im = cv2.resize(
img,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=cv2.INTER_LINEAR)
image = im.astype('float32')
im_info = ('resize', im_shape[0:2])
pred = model.predict(image, test_transforms)
label_map = pred['label_map']
label_map = recover(label_map, im_info)
bg = cv2.imread(args.background_image_path)
save_name = osp.basename(args.image_path)
save_path = osp.join(args.save_dir, save_name)
result = bg_replace(label_map, img, bg)
cv2.imwrite(save_path, result)
# video background replacement: use the background video as the background if provided, otherwise use the background image
else:
is_video_bg = False
if args.background_video_path is not None:
if not osp.exists(args.background_video_path):
raise Exception(
'The --background_video_path is not existed: {}'.format(
args.background_video_path))
is_video_bg = True
elif args.background_image_path is not None:
if not osp.exists(args.background_image_path):
raise Exception(
'The --background_image_path is not existed: {}'.format(
args.background_image_path))
else:
raise Exception(
'Please offer a background image or video. You should set --background_image_path or --background_video_path'
)
disflow = cv2.DISOpticalFlow_create(
cv2.DISOPTICAL_FLOW_PRESET_ULTRAFAST)
prev_gray = np.zeros((resize_h, resize_w), np.uint8)
prev_cfd = np.zeros((resize_h, resize_w), np.float32)
is_init = True
if args.video_path is not None:
logging.info('Please wait. It is computing......')
if not osp.exists(args.video_path):
raise Exception('The --video_path is not existed: {}'.format(
args.video_path))
cap_video = cv2.VideoCapture(args.video_path)
fps = cap_video.get(cv2.CAP_PROP_FPS)
width = int(cap_video.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap_video.get(cv2.CAP_PROP_FRAME_HEIGHT))
save_name = osp.basename(args.video_path)
save_name = save_name.split('.')[0]
save_path = osp.join(args.save_dir, save_name + '.avi')
cap_out = cv2.VideoWriter(
save_path,
cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), fps,
(width, height))
if is_video_bg:
cap_bg = cv2.VideoCapture(args.background_video_path)
frames_bg = cap_bg.get(cv2.CAP_PROP_FRAME_COUNT)
current_frame_bg = 1
else:
img_bg = cv2.imread(args.background_image_path)
while cap_video.isOpened():
ret, frame = cap_video.read()
if ret:
im_shape = frame.shape
im_scale_x = float(resize_w) / float(im_shape[1])
im_scale_y = float(resize_h) / float(im_shape[0])
im = cv2.resize(
frame,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=cv2.INTER_LINEAR)
image = im.astype('float32')
im_info = ('resize', im_shape[0:2])
pred = model.predict(image, test_transforms)
score_map = pred['score_map']
cur_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
cur_gray = cv2.resize(cur_gray, (resize_w, resize_h))
score_map = 255 * score_map[:, :, 1]
optflow_map = postprocess(cur_gray, score_map, prev_gray, prev_cfd, \
disflow, is_init)
prev_gray = cur_gray.copy()
prev_cfd = optflow_map.copy()
is_init = False
optflow_map = cv2.GaussianBlur(optflow_map, (3, 3), 0)
optflow_map = threshold_mask(
optflow_map, thresh_bg=0.2, thresh_fg=0.8)
score_map = recover(optflow_map, im_info)
# read background frames in a loop
if is_video_bg:
ret_bg, frame_bg = cap_bg.read()
if ret_bg:
if current_frame_bg == frames_bg:
current_frame_bg = 1
cap_bg.set(cv2.CAP_PROP_POS_FRAMES, 0)
else:
break
current_frame_bg += 1
comb = bg_replace(score_map, frame, frame_bg)
else:
comb = bg_replace(score_map, frame, img_bg)
cap_out.write(comb)
else:
break
if is_video_bg:
cap_bg.release()
cap_video.release()
cap_out.release()
# if neither an input image nor a video is given, open the camera
else:
cap_video = cv2.VideoCapture(0)
if not cap_video.isOpened():
raise IOError("Error opening video stream or file: check "
"whether --video_path exists: {} "
"or whether the camera is working".format(
args.video_path))
return
if is_video_bg:
cap_bg = cv2.VideoCapture(args.background_video_path)
frames_bg = cap_bg.get(cv2.CAP_PROP_FRAME_COUNT)
current_frame_bg = 1
else:
img_bg = cv2.imread(args.background_image_path)
while cap_video.isOpened():
ret, frame = cap_video.read()
if ret:
im_shape = frame.shape
im_scale_x = float(resize_w) / float(im_shape[1])
im_scale_y = float(resize_h) / float(im_shape[0])
im = cv2.resize(
frame,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=cv2.INTER_LINEAR)
image = im.astype('float32')
im_info = ('resize', im_shape[0:2])
pred = model.predict(image, test_transforms)
score_map = pred['score_map']
cur_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
cur_gray = cv2.resize(cur_gray, (resize_w, resize_h))
score_map = 255 * score_map[:, :, 1]
optflow_map = postprocess(cur_gray, score_map, prev_gray, prev_cfd, \
disflow, is_init)
prev_gray = cur_gray.copy()
prev_cfd = optflow_map.copy()
is_init = False
optflow_map = cv2.GaussianBlur(optflow_map, (3, 3), 0)
optflow_map = threshold_mask(
optflow_map, thresh_bg=0.2, thresh_fg=0.8)
score_map = recover(optflow_map, im_info)
# read background frames in a loop
if is_video_bg:
ret_bg, frame_bg = cap_bg.read()
if ret_bg:
if current_frame_bg == frames_bg:
current_frame_bg = 1
cap_bg.set(cv2.CAP_PROP_POS_FRAMES, 0)
else:
break
current_frame_bg += 1
comb = bg_replace(score_map, frame, frame_bg)
else:
comb = bg_replace(score_map, frame, img_bg)
cv2.imshow('HumanSegmentation', comb)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
else:
break
if is_video_bg:
cap_bg.release()
cap_video.release()
if __name__ == "__main__":
args = parse_args()
infer(args)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
import paddlex as pdx
def download_data(savepath):
url = "https://paddleseg.bj.bcebos.com/humanseg/data/mini_supervisely.zip"
pdx.utils.download_and_decompress(url=url, path=savepath)
url = "https://paddleseg.bj.bcebos.com/humanseg/data/video_test.zip"
pdx.utils.download_and_decompress(url=url, path=savepath)
if __name__ == "__main__":
download_data(LOCAL_PATH)
print("Data download finish!")
# coding: utf8
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import paddlex as pdx
import paddlex.utils.logging as logging
from paddlex.seg import transforms
def parse_args():
parser = argparse.ArgumentParser(description='HumanSeg evaluation')
parser.add_argument(
'--model_dir',
dest='model_dir',
help='Model path for evaluating',
type=str,
default='output/best_model')
parser.add_argument(
'--data_dir',
dest='data_dir',
help='The root directory of dataset',
type=str)
parser.add_argument(
'--val_list',
dest='val_list',
help='Val list file of dataset',
type=str,
default=None)
parser.add_argument(
'--batch_size',
dest='batch_size',
help='Mini batch size',
type=int,
default=128)
parser.add_argument(
"--image_shape",
dest="image_shape",
help="The image shape for net inputs.",
nargs=2,
default=[192, 192],
type=int)
return parser.parse_args()
def dict2str(dict_input):
out = ''
for k, v in dict_input.items():
try:
v = round(float(v), 6)
except:
pass
out = out + '{}={}, '.format(k, v)
return out.strip(', ')
def evaluate(args):
eval_transforms = transforms.Compose(
[transforms.Resize(args.image_shape), transforms.Normalize()])
eval_dataset = pdx.datasets.SegDataset(
data_dir=args.data_dir,
file_list=args.val_list,
transforms=eval_transforms)
model = pdx.load_model(args.model_dir)
metrics = model.evaluate(eval_dataset, args.batch_size)
logging.info('[EVAL] Finished, {} .'.format(dict2str(metrics)))
if __name__ == '__main__':
args = parse_args()
evaluate(args)
# coding: utf8
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import os
import os.path as osp
import cv2
import numpy as np
import tqdm
import paddlex as pdx
from paddlex.seg import transforms
def parse_args():
parser = argparse.ArgumentParser(
description='HumanSeg prediction and visualization')
parser.add_argument(
'--model_dir',
dest='model_dir',
help='Model path for prediction',
type=str)
parser.add_argument(
'--data_dir',
dest='data_dir',
help='The root directory of dataset',
type=str)
parser.add_argument(
'--test_list',
dest='test_list',
help='Test list file of dataset',
type=str)
parser.add_argument(
'--save_dir',
dest='save_dir',
help='The directory for saving the inference results',
type=str,
default='./output/result')
parser.add_argument(
"--image_shape",
dest="image_shape",
help="The image shape for net inputs.",
nargs=2,
default=[192, 192],
type=int)
return parser.parse_args()
def infer(args):
def makedir(path):
sub_dir = osp.dirname(path)
if not osp.exists(sub_dir):
os.makedirs(sub_dir)
test_transforms = transforms.Compose(
[transforms.Resize(args.image_shape), transforms.Normalize()])
model = pdx.load_model(args.model_dir)
added_saved_path = osp.join(args.save_dir, 'added')
mat_saved_path = osp.join(args.save_dir, 'mat')
scoremap_saved_path = osp.join(args.save_dir, 'scoremap')
with open(args.test_list, 'r') as f:
files = f.readlines()
for file in tqdm.tqdm(files):
file = file.strip()
im_file = osp.join(args.data_dir, file)
im = cv2.imread(im_file)
result = model.predict(im_file, transforms=test_transforms)
# save added image
added_image = pdx.seg.visualize(
im_file, result, weight=0.6, save_dir=None)
added_image_file = osp.join(added_saved_path, file)
makedir(added_image_file)
cv2.imwrite(added_image_file, added_image)
# save score map
score_map = result['score_map'][:, :, 1]
score_map = (score_map * 255).astype(np.uint8)
score_map_file = osp.join(scoremap_saved_path, file)
makedir(score_map_file)
cv2.imwrite(score_map_file, score_map)
# save mat image
score_map = np.expand_dims(score_map, axis=-1)
mat_image = np.concatenate([im, score_map], axis=2)
mat_file = osp.join(mat_saved_path, file)
ext = osp.splitext(mat_file)[-1]
mat_file = mat_file.replace(ext, '.png')
makedir(mat_file)
cv2.imwrite(mat_file, mat_image)
if __name__ == '__main__':
args = parse_args()
infer(args)
# coding: utf8
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
def cal_optical_flow_tracking(pre_gray, cur_gray, prev_cfd, dl_weights,
disflow):
"""计算光流跟踪匹配点和光流图
输入参数:
pre_gray: 上一帧灰度图
cur_gray: 当前帧灰度图
prev_cfd: 上一帧光流图
dl_weights: 融合权重图
disflow: 光流数据结构
返回值:
is_track: 光流点跟踪二值图,即是否具有光流点匹配
track_cfd: 光流跟踪图
"""
check_thres = 8
h, w = pre_gray.shape[:2]
track_cfd = np.zeros_like(prev_cfd)
is_track = np.zeros_like(pre_gray)
flow_fw = disflow.calc(pre_gray, cur_gray, None)
flow_bw = disflow.calc(cur_gray, pre_gray, None)
flow_fw = np.round(flow_fw).astype(np.int)
flow_bw = np.round(flow_bw).astype(np.int)
y_list = np.array(range(h))
x_list = np.array(range(w))
yv, xv = np.meshgrid(y_list, x_list)
yv, xv = yv.T, xv.T
cur_x = xv + flow_fw[:, :, 0]
cur_y = yv + flow_fw[:, :, 1]
# do not track points that flow out of the image bounds
not_track = (cur_x < 0) + (cur_x >= w) + (cur_y < 0) + (cur_y >= h)
flow_bw[~not_track] = flow_bw[cur_y[~not_track], cur_x[~not_track]]
not_track += (np.square(flow_fw[:, :, 0] + flow_bw[:, :, 0]) +
np.square(flow_fw[:, :, 1] + flow_bw[:, :, 1])
) >= check_thres
track_cfd[cur_y[~not_track], cur_x[~not_track]] = prev_cfd[~not_track]
is_track[cur_y[~not_track], cur_x[~not_track]] = 1
not_flow = np.all(np.abs(flow_fw) == 0,
axis=-1) * np.all(np.abs(flow_bw) == 0, axis=-1)
dl_weights[cur_y[not_flow], cur_x[not_flow]] = 0.05
return track_cfd, is_track, dl_weights
def fuse_optical_flow_tracking(track_cfd, dl_cfd, dl_weights, is_track):
"""光流追踪图和人像分割结构融合
输入参数:
track_cfd: 光流追踪图
dl_cfd: 当前帧分割结果
dl_weights: 融合权重图
is_track: 光流点匹配二值图
返回
cur_cfd: 光流跟踪图和人像分割结果融合图
"""
fusion_cfd = dl_cfd.copy()
is_track = is_track.astype(np.bool)
fusion_cfd[is_track] = dl_weights[is_track] * dl_cfd[is_track] + (
1 - dl_weights[is_track]) * track_cfd[is_track]
# regions where the segmentation is confident
index_certain = ((dl_cfd > 0.9) + (dl_cfd < 0.1)) * is_track
index_less01 = (dl_weights < 0.1) * index_certain
fusion_cfd[index_less01] = 0.3 * dl_cfd[index_less01] + 0.7 * track_cfd[
index_less01]
index_larger09 = (dl_weights >= 0.1) * index_certain
fusion_cfd[index_larger09] = 0.4 * dl_cfd[
index_larger09] + 0.6 * track_cfd[index_larger09]
return fusion_cfd
def threshold_mask(img, thresh_bg, thresh_fg):
dst = (img / 255.0 - thresh_bg) / (thresh_fg - thresh_bg)
dst[np.where(dst > 1)] = 1
dst[np.where(dst < 0)] = 0
return dst.astype(np.float32)
def postprocess(cur_gray, scoremap, prev_gray, pre_cfd, disflow, is_init):
"""光流优化
Args:
cur_gray : 当前帧灰度图
pre_gray : 前一帧灰度图
pre_cfd :前一帧融合结果
scoremap : 当前帧分割结果
difflow : 光流
is_init : 是否第一帧
Returns:
fusion_cfd : 光流追踪图和预测结果融合图
"""
h, w = scoremap.shape
cur_cfd = scoremap.copy()
if is_init:
if h <= 64 or w <= 64:
disflow.setFinestScale(1)
elif h <= 160 or w <= 160:
disflow.setFinestScale(2)
else:
disflow.setFinestScale(3)
fusion_cfd = cur_cfd
else:
weights = np.ones((h, w), np.float32) * 0.3
track_cfd, is_track, weights = cal_optical_flow_tracking(
prev_gray, cur_gray, pre_cfd, weights, disflow)
fusion_cfd = fuse_optical_flow_tracking(track_cfd, cur_cfd, weights,
is_track)
return fusion_cfd
# coding: utf8
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
import paddlex as pdx
import paddlehub as hub
model_urls = {
"PaddleX_HumanSeg_Server_Params":
"https://bj.bcebos.com/paddlex/models/humanseg/humanseg_server_params.tar",
"PaddleX_HumanSeg_Server_Inference":
"https://bj.bcebos.com/paddlex/models/humanseg/humanseg_server_inference.tar",
"PaddleX_HumanSeg_Mobile_Params":
"https://bj.bcebos.com/paddlex/models/humanseg/humanseg_mobile_params.tar",
"PaddleX_HumanSeg_Mobile_Inference":
"https://bj.bcebos.com/paddlex/models/humanseg/humanseg_mobile_inference.tar",
"PaddleX_HumanSeg_Mobile_Quant":
"https://bj.bcebos.com/paddlex/models/humanseg/humanseg_mobile_quant.tar"
}
if __name__ == "__main__":
for model_name, url in model_urls.items():
pdx.utils.download_and_decompress(url=url, path=LOCAL_PATH)
print("Pretrained Model download success!")
# coding: utf8
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import paddlex as pdx
from paddlex.seg import transforms
def parse_args():
parser = argparse.ArgumentParser(description='HumanSeg offline quantization')
parser.add_argument(
'--model_dir',
dest='model_dir',
help='Model path for quant',
type=str,
default='output/best_model')
parser.add_argument(
'--batch_size',
dest='batch_size',
help='Mini batch size',
type=int,
default=1)
parser.add_argument(
'--batch_nums',
dest='batch_nums',
help='Batch number for quant',
type=int,
default=10)
parser.add_argument(
'--data_dir',
dest='data_dir',
help='the root directory of dataset',
type=str)
parser.add_argument(
'--quant_list',
dest='quant_list',
help='Image file list for model quantization, it can be val.txt or train.txt',
type=str,
default=None)
parser.add_argument(
'--save_dir',
dest='save_dir',
help='The directory for saving the quant model',
type=str,
default='./output/quant_offline')
parser.add_argument(
"--image_shape",
dest="image_shape",
help="The image shape for net inputs.",
nargs=2,
default=[192, 192],
type=int)
return parser.parse_args()
def evaluate(args):
eval_transforms = transforms.Compose(
[transforms.Resize(args.image_shape), transforms.Normalize()])
eval_dataset = pdx.datasets.SegDataset(
data_dir=args.data_dir,
file_list=args.quant_list,
transforms=eval_transforms)
model = pdx.load_model(args.model_dir)
pdx.slim.export_quant_model(model, eval_dataset, args.batch_size,
args.batch_nums, args.save_dir)
if __name__ == '__main__':
args = parse_args()
evaluate(args)
# coding: utf8
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import paddlex as pdx
from paddlex.seg import transforms
MODEL_TYPE = ['HumanSegMobile', 'HumanSegServer']
def parse_args():
parser = argparse.ArgumentParser(description='HumanSeg training')
parser.add_argument(
'--model_type',
dest='model_type',
help="Model type for traing, which is one of ('HumanSegMobile', 'HumanSegServer')",
type=str,
default='HumanSegMobile')
parser.add_argument(
'--data_dir',
dest='data_dir',
help='The root directory of dataset',
type=str)
parser.add_argument(
'--train_list',
dest='train_list',
help='Train list file of dataset',
type=str)
parser.add_argument(
'--val_list',
dest='val_list',
help='Val list file of dataset',
type=str,
default=None)
parser.add_argument(
'--save_dir',
dest='save_dir',
help='The directory for saving the model snapshot',
type=str,
default='./output')
parser.add_argument(
'--num_classes',
dest='num_classes',
help='Number of classes',
type=int,
default=2)
parser.add_argument(
"--image_shape",
dest="image_shape",
help="The image shape for net inputs.",
nargs=2,
default=[192, 192],
type=int)
parser.add_argument(
'--num_epochs',
dest='num_epochs',
        help='Number of epochs for training',
type=int,
default=100)
parser.add_argument(
'--batch_size',
dest='batch_size',
help='Mini batch size',
type=int,
default=128)
parser.add_argument(
'--learning_rate',
dest='learning_rate',
help='Learning rate',
type=float,
default=0.01)
parser.add_argument(
'--pretrain_weights',
dest='pretrain_weights',
        help='The path of pretrained weights',
type=str,
default=None)
parser.add_argument(
'--resume_checkpoint',
dest='resume_checkpoint',
help='The path of resume checkpoint',
type=str,
default=None)
parser.add_argument(
'--use_vdl',
dest='use_vdl',
help='Whether to use visualdl',
action='store_true')
parser.add_argument(
'--save_interval_epochs',
dest='save_interval_epochs',
        help='The epoch interval for saving a model snapshot',
type=int,
default=5)
return parser.parse_args()
def train(args):
train_transforms = transforms.Compose([
transforms.Resize(args.image_shape), transforms.RandomHorizontalFlip(),
transforms.Normalize()
])
eval_transforms = transforms.Compose(
[transforms.Resize(args.image_shape), transforms.Normalize()])
train_dataset = pdx.datasets.SegDataset(
data_dir=args.data_dir,
file_list=args.train_list,
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.SegDataset(
data_dir=args.data_dir,
file_list=args.val_list,
transforms=eval_transforms)
if args.model_type == 'HumanSegMobile':
model = pdx.seg.HRNet(
num_classes=args.num_classes, width='18_small_v1')
elif args.model_type == 'HumanSegServer':
model = pdx.seg.DeepLabv3p(
num_classes=args.num_classes, backbone='Xception65')
else:
raise ValueError(
"--model_type: {} is set wrong, it shold be one of ('HumanSegMobile', "
"'HumanSegLite', 'HumanSegServer')".format(args.model_type))
model.train(
num_epochs=args.num_epochs,
train_dataset=train_dataset,
train_batch_size=args.batch_size,
eval_dataset=eval_dataset,
save_interval_epochs=args.save_interval_epochs,
learning_rate=args.learning_rate,
pretrain_weights=args.pretrain_weights,
resume_checkpoint=args.resume_checkpoint,
save_dir=args.save_dir,
use_vdl=args.use_vdl)
if __name__ == '__main__':
args = parse_args()
train(args)
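A sketch of how this training script might be invoked, assuming it is saved as `train.py`; the dataset paths are placeholders and the flags mirror `parse_args()`:
```
python train.py --model_type HumanSegMobile \
    --data_dir data/mini_supervisely \
    --train_list data/mini_supervisely/train.txt \
    --val_list data/mini_supervisely/val.txt \
    --save_dir output/humanseg_mobile \
    --num_epochs 100 --batch_size 64 --use_vdl
```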
# coding: utf8
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import os
import os.path as osp
import cv2
import numpy as np
from postprocess import postprocess, threshold_mask
import paddlex as pdx
import paddlex.utils.logging as logging
from paddlex.seg import transforms
def parse_args():
parser = argparse.ArgumentParser(
description='HumanSeg inference for video')
parser.add_argument(
'--model_dir',
dest='model_dir',
help='Model path for inference',
type=str)
parser.add_argument(
'--video_path',
dest='video_path',
        help='Video path for inference; the camera will be used if no valid path is given',
type=str,
default=None)
parser.add_argument(
'--save_dir',
dest='save_dir',
help='The directory for saving the inference results',
type=str,
default='./output')
parser.add_argument(
"--image_shape",
dest="image_shape",
help="The image shape for net inputs.",
nargs=2,
default=[192, 192],
type=int)
return parser.parse_args()
def recover(img, im_info):
if im_info[0] == 'resize':
w, h = im_info[1][1], im_info[1][0]
img = cv2.resize(img, (w, h), cv2.INTER_LINEAR)
elif im_info[0] == 'padding':
        w, h = im_info[1][1], im_info[1][0]
img = img[0:h, 0:w, :]
return img
def video_infer(args):
resize_h = args.image_shape[1]
resize_w = args.image_shape[0]
model = pdx.load_model(args.model_dir)
test_transforms = transforms.Compose([transforms.Normalize()])
if not args.video_path:
cap = cv2.VideoCapture(0)
else:
cap = cv2.VideoCapture(args.video_path)
if not cap.isOpened():
raise IOError("Error opening video stream or file, "
"--video_path whether existing: {}"
" or camera whether working".format(args.video_path))
return
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
disflow = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_ULTRAFAST)
prev_gray = np.zeros((resize_h, resize_w), np.uint8)
prev_cfd = np.zeros((resize_h, resize_w), np.float32)
is_init = True
fps = cap.get(cv2.CAP_PROP_FPS)
if args.video_path:
logging.info("Please wait. It is computing......")
# 用于保存预测结果视频
if not osp.exists(args.save_dir):
os.makedirs(args.save_dir)
out = cv2.VideoWriter(
osp.join(args.save_dir, 'result.avi'),
cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), fps, (width, height))
        # Start reading video frames
while cap.isOpened():
ret, frame = cap.read()
if ret:
im_shape = frame.shape
im_scale_x = float(resize_w) / float(im_shape[1])
im_scale_y = float(resize_h) / float(im_shape[0])
im = cv2.resize(
frame,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=cv2.INTER_LINEAR)
image = im.astype('float32')
im_info = ('resize', im_shape[0:2])
pred = model.predict(image, test_transforms)
score_map = pred['score_map']
cur_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
score_map = 255 * score_map[:, :, 1]
optflow_map = postprocess(cur_gray, score_map, prev_gray, prev_cfd, \
disflow, is_init)
prev_gray = cur_gray.copy()
prev_cfd = optflow_map.copy()
is_init = False
optflow_map = cv2.GaussianBlur(optflow_map, (3, 3), 0)
optflow_map = threshold_mask(
optflow_map, thresh_bg=0.2, thresh_fg=0.8)
img_matting = np.repeat(
optflow_map[:, :, np.newaxis], 3, axis=2)
img_matting = recover(img_matting, im_info)
bg_im = np.ones_like(img_matting) * 255
comb = (img_matting * frame +
(1 - img_matting) * bg_im).astype(np.uint8)
out.write(comb)
else:
break
cap.release()
out.release()
else:
while cap.isOpened():
ret, frame = cap.read()
if ret:
im_shape = frame.shape
im_scale_x = float(resize_w) / float(im_shape[1])
im_scale_y = float(resize_h) / float(im_shape[0])
im = cv2.resize(
frame,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=cv2.INTER_LINEAR)
image = im.astype('float32')
im_info = ('resize', im_shape[0:2])
pred = model.predict(image, test_transforms)
score_map = pred['score_map']
cur_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
cur_gray = cv2.resize(cur_gray, (resize_w, resize_h))
score_map = 255 * score_map[:, :, 1]
optflow_map = postprocess(cur_gray, score_map, prev_gray, prev_cfd, \
disflow, is_init)
prev_gray = cur_gray.copy()
prev_cfd = optflow_map.copy()
is_init = False
optflow_map = cv2.GaussianBlur(optflow_map, (3, 3), 0)
optflow_map = threshold_mask(
optflow_map, thresh_bg=0.2, thresh_fg=0.8)
img_matting = np.repeat(
optflow_map[:, :, np.newaxis], 3, axis=2)
img_matting = recover(img_matting, im_info)
bg_im = np.ones_like(img_matting) * 255
comb = (img_matting * frame +
(1 - img_matting) * bg_im).astype(np.uint8)
cv2.imshow('HumanSegmentation', comb)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
else:
break
cap.release()
if __name__ == "__main__":
args = parse_args()
video_infer(args)
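The inference script accepts either a video file or, when `--video_path` is omitted, the default camera; a hypothetical invocation assuming the script is saved as `video_infer.py`:
```
# Run on a saved video; the result is written to <save_dir>/result.avi
python video_infer.py --model_dir output/best_model --video_path data/test.mp4 --save_dir ./output
# Run on the camera; press 'q' in the preview window to quit
python video_infer.py --model_dir output/best_model
```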
# Tutorials: Model Training
This directory collects example code for training models with PaddleX. Each script downloads its sample dataset automatically and trains on a single GPU card.
|Code | Task & Model | Dataset |
|------|--------|---------|
|classification/mobilenetv2.py | Image classification, MobileNetV2 | Vegetable classification |
|classification/resnet50.py | Image classification, ResNet50 | Vegetable classification |
|detection/faster_rcnn_r50_fpn.py | Object detection, FasterRCNN | Insect detection |
|detection/yolov3_darknet53.py | Object detection, YOLOv3 | Insect detection |
|detection/mask_rcnn_r50_fpn.py | Instance segmentation, MaskRCNN | Xiaoduxiong sorting |
|segmentation/deeplabv3p.py | Semantic segmentation, DeepLabv3p | Optic disc segmentation |
|segmentation/unet.py | Semantic segmentation, UNet | Optic disc segmentation |
|segmentation/hrnet.py | Semantic segmentation, HRNet | Optic disc segmentation |
|segmentation/fast_scnn.py | Semantic segmentation, FastSCNN | Optic disc segmentation |
## Getting Started
After installing PaddleX, start training with a command such as:
```
python classification/mobilenetv2.py
```
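After a script finishes, the best model can be loaded back for prediction. A minimal sketch, assuming the default `save_dir` of the MobileNetV2 script below and a hypothetical test image:
```
import paddlex as pdx
# Load the best model saved during training
model = pdx.load_model('output/mobilenetv2/best_model')
# For classifiers, predict returns a list of
# {'category_id', 'category', 'score'} dicts
result = model.predict('vegetables_cls/bocai/100.jpg')  # hypothetical image
print(result)
```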
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from paddlex.cls import transforms
import paddlex as pdx
# Download and decompress the vegetable classification dataset
veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz'
pdx.utils.download_and_decompress(veg_dataset, path='./')
# Define the transforms for training and evaluation
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/cls_transforms.html#composedclstransforms
train_transforms = transforms.ComposedClsTransforms(mode='train', crop_size=[224, 224])
eval_transforms = transforms.ComposedClsTransforms(mode='eval', crop_size=[224, 224])
# Define the datasets used for training and evaluation
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/classification.html#imagenet
train_dataset = pdx.datasets.ImageNet(
data_dir='vegetables_cls',
file_list='vegetables_cls/train_list.txt',
label_list='vegetables_cls/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.ImageNet(
data_dir='vegetables_cls',
file_list='vegetables_cls/val_list.txt',
label_list='vegetables_cls/labels.txt',
transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be monitored with VisualDL
# Launch VisualDL with: visualdl --logdir output/mobilenetv2/vdl_log --port 8001
# Then open http://0.0.0.0:8001 in a browser
# (0.0.0.0 is for local access; for a remote server, use that machine's IP)
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/classification.html#mobilenetv2
model = pdx.cls.MobileNetV2(num_classes=len(train_dataset.labels))
model.train(
num_epochs=10,
train_dataset=train_dataset,
train_batch_size=32,
eval_dataset=eval_dataset,
lr_decay_epochs=[4, 6, 8],
learning_rate=0.025,
save_dir='output/mobilenetv2',
use_vdl=True)
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddle.fluid as fluid
from paddlex.cls import transforms
import paddlex as pdx
# Download and decompress the vegetable classification dataset
veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz'
pdx.utils.download_and_decompress(veg_dataset, path='./')
# Define the transforms for training and evaluation
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/cls_transforms.html#composedclstransforms
train_transforms = transforms.ComposedClsTransforms(mode='train', crop_size=[224, 224])
eval_transforms = transforms.ComposedClsTransforms(mode='eval', crop_size=[224, 224])
# Define the datasets used for training and evaluation
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/classification.html#imagenet
train_dataset = pdx.datasets.ImageNet(
data_dir='vegetables_cls',
file_list='vegetables_cls/train_list.txt',
label_list='vegetables_cls/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.ImageNet(
data_dir='vegetables_cls',
file_list='vegetables_cls/val_list.txt',
label_list='vegetables_cls/labels.txt',
transforms=eval_transforms)
# PaddleX also supports user-defined optimizers
step_each_epoch = train_dataset.num_samples // 32
learning_rate = fluid.layers.cosine_decay(
learning_rate=0.025, step_each_epoch=step_each_epoch, epochs=10)
optimizer = fluid.optimizer.Momentum(
learning_rate=learning_rate,
momentum=0.9,
regularization=fluid.regularizer.L2Decay(4e-5))
# Initialize the model and start training
# Training metrics can be monitored with VisualDL
# Launch VisualDL with: visualdl --logdir output/resnet50/vdl_log --port 8001
# Then open http://0.0.0.0:8001 in a browser
# (0.0.0.0 is for local access; for a remote server, use that machine's IP)
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/classification.html#resnet50
model = pdx.cls.ResNet50(num_classes=len(train_dataset.labels))
model.train(
num_epochs=10,
train_dataset=train_dataset,
train_batch_size=32,
eval_dataset=eval_dataset,
optimizer=optimizer,
save_dir='output/resnet50',
use_vdl=True)
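The `optimizer` argument accepts any optimizer built this way. As a variation, a step (piecewise) decay schedule could replace the cosine schedule above; a minimal sketch reusing `step_each_epoch` from the script, with illustrative decay points:
```
import paddle.fluid as fluid
# Decay the learning rate 10x at epochs 4 and 8 (boundaries are in steps)
boundaries = [step_each_epoch * e for e in [4, 8]]
values = [0.025, 0.0025, 0.00025]
learning_rate = fluid.layers.piecewise_decay(boundaries=boundaries, values=values)
optimizer = fluid.optimizer.Momentum(
    learning_rate=learning_rate,
    momentum=0.9,
    regularization=fluid.regularizer.L2Decay(4e-5))
```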
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from paddlex.det import transforms
import paddlex as pdx
# Download and decompress the insect detection dataset
insect_dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz'
pdx.utils.download_and_decompress(insect_dataset, path='./')
# Define the transforms for training and evaluation
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/det_transforms.html#composedrcnntransforms
train_transforms = transforms.ComposedRCNNTransforms(mode='train', min_max_size=[800, 1333])
eval_transforms = transforms.ComposedRCNNTransforms(mode='eval', min_max_size=[800, 1333])
# Define the datasets used for training and evaluation
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/detection.html#vocdetection
train_dataset = pdx.datasets.VOCDetection(
data_dir='insect_det',
file_list='insect_det/train_list.txt',
label_list='insect_det/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
data_dir='insect_det',
file_list='insect_det/val_list.txt',
label_list='insect_det/labels.txt',
transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be monitored with VisualDL
# Launch VisualDL with: visualdl --logdir output/faster_rcnn_r50_fpn/vdl_log --port 8001
# Then open http://0.0.0.0:8001 in a browser
# (0.0.0.0 is for local access; for a remote server, use that machine's IP)
# num_classes must include the background class, i.e. number of object classes + 1
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/detection.html#fasterrcnn
num_classes = len(train_dataset.labels) + 1
model = pdx.det.FasterRCNN(num_classes=num_classes)
model.train(
num_epochs=12,
train_dataset=train_dataset,
train_batch_size=2,
eval_dataset=eval_dataset,
learning_rate=0.0025,
lr_decay_epochs=[8, 11],
save_dir='output/faster_rcnn_r50_fpn',
use_vdl=True)
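After training, predictions can be drawn on an image with PaddleX's detection visualizer. A minimal sketch, assuming the default `save_dir` above; the image path is a hypothetical sample from the downloaded dataset:
```
import paddlex as pdx
model = pdx.load_model('output/faster_rcnn_r50_fpn/best_model')
image_name = 'insect_det/JPEGImages/0001.jpeg'  # hypothetical sample image
result = model.predict(image_name)
# Draw boxes scoring at least 0.5 and save the rendered image under ./output
pdx.det.visualize(image_name, result, threshold=0.5, save_dir='./output')
```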
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from paddlex.det import transforms
import paddlex as pdx
# Download and decompress the Xiaoduxiong instance segmentation dataset
xiaoduxiong_dataset = 'https://bj.bcebos.com/paddlex/datasets/xiaoduxiong_ins_det.tar.gz'
pdx.utils.download_and_decompress(xiaoduxiong_dataset, path='./')
# Define the transforms for training and evaluation
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/det_transforms.html#composedrcnntransforms
train_transforms = transforms.ComposedRCNNTransforms(mode='train', min_max_size=[800, 1333])
eval_transforms = transforms.ComposedRCNNTransforms(mode='eval', min_max_size=[800, 1333])
# Define the datasets used for training and evaluation
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/detection.html#cocodetection
train_dataset = pdx.datasets.CocoDetection(
data_dir='xiaoduxiong_ins_det/JPEGImages',
ann_file='xiaoduxiong_ins_det/train.json',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.CocoDetection(
data_dir='xiaoduxiong_ins_det/JPEGImages',
ann_file='xiaoduxiong_ins_det/val.json',
transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be monitored with VisualDL
# Launch VisualDL with: visualdl --logdir output/mask_rcnn_r50_fpn/vdl_log --port 8001
# Then open http://0.0.0.0:8001 in a browser
# (0.0.0.0 is for local access; for a remote server, use that machine's IP)
# num_classes must include the background class, i.e. number of object classes + 1
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/instance_segmentation.html#maskrcnn
num_classes = len(train_dataset.labels) + 1
model = pdx.det.MaskRCNN(num_classes=num_classes)
model.train(
num_epochs=12,
train_dataset=train_dataset,
train_batch_size=1,
eval_dataset=eval_dataset,
learning_rate=0.00125,
warmup_steps=10,
lr_decay_epochs=[8, 11],
save_dir='output/mask_rcnn_r50_fpn',
use_vdl=True)
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from paddlex.det import transforms
import paddlex as pdx
# Download and decompress the insect detection dataset
insect_dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz'
pdx.utils.download_and_decompress(insect_dataset, path='./')
# Define the transforms for training and evaluation
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/det_transforms.html#composedyolotransforms
train_transforms = transforms.ComposedYOLOv3Transforms(mode='train', shape=[608, 608])
eval_transforms = transforms.ComposedYOLOv3Transforms(mode='eval', shape=[608, 608])
# Define the datasets used for training and evaluation
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/detection.html#vocdetection
train_dataset = pdx.datasets.VOCDetection(
data_dir='insect_det',
file_list='insect_det/train_list.txt',
label_list='insect_det/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
data_dir='insect_det',
file_list='insect_det/val_list.txt',
label_list='insect_det/labels.txt',
transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be monitored with VisualDL
# Launch VisualDL with: visualdl --logdir output/yolov3_darknet53/vdl_log --port 8001
# Then open http://0.0.0.0:8001 in a browser
# (0.0.0.0 is for local access; for a remote server, use that machine's IP)
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/detection.html#yolov3
num_classes = len(train_dataset.labels)
model = pdx.det.YOLOv3(num_classes=num_classes, backbone='DarkNet53')
model.train(
num_epochs=270,
train_dataset=train_dataset,
train_batch_size=8,
eval_dataset=eval_dataset,
learning_rate=0.000125,
lr_decay_epochs=[210, 240],
save_dir='output/yolov3_darknet53',
use_vdl=True)
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
from paddlex.seg import transforms
# Download and decompress the optic disc segmentation dataset
optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz'
pdx.utils.download_and_decompress(optic_dataset, path='./')
# Define the transforms for training and evaluation
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/seg_transforms.html#composedsegtransforms
train_transforms = transforms.ComposedSegTransforms(mode='train', train_crop_size=[769, 769])
eval_transforms = transforms.ComposedSegTransforms(mode='eval')
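# Note: add_augmenters prepends the given augmenters, so RandomRotate below
# runs before the built-in training transforms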
train_transforms.add_augmenters([
transforms.RandomRotate()
])
# Define the datasets used for training and evaluation
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/semantic_segmentation.html#segdataset
train_dataset = pdx.datasets.SegDataset(
data_dir='optic_disc_seg',
file_list='optic_disc_seg/train_list.txt',
label_list='optic_disc_seg/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.SegDataset(
data_dir='optic_disc_seg',
file_list='optic_disc_seg/val_list.txt',
label_list='optic_disc_seg/labels.txt',
transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be monitored with VisualDL
# Launch VisualDL with: visualdl --logdir output/deeplab/vdl_log --port 8001
# Then open http://0.0.0.0:8001 in a browser
# (0.0.0.0 is for local access; for a remote server, use that machine's IP)
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/semantic_segmentation.html#deeplabv3p
num_classes = len(train_dataset.labels)
model = pdx.seg.DeepLabv3p(num_classes=num_classes)
model.train(
num_epochs=40,
train_dataset=train_dataset,
train_batch_size=4,
eval_dataset=eval_dataset,
learning_rate=0.01,
save_dir='output/deeplab',
use_vdl=True)
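Segmentation results can be overlaid on the input in the same way via the segmentation visualizer; again a sketch with a hypothetical image path:
```
import paddlex as pdx
model = pdx.load_model('output/deeplab/best_model')
image_name = 'optic_disc_seg/JPEGImages/H0005.jpg'  # hypothetical sample image
result = model.predict(image_name)
# weight controls the blend between the original image and the predicted mask
pdx.seg.visualize(image_name, result, weight=0.4, save_dir='./output')
```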
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
from paddlex.seg import transforms
# Download and decompress the optic disc segmentation dataset
optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz'
pdx.utils.download_and_decompress(optic_dataset, path='./')
# Define the transforms for training and evaluation
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/seg_transforms.html#composedsegtransforms
train_transforms = transforms.ComposedSegTransforms(mode='train', train_crop_size=[769, 769])
eval_transforms = transforms.ComposedSegTransforms(mode='eval')
# Define the datasets used for training and evaluation
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/semantic_segmentation.html#segdataset
train_dataset = pdx.datasets.SegDataset(
data_dir='optic_disc_seg',
file_list='optic_disc_seg/train_list.txt',
label_list='optic_disc_seg/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.SegDataset(
data_dir='optic_disc_seg',
file_list='optic_disc_seg/val_list.txt',
label_list='optic_disc_seg/labels.txt',
transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be monitored with VisualDL
# Launch VisualDL with: visualdl --logdir output/hrnet/vdl_log --port 8001
# Then open http://0.0.0.0:8001 in a browser
# (0.0.0.0 is for local access; for a remote server, use that machine's IP)
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/semantic_segmentation.html#hrnet
num_classes = len(train_dataset.labels)
model = pdx.seg.HRNet(num_classes=num_classes)
model.train(
num_epochs=20,
train_dataset=train_dataset,
train_batch_size=4,
eval_dataset=eval_dataset,
learning_rate=0.01,
save_dir='output/hrnet',
use_vdl=True)
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
from paddlex.seg import transforms
# Download and decompress the optic disc segmentation dataset
optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz'
pdx.utils.download_and_decompress(optic_dataset, path='./')
# Define the transforms for training and evaluation
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/seg_transforms.html#composedsegtransforms
train_transforms = transforms.ComposedSegTransforms(mode='train', train_crop_size=[769, 769])
eval_transforms = transforms.ComposedSegTransforms(mode='eval')
# Define the datasets used for training and evaluation
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/semantic_segmentation.html#segdataset
train_dataset = pdx.datasets.SegDataset(
data_dir='optic_disc_seg',
file_list='optic_disc_seg/train_list.txt',
label_list='optic_disc_seg/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.SegDataset(
data_dir='optic_disc_seg',
file_list='optic_disc_seg/val_list.txt',
label_list='optic_disc_seg/labels.txt',
transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be monitored with VisualDL
# Launch VisualDL with: visualdl --logdir output/unet/vdl_log --port 8001
# Then open http://0.0.0.0:8001 in a browser
# (0.0.0.0 is for local access; for a remote server, use that machine's IP)
# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/semantic_segmentation.html#unet
num_classes = len(train_dataset.labels)
model = pdx.seg.UNet(num_classes=num_classes)
model.train(
num_epochs=20,
train_dataset=train_dataset,
train_batch_size=4,
eval_dataset=eval_dataset,
learning_rate=0.01,
save_dir='output/unet',
use_vdl=True)
...@@ -53,4 +53,4 @@ log_level = 2 ...@@ -53,4 +53,4 @@ log_level = 2
from . import interpret from . import interpret
__version__ = '1.0.6' __version__ = '1.0.7'
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. # Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
# You may obtain a copy of the License at # You may obtain a copy of the License at
# #
# http://www.apache.org/licenses/LICENSE-2.0 # http://www.apache.org/licenses/LICENSE-2.0
# #
# Unless required by applicable law or agreed to in writing, software # Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, # distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
...@@ -15,6 +15,7 @@ ...@@ -15,6 +15,7 @@
from six import text_type as _text_type from six import text_type as _text_type
import argparse import argparse
import sys import sys
import paddlex.utils.logging as logging
def arg_parser(): def arg_parser():
...@@ -94,15 +95,15 @@ def main(): ...@@ -94,15 +95,15 @@ def main():
if args.export_onnx: if args.export_onnx:
assert args.model_dir is not None, "--model_dir should be defined while exporting onnx model" assert args.model_dir is not None, "--model_dir should be defined while exporting onnx model"
assert args.save_dir is not None, "--save_dir should be defined to create onnx model" assert args.save_dir is not None, "--save_dir should be defined to create onnx model"
assert args.fixed_input_shape is not None, "--fixed_input_shape should be defined [w,h] to create onnx model, such as [224,224]"
fixed_input_shape = [] model = pdx.load_model(args.model_dir)
if args.fixed_input_shape is not None: if model.status == "Normal" or model.status == "Prune":
fixed_input_shape = eval(args.fixed_input_shape) logging.error(
assert len( "Only support inference model, try to export model first as below,",
fixed_input_shape exit=False)
) == 2, "len of fixed input shape must == 2, such as [224,224]" logging.error(
model = pdx.load_model(args.model_dir, fixed_input_shape) "paddlex --export_inference --model_dir model_path --save_dir infer_model"
)
pdx.convertor.export_onnx_model(model, args.save_dir) pdx.convertor.export_onnx_model(model, args.save_dir)
......
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. # Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
# You may obtain a copy of the License at # You may obtain a copy of the License at
# #
# http://www.apache.org/licenses/LICENSE-2.0 # http://www.apache.org/licenses/LICENSE-2.0
# #
# Unless required by applicable law or agreed to in writing, software # Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, # distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
...@@ -30,119 +30,17 @@ def export_onnx(model_dir, save_dir, fixed_input_shape): ...@@ -30,119 +30,17 @@ def export_onnx(model_dir, save_dir, fixed_input_shape):
def export_onnx_model(model, save_dir): def export_onnx_model(model, save_dir):
support_list = [ if model.model_type == "detector" or model.__class__.__name__ == "FastSCNN":
'ResNet18', 'ResNet34', 'ResNet50', 'ResNet101', 'ResNet50_vd',
'ResNet101_vd', 'ResNet50_vd_ssld', 'ResNet101_vd_ssld', 'DarkNet53',
'MobileNetV1', 'MobileNetV2', 'DenseNet121', 'DenseNet161',
'DenseNet201'
]
if model.__class__.__name__ not in support_list:
raise Exception("Model: {} unsupport export to ONNX".format(
model.__class__.__name__))
try:
from fluid.utils import op_io_info, init_name_prefix
from onnx import helper, checker
import fluid_onnx.ops as ops
from fluid_onnx.variables import paddle_variable_to_onnx_tensor, paddle_onnx_weight
from debug.model_check import debug_model, Tracker
except Exception as e:
logging.error( logging.error(
"Import Module Failed! Please install paddle2onnx. Related requirements see https://github.com/PaddlePaddle/paddle2onnx." "Only image classifier models and semantic segmentation models(except FastSCNN) are supported to export to ONNX"
) )
raise e try:
place = fluid.CPUPlace() import x2paddle
exe = fluid.Executor(place) if x2paddle.__version__ < '0.7.4':
inference_scope = fluid.global_scope() logging.error("You need to upgrade x2paddle >= 0.7.4")
with fluid.scope_guard(inference_scope): except:
test_input_names = [ logging.error(
var.name for var in list(model.test_inputs.values()) "You need to install x2paddle first, pip install x2paddle>=0.7.4")
] from x2paddle.op_mapper.paddle_op_mapper import PaddleOpMapper
inputs_outputs_list = ["fetch", "feed"] mapper = PaddleOpMapper()
weights, weights_value_info = [], [] mapper.convert(model.test_prog, save_dir)
global_block = model.test_prog.global_block()
for var_name in global_block.vars:
var = global_block.var(var_name)
if var_name not in test_input_names\
and var.persistable:
weight, val_info = paddle_onnx_weight(
var=var, scope=inference_scope)
weights.append(weight)
weights_value_info.append(val_info)
# Create inputs
inputs = [
paddle_variable_to_onnx_tensor(v, global_block)
for v in test_input_names
]
logging.INFO("load the model parameter done.")
onnx_nodes = []
op_check_list = []
op_trackers = []
nms_first_index = -1
nms_outputs = []
for block in model.test_prog.blocks:
for op in block.ops:
if op.type in ops.node_maker:
# TODO: deal with the corner case that vars in
# different blocks have the same name
node_proto = ops.node_maker[str(op.type)](
operator=op, block=block)
op_outputs = []
last_node = None
if isinstance(node_proto, tuple):
onnx_nodes.extend(list(node_proto))
last_node = list(node_proto)
else:
onnx_nodes.append(node_proto)
last_node = [node_proto]
tracker = Tracker(str(op.type), last_node)
op_trackers.append(tracker)
op_check_list.append(str(op.type))
if op.type == "multiclass_nms" and nms_first_index < 0:
nms_first_index = 0
if nms_first_index >= 0:
_, _, output_op = op_io_info(op)
for output in output_op:
nms_outputs.extend(output_op[output])
else:
if op.type not in ['feed', 'fetch']:
op_check_list.append(op.type)
logging.info('The operator sets to run test case.')
logging.info(set(op_check_list))
# Create outputs
# Get the new names for outputs if they've been renamed in nodes' making
renamed_outputs = op_io_info.get_all_renamed_outputs()
test_outputs = list(model.test_outputs.values())
test_outputs_names = [var.name for var in model.test_outputs.values()]
test_outputs_names = [
name if name not in renamed_outputs else renamed_outputs[name]
for name in test_outputs_names
]
outputs = [
paddle_variable_to_onnx_tensor(v, global_block)
for v in test_outputs_names
]
# Make graph
onnx_name = 'paddlex.onnx'
onnx_graph = helper.make_graph(
nodes=onnx_nodes,
name=onnx_name,
initializer=weights,
inputs=inputs + weights_value_info,
outputs=outputs)
# Make model
onnx_model = helper.make_model(
onnx_graph, producer_name='PaddlePaddle')
# Model check
checker.check_model(onnx_model)
if onnx_model is not None:
onnx_model_file = os.path.join(save_dir, onnx_name)
if not os.path.exists(save_dir):
os.mkdir(save_dir)
with open(onnx_model_file, 'wb') as f:
f.write(onnx_model.SerializeToString())
logging.info("Saved converted model to path: %s" % onnx_model_file)
...@@ -100,7 +100,7 @@ class CocoDetection(VOCDetection): ...@@ -100,7 +100,7 @@ class CocoDetection(VOCDetection):
gt_score = np.ones((num_bbox, 1), dtype=np.float32) gt_score = np.ones((num_bbox, 1), dtype=np.float32)
is_crowd = np.zeros((num_bbox, 1), dtype=np.int32) is_crowd = np.zeros((num_bbox, 1), dtype=np.int32)
difficult = np.zeros((num_bbox, 1), dtype=np.int32) difficult = np.zeros((num_bbox, 1), dtype=np.int32)
gt_poly = None gt_poly = [None] * num_bbox
for i, box in enumerate(bboxes): for i, box in enumerate(bboxes):
catid = box['category_id'] catid = box['category_id']
...@@ -108,8 +108,6 @@ class CocoDetection(VOCDetection): ...@@ -108,8 +108,6 @@ class CocoDetection(VOCDetection):
gt_bbox[i, :] = box['clean_bbox'] gt_bbox[i, :] = box['clean_bbox']
is_crowd[i][0] = box['iscrowd'] is_crowd[i][0] = box['iscrowd']
if 'segmentation' in box: if 'segmentation' in box:
if gt_poly is None:
gt_poly = [None] * num_bbox
gt_poly[i] = box['segmentation'] gt_poly[i] = box['segmentation']
im_info = { im_info = {
...@@ -121,10 +119,9 @@ class CocoDetection(VOCDetection): ...@@ -121,10 +119,9 @@ class CocoDetection(VOCDetection):
'gt_class': gt_class, 'gt_class': gt_class,
'gt_bbox': gt_bbox, 'gt_bbox': gt_bbox,
'gt_score': gt_score, 'gt_score': gt_score,
'gt_poly': gt_poly,
'difficult': difficult 'difficult': difficult
} }
if gt_poly is not None:
label_info['gt_poly'] = gt_poly
coco_rec = (im_info, label_info) coco_rec = (im_info, label_info)
self.file_list.append([im_fname, coco_rec]) self.file_list.append([im_fname, coco_rec])
......
...@@ -39,14 +39,14 @@ class EasyDataCls(ImageNet): ...@@ -39,14 +39,14 @@ class EasyDataCls(ImageNet):
线程和'process'进程两种方式。默认为'process'(Windows和Mac下会强制使用thread,该参数无效)。 线程和'process'进程两种方式。默认为'process'(Windows和Mac下会强制使用thread,该参数无效)。
shuffle (bool): 是否需要对数据集中样本打乱顺序。默认为False。 shuffle (bool): 是否需要对数据集中样本打乱顺序。默认为False。
""" """
def __init__(self, def __init__(self,
data_dir, data_dir,
file_list, file_list,
label_list, label_list,
transforms=None, transforms=None,
num_workers='auto', num_workers='auto',
buffer_size=100, buffer_size=8,
parallel_method='process', parallel_method='process',
shuffle=False): shuffle=False):
super(ImageNet, self).__init__( super(ImageNet, self).__init__(
...@@ -58,7 +58,7 @@ class EasyDataCls(ImageNet): ...@@ -58,7 +58,7 @@ class EasyDataCls(ImageNet):
self.file_list = list() self.file_list = list()
self.labels = list() self.labels = list()
self._epoch = 0 self._epoch = 0
with open(label_list, encoding=get_encoding(label_list)) as f: with open(label_list, encoding=get_encoding(label_list)) as f:
for line in f: for line in f:
item = line.strip() item = line.strip()
...@@ -73,8 +73,8 @@ class EasyDataCls(ImageNet): ...@@ -73,8 +73,8 @@ class EasyDataCls(ImageNet):
if not osp.isfile(json_file): if not osp.isfile(json_file):
continue continue
if not osp.exists(img_file): if not osp.exists(img_file):
raise IOError( raise IOError('The image file {} is not exist!'.format(
'The image file {} is not exist!'.format(img_file)) img_file))
with open(json_file, mode='r', \ with open(json_file, mode='r', \
encoding=get_encoding(json_file)) as j: encoding=get_encoding(json_file)) as j:
json_info = json.load(j) json_info = json.load(j)
...@@ -83,4 +83,3 @@ class EasyDataCls(ImageNet): ...@@ -83,4 +83,3 @@ class EasyDataCls(ImageNet):
self.num_samples = len(self.file_list) self.num_samples = len(self.file_list)
logging.info("{} samples in file {}".format( logging.info("{} samples in file {}".format(
len(self.file_list), file_list)) len(self.file_list), file_list))
\ No newline at end of file
...@@ -45,7 +45,7 @@ class ImageNet(Dataset): ...@@ -45,7 +45,7 @@ class ImageNet(Dataset):
label_list, label_list,
transforms=None, transforms=None,
num_workers='auto', num_workers='auto',
buffer_size=100, buffer_size=8,
parallel_method='process', parallel_method='process',
shuffle=False): shuffle=False):
super(ImageNet, self).__init__( super(ImageNet, self).__init__(
...@@ -70,8 +70,8 @@ class ImageNet(Dataset): ...@@ -70,8 +70,8 @@ class ImageNet(Dataset):
continue continue
full_path = osp.join(data_dir, items[0]) full_path = osp.join(data_dir, items[0])
if not osp.exists(full_path): if not osp.exists(full_path):
raise IOError( raise IOError('The image file {} is not exist!'.format(
'The image file {} is not exist!'.format(full_path)) full_path))
self.file_list.append([full_path, int(items[1])]) self.file_list.append([full_path, int(items[1])])
self.num_samples = len(self.file_list) self.num_samples = len(self.file_list)
logging.info("{} samples in file {}".format( logging.info("{} samples in file {}".format(
......
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. # Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
...@@ -28,7 +28,7 @@ class SegDataset(Dataset): ...@@ -28,7 +28,7 @@ class SegDataset(Dataset):
Args: Args:
data_dir (str): 数据集所在的目录路径。 data_dir (str): 数据集所在的目录路径。
file_list (str): 描述数据集图片文件和对应标注文件的文件路径(文本内每行路径为相对data_dir的相对路)。 file_list (str): 描述数据集图片文件和对应标注文件的文件路径(文本内每行路径为相对data_dir的相对路)。
label_list (str): 描述数据集包含的类别信息文件路径。 label_list (str): 描述数据集包含的类别信息文件路径。默认值为None。
transforms (list): 数据集中每个样本的预处理/增强算子。 transforms (list): 数据集中每个样本的预处理/增强算子。
num_workers (int): 数据集中样本在预处理过程中的线程或进程数。默认为4。 num_workers (int): 数据集中样本在预处理过程中的线程或进程数。默认为4。
buffer_size (int): 数据集中样本在预处理过程中队列的缓存长度,以样本数为单位。默认为100。 buffer_size (int): 数据集中样本在预处理过程中队列的缓存长度,以样本数为单位。默认为100。
...@@ -40,7 +40,7 @@ class SegDataset(Dataset): ...@@ -40,7 +40,7 @@ class SegDataset(Dataset):
def __init__(self, def __init__(self,
data_dir, data_dir,
file_list, file_list,
label_list, label_list=None,
transforms=None, transforms=None,
num_workers='auto', num_workers='auto',
buffer_size=100, buffer_size=100,
...@@ -56,10 +56,11 @@ class SegDataset(Dataset): ...@@ -56,10 +56,11 @@ class SegDataset(Dataset):
self.labels = list() self.labels = list()
self._epoch = 0 self._epoch = 0
with open(label_list, encoding=get_encoding(label_list)) as f: if label_list is not None:
for line in f: with open(label_list, encoding=get_encoding(label_list)) as f:
item = line.strip() for line in f:
self.labels.append(item) item = line.strip()
self.labels.append(item)
with open(file_list, encoding=get_encoding(file_list)) as f: with open(file_list, encoding=get_encoding(file_list)) as f:
for line in f: for line in f:
...@@ -69,8 +70,8 @@ class SegDataset(Dataset): ...@@ -69,8 +70,8 @@ class SegDataset(Dataset):
full_path_im = osp.join(data_dir, items[0]) full_path_im = osp.join(data_dir, items[0])
full_path_label = osp.join(data_dir, items[1]) full_path_label = osp.join(data_dir, items[1])
if not osp.exists(full_path_im): if not osp.exists(full_path_im):
raise IOError( raise IOError('The image file {} is not exist!'.format(
'The image file {} is not exist!'.format(full_path_im)) full_path_im))
if not osp.exists(full_path_label): if not osp.exists(full_path_label):
raise IOError('The image file {} is not exist!'.format( raise IOError('The image file {} is not exist!'.format(
full_path_label)) full_path_label))
......
...@@ -17,6 +17,7 @@ import copy ...@@ -17,6 +17,7 @@ import copy
import os import os
import os.path as osp import os.path as osp
import random import random
import re
import numpy as np import numpy as np
from collections import OrderedDict from collections import OrderedDict
import xml.etree.ElementTree as ET import xml.etree.ElementTree as ET
...@@ -104,23 +105,60 @@ class VOCDetection(Dataset): ...@@ -104,23 +105,60 @@ class VOCDetection(Dataset):
else: else:
ct = int(tree.find('id').text) ct = int(tree.find('id').text)
im_id = np.array([int(tree.find('id').text)]) im_id = np.array([int(tree.find('id').text)])
pattern = re.compile('<object>', re.IGNORECASE)
objs = tree.findall('object') obj_tag = pattern.findall(
im_w = float(tree.find('size').find('width').text) str(ET.tostringlist(tree.getroot())))[0][1:-1]
im_h = float(tree.find('size').find('height').text) objs = tree.findall(obj_tag)
pattern = re.compile('<size>', re.IGNORECASE)
size_tag = pattern.findall(
str(ET.tostringlist(tree.getroot())))[0][1:-1]
size_element = tree.find(size_tag)
pattern = re.compile('<width>', re.IGNORECASE)
width_tag = pattern.findall(
str(ET.tostringlist(size_element)))[0][1:-1]
im_w = float(size_element.find(width_tag).text)
pattern = re.compile('<height>', re.IGNORECASE)
height_tag = pattern.findall(
str(ET.tostringlist(size_element)))[0][1:-1]
im_h = float(size_element.find(height_tag).text)
gt_bbox = np.zeros((len(objs), 4), dtype=np.float32) gt_bbox = np.zeros((len(objs), 4), dtype=np.float32)
gt_class = np.zeros((len(objs), 1), dtype=np.int32) gt_class = np.zeros((len(objs), 1), dtype=np.int32)
gt_score = np.ones((len(objs), 1), dtype=np.float32) gt_score = np.ones((len(objs), 1), dtype=np.float32)
is_crowd = np.zeros((len(objs), 1), dtype=np.int32) is_crowd = np.zeros((len(objs), 1), dtype=np.int32)
difficult = np.zeros((len(objs), 1), dtype=np.int32) difficult = np.zeros((len(objs), 1), dtype=np.int32)
for i, obj in enumerate(objs): for i, obj in enumerate(objs):
cname = obj.find('name').text.strip() pattern = re.compile('<name>', re.IGNORECASE)
name_tag = pattern.findall(str(ET.tostringlist(obj)))[0][
1:-1]
cname = obj.find(name_tag).text.strip()
gt_class[i][0] = cname2cid[cname] gt_class[i][0] = cname2cid[cname]
_difficult = int(obj.find('difficult').text) pattern = re.compile('<difficult>', re.IGNORECASE)
x1 = float(obj.find('bndbox').find('xmin').text) diff_tag = pattern.findall(str(ET.tostringlist(obj)))[0][
y1 = float(obj.find('bndbox').find('ymin').text) 1:-1]
x2 = float(obj.find('bndbox').find('xmax').text) try:
y2 = float(obj.find('bndbox').find('ymax').text) _difficult = int(obj.find(diff_tag).text)
except Exception:
_difficult = 0
pattern = re.compile('<bndbox>', re.IGNORECASE)
box_tag = pattern.findall(str(ET.tostringlist(obj)))[0][1:
-1]
box_element = obj.find(box_tag)
pattern = re.compile('<xmin>', re.IGNORECASE)
xmin_tag = pattern.findall(
str(ET.tostringlist(box_element)))[0][1:-1]
x1 = float(box_element.find(xmin_tag).text)
pattern = re.compile('<ymin>', re.IGNORECASE)
ymin_tag = pattern.findall(
str(ET.tostringlist(box_element)))[0][1:-1]
y1 = float(box_element.find(ymin_tag).text)
pattern = re.compile('<xmax>', re.IGNORECASE)
xmax_tag = pattern.findall(
str(ET.tostringlist(box_element)))[0][1:-1]
x2 = float(box_element.find(xmax_tag).text)
pattern = re.compile('<ymax>', re.IGNORECASE)
ymax_tag = pattern.findall(
str(ET.tostringlist(box_element)))[0][1:-1]
y2 = float(box_element.find(ymax_tag).text)
x1 = max(0, x1) x1 = max(0, x1)
y1 = max(0, y1) y1 = max(0, y1)
if im_w > 0.5 and im_h > 0.5: if im_w > 0.5 and im_h > 0.5:
...@@ -149,6 +187,7 @@ class VOCDetection(Dataset): ...@@ -149,6 +187,7 @@ class VOCDetection(Dataset):
'gt_class': gt_class, 'gt_class': gt_class,
'gt_bbox': gt_bbox, 'gt_bbox': gt_bbox,
'gt_score': gt_score, 'gt_score': gt_score,
'gt_poly': [],
'difficult': difficult 'difficult': difficult
} }
voc_rec = (im_info, label_info) voc_rec = (im_info, label_info)
......
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. # copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
# You may obtain a copy of the License at # You may obtain a copy of the License at
# #
# http://www.apache.org/licenses/LICENSE-2.0 # http://www.apache.org/licenses/LICENSE-2.0
# #
# Unless required by applicable law or agreed to in writing, software # Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, # distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
...@@ -24,11 +24,12 @@ class HRNet(DeepLabv3p): ...@@ -24,11 +24,12 @@ class HRNet(DeepLabv3p):
Args: Args:
num_classes (int): 类别数。 num_classes (int): 类别数。
width (int): 高分辨率分支中特征层的通道数量。默认值为18。可选择取值为[18, 30, 32, 40, 44, 48, 60, 64]。 width (int|str): 高分辨率分支中特征层的通道数量。默认值为18。可选择取值为[18, 30, 32, 40, 44, 48, 60, 64, '18_small_v1']。
'18_small_v1'是18的轻量级版本。
use_bce_loss (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。默认False。 use_bce_loss (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。默认False。
use_dice_loss (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用。 use_dice_loss (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用。
当use_bce_loss和use_dice_loss都为False时,使用交叉熵损失函数。默认False。 当use_bce_loss和use_dice_loss都为False时,使用交叉熵损失函数。默认False。
class_weight (list/str): 交叉熵损失函数各类损失的权重。当class_weight为list的时候,长度应为 class_weight (list|str): 交叉熵损失函数各类损失的权重。当class_weight为list的时候,长度应为
num_classes。当class_weight为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重 num_classes。当class_weight为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重
自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None是,各类的权重1, 自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None是,各类的权重1,
即平时使用的交叉熵损失函数。 即平时使用的交叉熵损失函数。
...@@ -168,6 +169,6 @@ class HRNet(DeepLabv3p): ...@@ -168,6 +169,6 @@ class HRNet(DeepLabv3p):
return super(HRNet, self).train( return super(HRNet, self).train(
num_epochs, train_dataset, train_batch_size, eval_dataset, num_epochs, train_dataset, train_batch_size, eval_dataset,
save_interval_epochs, log_interval_steps, save_dir, save_interval_epochs, log_interval_steps, save_dir,
pretrain_weights, optimizer, learning_rate, lr_decay_power, use_vdl, pretrain_weights, optimizer, learning_rate, lr_decay_power,
sensitivities_file, eval_metric_loss, early_stop, use_vdl, sensitivities_file, eval_metric_loss, early_stop,
early_stop_patience, resume_checkpoint) early_stop_patience, resume_checkpoint)
...@@ -108,6 +108,7 @@ def load_model(model_dir, fixed_input_shape=None): ...@@ -108,6 +108,7 @@ def load_model(model_dir, fixed_input_shape=None):
logging.info("Model[{}] loaded.".format(info['Model'])) logging.info("Model[{}] loaded.".format(info['Model']))
model.trainable = False model.trainable = False
model.status = status
return model return model
......
...@@ -158,6 +158,7 @@ def prune_program(model, prune_params_ratios=None): ...@@ -158,6 +158,7 @@ def prune_program(model, prune_params_ratios=None):
prune_params_ratios (dict): 由裁剪参数名和裁剪率组成的字典,当为None时 prune_params_ratios (dict): 由裁剪参数名和裁剪率组成的字典,当为None时
使用默认裁剪参数名和裁剪率。默认为None。 使用默认裁剪参数名和裁剪率。默认为None。
""" """
assert model.status == 'Normal', 'Only the models saved while training are supported!'
place = model.places[0] place = model.places[0]
train_prog = model.train_prog train_prog = model.train_prog
eval_prog = model.test_prog eval_prog = model.test_prog
...@@ -235,6 +236,7 @@ def cal_params_sensitivities(model, save_file, eval_dataset, batch_size=8): ...@@ -235,6 +236,7 @@ def cal_params_sensitivities(model, save_file, eval_dataset, batch_size=8):
其中``weight_0``是卷积Kernel名;``sensitivities['weight_0']``是一个字典,key是裁剪率,value是敏感度。 其中``weight_0``是卷积Kernel名;``sensitivities['weight_0']``是一个字典,key是裁剪率,value是敏感度。
""" """
assert model.status == 'Normal', 'Only the models saved while training are supported!'
if os.path.exists(save_file): if os.path.exists(save_file):
os.remove(save_file) os.remove(save_file)
......
...@@ -19,6 +19,8 @@ import paddle.fluid as fluid ...@@ -19,6 +19,8 @@ import paddle.fluid as fluid
import paddlex import paddlex
sensitivities_data = { sensitivities_data = {
'AlexNet':
'https://bj.bcebos.com/paddlex/slim_prune/alexnet_sensitivities.data',
'ResNet18': 'ResNet18':
'https://bj.bcebos.com/paddlex/slim_prune/resnet18.sensitivities', 'https://bj.bcebos.com/paddlex/slim_prune/resnet18.sensitivities',
'ResNet34': 'ResNet34':
...@@ -41,6 +43,10 @@ sensitivities_data = { ...@@ -41,6 +43,10 @@ sensitivities_data = {
'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_large.sensitivities', 'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_large.sensitivities',
'MobileNetV3_small': 'MobileNetV3_small':
'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_small.sensitivities', 'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_small.sensitivities',
'MobileNetV3_large_ssld':
'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_large_ssld_sensitivities.data',
'MobileNetV3_small_ssld':
'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_small_ssld_sensitivities.data',
'DenseNet121': 'DenseNet121':
'https://bj.bcebos.com/paddlex/slim_prune/densenet121.sensitivities', 'https://bj.bcebos.com/paddlex/slim_prune/densenet121.sensitivities',
'DenseNet161': 'DenseNet161':
...@@ -51,6 +57,8 @@ sensitivities_data = { ...@@ -51,6 +57,8 @@ sensitivities_data = {
'https://bj.bcebos.com/paddlex/slim_prune/xception41.sensitivities', 'https://bj.bcebos.com/paddlex/slim_prune/xception41.sensitivities',
'Xception65': 'Xception65':
'https://bj.bcebos.com/paddlex/slim_prune/xception65.sensitivities', 'https://bj.bcebos.com/paddlex/slim_prune/xception65.sensitivities',
'ShuffleNetV2':
'https://bj.bcebos.com/paddlex/slim_prune/shufflenetv2_sensitivities.data',
'YOLOv3_MobileNetV1': 'YOLOv3_MobileNetV1':
'https://bj.bcebos.com/paddlex/slim_prune/yolov3_mobilenetv1.sensitivities', 'https://bj.bcebos.com/paddlex/slim_prune/yolov3_mobilenetv1.sensitivities',
'YOLOv3_MobileNetV3_large': 'YOLOv3_MobileNetV3_large':
...@@ -143,7 +151,8 @@ def get_prune_params(model): ...@@ -143,7 +151,8 @@ def get_prune_params(model):
if model_type.startswith('ResNet') or \ if model_type.startswith('ResNet') or \
model_type.startswith('DenseNet') or \ model_type.startswith('DenseNet') or \
model_type.startswith('DarkNet') or \ model_type.startswith('DarkNet') or \
model_type.startswith('AlexNet'): model_type.startswith('AlexNet') or \
model_type.startswith('ShuffleNetV2'):
for block in program.blocks: for block in program.blocks:
for param in block.all_parameters(): for param in block.all_parameters():
pd_var = fluid.global_scope().find_var(param.name) pd_var = fluid.global_scope().find_var(param.name)
...@@ -152,6 +161,28 @@ def get_prune_params(model): ...@@ -152,6 +161,28 @@ def get_prune_params(model):
prune_names.append(param.name) prune_names.append(param.name)
if model_type == 'AlexNet': if model_type == 'AlexNet':
prune_names.remove('conv5_weights') prune_names.remove('conv5_weights')
if model_type == 'ShuffleNetV2':
not_prune_names = ['stage_2_1_conv5_weights',
'stage_2_1_conv3_weights',
'stage_2_2_conv3_weights',
'stage_2_3_conv3_weights',
'stage_2_4_conv3_weights',
'stage_3_1_conv5_weights',
'stage_3_1_conv3_weights',
'stage_3_2_conv3_weights',
'stage_3_3_conv3_weights',
'stage_3_4_conv3_weights',
'stage_3_5_conv3_weights',
'stage_3_6_conv3_weights',
'stage_3_7_conv3_weights',
'stage_3_8_conv3_weights',
'stage_4_1_conv5_weights',
'stage_4_1_conv3_weights',
'stage_4_2_conv3_weights',
'stage_4_3_conv3_weights',
'stage_4_4_conv3_weights',]
for name in not_prune_names:
prune_names.remove(name)
elif model_type == "MobileNetV1": elif model_type == "MobileNetV1":
prune_names.append("conv1_weights") prune_names.append("conv1_weights")
for param in program.global_block().all_parameters(): for param in program.global_block().all_parameters():
......
...@@ -81,7 +81,7 @@ coco_pretrain = { ...@@ -81,7 +81,7 @@ coco_pretrain = {
'YOLOv3_MobileNetV1_COCO': 'YOLOv3_MobileNetV1_COCO':
'https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1.tar', 'https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1.tar',
'YOLOv3_MobileNetV3_large_COCO': 'YOLOv3_MobileNetV3_large_COCO':
'https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v3.pdparams', 'https://bj.bcebos.com/paddlex/models/yolov3_mobilenet_v3.tar',
'YOLOv3_ResNet34_COCO': 'YOLOv3_ResNet34_COCO':
'https://paddlemodels.bj.bcebos.com/object_detection/yolov3_r34.tar', 'https://paddlemodels.bj.bcebos.com/object_detection/yolov3_r34.tar',
'YOLOv3_ResNet50_vd_COCO': 'YOLOv3_ResNet50_vd_COCO':
......
...@@ -51,15 +51,38 @@ class HRNet(object): ...@@ -51,15 +51,38 @@ class HRNet(object):
self.width = width self.width = width
self.has_se = has_se self.has_se = has_se
self.num_modules = {
'18_small_v1': [1, 1, 1, 1],
'18': [1, 1, 4, 3],
'30': [1, 1, 4, 3],
'32': [1, 1, 4, 3],
'40': [1, 1, 4, 3],
'44': [1, 1, 4, 3],
'48': [1, 1, 4, 3],
'60': [1, 1, 4, 3],
'64': [1, 1, 4, 3]
}
self.num_blocks = {
'18_small_v1': [[1], [2, 2], [2, 2, 2], [2, 2, 2, 2]],
'18': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]],
'30': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]],
'32': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]],
'40': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]],
'44': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]],
'48': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]],
'60': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]],
'64': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]]
}
self.channels = { self.channels = {
18: [[18, 36], [18, 36, 72], [18, 36, 72, 144]], '18_small_v1': [[32], [16, 32], [16, 32, 64], [16, 32, 64, 128]],
30: [[30, 60], [30, 60, 120], [30, 60, 120, 240]], '18': [[64], [18, 36], [18, 36, 72], [18, 36, 72, 144]],
32: [[32, 64], [32, 64, 128], [32, 64, 128, 256]], '30': [[64], [30, 60], [30, 60, 120], [30, 60, 120, 240]],
40: [[40, 80], [40, 80, 160], [40, 80, 160, 320]], '32': [[64], [32, 64], [32, 64, 128], [32, 64, 128, 256]],
44: [[44, 88], [44, 88, 176], [44, 88, 176, 352]], '40': [[64], [40, 80], [40, 80, 160], [40, 80, 160, 320]],
48: [[48, 96], [48, 96, 192], [48, 96, 192, 384]], '44': [[64], [44, 88], [44, 88, 176], [44, 88, 176, 352]],
60: [[60, 120], [60, 120, 240], [60, 120, 240, 480]], '48': [[64], [48, 96], [48, 96, 192], [48, 96, 192, 384]],
64: [[64, 128], [64, 128, 256], [64, 128, 256, 512]], '60': [[64], [60, 120], [60, 120, 240], [60, 120, 240, 480]],
'64': [[64], [64, 128], [64, 128, 256], [64, 128, 256, 512]],
} }
self.freeze_at = freeze_at self.freeze_at = freeze_at
...@@ -73,31 +96,38 @@ class HRNet(object): ...@@ -73,31 +96,38 @@ class HRNet(object):
def net(self, input): def net(self, input):
width = self.width width = self.width
channels_2, channels_3, channels_4 = self.channels[width] channels_1, channels_2, channels_3, channels_4 = self.channels[str(
num_modules_2, num_modules_3, num_modules_4 = 1, 4, 3 width)]
num_modules_1, num_modules_2, num_modules_3, num_modules_4 = self.num_modules[
str(width)]
num_blocks_1, num_blocks_2, num_blocks_3, num_blocks_4 = self.num_blocks[
str(width)]
x = self.conv_bn_layer( x = self.conv_bn_layer(
input=input, input=input,
filter_size=3, filter_size=3,
num_filters=64, num_filters=channels_1[0],
stride=2, stride=2,
if_act=True, if_act=True,
name='layer1_1') name='layer1_1')
x = self.conv_bn_layer( x = self.conv_bn_layer(
input=x, input=x,
filter_size=3, filter_size=3,
num_filters=64, num_filters=channels_1[0],
stride=2, stride=2,
if_act=True, if_act=True,
name='layer1_2') name='layer1_2')
la1 = self.layer1(x, name='layer2') la1 = self.layer1(x, num_blocks_1, channels_1, name='layer2')
tr1 = self.transition_layer([la1], [256], channels_2, name='tr1') tr1 = self.transition_layer([la1], [256], channels_2, name='tr1')
st2 = self.stage(tr1, num_modules_2, channels_2, name='st2') st2 = self.stage(
tr1, num_modules_2, num_blocks_2, channels_2, name='st2')
tr2 = self.transition_layer(st2, channels_2, channels_3, name='tr2') tr2 = self.transition_layer(st2, channels_2, channels_3, name='tr2')
st3 = self.stage(tr2, num_modules_3, channels_3, name='st3') st3 = self.stage(
tr2, num_modules_3, num_blocks_3, channels_3, name='st3')
tr3 = self.transition_layer(st3, channels_3, channels_4, name='tr3') tr3 = self.transition_layer(st3, channels_3, channels_4, name='tr3')
st4 = self.stage(tr3, num_modules_4, channels_4, name='st4') st4 = self.stage(
tr3, num_modules_4, num_blocks_4, channels_4, name='st4')
# classification # classification
if self.num_classes: if self.num_classes:
...@@ -139,12 +169,12 @@ class HRNet(object): ...@@ -139,12 +169,12 @@ class HRNet(object):
self.end_points = st4 self.end_points = st4
return st4[-1] return st4[-1]
def layer1(self, input, name=None): def layer1(self, input, num_blocks, channels, name=None):
conv = input conv = input
for i in range(4): for i in range(num_blocks[0]):
conv = self.bottleneck_block( conv = self.bottleneck_block(
conv, conv,
num_filters=64, num_filters=channels[0],
downsample=True if i == 0 else False, downsample=True if i == 0 else False,
name=name + '_' + str(i + 1)) name=name + '_' + str(i + 1))
return conv return conv
...@@ -178,7 +208,7 @@ class HRNet(object): ...@@ -178,7 +208,7 @@ class HRNet(object):
out = [] out = []
for i in range(len(channels)): for i in range(len(channels)):
residual = x[i] residual = x[i]
for j in range(block_num): for j in range(block_num[i]):
residual = self.basic_block( residual = self.basic_block(
residual, residual,
channels[i], channels[i],
...@@ -240,10 +270,11 @@ class HRNet(object): ...@@ -240,10 +270,11 @@ class HRNet(object):
def high_resolution_module(self, def high_resolution_module(self,
x, x,
num_blocks,
channels, channels,
multi_scale_output=True, multi_scale_output=True,
name=None): name=None):
residual = self.branches(x, 4, channels, name=name) residual = self.branches(x, num_blocks, channels, name=name)
out = self.fuse_layers( out = self.fuse_layers(
residual, residual,
channels, channels,
...@@ -254,6 +285,7 @@ class HRNet(object): ...@@ -254,6 +285,7 @@ class HRNet(object):
def stage(self, def stage(self,
x, x,
num_modules, num_modules,
num_blocks,
channels, channels,
multi_scale_output=True, multi_scale_output=True,
name=None): name=None):
...@@ -262,12 +294,13 @@ class HRNet(object): ...@@ -262,12 +294,13 @@ class HRNet(object):
if i == num_modules - 1 and multi_scale_output == False: if i == num_modules - 1 and multi_scale_output == False:
out = self.high_resolution_module( out = self.high_resolution_module(
out, out,
num_blocks,
channels, channels,
multi_scale_output=False, multi_scale_output=False,
name=name + '_' + str(i + 1)) name=name + '_' + str(i + 1))
else: else:
out = self.high_resolution_module( out = self.high_resolution_module(
out, channels, name=name + '_' + str(i + 1)) out, num_blocks, channels, name=name + '_' + str(i + 1))
return out return out
......
@@ -82,7 +82,8 @@ class HRNet(object):
         st4[3] = fluid.layers.resize_bilinear(st4[3], out_shape=shape)
         out = fluid.layers.concat(st4, axis=1)
-        last_channels = sum(self.backbone.channels[self.backbone.width][-1])
+        last_channels = sum(self.backbone.channels[str(self.backbone.width)][
+            -1])
         out = self._conv_bn_layer(
             input=out,
...
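The str(...) cast above matters because the backbone's channel table is keyed by strings, so indexing with the integer width would raise KeyError. A quick illustration with an assumed table layout (the real table lives in the backbone definition):

# Assumed layout: string keys, per-stage channel lists as values.
channels = {'18': [[18, 36], [18, 36, 72], [18, 36, 72, 144]]}
width = 18
# channels[width] would raise KeyError; the cast fixes it:
last_channels = sum(channels[str(width)][-1])  # 18 + 36 + 72 + 144 = 270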
@@ -70,8 +70,8 @@ class Compose(ClsTransform):
         if isinstance(im, np.ndarray):
             if len(im.shape) != 3:
                 raise Exception(
-                    "im should be 3-dimension, but now is {}-dimensions".
-                    format(len(im.shape)))
+                    "im should be 3-dimension, but now is {}-dimensions".format(
+                        len(im.shape)))
         else:
             try:
                 im = cv2.imread(im).astype('float32')
@@ -100,7 +100,9 @@ class Compose(ClsTransform):
         transform_names = [type(x).__name__ for x in self.transforms]
         for aug in augmenters:
             if type(aug).__name__ in transform_names:
-                logging.error("{} is already in ComposedTransforms, need to remove it from add_augmenters().".format(type(aug).__name__))
+                logging.error(
+                    "{} is already in ComposedTransforms, need to remove it from add_augmenters().".
+                    format(type(aug).__name__))
         self.transforms = augmenters + self.transforms
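A hypothetical use of add_augmenters showing both behaviours fixed up here, assuming the usual paddlex.cls.transforms entry points: user transforms are prepended, and a transform whose class already appears in the pipeline only triggers the error log above:

from paddlex.cls import transforms

train_transforms = transforms.ComposedClsTransforms(mode='train')
# Prepended before the built-in pipeline:
train_transforms.add_augmenters([transforms.RandomDistort()])
# Logs "RandomHorizontalFlip is already in ComposedTransforms, ...":
train_transforms.add_augmenters([transforms.RandomHorizontalFlip()])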
@@ -139,8 +141,8 @@ class RandomCrop(ClsTransform):
             tuple: when label is None, the returned tuple is (im, ), the image as np.ndarray;
                 when label is not None, the returned tuple is (im, label): the image np.ndarray plus its class id.
         """
-        im = random_crop(im, self.crop_size, self.lower_scale,
-                         self.lower_ratio, self.upper_ratio)
+        im = random_crop(im, self.crop_size, self.lower_scale, self.lower_ratio,
+                         self.upper_ratio)
         if label is None:
             return (im, )
         else:
@@ -270,14 +272,12 @@ class ResizeByShort(ClsTransform):
         im_short_size = min(im.shape[0], im.shape[1])
         im_long_size = max(im.shape[0], im.shape[1])
         scale = float(self.short_size) / im_short_size
-        if self.max_size > 0 and np.round(scale *
-                                          im_long_size) > self.max_size:
+        if self.max_size > 0 and np.round(scale * im_long_size) > self.max_size:
             scale = float(self.max_size) / float(im_long_size)
         resized_width = int(round(im.shape[1] * scale))
         resized_height = int(round(im.shape[0] * scale))
         im = cv2.resize(
-            im, (resized_width, resized_height),
-            interpolation=cv2.INTER_LINEAR)
+            im, (resized_width, resized_height), interpolation=cv2.INTER_LINEAR)
         if label is None:
             return (im, )
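The reflowed condition above is the long-side cap of ResizeByShort; a worked example of the math with assumed sizes:

import numpy as np

h, w = 600, 1200                       # assumed input image size
short_size, max_size = 800, 1333       # assumed transform settings
scale = float(short_size) / min(h, w)  # 800 / 600 = 1.333...
if max_size > 0 and np.round(scale * max(h, w)) > max_size:  # 1600 > 1333
    scale = float(max_size) / float(max(h, w))               # 1333 / 1200
resized = (int(round(w * scale)), int(round(h * scale)))     # (1333, 666)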
@@ -490,13 +490,15 @@ class ComposedClsTransforms(Compose):
         crop_size(int|list): size of the image fed to the model
         mean(list): image mean
         std(list): image standard deviation
+        random_horizontal_flip(bool): whether to apply random horizontal flip augmentation with probability 0.5; only takes effect when mode is 'train'. Defaults to True
     """
     def __init__(self,
                  mode,
                  crop_size=[224, 224],
                  mean=[0.485, 0.456, 0.406],
-                 std=[0.229, 0.224, 0.225]):
+                 std=[0.229, 0.224, 0.225],
+                 random_horizontal_flip=True):
         width = crop_size
         if isinstance(crop_size, list):
             if crop_size[0] != crop_size[1]:
@@ -512,10 +514,11 @@ class ComposedClsTransforms(Compose):
         if mode == 'train':
             # transforms for training, with data augmentation
             transforms = [
-                RandomCrop(crop_size=width), RandomHorizontalFlip(prob=0.5),
-                Normalize(
+                RandomCrop(crop_size=width), Normalize(
                     mean=mean, std=std)
             ]
+            if random_horizontal_flip:
+                transforms.insert(0, RandomHorizontalFlip())
         else:
             # transforms for evaluation/prediction
             transforms = [
...
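A hypothetical call using the new switch; disabling the flip is useful for orientation-sensitive data such as characters or digits:

from paddlex.cls import transforms

train_transforms = transforms.ComposedClsTransforms(
    mode='train', crop_size=[224, 224], random_horizontal_flip=False)
# Resulting pipeline: RandomCrop -> Normalize (no RandomHorizontalFlip)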
@@ -160,7 +160,9 @@ class Compose(DetTransform):
         transform_names = [type(x).__name__ for x in self.transforms]
         for aug in augmenters:
             if type(aug).__name__ in transform_names:
-                logging.error("{} is already in ComposedTransforms, need to remove it from add_augmenters().".format(type(aug).__name__))
+                logging.error(
+                    "{} is already in ComposedTransforms, need to remove it from add_augmenters().".
+                    format(type(aug).__name__))
         self.transforms = augmenters + self.transforms
@@ -220,15 +222,13 @@ class ResizeByShort(DetTransform):
         im_short_size = min(im.shape[0], im.shape[1])
         im_long_size = max(im.shape[0], im.shape[1])
         scale = float(self.short_size) / im_short_size
-        if self.max_size > 0 and np.round(scale *
-                                          im_long_size) > self.max_size:
+        if self.max_size > 0 and np.round(scale * im_long_size) > self.max_size:
             scale = float(self.max_size) / float(im_long_size)
         resized_width = int(round(im.shape[1] * scale))
         resized_height = int(round(im.shape[0] * scale))
         im_resize_info = [resized_height, resized_width, scale]
         im = cv2.resize(
-            im, (resized_width, resized_height),
-            interpolation=cv2.INTER_LINEAR)
+            im, (resized_width, resized_height), interpolation=cv2.INTER_LINEAR)
         im_info['im_resize_info'] = np.array(im_resize_info).astype(np.float32)
         if label_info is None:
             return (im, im_info)
@@ -268,8 +268,7 @@ class Padding(DetTransform):
         if not isinstance(target_size, tuple) and not isinstance(
                 target_size, list):
             raise TypeError(
-                "Padding: Type of target_size must in (int|list|tuple)."
-            )
+                "Padding: Type of target_size must in (int|list|tuple).")
         elif len(target_size) != 2:
             raise ValueError(
                 "Padding: Length of target_size must equal 2.")
@@ -454,8 +453,7 @@ class RandomHorizontalFlip(DetTransform):
             ValueError: mismatched data lengths.
         """
         if not isinstance(im, np.ndarray):
-            raise TypeError(
-                "RandomHorizontalFlip: image is not a numpy array.")
+            raise TypeError("RandomHorizontalFlip: image is not a numpy array.")
         if len(im.shape) != 3:
             raise ValueError(
                 "RandomHorizontalFlip: image is not 3-dimensional.")
@@ -736,7 +734,7 @@ class MixupImage(DetTransform):
             gt_poly2 = im_info['mixup'][2]['gt_poly']
             is_crowd1 = label_info['is_crowd']
             is_crowd2 = im_info['mixup'][2]['is_crowd']
             if 0 not in gt_class1 and 0 not in gt_class2:
                 gt_bbox = np.concatenate((gt_bbox1, gt_bbox2), axis=0)
                 gt_class = np.concatenate((gt_class1, gt_class2), axis=0)
@@ -785,9 +783,7 @@ class RandomExpand(DetTransform):
         fill_value (list): initial fill value for the expanded image (0-255). Defaults to [123.675, 116.28, 103.53].
     """
-    def __init__(self,
-                 ratio=4.,
-                 prob=0.5,
+    def __init__(self, ratio=4., prob=0.5,
                  fill_value=[123.675, 116.28, 103.53]):
         super(RandomExpand, self).__init__()
         assert ratio > 1.01, "expand ratio must be larger than 1.01"
@@ -1281,21 +1277,25 @@ class ComposedRCNNTransforms(Compose):
         min_max_size(list): constraints on the shortest and longest image edges when resizing
         mean(list): image mean
         std(list): image standard deviation
+        random_horizontal_flip(bool): whether to apply random horizontal flip augmentation with probability 0.5; only takes effect when mode is 'train'. Defaults to True
     """
     def __init__(self,
                  mode,
                  min_max_size=[800, 1333],
                  mean=[0.485, 0.456, 0.406],
-                 std=[0.229, 0.224, 0.225]):
+                 std=[0.229, 0.224, 0.225],
+                 random_horizontal_flip=True):
         if mode == 'train':
             # transforms for training, with data augmentation
             transforms = [
-                RandomHorizontalFlip(prob=0.5), Normalize(
-                    mean=mean, std=std), ResizeByShort(
+                Normalize(
+                    mean=mean, std=std), ResizeByShort(
                     short_size=min_max_size[0], max_size=min_max_size[1]),
                 Padding(coarsest_stride=32)
             ]
+            if random_horizontal_flip:
+                transforms.insert(0, RandomHorizontalFlip())
         else:
             # transforms for evaluation/prediction
             transforms = [
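The rest of the eval branch is folded out of the hunk above. A hypothetical call using the new RCNN switch:

from paddlex.det import transforms

train_transforms = transforms.ComposedRCNNTransforms(
    mode='train', min_max_size=[800, 1333], random_horizontal_flip=False)
# Train pipeline: Normalize -> ResizeByShort -> Padding, with no flip step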
@@ -1325,9 +1325,14 @@ class ComposedYOLOv3Transforms(Compose):
     Args:
         mode(str): stage of the image-processing pipeline: training/evaluation/prediction, i.e. 'train', 'eval', 'test'
         shape(list): size of images fed to the model; input images are resized to this size
-        mixup_epoch(int): mixup is applied during the first mixup_epoch epochs of training
+        mixup_epoch(int): mixup is applied during the first mixup_epoch epochs of training; set -1 to disable it
         mean(list): image mean
         std(list): image standard deviation
+        random_distort(bool): augmentation switch, only effective when mode is 'train': randomly distort image colors during training. Defaults to True
+        random_expand(bool): augmentation switch, only effective when mode is 'train': randomly expand images during training. Defaults to True
+        random_crop(bool): augmentation switch, only effective when mode is 'train': randomly crop images during training. Defaults to True
+        random_horizontal_flip(bool): augmentation switch, only effective when mode is 'train': randomly flip images horizontally during training. Defaults to True
     """
     def __init__(self,
@@ -1335,7 +1340,11 @@ class ComposedYOLOv3Transforms(Compose):
                  shape=[608, 608],
                  mixup_epoch=250,
                  mean=[0.485, 0.456, 0.406],
-                 std=[0.229, 0.224, 0.225]):
+                 std=[0.229, 0.224, 0.225],
+                 random_distort=True,
+                 random_expand=True,
+                 random_crop=True,
+                 random_horizontal_flip=True):
         width = shape
         if isinstance(shape, list):
             if shape[0] != shape[1]:
@@ -1350,12 +1359,18 @@ class ComposedYOLOv3Transforms(Compose):
         if mode == 'train':
             # transforms for training, with data augmentation
             transforms = [
-                MixupImage(mixup_epoch=mixup_epoch), RandomDistort(),
-                RandomExpand(), RandomCrop(), Resize(
-                    target_size=width,
-                    interp='RANDOM'), RandomHorizontalFlip(), Normalize(
+                MixupImage(mixup_epoch=mixup_epoch), Resize(
+                    target_size=width, interp='RANDOM'), Normalize(
                     mean=mean, std=std)
             ]
+            if random_horizontal_flip:
+                transforms.insert(1, RandomHorizontalFlip())
+            if random_crop:
+                transforms.insert(1, RandomCrop())
+            if random_expand:
+                transforms.insert(1, RandomExpand())
+            if random_distort:
+                transforms.insert(1, RandomDistort())
         else:
             # transforms for evaluation/prediction
             transforms = [
...
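Because the optional augmenters are all inserted at index 1 in reverse order, leaving every switch at True yields the pipeline MixupImage -> RandomDistort -> RandomExpand -> RandomCrop -> RandomHorizontalFlip -> Resize -> Normalize. A hypothetical call that drops two of the steps:

from paddlex.det import transforms

train_transforms = transforms.ComposedYOLOv3Transforms(
    mode='train', shape=[608, 608],
    mixup_epoch=-1,          # -1 disables the mixup strategy entirely
    random_distort=False,    # drop the color-distortion step
    random_crop=True, random_expand=True, random_horizontal_flip=True)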
@@ -116,7 +116,9 @@ class Compose(SegTransform):
         transform_names = [type(x).__name__ for x in self.transforms]
         for aug in augmenters:
             if type(aug).__name__ in transform_names:
-                logging.error("{} is already in ComposedTransforms, need to remove it from add_augmenters().".format(type(aug).__name__))
+                logging.error(
+                    "{} is already in ComposedTransforms, need to remove it from add_augmenters().".
+                    format(type(aug).__name__))
         self.transforms = augmenters + self.transforms
@@ -401,8 +403,7 @@ class ResizeByShort(SegTransform):
         im_short_size = min(im.shape[0], im.shape[1])
         im_long_size = max(im.shape[0], im.shape[1])
         scale = float(self.short_size) / im_short_size
-        if self.max_size > 0 and np.round(scale *
-                                          im_long_size) > self.max_size:
+        if self.max_size > 0 and np.round(scale * im_long_size) > self.max_size:
             scale = float(self.max_size) / float(im_long_size)
         resized_width = int(round(im.shape[1] * scale))
         resized_height = int(round(im.shape[0] * scale))
@@ -1113,25 +1114,35 @@ class ComposedSegTransforms(Compose):
     Args:
         mode(str): stage of the image-processing pipeline: training/evaluation/prediction, i.e. 'train', 'eval', 'test'
-        train_crop_size(list): size of the random crop taken from the original image during training
+        min_max_size(list): during training the long image edge is randomly resized into this range (the short edge scales proportionally); during prediction the long edge is resized to the midpoint of the range, i.e. (min_size+max_size)/2. Defaults to [400, 600]
+        train_crop_size(list): only effective when mode is 'train': a sub-image of this size is randomly cropped during training (images smaller than this size are padded up to it first). Defaults to [512, 512]
         mean(list): image mean
         std(list): image standard deviation
+        random_horizontal_flip(bool): augmentation switch, only effective when mode is 'train': whether to randomly flip images horizontally during training. Defaults to True
     """
     def __init__(self,
                  mode,
-                 train_crop_size=[769, 769],
+                 min_max_size=[400, 600],
+                 train_crop_size=[512, 512],
                  mean=[0.5, 0.5, 0.5],
-                 std=[0.5, 0.5, 0.5]):
+                 std=[0.5, 0.5, 0.5],
+                 random_horizontal_flip=True):
         if mode == 'train':
             # transforms for training, with data augmentation
             transforms = [
-                RandomHorizontalFlip(prob=0.5), ResizeStepScaling(),
+                ResizeRangeScaling(
+                    min_value=min(min_max_size), max_value=max(min_max_size)),
                 RandomPaddingCrop(crop_size=train_crop_size), Normalize(
                     mean=mean, std=std)
             ]
+            if random_horizontal_flip:
+                transforms.insert(0, RandomHorizontalFlip())
         else:
             # transforms for evaluation/prediction
-            transforms = [Normalize(mean=mean, std=std)]
+            long_size = (min(min_max_size) + max(min_max_size)) // 2
+            transforms = [
+                ResizeByLong(long_size=long_size), Normalize(
+                    mean=mean, std=std)
+            ]
         super(ComposedSegTransforms, self).__init__(transforms)
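A hypothetical pair of pipelines under the new signature; note the eval branch resizes the long edge to the midpoint of min_max_size, here (400 + 600) // 2 = 500:

from paddlex.seg import transforms

train_transforms = transforms.ComposedSegTransforms(
    mode='train', min_max_size=[400, 600], train_crop_size=[512, 512])
eval_transforms = transforms.ComposedSegTransforms(
    mode='eval', min_max_size=[400, 600])  # ResizeByLong(500) -> Normalize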
...
@@ -70,8 +70,10 @@ def normlime(img_file,
              normlime_weights_file=None):
     """Visualize the interpretability of model predictions with the NormLIME algorithm.
-    NormLIME uses a number of samples to produce a global explanation. It precomputes the LIME results of
-    a number of test samples, then normalizes the weights of identical features to obtain a global input-output relation.
+    NormLIME uses a number of samples to produce a global explanation. Since NormLIME is computationally
+    expensive, a simplified scheme is used here: features are extracted from a number of test samples
+    (currently all test samples by default) and mapped into a common feature space; a linear regression is
+    then fitted with these features as inputs and the model outputs as targets, giving a global input-output
+    relation. When explaining a test sample, this global explanation is used to filter the LIME result,
+    making the final visualization more stable.
     Note 1: dataset should not be too large, otherwise computation will be slow, but it should cover all classes.
     Note 2: NormLIME interpretability visualization currently supports classification models only.
...
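A schematic sketch (not the library's implementation) of the simplified scheme the docstring describes: fit one global linear relation from sample features to model outputs, then use its weights to filter a single sample's LIME result; all data here is synthetic:

import numpy as np

def global_weights(features, outputs):
    # Least-squares fit: one global weight per feature dimension.
    w, *_ = np.linalg.lstsq(features, outputs, rcond=None)
    return w

def filter_lime(lime_weights, g):
    # Damp per-sample LIME weights by the global relation for stability.
    return lime_weights * g

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 8))                              # assumed features
y = X @ rng.normal(size=8) + 0.01 * rng.normal(size=100)   # assumed outputs
stable = filter_lime(rng.normal(size=8), global_weights(X, y))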
...
@@ -15,11 +15,11 @@
 import setuptools
 import sys
-long_description = "PaddleX. A end-to-end deeplearning model development toolkit base on PaddlePaddle\n\n"
+long_description = "PaddlePaddle Entire Process Development Toolkit"
 setuptools.setup(
     name="paddlex",
-    version='1.0.6',
+    version='1.0.7',
     author="paddlex",
     author_email="paddlex@baidu.com",
     description=long_description,
...
@@ -35,7 +35,7 @@ eval_dataset = pdx.datasets.SegDataset(
 # Open https://0.0.0.0:8001 in a browser
 # 0.0.0.0 means local access; for a remote service, replace it with that machine's IP
-# https://paddlex.readthedocs.io/zh_CN/latest/apis/models/semantic_segmentation.html#hrnet
+# https://paddlex.readthedocs.io/zh_CN/latest/apis/models/semantic_segmentation.html#fastscnn
 num_classes = len(train_dataset.labels)
 model = pdx.seg.FastSCNN(num_classes=num_classes)
 model.train(
...