diff --git a/README.md b/README.md index 4105c32ad8ec0462b16ea365937bf348bc45b903..644769e6dce6a7133e1532379ce36a8a78f569f4 100644 --- a/README.md +++ b/README.md @@ -6,6 +6,7 @@ [![Version](https://img.shields.io/github/release/PaddlePaddle/PaddleX.svg)](https://github.com/PaddlePaddle/PaddleX/releases) ![python version](https://img.shields.io/badge/python-3.6+-orange.svg) ![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg) +![QQGroup](https://img.shields.io/badge/QQ_Group-1045148026-52B6EF?style=social&logo=tencent-qq&logoColor=000&logoWidth=20) PaddleX是基于飞桨核心框架、开发套件和工具组件的深度学习全流程开发工具。具备**全流程打通**、**融合产业实践**、**易用易集成**三大特点。 diff --git a/docs/FAQ.md b/docs/FAQ.md index b120ebd10ed791c65c3f65e611c5b45da2a9211f..e25faab5ad9e230f34f1790db0dcf24fba3328e6 100755 --- a/docs/FAQ.md +++ b/docs/FAQ.md @@ -13,7 +13,7 @@ > 可以使用模型裁剪,参考文档[模型裁剪使用教程](slim/prune.md),通过调整裁剪参数,可以控制模型裁剪后的大小,在实际实验中,如VOC检测数据,使用yolov3-mobilenet,原模型大小为XXM,裁剪后为XX M,精度基本保持不变 ## 4. 如何配置训练时GPU的卡数 -> 通过在终端export环境变量,或在Python代码中设置,可参考文档[CPU/多卡GPU训练](gpu_configure.md) +> 通过在终端export环境变量,或在Python代码中设置,可参考文档[CPU/多卡GPU训练](appendix/gpu_configure.md) ## 5. 想将之前训练的模型参数上继续训练 > 在训练调用`train`接口时,将`pretrain_weights`设为之前的模型保存路径即可 @@ -52,7 +52,7 @@ > 1. 用户自行训练时,如不确定迭代的轮数,可以将轮数设高一些,同时注意设置`save_interval_epochs`,这样模型迭代每间隔相应轮数就会在验证集上进行评估和保存,可以根据不同轮数模型在验证集上的评估指标,判断模型是否已经收敛,若模型已收敛,可以自行结束训练进程 > ## 9. 只有CPU,没有GPU,如何提升训练速度 -> 当没有GPU时,可以根据自己的CPU配置,选择是否使用多CPU进行训练,具体配置方式可以参考文档[多卡CPU/GPU训练](gpu_configure.md) +> 当没有GPU时,可以根据自己的CPU配置,选择是否使用多CPU进行训练,具体配置方式可以参考文档[多卡CPU/GPU训练](appendix/gpu_configure.md) > ## 10. 电脑不能联网,训练时因为下载预训练模型失败,如何解决 > 可以预先通过其它方式准备好预训练模型,然后训练时自定义`pretrain_weights`即可,可参考文档[无联网模型训练](how_to_offline_run.md) @@ -61,8 +61,8 @@ > 1.可以按照9的方式来解决这个问题 > 2.每次训练前都设定`paddlex.pretrain_dir`路径,如设定`paddlex.pretrain_dir='/usrname/paddlex`,如此下载完的预训练模型会存放至`/usrname/paddlex`目录下,而已经下载在该目录的模型也不会再次重复下载 -## 12. 程序启动时提示"Failed to execute script PaddleX",如何解决? +## 12. PaddleX GUI启动时提示"Failed to execute script PaddleX",如何解决? > 1. 请检查目标机器上PaddleX程序所在路径是否包含中文。目前暂不支持中文路径,请尝试将程序移动到英文目录。 > 2. 如果您的系统是Windows 7或者Windows Server 2012时,原因是缺少MFPlat.DLL/MF.dll/MFReadWrite.dll等OpenCV依赖的DLL,请按如下方式安装桌面体验:通过“我的电脑”-->“属性”-->"管理"打开服务器管理器,点击右上角“管理”选择“添加角色和功能”。点击“服务器选择”-->“功能”,拖动滚动条到最下端,点开“用户界面和基础结构”,勾选“桌面体验”后点击“安装”,等安装完成尝试再次运行PaddleX。 > 3. 请检查目标机器上是否有其他的PaddleX程序或者进程在运行中,如有请退出或者重启机器看是否解决 -> 4. 请确认运行程序的用户是否有管理员权限,如非管理员权限用户请尝试使用管理员运行看是否成功 \ No newline at end of file +> 4. 
请确认运行程序的用户是否有管理员权限,如非管理员权限用户请尝试使用管理员运行看是否成功
diff --git a/docs/apis/models/classification.md b/docs/apis/models/classification.md
index 82b459d8281b1e9bc9d1f7abdd48fddb16473c21..b70b555a7007b77851af22ddd4a775a4b3a8f93b 100755
--- a/docs/apis/models/classification.md
+++ b/docs/apis/models/classification.md
@@ -80,7 +80,7 @@ predict(self, img_file, transforms=None, topk=5)

## 其它分类器类

-PaddleX提供了共计22种分类器,所有分类器均提供同`ResNet50`相同的训练`train`,评估`evaluate`和预测`predict`接口,各模型效果可参考[模型库](../appendix/model_zoo.md)。
+PaddleX提供了共计22种分类器,所有分类器均提供同`ResNet50`相同的训练`train`,评估`evaluate`和预测`predict`接口,各模型效果可参考[模型库](https://paddlex.readthedocs.io/zh_CN/latest/appendix/model_zoo.html)。

### ResNet18
```python
diff --git a/docs/apis/models/semantic_segmentation.md b/docs/apis/models/semantic_segmentation.md
index 26a695a9564f6929ff586eaa179242b99b5466de..3ff66337fe64b35f29a2a7985cea040fcb233d82 100755
--- a/docs/apis/models/semantic_segmentation.md
+++ b/docs/apis/models/semantic_segmentation.md
@@ -186,10 +186,10 @@ paddlex.seg.HRNet(num_classes=2, width=18, use_bce_loss=False, use_dice_loss=Fal
> **参数**

> > - **num_classes** (int): 类别数。
-> > - **width** (int): 高分辨率分支中特征层的通道数量。默认值为18。可选择取值为[18, 30, 32, 40, 44, 48, 60, 64]。
+> > - **width** (int|str): 高分辨率分支中特征层的通道数量。默认值为18。可选择取值为[18, 30, 32, 40, 44, 48, 60, 64, '18_small_v1']。'18_small_v1'是18的轻量级版本。
> > - **use_bce_loss** (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。默认False。
> > - **use_dice_loss** (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用。当use_bce_loss和use_dice_loss都为False时,使用交叉熵损失函数。默认False。
-> > - **class_weight** (list/str): 交叉熵损失函数各类损失的权重。当`class_weight`为list的时候,长度应为`num_classes`。当`class_weight`为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None是,各类的权重1,即平时使用的交叉熵损失函数。
+> > - **class_weight** (list|str): 交叉熵损失函数各类损失的权重。当`class_weight`为list的时候,长度应为`num_classes`。当`class_weight`为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None时,各类的权重为1,即平时使用的交叉熵损失函数。
> > - **ignore_index** (int): label上忽略的值,label为`ignore_index`的像素不参与损失函数的计算。默认255。
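+
+例如,构建`width`为'18_small_v1'的轻量级HRNet分割模型(示意代码,类别数请按实际数据集设置):
+
+```python
+import paddlex as pdx
+
+# '18_small_v1'为18的轻量级版本,适用于移动端等对预测速度要求较高的场景
+model = pdx.seg.HRNet(num_classes=2, width='18_small_v1')
+```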

### train 训练接口
diff --git a/docs/apis/transforms/seg_transforms.md b/docs/apis/transforms/seg_transforms.md
index 1fb2b561e4818edad72fd97f43029de079b355b3..264af5c472cb824865188a5386a513e5a00fe0ba 100755
--- a/docs/apis/transforms/seg_transforms.md
+++ b/docs/apis/transforms/seg_transforms.md
@@ -200,7 +200,7 @@ ComposedSegTransforms.add_augmenters(augmenters)
import paddlex as pdx
from paddlex.seg import transforms
train_transforms = transforms.ComposedSegTransforms(mode='train', train_crop_size=[512, 512])
-eval_transforms = transforms.ComposedYOLOTransforms(mode='eval')
+eval_transforms = transforms.ComposedSegTransforms(mode='eval')

# 添加数据增强
import imgaug.augmenters as iaa
diff --git a/docs/apis/visualize.md b/docs/apis/visualize.md
index 069913274580f1e8bd5fdb5ee6e6e642c977b3ce..8fe45d4abb82c01c859f0be60bf6f52706eb4e52 100755
--- a/docs/apis/visualize.md
+++ b/docs/apis/visualize.md
@@ -146,10 +146,11 @@ paddlex.interpret.normlime(img_file,
                           dataset=None,
                           num_samples=3000,
                           batch_size=50,
-                          save_dir='./')
+                          save_dir='./',
+                          normlime_weights_file=None)
```
使用NormLIME算法将模型预测结果的可解释性可视化。
-NormLIME是利用一定数量的样本来出一个全局的解释。NormLIME会提前计算一定数量的测试样本的LIME结果,然后对相同的特征进行权重的归一化,这样来得到一个全局的输入和输出的关系。
+NormLIME是利用一定数量的样本来得出一个全局的解释。由于NormLIME计算量较大,此处采用一种简化的方式:使用一定数量的测试样本(目前默认使用所有测试样本),对每个样本进行特征提取,映射到同一个特征空间;然后以此特征作为输入,以模型输出作为输出,使用线性回归对其进行拟合,得到一个全局的输入和输出的关系。之后,对某一测试样本进行解释时,使用NormLIME全局的解释,来对LIME的结果进行滤波,使最终的可视化结果更加稳定。

**注意:** 可解释性结果可视化目前只支持分类模型。

@@ -159,9 +160,10 @@ NormLIME是利用一定数量的样本来出一个全局的解释。NormLIME会
>* **dataset** (paddlex.datasets): 数据集读取器,默认为None。
>* **num_samples** (int): LIME用于学习线性模型的采样数,默认为3000。
>* **batch_size** (int): 预测数据batch大小,默认为50。
->* **save_dir** (str): 可解释性可视化结果(保存为png格式文件)和中间文件存储路径。
+>* **save_dir** (str): 可解释性可视化结果(保存为png格式文件)和中间文件存储路径。
+>* **normlime_weights_file** (str): NormLIME初始化文件名,若不存在,则计算一次,保存于该路径;若存在,则直接载入。

-**注意:** dataset`读取的是一个数据集,该数据集不宜过大,否则计算时间会较长,但应包含所有类别的数据。
+**注意:** `dataset`读取的是一个数据集,该数据集不宜过大,否则计算时间会较长,但应包含所有类别的数据。NormLIME可解释性结果可视化目前只支持分类模型。

### 使用示例
> 对预测可解释性结果可视化的过程可参见[代码](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/interpret/normlime.py)。
diff --git a/docs/appendix/index.rst b/docs/appendix/index.rst
index c402384ebc307713ed87055dc86cab58dcf33bbe..814a611948a451a76d73fd0aa9276f40db2c28b9 100755
--- a/docs/appendix/index.rst
+++ b/docs/appendix/index.rst
@@ -7,6 +7,7 @@
   :caption: 目录:

   model_zoo.md
+  slim_model_zoo.md
   metrics.md
   interpret.md
   parameters.md
diff --git a/docs/appendix/interpret.md b/docs/appendix/interpret.md
index 886620df2fa98c03abda4717dea627277715b2d9..43ecd48e23810c2e3ed3cd1652bf06b6e1fc04f7 100644
--- a/docs/appendix/interpret.md
+++ b/docs/appendix/interpret.md
@@ -20,9 +20,20 @@ LIME的使用方式可参见[代码示例](https://github.com/PaddlePaddle/Paddl
## NormLIME
NormLIME是在LIME上的改进,LIME的解释是局部性的,是针对当前样本给的特定解释,而NormLIME是利用一定数量的样本对当前样本的一个全局性的解释,有一定的降噪效果。其实现步骤如下所示:
1. 下载Kmeans模型参数和ResNet50_vc网络前三层参数。(ResNet50_vc的参数是在ImageNet上训练所得网络的参数;使用ImageNet图像作为数据集,每张图像从ResNet50_vc的第三层输出提取对应超象素位置上的平均特征和质心上的特征,训练将得到此处的Kmeans模型)
-2. 计算测试集中每张图像的LIME结果。(如无测试集,可用验证集代替)
-3. 使用Kmeans模型对所有图像中的所有像素进行聚类。
-4. 对在同一个簇的超像素(相同的特征)进行权重的归一化,得到每个超像素的权重,以此来解释模型。
+2. 使用测试集中的数据计算NormLIME的权重信息(如无测试集,可用验证集代替):
+   对每张图像的处理:
+   (1) 获取图像的超像素。
+   (2) 使用ResNet50_vc获取第三层特征,针对每个超像素位置,组合质心特征和均值特征`F`。
+   (3) 把`F`作为Kmeans模型的输入,计算每个超像素位置的聚类中心。
+   (4) 使用训练好的分类模型,预测该张图像的`label`。
+   对所有图像的处理(权重的计算可参考下方示意代码):
+   (1) 以每张图像的聚类中心信息组成的向量(若某聚类中心出现在该张图中则设置为1,反之为0)为输入,
+   预测的`label`为输出,构建逻辑回归函数`regression_func`。
+   (2) 由`regression_func`可获得每个聚类中心不同类别下的权重,并对权重进行归一化。
+3. 使用Kmeans模型获取需要可视化图像的每个超像素的聚类中心。
+4. 对需要可视化的图像的超像素进行随机遮掩构成新的图像。
+5. 对每张构造的图像使用预测模型预测label。
+6. 根据NormLIME的权重信息,每个超像素可获得不同的权重,选取最高的权重为最终的权重,以此来解释模型。
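+
+上述步骤2中"对所有图像的处理"可用如下简化代码示意(仅为帮助理解的示意实现,以sklearn的逻辑回归为假设依赖,与PaddleX内部实现并不完全一致):
+
+```python
+import numpy as np
+from sklearn.linear_model import LogisticRegression
+
+def compute_normlime_weights(cluster_ids_per_image, labels, num_clusters):
+    """cluster_ids_per_image: 每张图像全部超像素对应的聚类中心id列表
+    labels: 分类模型对每张图像预测出的类别"""
+    # 每张图像表示为聚类中心的0/1向量:该聚类中心在图中出现记为1
+    x = np.zeros((len(cluster_ids_per_image), num_clusters), dtype='float32')
+    for i, ids in enumerate(cluster_ids_per_image):
+        x[i, list(set(ids))] = 1.0
+    # 以0/1向量为输入、预测类别为输出,拟合逻辑回归
+    reg = LogisticRegression(max_iter=1000).fit(x, labels)
+    # 取出每个聚类中心在各类别下的权重并做归一化
+    weights = np.abs(reg.coef_)
+    return weights / (weights.sum(axis=1, keepdims=True) + 1e-8)
+```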

NormLIME的使用方式可参见[代码示例](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/interpret/normlime.py)和[api介绍](../apis/visualize.html#normlime)。在使用时,参数中的`num_samples`设置尤为重要,其表示上述步骤2中的随机采样的个数,若设置过小会影响可解释性结果的稳定性,若设置过大则将在上述步骤3耗费较长时间;参数`batch_size`则表示在计算上述步骤3时,预测的batch size,若设置过小将在上述步骤3耗费较长时间,而上限则根据机器配置决定;而`dataset`则是由测试集或验证集构造的数据。
diff --git a/docs/appendix/model_zoo.md b/docs/appendix/model_zoo.md
index 200847bc95aec5872879c3fbbe49b6f2ed0c741e..f866b39173ead1c162e9e3ee722ae2ea2cb2afb3 100644
--- a/docs/appendix/model_zoo.md
+++ b/docs/appendix/model_zoo.md
@@ -40,8 +40,8 @@
|[FasterRCNN-ResNet101](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_1x.tar)| 212.5MB | 582.911 | 38.3 |
|[FasterRCNN-ResNet50-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_fpn_1x.tar)| 167.7MB | 83.189 | 37.2 |
|[FasterRCNN-ResNet50_vd-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_2x.tar)|167.8MB | 128.277 | 38.9 |
-|[FasterRCNN-ResNet101-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_fpn_1x.tar)| 244.2MB | 156.097 | 38.7 |
-|[FasterRCNN-ResNet101_vd-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_vd_fpn_2x.tar) |244.3MB | 119.788 | 40.5 |
+|[FasterRCNN-ResNet101-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_fpn_1x.tar)| 244.2MB | 119.788 | 38.7 |
+|[FasterRCNN-ResNet101_vd-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_vd_fpn_2x.tar) |244.3MB | 156.097 | 40.5 |
|[FasterRCNN-HRNet_W18-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_hrnetv2p_w18_1x.tar) |115.5MB | 81.592 | 36 |
|[YOLOv3-DarkNet53](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_darknet.tar)|249.2MB | 42.672 | 38.9 |
|[YOLOv3-MobileNetV1](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1.tar) |99.2MB | 15.442 | 29.3 |
diff --git a/docs/appendix/slim_model_zoo.md b/docs/appendix/slim_model_zoo.md
new file mode 100644
index 0000000000000000000000000000000000000000..a594d53dd7a777288571ccae6fad5ec21415de36
--- /dev/null
+++ b/docs/appendix/slim_model_zoo.md
@@ -0,0 +1,121 @@
+# PaddleX压缩模型库
+
+## 图像分类
+
+数据集:ImageNet-1000
+
+### 量化
+
+| 模型 | 压缩策略 | Top-1准确率 | 存储体积 | TensorRT时延(V100, ms) |
+|:--:|:---:|:--:|:--:|:--:|
+|MobileNetV1| 无 |70.99%| 17MB | -|
+|MobileNetV1| 量化 |70.18% (-0.81%)| 4.4MB | - |
+| MobileNetV2 | 无 |72.15%| 15MB | - |
+| MobileNetV2 | 量化 | 71.15% (-1%)| 4.0MB | - |
+|ResNet50| 无 |76.50%| 99MB | 2.71 |
+|ResNet50| 量化 |76.33% (-0.17%)| 25.1MB | 1.19 |
+
+分类模型Lite时延(ms)
+
+| 设备 | 模型类型 | 压缩策略 | armv7 Thread 1 | armv7 Thread 2 | armv7 Thread 4 | armv8 Thread 1 | armv8 Thread 2 | armv8 Thread 4 |
+| ------- | ----------- | ------------- | -------------- | -------------- | -------------- | -------------- | -------------- | -------------- |
+| 高通835 | MobileNetV1 | 无 | 96.1942 | 53.2058 | 32.4468 | 88.4955 | 47.95 | 27.5189 |
+| 高通835 | MobileNetV1 | 量化 | 60.5615 | 32.4016 | 16.6596 | 56.5266 | 29.7178 | 15.1459 |
+| 高通835 | MobileNetV2 | 无 | 65.715 | 38.1346 | 25.155 | 61.3593 | 36.2038 | 22.849 |
+| 高通835 | MobileNetV2 | 量化 | 48.3495 | 30.3069 | 22.1506 | 45.8715 | 27.4105 | 18.2223 |
+| 高通835 | ResNet50 | 无 | 526.811 | 319.6486 | 205.8345 | 506.1138 | 335.1584 | 214.8936 |
+| 高通835 | ResNet50 | 量化 | 476.0507 | 256.5963 | 139.7266 | 461.9176 | 248.3795 | 149.353 |
+| 高通855 | MobileNetV1 | 无 | 33.5086 | 19.5773 | 11.7534 | 31.3474 | 18.5382 | 10.0811 |
+| 高通855 | MobileNetV1 | 量化 | 37.0498 | 21.7081 | 11.0779 | 14.0947 | 8.1926 | 4.2934 |
+| 高通855 | MobileNetV2 | 无 | 25.0396 | 15.2862 | 9.6609 | 22.909 | 14.1797 | 8.8325 |
+| 高通855 | MobileNetV2 | 量化 | 28.1631 | 18.3917 | 11.8333 | 16.9399 | 11.1772 | 7.4176 |
+| 高通855 | ResNet50 | 无 | 185.3705 | 113.0825 | 87.0741 | 177.7367 | 110.0433 | 74.4114 |
+| 高通855 | ResNet50 | 量化 | 328.2683 | 201.9937 | 106.744 | 242.6397 | 150.0338 | 79.8659 |
+| 麒麟970 | MobileNetV1 | 无 | 101.2455 | 56.4053 | 35.6484 | 94.8985 | 51.7251 | 31.9511 |
+| 麒麟970 | MobileNetV1 | 量化 | 62.4412 | 32.2585 | 16.6215 | 57.825 | 29.2573 | 15.1206 |
+| 麒麟970 | MobileNetV2 | 无 | 70.4176 | 42.0795 | 25.1939 | 68.9597 | 39.2145 | 22.6617 |
+| 麒麟970 | MobileNetV2 | 量化 | 53.0961 | 31.7987 | 21.8334 | 49.383 | 28.2358 | 18.3642 |
+| 麒麟970 | ResNet50 | 无 | 586.8943 | 344.0858 | 228.2293 | 573.3344 | 351.4332 | 225.8006 |
+| 麒麟970 | ResNet50 | 量化 | 489.6188 | 258.3279 | 142.6063 | 480.0064 | 249.5339 | 138.5284 |
+
+### 剪裁
+
+PaddleLite推理耗时说明:
+
+环境:Qualcomm SnapDragon 845 + armv8
+
+速度指标:Thread1/Thread2/Thread4耗时
+
+
+| 模型 | 压缩策略 | Top-1 | 存储体积 |PaddleLite推理耗时|TensorRT推理速度(FPS)|
+|:--:|:---:|:--:|:--:|:--:|:--:|
+| MobileNetV1 | 无 | 70.99% | 17MB | 66.052\35.8014\19.5762|-|
+| MobileNetV1 | 剪裁 -30% | 70.4% (-0.59%) | 12MB | 46.5958\25.3098\13.6982|-|
+| MobileNetV1 | 剪裁 -50% | 69.8% (-1.19%) | 9MB | 37.9892\20.7882\11.3144|-|
+
+## 目标检测
+
+### 量化
+
+数据集: COCO2017
+
+| 模型 | 压缩策略 | 数据集 | Image/GPU | 输入608 Box AP | 存储体积 | TensorRT时延(V100, ms) |
+| :----------------------------: | :---------: | :----: | :-------: | :------------: | :------------: | :----------: |
+| MobileNet-V1-YOLOv3 | 无 | COCO | 8 | 29.3 | 95MB | - |
+| MobileNet-V1-YOLOv3 | 量化 | COCO | 8 | 27.9 (-1.4)| 25MB | - |
+| R34-YOLOv3 | 无 | COCO | 8 | 36.2 | 162MB | - |
+| R34-YOLOv3 | 量化 | COCO | 8 | 35.7 (-0.5) | 42.7MB | - |
+
+### 剪裁
+
+数据集:Pascal VOC & COCO2017
+
+PaddleLite推理耗时说明:
+
+环境:Qualcomm SnapDragon 845 + armv8
+
+速度指标:Thread1/Thread2/Thread4耗时
+
+| 模型 | 压缩策略 | 数据集 | Image/GPU | 输入608 Box mmAP | 存储体积 | PaddleLite推理耗时(ms)(608*608) | TensorRT推理速度(FPS)(608*608) |
+| :----------------------------: | :---------------: | :--------: | :-------: | :------------: | :----------: | :--------------: | :--------------: |
+| MobileNet-V1-YOLOv3 | 无 | Pascal VOC | 8 | 76.2 | 94MB | 1238\796.943\520.101|60.04|
+| MobileNet-V1-YOLOv3 | 剪裁 -52.88% | Pascal VOC | 8 | 77.6 (+1.4) | 31MB | 602.497\353.759\222.427 |99.36|
+| MobileNet-V1-YOLOv3 | 无 | COCO | 8 | 29.3 | 95MB |-|-|
+| MobileNet-V1-YOLOv3 | 剪裁 -51.77% | COCO | 8 | 26.0 (-3.3) | 32MB |-|73.93|
+
+## 语义分割
+
+数据集:Cityscapes
+
+
+### 量化
+
+| 模型 | 压缩策略 | mIoU | 存储体积 |
+| :--------------------: | :---------: | :-----------: | :------------: |
+| DeepLabv3-MobileNetv2 | 无 | 69.81 | 7.4MB |
+| DeepLabv3-MobileNetv2 | 量化 | 67.59 (-2.22) | 2.1MB |
+
+图像分割模型Lite时延(ms), 输入尺寸769 x 769
+
+| 设备 | 模型类型 | 压缩策略 | armv7 Thread 1 | armv7 Thread 2 | armv7 Thread 4 | armv8 Thread 1 | armv8 Thread 2 | armv8 Thread 4 |
+| ------- | ---------------------- | ------------- | -------------- | -------------- | -------------- | -------------- | -------------- | -------------- |
+| 高通835 | Deeplabv3-MobileNetV2 | 无 | 1282.8126 | 793.2064 | 653.6538 | 1193.9908 | 737.1827 | 593.4522 |
+| 高通835 | Deeplabv3-MobileNetV2 | 量化 | 981.44 | 658.4969 | 538.6166 | 885.3273 | 586.1284 | 484.0018 |
+| 高通855 | Deeplabv3-MobileNetV2 | 无 | 639.4425 | 390.1851 | 322.7014 | 477.7667 | 339.7411 | 262.2847 |
+| 高通855 | Deeplabv3-MobileNetV2 | 量化 | 705.7589 | 474.4076 | 427.2951 | 394.8352 | 297.4035 | 264.6724 |
+| 麒麟970 | Deeplabv3-MobileNetV2 | 无 | 1771.1301 | 1746.0569 | 1222.4805 | 1448.9739 | 1192.4491 | 760.606 |
+| 麒麟970 | Deeplabv3-MobileNetV2 | 量化 | 1320.386 | 918.5328 | 672.2481 | 1020.753 | 820.094 | 591.4114 |
+
+### 剪裁
+
+PaddleLite推理耗时说明:
+
+环境:Qualcomm SnapDragon 845 + armv8
+
+速度指标:Thread1/Thread2/Thread4耗时
+
+
+| 模型 | 压缩方法 | mIoU | 存储体积 | PaddleLite推理耗时 | TensorRT推理速度(FPS) |
+| :-------: | :---------------: | :-----------: | :------: | :------------: | :----: |
+| FastSCNN | 无 | 69.64 | 11MB | 1226.36\682.96\415.664 |39.53|
+| FastSCNN | 剪裁 -47.60% | 66.68 (-2.96) | 5.7MB | 866.693\494.467\291.748 |51.48|
diff --git a/docs/cv_solutions.md b/docs/cv_solutions.md
index cb96c2d9e71ac6e98ee036364b8700ec9656411a..4d8482da94423ba5cc4f0695bf3f9669ef5f732a 100755
--- a/docs/cv_solutions.md
+++ b/docs/cv_solutions.md
@@ -1,63 +1,132 @@
# PaddleX视觉方案介绍

-PaddleX目前提供了4种视觉任务解决方案,分别为图像分类、目标检测、实例分割和语义分割。用户可以根据自己的任务类型按需选取。
+PaddleX针对图像分类、目标检测、实例分割和语义分割4种视觉任务提供了包含模型选择、压缩策略选择、部署方案选择在内的解决方案。用户根据自己的需求选择合适的模型,选择合适的压缩策略来减小模型的计算量和存储体积、加速模型预测推理,最后选择合适的部署方案将模型部署在移动端或者服务器端。

-## 图像分类
+## 模型选择
+
+### 图像分类

图像分类任务指的是输入一张图片,模型预测图片的类别,如识别为风景、动物、车等。

![](./images/image_classification.png)

-对于图像分类任务,针对不同的应用场景,PaddleX提供了百度改进的模型,见下表所示
+对于图像分类任务,针对不同的应用场景,PaddleX提供了百度改进的模型,见下表所示:
+> 表中GPU预测速度是使用PaddlePaddle Python预测接口测试得到(测试GPU型号为Nvidia Tesla P40)。
+> 表中CPU预测速度 (测试CPU型号为)。
+> 表中骁龙855预测速度是使用处理器为骁龙855的手机测试得到。
+> 测速时模型输入大小为224 x 224,Top1准确率为ImageNet-1000数据集上评估所得。

-| 模型 | 模型大小 | GPU预测速度 | CPU预测速度 | ARM芯片预测速度 | 准确率 | 备注 |
-| :--------- | :------ | :---------- | :-----------| :------------- | :----- | :--- |
-| MobileNetV3_small_ssld | 12M | - | - | - | 71.3% |适用于移动端场景 |
-| MobileNetV3_large_ssld | 21M | - | - | - | 79.0% | 适用于移动端/服务端场景 |
-| ResNet50_vd_ssld | 102.8MB | - | - | - | 82.4% | 适用于服务端场景 |
-| ResNet101_vd_ssld | 179.2MB | - | - | - |83.7% | 适用于服务端场景 |
+| 模型 | 模型特点 | 存储体积 | GPU预测速度(毫秒) | CPU(x86)预测速度(毫秒) | 骁龙855(ARM)预测速度 (毫秒)| Top1准确率 |
+| :--------- | :------ | :---------- | :-----------| :------------- | :------------- |:--- |
+| MobileNetV3_small_ssld | 轻量高速,适用于追求高速的实时移动端场景 | 12.5MB | 7.08837 | - | 6.546 | 71.3% |
+| ShuffleNetV2 | 轻量级模型,精度相对偏低,适用于要求更小存储体积的实时移动端场景 | 10.2MB | 15.40 | - | 10.941 | 68.8% |
+| MobileNetV3_large_ssld | 轻量级模型,在存储方面优势不大,在速度和精度上表现适中,适合于移动端场景 | 22.8MB | 8.06651 | - | 19.803 | 79.0% |
+| MobileNetV2 | 轻量级模型,适用于使用GPU预测的移动端场景 | 15.0MB | 5.92667 | - | 23.318| 72.2% |
+| ResNet50_vd_ssld | 高精度模型,预测时间较短,适用于大多数的服务器端场景 | 103.5MB | 7.79264 | - | - | 82.4% |
+| ResNet101_vd_ssld | 超高精度模型,预测时间相对较长,适用于有大数据量时的服务器端场景 | 180.5MB | 13.34580 | - | -| 83.7% |
+| Xception65 | 超高精度模型,预测时间更长,在处理较大数据量时有较高的精度,适用于服务器端场景 | 161.6MB | 13.87017 | - | - | 80.3% |

-除上述模型外,PaddleX还支持近20种图像分类模型,模型列表可参考[PaddleX模型库](../appendix/model_zoo.md)
+包括上述模型在内,PaddleX共支持近20种图像分类模型,其余模型可参考[PaddleX模型库](../appendix/model_zoo.md)
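+
+选定模型后,在PaddleX中构建对应的分类器仅需一行代码,如下为简单示意(`num_classes`请按实际数据集类别数设置):
+
+```python
+import paddlex as pdx
+
+# 以表中的轻量级模型MobileNetV3_small_ssld为例
+model = pdx.cls.MobileNetV3_small_ssld(num_classes=1000)
+```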

-## 目标检测
+### 目标检测

目标检测任务指的是输入图像,模型识别出图像中物体的位置(用矩形框框出来,并给出框的位置),和物体的类别,如在手机等零件质检中,用于检测外观上的瑕疵等。

![](./images/object_detection.png)

对于目标检测,针对不同的应用场景,PaddleX提供了主流的YOLOv3模型和Faster-RCNN模型,见下表所示
-
-| 模型 | 模型大小 | GPU预测速度 | CPU预测速度 |ARM芯片预测速度 | BoxMAP | 备注 |
-| :------- | :------- | :--------- | :---------- | :------------- | :----- | :--- |
-| YOLOv3-MobileNetV1 | 101.2M | - | - | - | 29.3 | |
-| YOLOv3-MobileNetV3 | 94.6M | - | - | - | 31.6 | |
-| YOLOv3-ResNet34 | 169.7M | - | - | - | 36.2 | |
-| YOLOv3-DarkNet53 | 252.4 | - | - | - | 38.9 | |
-
-除YOLOv3模型外,PaddleX同时也支持FasterRCNN模型,支持FPN结构和5种backbone网络,详情可参考[PaddleX模型库](../appendix/model_zoo.md)
-
-## 实例分割
+> 表中GPU预测速度是使用PaddlePaddle Python预测接口测试得到(测试GPU型号为Nvidia Tesla P40)。
+> 表中CPU预测速度 (测试CPU型号为)。
+> 表中骁龙855预测速度是使用处理器为骁龙855的手机测试得到。
+> 测速时YOLOv3的输入大小为608 x 608,FasterRCNN的输入大小为800 x 1333,Box mmAP为COCO2017数据集上评估所得。
+
+| 模型 | 模型特点 | 存储体积 | GPU预测速度 | CPU(x86)预测速度(毫秒) | 骁龙855(ARM)预测速度 (毫秒)| Box mmAP |
+| :------- | :------- | :--------- | :---------- | :------------- | :------------- |:--- |
+| YOLOv3-MobileNetV3_large | 适用于追求高速预测的移动端场景 | 100.7MB | 143.322 | - | - | 31.6 |
+| YOLOv3-MobileNetV1 | 精度相对偏低,适用于追求高速预测的服务器端场景 | 99.2MB| 15.422 | - | - | 29.3 |
+| YOLOv3-DarkNet53 | 在预测速度和模型精度上都有较好的表现,适用于大多数的服务器端场景| 249.2MB | 42.672 | - | - | 38.9 |
+| FasterRCNN-ResNet50-FPN | 经典的二阶段检测器,预测速度相对较慢,适用于重视模型精度的服务器端场景 | 167.7MB | 83.189 | - | -| 37.2 |
+| FasterRCNN-HRNet_W18-FPN | 适用于对图像分辨率较为敏感、对目标细节预测要求更高的服务器端场景 | 115.5MB | 81.592 | - | - | 36 |
+| FasterRCNN-ResNet101_vd-FPN | 超高精度模型,预测时间更长,在处理较大数据量时有较高的精度,适用于服务器端场景 | 244.3MB | 156.097 | - | - | 40.5 |
+
+除上述模型外,YOLOv3和Faster RCNN还支持其他backbone,详情可参考[PaddleX模型库](../appendix/model_zoo.md)
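+
+以表中模型为例,构建检测模型时只需指定对应的backbone,如下为简单示意(`num_classes`不含背景类,请按实际数据集设置):
+
+```python
+import paddlex as pdx
+
+# 对应表中的YOLOv3-DarkNet53
+model = pdx.det.YOLOv3(num_classes=20, backbone='DarkNet53')
+```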
+
+### 实例分割

在目标检测中,模型识别出图像中物体的位置和物体的类别。而实例分割则是在目标检测的基础上,做了像素级的分类,将框内的属于目标物体的像素识别出来。

![](./images/instance_segmentation.png)

PaddleX目前提供了实例分割MaskRCNN模型,支持5种不同的backbone网络,详情可参考[PaddleX模型库](../appendix/model_zoo.md)
-
-| 模型 | 模型大小 | GPU预测速度 | CPU预测速度 | ARM芯片预测速度 | BoxMAP | SegMAP | 备注 |
-| :---- | :------- | :---------- | :---------- | :------------- | :----- | :----- | :--- |
-| MaskRCNN-ResNet50_vd-FPN | 185.5M | - | - | - | 39.8 | 35.4 | |
-| MaskRCNN-ResNet101_vd-FPN | 268.6M | - | - | - | 41.4 | 36.8 | |
-
-
-## 语义分割
+> 表中GPU预测速度是使用PaddlePaddle Python预测接口测试得到(测试GPU型号为Nvidia Tesla P40)。
+> 表中CPU预测速度 (测试CPU型号为)。
+> 表中骁龙855预测速度是使用处理器为骁龙855的手机测试得到。
+> 测速时MaskRCNN的输入大小为800 x 1333,Box mmAP和Seg mmAP为COCO2017数据集上评估所得。
+
+| 模型 | 模型特点 | 存储体积 | GPU预测速度 | CPU(x86)预测速度(毫秒) | 骁龙855(ARM)预测速度 (毫秒)| Box mmAP | Seg mmAP |
+| :---- | :------- | :---------- | :---------- | :----- | :----- | :--- |:--- |
+| MaskRCNN-HRNet_W18-FPN | 适用于对图像分辨率较为敏感、对目标细节预测要求更高的服务器端场景 | - | - | - | - | 37.0 | 33.4 |
+| MaskRCNN-ResNet50-FPN | 精度较高,适合大多数的服务器端场景| 185.5M | - | - | - | 37.9 | 34.2 |
+| MaskRCNN-ResNet101_vd-FPN | 高精度但预测时间更长,在处理较大数据量时有较高的精度,适用于服务器端场景 | 268.6M | - | - | - | 41.4 | 36.8 |
+
+### 语义分割

语义分割用于对图像做像素级的分类,应用在人像分类、遥感图像识别等场景。

![](./images/semantic_segmentation.png)

对于语义分割,PaddleX也针对不同的应用场景,提供了不同的模型选择,如下表所示
+> 表中GPU预测速度是使用PaddlePaddle Python预测接口测试得到(测试GPU型号为Nvidia Tesla P40)。
+> 表中CPU预测速度 (测试CPU型号为)。
+> 表中骁龙855预测速度是使用处理器为骁龙855的手机测试得到。
+> 测速时模型的输入大小为1024 x 2048,mIOU为Cityscapes数据集上评估所得。
+
+| 模型 | 模型特点 | 存储体积 | GPU预测速度 | CPU(x86)预测速度(毫秒) | 骁龙855(ARM)预测速度 (毫秒)| mIOU |
+| :---- | :------- | :---------- | :---------- | :----- | :----- |:--- |
+| DeepLabv3p-MobileNetV2_x1.0 | 轻量级模型,适用于移动端场景| - | - | - | - | 69.8% |
+| HRNet_W18_Small_v1 | 轻量高速,适用于移动端场景 | - | - | - | - | - |
+| FastSCNN | 轻量高速,适用于追求高速预测的移动端或服务器端场景 | - | - | - | - | 69.64 |
+| HRNet_W18 | 高精度模型,适用于对图像分辨率较为敏感、对目标细节预测要求更高的服务器端场景| - | - | - | - | 79.36 |
+| DeepLabv3p-Xception65 | 高精度但预测时间更长,在处理较大数据量时有较高的精度,适用于服务器且背景复杂的场景| - | - | - | - | 79.3% |
+
+## 压缩策略选择
+
+PaddleX提供包含模型剪裁、定点量化的模型压缩策略来减小模型的计算量和存储体积,加快模型部署后的预测速度。使用不同压缩策略在图像分类、目标检测和语义分割模型上的模型精度和预测速度详见以下内容,用户可以根据自己的需求选择合适的压缩策略,进一步优化模型的性能。
+
+| 压缩策略 | 策略特点 |
+| :---- | :------- |
+| 量化 | 较为显著地减少模型的存储体积,适用于移动端或服务器端TensorRT部署,在移动端对于MobileNet系列模型有明显的加速效果 |
+| 剪裁 | 能够去除冗余的参数,达到显著减少参数计算量和模型体积的效果,提升模型的预测性能,适用于CPU部署或移动端部署(GPU上无明显加速效果) |
+| 先剪裁后量化 | 可以进一步提升模型的预测性能,适用于移动端或服务器端TensorRT部署 |
+
+### 性能对比
+
+* 表中各指标的格式为XXX/YYY,XXX表示未采取压缩策略时的指标,YYY表示压缩后的指标
+* 分类模型的准确率指的是ImageNet-1000数据集上的Top1准确率(模型输入大小为224x224),检测模型的准确率指的是COCO2017数据集上的mmAP(模型输入大小为608x608),分割模型的准确率指的是Cityscapes数据集上mIOU(模型输入大小为769x769)
+* 量化策略中,PaddleLite推理环境为Qualcomm SnapDragon 855 + armv8,速度指标为Thread4耗时
+* 剪裁策略中,PaddleLite推理环境为Qualcomm SnapDragon 845 + armv8,速度指标为Thread4耗时
+
+
+| 模型 | 压缩策略 | 存储体积(MB) | 准确率(%) | PaddleLite推理耗时(ms) |
+| :--: | :------: | :------: | :----: | :----------------: |
+| MobileNetV1 | 量化 | 17/4.4 | 70.99/70.18 | 10.0811/4.2934 |
+| MobileNetV1 | 剪裁 -30% | 17/12 | 70.99/70.4 | 19.5762/13.6982 |
+| YOLOv3-MobileNetV1 | 量化 | 95/25 | 29.3/27.9 | - |
+| YOLOv3-MobileNetV1 | 剪裁 -51.77% | 95/32 | 29.3/26 | - |
+| Deeplabv3-MobileNetV2 | 量化 | 7.4/1.8 | 63.26/62.03 | 593.4522/484.0018 |
+| FastSCNN | 剪裁 -47.60% | 11/5.7 | 69.64/66.68 | 415.664/291.748 |
+
+更多模型在不同设备上压缩前后的指标对比详见[PaddleX压缩模型库](appendix/slim_model_zoo.md)
+
+压缩策略的具体使用流程详见[模型压缩](tutorials/compress)
+
+**注意:PaddleX中全部图像分类模型和语义分割模型都支持量化和剪裁操作,目标检测仅有YOLOv3支持量化和剪裁操作。**
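+
+以离线量化为例,其在PaddleX中的大致调用方式如下(示意代码,数据集与模型路径均为假设,完整流程请以上述教程为准):
+
+```python
+import paddlex as pdx
+from paddlex.seg import transforms
+
+quant_transforms = transforms.Compose([transforms.Normalize()])
+# 量化校准数据一般直接复用训练集或验证集
+quant_dataset = pdx.datasets.SegDataset(
+    data_dir='dataset',
+    file_list='dataset/val_list.txt',
+    transforms=quant_transforms)
+model = pdx.load_model('output/best_model')
+# 依次传入:模型、校准数据集、batch大小、batch数量与保存路径
+pdx.slim.export_quant_model(model, quant_dataset, 1, 10, './quant_model')
+```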
+
+## 模型部署
+
+PaddleX提供服务器端python部署、服务器端c++部署、服务器端加密部署、OpenVINO部署、移动端部署共5种部署方案,用户可以根据自己的需求选择合适的部署方案,点击以下链接了解部署的具体流程。
-| 模型 | 模型大小 | GPU预测速度 | CPU预测速度 | ARM芯片预测速度 | mIOU | 备注 |
-| :---- | :------- | :---------- | :---------- | :------------- | :----- | :----- |
-| DeepLabv3p-MobileNetV2_x0.25 | | - | - | - | - | - |
-| DeepLabv3p-MobileNetV2_x1.0 | | - | - | - | - | - |
-| DeepLabv3p-Xception65 | | - | - | - | - | - |
-| UNet | | - | - | - | - | - |
+| 部署方案 | 部署流程 |
+| :------: | :------: |
+| 服务器端python部署 | [部署流程](tutorials/deploy/deploy_server/deploy_python.html)|
+| 服务器端c++部署 | [部署流程](tutorials/deploy/deploy_server/deploy_cpp/) |
+| 服务器端加密部署 | [部署流程](tutorials/deploy/deploy_server/encryption.html) |
+| OpenVINO部署 | [部署流程](tutorials/deploy/deploy_openvino.html) |
+| 移动端部署 | [部署流程](tutorials/deploy/deploy_lite.html) |
diff --git a/docs/images/lime.png b/docs/images/lime.png
index de435a2e2375a788319f0d80a4cce7a21d395e41..801be69b57c80ad92dcc0ca69bf1a0a4de074b0f 100644
Binary files a/docs/images/lime.png and b/docs/images/lime.png differ
diff --git a/docs/images/normlime.png b/docs/images/normlime.png
index 4e5099347f261d3f5ce47b93d28cfa484c1d3776..dd9a2f8f96a3ade26179010f340c7c5185bf0656 100644
Binary files a/docs/images/normlime.png and b/docs/images/normlime.png differ
diff --git a/docs/tutorials/deploy/deploy_lite.md b/docs/tutorials/deploy/deploy_lite.md
index 5419aed636545b95e9f98fdd45109592b7a6d9d6..fd757933dcd201cf5c45b9a58013ee8078248ba0 100644
--- a/docs/tutorials/deploy/deploy_lite.md
+++ b/docs/tutorials/deploy/deploy_lite.md
@@ -21,7 +21,7 @@ step 2: 将PaddleX模型导出为inference模型
step 3: 将inference模型转换成PaddleLite模型
```
-python /path/to/PaddleX/deploy/lite/export_lite.py --model_dir /path/to/inference_model --save_file /path/to/onnx_model --place place/to/run
+python /path/to/PaddleX/deploy/lite/export_lite.py --model_dir /path/to/inference_model --save_file /path/to/lite_model --place place/to/run
```
diff --git a/docs/tutorials/deploy/deploy_server/deploy_cpp/deploy_cpp_linux.md b/docs/tutorials/deploy/deploy_server/deploy_cpp/deploy_cpp_linux.md
index 9deceffd7cc499048f0cea89ef4918f48c4e9fc1..dada892cc0ea706941d0a9966bd52e657fff0d56 100755
--- a/docs/tutorials/deploy/deploy_server/deploy_cpp/deploy_cpp_linux.md
+++ b/docs/tutorials/deploy/deploy_server/deploy_cpp/deploy_cpp_linux.md
@@ -30,7 +30,7 @@ PaddlePaddle C++ 预测库针对不同的`CPU`,`CUDA`,以及是否支持Tens
| ubuntu14.04_cuda10.0_cudnn7_avx_mkl | [fluid_inference.tgz](https://paddle-inference-lib.bj.bcebos.com/1.8.2-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz ) |
| ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6 | [fluid_inference.tgz](https://paddle-inference-lib.bj.bcebos.com/1.8.2-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz) |

-更多和更新的版本,请根据实际情况下载: [C++预测库下载列表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/windows_cpp_inference.html#id1)
+更多和更新的版本,请根据实际情况下载: [C++预测库下载列表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html)

下载并解压后`/root/projects/fluid_inference`目录包含内容为:
```
@@ -42,7 +42,7 @@ fluid_inference
└── version.txt # 版本和编译信息
```

-**注意:** 预编译版本除`nv-jetson-cuda10-cudnn7.5-trt5` 以外其它包都是基于`GCC 4.8.5`编译,使用高版本`GCC`可能存在 `ABI`兼容性问题,建议降级或[自行编译预测库](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html#id12)。
+**注意:** 预编译版本除`nv-jetson-cuda10-cudnn7.5-trt5` 以外其它包都是基于`GCC 4.8.5`编译,使用高版本`GCC`可能存在 `ABI`兼容性问题,建议降级或[自行编译预测库](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html#id12)。

### Step4: 编译
diff --git a/docs/tutorials/deploy/deploy_server/deploy_cpp/deploy_cpp_win_vs2019.md b/docs/tutorials/deploy/deploy_server/deploy_cpp/deploy_cpp_win_vs2019.md
index a1b659cb65db1d6774e1797732054dceef590711..7f6afb08ce43c25fc62d0aac60a42d2e2f2df9db 100755
--- a/docs/tutorials/deploy/deploy_server/deploy_cpp/deploy_cpp_win_vs2019.md
+++ b/docs/tutorials/deploy/deploy_server/deploy_cpp/deploy_cpp_win_vs2019.md
@@ -31,6 +31,7 @@ PaddlePaddle C++ 预测库针对不同的`CPU`,`CUDA`,以及是否支持Tens

| 版本说明 | 预测库(1.8.2版本) | 编译器 | 构建工具| cuDNN | CUDA |
---- | ---- | ---- | ---- | ---- | ----
+ | cpu_avx_mkl | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.8.2/win-infer/mkl/cpu/fluid_inference_install_dir.zip) | MSVC 2015 update 3 | CMake v3.16.0 |
| cpu_avx_openblas | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.8.2/win-infer/open/cpu/fluid_inference_install_dir.zip) | MSVC 2015 update 3 | CMake v3.16.0 |
| cuda9.0_cudnn7_avx_mkl | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.8.2/win-infer/mkl/post97/fluid_inference_install_dir.zip) | MSVC 2015 update 3 | CMake v3.16.0 | 7.4.1 | 9.0 |
diff --git a/examples/human_segmentation/README.md b/examples/human_segmentation/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..18d1f22f3b48979602028e13d1045b63991794b8
--- /dev/null
+++ b/examples/human_segmentation/README.md
@@ -0,0 +1,181 @@
+# HumanSeg人像分割模型
+
+本教程基于PaddleX核心分割网络,提供针对人像分割场景从预训练模型、Fine-tune、视频分割预测部署的全流程应用指南。
+
+## 安装
+
+**前置依赖**
+* paddlepaddle >= 1.8.0
+* python >= 3.5
+
+```
+pip install paddlex -i https://mirror.baidu.com/pypi/simple
+```
+安装的相关问题参考[PaddleX安装](https://paddlex.readthedocs.io/zh_CN/latest/install.html)
+
+## 预训练模型
+HumanSeg开放了在大规模人像数据上训练的两个预训练模型,满足多种使用场景的需求
+
+| 模型类型 | Checkpoint Parameter | Inference Model | Quant Inference Model | 备注 |
+| --- | --- | --- | ---| --- |
+| HumanSeg-server | [humanseg_server_params](https://paddlex.bj.bcebos.com/humanseg/models/humanseg_server.pdparams) | [humanseg_server_inference](https://paddlex.bj.bcebos.com/humanseg/models/humanseg_server_inference.zip) | -- | 高精度模型,适用于服务端GPU且背景复杂的人像场景, 模型结构为Deeplabv3+/Xception65, 输入大小(512, 512) |
+| HumanSeg-mobile | [humanseg_mobile_params](https://paddlex.bj.bcebos.com/humanseg/models/humanseg_mobile.pdparams) | [humanseg_mobile_inference](https://paddlex.bj.bcebos.com/humanseg/models/humanseg_mobile_inference.zip) | [humanseg_mobile_quant](https://paddlex.bj.bcebos.com/humanseg/models/humanseg_mobile_quant.zip) | 轻量级模型, 适用于移动端或服务端CPU的前置摄像头场景,模型结构为HRNet_w18_small_v1,输入大小(192, 192) |
+
+
+模型性能
+
+| 模型 | 模型大小 | 计算耗时 |
+| --- | --- | --- |
+|humanseg_server_inference| 158M | - |
+|humanseg_mobile_inference | 5.8 M | 42.35ms |
+|humanseg_mobile_quant | 1.6M | 24.93ms |
+
+计算耗时运行环境: 小米,cpu:骁龙855, 内存:6GB, 图片大小:192*192
+
+
+**NOTE:**
+
+* 其中Checkpoint Parameter为模型权重,用于Fine-tuning场景。
+
+* Inference Model和Quant Inference Model为预测部署模型,包含`__model__`计算图结构、`__params__`模型参数和`model.yaml`基础的模型配置信息。
+
+* 其中Inference Model适用于服务端的CPU和GPU预测部署,Quant Inference Model为量化版本,适用于通过Paddle Lite进行移动端等端侧设备部署。
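+
+以下为加载解压后的Inference Model对单张图像进行预测的简单示意(基于PaddleX的部署预测接口,模型与图片路径均为假设):
+
+```python
+import paddlex as pdx
+
+# 加载Inference Model(即上表中解压后的预测部署模型目录)
+predictor = pdx.deploy.Predictor('humanseg_mobile_inference')
+result = predictor.predict('human_image.jpg')
+```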
+
+执行以下脚本进行HumanSeg预训练模型的下载
+```bash
+python pretrain_weights/download_pretrain_weights.py
+```
+
+## 下载测试数据
+我们提供了[supervise.ly](https://supervise.ly/)发布人像分割数据集**Supervisely Persons**, 从中随机抽取一小部分并转化成PaddleX可直接加载数据格式。通过运行以下代码进行快速下载,其中包含手机前置摄像头的人像测试视频`video_test.mp4`.
+
+```bash
+python data/download_data.py
+```
+
+## 快速体验视频流人像分割
+结合DIS(Dense Inverse Search-based method)光流算法预测结果与分割结果,改善视频流人像分割
+```bash
+# 通过电脑摄像头进行实时分割处理
+python video_infer.py --model_dir pretrain_weights/humanseg_mobile_inference
+
+# 对人像视频进行分割处理
+python video_infer.py --model_dir pretrain_weights/humanseg_mobile_inference --video_path data/video_test.mp4
+```
+
+视频分割结果如下:
+
+
+
+根据所选背景进行背景替换,背景可以是一张图片,也可以是一段视频。
+```bash
+# 通过电脑摄像头进行实时背景替换处理, 也可通过'--background_video_path'传入背景视频
+python bg_replace.py --model_dir pretrain_weights/humanseg_mobile_inference --background_image_path data/background.jpg
+
+# 对人像视频进行背景替换处理, 也可通过'--background_video_path'传入背景视频
+python bg_replace.py --model_dir pretrain_weights/humanseg_mobile_inference --video_path data/video_test.mp4 --background_image_path data/background.jpg
+
+# 对单张图像进行背景替换
+python bg_replace.py --model_dir pretrain_weights/humanseg_mobile_inference --image_path data/human_image.jpg --background_image_path data/background.jpg
+
+```
+
+背景替换结果如下:
+
+
+
+
+**NOTE**:
+
+视频分割处理时间需要几分钟,请耐心等待。
+
+提供的模型适用于手机摄像头竖屏拍摄场景,宽屏效果会略差一些。
+
+## 训练
+使用下述命令基于预训练模型进行Fine-tuning,请确保选用的模型结构`model_type`与模型参数`pretrain_weights`匹配。
+```bash
+# 指定GPU卡号(以0号卡为例)
+export CUDA_VISIBLE_DEVICES=0
+# 若不使用GPU,则将CUDA_VISIBLE_DEVICES指定为空
+# export CUDA_VISIBLE_DEVICES=
+python train.py --model_type HumanSegMobile \
+--save_dir output/ \
+--data_dir data/mini_supervisely \
+--train_list data/mini_supervisely/train.txt \
+--val_list data/mini_supervisely/val.txt \
+--pretrain_weights pretrain_weights/humanseg_mobile_params \
+--batch_size 8 \
+--learning_rate 0.001 \
+--num_epochs 10 \
+--image_shape 192 192
+```
+其中参数含义如下:
+* `--model_type`: 模型类型,可选项为:HumanSegServer和HumanSegMobile
+* `--save_dir`: 模型保存路径
+* `--data_dir`: 数据集路径
+* `--train_list`: 训练集列表路径
+* `--val_list`: 验证集列表路径
+* `--pretrain_weights`: 预训练模型路径
+* `--batch_size`: 批大小
+* `--learning_rate`: 初始学习率
+* `--num_epochs`: 训练轮数
+* `--image_shape`: 网络输入图像大小(w, h)
+
+更多命令行帮助可运行下述命令进行查看:
+```bash
+python train.py --help
+```
+**NOTE**
+可通过更换`--model_type`变量与对应的`--pretrain_weights`使用不同的模型快速尝试。
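+
+上述train.py内部即通过PaddleX的Python训练接口实现,其核心逻辑可简化为如下示意(数据集路径等均为示例取值):
+
+```python
+import paddlex as pdx
+from paddlex.seg import transforms
+
+train_transforms = transforms.Compose([
+    transforms.Resize([192, 192]), transforms.RandomHorizontalFlip(),
+    transforms.Normalize()
+])
+train_dataset = pdx.datasets.SegDataset(
+    data_dir='data/mini_supervisely',
+    file_list='data/mini_supervisely/train.txt',
+    transforms=train_transforms,
+    shuffle=True)
+# HumanSegMobile对应HRNet的轻量级配置'18_small_v1'
+model = pdx.seg.HRNet(num_classes=2, width='18_small_v1')
+model.train(
+    num_epochs=10,
+    train_dataset=train_dataset,
+    train_batch_size=8,
+    learning_rate=0.001,
+    save_dir='output')
+```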
+
+## 评估
+使用下述命令进行评估
+```bash
+python eval.py --model_dir output/best_model \
+--data_dir data/mini_supervisely \
+--val_list data/mini_supervisely/val.txt \
+--image_shape 192 192
+```
+其中参数含义如下:
+* `--model_dir`: 模型路径
+* `--data_dir`: 数据集路径
+* `--val_list`: 验证集列表路径
+* `--image_shape`: 网络输入图像大小(w, h)
+
+## 预测
+使用下述命令进行预测, 预测结果默认保存在`./output/result/`文件夹中。
+```bash
+python infer.py --model_dir output/best_model \
+--data_dir data/mini_supervisely \
+--test_list data/mini_supervisely/test.txt \
+--save_dir output/result \
+--image_shape 192 192
+```
+其中参数含义如下:
+* `--model_dir`: 模型路径
+* `--data_dir`: 数据集路径
+* `--test_list`: 测试集列表路径
+* `--save_dir`: 预测结果保存路径
+* `--image_shape`: 网络输入图像大小(w, h)
+
+## 模型导出
+```bash
+paddlex --export_inference --model_dir output/best_model \
+--save_dir output/export
+```
+其中参数含义如下:
+* `--model_dir`: 模型路径
+* `--save_dir`: 导出模型保存路径
+
+## 离线量化
+```bash
+python quant_offline.py --model_dir output/best_model \
+--data_dir data/mini_supervisely \
+--quant_list data/mini_supervisely/val.txt \
+--save_dir output/quant_offline \
+--image_shape 192 192
+```
+其中参数含义如下:
+* `--model_dir`: 待量化模型路径
+* `--data_dir`: 数据集路径
+* `--quant_list`: 量化数据集列表路径,一般直接选择训练集或验证集
+* `--save_dir`: 量化模型保存路径
+* `--image_shape`: 网络输入图像大小(w, h)
diff --git a/examples/human_segmentation/bg_replace.py b/examples/human_segmentation/bg_replace.py
new file mode 100644
index 0000000000000000000000000000000000000000..e0c1cc4261f0c946aaf07c11b5c4f6d1c21f6dca
--- /dev/null
+++ b/examples/human_segmentation/bg_replace.py
@@ -0,0 +1,314 @@
+# coding: utf8
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import os
+import os.path as osp
+import cv2
+import numpy as np
+
+from postprocess import postprocess, threshold_mask
+import paddlex as pdx
+import paddlex.utils.logging as logging
+from paddlex.seg import transforms
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(
+        description='HumanSeg inference for video')
+    parser.add_argument(
+        '--model_dir',
+        dest='model_dir',
+        help='Model path for inference',
+        type=str)
+    parser.add_argument(
+        '--image_path',
+        dest='image_path',
+        help='Image including human',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--background_image_path',
+        dest='background_image_path',
+        help='Background image for replacing',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--video_path',
+        dest='video_path',
+        help='Video path for inference',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--background_video_path',
+        dest='background_video_path',
+        help='Background video path for replacing',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--save_dir',
+        dest='save_dir',
+        help='The directory for saving the inference results',
+        type=str,
+        default='./output')
+    parser.add_argument(
+        "--image_shape",
+        dest="image_shape",
+        help="The image shape for net inputs.",
+        nargs=2,
+        default=[192, 192],
+        type=int)
+
+    return parser.parse_args()
+
+
+def bg_replace(label_map, img, bg):
+    h, w, _ = img.shape
+    bg = cv2.resize(bg, (w, h))
+    label_map = np.repeat(label_map[:, :, np.newaxis], 3, axis=2)
+    comb = (label_map * img + (1 - label_map) * bg).astype(np.uint8)
+    return comb
+
+
+def recover(img, im_info):
+    if im_info[0] == 'resize':
+        w, h = im_info[1][1], im_info[1][0]
+        img = cv2.resize(img, (w, h), cv2.INTER_LINEAR)
+    elif im_info[0] == 'padding':
+        # im_info[1]保存的是原图的(h, w),宽取下标1、高取下标0
+        w, h = im_info[1][1], im_info[1][0]
+        img = img[0:h, 0:w, :]
+    return img
+
+
+def infer(args):
+    resize_h = args.image_shape[1]
+    resize_w = args.image_shape[0]
+
+    test_transforms = transforms.Compose([transforms.Normalize()])
+    model = pdx.load_model(args.model_dir)
+
+    if not osp.exists(args.save_dir):
+        os.makedirs(args.save_dir)
+
+    # 图像背景替换
+    if args.image_path is not None:
+        if not osp.exists(args.image_path):
+            raise Exception('The --image_path does not exist: {}'.format(
+                args.image_path))
+        if args.background_image_path is None:
+            raise Exception(
+                'The --background_image_path is not set. Please set it')
+        else:
+            if not osp.exists(args.background_image_path):
+                raise Exception(
+                    'The --background_image_path does not exist: {}'.format(
+                        args.background_image_path))
+
+        img = cv2.imread(args.image_path)
+        im_shape = img.shape
+        im_scale_x = float(resize_w) / float(im_shape[1])
+        im_scale_y = float(resize_h) / float(im_shape[0])
+        im = cv2.resize(
+            img,
+            None,
+            None,
+            fx=im_scale_x,
+            fy=im_scale_y,
+            interpolation=cv2.INTER_LINEAR)
+        image = im.astype('float32')
+        im_info = ('resize', im_shape[0:2])
+        pred = model.predict(image, test_transforms)
+        label_map = pred['label_map']
+        label_map = recover(label_map, im_info)
+        bg = cv2.imread(args.background_image_path)
+        save_name = osp.basename(args.image_path)
+        save_path = osp.join(args.save_dir, save_name)
+        result = bg_replace(label_map, img, bg)
+        cv2.imwrite(save_path, result)
+
+    # 视频背景替换,如果提供背景视频则以背景视频作为背景,否则采用提供的背景图片
+    else:
+        is_video_bg = False
+        if args.background_video_path is not None:
+            if not osp.exists(args.background_video_path):
+                raise Exception(
+                    'The --background_video_path does not exist: {}'.format(
+                        args.background_video_path))
+            is_video_bg = True
+        elif args.background_image_path is not None:
+            if not osp.exists(args.background_image_path):
+                raise Exception(
+                    'The --background_image_path does not exist: {}'.format(
+                        args.background_image_path))
+        else:
+            raise Exception(
+                'Please offer a background image or video. You should set --background_image_path or --background_video_path'
+            )
+
+        disflow = cv2.DISOpticalFlow_create(
+            cv2.DISOPTICAL_FLOW_PRESET_ULTRAFAST)
+        prev_gray = np.zeros((resize_h, resize_w), np.uint8)
+        prev_cfd = np.zeros((resize_h, resize_w), np.float32)
+        is_init = True
+        if args.video_path is not None:
+            logging.info('Please wait. 
It is computing......') + if not osp.exists(args.video_path): + raise Exception('The --video_path is not existed: {}'.format( + args.video_path)) + + cap_video = cv2.VideoCapture(args.video_path) + fps = cap_video.get(cv2.CAP_PROP_FPS) + width = int(cap_video.get(cv2.CAP_PROP_FRAME_WIDTH)) + height = int(cap_video.get(cv2.CAP_PROP_FRAME_HEIGHT)) + save_name = osp.basename(args.video_path) + save_name = save_name.split('.')[0] + save_path = osp.join(args.save_dir, save_name + '.avi') + + cap_out = cv2.VideoWriter( + save_path, + cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), fps, + (width, height)) + + if is_video_bg: + cap_bg = cv2.VideoCapture(args.background_video_path) + frames_bg = cap_bg.get(cv2.CAP_PROP_FRAME_COUNT) + current_frame_bg = 1 + else: + img_bg = cv2.imread(args.background_image_path) + while cap_video.isOpened(): + ret, frame = cap_video.read() + if ret: + im_shape = frame.shape + im_scale_x = float(resize_w) / float(im_shape[1]) + im_scale_y = float(resize_h) / float(im_shape[0]) + im = cv2.resize( + frame, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=cv2.INTER_LINEAR) + image = im.astype('float32') + im_info = ('resize', im_shape[0:2]) + pred = model.predict(image, test_transforms) + score_map = pred['score_map'] + cur_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY) + cur_gray = cv2.resize(cur_gray, (resize_w, resize_h)) + score_map = 255 * score_map[:, :, 1] + optflow_map = postprocess(cur_gray, score_map, prev_gray, prev_cfd, \ + disflow, is_init) + prev_gray = cur_gray.copy() + prev_cfd = optflow_map.copy() + is_init = False + optflow_map = cv2.GaussianBlur(optflow_map, (3, 3), 0) + optflow_map = threshold_mask( + optflow_map, thresh_bg=0.2, thresh_fg=0.8) + score_map = recover(optflow_map, im_info) + + #循环读取背景帧 + if is_video_bg: + ret_bg, frame_bg = cap_bg.read() + if ret_bg: + if current_frame_bg == frames_bg: + current_frame_bg = 1 + cap_bg.set(cv2.CAP_PROP_POS_FRAMES, 0) + else: + break + current_frame_bg += 1 + comb = bg_replace(score_map, frame, frame_bg) + else: + comb = bg_replace(score_map, frame, img_bg) + + cap_out.write(comb) + else: + break + + if is_video_bg: + cap_bg.release() + cap_video.release() + cap_out.release() + + # 当没有输入预测图像和视频的时候,则打开摄像头 + else: + cap_video = cv2.VideoCapture(0) + if not cap_video.isOpened(): + raise IOError("Error opening video stream or file, " + "--video_path whether existing: {}" + " or camera whether working".format( + args.video_path)) + return + + if is_video_bg: + cap_bg = cv2.VideoCapture(args.background_video_path) + frames_bg = cap_bg.get(cv2.CAP_PROP_FRAME_COUNT) + current_frame_bg = 1 + else: + img_bg = cv2.imread(args.background_image_path) + while cap_video.isOpened(): + ret, frame = cap_video.read() + if ret: + im_shape = frame.shape + im_scale_x = float(resize_w) / float(im_shape[1]) + im_scale_y = float(resize_h) / float(im_shape[0]) + im = cv2.resize( + frame, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=cv2.INTER_LINEAR) + image = im.astype('float32') + im_info = ('resize', im_shape[0:2]) + pred = model.predict(image, test_transforms) + score_map = pred['score_map'] + cur_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY) + cur_gray = cv2.resize(cur_gray, (resize_w, resize_h)) + score_map = 255 * score_map[:, :, 1] + optflow_map = postprocess(cur_gray, score_map, prev_gray, prev_cfd, \ + disflow, is_init) + prev_gray = cur_gray.copy() + prev_cfd = optflow_map.copy() + is_init = False + optflow_map = cv2.GaussianBlur(optflow_map, (3, 3), 0) + optflow_map = threshold_mask( + 
optflow_map, thresh_bg=0.2, thresh_fg=0.8) + score_map = recover(optflow_map, im_info) + + #循环读取背景帧 + if is_video_bg: + ret_bg, frame_bg = cap_bg.read() + if ret_bg: + if current_frame_bg == frames_bg: + current_frame_bg = 1 + cap_bg.set(cv2.CAP_PROP_POS_FRAMES, 0) + else: + break + current_frame_bg += 1 + comb = bg_replace(score_map, frame, frame_bg) + else: + comb = bg_replace(score_map, frame, img_bg) + cv2.imshow('HumanSegmentation', comb) + if cv2.waitKey(1) & 0xFF == ord('q'): + break + else: + break + if is_video_bg: + cap_bg.release() + cap_video.release() + + +if __name__ == "__main__": + args = parse_args() + infer(args) diff --git a/examples/human_segmentation/data/download_data.py b/examples/human_segmentation/data/download_data.py new file mode 100644 index 0000000000000000000000000000000000000000..941b4cc81ef05335c867c6c1eea20c07c44c7360 --- /dev/null +++ b/examples/human_segmentation/data/download_data.py @@ -0,0 +1,33 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License" +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import sys +import os + +LOCAL_PATH = os.path.dirname(os.path.abspath(__file__)) + +import paddlex as pdx + + +def download_data(savepath): + url = "https://paddleseg.bj.bcebos.com/humanseg/data/mini_supervisely.zip" + pdx.utils.download_and_decompress(url=url, path=savepath) + + url = "https://paddleseg.bj.bcebos.com/humanseg/data/video_test.zip" + pdx.utils.download_and_decompress(url=url, path=savepath) + + +if __name__ == "__main__": + download_data(LOCAL_PATH) + print("Data download finish!") diff --git a/examples/human_segmentation/eval.py b/examples/human_segmentation/eval.py new file mode 100644 index 0000000000000000000000000000000000000000..a6e05ea0b2c463b948a1a021fa74f01512985675 --- /dev/null +++ b/examples/human_segmentation/eval.py @@ -0,0 +1,85 @@ +# coding: utf8 +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
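+#
+# 说明:本脚本使用PaddleX的SegDataset读取验证集,
+# 并调用model.evaluate在验证集上计算人像分割模型的评估指标。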
+ +import argparse +import paddlex as pdx +import paddlex.utils.logging as logging +from paddlex.seg import transforms + + +def parse_args(): + parser = argparse.ArgumentParser(description='HumanSeg training') + parser.add_argument( + '--model_dir', + dest='model_dir', + help='Model path for evaluating', + type=str, + default='output/best_model') + parser.add_argument( + '--data_dir', + dest='data_dir', + help='The root directory of dataset', + type=str) + parser.add_argument( + '--val_list', + dest='val_list', + help='Val list file of dataset', + type=str, + default=None) + parser.add_argument( + '--batch_size', + dest='batch_size', + help='Mini batch size', + type=int, + default=128) + parser.add_argument( + "--image_shape", + dest="image_shape", + help="The image shape for net inputs.", + nargs=2, + default=[192, 192], + type=int) + return parser.parse_args() + + +def dict2str(dict_input): + out = '' + for k, v in dict_input.items(): + try: + v = round(float(v), 6) + except: + pass + out = out + '{}={}, '.format(k, v) + return out.strip(', ') + + +def evaluate(args): + eval_transforms = transforms.Compose( + [transforms.Resize(args.image_shape), transforms.Normalize()]) + + eval_dataset = pdx.datasets.SegDataset( + data_dir=args.data_dir, + file_list=args.val_list, + transforms=eval_transforms) + + model = pdx.load_model(args.model_dir) + metrics = model.evaluate(eval_dataset, args.batch_size) + logging.info('[EVAL] Finished, {} .'.format(dict2str(metrics))) + + +if __name__ == '__main__': + args = parse_args() + + evaluate(args) diff --git a/examples/human_segmentation/infer.py b/examples/human_segmentation/infer.py new file mode 100644 index 0000000000000000000000000000000000000000..c78df7ae51609299a44d1c706197c56e2a20618e --- /dev/null +++ b/examples/human_segmentation/infer.py @@ -0,0 +1,109 @@ +# coding: utf8 +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
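+#
+# 说明:本脚本对测试集逐张图像进行人像分割预测,并在save_dir下保存三类结果:
+# 叠加可视化图(added)、带前景得分通道的抠图(mat)与前景得分图(scoremap)。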
+ +import argparse +import os +import os.path as osp +import cv2 +import numpy as np +import tqdm + +import paddlex as pdx +from paddlex.seg import transforms + + +def parse_args(): + parser = argparse.ArgumentParser( + description='HumanSeg prediction and visualization') + parser.add_argument( + '--model_dir', + dest='model_dir', + help='Model path for prediction', + type=str) + parser.add_argument( + '--data_dir', + dest='data_dir', + help='The root directory of dataset', + type=str) + parser.add_argument( + '--test_list', + dest='test_list', + help='Test list file of dataset', + type=str) + parser.add_argument( + '--save_dir', + dest='save_dir', + help='The directory for saving the inference results', + type=str, + default='./output/result') + parser.add_argument( + "--image_shape", + dest="image_shape", + help="The image shape for net inputs.", + nargs=2, + default=[192, 192], + type=int) + return parser.parse_args() + + +def infer(args): + def makedir(path): + sub_dir = osp.dirname(path) + if not osp.exists(sub_dir): + os.makedirs(sub_dir) + + test_transforms = transforms.Compose( + [transforms.Resize(args.image_shape), transforms.Normalize()]) + model = pdx.load_model(args.model_dir) + added_saved_path = osp.join(args.save_dir, 'added') + mat_saved_path = osp.join(args.save_dir, 'mat') + scoremap_saved_path = osp.join(args.save_dir, 'scoremap') + + with open(args.test_list, 'r') as f: + files = f.readlines() + + for file in tqdm.tqdm(files): + file = file.strip() + im_file = osp.join(args.data_dir, file) + im = cv2.imread(im_file) + result = model.predict(im_file, transforms=test_transforms) + + # save added image + added_image = pdx.seg.visualize( + im_file, result, weight=0.6, save_dir=None) + added_image_file = osp.join(added_saved_path, file) + makedir(added_image_file) + cv2.imwrite(added_image_file, added_image) + + # save score map + score_map = result['score_map'][:, :, 1] + score_map = (score_map * 255).astype(np.uint8) + score_map_file = osp.join(scoremap_saved_path, file) + makedir(score_map_file) + cv2.imwrite(score_map_file, score_map) + + # save mat image + score_map = np.expand_dims(score_map, axis=-1) + mat_image = np.concatenate([im, score_map], axis=2) + mat_file = osp.join(mat_saved_path, file) + ext = osp.splitext(mat_file)[-1] + mat_file = mat_file.replace(ext, '.png') + makedir(mat_file) + cv2.imwrite(mat_file, mat_image) + + +if __name__ == '__main__': + args = parse_args() + infer(args) diff --git a/examples/human_segmentation/postprocess.py b/examples/human_segmentation/postprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..88e5dcc80f3d49d7d5625e74fe4de313b59fa844 --- /dev/null +++ b/examples/human_segmentation/postprocess.py @@ -0,0 +1,125 @@ +# coding: utf8 +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
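+#
+# 说明:本模块将DIS光流的前后向跟踪结果与当前帧的分割得分图进行融合,
+# 以减少视频流人像分割结果的帧间抖动,整体流程见下方postprocess()的文档字符串。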
+ +import numpy as np + + +def cal_optical_flow_tracking(pre_gray, cur_gray, prev_cfd, dl_weights, + disflow): + """计算光流跟踪匹配点和光流图 + 输入参数: + pre_gray: 上一帧灰度图 + cur_gray: 当前帧灰度图 + prev_cfd: 上一帧光流图 + dl_weights: 融合权重图 + disflow: 光流数据结构 + 返回值: + is_track: 光流点跟踪二值图,即是否具有光流点匹配 + track_cfd: 光流跟踪图 + """ + check_thres = 8 + h, w = pre_gray.shape[:2] + track_cfd = np.zeros_like(prev_cfd) + is_track = np.zeros_like(pre_gray) + flow_fw = disflow.calc(pre_gray, cur_gray, None) + flow_bw = disflow.calc(cur_gray, pre_gray, None) + flow_fw = np.round(flow_fw).astype(np.int) + flow_bw = np.round(flow_bw).astype(np.int) + y_list = np.array(range(h)) + x_list = np.array(range(w)) + yv, xv = np.meshgrid(y_list, x_list) + yv, xv = yv.T, xv.T + cur_x = xv + flow_fw[:, :, 0] + cur_y = yv + flow_fw[:, :, 1] + + # 超出边界不跟踪 + not_track = (cur_x < 0) + (cur_x >= w) + (cur_y < 0) + (cur_y >= h) + flow_bw[~not_track] = flow_bw[cur_y[~not_track], cur_x[~not_track]] + not_track += (np.square(flow_fw[:, :, 0] + flow_bw[:, :, 0]) + + np.square(flow_fw[:, :, 1] + flow_bw[:, :, 1]) + ) >= check_thres + track_cfd[cur_y[~not_track], cur_x[~not_track]] = prev_cfd[~not_track] + + is_track[cur_y[~not_track], cur_x[~not_track]] = 1 + + not_flow = np.all(np.abs(flow_fw) == 0, + axis=-1) * np.all(np.abs(flow_bw) == 0, axis=-1) + dl_weights[cur_y[not_flow], cur_x[not_flow]] = 0.05 + return track_cfd, is_track, dl_weights + + +def fuse_optical_flow_tracking(track_cfd, dl_cfd, dl_weights, is_track): + """光流追踪图和人像分割结构融合 + 输入参数: + track_cfd: 光流追踪图 + dl_cfd: 当前帧分割结果 + dl_weights: 融合权重图 + is_track: 光流点匹配二值图 + 返回 + cur_cfd: 光流跟踪图和人像分割结果融合图 + """ + fusion_cfd = dl_cfd.copy() + is_track = is_track.astype(np.bool) + fusion_cfd[is_track] = dl_weights[is_track] * dl_cfd[is_track] + ( + 1 - dl_weights[is_track]) * track_cfd[is_track] + # 确定区域 + index_certain = ((dl_cfd > 0.9) + (dl_cfd < 0.1)) * is_track + index_less01 = (dl_weights < 0.1) * index_certain + fusion_cfd[index_less01] = 0.3 * dl_cfd[index_less01] + 0.7 * track_cfd[ + index_less01] + index_larger09 = (dl_weights >= 0.1) * index_certain + fusion_cfd[index_larger09] = 0.4 * dl_cfd[ + index_larger09] + 0.6 * track_cfd[index_larger09] + return fusion_cfd + + +def threshold_mask(img, thresh_bg, thresh_fg): + dst = (img / 255.0 - thresh_bg) / (thresh_fg - thresh_bg) + dst[np.where(dst > 1)] = 1 + dst[np.where(dst < 0)] = 0 + return dst.astype(np.float32) + + +def postprocess(cur_gray, scoremap, prev_gray, pre_cfd, disflow, is_init): + """光流优化 + Args: + cur_gray : 当前帧灰度图 + pre_gray : 前一帧灰度图 + pre_cfd :前一帧融合结果 + scoremap : 当前帧分割结果 + difflow : 光流 + is_init : 是否第一帧 + Returns: + fusion_cfd : 光流追踪图和预测结果融合图 + """ + h, w = scoremap.shape + cur_cfd = scoremap.copy() + + if is_init: + if h <= 64 or w <= 64: + disflow.setFinestScale(1) + elif h <= 160 or w <= 160: + disflow.setFinestScale(2) + else: + disflow.setFinestScale(3) + fusion_cfd = cur_cfd + else: + weights = np.ones((h, w), np.float32) * 0.3 + track_cfd, is_track, weights = cal_optical_flow_tracking( + prev_gray, cur_gray, pre_cfd, weights, disflow) + fusion_cfd = fuse_optical_flow_tracking(track_cfd, cur_cfd, weights, + is_track) + + return fusion_cfd diff --git a/examples/human_segmentation/pretrain_weights/download_pretrain_weights.py b/examples/human_segmentation/pretrain_weights/download_pretrain_weights.py new file mode 100644 index 0000000000000000000000000000000000000000..be961ab6ebca2f8fef2e5573a817ccfd29fee41a --- /dev/null +++ b/examples/human_segmentation/pretrain_weights/download_pretrain_weights.py @@ -0,0 +1,40 @@ +# coding: 
utf8 +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import sys +import os + +LOCAL_PATH = os.path.dirname(os.path.abspath(__file__)) + +import paddlex as pdx +import paddlehub as hub + +model_urls = { + "PaddleX_HumanSeg_Server_Params": + "https://bj.bcebos.com/paddlex/models/humanseg/humanseg_server_params.tar", + "PaddleX_HumanSeg_Server_Inference": + "https://bj.bcebos.com/paddlex/models/humanseg/humanseg_server_inference.tar", + "PaddleX_HumanSeg_Mobile_Params": + "https://bj.bcebos.com/paddlex/models/humanseg/humanseg_mobile_params.tar", + "PaddleX_HumanSeg_Mobile_Inference": + "https://bj.bcebos.com/paddlex/models/humanseg/humanseg_mobile_inference.tar", + "PaddleX_HumanSeg_Mobile_Quant": + "https://bj.bcebos.com/paddlex/models/humanseg/humanseg_mobile_quant.tar" +} + +if __name__ == "__main__": + for model_name, url in model_urls.items(): + pdx.utils.download_and_decompress(url=url, path=LOCAL_PATH) + print("Pretrained Model download success!") diff --git a/examples/human_segmentation/quant_offline.py b/examples/human_segmentation/quant_offline.py new file mode 100644 index 0000000000000000000000000000000000000000..a801f8d02263f8dab98f3250478a289337492ae4 --- /dev/null +++ b/examples/human_segmentation/quant_offline.py @@ -0,0 +1,85 @@ +# coding: utf8 +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
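+#
+# 说明:本脚本调用pdx.slim.export_quant_model对训练好的人像分割模型做离线量化,
+# 量化校准数据(--quant_list)一般直接复用训练集或验证集。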
diff --git a/examples/human_segmentation/quant_offline.py b/examples/human_segmentation/quant_offline.py
new file mode 100644
index 0000000000000000000000000000000000000000..a801f8d02263f8dab98f3250478a289337492ae4
--- /dev/null
+++ b/examples/human_segmentation/quant_offline.py
@@ -0,0 +1,85 @@
+# coding: utf8
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import paddlex as pdx
+from paddlex.seg import transforms
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(
+        description='HumanSeg offline quantization')
+    parser.add_argument(
+        '--model_dir',
+        dest='model_dir',
+        help='Model path for quantization',
+        type=str,
+        default='output/best_model')
+    parser.add_argument(
+        '--batch_size',
+        dest='batch_size',
+        help='Mini batch size',
+        type=int,
+        default=1)
+    parser.add_argument(
+        '--batch_nums',
+        dest='batch_nums',
+        help='Batch number for quantization',
+        type=int,
+        default=10)
+    parser.add_argument(
+        '--data_dir',
+        dest='data_dir',
+        help='The root directory of dataset',
+        type=str)
+    parser.add_argument(
+        '--quant_list',
+        dest='quant_list',
+        help='Image file list for model quantization, it can be val.txt or train.txt',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--save_dir',
+        dest='save_dir',
+        help='The directory for saving the quantized model',
+        type=str,
+        default='./output/quant_offline')
+    parser.add_argument(
+        "--image_shape",
+        dest="image_shape",
+        help="The image shape for net inputs.",
+        nargs=2,
+        default=[192, 192],
+        type=int)
+    return parser.parse_args()
+
+
+def quantize(args):
+    eval_transforms = transforms.Compose(
+        [transforms.Resize(args.image_shape), transforms.Normalize()])
+
+    eval_dataset = pdx.datasets.SegDataset(
+        data_dir=args.data_dir,
+        file_list=args.quant_list,
+        transforms=eval_transforms)
+
+    model = pdx.load_model(args.model_dir)
+    pdx.slim.export_quant_model(model, eval_dataset, args.batch_size,
+                                args.batch_nums, args.save_dir)
+
+
+if __name__ == '__main__':
+    args = parse_args()
+
+    quantize(args)
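For context (not part of the diff): the quantized model directory written by this script can, as far as we know, be loaded back like any other PaddleX model directory. A hedged sketch with placeholder paths:

```python
import paddlex as pdx

# load the quantized model saved by quant_offline.py (default save_dir)
model = pdx.load_model('./output/quant_offline')
# 'human.jpg' is a placeholder input image path
result = model.predict('human.jpg')
```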
diff --git a/examples/human_segmentation/train.py b/examples/human_segmentation/train.py
new file mode 100644
index 0000000000000000000000000000000000000000..a7df98f360a78c2624814fc75bb0c382e19b7e95
--- /dev/null
+++ b/examples/human_segmentation/train.py
@@ -0,0 +1,156 @@
+# coding: utf8
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+
+import paddlex as pdx
+from paddlex.seg import transforms
+
+MODEL_TYPE = ['HumanSegMobile', 'HumanSegServer']
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='HumanSeg training')
+    parser.add_argument(
+        '--model_type',
+        dest='model_type',
+        help="Model type for training, which is one of ('HumanSegMobile', 'HumanSegServer')",
+        type=str,
+        default='HumanSegMobile')
+    parser.add_argument(
+        '--data_dir',
+        dest='data_dir',
+        help='The root directory of dataset',
+        type=str)
+    parser.add_argument(
+        '--train_list',
+        dest='train_list',
+        help='Train list file of dataset',
+        type=str)
+    parser.add_argument(
+        '--val_list',
+        dest='val_list',
+        help='Val list file of dataset',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--save_dir',
+        dest='save_dir',
+        help='The directory for saving the model snapshot',
+        type=str,
+        default='./output')
+    parser.add_argument(
+        '--num_classes',
+        dest='num_classes',
+        help='Number of classes',
+        type=int,
+        default=2)
+    parser.add_argument(
+        "--image_shape",
+        dest="image_shape",
+        help="The image shape for net inputs.",
+        nargs=2,
+        default=[192, 192],
+        type=int)
+    parser.add_argument(
+        '--num_epochs',
+        dest='num_epochs',
+        help='Number of epochs for training',
+        type=int,
+        default=100)
+    parser.add_argument(
+        '--batch_size',
+        dest='batch_size',
+        help='Mini batch size',
+        type=int,
+        default=128)
+    parser.add_argument(
+        '--learning_rate',
+        dest='learning_rate',
+        help='Learning rate',
+        type=float,
+        default=0.01)
+    parser.add_argument(
+        '--pretrain_weights',
+        dest='pretrain_weights',
+        help='The path of pretrained weights',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--resume_checkpoint',
+        dest='resume_checkpoint',
+        help='The path of resume checkpoint',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--use_vdl',
+        dest='use_vdl',
+        help='Whether to use visualdl',
+        action='store_true')
+    parser.add_argument(
+        '--save_interval_epochs',
+        dest='save_interval_epochs',
+        help='The interval epochs for save a model snapshot',
+        type=int,
+        default=5)
+
+    return parser.parse_args()
+
+
+def train(args):
+    train_transforms = transforms.Compose([
+        transforms.Resize(args.image_shape), transforms.RandomHorizontalFlip(),
+        transforms.Normalize()
+    ])
+
+    eval_transforms = transforms.Compose(
+        [transforms.Resize(args.image_shape), transforms.Normalize()])
+
+    train_dataset = pdx.datasets.SegDataset(
+        data_dir=args.data_dir,
+        file_list=args.train_list,
+        transforms=train_transforms,
+        shuffle=True)
+    eval_dataset = pdx.datasets.SegDataset(
+        data_dir=args.data_dir,
+        file_list=args.val_list,
+        transforms=eval_transforms)
+
+    if args.model_type == 'HumanSegMobile':
+        model = pdx.seg.HRNet(
+            num_classes=args.num_classes, width='18_small_v1')
+    elif args.model_type == 'HumanSegServer':
+        model = pdx.seg.DeepLabv3p(
+            num_classes=args.num_classes, backbone='Xception65')
+    else:
+        raise ValueError(
+            "--model_type: {} is invalid, it should be one of "
+            "('HumanSegMobile', 'HumanSegServer')".format(args.model_type))
+    model.train(
+        num_epochs=args.num_epochs,
+        train_dataset=train_dataset,
+        train_batch_size=args.batch_size,
+        eval_dataset=eval_dataset,
+        save_interval_epochs=args.save_interval_epochs,
+        learning_rate=args.learning_rate,
+        pretrain_weights=args.pretrain_weights,
+        resume_checkpoint=args.resume_checkpoint,
+        save_dir=args.save_dir,
+        use_vdl=args.use_vdl)
+
+
+if __name__ == '__main__':
+    args = parse_args()
+    train(args)
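For context (not part of the diff): the two model-type names map directly onto standard PaddleX models, so the equivalent direct construction is roughly:

```python
import paddlex as pdx

# HumanSegMobile -> the lightweight HRNet branch added in this release
mobile = pdx.seg.HRNet(num_classes=2, width='18_small_v1')
# HumanSegServer -> DeepLabv3+ with an Xception65 backbone
server = pdx.seg.DeepLabv3p(num_classes=2, backbone='Xception65')
```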
diff --git a/examples/human_segmentation/video_infer.py b/examples/human_segmentation/video_infer.py
new file mode 100644
index 0000000000000000000000000000000000000000..c2a67fe0032eae19e937580ff35e53ba09d1118f
--- /dev/null
+++ b/examples/human_segmentation/video_infer.py
@@ -0,0 +1,186 @@
+# coding: utf8
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import os
+import os.path as osp
+import cv2
+import numpy as np
+
+from postprocess import postprocess, threshold_mask
+import paddlex as pdx
+import paddlex.utils.logging as logging
+from paddlex.seg import transforms
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(
+        description='HumanSeg inference for video')
+    parser.add_argument(
+        '--model_dir',
+        dest='model_dir',
+        help='Model path for inference',
+        type=str)
+    parser.add_argument(
+        '--video_path',
+        dest='video_path',
+        help='Video path for inference; the camera will be used if the path does not exist',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--save_dir',
+        dest='save_dir',
+        help='The directory for saving the inference results',
+        type=str,
+        default='./output')
+    parser.add_argument(
+        "--image_shape",
+        dest="image_shape",
+        help="The image shape for net inputs.",
+        nargs=2,
+        default=[192, 192],
+        type=int)
+
+    return parser.parse_args()
+
+
+def recover(img, im_info):
+    if im_info[0] == 'resize':
+        w, h = im_info[1][1], im_info[1][0]
+        img = cv2.resize(img, (w, h), interpolation=cv2.INTER_LINEAR)
+    elif im_info[0] == 'padding':
+        w, h = im_info[1][1], im_info[1][0]
+        img = img[0:h, 0:w, :]
+    return img
+
+
+def video_infer(args):
+    resize_h = args.image_shape[1]
+    resize_w = args.image_shape[0]
+
+    model = pdx.load_model(args.model_dir)
+    test_transforms = transforms.Compose([transforms.Normalize()])
+    if not args.video_path:
+        cap = cv2.VideoCapture(0)
+    else:
+        cap = cv2.VideoCapture(args.video_path)
+    if not cap.isOpened():
+        raise IOError("Error opening video stream or file: check whether "
+                      "--video_path {} exists or the camera is available".
+                      format(args.video_path))
+
+    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
+    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
+
+    disflow = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_ULTRAFAST)
+    prev_gray = np.zeros((resize_h, resize_w), np.uint8)
+    prev_cfd = np.zeros((resize_h, resize_w), np.float32)
+    is_init = True
+
+    fps = cap.get(cv2.CAP_PROP_FPS)
+    if args.video_path:
+        logging.info("Please wait. 
It is computing......") + # 用于保存预测结果视频 + if not osp.exists(args.save_dir): + os.makedirs(args.save_dir) + out = cv2.VideoWriter( + osp.join(args.save_dir, 'result.avi'), + cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), fps, (width, height)) + # 开始获取视频帧 + while cap.isOpened(): + ret, frame = cap.read() + if ret: + im_shape = frame.shape + im_scale_x = float(resize_w) / float(im_shape[1]) + im_scale_y = float(resize_h) / float(im_shape[0]) + im = cv2.resize( + frame, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=cv2.INTER_LINEAR) + image = im.astype('float32') + im_info = ('resize', im_shape[0:2]) + pred = model.predict(image, test_transforms) + score_map = pred['score_map'] + cur_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY) + score_map = 255 * score_map[:, :, 1] + optflow_map = postprocess(cur_gray, score_map, prev_gray, prev_cfd, \ + disflow, is_init) + prev_gray = cur_gray.copy() + prev_cfd = optflow_map.copy() + is_init = False + optflow_map = cv2.GaussianBlur(optflow_map, (3, 3), 0) + optflow_map = threshold_mask( + optflow_map, thresh_bg=0.2, thresh_fg=0.8) + img_matting = np.repeat( + optflow_map[:, :, np.newaxis], 3, axis=2) + img_matting = recover(img_matting, im_info) + bg_im = np.ones_like(img_matting) * 255 + comb = (img_matting * frame + + (1 - img_matting) * bg_im).astype(np.uint8) + out.write(comb) + else: + break + cap.release() + out.release() + + else: + while cap.isOpened(): + ret, frame = cap.read() + if ret: + im_shape = frame.shape + im_scale_x = float(resize_w) / float(im_shape[1]) + im_scale_y = float(resize_h) / float(im_shape[0]) + im = cv2.resize( + frame, + None, + None, + fx=im_scale_x, + fy=im_scale_y, + interpolation=cv2.INTER_LINEAR) + image = im.astype('float32') + im_info = ('resize', im_shape[0:2]) + pred = model.predict(image, test_transforms) + score_map = pred['score_map'] + cur_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY) + cur_gray = cv2.resize(cur_gray, (resize_w, resize_h)) + score_map = 255 * score_map[:, :, 1] + optflow_map = postprocess(cur_gray, score_map, prev_gray, prev_cfd, \ + disflow, is_init) + prev_gray = cur_gray.copy() + prev_cfd = optflow_map.copy() + is_init = False + optflow_map = cv2.GaussianBlur(optflow_map, (3, 3), 0) + optflow_map = threshold_mask( + optflow_map, thresh_bg=0.2, thresh_fg=0.8) + img_matting = np.repeat( + optflow_map[:, :, np.newaxis], 3, axis=2) + img_matting = recover(img_matting, im_info) + bg_im = np.ones_like(img_matting) * 255 + comb = (img_matting * frame + + (1 - img_matting) * bg_im).astype(np.uint8) + cv2.imshow('HumanSegmentation', comb) + if cv2.waitKey(1) & 0xFF == ord('q'): + break + else: + break + cap.release() + + +if __name__ == "__main__": + args = parse_args() + video_infer(args) diff --git a/new_tutorials/train/README.md b/new_tutorials/train/README.md deleted file mode 100644 index fc319d16d0c795f856600355d43c18ef413eae0e..0000000000000000000000000000000000000000 --- a/new_tutorials/train/README.md +++ /dev/null @@ -1,21 +0,0 @@ -# 使用教程——训练模型 - -本目录下整理了使用PaddleX训练模型的示例代码,代码中均提供了示例数据的自动下载,并均使用单张GPU卡进行训练。 - -|代码 | 模型任务 | 数据 | -|------|--------|---------| -|classification/mobilenetv2.py | 图像分类MobileNetV2 | 蔬菜分类 | -|classification/resnet50.py | 图像分类ResNet50 | 蔬菜分类 | -|detection/faster_rcnn_r50_fpn.py | 目标检测FasterRCNN | 昆虫检测 | -|detection/mask_rcnn_f50_fpn.py | 实例分割MaskRCNN | 垃圾分拣 | -|segmentation/deeplabv3p.py | 语义分割DeepLabV3| 视盘分割 | -|segmentation/unet.py | 语义分割UNet | 视盘分割 | -|segmentation/hrnet.py | 语义分割HRNet | 视盘分割 | -|segmentation/fast_scnn.py | 语义分割FastSCNN | 视盘分割 | - - -## 
开始训练 -在安装PaddleX后,使用如下命令开始训练 -``` -python classification/mobilenetv2.py -``` diff --git a/new_tutorials/train/classification/mobilenetv2.py b/new_tutorials/train/classification/mobilenetv2.py deleted file mode 100644 index 9a075526a3cbb7e560c133f08faef68ea5a07121..0000000000000000000000000000000000000000 --- a/new_tutorials/train/classification/mobilenetv2.py +++ /dev/null @@ -1,47 +0,0 @@ -import os -# 选择使用0号卡 -os.environ['CUDA_VISIBLE_DEVICES'] = '0' - -from paddlex.cls import transforms -import paddlex as pdx - -# 下载和解压蔬菜分类数据集 -veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz' -pdx.utils.download_and_decompress(veg_dataset, path='./') - -# 定义训练和验证时的transforms -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/cls_transforms.html#composedclstransforms -train_transforms = transforms.ComposedClsTransforms(mode='train', crop_size=[224, 224]) -eval_transforms = transforms.ComposedClsTransforms(mode='eval', crop_size=[224, 224]) - -# 定义训练和验证所用的数据集 -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/classification.html#imagenet -train_dataset = pdx.datasets.ImageNet( - data_dir='vegetables_cls', - file_list='vegetables_cls/train_list.txt', - label_list='vegetables_cls/labels.txt', - transforms=train_transforms, - shuffle=True) -eval_dataset = pdx.datasets.ImageNet( - data_dir='vegetables_cls', - file_list='vegetables_cls/val_list.txt', - label_list='vegetables_cls/labels.txt', - transforms=eval_transforms) - -# 初始化模型,并进行训练 -# 可使用VisualDL查看训练指标 -# VisualDL启动方式: visualdl --logdir output/mobilenetv2/vdl_log --port 8001 -# 浏览器打开 https://0.0.0.0:8001即可 -# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP - -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/classification.html#resnet50 -model = pdx.cls.MobileNetV2(num_classes=len(train_dataset.labels)) -model.train( - num_epochs=10, - train_dataset=train_dataset, - train_batch_size=32, - eval_dataset=eval_dataset, - lr_decay_epochs=[4, 6, 8], - learning_rate=0.025, - save_dir='output/mobilenetv2', - use_vdl=True) diff --git a/new_tutorials/train/classification/resnet50.py b/new_tutorials/train/classification/resnet50.py deleted file mode 100644 index bf56a605f1c3376057c1ab9283fa1251491b2750..0000000000000000000000000000000000000000 --- a/new_tutorials/train/classification/resnet50.py +++ /dev/null @@ -1,56 +0,0 @@ -import os -# 选择使用0号卡 -os.environ['CUDA_VISIBLE_DEVICES'] = '0' - -import paddle.fluid as fluid -from paddlex.cls import transforms -import paddlex as pdx - -# 下载和解压蔬菜分类数据集 -veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz' -pdx.utils.download_and_decompress(veg_dataset, path='./') - -# 定义训练和验证时的transforms -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/cls_transforms.html#composedclstransforms -train_transforms = transforms.ComposedClsTransforms(mode='train', crop_size=[224, 224]) -eval_transforms = transforms.ComposedClsTransforms(mode='eval', crop_size=[224, 224]) - -# 定义训练和验证所用的数据集 -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/classification.html#imagenet -train_dataset = pdx.datasets.ImageNet( - data_dir='vegetables_cls', - file_list='vegetables_cls/train_list.txt', - label_list='vegetables_cls/labels.txt', - transforms=train_transforms, - shuffle=True) -eval_dataset = pdx.datasets.ImageNet( - data_dir='vegetables_cls', - file_list='vegetables_cls/val_list.txt', - label_list='vegetables_cls/labels.txt', - transforms=eval_transforms) - -# PaddleX支持自定义构建优化器 -step_each_epoch = train_dataset.num_samples // 32 
-learning_rate = fluid.layers.cosine_decay( - learning_rate=0.025, step_each_epoch=step_each_epoch, epochs=10) -optimizer = fluid.optimizer.Momentum( - learning_rate=learning_rate, - momentum=0.9, - regularization=fluid.regularizer.L2Decay(4e-5)) - -# 初始化模型,并进行训练 -# 可使用VisualDL查看训练指标 -# VisualDL启动方式: visualdl --logdir output/resnet50/vdl_log --port 8001 -# 浏览器打开 https://0.0.0.0:8001即可 -# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP - -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/classification.html#resnet50 -model = pdx.cls.ResNet50(num_classes=len(train_dataset.labels)) -model.train( - num_epochs=10, - train_dataset=train_dataset, - train_batch_size=32, - eval_dataset=eval_dataset, - optimizer=optimizer, - save_dir='output/resnet50', - use_vdl=True) diff --git a/new_tutorials/train/detection/faster_rcnn_r50_fpn.py b/new_tutorials/train/detection/faster_rcnn_r50_fpn.py deleted file mode 100644 index a64b711c3af48cb85cfd8a82938785ca386a99ec..0000000000000000000000000000000000000000 --- a/new_tutorials/train/detection/faster_rcnn_r50_fpn.py +++ /dev/null @@ -1,49 +0,0 @@ -import os -# 选择使用0号卡 -os.environ['CUDA_VISIBLE_DEVICES'] = '0' - -from paddlex.det import transforms -import paddlex as pdx - -# 下载和解压昆虫检测数据集 -insect_dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz' -pdx.utils.download_and_decompress(insect_dataset, path='./') - -# 定义训练和验证时的transforms -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/det_transforms.html#composedrcnntransforms -train_transforms = transforms.ComposedRCNNTransforms(mode='train', min_max_size=[800, 1333]) -eval_transforms = transforms.ComposedRCNNTransforms(mode='eval', min_max_size=[800, 1333]) - -# 定义训练和验证所用的数据集 -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/detection.html#vocdetection -train_dataset = pdx.datasets.VOCDetection( - data_dir='insect_det', - file_list='insect_det/train_list.txt', - label_list='insect_det/labels.txt', - transforms=train_transforms, - shuffle=True) -eval_dataset = pdx.datasets.VOCDetection( - data_dir='insect_det', - file_list='insect_det/val_list.txt', - label_list='insect_det/labels.txt', - transforms=eval_transforms) - -# 初始化模型,并进行训练 -# 可使用VisualDL查看训练指标 -# VisualDL启动方式: visualdl --logdir output/faster_rcnn_r50_fpn/vdl_log --port 8001 -# 浏览器打开 https://0.0.0.0:8001即可 -# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP -# num_classes 需要设置为包含背景类的类别数,即: 目标类别数量 + 1 - -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/detection.html#fasterrcnn -num_classes = len(train_dataset.labels) + 1 -model = pdx.det.FasterRCNN(num_classes=num_classes) -model.train( - num_epochs=12, - train_dataset=train_dataset, - train_batch_size=2, - eval_dataset=eval_dataset, - learning_rate=0.0025, - lr_decay_epochs=[8, 11], - save_dir='output/faster_rcnn_r50_fpn', - use_vdl=True) diff --git a/new_tutorials/train/detection/mask_rcnn_r50_fpn.py b/new_tutorials/train/detection/mask_rcnn_r50_fpn.py deleted file mode 100644 index f2ebf6e20f18054bf16452eb6e60b9ea24f20748..0000000000000000000000000000000000000000 --- a/new_tutorials/train/detection/mask_rcnn_r50_fpn.py +++ /dev/null @@ -1,48 +0,0 @@ -import os -# 选择使用0号卡 -os.environ['CUDA_VISIBLE_DEVICES'] = '0' - -from paddlex.det import transforms -import paddlex as pdx - -# 下载和解压小度熊分拣数据集 -xiaoduxiong_dataset = 'https://bj.bcebos.com/paddlex/datasets/xiaoduxiong_ins_det.tar.gz' -pdx.utils.download_and_decompress(xiaoduxiong_dataset, path='./') - -# 定义训练和验证时的transforms -# API说明: 
https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/det_transforms.html#composedrcnntransforms -train_transforms = transforms.ComposedRCNNTransforms(mode='train', min_max_size=[800, 1333]) -eval_transforms = transforms.ComposedRCNNTransforms(mode='eval', min_max_size=[800, 1333]) - -# 定义训练和验证所用的数据集 -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/detection.html#cocodetection -train_dataset = pdx.datasets.CocoDetection( - data_dir='xiaoduxiong_ins_det/JPEGImages', - ann_file='xiaoduxiong_ins_det/train.json', - transforms=train_transforms, - shuffle=True) -eval_dataset = pdx.datasets.CocoDetection( - data_dir='xiaoduxiong_ins_det/JPEGImages', - ann_file='xiaoduxiong_ins_det/val.json', - transforms=eval_transforms) - -# 初始化模型,并进行训练 -# 可使用VisualDL查看训练指标 -# VisualDL启动方式: visualdl --logdir output/mask_rcnn_r50_fpn/vdl_log --port 8001 -# 浏览器打开 https://0.0.0.0:8001即可 -# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP -# num_classes 需要设置为包含背景类的类别数,即: 目标类别数量 + 1 - -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/instance_segmentation.html#maskrcnn -num_classes = len(train_dataset.labels) + 1 -model = pdx.det.MaskRCNN(num_classes=num_classes) -model.train( - num_epochs=12, - train_dataset=train_dataset, - train_batch_size=1, - eval_dataset=eval_dataset, - learning_rate=0.00125, - warmup_steps=10, - lr_decay_epochs=[8, 11], - save_dir='output/mask_rcnn_r50_fpn', - use_vdl=True) diff --git a/new_tutorials/train/detection/yolov3_darknet53.py b/new_tutorials/train/detection/yolov3_darknet53.py deleted file mode 100644 index 8027a506458aac94de82a915aa8b058d71ba97f7..0000000000000000000000000000000000000000 --- a/new_tutorials/train/detection/yolov3_darknet53.py +++ /dev/null @@ -1,48 +0,0 @@ -import os -# 选择使用0号卡 -os.environ['CUDA_VISIBLE_DEVICES'] = '0' - -from paddlex.det import transforms -import paddlex as pdx - -# 下载和解压昆虫检测数据集 -insect_dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz' -pdx.utils.download_and_decompress(insect_dataset, path='./') - -# 定义训练和验证时的transforms -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/det_transforms.html#composedyolotransforms -train_transforms = transforms.ComposedYOLOv3Transforms(mode='train', shape=[608, 608]) -eval_transforms = transforms.ComposedYOLOv3Transforms(mode='eva', shape=[608, 608]) - -# 定义训练和验证所用的数据集 -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/detection.html#vocdetection -train_dataset = pdx.datasets.VOCDetection( - data_dir='insect_det', - file_list='insect_det/train_list.txt', - label_list='insect_det/labels.txt', - transforms=train_transforms, - shuffle=True) -eval_dataset = pdx.datasets.VOCDetection( - data_dir='insect_det', - file_list='insect_det/val_list.txt', - label_list='insect_det/labels.txt', - transforms=eval_transforms) - -# 初始化模型,并进行训练 -# 可使用VisualDL查看训练指标 -# VisualDL启动方式: visualdl --logdir output/yolov3_darknet/vdl_log --port 8001 -# 浏览器打开 https://0.0.0.0:8001即可 -# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP - -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/detection.html#yolov3 -num_classes = len(train_dataset.labels) -model = pdx.det.YOLOv3(num_classes=num_classes, backbone='DarkNet53') -model.train( - num_epochs=270, - train_dataset=train_dataset, - train_batch_size=8, - eval_dataset=eval_dataset, - learning_rate=0.000125, - lr_decay_epochs=[210, 240], - save_dir='output/yolov3_darknet53', - use_vdl=True) diff --git a/new_tutorials/train/segmentation/deeplabv3p.py b/new_tutorials/train/segmentation/deeplabv3p.py deleted file mode 100644 
index cb18fcfad65331d02b04abe3c3a76fa0356fb5b8..0000000000000000000000000000000000000000 --- a/new_tutorials/train/segmentation/deeplabv3p.py +++ /dev/null @@ -1,51 +0,0 @@ -import os -# 选择使用0号卡 -os.environ['CUDA_VISIBLE_DEVICES'] = '0' - -import paddlex as pdx -from paddlex.seg import transforms - -# 下载和解压视盘分割数据集 -optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz' -pdx.utils.download_and_decompress(optic_dataset, path='./') - -# 定义训练和验证时的transforms -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/seg_transforms.html#composedsegtransforms -train_transforms = transforms.ComposedSegTransforms(mode='train', train_crop_size=[769, 769]) -eval_transforms = transforms.ComposedSegTransforms(mode='eval') - -train_transforms.add_augmenters([ - transforms.RandomRotate() -]) - -# 定义训练和验证所用的数据集 -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/semantic_segmentation.html#segdataset -train_dataset = pdx.datasets.SegDataset( - data_dir='optic_disc_seg', - file_list='optic_disc_seg/train_list.txt', - label_list='optic_disc_seg/labels.txt', - transforms=train_transforms, - shuffle=True) -eval_dataset = pdx.datasets.SegDataset( - data_dir='optic_disc_seg', - file_list='optic_disc_seg/val_list.txt', - label_list='optic_disc_seg/labels.txt', - transforms=eval_transforms) - -# 初始化模型,并进行训练 -# 可使用VisualDL查看训练指标 -# VisualDL启动方式: visualdl --logdir output/deeplab/vdl_log --port 8001 -# 浏览器打开 https://0.0.0.0:8001即可 -# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP - -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/semantic_segmentation.html#deeplabv3p -num_classes = len(train_dataset.labels) -model = pdx.seg.DeepLabv3p(num_classes=num_classes) -model.train( - num_epochs=40, - train_dataset=train_dataset, - train_batch_size=4, - eval_dataset=eval_dataset, - learning_rate=0.01, - save_dir='output/deeplab', - use_vdl=True) diff --git a/new_tutorials/train/segmentation/hrnet.py b/new_tutorials/train/segmentation/hrnet.py deleted file mode 100644 index 98fdd1b925bd4707001fdad56b3ffdc6bb2b58ae..0000000000000000000000000000000000000000 --- a/new_tutorials/train/segmentation/hrnet.py +++ /dev/null @@ -1,47 +0,0 @@ -import os -# 选择使用0号卡 -os.environ['CUDA_VISIBLE_DEVICES'] = '0' - -import paddlex as pdx -from paddlex.seg import transforms - -# 下载和解压视盘分割数据集 -optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz' -pdx.utils.download_and_decompress(optic_dataset, path='./') - -# 定义训练和验证时的transforms -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/seg_transforms.html#composedsegtransforms -train_transforms = transforms.ComposedSegTransforms(mode='train', train_crop_size=[769, 769]) -eval_transforms = transforms.ComposedSegTransforms(mode='eval') - -# 定义训练和验证所用的数据集 -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/semantic_segmentation.html#segdataset -train_dataset = pdx.datasets.SegDataset( - data_dir='optic_disc_seg', - file_list='optic_disc_seg/train_list.txt', - label_list='optic_disc_seg/labels.txt', - transforms=train_transforms, - shuffle=True) -eval_dataset = pdx.datasets.SegDataset( - data_dir='optic_disc_seg', - file_list='optic_disc_seg/val_list.txt', - label_list='optic_disc_seg/labels.txt', - transforms=eval_transforms) - -# 初始化模型,并进行训练 -# 可使用VisualDL查看训练指标 -# VisualDL启动方式: visualdl --logdir output/unet/vdl_log --port 8001 -# 浏览器打开 https://0.0.0.0:8001即可 -# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP - -# https://paddlex.readthedocs.io/zh_CN/latest/apis/models/semantic_segmentation.html#hrnet 
-num_classes = len(train_dataset.labels) -model = pdx.seg.HRNet(num_classes=num_classes) -model.train( - num_epochs=20, - train_dataset=train_dataset, - train_batch_size=4, - eval_dataset=eval_dataset, - learning_rate=0.01, - save_dir='output/hrnet', - use_vdl=True) diff --git a/new_tutorials/train/segmentation/unet.py b/new_tutorials/train/segmentation/unet.py deleted file mode 100644 index ddf4f7991a690b0d0d506967df0c140f60945e85..0000000000000000000000000000000000000000 --- a/new_tutorials/train/segmentation/unet.py +++ /dev/null @@ -1,47 +0,0 @@ -import os -# 选择使用0号卡 -os.environ['CUDA_VISIBLE_DEVICES'] = '0' - -import paddlex as pdx -from paddlex.seg import transforms - -# 下载和解压视盘分割数据集 -optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz' -pdx.utils.download_and_decompress(optic_dataset, path='./') - -# 定义训练和验证时的transforms -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/seg_transforms.html#composedsegtransforms -train_transforms = transforms.ComposedSegTransforms(mode='train', train_crop_size=[769, 769]) -eval_transforms = transforms.ComposedSegTransforms(mode='eval') - -# 定义训练和验证所用的数据集 -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/semantic_segmentation.html#segdataset -train_dataset = pdx.datasets.SegDataset( - data_dir='optic_disc_seg', - file_list='optic_disc_seg/train_list.txt', - label_list='optic_disc_seg/labels.txt', - transforms=train_transforms, - shuffle=True) -eval_dataset = pdx.datasets.SegDataset( - data_dir='optic_disc_seg', - file_list='optic_disc_seg/val_list.txt', - label_list='optic_disc_seg/labels.txt', - transforms=eval_transforms) - -# 初始化模型,并进行训练 -# 可使用VisualDL查看训练指标 -# VisualDL启动方式: visualdl --logdir output/unet/vdl_log --port 8001 -# 浏览器打开 https://0.0.0.0:8001即可 -# 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP - -# API说明: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/semantic_segmentation.html#unet -num_classes = len(train_dataset.labels) -model = pdx.seg.UNet(num_classes=num_classes) -model.train( - num_epochs=20, - train_dataset=train_dataset, - train_batch_size=4, - eval_dataset=eval_dataset, - learning_rate=0.01, - save_dir='output/unet', - use_vdl=True) diff --git a/paddlex/__init__.py b/paddlex/__init__.py index b80363f2e6adfdbd6ce712cfec486540753abbb7..6fc8aff1d3fdbc08a7474627bf38f2af17599fb3 100644 --- a/paddlex/__init__.py +++ b/paddlex/__init__.py @@ -53,4 +53,4 @@ log_level = 2 from . import interpret -__version__ = '1.0.6' +__version__ = '1.0.7' diff --git a/paddlex/command.py b/paddlex/command.py index 8198291180b92a061dd633eae863f8ddb17727cb..612bc5f3f2b2c3bbec23f56c2983a722d76e21fc 100644 --- a/paddlex/command.py +++ b/paddlex/command.py @@ -1,11 +1,11 @@ # Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. -# +# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at -# +# # http://www.apache.org/licenses/LICENSE-2.0 -# +# # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
@@ -15,6 +15,7 @@
 from six import text_type as _text_type
 import argparse
 import sys
+import paddlex.utils.logging as logging
 
 
 def arg_parser():
@@ -94,15 +95,15 @@ def main():
     if args.export_onnx:
         assert args.model_dir is not None, "--model_dir should be defined while exporting onnx model"
         assert args.save_dir is not None, "--save_dir should be defined to create onnx model"
-        assert args.fixed_input_shape is not None, "--fixed_input_shape should be defined [w,h] to create onnx model, such as [224,224]"
-        fixed_input_shape = []
-        if args.fixed_input_shape is not None:
-            fixed_input_shape = eval(args.fixed_input_shape)
-            assert len(
-                fixed_input_shape
-            ) == 2, "len of fixed input shape must == 2, such as [224,224]"
-        model = pdx.load_model(args.model_dir, fixed_input_shape)
+        model = pdx.load_model(args.model_dir)
+        if model.status == "Normal" or model.status == "Prune":
+            logging.error(
+                "Only exported inference models can be converted to ONNX; export the model first as below,",
+                exit=False)
+            logging.error(
+                "paddlex --export_inference --model_dir model_path --save_dir infer_model"
+            )
 
         pdx.convertor.export_onnx_model(model, args.save_dir)
 
diff --git a/paddlex/convertor.py b/paddlex/convertor.py
index a6888ae1ef9bd764d213125142d355e7e2ca2428..47fc8a82be5ac337206eb0c9dc395aecb862299e 100644
--- a/paddlex/convertor.py
+++ b/paddlex/convertor.py
@@ -1,11 +1,11 @@
 # Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-# 
+#
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
-# 
+#
 # http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -30,119 +30,17 @@ def export_onnx(model_dir, save_dir, fixed_input_shape):
 
 
 def export_onnx_model(model, save_dir):
-    support_list = [
-        'ResNet18', 'ResNet34', 'ResNet50', 'ResNet101', 'ResNet50_vd',
-        'ResNet101_vd', 'ResNet50_vd_ssld', 'ResNet101_vd_ssld', 'DarkNet53',
-        'MobileNetV1', 'MobileNetV2', 'DenseNet121', 'DenseNet161',
-        'DenseNet201'
-    ]
-    if model.__class__.__name__ not in support_list:
-        raise Exception("Model: {} unsupport export to ONNX".format(
-            model.__class__.__name__))
-    try:
-        from fluid.utils import op_io_info, init_name_prefix
-        from onnx import helper, checker
-        import fluid_onnx.ops as ops
-        from fluid_onnx.variables import paddle_variable_to_onnx_tensor, paddle_onnx_weight
-        from debug.model_check import debug_model, Tracker
-    except Exception as e:
+    if model.model_type == "detector" or model.__class__.__name__ == "FastSCNN":
         logging.error(
-            "Import Module Failed! Please install paddle2onnx. Related requirements see https://github.com/PaddlePaddle/paddle2onnx."
+            "Only image classifier models and semantic segmentation models (except FastSCNN) can be exported to ONNX"
         )
-        raise e
-    place = fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    inference_scope = fluid.global_scope()
-    with fluid.scope_guard(inference_scope):
-        test_input_names = [
-            var.name for var in list(model.test_inputs.values())
-        ]
-        inputs_outputs_list = ["fetch", "feed"]
-        weights, weights_value_info = [], []
-        global_block = model.test_prog.global_block()
-        for var_name in global_block.vars:
-            var = global_block.var(var_name)
-            if var_name not in test_input_names\
-                    and var.persistable:
-                weight, val_info = paddle_onnx_weight(
-                    var=var, scope=inference_scope)
-                weights.append(weight)
-                weights_value_info.append(val_info)
-
-        # Create inputs
-        inputs = [
-            paddle_variable_to_onnx_tensor(v, global_block)
-            for v in test_input_names
-        ]
-        logging.INFO("load the model parameter done.")
-        onnx_nodes = []
-        op_check_list = []
-        op_trackers = []
-        nms_first_index = -1
-        nms_outputs = []
-        for block in model.test_prog.blocks:
-            for op in block.ops:
-                if op.type in ops.node_maker:
-                    # TODO: deal with the corner case that vars in
-                    # different blocks have the same name
-                    node_proto = ops.node_maker[str(op.type)](
-                        operator=op, block=block)
-                    op_outputs = []
-                    last_node = None
-                    if isinstance(node_proto, tuple):
-                        onnx_nodes.extend(list(node_proto))
-                        last_node = list(node_proto)
-                    else:
-                        onnx_nodes.append(node_proto)
-                        last_node = [node_proto]
-                    tracker = Tracker(str(op.type), last_node)
-                    op_trackers.append(tracker)
-                    op_check_list.append(str(op.type))
-                    if op.type == "multiclass_nms" and nms_first_index < 0:
-                        nms_first_index = 0
-                    if nms_first_index >= 0:
-                        _, _, output_op = op_io_info(op)
-                        for output in output_op:
-                            nms_outputs.extend(output_op[output])
-                else:
-                    if op.type not in ['feed', 'fetch']:
-                        op_check_list.append(op.type)
-        logging.info('The operator sets to run test case.')
-        logging.info(set(op_check_list))
-
-        # Create outputs
-        # Get the new names for outputs if they've been renamed in nodes' making
-        renamed_outputs = op_io_info.get_all_renamed_outputs()
-        test_outputs = list(model.test_outputs.values())
-        test_outputs_names = [var.name for var in model.test_outputs.values()]
-        test_outputs_names = [
-            name if name not in renamed_outputs else renamed_outputs[name]
-            for name in test_outputs_names
-        ]
-        outputs = [
-            paddle_variable_to_onnx_tensor(v, global_block)
-            for v in test_outputs_names
-        ]
-
-        # Make graph
-        onnx_name = 'paddlex.onnx'
-        onnx_graph = helper.make_graph(
-            nodes=onnx_nodes,
-            name=onnx_name,
-            initializer=weights,
-            inputs=inputs + weights_value_info,
-            outputs=outputs)
-
-        # Make model
-        onnx_model = helper.make_model(
-            onnx_graph, producer_name='PaddlePaddle')
-
-        # Model check
-        checker.check_model(onnx_model)
-        if onnx_model is not None:
-            onnx_model_file = os.path.join(save_dir, onnx_name)
-            if not os.path.exists(save_dir):
-                os.mkdir(save_dir)
-            with open(onnx_model_file, 'wb') as f:
-                f.write(onnx_model.SerializeToString())
-            logging.info("Saved converted model to path: %s" % onnx_model_file)
+    try:
+        import x2paddle
+        if x2paddle.__version__ < '0.7.4':
+            logging.error("You need to upgrade x2paddle >= 0.7.4")
+    except ImportError:
+        logging.error(
+            "You need to install x2paddle first, pip install x2paddle>=0.7.4")
+    from x2paddle.op_mapper.paddle_op_mapper import PaddleOpMapper
+    mapper = PaddleOpMapper()
+    mapper.convert(model.test_prog, save_dir)
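For context (not part of the diff): with this change the ONNX path expects an exported inference model, so a typical end-to-end flow looks roughly like the following (paths are placeholders).

```python
import paddlex as pdx

# 1) export an inference model from a training snapshot, e.g. via
#    paddlex --export_inference --model_dir output/best_model --save_dir infer_model
# 2) convert the inference model to ONNX via x2paddle (>= 0.7.4)
model = pdx.load_model('./infer_model')
pdx.convertor.export_onnx_model(model, './onnx_model')
```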
diff --git a/paddlex/cv/datasets/coco.py b/paddlex/cv/datasets/coco.py
index 97e791be5ed3cac1656fba4429d90f1653bfe1be..264b2da1e6a6aa9e15bf8a2ae9b3fbdc3ee75f1b 100644
--- a/paddlex/cv/datasets/coco.py
+++ b/paddlex/cv/datasets/coco.py
@@ -100,7 +100,7 @@ class CocoDetection(VOCDetection):
         gt_score = np.ones((num_bbox, 1), dtype=np.float32)
         is_crowd = np.zeros((num_bbox, 1), dtype=np.int32)
         difficult = np.zeros((num_bbox, 1), dtype=np.int32)
-        gt_poly = None
+        gt_poly = [None] * num_bbox
 
         for i, box in enumerate(bboxes):
             catid = box['category_id']
@@ -108,8 +108,6 @@ class CocoDetection(VOCDetection):
             gt_bbox[i, :] = box['clean_bbox']
             is_crowd[i][0] = box['iscrowd']
             if 'segmentation' in box:
-                if gt_poly is None:
-                    gt_poly = [None] * num_bbox
                 gt_poly[i] = box['segmentation']
 
         im_info = {
@@ -121,10 +119,9 @@ class CocoDetection(VOCDetection):
             'gt_class': gt_class,
             'gt_bbox': gt_bbox,
             'gt_score': gt_score,
+            'gt_poly': gt_poly,
             'difficult': difficult
         }
-        if gt_poly is not None:
-            label_info['gt_poly'] = gt_poly
 
         coco_rec = (im_info, label_info)
         self.file_list.append([im_fname, coco_rec])
diff --git a/paddlex/cv/datasets/easydata_cls.py b/paddlex/cv/datasets/easydata_cls.py
index 121ae563308c695a0a76fcf383eb6e6bb7f43011..9b6dddc4843616ff0a09712e6766e3ea9552b466 100644
--- a/paddlex/cv/datasets/easydata_cls.py
+++ b/paddlex/cv/datasets/easydata_cls.py
@@ -39,14 +39,14 @@ class EasyDataCls(ImageNet):
             线程和'process'进程两种方式。默认为'process'(Windows和Mac下会强制使用thread,该参数无效)。
         shuffle (bool): 是否需要对数据集中样本打乱顺序。默认为False。
     """
-    
+
     def __init__(self,
                  data_dir,
                  file_list,
                  label_list,
                  transforms=None,
                  num_workers='auto',
-                 buffer_size=100,
+                 buffer_size=8,
                  parallel_method='process',
                  shuffle=False):
         super(ImageNet, self).__init__(
@@ -58,7 +58,7 @@ class EasyDataCls(ImageNet):
         self.file_list = list()
         self.labels = list()
         self._epoch = 0
-    
+
         with open(label_list, encoding=get_encoding(label_list)) as f:
             for line in f:
                 item = line.strip()
@@ -73,8 +73,8 @@ class EasyDataCls(ImageNet):
                 if not osp.isfile(json_file):
                     continue
                 if not osp.exists(img_file):
-                    raise IOError(
-                        'The image file {} is not exist!'.format(img_file))
+                    raise IOError('The image file {} does not exist!'.format(
+                        img_file))
                 with open(json_file, mode='r', \
                           encoding=get_encoding(json_file)) as j:
                     json_info = json.load(j)
@@ -83,4 +83,3 @@ class EasyDataCls(ImageNet):
         self.num_samples = len(self.file_list)
         logging.info("{} samples in file {}".format(
             len(self.file_list), file_list))
-            
\ No newline at end of file
diff --git a/paddlex/cv/datasets/imagenet.py b/paddlex/cv/datasets/imagenet.py
index 99723d3b8f4ec6f8c0b9297f9fe66c1fbc60693f..0986f823add893c6fb746168f3c2bcfa438f5e10 100644
--- a/paddlex/cv/datasets/imagenet.py
+++ b/paddlex/cv/datasets/imagenet.py
@@ -45,7 +45,7 @@ class ImageNet(Dataset):
                  label_list,
                  transforms=None,
                  num_workers='auto',
-                 buffer_size=100,
+                 buffer_size=8,
                  parallel_method='process',
                  shuffle=False):
         super(ImageNet, self).__init__(
@@ -70,8 +70,8 @@ class ImageNet(Dataset):
                 continue
             full_path = osp.join(data_dir, items[0])
             if not osp.exists(full_path):
-                raise IOError(
-                    'The image file {} is not exist!'.format(full_path))
+                raise IOError('The image file {} does not exist!'.format(
+                    full_path))
             self.file_list.append([full_path, int(items[1])])
         self.num_samples = len(self.file_list)
         logging.info("{} samples in file {}".format(
diff --git a/paddlex/cv/datasets/seg_dataset.py b/paddlex/cv/datasets/seg_dataset.py
index 61697e3d799ccb0ca765410a81e7257741acfb44..6e8bfae1ca623ed90a6d583042627cf4aecb2ea6 100644
--- a/paddlex/cv/datasets/seg_dataset.py
+++ b/paddlex/cv/datasets/seg_dataset.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -28,7 +28,7 @@ class SegDataset(Dataset):
     Args:
        data_dir (str): 数据集所在的目录路径。
        file_list (str): 描述数据集图片文件和对应标注文件的文件路径(文本内每行路径为相对data_dir的相对路)。
-       label_list (str): 描述数据集包含的类别信息文件路径。
+       label_list (str): 描述数据集包含的类别信息文件路径。默认值为None。
        transforms (list): 数据集中每个样本的预处理/增强算子。
        num_workers (int): 数据集中样本在预处理过程中的线程或进程数。默认为4。
        buffer_size (int): 数据集中样本在预处理过程中队列的缓存长度,以样本数为单位。默认为100。
@@ -40,7 +40,7 @@ class SegDataset(Dataset):
     def __init__(self,
                  data_dir,
                  file_list,
-                 label_list,
+                 label_list=None,
                  transforms=None,
                  num_workers='auto',
                  buffer_size=100,
@@ -56,10 +56,11 @@ class SegDataset(Dataset):
         self.labels = list()
         self._epoch = 0
 
-        with open(label_list, encoding=get_encoding(label_list)) as f:
-            for line in f:
-                item = line.strip()
-                self.labels.append(item)
+        if label_list is not None:
+            with open(label_list, encoding=get_encoding(label_list)) as f:
+                for line in f:
+                    item = line.strip()
+                    self.labels.append(item)
 
         with open(file_list, encoding=get_encoding(file_list)) as f:
             for line in f:
@@ -69,8 +70,8 @@ class SegDataset(Dataset):
                 full_path_im = osp.join(data_dir, items[0])
                 full_path_label = osp.join(data_dir, items[1])
                 if not osp.exists(full_path_im):
-                    raise IOError(
-                        'The image file {} is not exist!'.format(full_path_im))
+                    raise IOError('The image file {} does not exist!'.format(
+                        full_path_im))
                 if not osp.exists(full_path_label):
                     raise IOError('The image file {} is not exist!'.format(
                         full_path_label))
diff --git a/paddlex/cv/datasets/voc.py b/paddlex/cv/datasets/voc.py
index 9b2e8528c52d5f2ecd6a041bbf7e86f095ea35ac..b701c56847b6e0da9aace3784c4cb8e76dbbed77 100644
--- a/paddlex/cv/datasets/voc.py
+++ b/paddlex/cv/datasets/voc.py
@@ -17,6 +17,7 @@ import copy
 import os
 import os.path as osp
 import random
+import re
 import numpy as np
 from collections import OrderedDict
 import xml.etree.ElementTree as ET
@@ -104,23 +105,60 @@ class VOCDetection(Dataset):
             else:
                 ct = int(tree.find('id').text)
                 im_id = np.array([int(tree.find('id').text)])
-
-            objs = tree.findall('object')
-            im_w = float(tree.find('size').find('width').text)
-            im_h = float(tree.find('size').find('height').text)
+            # recover the actual tag spellings case-insensitively, so that
+            # annotation files using e.g. <Object> or <Size> still parse
+            pattern = re.compile('<object>', re.IGNORECASE)
+            obj_tag = pattern.findall(
+                str(ET.tostringlist(tree.getroot())))[0][1:-1]
+            objs = tree.findall(obj_tag)
+            pattern = re.compile('<size>', re.IGNORECASE)
+            size_tag = pattern.findall(
+                str(ET.tostringlist(tree.getroot())))[0][1:-1]
+            size_element = tree.find(size_tag)
+            pattern = re.compile('<width>', re.IGNORECASE)
+            width_tag = pattern.findall(
+                str(ET.tostringlist(size_element)))[0][1:-1]
+            im_w = float(size_element.find(width_tag).text)
+            pattern = re.compile('<height>', re.IGNORECASE)
+            height_tag = pattern.findall(
+                str(ET.tostringlist(size_element)))[0][1:-1]
+            im_h = float(size_element.find(height_tag).text)
             gt_bbox = np.zeros((len(objs), 4), dtype=np.float32)
             gt_class = np.zeros((len(objs), 1), dtype=np.int32)
             gt_score = np.ones((len(objs), 1), dtype=np.float32)
             is_crowd = np.zeros((len(objs), 1), dtype=np.int32)
             difficult = np.zeros((len(objs), 1), dtype=np.int32)
             for i, obj in enumerate(objs):
-                cname = obj.find('name').text.strip()
+                pattern = re.compile('<name>', re.IGNORECASE)
+                name_tag = pattern.findall(str(ET.tostringlist(obj)))[0][1:-1]
+                cname = obj.find(name_tag).text.strip()
                 gt_class[i][0] = cname2cid[cname]
-                _difficult = int(obj.find('difficult').text)
-                x1 = float(obj.find('bndbox').find('xmin').text)
-                y1 = float(obj.find('bndbox').find('ymin').text)
-                x2 = float(obj.find('bndbox').find('xmax').text)
-                y2 = float(obj.find('bndbox').find('ymax').text)
+                pattern = re.compile('<difficult>', re.IGNORECASE)
+                diff_tag = pattern.findall(str(ET.tostringlist(obj)))[0][1:-1]
+                try:
+                    _difficult = int(obj.find(diff_tag).text)
+                except Exception:
+                    _difficult = 0
+                pattern = re.compile('<bndbox>', re.IGNORECASE)
+                box_tag = pattern.findall(str(ET.tostringlist(obj)))[0][1:-1]
+                box_element = obj.find(box_tag)
+                pattern = re.compile('<xmin>', re.IGNORECASE)
+                xmin_tag = pattern.findall(
+                    str(ET.tostringlist(box_element)))[0][1:-1]
+                x1 = float(box_element.find(xmin_tag).text)
+                pattern = re.compile('<ymin>', re.IGNORECASE)
+                ymin_tag = pattern.findall(
+                    str(ET.tostringlist(box_element)))[0][1:-1]
+                y1 = float(box_element.find(ymin_tag).text)
+                pattern = re.compile('<xmax>', re.IGNORECASE)
+                xmax_tag = pattern.findall(
+                    str(ET.tostringlist(box_element)))[0][1:-1]
+                x2 = float(box_element.find(xmax_tag).text)
+                pattern = re.compile('<ymax>', re.IGNORECASE)
+                ymax_tag = pattern.findall(
+                    str(ET.tostringlist(box_element)))[0][1:-1]
+                y2 = float(box_element.find(ymax_tag).text)
                 x1 = max(0, x1)
                 y1 = max(0, y1)
                 if im_w > 0.5 and im_h > 0.5:
@@ -149,6 +187,7 @@ class VOCDetection(Dataset):
             'gt_class': gt_class,
             'gt_bbox': gt_bbox,
             'gt_score': gt_score,
+            'gt_poly': [],
             'difficult': difficult
         }
         voc_rec = (im_info, label_info)
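For context (not part of the diff): a self-contained illustration of the tag-normalization trick above. Some VOC files in the wild capitalize tags (e.g. `<Object>`, `<BndBox>`), so the loader recovers the actual tag spelling case-insensitively before calling `find()`/`findall()`; the `[1:-1]` slice strips the angle brackets from the matched tag.

```python
import re
import xml.etree.ElementTree as ET

xml_str = "<annotation><Object><Name>person</Name></Object></annotation>"
root = ET.fromstring(xml_str)

pattern = re.compile('<object>', re.IGNORECASE)
obj_tag = pattern.findall(str(ET.tostringlist(root)))[0][1:-1]
print(obj_tag)            # -> 'Object'
print(root.findall(obj_tag))  # matches <Object> despite the capital O
```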
diff --git a/paddlex/cv/models/hrnet.py b/paddlex/cv/models/hrnet.py
index 3a000feee5fe6a2b6a93662e1dc65754d6e1cd68..d3af363ceac925d40552da22360759553c0090f7 100644
--- a/paddlex/cv/models/hrnet.py
+++ b/paddlex/cv/models/hrnet.py
@@ -1,11 +1,11 @@
 # copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-# 
+#
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
-# 
+#
 # http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -24,11 +24,12 @@ class HRNet(DeepLabv3p):
 
     Args:
         num_classes (int): 类别数。
-        width (int): 高分辨率分支中特征层的通道数量。默认值为18。可选择取值为[18, 30, 32, 40, 44, 48, 60, 64]。
+        width (int|str): 高分辨率分支中特征层的通道数量。默认值为18。可选择取值为[18, 30, 32, 40, 44, 48, 60, 64, '18_small_v1']。
+            '18_small_v1'是18的轻量级版本。
         use_bce_loss (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。默认False。
         use_dice_loss (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用。
             当use_bce_loss和use_dice_loss都为False时,使用交叉熵损失函数。默认False。
-        class_weight (list/str): 交叉熵损失函数各类损失的权重。当class_weight为list的时候,长度应为
+        class_weight (list|str): 交叉熵损失函数各类损失的权重。当class_weight为list的时候,长度应为
             num_classes。当class_weight为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重
             自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None时,各类的权重为1,
             即平时使用的交叉熵损失函数。
@@ -168,6 +169,6 @@ class HRNet(DeepLabv3p):
         return super(HRNet, self).train(
             num_epochs, train_dataset, train_batch_size, eval_dataset,
             save_interval_epochs, log_interval_steps, save_dir,
-            pretrain_weights, optimizer, learning_rate, lr_decay_power, use_vdl,
-            sensitivities_file, eval_metric_loss, early_stop,
+            pretrain_weights, optimizer, learning_rate, lr_decay_power,
+            use_vdl, sensitivities_file, eval_metric_loss, early_stop,
             early_stop_patience, resume_checkpoint)
diff --git a/paddlex/cv/models/load_model.py b/paddlex/cv/models/load_model.py
index 87b30ac47c206f0b3723ffcf353d95078feeb892..2d24abf4c75f1ff4b503138b3e18341da0b665c5 100644
--- a/paddlex/cv/models/load_model.py
+++ b/paddlex/cv/models/load_model.py
@@ -108,6 +108,7 @@ def load_model(model_dir, fixed_input_shape=None):
 
     logging.info("Model[{}] loaded.".format(info['Model']))
     model.trainable = False
+    model.status = status
     return model
 
diff --git a/paddlex/cv/models/slim/prune.py b/paddlex/cv/models/slim/prune.py
index ad4dec23b8e3b29eda30fa873f4baa625a004884..f1e5f98a23c0d352bbf00dbb6b9b8fb60655fed3 100644
--- a/paddlex/cv/models/slim/prune.py
+++ b/paddlex/cv/models/slim/prune.py
@@ -158,6 +158,7 @@ def prune_program(model, prune_params_ratios=None):
         prune_params_ratios (dict): 由裁剪参数名和裁剪率组成的字典,当为None时
             使用默认裁剪参数名和裁剪率。默认为None。
     """
+    assert model.status == 'Normal', 'Only the models saved while training are supported!'
     place = model.places[0]
     train_prog = model.train_prog
     eval_prog = model.test_prog
@@ -235,6 +236,7 @@ def cal_params_sensitivities(model, save_file, eval_dataset, batch_size=8):
         其中``weight_0``是卷积Kernel名;``sensitivities['weight_0']``是一个字典,key是裁剪率,value是敏感度。
     """
+    assert model.status == 'Normal', 'Only the models saved while training are supported!'
if os.path.exists(save_file): os.remove(save_file) diff --git a/paddlex/cv/models/slim/prune_config.py b/paddlex/cv/models/slim/prune_config.py index 49430e9bfb1dcc47fb93aa9fc7d05ceb21e2b9e8..4ca4215cd31dcf47bed7d3ae25c9ccae3c9a3dc8 100644 --- a/paddlex/cv/models/slim/prune_config.py +++ b/paddlex/cv/models/slim/prune_config.py @@ -19,6 +19,8 @@ import paddle.fluid as fluid import paddlex sensitivities_data = { + 'AlexNet': + 'https://bj.bcebos.com/paddlex/slim_prune/alexnet_sensitivities.data', 'ResNet18': 'https://bj.bcebos.com/paddlex/slim_prune/resnet18.sensitivities', 'ResNet34': @@ -41,6 +43,10 @@ sensitivities_data = { 'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_large.sensitivities', 'MobileNetV3_small': 'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_small.sensitivities', + 'MobileNetV3_large_ssld': + 'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_large_ssld_sensitivities.data', + 'MobileNetV3_small_ssld': + 'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_small_ssld_sensitivities.data', 'DenseNet121': 'https://bj.bcebos.com/paddlex/slim_prune/densenet121.sensitivities', 'DenseNet161': @@ -51,6 +57,8 @@ sensitivities_data = { 'https://bj.bcebos.com/paddlex/slim_prune/xception41.sensitivities', 'Xception65': 'https://bj.bcebos.com/paddlex/slim_prune/xception65.sensitivities', + 'ShuffleNetV2': + 'https://bj.bcebos.com/paddlex/slim_prune/shufflenetv2_sensitivities.data', 'YOLOv3_MobileNetV1': 'https://bj.bcebos.com/paddlex/slim_prune/yolov3_mobilenetv1.sensitivities', 'YOLOv3_MobileNetV3_large': @@ -143,7 +151,8 @@ def get_prune_params(model): if model_type.startswith('ResNet') or \ model_type.startswith('DenseNet') or \ model_type.startswith('DarkNet') or \ - model_type.startswith('AlexNet'): + model_type.startswith('AlexNet') or \ + model_type.startswith('ShuffleNetV2'): for block in program.blocks: for param in block.all_parameters(): pd_var = fluid.global_scope().find_var(param.name) @@ -152,6 +161,28 @@ def get_prune_params(model): prune_names.append(param.name) if model_type == 'AlexNet': prune_names.remove('conv5_weights') + if model_type == 'ShuffleNetV2': + not_prune_names = ['stage_2_1_conv5_weights', + 'stage_2_1_conv3_weights', + 'stage_2_2_conv3_weights', + 'stage_2_3_conv3_weights', + 'stage_2_4_conv3_weights', + 'stage_3_1_conv5_weights', + 'stage_3_1_conv3_weights', + 'stage_3_2_conv3_weights', + 'stage_3_3_conv3_weights', + 'stage_3_4_conv3_weights', + 'stage_3_5_conv3_weights', + 'stage_3_6_conv3_weights', + 'stage_3_7_conv3_weights', + 'stage_3_8_conv3_weights', + 'stage_4_1_conv5_weights', + 'stage_4_1_conv3_weights', + 'stage_4_2_conv3_weights', + 'stage_4_3_conv3_weights', + 'stage_4_4_conv3_weights',] + for name in not_prune_names: + prune_names.remove(name) elif model_type == "MobileNetV1": prune_names.append("conv1_weights") for param in program.global_block().all_parameters(): diff --git a/paddlex/cv/models/utils/pretrain_weights.py b/paddlex/cv/models/utils/pretrain_weights.py index af8a6aa2af452914462bb305e6a03fadc7f2836c..97018acb827c41381f2e3e29df87ee0620ee2f40 100644 --- a/paddlex/cv/models/utils/pretrain_weights.py +++ b/paddlex/cv/models/utils/pretrain_weights.py @@ -81,7 +81,7 @@ coco_pretrain = { 'YOLOv3_MobileNetV1_COCO': 'https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1.tar', 'YOLOv3_MobileNetV3_large_COCO': - 'https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v3.pdparams', + 'https://bj.bcebos.com/paddlex/models/yolov3_mobilenet_v3.tar', 'YOLOv3_ResNet34_COCO': 
'https://paddlemodels.bj.bcebos.com/object_detection/yolov3_r34.tar', 'YOLOv3_ResNet50_vd_COCO': diff --git a/paddlex/cv/nets/hrnet.py b/paddlex/cv/nets/hrnet.py index a7934d385d4a53fd936410e37d3896fe21cb17ee..561c7594da2904632386c0d88e9d841c047fb2d2 100644 --- a/paddlex/cv/nets/hrnet.py +++ b/paddlex/cv/nets/hrnet.py @@ -51,15 +51,38 @@ class HRNet(object): self.width = width self.has_se = has_se + self.num_modules = { + '18_small_v1': [1, 1, 1, 1], + '18': [1, 1, 4, 3], + '30': [1, 1, 4, 3], + '32': [1, 1, 4, 3], + '40': [1, 1, 4, 3], + '44': [1, 1, 4, 3], + '48': [1, 1, 4, 3], + '60': [1, 1, 4, 3], + '64': [1, 1, 4, 3] + } + self.num_blocks = { + '18_small_v1': [[1], [2, 2], [2, 2, 2], [2, 2, 2, 2]], + '18': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]], + '30': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]], + '32': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]], + '40': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]], + '44': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]], + '48': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]], + '60': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]], + '64': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]] + } self.channels = { - 18: [[18, 36], [18, 36, 72], [18, 36, 72, 144]], - 30: [[30, 60], [30, 60, 120], [30, 60, 120, 240]], - 32: [[32, 64], [32, 64, 128], [32, 64, 128, 256]], - 40: [[40, 80], [40, 80, 160], [40, 80, 160, 320]], - 44: [[44, 88], [44, 88, 176], [44, 88, 176, 352]], - 48: [[48, 96], [48, 96, 192], [48, 96, 192, 384]], - 60: [[60, 120], [60, 120, 240], [60, 120, 240, 480]], - 64: [[64, 128], [64, 128, 256], [64, 128, 256, 512]], + '18_small_v1': [[32], [16, 32], [16, 32, 64], [16, 32, 64, 128]], + '18': [[64], [18, 36], [18, 36, 72], [18, 36, 72, 144]], + '30': [[64], [30, 60], [30, 60, 120], [30, 60, 120, 240]], + '32': [[64], [32, 64], [32, 64, 128], [32, 64, 128, 256]], + '40': [[64], [40, 80], [40, 80, 160], [40, 80, 160, 320]], + '44': [[64], [44, 88], [44, 88, 176], [44, 88, 176, 352]], + '48': [[64], [48, 96], [48, 96, 192], [48, 96, 192, 384]], + '60': [[64], [60, 120], [60, 120, 240], [60, 120, 240, 480]], + '64': [[64], [64, 128], [64, 128, 256], [64, 128, 256, 512]], } self.freeze_at = freeze_at @@ -73,31 +96,38 @@ class HRNet(object): def net(self, input): width = self.width - channels_2, channels_3, channels_4 = self.channels[width] - num_modules_2, num_modules_3, num_modules_4 = 1, 4, 3 + channels_1, channels_2, channels_3, channels_4 = self.channels[str( + width)] + num_modules_1, num_modules_2, num_modules_3, num_modules_4 = self.num_modules[ + str(width)] + num_blocks_1, num_blocks_2, num_blocks_3, num_blocks_4 = self.num_blocks[ + str(width)] x = self.conv_bn_layer( input=input, filter_size=3, - num_filters=64, + num_filters=channels_1[0], stride=2, if_act=True, name='layer1_1') x = self.conv_bn_layer( input=x, filter_size=3, - num_filters=64, + num_filters=channels_1[0], stride=2, if_act=True, name='layer1_2') - la1 = self.layer1(x, name='layer2') + la1 = self.layer1(x, num_blocks_1, channels_1, name='layer2') tr1 = self.transition_layer([la1], [256], channels_2, name='tr1') - st2 = self.stage(tr1, num_modules_2, channels_2, name='st2') + st2 = self.stage( + tr1, num_modules_2, num_blocks_2, channels_2, name='st2') tr2 = self.transition_layer(st2, channels_2, channels_3, name='tr2') - st3 = self.stage(tr2, num_modules_3, channels_3, name='st3') + st3 = self.stage( + tr2, num_modules_3, num_blocks_3, channels_3, name='st3') tr3 = self.transition_layer(st3, channels_3, channels_4, name='tr3') - st4 = self.stage(tr3, num_modules_4, channels_4, name='st4') + st4 = self.stage( + tr3, 
num_modules_4, num_blocks_4, channels_4, name='st4') # classification if self.num_classes: @@ -139,12 +169,12 @@ class HRNet(object): self.end_points = st4 return st4[-1] - def layer1(self, input, name=None): + def layer1(self, input, num_blocks, channels, name=None): conv = input - for i in range(4): + for i in range(num_blocks[0]): conv = self.bottleneck_block( conv, - num_filters=64, + num_filters=channels[0], downsample=True if i == 0 else False, name=name + '_' + str(i + 1)) return conv @@ -178,7 +208,7 @@ class HRNet(object): out = [] for i in range(len(channels)): residual = x[i] - for j in range(block_num): + for j in range(block_num[i]): residual = self.basic_block( residual, channels[i], @@ -240,10 +270,11 @@ class HRNet(object): def high_resolution_module(self, x, + num_blocks, channels, multi_scale_output=True, name=None): - residual = self.branches(x, 4, channels, name=name) + residual = self.branches(x, num_blocks, channels, name=name) out = self.fuse_layers( residual, channels, @@ -254,6 +285,7 @@ class HRNet(object): def stage(self, x, num_modules, + num_blocks, channels, multi_scale_output=True, name=None): @@ -262,12 +294,13 @@ class HRNet(object): if i == num_modules - 1 and multi_scale_output == False: out = self.high_resolution_module( out, + num_blocks, channels, multi_scale_output=False, name=name + '_' + str(i + 1)) else: out = self.high_resolution_module( - out, channels, name=name + '_' + str(i + 1)) + out, num_blocks, channels, name=name + '_' + str(i + 1)) return out diff --git a/paddlex/cv/nets/segmentation/hrnet.py b/paddlex/cv/nets/segmentation/hrnet.py index 6c7d8d93692e40047fa4ceb2f4153c18cee06ccd..209da9b507ba8e59a073fab616418c378a1e7cd5 100644 --- a/paddlex/cv/nets/segmentation/hrnet.py +++ b/paddlex/cv/nets/segmentation/hrnet.py @@ -82,7 +82,8 @@ class HRNet(object): st4[3] = fluid.layers.resize_bilinear(st4[3], out_shape=shape) out = fluid.layers.concat(st4, axis=1) - last_channels = sum(self.backbone.channels[self.backbone.width][-1]) + last_channels = sum(self.backbone.channels[str(self.backbone.width)][ + -1]) out = self._conv_bn_layer( input=out, diff --git a/paddlex/cv/transforms/cls_transforms.py b/paddlex/cv/transforms/cls_transforms.py index dbcd34222daf71c05c8f26a2a38c94faacb526f2..6b11a0839d2b2bc891f8eb29dfe666a69d0c8f5d 100644 --- a/paddlex/cv/transforms/cls_transforms.py +++ b/paddlex/cv/transforms/cls_transforms.py @@ -70,8 +70,8 @@ class Compose(ClsTransform): if isinstance(im, np.ndarray): if len(im.shape) != 3: raise Exception( - "im should be 3-dimension, but now is {}-dimensions". - format(len(im.shape))) + "im should be 3-dimension, but now is {}-dimensions".format( + len(im.shape))) else: try: im = cv2.imread(im).astype('float32') @@ -100,7 +100,9 @@ class Compose(ClsTransform): transform_names = [type(x).__name__ for x in self.transforms] for aug in augmenters: if type(aug).__name__ in transform_names: - logging.error("{} is already in ComposedTransforms, need to remove it from add_augmenters().".format(type(aug).__name__)) + logging.error( + "{} is already in ComposedTransforms, need to remove it from add_augmenters().". 
diff --git a/paddlex/cv/transforms/cls_transforms.py b/paddlex/cv/transforms/cls_transforms.py
index dbcd34222daf71c05c8f26a2a38c94faacb526f2..6b11a0839d2b2bc891f8eb29dfe666a69d0c8f5d 100644
--- a/paddlex/cv/transforms/cls_transforms.py
+++ b/paddlex/cv/transforms/cls_transforms.py
@@ -70,8 +70,8 @@ class Compose(ClsTransform):
         if isinstance(im, np.ndarray):
             if len(im.shape) != 3:
                 raise Exception(
-                    "im should be 3-dimension, but now is {}-dimensions".
-                    format(len(im.shape)))
+                    "im should be 3-dimension, but now is {}-dimensions".format(
+                        len(im.shape)))
         else:
             try:
                 im = cv2.imread(im).astype('float32')
@@ -100,7 +100,9 @@ class Compose(ClsTransform):
         transform_names = [type(x).__name__ for x in self.transforms]
         for aug in augmenters:
             if type(aug).__name__ in transform_names:
-                logging.error("{} is already in ComposedTransforms, need to remove it from add_augmenters().".format(type(aug).__name__))
+                logging.error(
+                    "{} is already in ComposedTransforms, need to remove it from add_augmenters().".
+                    format(type(aug).__name__))
         self.transforms = augmenters + self.transforms
@@ -139,8 +141,8 @@ class RandomCrop(ClsTransform):
             tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
                    当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
         """
-        im = random_crop(im, self.crop_size, self.lower_scale,
-                         self.lower_ratio, self.upper_ratio)
+        im = random_crop(im, self.crop_size, self.lower_scale, self.lower_ratio,
+                         self.upper_ratio)
         if label is None:
             return (im, )
         else:
@@ -270,14 +272,12 @@ class ResizeByShort(ClsTransform):
         im_short_size = min(im.shape[0], im.shape[1])
         im_long_size = max(im.shape[0], im.shape[1])
         scale = float(self.short_size) / im_short_size
-        if self.max_size > 0 and np.round(scale *
-                                          im_long_size) > self.max_size:
+        if self.max_size > 0 and np.round(scale * im_long_size) > self.max_size:
             scale = float(self.max_size) / float(im_long_size)
         resized_width = int(round(im.shape[1] * scale))
         resized_height = int(round(im.shape[0] * scale))
         im = cv2.resize(
-            im, (resized_width, resized_height),
-            interpolation=cv2.INTER_LINEAR)
+            im, (resized_width, resized_height), interpolation=cv2.INTER_LINEAR)

         if label is None:
             return (im, )
@@ -490,13 +490,15 @@ class ComposedClsTransforms(Compose):
             crop_size(int|list): 输入模型里的图像大小
             mean(list): 图像均值
             std(list): 图像方差
+            random_horizontal_flip(bool): 是否以0.5的概率使用随机水平翻转增强,该参数仅在mode为`train`时生效,默认为True
     """

     def __init__(self,
                  mode,
                  crop_size=[224, 224],
                  mean=[0.485, 0.456, 0.406],
-                 std=[0.229, 0.224, 0.225]):
+                 std=[0.229, 0.224, 0.225],
+                 random_horizontal_flip=True):
         width = crop_size
         if isinstance(crop_size, list):
             if crop_size[0] != crop_size[1]:
@@ -512,10 +514,11 @@ class ComposedClsTransforms(Compose):
         if mode == 'train':
             # 训练时的transforms,包含数据增强
             transforms = [
-                RandomCrop(crop_size=width), RandomHorizontalFlip(prob=0.5),
-                Normalize(
+                RandomCrop(crop_size=width), Normalize(
                     mean=mean, std=std)
             ]
+            if random_horizontal_flip:
+                transforms.insert(0, RandomHorizontalFlip())
         else:
             # 验证/预测时的transforms
             transforms = [
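A hedged usage sketch of the switch introduced above (assumes paddlex 1.0.7 is installed and that `paddlex.cls.transforms` re-exports these classes, as in the rest of the docs): disabling `random_horizontal_flip` yields a flip-free training pipeline, which can matter when left/right orientation is informative for the class label.

```python
from paddlex.cls import transforms

# Flip disabled: training uses only RandomCrop + Normalize, per the branch above.
train_transforms = transforms.ComposedClsTransforms(
    mode='train',
    crop_size=[224, 224],
    random_horizontal_flip=False)  # new parameter; defaults to True
eval_transforms = transforms.ComposedClsTransforms(mode='eval')
```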
diff --git a/paddlex/cv/transforms/det_transforms.py b/paddlex/cv/transforms/det_transforms.py
index 0b96d6b4d32f245ec4315851d8edd221776bb6a0..55b3daa85b6380068597e6de9946e6e4641216d6 100644
--- a/paddlex/cv/transforms/det_transforms.py
+++ b/paddlex/cv/transforms/det_transforms.py
@@ -160,7 +160,9 @@ class Compose(DetTransform):
         transform_names = [type(x).__name__ for x in self.transforms]
         for aug in augmenters:
             if type(aug).__name__ in transform_names:
-                logging.error("{} is already in ComposedTransforms, need to remove it from add_augmenters().".format(type(aug).__name__))
+                logging.error(
+                    "{} is already in ComposedTransforms, need to remove it from add_augmenters().".
+                    format(type(aug).__name__))
         self.transforms = augmenters + self.transforms
@@ -220,15 +222,13 @@ class ResizeByShort(DetTransform):
         im_short_size = min(im.shape[0], im.shape[1])
         im_long_size = max(im.shape[0], im.shape[1])
         scale = float(self.short_size) / im_short_size
-        if self.max_size > 0 and np.round(scale *
-                                          im_long_size) > self.max_size:
+        if self.max_size > 0 and np.round(scale * im_long_size) > self.max_size:
             scale = float(self.max_size) / float(im_long_size)
         resized_width = int(round(im.shape[1] * scale))
         resized_height = int(round(im.shape[0] * scale))
         im_resize_info = [resized_height, resized_width, scale]
         im = cv2.resize(
-            im, (resized_width, resized_height),
-            interpolation=cv2.INTER_LINEAR)
+            im, (resized_width, resized_height), interpolation=cv2.INTER_LINEAR)
         im_info['im_resize_info'] = np.array(im_resize_info).astype(np.float32)
         if label_info is None:
             return (im, im_info)
@@ -268,8 +268,7 @@ class Padding(DetTransform):
             if not isinstance(target_size, tuple) and not isinstance(
                     target_size, list):
                 raise TypeError(
-                    "Padding: Type of target_size must in (int|list|tuple)."
-                )
+                    "Padding: Type of target_size must in (int|list|tuple).")
             elif len(target_size) != 2:
                 raise ValueError(
                     "Padding: Length of target_size must equal 2.")
@@ -454,8 +453,7 @@ class RandomHorizontalFlip(DetTransform):
             ValueError: 数据长度不匹配。
         """
         if not isinstance(im, np.ndarray):
-            raise TypeError(
-                "RandomHorizontalFlip: image is not a numpy array.")
+            raise TypeError("RandomHorizontalFlip: image is not a numpy array.")
         if len(im.shape) != 3:
             raise ValueError(
                 "RandomHorizontalFlip: image is not 3-dimensional.")
@@ -736,7 +734,7 @@ class MixupImage(DetTransform):
             gt_poly2 = im_info['mixup'][2]['gt_poly']
         is_crowd1 = label_info['is_crowd']
         is_crowd2 = im_info['mixup'][2]['is_crowd']
-        
+
         if 0 not in gt_class1 and 0 not in gt_class2:
             gt_bbox = np.concatenate((gt_bbox1, gt_bbox2), axis=0)
             gt_class = np.concatenate((gt_class1, gt_class2), axis=0)
@@ -785,9 +783,7 @@ class RandomExpand(DetTransform):
         fill_value (list): 扩张图像的初始填充值(0-255)。默认为[123.675, 116.28, 103.53]。
     """

-    def __init__(self,
-                 ratio=4.,
-                 prob=0.5,
+    def __init__(self, ratio=4., prob=0.5,
                  fill_value=[123.675, 116.28, 103.53]):
         super(RandomExpand, self).__init__()
         assert ratio > 1.01, "expand ratio must be larger than 1.01"
@@ -1281,21 +1277,25 @@ class ComposedRCNNTransforms(Compose):
             min_max_size(list): 图像在缩放时,最小边和最大边的约束条件
             mean(list): 图像均值
             std(list): 图像方差
+            random_horizontal_flip(bool): 是否以0.5的概率使用随机水平翻转增强,该参数仅在mode为`train`时生效,默认为True
     """

     def __init__(self,
                  mode,
                  min_max_size=[800, 1333],
                  mean=[0.485, 0.456, 0.406],
-                 std=[0.229, 0.224, 0.225]):
+                 std=[0.229, 0.224, 0.225],
+                 random_horizontal_flip=True):
         if mode == 'train':
             # 训练时的transforms,包含数据增强
             transforms = [
-                RandomHorizontalFlip(prob=0.5), Normalize(
+                Normalize(
                     mean=mean, std=std), ResizeByShort(
                         short_size=min_max_size[0], max_size=min_max_size[1]),
                 Padding(coarsest_stride=32)
             ]
+            if random_horizontal_flip:
+                transforms.insert(0, RandomHorizontalFlip())
         else:
             # 验证/预测时的transforms
             transforms = [
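The RCNN pipeline above adopts the same opt-out pattern. A hedged usage sketch (assumes paddlex is installed; the values shown are the documented defaults):

```python
from paddlex.det import transforms

train_transforms = transforms.ComposedRCNNTransforms(
    mode='train',
    min_max_size=[800, 1333],
    random_horizontal_flip=False)  # skip flip; Normalize/ResizeByShort/Padding remain
eval_transforms = transforms.ComposedRCNNTransforms(mode='eval')
```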
@@ -1325,9 +1325,14 @@ class ComposedYOLOv3Transforms(Compose):
     Args:
         mode(str): 图像处理流程所处阶段,训练/验证/预测,分别对应'train', 'eval', 'test'
         shape(list): 输入模型中图像的大小,输入模型的图像会被Resize成此大小
-        mixup_epoch(int): 模型训练过程中,前mixup_epoch会使用mixup策略
+        mixup_epoch(int): 模型训练过程中,前mixup_epoch轮会使用mixup策略,若设为-1,则表示不使用该策略
         mean(list): 图像均值
         std(list): 图像方差
+        random_distort(bool): 数据增强方式,参数仅在mode为`train`时生效,表示是否在训练过程中随机扰动图像,默认为True
+        random_expand(bool): 数据增强方式,参数仅在mode为`train`时生效,表示是否在训练过程中随机扩张图像,默认为True
+        random_crop(bool): 数据增强方式,参数仅在mode为`train`时生效,表示是否在训练过程中随机裁剪图像,默认为True
+        random_horizontal_flip(bool): 数据增强方式,参数仅在mode为`train`时生效,表示是否在训练过程中随机水平翻转图像,默认为True
+
     """

     def __init__(self,
@@ -1335,7 +1340,11 @@ class ComposedYOLOv3Transforms(Compose):
                  shape=[608, 608],
                  mixup_epoch=250,
                  mean=[0.485, 0.456, 0.406],
-                 std=[0.229, 0.224, 0.225]):
+                 std=[0.229, 0.224, 0.225],
+                 random_distort=True,
+                 random_expand=True,
+                 random_crop=True,
+                 random_horizontal_flip=True):
         width = shape
         if isinstance(shape, list):
             if shape[0] != shape[1]:
@@ -1350,12 +1359,18 @@ class ComposedYOLOv3Transforms(Compose):
         if mode == 'train':
             # 训练时的transforms,包含数据增强
             transforms = [
-                MixupImage(mixup_epoch=mixup_epoch), RandomDistort(),
-                RandomExpand(), RandomCrop(), Resize(
-                    target_size=width,
-                    interp='RANDOM'), RandomHorizontalFlip(), Normalize(
+                MixupImage(mixup_epoch=mixup_epoch), Resize(
+                    target_size=width, interp='RANDOM'), Normalize(
                     mean=mean, std=std)
             ]
+            if random_horizontal_flip:
+                transforms.insert(1, RandomHorizontalFlip())
+            if random_crop:
+                transforms.insert(1, RandomCrop())
+            if random_expand:
+                transforms.insert(1, RandomExpand())
+            if random_distort:
+                transforms.insert(1, RandomDistort())
         else:
             # 验证/预测时的transforms
             transforms = [
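The `__init__` above enables each augmentation with repeated `transforms.insert(1, ...)` calls, so the toggles are written in reverse of their runtime order: each new insert pushes the earlier ones to the right. A runnable sketch with stand-in strings shows the resulting pipeline:

```python
# Repeated insert(1, ...) in reverse order reproduces the original pipeline:
# distort, expand, crop, flip all run after MixupImage and before Resize.
pipeline = ['MixupImage', 'Resize', 'Normalize']
for name in ['RandomHorizontalFlip', 'RandomCrop', 'RandomExpand', 'RandomDistort']:
    pipeline.insert(1, name)
print(pipeline)
# ['MixupImage', 'RandomDistort', 'RandomExpand', 'RandomCrop',
#  'RandomHorizontalFlip', 'Resize', 'Normalize']
```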
diff --git a/paddlex/cv/transforms/seg_transforms.py b/paddlex/cv/transforms/seg_transforms.py
index 9ea1c3bdc2159dbc1f33ac5f15dc710e12ccb83c..4932a7002983ff62c04ec0ed992efac323ee546b 100644
--- a/paddlex/cv/transforms/seg_transforms.py
+++ b/paddlex/cv/transforms/seg_transforms.py
@@ -116,7 +116,9 @@ class Compose(SegTransform):
         transform_names = [type(x).__name__ for x in self.transforms]
         for aug in augmenters:
             if type(aug).__name__ in transform_names:
-                logging.error("{} is already in ComposedTransforms, need to remove it from add_augmenters().".format(type(aug).__name__))
+                logging.error(
+                    "{} is already in ComposedTransforms, need to remove it from add_augmenters().".
+                    format(type(aug).__name__))
         self.transforms = augmenters + self.transforms
@@ -401,8 +403,7 @@ class ResizeByShort(SegTransform):
         im_short_size = min(im.shape[0], im.shape[1])
         im_long_size = max(im.shape[0], im.shape[1])
         scale = float(self.short_size) / im_short_size
-        if self.max_size > 0 and np.round(scale *
-                                          im_long_size) > self.max_size:
+        if self.max_size > 0 and np.round(scale * im_long_size) > self.max_size:
             scale = float(self.max_size) / float(im_long_size)
         resized_width = int(round(im.shape[1] * scale))
         resized_height = int(round(im.shape[0] * scale))
@@ -1113,25 +1114,35 @@ class ComposedSegTransforms(Compose):
     Args:
         mode(str): 图像处理所处阶段,训练/验证/预测,分别对应'train', 'eval', 'test'
-        train_crop_size(list): 模型训练阶段,随机从原图crop的大小
+        min_max_size(list): 训练过程中,图像的最长边会随机resize至此区间(短边按比例相应resize);预测阶段,图像最长边会resize至此区间中间值,即(min_size+max_size)/2。默认为[400, 600]
+        train_crop_size(list): 仅在mode为`train`时生效,训练过程中,随机从图像中裁剪出对应大小的子图(如若原图小于此大小,则会padding到此大小),默认为[512, 512]
         mean(list): 图像均值
         std(list): 图像方差
+        random_horizontal_flip(bool): 数据增强方式,仅在mode为`train`时生效,表示训练过程中是否随机水平翻转图像,默认为True
     """

     def __init__(self,
                  mode,
-                 train_crop_size=[769, 769],
+                 min_max_size=[400, 600],
+                 train_crop_size=[512, 512],
                  mean=[0.5, 0.5, 0.5],
-                 std=[0.5, 0.5, 0.5]):
+                 std=[0.5, 0.5, 0.5],
+                 random_horizontal_flip=True):
         if mode == 'train':
             # 训练时的transforms,包含数据增强
             transforms = [
-                RandomHorizontalFlip(prob=0.5), ResizeStepScaling(),
+                ResizeRangeScaling(
+                    min_value=min(min_max_size), max_value=max(min_max_size)),
                 RandomPaddingCrop(crop_size=train_crop_size), Normalize(
                     mean=mean, std=std)
             ]
+            if random_horizontal_flip:
+                transforms.insert(0, RandomHorizontalFlip())
         else:
             # 验证/预测时的transforms
-            transforms = [Normalize(mean=mean, std=std)]
-
+            long_size = (min(min_max_size) + max(min_max_size)) // 2
+            transforms = [
+                ResizeByLong(long_size=long_size), Normalize(
+                    mean=mean, std=std)
+            ]
         super(ComposedSegTransforms, self).__init__(transforms)
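A hedged usage sketch of the reworked segmentation pipeline (assumes paddlex is installed): training resizes the long side into `min_max_size` and then crops `train_crop_size` patches, while eval/test resize the long side to the midpoint of the range, here (400 + 600) // 2 = 500.

```python
from paddlex.seg import transforms

train_transforms = transforms.ComposedSegTransforms(
    mode='train',
    min_max_size=[400, 600],
    train_crop_size=[512, 512],
    random_horizontal_flip=True)
# Eval pipeline: ResizeByLong(long_size=500) + Normalize, per the else-branch above.
eval_transforms = transforms.ComposedSegTransforms(mode='eval')
```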
diff --git a/paddlex/interpret/visualize.py b/paddlex/interpret/visualize.py
index c1b013d04b9b21a49ecf7eeb6dd65b6d6c578069..6c3570b05d99f359452116542c82cb9a8cbc555b 100644
--- a/paddlex/interpret/visualize.py
+++ b/paddlex/interpret/visualize.py
@@ -70,8 +70,10 @@ def normlime(img_file,
             normlime_weights_file=None):
     """使用NormLIME算法将模型预测结果的可解释性可视化。

-    NormLIME是利用一定数量的样本来出一个全局的解释。NormLIME会提前计算一定数量的测
-    试样本的LIME结果,然后对相同的特征进行权重的归一化,这样来得到一个全局的输入和输出的关系。
+    NormLIME是利用一定数量的样本得出一个全局的解释。由于NormLIME计算量较大,此处采用一种简化的方式:
+    使用一定数量的测试样本(目前默认使用所有测试样本),对每个样本进行特征提取,映射到同一个特征空间;
+    然后以此特征作为输入,以模型输出作为输出,使用线性回归对其进行拟合,得到一个全局的输入和输出的关系。
+    之后,对一个测试样本进行解释时,使用NormLIME全局的解释,来对LIME的结果进行滤波,使最终的可视化结果更加稳定。

     注意1:dataset读取的是一个数据集,该数据集不宜过大,否则计算时间会较长,但应包含所有类别的数据。
     注意2:NormLIME可解释性结果可视化目前只支持分类模型。
diff --git a/setup.py b/setup.py
index 44aca0f9dc2a214ff4bcf4e2817d06423c26812b..1f42da4da4099b6b651a41b65aaedde7b76093ca 100644
--- a/setup.py
+++ b/setup.py
@@ -15,11 +15,11 @@ import setuptools
 import sys

-long_description = "PaddleX. A end-to-end deeplearning model development toolkit base on PaddlePaddle\n\n"
+long_description = "PaddlePaddle Entire Process Development Toolkit"

 setuptools.setup(
     name="paddlex",
-    version='1.0.6',
+    version='1.0.7',
     author="paddlex",
     author_email="paddlex@baidu.com",
     description=long_description,
diff --git a/new_tutorials/train/segmentation/fast_scnn.py b/tutorials/train/segmentation/fast_scnn.py
similarity index 98%
rename from new_tutorials/train/segmentation/fast_scnn.py
rename to tutorials/train/segmentation/fast_scnn.py
index 53f1a528a090d6d4f278e47b54b2660dccde2e0d..9c48d31eda7b612243e65df124b51722c4ea59e4 100644
--- a/new_tutorials/train/segmentation/fast_scnn.py
+++ b/tutorials/train/segmentation/fast_scnn.py
@@ -35,7 +35,7 @@ eval_dataset = pdx.datasets.SegDataset(

 # 浏览器打开 https://0.0.0.0:8001即可
 # 其中0.0.0.0为本机访问,如为远程服务, 改成相应机器IP
-# https://paddlex.readthedocs.io/zh_CN/latest/apis/models/semantic_segmentation.html#hrnet
+# https://paddlex.readthedocs.io/zh_CN/latest/apis/models/semantic_segmentation.html#fastscnn
 num_classes = len(train_dataset.labels)
 model = pdx.seg.FastSCNN(num_classes=num_classes)
 model.train(
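For context on the renamed tutorial, a hedged sketch of how its training flow is typically completed (dataset construction omitted; all `train()` arguments below are illustrative assumptions, not copied from the truncated call above):

```python
import paddlex as pdx

# train_dataset / eval_dataset as built earlier in the tutorial (SegDataset objects).
num_classes = len(train_dataset.labels)
model = pdx.seg.FastSCNN(num_classes=num_classes)
model.train(
    num_epochs=20,                   # hypothetical settings
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    save_dir='output/fast_scnn')
```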